Black holes and the fate of quantum information

I have written about the fate of classical information interacting with black holes fairly extensively on this blog (see Part 1, Part 2, and Part 3). Reviewers of the article describing those results nearly always respond that I should be considering the fate of quantum, not classical information.

In particular, they ask me to comment on what all this means in the light of more modern controversies, such as black hole complementarity and firewalls. As if solving the riddle of what happens to classical information is not nearly good enough. 

I should first state that I disagree with the idea that it is necessary to discuss the fate of quantum information in an article that discusses what happens to classical information. I'll point out the differences between those two concepts here, and hopefully I'll convince you that it is perfectly reasonable to discuss these independently. However, I have given in to these requests, and have now written an article (together with my colleague Kamil Bradler at St. Mary's University in Halifax, Canada) that studies the fate of quantum information that interacts with a black hole. Work on this manuscript explains (in part) my prolonged absence from blogging.

The results we obtained, it turns out, do indeed shed new light on these more modern controversies, so I'm grateful for the reviewers' requests after all. The firewalls have "set the physics world ablaze", as one blogger writes. These firewalls (that are suggested to surround a black hole) have been invented to correct a perceived flaw in another widely discussed theory, namely the theory of black hole complementarity due to the theoretical physicist Leonard Susskind. I will describe these in more detail below, but before I can do this, I have to define for you the concept of quantum entanglement.

Quantum entanglement lies at the very heart of quantum mechanics, and it is what makes quantum physics different from classical physics. It is clear, as a consequence, that I won't be able to make you understand quantum entanglement if you have never studied quantum mechanics. If this is truly your first exposure, you should probably consult the Wiki page about quantum entanglement, which is quite good in my view. 

Quantum entanglement is an interaction between two quantum states that leaves them in a joint state that cannot be described in terms of the properties of the original states. Two quantum states $\psi_A$ and $\psi_B$ may have separate properties before entanglement, but after they interact they will be governed by a single wavefunction $\psi_{AB}$ (there are exceptions). So, for example, if I imagine a wavefunction $\psi_A=\sigma|0\rangle +\tau|1\rangle$ (with $|\sigma|^2+|\tau|^2=1$ so that the state is correctly normalized) and a quantum state B simply given by $|0\rangle$, then a typical entangling operation $U$ will leave the joint state entangled:

      $U(\sigma|0\rangle +\tau|1\rangle)|0\rangle=\sigma|00\rangle +\tau|11\rangle$.    (1)

The wavefunction on the right hand side is not a product of the two initial wavefunctions, and in any case classical systems can never be brought into such a superposition of states in the first place. Another interesting aspect of quantum entanglement is that it is non-local. If A and B represent particles, you can still move one of the particles far away (say, to another part of the galaxy). They will still remain entangled. Classical interactions are not like that. At all.
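If you'd like to see Eq. (1) in action, here is a quick numerical sketch: I take $U$ to be the CNOT gate (my choice of a typical entangling operation; the text doesn't single one out) acting on $\psi_A\otimes|0\rangle$, with the illustrative values $\sigma=0.6$, $\tau=0.8$:

```python
import numpy as np

# Single-qubit basis states |0> and |1>
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Alice's state psi_A = sigma|0> + tau|1>, with sigma^2 + tau^2 = 1
sigma, tau = 0.6, 0.8
psi_A = sigma * ket0 + tau * ket1

# Joint product state psi_A (x) |0>, basis ordered |00>, |01>, |10>, |11>
joint_in = np.kron(psi_A, ket0)

# CNOT: a typical entangling operation U (flips B when A is |1>)
U = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

joint_out = U @ joint_in
print(joint_out)  # sigma|00> + tau|11>: amplitudes (0.6, 0, 0, 0.8)
```

The output amplitudes no longer factor into a product of two single-qubit states, which is exactly the mark of entanglement.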

A well-known entangled wavefunction is that of the Einstein-Podolsky-Rosen pair, or EPR pair. This is a wavefunction just like (1), but with $\sigma=\tau=1/\sqrt{2}$. The '0' vs '1' state can be realized via any physical quantum two-state system, such as a spin 1/2-particle or a photon carrying a horizontal or vertical polarization. 

What does it mean to send quantum information? Well, it just means sending quantum entanglement! Let us imagine a sender Alice, who controls a two-state quantum system that is entangled with another system (let us call it $R$ for "reference"). This means that her quantum wavefunction (with respect to $R$) can be written as

                $|\Psi\rangle_{AR}=\sigma|00\rangle_{AR} +\tau|11\rangle_{AR}$    (2)

where the subscripts $AR$ refer to the fact that the wavefunction now "lives" in the joint space $AR$. $A$ and $R$ (after entanglement) do not have individual states any longer.

Now, Alice herself should be unaware of the nature of the entanglement between $A$ and $R$ (meaning, Alice does not know the values of the complex constants $\sigma$ and $\tau$). She is not allowed to know them, because if she did, then the quantum information she would send would become classical. Indeed, Alice can turn any quantum information into classical information by measuring the quantum state before sending it. So let's assume Alice does not do this. She can still try to send the arbitrary quantum state that she controls to Bob, so that after the transmittal her quantum state is unentangled with $R$, but it is now Bob's wavefunction that reads

           $|\Psi\rangle_{BR}=\sigma|00\rangle_{BR} +\tau|11\rangle_{BR}$    (3). 

In this manner, entanglement was transferred from $A$ to $B$. That is a quantum communication channel.

Of course, lots of things could happen to the quantum entanglement on its way to Bob. For example, it could be waylaid by a black hole. If Alice sends her quantum entanglement into a black hole, can Bob retrieve it? Can Bob perform some sort of magic that will leave the black hole unentangled with $A$ (or $R$), while he himself is entangled as in (3)?

Whether or not Bob can do this depends on whether the quantum channel capacity of the black hole is finite, or whether it vanishes. If the capacity is zero, then Bob is out of luck. The best he can do is to attempt to reconstruct Alice's quantum state using classical state estimation techniques. That's not nothing by the way, but the "fidelity" of the state reconstruction is at most 2/3. But I'm getting ahead of myself.
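As an aside, the 2/3 figure is easy to check with a little Monte Carlo: for a single qubit, measuring in a fixed basis and re-preparing the outcome state achieves an average fidelity of exactly 2/3 over random input states (this is a sketch of the standard state-estimation bound, not a calculation from our paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# For Haar-random qubit states, p0 = |<0|psi>|^2 is uniform on [0, 1]
p0 = rng.random(n)

# Measure in the {|0>, |1>} basis and re-prepare the outcome state.
# Average fidelity with the original = sum over outcomes of
# (probability of outcome) x (overlap of re-prepared state with input)
F = p0**2 + (1.0 - p0)**2

print(F.mean())  # averages to 2/3
```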

Let's first take a look at this channel. I'll picture this in a schematic manner where the outside of the black hole is at the bottom, and the inside of the black hole is at the top, separated by the event horizon. Imagine Alice sending her quantum state in from below. Now, black holes (like all realistic black bodies) don't just absorb stuff: they reflect stuff too. How much is reflected depends on the momentum and angular momentum of the particle, but in general we can say that a black hole has an absorption coefficient $0\leq\alpha\leq1$, so that $\alpha^2$ is just the probability that a particle incident on the black hole is absorbed.

So we see that if $n$ particles are incident on a black hole (in the form of entangled quantum states $|\psi\rangle_{\rm in}$), then $(1-\alpha^2)n$ come out because they are reflected at the horizon. Except, as we'll see, they are in general no longer the pure quantum states Alice sent in, but rather a mixture $\rho_{\rm out}$. This is (as I'll show you) because the black hole isn't just a partially silvered mirror. Other things happen, like Hawking radiation. Hawking radiation is the result of quantum vacuum fluctuations at the horizon, which constantly create particle-antiparticle pairs. If this happened anywhere but at the event horizon, the pairs would annihilate again, and nobody would be the wiser. Indeed, such vacuum fluctuations happen constantly everywhere in space. But if a pair is created at the horizon, then one of the particles can cross the horizon, while the other (which has equal and opposite momentum) speeds away from it. That now looks like the black hole radiates. And it happens at a fixed rate that is determined by the mass of the black hole. Let's just call this rate $\beta^2$.

As you can see, the rate of spontaneous emission does not depend on how many particles Alice has sent in. In fact, you get this radiation whether or not you throw in a quantum state. These fluctuations go on before you send in particles, and after. They have absolutely nothing to do with $|\psi\rangle_{\rm in}$. They are just noise. But they are (in part) responsible for the fact that the reflected quantum state $\rho_{\rm out}$ is not pure anymore. 

But I can tell you that if this were the whole story, then physics would be in deep deep trouble. This is because you cannot recover even classical information from this channel if $\alpha=1$. Never mind quantum. In fact, you could not recover quantum information even in the limit $\alpha=0$, a perfectly reflecting black hole! (I have not shown you this yet, but I will). 

This is not the whole story, because a certain gentleman in 1917 wrote a paper about what happens when radiation is incident on a quantum mechanical black body. Here is a picture of this gentleman, along with the first paragraph of the 1917 paper:

Albert Einstein in 1921  (source: Wikimedia) 


In his 1917 article "On the quantum theory of radiation", Einstein discovered that you can derive Planck's law (about the distribution of radiation emanating from a black body) using just the quantum mechanics of absorption, spontaneous emission, and stimulated emission of radiation. Stimulated emission is by now familiar to everybody, because it is the principle upon which lasers are built. What Einstein showed in that paper is that stimulated emission is an inevitable consequence of absorption: if a black body absorbs radiation, it also stimulates the emission of radiation, with the exact same quantum numbers as the incoming radiation.

Here's the figure from the Wiki page that shows how stimulated emission makes "two out of one":

Quantum "copying" during stimulated emission from an atom (source: Wikimedia)

Quantum "copying" during stimulated emission from an atom (source: Wikimedia)

In other words, all black bodies are quantum copying machines!

"But isn't quantum copying against the law?"

Actually, now that you mention it, yes it is, and the law is much more stringent than the law against classical copying (of copyrighted information, that is). The law (called the no-cloning theorem) is such that it can never be broken, by anyone or anything.

The reason why black bodies can be quantum copying machines is that they don't make perfect copies, and the reason the copies aren't perfect is the presence of spontaneous emission, which muddies up the copies. This has been known for 30 years, mind you, and was first pointed out by the physicist Leonard Mandel. Indeed, only perfect copying is disallowed. There is a whole literature on what is now known as "quantum cloning machines", and it is possible to calculate the highest allowed fidelity of cloning. When making two copies from one, the highest possible fidelity is $F=5/6$. That's an optimal 1->2 quantum cloner. And it turns out that in a particular limit (as I point out in this paper from 2006) black holes can actually achieve that limit! I'll point out what kind of black holes are optimal cloners further below.
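The 5/6 value is easy to verify from the known output of the optimal universal 1->2 cloner (the Bužek-Hillery cloner): each clone is the input state "shrunk" toward the maximally mixed state by a factor of 2/3. A minimal check, assuming that standard form:

```python
import numpy as np

# An arbitrary normalized input qubit state
sigma, tau = 0.6, 0.8
psi = np.array([sigma, tau])
proj = np.outer(psi, psi)

# Each clone produced by the optimal universal 1->2 cloner:
# the input shrunk toward the maximally mixed state I/2 by a factor 2/3
rho_clone = (2 / 3) * proj + (1 / 3) * np.eye(2) / 2

# Fidelity of a clone with the original pure state: <psi|rho|psi>
F = float(psi @ rho_clone @ psi)
print(F)  # 2/3 + 1/3 * 1/2 = 5/6
```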

All right, so now we have seen that black holes must stimulate the emission of particles in response to incoming radiation. Because Einstein said they must. The channel thus looks like this:

In addition to the absorbed/reflected radiation, there is spontaneous emission (in red), and there is stimulated emission (in blue). There is something interesting about the stimulated "clones". (I will refer to the quantum copies as clones even though they are not perfect clones, of course. How good they are is central to what follows). 

Note that the clone behind the horizon has a bar over it, which denotes "anti". Indeed, the stimulated stuff beyond the horizon consists of anti-particles, and they are referred to in the literature as anti-clones, because the relationship between $\rho_{\rm out}$ and $\bar \rho_{\rm out}$ is a quantum mechanical NOT operation. (Or, to be technically accurate, the best NOT you can do without breaking quantum mechanics.) That the stimulated stuff inside and outside the horizon must be particles and anti-particles is clear, because the process must conserve particle number. We should keep in mind that the Hawking radiation also conserves particle number. The total number of particles going in is $n$, which is also the total number of particles going out (adding up stuff inside and outside the horizon). I checked. 

Now that we know that there are a bunch of clones and anti-clones hanging around, how do we use them to transfer the quantum entanglement? Actually, we are not interested here in a particular protocol, we are much more interested in whether this can be done at all. If we would like to know whether a quantum state can be reconstructed (by Bob) perfectly, then we must calculate the quantum capacity of the channel. While how to do this (and whether this calculation can be done at all) is technical, one thing is not: If the quantum capacity is non-zero then, yes, Bob can reconstruct Alice's state perfectly (that is, he will be entangled with $R$ exactly like Alice was when he's done). If it is zero, then there is no way to do this, period.

In the paper I'm blogging about, Kamil and I did in fact calculate the capacity of the black hole channel, but only for two special cases: $\alpha=0$ (a perfectly reflecting black hole), and $\alpha=1$ (a black hole that reflects nothing). The reason we did not tackle the general case is that at the time of this writing, you can only calculate the quantum capacity of the black hole channel exactly for these two limiting cases. For a general $\alpha$, the best you can do is give a lower and an upper bound, and we have that calculation planned for the future. But the two limiting cases are actually quite interesting.

[Note: Kamil informed me that for channels that are sufficiently "depolarizing", the capacity can in fact be calculated, and then it is zero. I will comment on this below.]

First: $\alpha=0$. In that case the black hole isn't really a black hole at all, because it swallows nothing. Check the figure up there, and you'll see that in the absorption/reflection column, you have nothing in black behind the horizon. Everything will be in front. How much is reflected and how much is absorbed doesn't affect anything in the other two columns, though. So this black hole really looks more like a "white hole", which in itself is still a very interesting quantum object. Objects like that have been discussed in the literature (but it is generally believed that they cannot actually form from gravitational collapse). But this is immaterial for our purposes: we are just investigating here the quantum capacity of such an object in some extreme cases. For the white hole, you now have two clones outside, and a single anticlone inside (if you would send in one particle). 

Technical comment for experts: 

A quick caveat: Even though I write that there are two clones and a single anti-clone after I send in one particle, this does not mean that this is the actual number of particles that I will measure if I stick out my particle detector, dangling out there off the horizon. This number is the expected (mean) number of particles. Because of vacuum fluctuations, there is a non-zero probability of measuring a hundred million particles. Or any other number. The quantum channel is really a superposition of infinitely many cloning machines, with the 1->2 cloner the most important. This fundamental and far-reaching result is due to Kamil.

So what is the capacity of the channel? It's actually relatively easy to calculate because the channel is already well-known: it is the so-called Unruh channel that also appears in a quantum communication problem where the receiver is accelerated, constantly. The capacity looks like this:

Quantum capacity of the white hole channel as a function of z


In that figure, I show you the capacity as a function of $z=e^{-\omega/T}$, where $T$ is the temperature of the black hole and $\omega$ is the frequency (or energy) of the mode. For a very large black hole the temperature is very low and, as a consequence, the channel isn't very noisy at all (low $z$). The capacity therefore is nearly perfect (close to one qubit decoded for every qubit sent). When black holes evaporate, they become hotter, and the channel becomes noisier (higher $z$). For infinitely small black holes ($z=1$) the capacity finally vanishes. But so does our understanding of physics, of course, so this is no big deal.

What this plot implies is that you can perfectly reconstruct the quantum state that Alice daringly sent into the white hole as long as the capacity $Q$ is larger than zero. (If the capacity is small, it would just take you longer to do the perfect reconstruction.) I want to make one thing clear here: the white hole is indeed an optimal cloning machine (the fidelity of 1->2 cloning is actually 5/6, for each of the two clones). But to recreate the quantum state perfectly, you have to do some more work, and that work requires both clones. After you have finished, though, the reconstructed state has fidelity $F=1$.

"Big deal" you might say, "after all the white hole is a reflector!"

Actually, it is a somewhat big deal, because I can tell you that if it wasn't for that blue stimulated bit of radiation in that figure above, you couldn't do the reconstruction at all! 

"But hold on hold on", I hear someone mutter, from far away. "There is an anti-clone behind the horizon! What do you make of that? Can you, like, reconstruct another perfect copy behind the horizon? And then have TWO?"

So, now we come to the second result of the paper. You actually cannot. The capacity of the channel into the black hole (what is known as the complementary channel) is actually zero because (and this is technical speak) the channel into the black hole is entanglement breaking. You can't reconstruct perfectly from a single clone or anti-clone, it turns out. So, the no-cloning theorem is saved. 

Now let's come to the arguably more interesting bit: a perfectly absorbing black hole ($\alpha$=1). By inspecting the figure, you see that now I have a clone and an anti-clone behind the horizon, and a single clone outside (if I send in one particle). Nothing changes in the blue and red lines. But everything changes for the quantum channel. Now I can perfectly reconstruct the quantum state behind the horizon (as calculating the quantum capacity will reveal), but the capacity in front vanishes! Zero bits, nada, zilch. If $\alpha=1$, the channel from Alice to Bob is entanglement breaking.  

It is as if somebody had switched the two sides of the black hole! 

Inside becomes outside, and outside becomes inside!

Now let's calm down and ponder what this means. First: Bob is out of luck. Try as he might, he cannot have what Alice had: the same entanglement with $R$ that she enjoyed. Quantum entanglement is lost when the black hole is perfectly absorbing. We have to face this truth.  I'll try to convince you later that this isn't really terrible. In fact it is all for the good. But right now you may not feel so good about it.

But there is some really good news. To really appreciate this good news, I have to introduce you to a celebrated law of gravity: the equivalence principle.

The principle, due to the fellow whose pic I have a little higher up in this post, is actually fairly profound. The general idea is that an observer should not be able to figure out whether she is, say, on Earth being glued to the surface by 1g, or whether she is really in a spaceship that accelerates at the rate of 1g (g being the constant of gravitational acceleration on Earth, you know: 9.81 m/sec$^2$). The equivalence principle has far reaching consequences. It also implies that an observer (called, say, Alice), who falls towards (and ultimately into) a black hole, should not be able to figure out when and where she passed the point of no return. 

The horizon, in other words, should not appear as a special place to Alice at all. But if something dramatic would happen to quantum states that cross this boundary, Alice would have a sure-fire way to notice this change: she could just keep the quantum state in a protected manner at her disposition, and constantly probe this state to find out if anything happened to it. That's actually possible using so-called "non-demolition" experiments. So, unless you feel like violating another one of Einstein's edicts (and, frankly, the odds are against you if you do), you better hope nothing happens to a quantum state that crosses from the outside to the inside of a black hole in the perfect absorption case ($\alpha=1$). 

Fortunately, we proved (result No. 3) that you can perfectly reconstruct the state behind the horizon when $\alpha=1$: this capacity is non-zero, and as a consequence the equivalence principle is upheld.

This may not appear to you as much of a big deal when you read this, but many, many researchers have been worried sick that the dynamics they expect in black holes would spell curtains for the equivalence principle. I'll get back to this point, I promise. But before I do so, I should address a more pressing question.

"If Alice's quantum information can be perfectly reconstructed behind the horizon, what happens to it in the long run?"

This is a very serious question. Surely we would like Bob to be able to "read" Alice's quantum message (meaning he yearns to be entangled just like she was). But this message is now hidden behind the black hole event horizon. Bob is a patient man, but he'd like to know: "Will I ever receive this quantum info?"

The truth is, today we don't know how to answer this question. We understand that Alice's quantum state is safe and sound behind the horizon, for now. There is also no reason to think that the ongoing process of Hawking radiation (which leads to the evaporation of the black hole) should affect the absorbed quantum state. But at some point or other the black hole will become microscopic, so that our cherished laws of physics may lose their validity. At that point, all bets are off. We simply do not understand today what happens to quantum information hidden behind the horizon of a black hole, because we do not know how to calculate all the way down to very small black holes.

Having said this, it is not inconceivable that at the end of a black hole's long long life, the only thing that happens is the disappearance of the horizon. If this happens, two clones are immediately available to an observer (the one that used to be on the outside, and the one that used to be inside), and Alice's quantum state could finally be resurrected by Bob, a person that no doubt would merit to be called the most patient quantum physicist in the history of all time. 

Now what does this all mean for black hole physics?

I have previously shown that classical information is just fine, and that the universe remains predictable for all times. This is because to reconstruct classical information, a single stimulated clone is enough. It does not matter what $\alpha$ is; it could even be one. Quantum information can be conveyed accurately if the black hole is actually a white hole, but if it is utterly black then quantum information is stuck behind the horizon, even though we have a piece of it (a single clone) outside of the horizon. But that's not enough, and that's a good thing too, because we need the quantum state to be fully reconstructable inside of the black hole, otherwise the equivalence principle is hosed. And if it is reconstructable inside, then you'd better hope it is not reconstructable outside, because otherwise the no-cloning theorem would be toast.

So everything turns out to be peachy, as long as nothing drastic happens to the quantum state inside the black hole. We have no evidence of something so drastic, but at this point we simply do not know. 

Now what are the implications for black hole complementarity? The black hole complementarity principle was created from the notion (perhaps a little bit vague) that, somehow, quantum information is both reflected and absorbed by the black hole channel at the same time. Now, given that you have read this far in this seemingly interminable post, you know that this is not allowed. It really isn't. What Susskind, Thorlacius, and 't Hooft argued for, however, is that it is OK as long as you won't be caught. Because, they argued, nobody will be able to measure the quantum state on both sides of the horizon at the same time anyway!

Now I don't know about you, but I was raised believing that just because you can't be caught it doesn't make it alright to break the rules.  And what our more careful analysis of quantum information interacting with a black hole has shown, is that you do not break the quantum cloning laws at all. Both the equivalence principle and the no-cloning theorem are perfectly fine. Nature just likes these laws, and black holes are no outlaws.

Adventurous Alice encounters a firewall? Credit: Nature.com


What about firewalls then? Quantum firewalls were proposed to address a perceived inconsistency in the black hole complementarity picture. But you now already know that that picture was inconsistent to begin with. Violating no-cloning laws brings with it all kinds of paradoxes. Unfortunately, the firewall hypothesis just heaped paradoxes upon paradoxes, because it proposed that you have to violate the equivalence principle as well. This is because that hypothesis assumes that all the information was really stored in the Hawking radiation (the red stuff in the figures above). But there is really nothing in there, so that the entire question of whether transmitting quantum information from Alice to Bob violates the concept of "monogamy of entanglement" is moot. The Hawking radiation can be entangled with the black hole, but it is no skin off of Alice or Bob, that entanglement is totally separate. 

So, all is well, it seems, with black holes, information, and the universe. We don't need firewalls, and we do not have to invoke a "complementarity principle". Black hole complementarity is automatic, because even though you do not have transmission when you have perfect reflection, a stimulated clone does make it past the horizon. And when you have perfect transmission ($\alpha$=1) a stimulated clone still comes back at you. So it is stimulated emission that makes black hole complementarity possible, without breaking any rules. 

Of course we would like to know the quantum capacity for an arbitrary $\alpha$, which we are working on. One result is already clear: if the transmission coefficient $\alpha$ is high enough that not enough of the second clone is left outside of the horizon, then the capacity abruptly vanishes. Because the black hole channel is a particular case of a "quantum depolarizing channel", discovering what this critical $\alpha$ is only requires mapping the channel's error rate $p$ to $\alpha$. 
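To see why a sharp cutoff is plausible, here is a sketch (not the full calculation, which remains to be done) using the standard hashing lower bound on the quantum capacity of a qubit depolarizing channel with total error probability $p$, namely $Q \geq 1 - H_2(p) - p\log_2 3$: the bound already hits zero at a finite error rate, near $p \approx 0.189$. The mapping from $p$ to $\alpha$ for the black hole channel is the part we are still working out.

```python
import math

def hashing_bound(p):
    """Hashing lower bound on the quantum capacity of a qubit
    depolarizing channel with total error probability p."""
    if p == 0:
        return 1.0
    h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h2 - p * math.log2(3)

# Bisect for the error rate at which the bound vanishes;
# the bound is decreasing on this interval
lo, hi = 0.1, 0.25
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if hashing_bound(mid) > 0:
        lo = mid
    else:
        hi = mid

print(round(lo, 3))  # the bound vanishes near p = 0.189
```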

I leave you with an interesting observation. Imagine a black hole channel with perfect absorption, and put yourself into the black hole. Then, call yourself "Complementary Alice", and try to send a quantum state across the horizon. You immediately realize that you can't: the quantum state will be reflected. The capacity to transmit quantum information out of the black hole vanishes, while you can perfectly communicate quantum entanglement with "Complementary Bob". Thus, from the inside of the perfectly absorbing black hole it looks just like the white hole channel (and of course the reverse is true for the perfectly reflecting case). Thus, the two channels are really the same, just viewed from different sides! 

This becomes even more amusing if you keep in mind that (eternal) black holes have white holes in their past, and white holes have black holes in their future. 

Survival of the nicest: Why it does not pay to be mean

According to the "MIT Technology Review", the world of game theory is "currently on fire". To be sure, I am not the arsonist. But, judging by the interest a paper by Bill Press and Freeman Dyson that was published a year ago in the Proceedings of the National Academy of Sciences has created, the flamboyant description is not completely off the mark. Before I describe the contents of the Press and Dyson paper (which has been done, to be sure, by many a blogger before me, see here or here), let me briefly tell you about the authors.

I know both of them in different ways. Bill Press is a Professor of Computer Science as well as Professor of Integrative Biology at UT Austin. Incidentally, UT Austin is a sister campus within the BEACON Center for the Study of Evolution in Action, which I also belong to here at Michigan State University. Bill Press is an astrophysicist by training, but perhaps most famous for a book on scientific computing, called "Numerical Recipes". That's where I know him from. If you've used scientific computing in the 90s and 00s, say, then you know this book. Bill Press is also the President of the American Association for the Advancement of Science (AAAS), which you may know as the publisher of Science Magazine. AAAS elected me a fellow in 2011, so I'm perfectly happy with this organization, as you can imagine. Press was also a "Prize Postdoctoral Fellow" at Caltech (incidentally as was I), and actually grew up in Pasadena (I spent almost 20 years there).  So, besides the fact that he is way more famous than I am, there are some interesting parallels in our careers. I was once offered a Director's postdoc fellowship at Los Alamos National Laboratory that I declined. He accepted the Deputy Laboratory Director position there in 1998. He worked on black hole physics and is kinda famous for his work with Teukolsky. I worked on black hole physics… and if you read this blog then you know that story. So the parallels are, at best, marginal. He now works in computational biology, and I work there too, and that's where this story really has its beginning.

I've known of Freeman Dyson ever since I took Quantum Field Theory in grad school at SUNY Stony Brook. I can't and won't sing his praises here because my blog posts tend to be on the long side to begin with. I will probably always be able to write down the Schwinger-Dyson equation, perhaps even when somewhat inebriated (but I've never tried this). I've met him a couple of times but I do not expect that he would remember me. He is 89 now, so many people were surprised to hear that the mathematics of the PNAS paper mentioned above were entirely his (as recounted here). I'm not that surprised, because I've known the theoretical physicist Hans Bethe quite well in the last thirteen years of his life, and he was still able to calculate. And Bethe was 85 when I first met him. So I know that this is possible as long as you do not stop.  

So, what did Press and Dyson do? To begin to understand this, you have to have heard of the "Prisoner's Dilemma". 

The Prisoner's Dilemma (PD for short), is a simple game. Two players, two moves. The back story is a bit contrived: Two shady characters commit a crime, but are discovered as they flee the scene, speedily. 

The authorities nab both of them, but are unsure which of the two is the actual perpetrator. They are ready to slap a "speeding while leaving a crime scene" conviction on both of them ("What, no grand-jury indictment? Have you even heard of 'trial by jury of peers'?"), when they are interrogated one more time, separately.

"Listen", the grand interrogator whispers to one of them, "you can go scot-free if you tell me it was the other guy". "Whoa. An incentive.", thinks perp. No. 1. "I get one year for accelerated haste, but I get 5 years if they figure we both were in on the heist".   "If I say nothing we both get a year, and are out by Christmas on good behavior. But if they offer the same deal to my buddy, I bet he squeals like a pig. He gets out, and I go in for 20. I better rat him out". As a consequence, they rat each other out and spend 5 years in the slammer. They could have both spent Christmas at home. But instead, they chose to "defect" rather than "cooperate". How lame! 

Theory says, actually, not lame at all. Theory says: "Under these circumstances, thou shalt betray your fellow man!" That theory, by the way, is due to none other than John Forbes Nash, who even got a Nobel for it. Well, John von Neumann, the all-around genius and originator of Game Theory, would have had a few choice words for that prize, but that's for another blog. "A Nobel for applying Brouwer's fixed point theorem? Are you f***ing kidding me?", is what Johnny might have said. (von Neumann's friends called him Johnny. I doubt anybody called Nash 'Johnny'.) But let's agree not to digress.

So, Nash says: "Screwing your adversary is the rational choice in this situation". And that's certainly correct. The aforementioned fixed point theorem guarantees it. Nowadays, you can derive the result in a single line. And you thought Nobel prizes are hard to get? Well, today you may have to work a bit harder than that.

Here then, is the dilemma in a nutshell. "If we both cooperate by not ratting each other out, we go home by Christmas". But there is this temptation: "I rat out the other guy and I go free, like, now (and the other guy won't smell flowers until the fourth Clinton administration). But if we both rat each other out, then we may regret that for a long time."

You can view the dilemma in terms of a payoff matrix as follows. A matrix? Don't worry: it's just a way to arrange numbers. We both keep silent: one year for each of us. We both rat each other out: 3 years for each of us, because now the cops are certain we did it. But if I rat you out while you're being saintly…. then I go free and you spend serious time (like, 20 years). Of course, the converse holds true too: If I hold my breath while you squeal, then I am caught holding the bag.

We can translate the "time-in-the-slammer" into a payoff-to-the-individual instead like this: hard time=low payoff. Scot-free=high payoff. You get the drift. We can then write a payoff matrix for the four possible plays. Staying silent is a form of cooperation, so we say this move is "C". Ratting out is a form of defection, so we call this move "D".

The four possible plays are now: we both cooperate: "CC" (payoff=3), we both defect: "DD" (payoff=1), and we play unequally: "CD" or "DC". If I cooperate while you defect, I get nada and you get 5. Of course, the converse is true if I defect and you cooperate. So let's arrange that into a matrix:

$$\begin{array}{c|cc} & C & D\\ \hline C & 3 & 0\\ D & 5 & 1\end{array}$$

Each matrix entry contains the payoff to the "row player", that is, the player on the left of the matrix, as opposed to the top. Two players could do well by cooperating (and earning 3s), while the rational choice is for both players to defect, earning 1s each. What gives?

What gives is, the perps are not allowed to talk to each other! If they could, they'd certainly conspire to not rat each other out. I made this point in a paper three years ago, which was summarily ignored by the community. (I concede that, perhaps, I didn't choose a very catchy title.) Still, I'm bitter. Moving on.

How could players, engaged in this dilemma, communicate? Communication in this case entails that players recognize each other. "Are you one of these cheaters, or else a bleeding-heart do-gooder?" is what every player may want to know before engaging the adversary. How could you find out? Well, for example, by observing your opponent's past moves! This will quickly tell you what kind of an animal you're dealing with. So a communicator's strategy should be to make her moves conditional on the opponent's past moves (while not forgetting what you yourself did last time you played).

So let's imagine I (and my opponent) make our moves conditional on what we each did the last time we played. There are four possible histories: CC, CD, DC, and DD (meaning: the last time we played, I played C and you played C; or I played C and you played D; and so on). Now instead of a strategy such as "I play C no matter what you do!" I can have a more nuanced response, such as: "If I played C (believing in the good of mankind) and you played D (ripping me off, darn you), then I will play D, thank you very much!" Such nuanced strategies are clearly far more effective than unconditional ones, because they are responsive to the context of the opponent's play, given your own.

Strategies that make their choice dependent on previous choices are communicating. Such communicating strategies are a relatively new addition to the canon of evolutionary game theory (even though one of the "standard" strategies, called "Tit-for-Tat", is actually a strategy that communicates). Because this strategy ("TFT" for short) will come up later, let me quickly introduce you to it: TFT does to its opponent what the opponent just did to TFT.

So, TFT is sort of an automaton replaying the Golden Rule, isn't it? "I'll do unto you what you just did to me!" Indeed, it kind of is, and it is quite a successful strategy all in all. Let me introduce a notation to keep track of strategies here. It will look like math, but you really should think of it as a way to keep track of stuff.

So, I can either cooperate or defect. That means, I either play option "C" or option "D". But you know, I reserve the right to simply have a bias for one or the other choice. So my strategy should really be a probability to play C or D. Call this probability $p$. Keep in mind that this probability is the probability to cooperate. The probability to defect will be $1-p$.

If communication is involved, this probability should depend on what I just learned (my opponent's last play). The last exchange can be one of four situations: CC, CD, DC, and DD (the first: my opponent and I both just played "C"; the last: we both just played "D"). A strategy then is a set of four probabilities $$ p(CC), p(CD), p(DC), p(DD). $$ Here, $p(CC)$ is the probability to cooperate if my previous move was 'C' and my opponent's was 'C' also, and so on. Clearly, armed with such a set of probabilities, you can adjust your probability to cooperate to what your opponent plays. The TFT strategy, by the way, would be written in this nomenclature as TFT=(1,0,1,0): TFT cooperates if the opponent's last move was "C", and defects otherwise. What other strategies are out there, if the probabilities can be arbitrary, not just zero and one?
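To make this concrete, here is a minimal simulation of two memory-one strategies playing each other (my own sketch, not code from the post; the function name and payoff values follow the matrix above):

```python
import random

def play(p, q, rounds=100_000, seed=42):
    """Average payoffs of two memory-one strategies p and q.

    p and q are tuples (p1, p2, p3, p4): the probability to cooperate
    after the last exchange was CC, CD, DC, DD, each seen from that
    player's own perspective (own move listed first).
    Payoffs from the matrix above: R=3, S=0, T=5, P=1.
    """
    rng = random.Random(seed)
    payoff = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
    index = {('C', 'C'): 0, ('C', 'D'): 1, ('D', 'C'): 2, ('D', 'D'): 3}
    x, y = 'C', 'C'          # assume both players open with cooperation
    sx = sy = 0
    for _ in range(rounds):
        ax, ay = payoff[(x, y)]
        sx, sy = sx + ax, sy + ay
        x, y = ('C' if rng.random() < p[index[(x, y)]] else 'D',
                'C' if rng.random() < q[index[(y, x)]] else 'D')
    return sx / rounds, sy / rounds

tft = (1, 0, 1, 0)    # Tit-for-Tat: echo the opponent's last move
alld = (0, 0, 0, 0)   # All-D: defect no matter what
# After the first exchange TFT retaliates, and both players end up
# near the mutual-defection payoff of 1.
print(play(tft, alld))
```

Note that TFT against itself locks into mutual cooperation forever (payoff 3 for both), which is exactly the kind of "playing well with your own kind" that will matter later in this post.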

That's the question Bill Press asked himself. Being a computational guy, he figured he'd write a program that would have all possible strategies play against all other possible strategies. Good idea, until the supercomputers he enlisted for this task started to crash. The reason why Press's computers were crashing (according to him, in an interview at Edge.org), was that he assumed that the payoff that a strategy would receive would depend on the way it played (that is, the four probabilities written above). 

Well, you would think that, wouldn't you? I certainly would have. My opponent and I both have four probabilities that define the way we play. I would assume that my average payoff (assuming that we play against each other a really long time), as well as my opponent's average payoff, will depend on these eight probabilities.

That is where you would be wrong, and Freeman Dyson (according to Press) sent Press a proof that this was wrong the next day. The proof is actually fairly elegant. What Dyson proved is that one player (let us call her "ZD") can choose her four probabilities judiciously in such a manner that the opponent will receive a payoff that only depends on ZD's strategy, and nothing else.

ZD, by the way, does not stand for Zooey Deschanel, but rather "Zero Determinant", which is a key trick in Dyson's proof. I thought I might clarify that.

The opponent (let's call him "O") can do whatever he wants. He can change strategy. He can wail. He can appeal to a higher authority. He can start a petition on Change.org. It won't make a dime of a difference. ZD can change her strategy (while still staying within the realm of judiciously chosen "ZD strategies") and change O's payoff. Unilaterally. Manipulatively. Deviously.

To do this, all that ZD has to do is to set two of her probabilities to be a very specific function of the two others. I can show you exactly how to do this, but the formula is unlikely to enlighten you.

OK, fine, you can see it, but I'll just give you the simplified version that uses the payoffs in the matrix I wrote above (you know, the one with 5,3,1, and 0 in it), rather than the general case described by Press and Dyson. Let's also rename the clunkish $$p(CC), p(CD), p(DC), p(DD)$$ as $$p_1,p_2,p_3,p_4$$

Then, all ZD has to do is to choose $$p_2=2p_1-1-p_4$$ $$p_3=\frac12(1-p_1+3p_4)$$

See, I told you that this isn't particularly enlightening. But you can see that there is a whole set of these strategies, defined by a pair of probabilities $p_1,p_4$. These can be arbitrary within bounds: they have to be chosen such that $p_2$ and $p_3$ are between zero and one.
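You don't have to take Dyson's word for it. Here is a short sketch (my code; the function name is mine) that computes the exact long-run payoffs from the stationary distribution of the Markov chain over the four outcomes, and checks that the opponent's payoff refuses to budge:

```python
import numpy as np

def payoffs(p, q):
    """Exact long-run payoffs of two memory-one strategies p and q,
    from the stationary distribution of the Markov chain over the four
    outcomes (CC, CD, DC, DD), viewed from player X.
    Payoffs: R=3, S=0, T=5, P=1."""
    qs = (q[0], q[2], q[1], q[3])   # Y sees state CD as DC, and vice versa
    M = np.array([[p[s] * qs[s], p[s] * (1 - qs[s]),
                   (1 - p[s]) * qs[s], (1 - p[s]) * (1 - qs[s])]
                  for s in range(4)])
    w, v = np.linalg.eig(M.T)       # stationary vector: left eigenvector
    pi = np.real(v[:, np.argmin(abs(w - 1))])   # eigenvalue closest to 1
    pi /= pi.sum()
    return pi @ [3, 0, 5, 1], pi @ [3, 5, 0, 1]   # (X's payoff, Y's payoff)

p1, p4 = 0.99, 0.01
zd = (p1, 2 * p1 - 1 - p4, (1 - p1 + 3 * p4) / 2, p4)  # equalizer choice

wsls = (1, 0, 0, 1)
# ZD gets 27/11, about 2.45; WSLS is pinned at exactly 2.
print(payoffs(zd, wsls))

# Any opponent whatsoever is pinned at that same payoff of 2:
rng = np.random.default_rng(0)
for _ in range(5):
    sx, sy = payoffs(zd, tuple(rng.random(4)))
    assert abs(sy - 2) < 1e-9
```

The random-opponent loop is the whole punchline: eight probabilities go in, but the opponent's payoff comes out the same every time.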

So let's take a step back. ZD can be mean and force O to accept a payoff he does not care for, and cannot change. Does ZD now hold the One Ring that makes the Golden Rule a quaint reminder of an antiquated and childish belief in the greater good? 

Not so fast. First, ZD's payoff under the strategy she plays isn't exactly the cat's meow. For example, if ZD's opponent plays "All-D" (that is, the strategy (0,0,0,0) that is the game-theoretic equivalent of the current Republican-dominated congress: "No, no, no, no"), ZD forces All-D to take a constant payoff all right, but that payoff is higher than what ZD receives herself! This is because ZD's payoffs do depend on all eight probabilities (well, six, because two have been fixed by playing a ZD strategy), and playing against All-D does not make you rich, whether you play ZD or not. So you better not play ZD against All-D, but you might be lucky against some others.

Take, for example, one of the best communicating strategies ever designed, one called either "Win-Stay-Lose-Shift" (WSLS), or alternatively "Pavlov". WSLS was invented by Martin Nowak and Karl Sigmund in a fairly famous Nature paper. This strategy is simple: continue doing what you're doing when you're winning, but change what you do when you lose. So, for example, if you and your opponent cooperate, then you should be cooperating next time ('cause getting a '3' is winning). Hence, $p_1=1$. On the contrary, if you just played 'D' and so did your opponent, this would not be called winning, so you should switch from defection to cooperation. Thus, $p_4=1$. If you played 'C' and your opponent suckered you with a 'D', then that is definitely not called winning, so you should switch to 'D' next: $p_2=0$. If you instead played 'D' to the hapless opponent's 'C', well that's a major win, and you should continue what you're doing: $p_3=0$.  

How does WSLS fare against ZD? Well, WSLS gets handed his lunch! ZD sweeps the floor with WSLS. I can give you the mean expected payoff that WSLS gets when playing a perfectly vanilla ZD with $p_1=0.99$ and $p_4=0.01$. (That's a ZD strategy that likes to cooperate, but is fairly unforgiving.) WSLS gets an average of 2 against this strategy, just as much as any other strategy will gain. If ZD is playing those numbers, it doesn't matter what you do. You get two and like it. That's the whole point. ZD is a bully.

What does ZD get against WSLS? What ZD receives against any opposing strategy is a more complicated calculation. It's not terribly hard, but it requires pencil and paper. A lot of pencil and paper, to be exact. (OK, fine, maybe not for everyone. For me it required a lot of pencil and paper. And eraser. A lot of eraser.) The result is too large to show on this blog, so: sorry!

Alright, these are not the days of Fermat, because we can link stuff, so there it is, linked. You learned nothing from that. If you plug the WSLS strategy (1,0,0,1) into that unwieldy formula you just did not look at, you get:

The mean payoff a ZD strategy defined by $p_1=0.99, p_4=0.01$ receives from WSLS is $27/11\approx 2.455$. That's a lot more than what WSLS gets (namely 2). That's Grand Larceny, isn't it? ZD the victorious bully, while WSLS is the victim?

Not so fast, again.

You see, in real situations, we don't necessarily know who we play. We're just a bunch of strategies playing games. I may figure out "what kind of animal" you are after a couple of moves. But within a population of coexisting agents, you know, the kind that evolves according to the rules that Charles Darwin discovered, you play with those you meet.

And in that case, you, the domineering ZD strategist, may just as well run into another ZD strategist! For all you know, that ZD player may be one of your offspring!

What does ZD get against another ZD? I'll let you think this over for a few seconds.

Tic

                       Toc

                                             Tic

                                                               Toc

Keep in mind, ZD forces ANY strategy to accept a fixed payoff.

                       Tic

                                       Toc

ZD must accept the same payoff she forces on others when playing against other ZDs!

But that is terrible! Because once ZD has gained a foothold in a population that ZD can rip off, that means that ZD has achieved a sizable population fraction. Which means that ZD is going to have to play against versions of itself more and more. And fare terribly, just like all the other schmuck strategies she rips off.

But, what if the schmuck strategy plays really well against itself? Like, cooperates and reaps a mean payoff of 3?

You know, like, for example, WSLS?

That would mean that ZD would be doomed even though it wins every single friggin' battle with WSLS! Every one! But at some point, it becomes clear that winning isn't everything. You've got to be able to play with your own kind. WSLS does that well. ZD... not so much. ZD will take your lunch money, and ends up eating its own lunch too. How perfectly, weirdly, appropriate.

I'll leave you with the plot that proves this point, which is almost the one that appeared on the MIT Technology Review post that stated that the world of Game Theory is on fire. In that plot, the green lines show the fraction of the population that is WSLS, while the blue lines show the ZD fraction for different starting fractions. For example, if ZD (blue line) starts at 0.9, it competes with a green line that starts at 0.1. And green wins, no matter how many ZDs there are at the outset. The bully gets her come-uppance!

The plot looks the same as what appeared on MIT Technology Review, but the original blog post used a figure from the first version of the paper, where we showed that ZD would lose against All-D in a population setting. We realized soon after that All-D is not such a great example, because All-D wins against ZD in direct competition too. So we chose WSLS to make our point instead. But it doesn't really matter. ZD sucks because it is so mean, and has no choice but to be mean against "its own kind". That's what takes it down.

If I were naive, I'd draw parallels to politics right here at the conclusion. But I'm not that naive. These are games that we investigated. Games that agents play. There, winning isn't everything, because karma is a bitch, so to speak. In real life, we are still searching for definitive answers.

These results (and a whole lot more) are described in more detail in the publication, available via open access:

C. Adami and A. Hintze: Evolutionary instability of zero-determinant strategies demonstrates that winning is not everything: Nature Communications 4 (2013) 2193.

The curious case of the lethal publication

Disclaimer: I refuse to believe anything I wrote below. And so should you :-)


I have a confession to make. I think I may have been responsible for the demise of some very eminent scientists. Unwittingly, and unwillingly, of course. But the evidence is really quite damning, and I shudder to think what horrible deed I will commit next. Perhaps the only way for me to clear my conscience is by coming clean with the whole story. So bear with me here, as I free myself from this heavy burden.

 

It all began in 1986, when I started graduate school in Physics at Stony Brook University on Long Island. I had already worked on solitons in nonlinear field theories at Bonn University, but had never published anything about them. These solitons, called "Skyrmions" after their inventor Tony Skyrme, were supposed to be a good model of the nucleon. I wrote two papers about them in 1987. Shortly after publication of the first, Tony Skyrme died mysteriously (he was only 64). It was said, at the time, that he died from an error in anesthesia while undergoing routine knee surgery, and I certainly believed it. Today I am not so sure anymore. I think I am to blame. When I write papers about somebody's life work, it seems like I seal their fate.

 

But I'm getting ahead of myself. Surely this was just a coincidence. It was just a pair of papers--and the second one was only published in 1988, because the publisher had lost the manuscript at one point. Yes, we sent in manuscripts by mail during the olden days. 

 

When I turned my attention from nuclear physics to theoretical biology, one of my first papers was on the concept of self-organized criticality, invented and championed by the physicist Per Bak. Per and I were friends for the most part. He would tell me about how he handled submissions to the journal Physical Review Letters when he was editor there, and how that got him fired. (When the stack of manuscripts on the desk of a fellow editor got too tall, he just sent "accepted" notices out to all of the authors. Those were the days.) But the publication of my article "Self-organized criticality in living systems" was not something he seemed to be very fond of. I know that because I have good evidence that he was the referee on the submission to Science magazine that was ultimately unsuccessful. Anyway, he did not survive that publication. He passed away in circumstances that are quite shocking. I'm still saddened by it. And, looking back, I feel even more guilty.

 

But, I'm sure you understand, I had no inkling at that point that I wielded some sort of nefarious power. I was just following my career, publishing papers in the areas that interested me. And at that point in time, I had become interested in the theory of computation, mainly because I started to work in the field  of quantum computation. One of the seminal people in this area was Rolf Landauer, who showed that the only thing that costs you energy in a computational process is the act of erasure. I even sat next to him at a dinner once! But why did I just have to write this paper on "Complexity, Computation, and Measurement" in 1996, where I introduced the concept of "Landauer machines"? Rolf was a hero of mine, but unfortunately that paper was just too much. It was not long after it was published that Rolf succumbed. 

 

Naturally I could still ascribe all this to circumstance, and certainly I did. So when I wrote a paper about "Entropic Bell Inequalities" in 1996, I was not worried. After all, John Bell had died in 1990, and the other people possibly involved in this work were those that created the concept of "EPR pairs" that were so fundamental to that paper. "EPR" stands for "Einstein-Podolsky-Rosen", of course. Einstein died in 1955, Boris Podolsky in 1966, and Nathan Rosen in... Oh no! Rosen was still alive when we wrote our paper! But he was 86 already, and so it was just natural that he passed on even while our paper was still in revision. Or was it?

 

After this experience, I learned to be more careful. For example, I learned that if you write a paper together with the person whose seminal contribution you address, that person is safe. This is how Ismail Zahed and Gerry Brown survived the onslaught of my publications. And I felt safe writing papers on the classical theory of information, because its originator (the eminent Claude Shannon) was surely dead by the time I ventured on the scene. My article on the "Physical Complexity of Symbolic Sequences" that appeared in 2000 showed that you could use Shannon information as a proxy for the complexity of biomolecular sequences in general. But, I had no idea that Shannon was still alive. Nobody knew! He would go around to information theory conferences, listening to conversations between researchers during breaks, and then introduce himself. Of course, nobody would believe he was Shannon! 

 

So Shannon was alive after all, until this cursed paper of mine appeared! Another hero of mine slain! I was dumbfounded. Mortified. I vowed never to write a paper again. 

 

But you know that I could not keep this up for long. Like you yourself would have done, I shrugged this off as pure coincidence mixed with a pinch of paranoia. After all, Landauer and Shannon were old. It wasn't me, I repeated to myself. So I started on another project: to understand the abundance distribution of species. These distributions are fairly general: you can write down distributions of any taxon, as a function of how many subtaxa each taxon generated. So, for example, I wanted to know the distribution of the number of species that each genus has created, or the distribution of genus sizes (number of genera) that each family generated, and so on. With my student Johan Chu, I developed a theory that would predict these distributions based on only two (and sometimes just one) parameters, using the theory of branching processes. To test the theory, we wrote to John Sepkoski, a paleontologist at the University of Chicago, whose life's work was the creation of data sets (compendia) of marine animal families and genera. He kindly sent us his compendia, which we used for our paper that ultimately appeared in the Proceedings of the National Academy. John was 50 at the time, so surely he was safe.

 

But he was not. John Sepkoski passed away at the age of 50 of an apparent heart attack, just before my paper appeared in print. My heart broke a little too. What is this curse?

Could it get any worse? What if I wrote a paper on the growth of complexity in biology, a subject that is very near to the heart of the eminent evolutionary biologist and paleontologist Stephen J. Gould? Could my paper "What is Complexity?" that appeared in 2002 have anything to do with his sudden passing? Perhaps I should not have addressed the concept of "punctuated equilibrium", one of his best-known contributions?
 

I can't go on. I don't want to look at the list of my publications anymore, for fear of what I might find. Best not to think about it anymore. I hope that my upcoming paper in Nature Communications about a certain type of strategy in game theory (so-called ZD strategies) doesn't have any negative consequences for anyone. Bill Press, one of the co-authors of the original paper on ZD strategies, is 65 and in perfect health, as far as I know. And Press's co-author, the eminent Freeman Dyson, is... oh oh. Dyson is 89.
 

 

What is Information? (Part 4: Information!)

This is the 4th (and last) installment of the "What is Information" series. The one where I finally get to define for you what information is. The first three parts can be found here: Part 1, Part 2, and Part 3. Let me quickly summarize the take-home points of parts 1-3, just in case you are reading this temporally removed from them.

1.) Entropy, also known as "uncertainty", is something that is mathematically defined for a "random variable". But physical objects aren't mathematical. They are messy complicated things. They become mathematical when observed through the looking glass of a measurement device that has a finite resolution. We then understand that a physical object does not "have an entropy". Rather, its entropy is defined by the measurement device I choose to examine it with.  Information theory is a theory of the relative state of measurement devices. 

2.) Entropy, also known as uncertainty, quantifies how much you don't know about something (a random variable). But in order to quantify how much you don't know, you have to know something about the thing you don't know. These are the hidden assumptions in probability theory and information theory. These are the things you didn't know you knew.

3.) Shannon's entropy is written in terms of "$\log p$", but these $p$ are really conditional probabilities whenever they are not uniform (that is, not all the same for all states). They take the values they do given what else you know. Like, for instance, that they are not uniformly distributed. Duh.

Alright, now we're up to speed. So we have the unconditional entropy, which is the one where we know nothing about the system that the random variable describes. We call that $H_{\rm max}$, because an unconditional entropy must be maximal: it tells us how much we don't know if we don't know anything. Then there is the conditional entropy $H=-\sum_i p_i\log p_i$, where the $p_i$ are conditional probabilities. They are conditional on some knowledge. Thus, $H$ tells you what remains to be known. Information is "what you don't know minus what remains to be known given what you know". There it is. Clear?
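To see the definition in action, here's a toy computation (my example, not from the post) for a loaded coin:

```python
from math import log2

def entropy(ps):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in ps if p > 0)

# A fair coin: knowing nothing, H equals H_max = 1 bit, so zero information.
# A loaded coin that comes up heads 90% of the time: whoever handed you
# that probability distribution handed you information along with it.
H_max = log2(2)              # 1 bit: what you don't know, knowing nothing
H = entropy([0.9, 0.1])      # about 0.469 bits: what remains to be known
I = H_max - H                # about 0.531 bits of information
print(I)
```

What you don't know, minus what remains to be known given what you know: a bit more than half a bit, in this case.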

"Hold on, hold on. Hold on for just a minute"

What?

"This is not what I've been reading in textbooks."

You read textbooks?

"Don't condescend. What I mean is that this is not what I would read in, say, Wikipedia, that you link to all the time. For example, in this link to `mutual information'. You're obviously wrong. Because, you know, it's Wiki!"

So tell me what it is that you read.

"It says there that the mutual information is the difference between the entropy of random variable $X$,  $H(X)$, and the conditional entropy $H(X|Y)$, which is the conditional entropy of variable $X$ given you know the state of variable $Y$."

"Come to think of it, you yourself defined that conditional entropy at the end of your Part 3. I think it is Equation (5) there! And there is this Venn diagram on Wiki. It looks like this:"

Source: Wikimedia


Ah, yes. That's a good diagram. Two variables $X$ and $Y$. The red circle represents the entropy of $X$, the blue circle the entropy of $Y$. The purple thing in the middle is the shared entropy $I(X:Y)$, which is what $X$ knows about $Y$. Also what $Y$ knows about $X$. They are the same thing.

"You wrote $I(X:Y)$ but Wiki says $I(X;Y)$. Is your semicolon key broken?"

Actually, there are two notations for the shared entropy (a.k.a information) in the literature. One uses the colon, the other the semicolon. Thanks for bringing this up. It confuses people. In fact, I wanted to bring up this other thing....

"Hold on again. You also keep on saying "shared entropy" when Wiki says "shared information". You really ought to pay more attention."

Well, you know. That's a bit of a pet peeve of mine. Just look at the diagram above. The thing in the middle, the purple area: it's a shared entropy. Information is shared entropy. "Shared information" would be, like, shared shared entropy. That's a bit ridiculous, don't you think?

"Well, if you put it like this. I see your point. But why do I read "shared information" everywhere?"

That is, dear reader, because people are confused about what to call entropy, and what to call information. A sizable fraction of the literature calls what we have been calling "entropy" (or uncertainty) "information". You can see this even in the book by Shannon and Weaver (which, come to think of it, was edited by Weaver, not Shannon). When you do this, then what is shared by the "informations" is "shared information". But that does not make any sense, right?

"I don't understand. Why would anybody call entropy information? Entropy is what you don't know, information is what you know. How could you possibly confuse the two?"

I'm with you there. Entropy is "potential information". It quantifies "what you could possibly know". But it is not what you actually know. I think, between you and me, that it was just sloppy writing at first, which then ballooned into a massive misunderstanding. Both entropy and information are measured in bits, and so people would just flippantly say "a coin has two bits of information" when they mean to say "two bits of entropy". And it's all downhill from there.

I think I've made my point here, I hope. Being precise about entropy and information really matters. Colon vs. semicolon does not. Information is "unconditional entropy minus conditional entropy". When cast as a relationship between two random variables $X$ and $Y$, we can write it as $I(X:Y)=H(X)-H(X|Y)$.

And because information is symmetric in the one who measures and the one who is being measured (remember: a theory of the relative state of measurement devices...) this can also be written as $I(X:Y)=H(Y)-H(Y|X)$.

And both formulas can be verified by looking at the Venn diagram above.

"OK, this is cool."

"Hold on, hold on!"

What is it again?

"I just remembered. This was all a discussion that came after I brought up Wikipedia, which says that information is $I(X:Y)=H(X)-H(X|Y)$, while you said it was $H_{\rm max}-H$, where the $H$ was clearly an entropy that you write as $H=-\sum_i p_i\log p_i$. All you have to do is scroll up, I'm not dreaming this!"

So you are saying that textbooks say $$I=H(X)-H(X|Y), \qquad (1)$$ while I write instead $$I=H_{\rm max}-H(X), \qquad (2)$$ where $H(X)=-\sum_i p_i\log p_i$. Is that what you're objecting to?

"Yes. Yes it is."

Well, here it is in a nutshell. In (1), information is defined as the difference between the actual observed entropy of $X$, minus the actual observed entropy of $X$ given that I know the state of $Y$ (whatever that state may be). 

In (2), information is defined as what I don't know about $X$ (without knowing any of the things that we may implicitly know already), and the actual uncertainty of $X$. The latter does not mention a system $Y$. It quantifies my knowledge of $X$ without stressing what it is I know about $X$. If the probability distribution with which I describe $X$ is not uniform, then I know something about $X$. My $I$ in Eq. (2) quantifies that. Eq. (1) quantifies what I know about $X$ above and beyond what I already know via Eq. (2). It quantifies specifically the information that $Y$ conveys about $X$. So you could say that the total information that I have about $X$, given that I also know the state of $Y$, would be $I_{\rm total}=H_{\rm max}-H(X) + H(X)-H(X|Y)=H_{\rm max}-H(X|Y)$.
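Here's a small sketch of that accounting (my own, with a made-up joint distribution) that computes Eq. (1), Eq. (2), and the total, and checks that information is symmetric between the two variables:

```python
from math import log2

def H(ps):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in ps if p > 0)

# A made-up joint distribution p(x, y) for two binary variables
# (rows: x, columns: y); any joint table would work here.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

px = [sum(row) for row in joint]               # marginal of X
py = [sum(col) for col in zip(*joint)]         # marginal of Y
HX, HY = H(px), H(py)
HXY = H([p for row in joint for p in row])     # joint entropy
HX_given_Y = HXY - HY
HY_given_X = HXY - HX

I1 = HX - HX_given_Y        # Eq. (1): what Y conveys about X
I2 = log2(2) - HX           # Eq. (2): H_max - H(X); zero here, since
                            # X's marginal happens to be uniform
I_total = log2(2) - HX_given_Y   # H_max - H(X|Y) = I1 + I2

assert abs(I1 - (HY - HY_given_X)) < 1e-12   # information is symmetric
assert abs(I_total - (I1 + I2)) < 1e-12
```

With this joint table, Eq. (2) vanishes (the marginal of $X$ is uniform, so knowing the distribution alone tells you nothing), and all the information about $X$ comes from $Y$, via Eq. (1).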

So the difference between what I would write and what textbooks write is really only in the unconditional term: it should be maximal. But in the end, Eqs. (1) and (2) simply refer to different informations. (2) is information, but I may not be aware how I got into possession of that information. (1) tells me exactly the source of my information: the variable $Y$. 

Isn't that clear?

"I'll have to get back to you on that. I'm still reading. I think I have to read it again. It sort of takes some getting used to." 

I know what you mean. It took me a while to get to that place. But, as I hinted at in the introduction to this series, it pays off big time to have your perspective adjusted, so that you know what you are talking about when you say "information". I will be writing a good number of blog posts that reference "information", and many of those are a consequence of research that was only possible by understanding the concept precisely. I wrote a series on information in black holes already (starting here). That's just the beginning. There will be more to come, for example on information and cooperation. I mean, how you can only fruitfully engage in the latter if you have the former.

I know it sounds more like a threat than a promise. I really mean it to be a promise. 

Why NP can't possibly be P

Do you think P=NP? Or were you just curious what these letter combinations could possibly mean in an otherwise well-constructed sentence?

Whether or not P=NP is actually one of the "big" unsolved problems in mathematics. You can be a million bucks richer by proving it one way or the other. (It's one of the seven math problems deemed worthy of a million bucks by the Clay Mathematics Institute.)

So this P=NP (or not) thing is a pretty big deal. What's it all about?

A lot has been written about this problem, and if you want to get up to speed about it, the Wiki entry is not a bad place to start. But I'll try my best to give you a non-technical introduction that will make you care. Because the question that was first asked by the mathematician and computer scientist Stephen Cook is profound. I'm not kidding here: Mathematics, beauty, and Rachmaninov's second piano concerto may be involved. Well, at least, in this post. 

I want you to think about two different things. Two seemingly completely unrelated things. 

First thing: How long does it take to find the answer to difficult problems? Of course, putting it like this is not precise enough. So let's be more specific. How hard is it to find one rare thing in the midst of $N$ not so rare things? 

"Well, it depends", you would answer, and you would be right. It depends on a lot of things. For example, suppose I want you to find a red dot in the middle of very many black dots. That's easy, you can do this visually in no time flat, because the red dot sticks out (left panel below). If the dot you are supposed to find is not so easy to distinguish, then it might take you a while longer (blue dot in the right panel below). If I asked you instead to tell me whether or not any of the dots are different at all (and I don't tell you what the difference is) you may have to look at every dot individually. And if there are $N$ dots, that's how long it would take you on average.

[Figure: left panel, "Easy Search": a red dot stands out among black dots; right panel, "Less Easy Search": a blue dot hides among black dots]

But problems of size $N$ (problems that take $N$ steps to solve) are still in the class of easy problems. A little harder are problems that take polynomial time to solve (like $N^2$ steps, or even $N^{10,000}$).

Problems of this type are said to belong to the class "P", where P stands for polynomial. They are (for the most part) easy for a computer. Things get problematic when the $N$ is not in the base, but in the exponent. Like $2^N$ or $e^N$. The exponential function rises sooo much faster than any polynomial that the difference between the two (polynomial versus exponential) is not one of degree. It is one of substance.
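To see just how different "degree" and "substance" are here, compare the step counts directly. This is a toy sketch of my own; the linear scan mirrors the dot-search above:

```python
def find_different(dots, reference):
    """Scan the dots one by one until one differs from the reference.
    Worst case: all N dots must be examined -- N steps, squarely in P."""
    for i, dot in enumerate(dots):
        if dot != reference:
            return i
    return None

dots = ["black"] * 9999 + ["blue"]
print(find_different(dots, "black"))  # 9999: the scan went all the way

# Polynomial (N^2) versus exponential (2^N) step counts:
for n in (10, 20, 50, 100):
    print(n, n**2, 2**n)
# Already at N=100, N^2 is a modest 10,000 steps, while 2^N is a
# 31-digit number: no computer will ever run that many steps.
```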

Now, the Second thing: Suppose somebody hands you a sheet of paper and claims: "THIS is the solution to this extremely hard problem you have been talking about". 

How long will it take you to confirm (or ridicule) said claim? You would think: "Pretty fast I gather", and you would be right (most of the time). In the example of the search above, if I tell you that the red dot is THAT ONE in the upper left corner, all you would have to say is: "OK, right, I see it". For the second panel, it's in the lower right corner (the blue dot). See, it's so much easier if I show you the solution!

There is a name for problems that are really hard, but for which any proposed solution can be checked really quickly. This class of problems is called "NP": the class of problems for which we can verify a solution in polynomial time, but may not be able to find it in polynomial time. 

Strictly speaking, NP stands for "non-deterministic polynomial".  It means that a computer very unlike the one you read this on (a non-deterministic one) could solve the problem in polynomial time.  An unpredictable computer, so to speak. Maybe a quantum one. Maybe that has to remain the subject of a different post. One that speculates about whether it is obvious that there must be a classical algorithm that factors numbers into primes in polynomial time.

Let's just think of the problems in the class NP as those that we have no idea how to solve in polynomial time. I'm oversimplifying here, but this is a blog, not a textbook. 

So, I can imagine that you would immediately agree that checking a solution is a hell of a lot faster than finding a solution. Because that makes sense. 

We indeed believe that's true. But we do not know.

Let me give you an example of a hard problem to illustrate all this. It's kind of the grand-daddy of hard problems, perhaps in part because it is the one that Stephen Cook used to formulate the problem. In fact, he showed that if you could solve THAT problem, then you could solve many many others right away, because this grand-daddy is part of a class of problems that can all be related to each other (called the NP-complete problems). This is cool because it means that if you've solved ANY of the problems in the set in polynomial time, then you've solved them all in polynomial time.

This grand-daddy of problems is the 3-SAT problem, and I urge you to be awake through my description of it. (All other descriptions would put you to sleep much faster anyway.) If you can get through this, I promise that there are symphonies ahead.

"Do I really have to endure a 30-second lecture about 3-SAT?"

You brush your teeth, don't you? That's 60 seconds right there! (If not, you should reconsider your brushing routine. Just sayin'.)

So here it goes. 30 seconds. 

You know about logic already. If $A$ and $B$ are two variables that can be either TRUE or FALSE, then you know whether $A\vee B$ is TRUE or FALSE, depending on the truth values of $A$ and $B$. Now let's get $C$ into it. I can now ask, for example, whether $A\vee B\vee\neg C$ is true. This $\neg$ symbol, by the way, is just the logical NOT. So, if $A$ is TRUE, then $\neg A$ is FALSE. And $\vee$ stands for OR, of course, while $\wedge$ simply means AND. 

Well, you say, obviously this clause $A\vee B\vee\neg C$ is TRUE if any of $A$ or $B$ is true, or $C$ is false. Indeed, to check a single clause, that's easy. But what if we take two clauses (each made from three variables; three, get it, 3-SAT?) and ask whether the AND of these two clauses is true. Like, for example, $(A\vee B\vee\neg C)\wedge(\neg B\vee C\vee\neg D)$.  

"Well, OK", you say, "now I have to determine whether each of the clauses is TRUE to figure out whether the AND of the two clauses is true. And I see what you did there: in any of the clauses, variables can reoccur. And you're down 15 seconds, by the way."

All right. Now imagine I have a given number of variables $A, B, C$, up to whatever. Like $N$ of them.  And I make tons of clauses from them (each of them ORs of three variables), ANDed together. And I call the resulting expression a "statement". And I ask you the all-important question: 

"Could the resulting statement ever be TRUE?" 

It could look like this:

$(R\vee T\vee\neg C)\wedge(\neg B\vee C\vee\neg D)\wedge(\neg W\vee B\vee\neg M)\wedge(F\vee\neg G\vee D)\wedge(W\vee T\vee\neg R)$

Or it could be a lot longer. Asking whether the statement can ever be TRUE is asking whether the statement can be "satisfied". And because the statement is composed of clauses with three variables, it is the "3-Satisfiability" problem: 3-SAT.

Now, I don't expect you to be able to answer this question just by looking at the statement. And, I also realize that my 30 seconds are up. But you could of course just test it. If there are $N$ variables, you just have to test $2^N$ possible combinations of TRUE and FALSE for these variables, plug them into the statement, and check if it ever returns TRUE. But you understand immediately that this will take a long time if $N$ is large. An exponential amount of time, because $2^N$ depends on $N$ exponentially.

(Really it does: $2^N=e^{N\ln 2}$, if you are more comfortable with exponentials to the base $e$.)
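Here is what "just test it" looks like in code: a brute-force satisfiability check for the statement above. This is a minimal sketch, and the clause encoding (a literal is a variable name plus a negation flag) is my own, not any standard library:

```python
from itertools import product

# The statement from the text:
# (R∨T∨¬C) ∧ (¬B∨C∨¬D) ∧ (¬W∨B∨¬M) ∧ (F∨¬G∨D) ∧ (W∨T∨¬R)
# Each clause is a list of literals; a literal is (variable, is_negated).
statement = [
    [("R", False), ("T", False), ("C", True)],
    [("B", True), ("C", False), ("D", True)],
    [("W", True), ("B", False), ("M", True)],
    [("F", False), ("G", True), ("D", False)],
    [("W", False), ("T", False), ("R", True)],
]

def is_satisfied(statement, assignment):
    """A statement is TRUE when every clause has at least one true literal."""
    return all(
        any(assignment[var] != negated for var, negated in clause)
        for clause in statement
    )

def brute_force_sat(statement):
    """Try all 2^N assignments: exponential in the number of variables N."""
    variables = sorted({var for clause in statement for var, _ in clause})
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if is_satisfied(statement, assignment):
            return assignment  # found a satisfying needle
    return None  # checked the whole haystack: unsatisfiable

print(brute_force_sat(statement) is not None)  # True: it is satisfiable
```

With 9 variables this loop tests at most $2^9=512$ assignments in a blink; with 100 variables the same loop would outlive the universe.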

Now, let me imagine for a moment that you have been thinking about this for longer than the 30 seconds it took you to read through this simple example. Please, let me indulge in that thought just for the sake of it. 

You may have come up with this simple thought:

"What if..... every variable appears only once in any clause? Shouldn't it be obvious that some combination will result in a TRUE statement?"

Yes. Yes indeed, that is quite right. I'm impressed.

"But wait, wait. What if there are only, like, two variables $A$ and $B$, but a gazillion clauses. Isn't it obvious that there are bound to be contradictions between clauses, so that not all clauses could possibly be right at the same time (kind of like how $A$ and $\neg A$ cannot both be TRUE at the same time)! Am I right? Am I?"

Gosh, you. You're like, totally right. If that happens (a small number of variables, a large number of clauses) you can fairly quickly rule out that the statement can ever be TRUE. Indeed, figuring out whether a statement with many clauses but only a few variables, or with many variables but only a few clauses, is satisfiable is usually quick and easy. 

For anything in between, however, we think it'll take checking essentially all $2^N$ combinations to figure out satisfiability. Sorry. Unless, of course, there is a trick nobody has yet thought of. An algorithm of unheard-of power. 

Remember when I wrote that I want you to think about two things? Two very different things? 

The first was:  How long does it take to solve a really hard problem? In the case of 3-SAT, we think it'll take an "NP" amount of time, on average. NP: a very, very long time. Possibly as long as the universe has been around (for big $N$). But we can check any proposed solution quickly. 

Just imagine: someone gives you a sequence of variables A,B,C, etc. and claims that 

"A=TRUE, B=FALSE, C=FALSE.... will satisfy the statement!"

You can check that in no time at all, by just plugging it into the formula. Voilà! Either that someone was right, or not. The time to check any proposed solution is polynomial in the problem size, that is, it is P. 
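In code, that check is a single pass over the clauses, so its running time is linear in the length of the statement. Again a sketch with my own encoding, where a literal is a variable name plus a negation flag:

```python
def verify(statement, claimed):
    """Verify a claimed satisfying assignment by looking at each clause
    exactly once: a clause is TRUE when at least one literal is TRUE."""
    for clause in statement:
        if not any(claimed[var] != negated for var, negated in clause):
            return False  # a clause with no true literal: claim refuted
    return True

# (A ∨ B ∨ ¬C) ∧ (¬B ∨ C ∨ ¬D), and the claim
# "A=TRUE, B=FALSE, C=FALSE, D=FALSE will satisfy the statement!"
statement = [
    [("A", False), ("B", False), ("C", True)],
    [("B", True), ("C", False), ("D", True)],
]
claim = {"A": True, "B": False, "C": False, "D": False}
print(verify(statement, claim))  # True: the claim checks out
```

Note the asymmetry: this verifier touches each clause once, while a search over assignments would touch each of the $2^N$ candidates.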

Is P (time to check a solution) equal to NP (time to find a solution)? That's preposterous, right? "No way!" you exclaim. Yet, nobody has ever been able to prove that this is not so. People have tried, and even claimed they succeeded.

So we have to entertain, at this point in time, the possibility that P=NP, as preposterous as that may sound. Because the people who have been trying to prove the opposite aren't exactly amateurs. There is, after all, a million bucks riding on it. So could it be that P=NP?  What would the world be like if it were true?

First, let us think about this: Is anything like this (P=NP) even possible at all? It would mean that we can create rare things that are "TRUE" (by some measure) as easily as we can check that any thing is TRUE (by some measure). Let's go back to 3-SAT to see what this means. Let's imagine that we deal with a hard case of 3-SAT: there is actually only a single lonesome assignment of the variables that makes the statement TRUE. In that case, the statement is satisfiable, but proving it will require finding the proverbial needle in a haystack. We know that we can check every proposed solution quickly. But if we don't have any way of guessing what a good solution is, then we have to check all of them. Checking each may be quick. Checking all is not, because there are an exponential number of candidates. 

How could we find this one needle in a haystack? Is something like this even thinkable in principle?

It is indeed! Think of a different problem: creating a poignant and meaningful piece of poetry. 

"That's ridiculous!", you exclaim, "How can you liken a creative process such as writing poetry to an automated procedure?"

Hold on there for a moment. Take your favorite piece of poetry. Think about it. Recite it in your head.

What, you don't have one? Really, you should spend a bit more time away from the computer, smell the roses, so to speak. What about this one, about the thing with feathers? No? What about Hamlet's soliloquy? Well, you should have a favorite piece of poetry, is all I'm saying. I have more, but I can tell you're getting tired of this discussion already. 

Now, given the medium in which those pieces of poetry that I referred to are rendered (sequences of letters drawn from an alphabet of 26 symbols, plus the space), they are extraordinarily rare. I think I can even trust you to calculate how rare, if you imagine the "blind monkey" way of writing poetry on a typewriter. You know, the kind that (theoretically speaking) can reproduce all the works of Shakespeare when randomly hacking away at a keyboard.

The chance of producing even a Shakespearean sonnet by this method before the sun engulfs this planet is pretty much nil, you can imagine. That's if the monkey types completely randomly. Now, what if it's not completely random? What if, say, you write random nine-letter sequences, and keep all those that match any word (or combination of words) that Shakespeare ever wrote?

Well, that would be an algorithm that still runs for a long time, but it can actually produce a Shakespeare sonnet. The one called "A Lover's Complaint", no less. How is this possible?

The answer is that any deviation from a random search is an algorithm, a heuristic. Heuristics can accelerate search at sometimes formidable rates. The one I described is, when you think about it, an evolutionary one. You keep the good stuff (even if it was produced by a random process), and you try, try, again.

Yes, it is true: evolution is an algorithm--a heuristic--that can produce complexity from a diet of random chance. I'm pretty sure you're with me on this. But there are plenty of other algorithms. For the problem of reconstituting Shakespeare's writings, for example, you could bias the way letters are produced, say by inserting blanks between letters at the rate dictated by the average length of words in English.
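Here is a minimal sketch of that "keep the good stuff" heuristic, in the spirit of Dawkins' famous "weasel" program. The target phrase and the details of the selection rule are my own illustration: letters that already match are kept, and only the mismatches get retyped at random.

```python
import random

random.seed(42)  # make the run reproducible

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TARGET = "SHALL I COMPARE THEE TO A SUMMERS DAY"

def evolve(target, alphabet):
    """Random typing plus selection: keep every letter that already
    matches the target, retype the mismatches at random, repeat."""
    phrase = [random.choice(alphabet) for _ in target]
    generations = 0
    while phrase != list(target):
        phrase = [t if p == t else random.choice(alphabet)
                  for p, t in zip(phrase, target)]
        generations += 1
    return generations

# A blind monkey needs on the order of 27^37 tries for this 37-character
# line; keeping the matching letters gets there in a few hundred
# generations at most.
print(evolve(TARGET, ALPHABET))
```

The point is not that Shakespeare wrote by mutation and selection; it is that even this crude heuristic turns an impossibly long random search into a short one.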

"This is cheating!", you might exclaim. But that is precisely what an algorithm is: finding ways to cheat, so that you don't have to examine every possible solution.

In the light of this, we can now formulate the central problem of P=NP.

Are there ways to cheat so that we can find the truly hard stuff fairly quickly? For example, as quickly as we can check whether the proposed solution really is a solution?

Let's use the art metaphor one more time. There are "solutions" (time series of notes) that we call music that are generally acknowledged to be "good". I'm sure you're with me here: this is neither well-defined nor do I think it can be. (This is not stopping us from trying: stay tuned for my lab's efforts in this direction.) And not everybody agrees. If there were an obvious algorithm for good music, then we would not be paying successful composers the money they earn. "Not everybody can be a Mozart" is what we say.

Think about it: we (usually) can tell fairly quickly whether a piece of music is any good. Can we write such a piece as quickly?

Every fibre in our body says: "Hell no!"

Now, we don't usually think of our musical appreciation abilities as algorithmic, but perhaps we should.  The generation of music can be algorithmic (and in fact the generation of music via evolutionary algorithms was the subject of a recent article in the Proceedings of the National Academy of Sciences). And yours truly wrote a commentary about that article here. But I digress. As is my wont.

If P=NP, then there must be an algorithm that creates beautiful music, beautiful poetry, beautiful fiction, you name it. Of course, just because we haven't found it (trust me, we haven't), that doesn't mean it does not exist. Can a machine be programmed to produce these kinds of rare pieces?

The answer to the latter question is, of course, "We don't know". All I know is that it is unlikely that a computational algorithm (the way we understand it) can produce Rachmaninov's second piano concerto, for example. I have a personal view about this, because I've worked on his 2nd (and to a lesser extent 3rd) concerto since 1982. But I digress as usual.

[Image: Sergej Rachmaninov (1873-1943)]

I'm not arguing here for a non-algorithmic view of human performance or ingenuity. What I am suggesting is that the origin of these rare achievements is found in the history of the machine that produced it (the personal life experience of the composer, here, Sergej). Mathematical algorithms do not take experiences into account: they produce solutions from given input data. But the human algorithm takes a lot more input into account: this input data could be essentially infinite (our experience of the world up to the present). Even more, the algorithm people use is not deterministic, in the sense that no two people are alike. We get a "solution" very very rarely. Some personal histories, blended with exquisite mental algorithms operating within that person's brain, can produce unmatched beauty, once in a while.

So where does this leave us? NP is probably not P, because if it were, there would be no place for genius in this world. Genius is an algorithm that takes advantage of history: the accumulated experiences (guided by curiosity) that make up a person's baggage. It's good that NP does not equal P. Life, the world, the universe would be boring if NP=P. It's not a proof, and I don't expect to receive a check in the mail for it. But I'm fairly confident that I'm right about that.