Sometimes we are lulled into a belief that we actually know how biology works. Not that biology is simple, mind you, but after a while we "get the hang of it" and think we have it under control. Sure, we know that molecular biology is complicated: genes have evolved for eons to be transcribed into mRNA that is itself translated into proteins, which do all the work to build and maintain a cell, and perform computations that lead to decisions that increase the organism's chance of leaving offspring and continuing this gigantic experiment of natural selection.
And then once in a while you find that it is not quite the way you thought. For instance, you may discover that the expression of genes is influenced by things other than proteins that bind to DNA regulatory regions. When I was a postdoc at Caltech taking part in informal computational biology brainstorming sessions, for example, I asked the assembled Caltech braintrust: “Couldn’t RNA bind to other RNA to influence expression levels?” and was laughed out of the room. “RNA does not bind to RNA, silly”, I was schooled, and I could almost feel the virtual hands petting the head of the young physicist who had not studied enough biology.
This was 1997, when I was a Burroughs-Wellcome fellow in Computational Biology, well before the role of microRNAs in the regulation of gene expression was recognized. That was an "it's not quite that simple" moment. And of course, if you try to understand biology, these moments keep on coming. "Bacteria don't have adaptive immune systems", our wisdom would teach the students, and then someone somewhere discovers the CRISPR system, the rest is history, and once again we find that it's not that simple. There are plenty more of these moments, and many more to come. I would like to tell you about one such moment that I was involved in recently, one that schooled me about the concept of pseudogenes. I'm not sure how much it will change the standing paradigm, but it sure made me rethink biology again: "It's not quite that simple".
Here's what we know about pseudogenes. They are dead genes. Genes that once were (expressed, that is). Standard lore says that pseudogenes have either stopped being expressed or lost their ability to function even when expressed. Dead as a doornail, thus. Wikipedia tells us that much:
"Pseudogenes are dysfunctional relatives of genes that have lost their gene expression in the cell or their ability to code protein".
Dysfunctional. That's good. But there is an important element there: "Relatives of genes". What's that all about?
Think about it. How could a gene that is functional possibly die? It's there for a reason, no? Genes evolve to enhance the likelihood of survival of the organism. Mess with one, and the organism dies or performs much worse. Genes can't die without ultimately taking the organism with them. How can there be pseudogenes at all?
The answer to this conundrum lies in the fantastic arsenal of molecular mechanisms that billions of years of evolution have brought us. The origin of most evolutionary novelty today (and in the recent as well as the not so recent past) can be traced back to gene duplications. The molecular mechanisms that mess with perfect inheritance aren't just point mutations. They include insertions, deletions, and wholesale duplications of genetic material. Never mind how that happens in detail. Let's focus on that duplication process. Once in a while, an organism ends up with two (or even more) copies of the same gene. That means you (the organism) have one good copy of the gene, and a second copy with unlimited potential. This copy has the right stuff, as well as the enviable luxury of freedom: keep what you have and become whatever you want! It is the privilege of the offspring of the well-heeled.
So what is the fate that awaits these duplicates? No point in imitating their parent's function: let's explore new frontiers! In the land of genetics, however, their possibilities of future advancement are highly constrained. They do not need to perform the parent's function (the parent is ably doing that already) and can explore new functions via mutations. But the mutations that they incur aren't exactly helping. In fact, they are mostly not helpful AT ALL. If a mutation does not propel you (the duplicated gene) to a new function, it will likely abort the one you had, either by altering the sequence so that the protein does not fold anymore or, even worse, by truncating the protein early (meaning that a mutation created a stop codon in the middle of the sequence), producing a short and most likely non-functional protein. There are plenty of examples of such deceased genes in the genome. Another way to kill a gene is to mess with its transcriptional logic (the on-off switch to make the protein, so to speak). If the switch is mutated so that it is permanently in the off position, well then you have another pseudogene.
According to this model, pseudogenes should play no active role in the functioning of a cell. A passive role has been described, in which pseudogenes serve as a "reservoir" of potentially active sequences, a kind of "evolutionary playground" from which new genes can arise like a phoenix from its ashes, so to speak.
But there have been tantalizing hints that pseudogenes also have an active role, that they are only "mostly dead", to throw in a Princess Bride reference. In the last 15 years a growing number of cases have been reported where pseudogenes are transcribed nonetheless, and contribute to the organism's function in one way or another. Interestingly, many of these cases are linked to diseases such as cancer. In fact, as the ENCODE collaboration has been insisting (and in so doing providing another one of those "it's not quite that simple" moments), almost all of our genome is transcribed at one point or another, whether functional or not.
In my lab I have a project to study the evolution of drug resistance of HIV (the virus that causes AIDS), using cell cultures in which the virus is exposed to different conditions. The project is led by the very talented Dr. Aditi Gupta, who joined my lab after a Ph.D. in computational biology and taught herself the bench science necessary to pull off these kinds of experiments. Aditi generates lots of RNAseq data in this project. For those who have been asleep for the last 10 years, RNAseq is a method to accurately measure the level of transcription of genes. It's quite a fantastic method that has all but replaced the microarrays that used to be the staple of molecular biology. I used to teach bioinformatic analysis of microarrays, so you can believe me when I say that you should not trust microarray data any farther than the nearest trash bin.
But RNAseq is fantastically accurate, if you know how to analyze it. Fortunately Aditi does, so she decided on a whim to compare the RNAseq profile of cells infected with HIV to that of an uninfected control. When you do this, you immediately notice the differential expression of genes that are part of obvious pathways linked to infection. While reading papers about the possible role of pseudogenes in cancers, she started wondering whether pseudogenes were also expressed in her HIV samples. And sure enough, not only were pseudogenes expressed, they were differentially expressed, meaning that they were either much more, or much less, expressed in the infected cells than in the uninfected cells.
What's up with that? To find out, she applied extremely stringent criteria for differential expression (at least a four-fold difference), corrected for false discoveries, and finally arrived at a list of 21 pseudogenes that, somehow or other, seem to play an active functional role in HIV infection.
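The two-part filter (a minimum fold change plus a false-discovery correction) can be illustrated with a minimal Python sketch. It assumes that a fold change (infected over uninfected) and a raw p-value have already been computed for every transcript, and it uses the Benjamini-Hochberg procedure as one common way to control the false-discovery rate. This is not the pipeline used in the study: the 0.05 FDR cutoff, the `Result` record, and the transcript names are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    name: str           # hypothetical transcript identifier
    fold_change: float  # expression ratio, infected / uninfected (assumed > 0)
    p_value: float      # raw p-value from some per-transcript test (assumed given)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never increase
    # as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(results, min_fold=4.0, fdr=0.05):
    """Keep transcripts changed at least min_fold-fold (up or down) with q < fdr."""
    q_values = benjamini_hochberg([r.p_value for r in results])
    min_log2 = math.log2(min_fold)  # four-fold means |log2 ratio| >= 2
    return [
        r.name
        for r, q in zip(results, q_values)
        if abs(math.log2(r.fold_change)) >= min_log2 and q < fdr
    ]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    demo = [
        Result("pseudoA", 6.0, 1e-4),  # six-fold up, significant
        Result("pseudoB", 0.2, 1e-3),  # five-fold down, significant
        Result("pseudoC", 3.0, 1e-5),  # significant, but only three-fold
        Result("pseudoD", 8.0, 0.2),   # large change, not significant
    ]
    print(differentially_expressed(demo))  # prints ['pseudoA', 'pseudoB']
```

Because the fold change is tested on a log scale, a transcript that drops to a fifth of its level counts just as much as one that rises five-fold.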
So what do you do with that? Well, the first thing you do is check out the parents of these pseudogenes. Because, as you remember from what I wrote earlier (yes, I realize, figuratively eons ago), pseudogenes are "dysfunctional relatives". So each pseudogene has a parent from whence it sprang. Who are these pseudos' parents? What do they do for a living?
First of all, about a third of the parent genes of the pseudogenes in our study are also differentially expressed in HIV infection, which is significant because, obviously, a third of a random set of genes would not be implicated in HIV infection. Taken together, about half of the parent genes play a role in viral infections (for example, some are also active in influenza). Thus, most of the parents are in one way or another involved in fighting the infection. However, it is not true that parent and pseudogene are always both up- or both down-regulated (we say these pairs have "synergistic expression"). There are plenty of examples of parent-up, pseudo-down, and vice versa (we call this "antagonistic expression").
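The synergistic/antagonistic distinction boils down to comparing the direction of change for the two members of a pseudogene-parent pair. Here is a hypothetical Python sketch of that bookkeeping. The two-fold cutoff used to call a gene "changed" and the function names are assumptions chosen for illustration, not the criteria used in the paper.

```python
import math


def direction(fold_change, min_fold=2.0):
    """+1 if up, -1 if down, 0 if the change is below the (assumed) min_fold cutoff.

    fold_change is the infected / uninfected expression ratio (assumed > 0).
    """
    lfc = math.log2(fold_change)
    if abs(lfc) < math.log2(min_fold):
        return 0
    return 1 if lfc > 0 else -1


def classify_pair(pseudo_fc, parent_fc, min_fold=2.0):
    """Label a pseudogene/parent pair by how both respond to infection."""
    p = direction(pseudo_fc, min_fold)
    g = direction(parent_fc, min_fold)
    if p == 0 or g == 0:
        return "unchanged"  # at least one member does not pass the cutoff
    # Same direction (both up or both down) is synergistic; opposite is antagonistic.
    return "synergistic" if p == g else "antagonistic"


if __name__ == "__main__":
    # Made-up fold changes, for illustration only.
    print(classify_pair(5.0, 3.0))  # both up: synergistic
    print(classify_pair(0.2, 4.0))  # pseudo down, parent up: antagonistic
    print(classify_pair(6.0, 1.2))  # parent barely changes: unchanged
```

Comparing signs of log ratios rather than raw ratios keeps up- and down-regulation symmetric, so a halving counts as a change of the same size as a doubling.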
A typical functional mechanism for synergistic expression is this. Imagine that a host gene, when expressed, is exposed to a microRNA (possibly activated by a virus) that attempts to silence this gene. Up-regulating the expression of pseudogene copies of that parent would now make sense for the cell, as the pseudogene RNA product could serve as a decoy, a molecular sponge so to speak, to soak up the microRNAs triggered by the invader. Such an interaction has been seen with the tumor suppressor PTEN and its pseudogene offspring PTENP1, in cancer [1]. We see many examples of such synergistic interactions in HIV as well (but of course we cannot be sure that the mechanism is similar).
An example of an antagonistic interaction comes from a case where a pseudogene's mRNA product attracts the attention of a protein that helps stabilize the mRNA of the parent gene. By pulling away the stability-inducing protein, the pseudogene indirectly reduces the expression of the parent gene (because the parent's mRNA is now unstable and degrades). In this case, the reduced expression of the parent gene (HMGA1) gives rise to insulin resistance and Type 2 diabetes [2]. We have examples consistent with such a pattern in our list of pseudogene-gene pairs as well, but again we do not have experimental evidence to support any particular mechanistic hypothesis. Indeed, we also see patterns where a pseudogene transcript is normally expressed in uninfected cells but suppressed under HIV infection, while at the same time the parent gene is up-regulated. In general, the interaction between the pseudogene and the gene doesn't have to be direct, as it is in the examples I gave. It is even possible that the differential regulation of the two members of a pair is entirely unrelated. But because the transcripts are so similar, we expect that the link is often fairly direct.
So we see that pseudogenes can have hidden lives. The ENCODE project suggests that one out of five pseudogenes is transcriptionally active, but given the opportunistic nature of evolution (and the increasing evidence that these interactions come to the fore in particular in disease states), the fraction might be much higher. Because these pseudogenes, it appears, were never dead to begin with, we should not call them "zombiegenes". Instead, we should simply call them "shadowgenes": they are a shadow of their parent, live hidden lives most of the time, but come out of hiding when coaxed out by an invader. Then, depending on whether they help defend the cell or aid and abet the aggressor, they can be hero or villain. Which is quite appropriate for denizens of the shadows.
The manuscript [3] describing the results of the differential expression of pseudogenes in HIV was published in the journal Viruses.
[1] Poliseno, L. et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465 (2010) 1033–1038.
[2] Chiefari, E. et al. Pseudogene-mediated posttranscriptional silencing of HMGA1 can result in insulin resistance and type 2 diabetes. Nat. Commun. 1 (2010) 40.
[3] Gupta, A., Brown, C.T., Zheng, Y.-H. and Adami, C. Differentially-expressed pseudogenes in HIV-1 infection. Viruses 7 (2015) 5191–5205.