The genetic code is the correspondence between the 20 standard amino acids and the codon triplets of mRNA, mediated by tRNA1. The consensus view is that the genetic code emerged from an “RNA world”2, in which all the catalytic functions of the cell, including replication of an RNA genome, were carried out by RNA enzymes (ribozymes)2. Ten of the twenty standard amino acids have been synthesised abiotically3, and it is likely that these existed in the RNA world. There are several models for how the genetic code originated. The Direct RNA Template (DRT) theory4 proposes that early codon assignments resulted from a direct interaction of abiotic amino acids with sites (codons) on a precursor of the ribosome. The Coding Coenzyme Handle (CCH) theory5 proposes that amino acids were originally used as cofactors by ribozymes: the amino acids were attached to oligonucleotide handles (precursor tRNAs) whose anticodons base-paired with codons on cofactor-requiring ribozymes (precursor mRNAs). All models require an initial interaction between amino acids and RNA, and this can be explained by the steric interaction theory6.
Steric interaction theory suggests that codon assignment depended most heavily on the initial binding of amino acids to RNA sites that contained multiple copies of their modern codon or anticodon. The later use of these binding sequences in codon-anticodon recognition can then explain the initial codon assignments6. The theory is supported by the finding that RNA molecules selected from random sequences to bind specific amino acids contain that amino acid's standard codons, anticodons or both more often than would be expected by chance6. Steric interaction explains how abiotic amino acids were initially assigned codons; however, it does not explain how the remaining amino acids, which are not synthesised abiotically, were assigned codons.
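The idea of a binding site "containing" a codon or anticodon can be made concrete. An anticodon pairs antiparallel with its codon, so written 5'→3' it is the reverse complement of the codon. The sketch below counts how many 3-nucleotide windows of a sequence match an amino acid's codons or anticodons and compares this with the count expected if all 64 triplets were equally likely. Arg is used only as an example, the sequence `site` is made up for illustration, and the uniform-base null model is a simplifying assumption rather than the statistical test used in the studies cited.

```python
# Complementary RNA bases (Watson-Crick pairs only; wobble pairing is ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Standard-code codons for arginine, written 5'->3'.
ARG_CODONS = {"CGU", "CGC", "CGA", "CGG", "AGA", "AGG"}


def anticodon(codon: str) -> str:
    """Return the anticodon (5'->3') that base-pairs antiparallel with `codon`."""
    return "".join(PAIR[b] for b in reversed(codon))


def count_triplets(seq: str, triplets: set[str]) -> int:
    """Count overlapping 3-nt windows of `seq` that belong to `triplets`."""
    return sum(seq[i:i + 3] in triplets for i in range(len(seq) - 2))


def expected_by_chance(seq: str, triplets: set[str]) -> float:
    """Expected count if each window were any of the 64 triplets with equal probability."""
    windows = max(len(seq) - 2, 0)
    return windows * len(triplets) / 64


# Hypothetical binding-site sequence, invented for illustration only.
site = "AGGCCUAGA"
arg_anticodons = {anticodon(c) for c in ARG_CODONS}

print("codon hits:", count_triplets(site, ARG_CODONS))         # 2 (AGG, AGA)
print("anticodon hits:", count_triplets(site, arg_anticodons))  # 1 (CCU)
print("expected by chance:", expected_by_chance(site, ARG_CODONS))
```

A real analysis would use the selected RNA sequences themselves and a null model matched to their base composition.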
Similar amino acids have codons with similar sequences1. This lessens the effect of point mutation and mistranslation, because a codon change is likely to substitute a similar amino acid. It has been suggested that a large proportion of the 64 available codons (all possible combinations of 4 ribonucleotides in a triplet) would initially have been filled by the ten abiotic amino acids, because too many termination codons would make the coding of viable proteins unlikely7. The coevolution theory8 suggests that the remaining ten standard amino acids entered the code through the evolution of amino acid biosynthetic pathways and the capture of codons from the abiotic amino acids. The theory indicates two mechanisms for codon reassignment. The first mechanism is based on steric interaction: nascent, biotically synthesised amino acids competed with abiotic amino acids of similar structure or properties for binding to their cognate tRNA. For example, the negatively charged Asp and Glu are both encoded by GAN (N = any ribonucleotide), suggesting that Asp stole its codons from the abiotically synthesised Glu7. The second mechanism is pretranslational modification (pretran synthesis), in which an amino acid is modified while already attached to a tRNA molecule. Pretran synthesis provides a direct mechanism for codon stealing, as the biotic amino acid takes over all the codons that the tRNA recognises. The pretran mechanism is supported by the observation that organisms thought to be phylogenetically closest to the last universal common ancestor (LUCA) use pretran synthesis7. Under this theory, amino acids with similar properties came to share similar codons because newly biosynthesised amino acids stole codons from the abiotic amino acids that were their biosynthetic precursors.
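The codon arithmetic in this paragraph can be checked directly. The sketch below builds the standard code as a dictionary from the compact one-letter string used in the NCBI "standard code" table (translation table 1, bases in UCAG order), confirms that there are 4³ = 64 triplets, and shows that Asp and Glu together occupy the GAN box. The names `CODE` and `codons_for` are illustrative choices for this sketch.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1) as one-letter amino acid
# codes, ordered by first, second, then third base in UCAG order; '*' = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product(..., repeat=3) varies the third base fastest, matching the string order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def codons_for(aa: str) -> list[str]:
    """All codons assigned to the one-letter amino acid code `aa`, sorted."""
    return sorted(c for c, a in CODE.items() if a == aa)


asp, glu = codons_for("D"), codons_for("E")
print(len(CODE))  # 64 = 4 ** 3 triplets
print(asp, glu)   # ['GAC', 'GAU'] ['GAA', 'GAG']
# True: the Asp and Glu codons together fill the GAN box.
print(sorted(asp + glu) == sorted("GA" + n for n in BASES))
```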
It can be argued that the error-minimising properties of the code cannot be explained by the coevolution theory alone9. The theory of error minimisation posits that the harmful effects of mutation on proteins selected for a code that reduced those effects by grouping similar amino acids with similar codons. It is probable that selection for error minimisation refined the groupings that coevolution had already created.
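Error minimisation is usually tested by asking how the standard code compares with randomised alternatives. The sketch below, loosely in the spirit of such tests, scores a code by the mean squared change in an amino acid property over all single-point mutations between sense codons, then repeats the calculation for codes in which the 20 amino acids are randomly reassigned among the standard code's synonymous blocks (stop codons stay fixed). Two assumptions are worth flagging: the Kyte-Doolittle hydropathy scale is a stand-in chosen for this sketch (published analyses often use Woese's polar requirement instead), and the randomisation scheme is a simple block shuffle, not necessarily the one used in reference 9.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard code (NCBI translation table 1), UCAG order, '*' = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

# Kyte-Doolittle hydropathy: a stand-in property scale chosen for this sketch.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def point_mutants(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for i, original in enumerate(codon):
        for b in BASES:
            if b != original:
                yield codon[:i] + b + codon[i + 1:]


def error_cost(code):
    """Mean squared hydropathy change over all point mutations from one sense
    codon to another (synonymous changes count as zero; stops are skipped)."""
    diffs = []
    for codon, aa in code.items():
        if aa == "*":
            continue
        for mutant in point_mutants(codon):
            new = code[mutant]
            if new != "*":
                diffs.append((HYDROPATHY[aa] - HYDROPATHY[new]) ** 2)
    return sum(diffs) / len(diffs)


def shuffled_code(code, rng):
    """Keep the synonymous codon blocks and stop codons, but randomly
    reassign the 20 amino acids among the blocks."""
    sense = sorted(set(code.values()) - {"*"})
    relabel = dict(zip(sense, rng.sample(sense, len(sense))))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in code.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
standard = error_cost(CODE)
random_costs = [error_cost(shuffled_code(CODE, rng)) for _ in range(1000)]
better = sum(cost < standard for cost in random_costs) / len(random_costs)
print(f"standard code: {standard:.2f}")
print(f"shuffled codes that beat it: {better:.1%}")
```

A small fraction of shuffled codes beating the standard code would be consistent with error minimisation, but the exact figure depends on the property scale, the cost function and the randomisation scheme, so the output should not be read as a reproduction of any published result.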
The frozen accident theory10 accounts for the apparent halt in amino acid expansion at twenty. Once the twenty amino acids had evolved, no more could be coded, because the benefit of increased protein versatility was cancelled out by disruption to the proteome. This has been largely borne out by the near universality of the genetic code. However, some organisms insert an additional amino acid, selenocysteine or pyrrolysine, at stop codons7. Converting a stop codon to a sense codon is less deleterious than reassigning a sense codon, and all organisms with 21 amino acids add the extra amino acid at a stop codon. Because only three stop codons are universally available, and at least one must remain to terminate translation, this route may limit the theoretical total to twenty-two amino acids. Nine species of the yeast genus Candida have a sense-to-sense change in their nuclear code, in which CUG is reassigned from Leu to Ser11. These codon reassignments strongly suggest that the code is not completely frozen; however, the rarity of changes to the nuclear code suggests that such changes are strongly selected against.
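Both kinds of change described here can be expressed as edits to the codon table. The sketch below applies the Candida CUG reassignment (Leu to Ser) and shows how it shifts codon counts, then computes the stop-codon bound, assuming, as the twenty-two estimate implies, that at least one stop codon must keep terminating translation. It treats recoding as a simple table edit and ignores the context-dependent way selenocysteine and pyrrolysine are actually inserted; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code (NCBI translation table 1), UCAG order, '*' = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

# Sense-to-sense reassignment described for several Candida species: CUG Leu -> Ser.
candida = dict(STANDARD, CUG="S")

std_counts = Counter(STANDARD.values())
var_counts = Counter(candida.values())
print("Leu codons:", std_counts["L"], "->", var_counts["L"])  # Leu codons: 6 -> 5
print("Ser codons:", std_counts["S"], "->", var_counts["S"])  # Ser codons: 6 -> 7

# Upper bound on amino acids reachable by recoding stop codons, assuming at
# least one stop codon must keep terminating translation.
stops = sorted(c for c, a in STANDARD.items() if a == "*")
n_standard = len(set(STANDARD.values()) - {"*"})
max_amino_acids = n_standard + len(stops) - 1
print(stops, max_amino_acids)  # ['UAA', 'UAG', 'UGA'] 22
```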
With the number of amino acids frozen at about twenty and 64 possible codons, redundancy is inevitable. The degree of redundancy roughly correlates with how common an amino acid is in proteins1. This adds another layer of protection against point mutation, with the most common amino acids receiving the greatest protection. In addition, it has been suggested that the genetic code formed in a high-pressure environment, and that this pressure selected for high redundancy in the amino acids that form pressure-resistant proteins12. Redundancy also keeps the number of nonsense codons low, which reduces protein error rates.
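The redundancy of the standard code can be tabulated by grouping amino acids by the number of codons they receive. The sketch below does this from the standard codon table. It does not include amino acid abundances, so checking the correlation with how common each amino acid is would need a separate frequency table, which is not supplied here.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard code (NCBI translation table 1), UCAG order, '*' = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

# Number of codons per amino acid (stop codons excluded).
counts = Counter(aa for aa in CODE.values() if aa != "*")

# Group amino acids by their degree of redundancy.
by_degeneracy = defaultdict(list)
for aa, n in sorted(counts.items()):
    by_degeneracy[n].append(aa)

for n in sorted(by_degeneracy):
    print(n, "codon(s):", "".join(by_degeneracy[n]))
print("sense codons:", sum(counts.values()))
# 1 codon(s): MW
# 2 codon(s): CDEFHKNQY
# 3 codon(s): I
# 4 codon(s): AGPTV
# 6 codon(s): LRS
# sense codons: 61
```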
The genetic code originated in an RNA world, with codon assignments initially highly dependent on steric interactions. Additional amino acids were coded via coevolution in a way that grouped similar amino acids with similar codons. These groupings were further refined by selection for error minimisation. The number of standard amino acids has not increased beyond twenty because of the high cost of proteome disruption, although an increase is possible. Redundancy gives the most frequently used amino acids added protection against mutational change. The code is still evolving, although at an extremely low rate.
1. Lewin, B. 2004. Genes. 8th ed. New Jersey: Pearson Prentice Hall.
2. Yarus, M. 2002. Phenotype of the Ribocyte. Annu Rev Genet. 36: 125–151.
3. Miller, S.L. 1987. Which organic compounds could have occurred on the prebiotic earth? Cold Spring Harb Symp Quant Biol. 52: 17–27.
4. Yarus, M. 1998. Amino Acids as RNA Ligands: A Direct-RNA-Template Theory for the Code’s Origin. J Mol Evol. 47: 109–117.
5. Szathmáry, E. 1999. The origin of the genetic code: Amino acids as cofactors in an RNA world. Trends Genet. 15: 223–229.
6. Yarus, M., Caporaso, J.G. and Knight, R. 2005. Origins of the genetic code: The Escaped Triplet Theory. Annu Rev Biochem. 74: 179–198.
7. Jukes, T.H. 1966. Molecules and Evolution. New York, NY: Columbia University Press.
8. Wong, J.T-F. 2005. Coevolution theory of the genetic code at age thirty. BioEssays. 27: 416–425.
9. Freeland, S.J., Wu, T. and Keulmann, N. 2003. The case for an error minimizing standard genetic code. Orig Life Evol Biosph. 33: 457–477.
10. Crick, F.H.C. 1968. The origin of the genetic code. J Mol Biol. 38: 367–379.
11. Watanabe, K. and Ueda, T. 2001. Evolution of the Genetic Code. In: Encyclopedia of Life Sciences. John Wiley & Sons, Inc. http://www.els.net/ [doi: 10.1038/npg.els.0000548].
12. Di Giulio, M. 2005. The origin of the genetic code: theories and their relationships, a review. BioSystems. 80: 175–184.