The Genetic Code: A Universal Template for Protein Translation
All known organisms share the 'central dogma' of molecular biology: DNA is transcribed into mRNA, which is translated into protein. While the genetic code was being worked out, Francis Crick hypothesized that translation requires a mediator to carry out mRNA-guided protein synthesis, and he laid out a number of specifics for how the code should work.
Among these specifics, Crick postulated that a triplet, a group of three bases, codes for one amino acid. He also proposed that the code is degenerate, "that is, in general, one particular amino-acid can be coded by one of several triplets of bases."1
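To make the triplet and degeneracy ideas concrete, here is a minimal Python sketch that builds the standard genetic code and counts how many codons map to each amino acid. The names `CODON_TABLE`, `translate` and `degeneracy` are illustrative choices, not part of any published tool.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets (written as DNA) to its amino acid.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate a coding sequence triplet by triplet, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Degeneracy: number of codons per amino acid, stop codons excluded.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
sense_codons = sum(degeneracy.values())
```

In this table, leucine (`L`) is encoded by six codons while methionine (`M`) has only one, and the sense codons add up to the 61 discussed in the next section.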
Protein Translation and Codon Usage
Indeed, Crick was right. The mediator, now known as a tRNA, reads complementary mRNA sequences in triplets via a triplet anticodon. tRNA anticodon recognition is ambiguous. A minimum of 31 different tRNA anticodons are required to translate the 61 sense codons of the standard genetic code found in transcribed mRNA. But how can 31 anticodons correspond to 61 codons?
The apparent promiscuity in tRNA anticodon recognition is partially explained by inosine, a modified nucleoside found in tRNA. Inosine in the anticodon can pair with U, C or A at the third base of a codon, the so-called wobble position. This feature of inosine facilitates a 'degenerate' genetic code in which 31 anticodons can recognize and translate 61 sense codons, and it can contribute to codon bias.
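The wobble idea can be sketched in a few lines of Python. The inosine rule comes from the text above; the pairing rules for G, U, C and A at the wobble position follow Crick's wobble hypothesis and are an assumption added here for completeness. Real tRNAs carry other modified bases that this sketch ignores, and the function name `codons_read` is hypothetical.

```python
# Simplified wobble rules: base at the 5' end of the anticodon ->
# codon third bases it can read (assumed from Crick's wobble hypothesis).
WOBBLE = {
    "I": "UCA",  # inosine, as described in the text
    "G": "CU",
    "U": "AG",
    "C": "G",
    "A": "U",
}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_read(anticodon):
    """Return the codons (5'->3') that an anticodon (5'->3') can read."""
    wobble_base, middle, last = anticodon.upper()
    first = WATSON_CRICK[last]     # codon base 1 pairs with anticodon base 3
    second = WATSON_CRICK[middle]  # codon base 2 pairs with anticodon base 2
    return [first + second + third for third in WOBBLE[wobble_base]]

print(codons_read("IGC"))  # three alanine codons read by one anticodon
print(codons_read("CAU"))  # a single methionine codon
```

One inosine-containing anticodon covering three codons is the kind of saving that lets fewer anticodons than codons suffice.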
Codons mutated in the wobble position are recognized by the same anticodon-tRNA as the native codon. These mutations, called synonymous mutations, preserve the protein sequence. From an evolutionary perspective, synonymous mutations are considered neutral, because they have no effect on the overall fitness of the individual carrying the mutation.
However, synonymous mutations can change protein expression levels even though they don't change the amino acid sequence. These mutations may also alter post-translational modifications, conformation, stability and function. This is why construct design for heterologous protein expression in synthetic biology is important. A codon optimization tool can be used to introduce synonymous mutations that favor efficient expression of soluble, functional protein.
In protein expression, the synonymous mutations described above may not actually be neutral, because certain codons are translated more efficiently than others, which creates codon bias. Furthermore, a synonymous mutation to a codon whose corresponding tRNAs are in limited supply could result in lower protein expression due to ribosome stalling. This can present a problem in synthetic biology applications. Many organisms display biased use of certain synonymous codons, and it is generally accepted that codon biases reflect a balance between mutational biases and natural selection for translational optimization.
How Is Codon Usage Measured?
The most common measurement for codon usage is the Codon Adaptation Index (CAI). This index examines the codon usage (resulting from codon bias) in highly expressed genes from a species and assesses the codons that are preferentially used in that reference set.2
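As a rough illustration of how the CAI works, each codon receives a weight equal to its count in the reference set divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of the weights of its codons. The Python sketch below follows that idea; the function names and the toy reference counts are made up for illustration, and codons with zero weight are simply skipped, which is a simplification of the published method.

```python
import math

def relative_adaptiveness(ref_counts, synonymous_groups):
    """Weight = codon count / count of the most used synonymous codon."""
    weights = {}
    for codons in synonymous_groups:
        top = max(ref_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            weights[c] = ref_counts.get(c, 0) / top
    return weights

def cai(cds, weights):
    """Geometric mean of the weights of all scored codons in a coding sequence."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0

# Toy reference counts (placeholders, not real data) for Lys and Glu codons.
toy_counts = {"AAA": 80, "AAG": 20, "GAA": 60, "GAG": 30}
toy_groups = [("AAA", "AAG"), ("GAA", "GAG")]
w = relative_adaptiveness(toy_counts, toy_groups)

print(cai("AAAGAA", w))  # gene built only from preferred codons
print(cai("AAGGAG", w))  # same protein, less preferred codons
```

Both toy sequences encode the same dipeptide, yet the second scores lower because it uses the codons that are less common in the reference set.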
Codon Optimization Tool
Nowadays, a variety of programs, called codon optimization tools, exist to help you determine the codon usage (and codon bias) in your favorite species. For example, CodonW is an open source software program written by John Peden, a member of the laboratory that first proposed the CAI. CodonW simplifies the multivariate analysis necessary to determine codon usage.
Codon Usage in Health and Disease
Recently, the fact that normal cells and cancer cells from the same individual have different codon usage, and codon bias, has received considerable attention. Such findings suggest that the expression of tRNA genes is under the control of distinct transcriptional programs. We currently don't know whether differential tRNA expression contributes to cancer or other diseases, but researchers speculate that aberrant tRNA pools can boost or silence the expression of oncogenes or tumor suppressor genes, respectively.3
Codon Usage in Forced Protein Expression and Synthetic Biology
Difficulties in Protein Expression
In the laboratory, investigators often want to express proteins across species (e.g., human proteins in E. coli cells). One might think that the genetic code permits expression of any open reading frame (ORF) in any organism, which is partially right. However, the presence of rare codons in the transgenic mRNA can result in suboptimal ribosome use and depletion, which ultimately reduces the levels of heterologous protein expression.
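A first sanity check before expressing a transgene is to flag codons that are rare in the host. The sketch below assumes you have a host codon usage table expressed as frequency per thousand codons; the numbers in `toy_usage` are placeholders rather than measured E. coli values, and the threshold of 10 per thousand is an arbitrary assumption.

```python
def find_rare_codons(cds, usage_per_thousand, threshold=10.0):
    """Return (codon index, codon, frequency) for codons below the threshold."""
    hits = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        freq = usage_per_thousand.get(codon)
        if freq is not None and freq < threshold:
            hits.append((i // 3, codon, freq))
    return hits

# Placeholder host usage table (frequency per thousand codons, illustrative only).
toy_usage = {"ATG": 27.0, "CGT": 21.0, "AGG": 1.5, "AGA": 2.5, "AAA": 33.0}

for index, codon, freq in find_rare_codons("ATGAGGAAACGTAGA", toy_usage):
    print(f"codon {index}: {codon} ({freq} per thousand)")
```

Clusters of flagged codons are usually more worrying than isolated ones, since consecutive rare codons give the ribosome more opportunities to stall.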
Traditional approaches in protein expression and synthetic biology involve mutating transgene codons to those that are preferentially used by the host species (i.e., matching the host's codon bias), but such changes may increase the risk of amino acid starvation and alter the equilibrium of tRNA pools. Also, for a whole protein, manual codon optimization for expression in two species is very demanding, and for optimal expression in three or four species it is an almost impossible task.
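The traditional approach amounts to picking, for every amino acid, the single codon the host uses most. A minimal sketch is shown below, assuming a usage table keyed by amino acid; the function names and the fractions in `toy_usage_by_aa` are hypothetical placeholders. It deliberately ignores everything except codon frequency, which is exactly the limitation described above.

```python
def preferred_codons(usage_by_aa):
    """Pick the most frequently used codon for each amino acid."""
    return {aa: max(codons, key=codons.get) for aa, codons in usage_by_aa.items()}

def naive_optimize(protein, usage_by_aa):
    """Back-translate a protein using only the host's preferred codons."""
    best = preferred_codons(usage_by_aa)
    return "".join(best[aa] for aa in protein)

# Placeholder usage fractions for three amino acids (illustrative only).
toy_usage_by_aa = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "R": {"CGT": 0.38, "CGC": 0.36, "AGA": 0.07, "AGG": 0.04},
}

print(naive_optimize("MKR", toy_usage_by_aa))
```

Because every occurrence of an amino acid is mapped to the same codon, a long gene optimized this way draws heavily on a small set of tRNAs.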
Today, newer strategies for the optimization of heterologous protein expression take into account many factors, such as global nucleotide content, local mRNA folding, codon bias, a codon ramp or codon correlations.
Solutions for Protein Expression
There are a number of algorithms available to help you with codon optimization, many of which focus solely on codon usage tables. It's worth knowing what to look out for when you decide to use an algorithm for codon optimization.
A good platform will allow you to alter both naturally occurring and recombinant gene sequences to achieve the highest possible productivity levels in your chosen expression system. Ideally, you want an algorithm that considers a range of critical factors involved in protein expression, such as codon adaptability, mRNA structure, and various cis-elements in transcription and translation. One platform that satisfies these criteria is the OptimumGene algorithm from GenScript, which has optimized over 50,000 gene sequences to date in almost all major expression systems.
As a prerequisite, all three parts of the central dogma need to perform well so that transcription, mRNA stability, and translation are efficient and in harmony with the codon pool. For ultimate efficiency in these parts, it is necessary to consider a number of factors:
Parameters for Transcriptional Efficacy:
- Avoid high GC content and CpG-methylated sequences, cryptic splicing sites and negative CpG islands.
- Employ codon-optimized cDNAs with optimal TATA boxes and termination signals to maximize the likelihood of high yield protein expression.
- Be sure to include a termination signal in your cDNA.
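Some of the sequence-level checks in the list above are easy to script. The following sketch reports overall GC content, windows with high local GC content, and the number of CpG dinucleotides. The window size and the 70% cutoff are arbitrary assumptions for illustration, not recommended values, and the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def high_gc_windows(seq, window=50, max_gc=0.7):
    """Return (start, GC fraction) for every window above the cutoff."""
    hits = []
    for i in range(len(seq) - window + 1):
        gc = gc_content(seq[i:i + window])
        if gc > max_gc:
            hits.append((i, gc))
    return hits

def count_cpg(seq):
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")

example = "GGGGCCCCAAAATTTT"
print(gc_content(example))
print(high_gc_windows(example, window=8))
print(count_cpg("ACGCGT"))
```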
Considerations for Translational Efficiency:
Codon usage is a key determining factor for efficient protein expression, but in bacteria and archaea the Shine-Dalgarno (SD) sequence also plays a pivotal role. This sequence is important in both translation initiation and efficiency, and mRNA sequences with SD homology negatively impact protein translation because the SD-homologous region competes with the bona fide SD sequence for binding to the 16S rRNA.5 Optimization algorithms can therefore adjust mRNA sequences to avoid SD homology.
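Screening a coding sequence for SD-like stretches can be approximated with a plain motif search. The sketch below assumes the commonly cited SD consensus AGGAGG and two shorter fragments of it as the patterns to avoid; both the motif list and the function name are assumptions, and a real tool would more likely score hybridization to the 16S rRNA than match fixed strings.

```python
def find_sd_like(seq, motifs=("AGGAGG", "GGAGG", "AGGAG")):
    """Return sorted (position, motif) pairs for every SD-like match."""
    dna = seq.upper().replace("U", "T")
    hits = []
    for motif in motifs:
        start = dna.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = dna.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits)

print(find_sd_like("ATGAGGAGGTAA"))
```

A hit inside the coding region can often be removed by swapping one of the codons involved for a synonymous one.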
The free energy of 5' mRNA ends also has a significant impact on corresponding protein levels. This was elegantly shown by expressing 154 GFP mutants in E. coli, where hairpins engineered into the 5' mRNA end reduced GFP expression by up to 250-fold, compared to an optimal codon-optimized construct. The folding free energy of the 5' mRNA end accounted for more than half of the cases of reduced GFP protein expression, which is 10-fold more than any of the other parameters measured in that study.4 Codon optimization can help ensure that the 5' mRNA end is unlikely to form stable hairpins, thus facilitating optimal mRNA loading and protein translation.
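Computing folding free energy requires a dedicated RNA folding program, but a crude first screen for potential hairpins near the 5' end can be written directly. The sketch below is not a free-energy calculation: it only looks for a short stretch whose perfect reverse complement occurs downstream within the first few dozen nucleotides. The stem length, minimum loop size and window are arbitrary assumptions, G-U pairs are ignored, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def revcomp(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]

def find_stems(seq, stem=6, min_loop=3, window=40):
    """Return (arm start, partner start, arm) for perfect stems in the 5' window."""
    region = seq.upper().replace("T", "U")[:window]
    hits = []
    for i in range(len(region) - stem + 1):
        arm = region[i:i + stem]
        partner = revcomp(arm)
        # The partner must start at least min_loop bases after the arm ends.
        j = region.find(partner, i + stem + min_loop)
        while j != -1:
            hits.append((i, j, arm))
            j = region.find(partner, j + 1)
    return hits

print(find_stems("GGGCGCAAAAGCGCCC"))
```

Any hit from a screen like this is only a hint; a proper folding prediction is needed to judge whether the structure would be stable.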
The AU-rich elements (AREs) found in the 3' untranslated region of the mRNA affect mRNA stability. These elements serve as binding sites for proteins and microRNAs (miRNAs) that promote mRNA degradation and, thereby, decrease protein levels. In that same vein, premature poly-A sites and internal ribosomal binding sites also decrease protein levels by aberrant processing of the mRNA and decreased translation.
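Destabilizing elements like these can be screened with a named motif table. The sketch below assumes the commonly cited ARE core pentamer AUUUA and the canonical poly-A signal AAUAAA as search patterns. Real elements are more varied than these two strings, so the table is an illustrative assumption, as is the function name `screen`.

```python
import re

# Motifs written as DNA; both are commonly cited consensus sequences (assumed).
MOTIFS = {
    "ARE core (AUUUA)": "ATTTA",
    "poly-A signal (AAUAAA)": "AATAAA",
}

def screen(seq, motifs=MOTIFS):
    """Return the start positions of each named motif in a sequence."""
    dna = seq.upper().replace("U", "T")
    report = {}
    for name, motif in motifs.items():
        # The lookahead lets overlapping occurrences be reported as well.
        pattern = f"(?={re.escape(motif)})"
        report[name] = [m.start() for m in re.finditer(pattern, dna)]
    return report

print(screen("GCATTTATTTAGCAATAAAGC"))
```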
Considerations for Protein Folding:
Optimization of protein folding is necessary so that the newly synthesized protein is chaperoned to its correct secondary and tertiary structures. Apart from the transcriptional and translational tools outlined above, the codon context and the interaction between codon and anticodon can also be optimized to ensure faithful protein folding.
The Future of Protein Expression
In the future, we are likely to see developments in directed cDNA translation for efficient and targeted protein expression in different cell types. Such applications are warranted by the fact that proliferating and non-dividing cells from the same organism can actually have different codon usage patterns.3 Furthermore, the apparent differential codon usage in cancerous versus normal cells presents a novel and attractive therapeutic target, especially for synthetic biology applications. For example, gene expression of tumor suppressor or death-inducing cDNAs could be preferentially elevated in cancer cells through optimizing codon preferences of these cells.
There is little doubt that advances in codon optimization have the potential to significantly improve recombinant protein production, synthetic biology, and tailored therapy design, including gene therapy and nucleic acid-based vaccines. However, before these applications can be fully realized, a deeper understanding of the effects of codon optimization is needed, since concerns surrounding the risk of anti-drug antibody development have been raised several times. Such undesired effects can not only reduce drug efficacy but also cause immunological reactions, threatening the safety of codon-optimized therapeutic proteins. If these concerns can be alleviated, we will be closer than ever before to a new era of personalized medicine.
For more insights into what the future might hold for protein expression and therapeutics, take a look at this critical review.
- Crick FH, Barnett L, Brenner S, Watts-Tobin RJ. (1961) General nature of the genetic code for proteins. Nature. 192:1227–32.
- Sharp PM, Li WH. (1987) The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15(3):1281–95.
- Gingold H, Tehler D, Christoffersen NR, Nielsen MM, Asmar F, Kooistra SM, Christophersen NS, Christensen LL, Borre M, Sorensen KD, et al. (2014) A dual program for translation regulation in cellular proliferation and differentiation. Cell. 158:1281–92.
- Kudla G, Murray AW, Tollervey D, Plotkin JB. (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science. 324(5924):255–8.
- Li GW, Oh E, Weissman JS. (2012) The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature. 484(7395):538–41.