Most amino acids are encoded by multiple nucleotide triplets (i.e., synonymous codons) that differ at the third codon position (1) or, rarely, at the second codon position (2). Synonymous codons for a given amino acid are not used at equal frequencies, either within or between genomes (2-5). Different organisms exhibit a species-specific preference for a subset of synonymous codons when coding particular amino acids (3, 6). This non-random usage of synonymous codons is referred to as synonymous codon usage bias (SCUB) and is an essential characteristic of both prokaryotic and eukaryotic genomes (2).
Although synonymous mutations are generally regarded as neutral or silent because they do not change the amino acid sequence, SCUB is reported to have profound effects on gene expression and function (7-10). Population genetic studies have suggested that mutational biases due to nucleotide compositional constraints, or weak selection on specific codons, might contribute to SCUB (10-12). Many studies have confirmed that SCUB is stronger in genomic regions on which substantial purifying selection acts at the amino acid level (10-13). Highly expressed genes show stronger SCUB than genes with low expression (4). Moreover, evolutionarily conserved protein coding regions show stronger SCUB (14). However, the role of the various physiological processes that contribute to SCUB in protein coding regions of genomes has remained elusive (10).
The major forces contributing to SCUB fall into three categories: (i) nucleotide compositional constraints (15), (ii) optimization of translational elongation rate by natural selection, and (iii) a balance between mutational pressure, selection and genetic drift in a finite population (11, 16, 17). Other contributing factors include the interaction between codons and anticodons (18), efficacy of replication (19) and usage of codon pairs (20).
The plastid genome of higher plants comprises 80 protein coding genes (PCGs), 4 rRNA genes and 30 tRNA genes (21). Since most proteins in the chloroplast are essential for photosynthesis, the protein coding regions of the chloroplast are highly conserved in higher plants, although a few exceptions exist (22, 23). In the plastid genome, mutational pressure favours a high representation of A/T and appears to be the major factor shaping SCUB (24-26). However, codon usage of the psbA gene is highly correlated with the corresponding tRNA population in the chloroplast, indicating a possible influence of selection for translational efficiency on psbA. Previous studies on SCUB in the chloroplast genome, and on the factors influencing its diversification, revealed that although mutational bias is predominant, selection on codon usage cannot be ruled out. A recent finding suggested that intron evolution and DNA methylation could be considered potential factors that shape SCUB in land plants (27).
The angiosperm genus Oenothera (Family: Onagraceae) is commonly distributed in South Africa (28) and North America (29). Oenothera is considered well suited for understanding the molecular aspects of speciation, as it is among the best-characterized plant genera (30). The genus has also been regarded as an ideal model for studying the evolution of plant genomes (particularly plastid genomes), since substantial information about its systematics and genetics is available (30). In Oenothera, plastomes and nuclear genomes are highly incompatible. However, fertile plants have evolved through (i) the exchange of plastids and nuclei between species, and (ii) the exchange of individual chromosomes or complete haploid sets between species (30).
The plastid genomes of 5 Oenothera sp. have been completely sequenced (30). The sequences revealed that all plastomes are genetically distinct owing to extensive nucleotide substitutions, small insertions, deletions and repetitions (30). In addition, phylogenetic analysis showed that these plastomes differ from the common ancestral plastome of vascular plants by a 56 kb inversion within the large single copy region (30). These findings suggested that Oenothera plastomes are suitable candidates for exploring SCUB and the factors influencing its diversification. Basic features of molecular evolution can be identified by determining evolutionary patterns at synonymous sites in codons. Hence, the major objectives of the present study were (i) to investigate trends in synonymous codon usage in 5 distinct Oenothera plastomes in order to gain insight into the major forces that shape SCUB, and (ii) to identify putative preferred codons in the PCGs of the plastomes, which would help to optimize heterologous gene expression. Correlation analysis of various codon usage indices provided a better understanding of the pattern of SCUB in the plastomes. Identification of putative optimal codons should pave the way for developing transplastomic Oenothera sp., enabling evolutionary biologists to study the molecular mechanisms underlying plant genome evolution.
3. Materials and Methods
3.1. Sequence Data and Nucleotide Compositions
Complete nucleotide sequences of the plastomes of 5 Oenothera species, viz., Oenothera argillicola, Oenothera biennis, Oenothera elata, Oenothera glazioviana and Oenothera parviflora, were retrieved from the National Center for Biotechnology Information (NCBI) website; details are presented in Table 1. PCGs of each genome were extracted, and coding sequences (CDS) containing fewer than 300 codons were excluded to avoid sampling errors. The integrity of each CDS was evaluated by checking for initiation and termination codons at the appropriate positions and for the absence of internal stop codons. Duplicate sequences were removed from the dataset. The final data set thus contained 54 CDS each for O. argillicola, O. biennis, O. elata and O. parviflora, and 53 CDS for O. glazioviana.
Overall and local nucleotide compositions (i.e., nucleotide contents at the 1st, 2nd and 3rd codon positions) were calculated for each CDS. Spearman’s rank correlation analysis was used to examine the correlations between overall base contents and silent-site base contents (A3, T3, G3 and C3) in order to reveal intrinsic properties of SCUB.
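The overall and position-specific base contents described above can be illustrated with a short Python sketch. It is not the MEGA procedure used in the study; the function name `positional_composition` and the output keys (e.g., `GC3`, `T3`) are assumptions made for illustration only.

```python
from collections import Counter


def positional_composition(cds: str) -> dict:
    """Percent A/T/G/C and GC, overall and at codon positions 1, 2 and 3.

    Keys without a suffix (e.g. "A", "GC") refer to the whole CDS; keys with
    a suffix (e.g. "T3", "GC3") refer to that codon position.
    """
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 in length")

    # "" = whole sequence; "1".."3" = every third base starting at that position
    slices = {"": seq, "1": seq[0::3], "2": seq[1::3], "3": seq[2::3]}
    result = {}
    for suffix, sub in slices.items():
        counts = Counter(sub)
        for base in "ATGC":
            result[base + suffix] = 100.0 * counts[base] / len(sub)
        result["GC" + suffix] = result["G" + suffix] + result["C" + suffix]
    return result


# Toy CDS (hypothetical): ATG GCT TAA
comp = positional_composition("ATGGCTTAA")
```

Values such as `comp["GC3"]` or `comp["T3"]` computed per CDS are the kind of quantities that would then enter the rank correlation analysis.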
3.2. Indices of Codon Usage
3.2.1. Relative Synonymous Codon Usage (RSCU)
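The RSCU value of each codon was calculated to study the trend of SCUB in the PCGs of Oenothera plastomes. RSCU is used extensively to study codon usage of PCGs in various genomes because it is independent of amino acid composition. RSCU values were calculated using the following equation (31):
$$\mathrm{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}}$$
where $x_{ij}$ is the number of occurrences of the jth codon for the ith amino acid and $n_i$ is the number of synonymous codons for that amino acid.
An RSCU value greater than 1 indicates that a codon is used more often than expected under equal usage of synonymous codons, i.e., biased codon usage (31).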
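As an illustration of the RSCU calculation, the following Python sketch computes RSCU for a single CDS as the observed codon count divided by the mean count of its synonymous family. It is not the DAMBE implementation used in the study; the function name `rscu` is an assumption, and the standard codon assignments are assumed (the plastid code, NCBI table 11, uses the same assignments).

```python
from collections import Counter, defaultdict

# Standard codon assignments in TCAG order (assumed; shared with the plastid code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict:
    """RSCU per codon: observed count / mean count of its synonymous family."""
    seq = cds.upper()
    # A trailing partial codon, if any, is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        # Met and Trp have a single codon; families absent from the CDS are skipped.
        if len(family) < 2 or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy CDS (hypothetical): three GCT and one GCA, all coding for Ala
ala_example = rscu("GCTGCTGCTGCA")
```

In this toy case GCT is used three times out of four Ala codons, so its RSCU exceeds 1, whereas the unused GCC and GCG have an RSCU of 0.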
3.2.2. Effective Number of Codons (ENC)
The ENC of a gene is widely used in codon usage research to measure the extent of SCUB in that gene (32). ENC ranges from 20 (extreme bias; one codon per amino acid) to 61 (no bias; all synonymous codons used equally within each amino acid family). It is considered an effective measure of SCUB because it is independent of gene length (32). Preference for particular codons within an amino acid family, whether due to selection or to mutational pressure, reduces the ENC value. A gene with an ENC value of 35 or less can be considered highly biased, and a gene with a higher value less biased. Highly biased genes are generally considered to be highly expressed.
The expected ENC value under no selection can be calculated for any value of GC3 using the equation (32)
$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$
where s = GC3 (expressed as a proportion).
A plot of the calculated ENC value of each CDS against its GC3 value was produced for all Oenothera sp. to assess the influence of GC compositional constraints in shaping SCUB. If the majority of genes lie on or just below the expected curve, on either its left-hand or right-hand side, GC3 compositional constraints are suggested as the major force determining SCUB (32). If the majority of genes lie considerably below the expected curve, selection may be the significant force shaping SCUB (32).
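The expected curve of the ENC versus GC3 plot can be sketched in Python as follows, using the standard relation ENC = 2 + s + 29 / (s^2 + (1 - s)^2) with s = GC3 as a proportion. The helper `enc_deficit`, which measures how far a gene falls below the curve, is a hypothetical convenience and not part of the original analysis.

```python
def expected_enc(gc3: float) -> float:
    """Expected ENC under no selection for a GC3 value given as a proportion (0 to 1)."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a proportion between 0 and 1")
    s = gc3
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def enc_deficit(observed_enc: float, gc3: float) -> float:
    """Distance of a gene below the expected curve (positive = below the curve)."""
    return expected_enc(gc3) - observed_enc


# Points for drawing the expected curve at GC3 = 0.00, 0.05, ..., 1.00
curve = [(i / 20.0, expected_enc(i / 20.0)) for i in range(21)]
```

A gene whose `enc_deficit` is close to zero lies near the curve, which is the pattern attributed to compositional constraints; a large positive deficit corresponds to a gene lying considerably below the curve.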
3.2.3. Codon Adaptation Index (CAI)
CAI measures the extent of SCUB towards a subset of codons in each amino acid family of a given gene, based on the preferred (translationally optimal) codons (33) of highly expressed genes such as those encoding ribosomal proteins and translational elongation factors. CAI is a good indicator of expression level because it takes all 59 synonymous codons into account quantitatively (33). The CAI value of a gene ranges from 0 to 1; a lower value indicates weaker SCUB (low expression level) and a value close to 1 indicates stronger SCUB (high expression level). In the present study, the ribosomal protein coding genes of each Oenothera species were used as the reference set of highly expressed genes for calculating the CAI values of that species (34). CAI was calculated as
$$\mathrm{CAI} = \left(\prod_{n=1}^{L} w_n\right)^{1/L}$$
where $w_n$ = relative adaptiveness of the nth codon and L = number of codons.
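A minimal Python sketch of the CAI calculation is given below, taking CAI as the geometric mean of the relative adaptiveness values of the codons in a gene. It is not the CAI calculator 2 implementation; the function names, the toy reference counts and the pseudo-count of 0.5 for codons absent from the reference set are assumptions for illustration.

```python
import math


def relative_adaptiveness(ref_counts: dict, families: dict, pseudo_count: float = 0.5) -> dict:
    """w for each codon: its reference-set count divided by the highest count in its family.

    Codons missing from the reference set receive `pseudo_count` (an assumption
    made here to avoid zero weights).
    """
    weights = {}
    for codons in families.values():
        adjusted = {c: (ref_counts.get(c, 0) or pseudo_count) for c in codons}
        top = max(adjusted.values())
        for codon, count in adjusted.items():
            weights[codon] = count / top
    return weights


def cai(codons: list, weights: dict) -> float:
    """Geometric mean of w over the codons of a gene (codons without a weight are skipped)."""
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codons with a defined weight")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts for two amino acid families
families = {"Ala": ["GCT", "GCC", "GCA", "GCG"], "Lys": ["AAA", "AAG"]}
ref_counts = {"GCT": 8, "GCC": 2, "GCA": 4, "AAA": 6, "AAG": 3}
w = relative_adaptiveness(ref_counts, families)
```

A gene made up only of the most frequent reference codons (here GCT and AAA) reaches the maximum CAI of 1, and any use of less frequent codons lowers the value.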
3.2.4. Synonymous codon usage order (SCUO)
SCUO is used to evaluate quantitatively the relationship between GC composition at each codon position and SCUB for a gene, and it was computed as described in (35).
Tukey's test was used to analyse differences in SCUB within genomes, and the Wilcoxon two-sample test was used to compare SCUB across the five plastomes.
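A sketch of an entropy-based SCUO calculation in Python follows. It assumes the commonly stated definition, in which the bias of each amino acid family is its normalized entropy deficit, (Hmax - H) / Hmax, and SCUO is the average of these values weighted by the share of each family among the codons of the gene. This is an illustration under that assumption, not the CodonO implementation, and the function name `scuo` is hypothetical.

```python
import math
from collections import Counter, defaultdict

# Standard codon assignments in TCAG order (assumed; shared with the plastid code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def scuo(cds: str) -> float:
    """Entropy-based codon usage order of a CDS, from 0 (no bias) to 1 (extreme bias)."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":  # only amino acids with synonymous codons
            families[aa].append(codon)

    family_totals = {aa: sum(counts[c] for c in fam) for aa, fam in families.items()}
    grand_total = sum(family_totals.values())
    if grand_total == 0:
        raise ValueError("no codons from degenerate amino acid families")

    value = 0.0
    for aa, fam in families.items():
        total = family_totals[aa]
        if total == 0:
            continue
        probs = [counts[c] / total for c in fam if counts[c] > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        max_entropy = math.log(len(fam))
        bias = (max_entropy - entropy) / max_entropy
        value += (total / grand_total) * bias  # weight by family share
    return value
```

For example, a sequence that uses a single Ala codon throughout gives a value of 1, whereas one that uses all four Ala codons equally gives 0.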
3.3. Correspondence Analysis (COA)
COA was performed on RSCU values (37, 38) to study the characteristics of SCUB across the PCGs of each Oenothera plastome (36). All PCGs were plotted in a 59-dimensional vector space based on the usage of the 59 synonymous codons: each PCG is regarded as a 59-dimensional vector in which the RSCU value of each codon represents one dimension (39). The major variation in synonymous codon usage is explained by the first axis, with subsequent axes explaining diminishing amounts of variance (40). Spearman’s rank correlation analysis was used to examine correlations between the codon usage indices described above and the major axes of COA, as this method of correlation is independent of distributional assumptions (41).
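The rank correlation used here can be sketched in Python as a Pearson correlation computed on ranks, with tied values receiving their average rank. This is a generic illustration rather than the PAST implementation; the function names are assumptions, and significance testing is not included.

```python
import math


def average_ranks(values: list) -> list:
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman's rank correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

In practice, `x` would hold the coordinates of the genes on one COA axis and `y` the corresponding values of an index such as ENC, CAI, GC3 or T3.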
3.4. Identification of Putative Optimal Codons
To identify putative optimal (preferred) codons, the 10% of PCGs located at each extreme (left and right) of axis 1 of the COA were chosen to form 2 data sets (42). For each of the 59 synonymous codons, a chi-square test was applied to the 2 × 2 table constructed from these 2 data sets. The first row of the table contains the observed frequencies of the codon, and the second row contains the total frequency of its synonymous alternatives (41).
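The 2 × 2 chi-square test applied to each codon can be sketched in Python as below. The sketch uses the uncorrected Pearson statistic and the 5% critical value for one degree of freedom (3.841); whether a continuity correction was applied in the study is not stated, so its omission is an assumption, as are the function names and the toy counts.

```python
CRITICAL_5PCT_1DF = 3.841  # chi-square critical value, 1 degree of freedom, 5% level


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table

                    data set 1   data set 2
    codon               a            b
    other synonyms      c            d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def is_overrepresented(a: int, b: int, c: int, d: int) -> bool:
    """True if the codon is significantly more frequent in data set 1 than in data set 2."""
    more_frequent = a / (a + c) > b / (b + d)
    return more_frequent and chi_square_2x2(a, b, c, d) > CRITICAL_5PCT_1DF


# Toy counts (hypothetical): codon used 30 of 50 times in set 1 and 10 of 50 times in set 2
statistic = chi_square_2x2(30, 10, 20, 40)
```

Which of the two extremes of axis 1 is treated as data set 1 is left to the caller, since that depends on how the axis relates to expression level in each species.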
3.5. Cluster Analysis
Cluster analysis on RSCU values (39) was performed to understand how the Oenothera species group according to codon usage. A 5 × 59 matrix was generated in which the rows correspond to the five Oenothera species and the columns to the pooled RSCU values of the 59 codons. The species were clustered on the basis of these RSCU values by unweighted pair group average clustering using Euclidean distance.
3.6. Bioinformatic and Statistical Software
Total base compositions and base compositions at each codon position were calculated using MEGA version 5.2.2 (43). DAMBE version 5.3.31 (44) was used to calculate RSCU values. The online version of CodonW (45; http://mobyle.pasteur.fr/cgi-bin/portal.py) was used to estimate ENC, hydropathy score (a number indicating the hydrophobic/hydrophilic properties of the side chain of an amino acid) and aromaticity (frequency of aromatic amino acids) values. CAI values were calculated using CAI calculator 2 (46). CodonO (47) was used to compute SCUO values (35). All statistical analyses, including correspondence analysis and cluster analysis, were carried out using PAST version 2.12 (48), and significance was assessed at the 5% level.
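Unweighted pair group average clustering with Euclidean distance can be sketched in Python as follows. The sketch is a generic illustration, not the PAST implementation; the function names and the short toy profiles (two values per species instead of 59) are assumptions.

```python
import math
from itertools import combinations


def euclidean(u: list, v: list) -> float:
    """Euclidean distance between two equal-length RSCU profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def upgma(profiles: dict) -> list:
    """Average-linkage clustering; returns the merges as (cluster_a, cluster_b, distance)."""

    def linkage(a: tuple, b: tuple) -> float:
        # Mean of all pairwise distances between members of the two clusters
        pairs = [euclidean(profiles[x], profiles[y]) for x in a for y in b]
        return sum(pairs) / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy RSCU profiles (hypothetical values, two codons only)
toy = {"sp1": [1.0, 1.0], "sp2": [1.0, 1.2], "sp3": [2.0, 0.0]}
merges = upgma(toy)
```

With real data, each profile would be one row of the 5 × 59 matrix, and the order and height of the merges would describe how the species group by codon usage.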
4. Results
4.1. Intrinsic Properties of Synonymous Codon Usage
Total and synonymous nucleotide contents were estimated. In the plastome of O. argillicola, AT content was higher than GC content. Among the silent base contents, viz., A3, T3, G3 and C3 (A, T, G and C contents at the 3rd codon position), T3 was the highest, with a mean and standard deviation (SD) of 36.64 and 4.78, respectively. C3 was the lowest, with a mean and SD of 14.54 and 3.36, respectively. Spearman’s rank correlation analysis revealed strong positive correlations between A and A3, T and T3, G and G3, and C and C3, whereas significant negative correlations were observed between heterogeneous nucleotide contents (Table 2). The strong negative correlations between A and T3, and between T and A3, suggested a possible influence of AT content at silent sites (AT3) in shaping SCUB of PCGs in the O. argillicola plastome. Additionally, high positive correlations of GC3 with G, C and GC contents indicated that GC compositional constraints might also be present. However, no correlation existed between GC3 and either A or T content. These correlations indicate that nucleotide compositional constraints play a crucial role in shaping SCUB across PCGs in the O. argillicola plastome. Similar patterns of correlation were identified in the other 4 Oenothera plastomes examined (Table 2).
4.2. GC Composition Influences on SCUB
GC composition has been regarded as an important force shaping codon and amino acid usage (49). Total GC content and GC composition at the 3 codon positions of all selected PCGs of Oenothera plastomes were calculated, and a dot plot was produced against the respective SCUO values (Figure 1). Significant negative linear correlations were found between SCUO and GC3 (r = -0.495, p < 0.01), GC1 (r = -0.442, p < 0.01) and GC (r = -0.353, p < 0.01). Among these variables, the dependency of SCUO on GC3 (SCUO = 0.008 (GC3)+0.481) was the strongest, as shown by the highest correlation between them. To study the influence of overall GC on local compositions, linear correlation analysis was performed between GC and all local GC contents. GC was linearly correlated with GC1 (r = 0.840, p < 0.01), GC2 (r = 0.613, p < 0.01) and GC3 (r = 0.560, p < 0.01). GC3 was significantly correlated with GC1 (r = 0.471, p < 0.01) but not with GC2 (r = -0.138). Similarly, GC1 was not correlated with GC2. Similar patterns of linear correlation were observed in the other Oenothera species. These results suggested that overall GC content, GC1 and GC3 influenced SCUB in all examined Oenothera plastomes. Thus, mutational pressure has a significant role in dictating SCUB across PCGs in Oenothera plastomes. Differences in SCUB among the five Oenothera species were compared using the Wilcoxon two-sample test, which indicated no significant difference in SCUB between any two species.
4.3. Features of Overall and Strand Specific Relative Synonymous Codon Usage
Overall and strand-specific synonymous codon usage were examined (Table 3). In the 18 synonymous amino acid families, A- and T-ending codons were used more frequently than G- and C-ending codons, reflecting the AT-rich nature of the plastomes. Most of the 3-, 4- and 6-fold degenerate amino acid families preferentially used T-ending codons, except Gly and Arg. Strand-specific codon usage bias was observed for the 6-fold degenerate amino acid Arg in all Oenothera sp. except O. elata. For Arg, codon usage was biased towards CGT in the minus strand in all species, whereas in the plus strand it was biased towards CGA in O. biennis, O. glazioviana and O. parviflora. However, both CGT and CGA were used at equal frequencies to code Arg in minus strand encoded genes. The 4-fold degenerate amino acid Val used GTA most often in all plus strand encoded genes, whereas all minus strand encoded genes used GTT most frequently. Chi-square analysis of the codon counts of the 10% of genes at the extreme left and extreme right of axis 1 revealed 5 statistically over-represented codons (i.e., putative optimal codons) in O. argillicola (GCT, GAA, CAT, AAT and CCT), 1 in O. biennis (CGA), 4 in O. elata (TGT, AAT, GTT and GTA), 7 in O. glazioviana (GAT, TTT, CAT, AAT, CCT, CGT and TCT) and 2 in O. parviflora (GCT and GTA). All putative optimal codons ended in A or T.
4.4. Quantification of SCUB
ENC has been used as a reliable tool in SCUB analysis, as it is effective for short genes and for skewed amino acid usage (32). The ENC value of a gene places its SCUB on a scale from extreme bias to minimal bias. Plotting the ENC values of genes against their GC3 values displays the major characteristics of the synonymous codon usage pattern of PCGs in a genome. In this study, the majority of protein coding genes were grouped on the left-hand side of the expected curve in all chosen Oenothera sp. (Figure 2). Hence, GC3 compositional constraints might influence SCUB across PCGs in Oenothera plastomes. However, some genes were located considerably below the expected curve, indicating the possible influence of another force, such as natural selection, in shaping SCUB. No significant correlation was observed between GC3 and GC12 in the neutrality plot (Figure 3). This suggested that the effect of intragenomic GC mutational bias on GC content at all codon positions is low, which in turn indicates high conservation of GC content. Furthermore, a narrow distribution of GC contents was observed in the neutrality plot, pointing to a role of selection in shaping SCUB. The association between A and T, and between G and C, was analysed using the parity rule 2 (PR2) bias plot, which showed that A and T were used more proportionally than G and C (Figure 4).
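The coordinates of a gene in the neutrality plot and in the PR2 bias plot can be sketched in Python as below. The sketch computes the PR2 ratios from the third positions of all codons, which is an assumption (such plots are often restricted to fourfold degenerate families); the function names are also hypothetical.

```python
from collections import Counter


def _ratio(numerator: float, denominator: float) -> float:
    if denominator == 0:
        raise ValueError("ratio undefined: no informative bases")
    return numerator / denominator


def neutrality_point(cds: str) -> tuple:
    """(GC12, GC3) as proportions; GC12 is the mean of GC1 and GC2."""
    seq = cds.upper()

    def gc(sub: str) -> float:
        return _ratio(sub.count("G") + sub.count("C"), len(sub))

    gc12 = (gc(seq[0::3]) + gc(seq[1::3])) / 2.0
    return gc12, gc(seq[2::3])


def pr2_point(cds: str) -> tuple:
    """(G3 / (G3 + C3), A3 / (A3 + T3)); both equal 0.5 when PR2 holds."""
    third = Counter(cds.upper()[2::3])
    x = _ratio(third["G"], third["G"] + third["C"])
    y = _ratio(third["A"], third["A"] + third["T"])
    return x, y


# Toy CDS (hypothetical): GCA GCT GCG GCC
toy_cds = "GCAGCTGCGGCC"
```

A regression of GC12 on GC3 across genes gives the slope of the neutrality plot, and the scatter of the PR2 points around (0.5, 0.5) shows how far base usage departs from parity.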
4.5. Various Factors Affecting SCUB
COA was carried out on the RSCU values of PCGs in the 5 Oenothera plastomes, and the positions of PCGs along the first 4 axes are given in Figure 5. The first 4 axes accounted for 34.59%, 34.76%, 34.66%, 35.57% and 34.49% of total variation in O. argillicola, O. biennis, O. elata, O. glazioviana and O. parviflora, respectively. No single major explanatory axis accounted for the variation in any of the chosen plastomes. Significant negative correlations were found between axis 1 and indices of gene expression level, such as ENC and CAI, in all chosen species (Table 4). Axis 4 showed a strong negative correlation with ENC in O. argillicola and O. parviflora. CAI, another index of gene expression level, was positively correlated with axis 4 in O. argillicola and negatively correlated with axis 2 in O. glazioviana. These significant correlations of COA axes with ENC and CAI suggested that gene expression level influences SCU variation across PCGs in Oenothera plastomes. No correlations were observed between any of the 4 axes of COA and gene length or aromaticity. In O. glazioviana, hydropathy score was in significant negative correlation with axis 2, but no such correlation was observed in any other Oenothera sp. A strong negative correlation between T3 content and axis 2 in all Oenothera sp. indicated its strong influence on SCU variation. Interestingly, GC3 content was in strong negative correlation with axis 3 in all chosen species and in positive correlation with axis 4 in O. elata. This suggested that GC3 and T3 considerably influence SCU variation in all examined PCGs of Oenothera plastomes. Correlation analysis between the first 4 axes of COA and the RSCU values of the 59 synonymous codons revealed certain significant negative correlations in all chosen species, i.e., axis 1 with GCC, TGC, GAT, GGG, CAT, AAT, CCC, CCG, AGA and TCG, and axis 2 with TGC, GGT, CAT, CTT, TTG, AGA, CGT and GTT (Table 5).
Although correlations also existed between the other 2 axes (i.e., axes 3 and 4) and the RSCU values of certain codons, these were species-specific. These results indicated that mutational pressure combined with weak selection might be acting on the PCGs of all Oenothera plastomes to cause SCUB. Cluster analysis revealed no major differences in synonymous codon usage across the genetically distinct Oenothera plastomes, as all Oenothera species formed a single cluster (Figure 6).
5. Discussion
All preferred codons in Oenothera plastomes ended in A or T, as plastid chromosomes are AT rich (5, 40). Mutational pressure towards or against GC determines the base composition of a genome (40, 50). In all examined Oenothera plastomes, AT3 (AT content at silent sites) is expected to be an important factor in SCU variation across PCGs. However, strong positive correlations existed between GC3 and individual G/C contents, suggesting that GC3 may also be one of the contributing factors. This can be explained by the extremely low GC3, which influences SCU considerably (32). Therefore, high AT3 (~ 68.10%) and low GC3 (~ 31.70%) can be regarded as the major factors behind SCU variation in Oenothera plastomes, similar to what has been reported in Coffea arabica (5), Populus alba (51), and both Nicotiana tabacum and Oryza sativa (26). Moreover, point mutations, repetitions, insertions/deletions and inversions have been reported to contribute to base compositional changes in Oenothera (30). The impact of these mutations may be reflected in SCU variation across PCGs in Oenothera plastomes.
The influence of GC composition on SCU was further examined by correlation analysis between SCUO and GC composition at each codon position. An apparent linear relationship was found between overall GC content, GC1 and GC3 of all examined PCGs. As observed in grass models (40), GC3 was the dominant factor in shaping SCUB in Oenothera sp. This result suggested mutational pressure as a significant driving force of SCU variation in Oenothera sp. The ENC vs. GC3 plot also confirmed the role of GC composition in SCUB, as most of the PCGs lie on or just below the expected curve. However, the grouping of some genes considerably below the expected curve points to the influence of weak selection. In addition, the neutrality plot showed a narrow distribution of GC, and no correlation was found between GC3 and GC12. The slope of the GC12 vs. GC3 plot was close to 0, indicating a role of specific evolutionary pressure (i.e., selection pressure) in shaping SCUB. Thus, selection against mutational pressure may be acting on the PCGs of Oenothera plastomes, and the effect of intragenomic GC mutational bias on GC content was small, as in other plastomes (52). Chargaff’s 2nd parity rule states that, within a single strand of DNA, A approximately equals T and G approximately equals C (53, 54). However, the PR2 bias plot analysis confirmed a deviation from Chargaff’s 2nd parity rule in organellar DNA, as A and T were used more proportionally than G and C in Oenothera plastomes.
Strand-specific codon usage bias was observed in Oenothera plastomes for the 6-fold and 4-fold degenerate amino acids Arg and Val, respectively. This may be due to the intrinsic efficiencies of individual codons (55) and may not be correlated with translation efficiency. Although the 5 Oenothera species are closely related, the number of putative optimal codons varied between species (from 1 to 7). All optimal codons ended in A or T, as observed in C. arabica (5), T. aestivum and H. vulgare (40). Thus, mutational bias can be regarded as a major factor in SCU variation in Oenothera plastomes (24). In the absence of other selection pressures, this mutational bias towards A/T-ending codons would raise the RSCU values of A/T-ending codons above 1 (40).
The COA on RSCU values of Oenothera plastomes revealed no single major explanatory axis that accounted for the total variation. This indicated that, apart from the 2 major forces behind SCUB, viz., mutational bias and natural selection, other factors may be acting on the PCGs to cause SCUB. A similar observation was made in pooid grass models (40). CAI and ENC values have been proven to be reliable indices of gene expression level (32, 42). The first axis of COA was in significant negative correlation with these indices in all examined species of Oenothera (P < 0.01). This clearly suggested that gene expression level also has considerable influence on SCU variation across PCGs. The influence of gene expression level on SCUB in plant genes was recently reported in Zea mays (56) and in Oncidium ramsey (34). The significant negative correlations of axis 2 with T3, and of axis 3 with GC3, in all Oenothera species suggest an influence of T3 and GC3 contents in shaping SCUB. Although no correlation was observed between gene length or aromaticity and any axis of COA in any species, hydropathy score was in significant negative correlation with the second axis in O. glazioviana. This suggests a role of the hydropathic character of proteins in SCU variation, as observed in O. ramsey (34) and in grass models (40). Moreover, certain codons had significant negative correlations with axes 1 and 2 in all species; among them, more than 60% contained a pyrimidine at the 3rd position. These results suggested that mutational pressure combined with weak selection dictates SCU in all examined PCGs of Oenothera plastomes.
All examined plastomes belong to the subsection Euoenothera (biennis group) (29). Interestingly, a high degree of phenotypic variation has been observed among members of the biennis group across disjunct populations in different parts of North America (29). Small disjunct populations of Oenothera sp. are therefore expected to experience genetic drift, since random mutations in a small population lead to random fixation over time (57). Unexpected evolutionary changes are considered to result from random processes such as genetic drift rather than from natural selection (59).
We conclude that the present findings would facilitate studies on plant genome evolution, as Oenothera sp. are considered suitable for studying compartmental co-evolution (30). Moreover, putative optimal codons were identified for each species, and these codons can be used to optimize heterologous gene expression by introducing point mutations (56).
There is no acknowledgement
RRN and GD conceptualized the study. RRN and NTR contributed equally to this study; both carried out most of the experiments and wrote the manuscript. VRD, MBN and GD critically revised the manuscript and the experimental design. MBN, TS and TCV helped with the experiments. All authors have read and approved the final manuscript.
No financial support from any agencies.
The authors declare that they have no competing interests.