^{1}Department of Mathematics, University of Wisconsin, Madison, USA

^{2}Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, IR Iran

^{3}Department of Computer Science, University of British Columbia, Vancouver, Canada

^{4}Department of Computer Science , School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, IR Iran

Abstract

Background: RNA molecules play many important regulatory, catalytic and structural roles in the cell, and RNA secondary structure prediction with pseudoknots is one the most important problems in biology. An RNA pseudoknot is an element of the RNA sec ondary structure in which bases of a single-stranded loop pair with complementary bases outside the loop. Modeling these nested structures (pseudoknots) causes numerous com putational diffilties and so it has been generally neglected in RNA structure prediction algorithms. Objectives: In this study, we present a new heuristic algorithm for the Prediction of RNA Knotted structures using Tree Adjoining Grammars (named PreRKTAG). Materials and Methods: For a given RNA sequence, PreRKTAG uses a genetic algorithm on tree adjoining grammars to propose a structure with minimum thermodynamic energy. The genetic algorithm employs a subclass of tree adjoining grammars as individuals by which the secondary structure of RNAs are modeled. Upon the tree adjoining grammars, new crossover and mutation operations were designed.The finess function is defied ac cording to the RNA thermodynamic energy function, which causes the algorithm conver gence to be a stable structure. Results: The applicability of our algorithm is demonstrated by comparing its iresults with three well-known RNA secondary structure prediction algorithms that support crossed structures. Conclusions: We performed our comparison on a set of RNA sequences from the RNAseP database, where the outcomes show effiency and practicality of the proposed algorithm.

1. Background RNA is an active component in various cellular process es. They act as carriers of genetic information, catalysts in cellular processes and mediators in determining the expression level of genes. Whereas the folding of RNA or its secondary structure is very important for func tionality of RNA, thus it is of great interest to propose computational techniques for predicting RNA second ary structural features. Several algorithms are currently designed for the prediction of RNA secondary structure (1-8). The most successful approach for RNA structure prediction is comparative sequence analysis in which covering residues are identical to a multiple sequence alignment of diffrent RNAs sequences with similar structures (9, 10). The accepted secondary structures of most structural and catalytic RNAs were generated by comparative sequence analysis. Comparative meth ods (7, 11-13) use a given set of RNA sequences and their corresponding structures to predict a structure for a new RNA sequence. In the comparative methods, the sequences in the given set must be in the same family with the new RNA, structurally or as a nucleotides ar rangement. Hence, these methods fail to predict a struc ture for a single RNA sequence or a small family of RNAs with little sequence diversity. For these cases, prediction of RNA structure based on free energy minimization is the most widely used technique (5, 6, 14-17). One of the well-known applied approaches in these methods is the dynamic programming. Zuker introduced the dynamic programming approach for predicting RNA structures (17). Furthermore, Walter et al. (18) improved the algo rithm by using thermodynamics rules and Zuker (8) pre sented a web-based predictor application. Nevertheless, due to undiscovered biochemical reasons, results of the comparative methods are more similar to RNA natural structure than results of latter methods. However, one of the diffilties on the problem of RNA structure pre diction is the existence of a certain type of RNA second ary structures called pseudoknots. This type of struc ture is composed of both nested and crossed base pairs. Figure 1 represents the structure of Hepatitis Delta Virus (HDV) in two graphical shapes. Rivas and Eddy (6) intro duced an approach to predict pseudoknot structures by using a subset of context sensitive grammar. This ap proach is also considered by Uemura et al. (16), Lyngsø and Pedersen (3), and Cai et al. (19). In addition, heuristic approaches were also applied to capture nested struc tures. Shapiro and Navetta (20) introduced a parallel genetic algorithm to predict nested structures that was one of the famous predictors. Thereafter van Batenburg et al. (1) demonstrated that GA is a promising method in RNA structure prediction. Batenburg's GA method is part of Structure Analysis of RNA (STAR) package. Ren et al. (21) presented a heuristic algorithm called Hotknots for prediction of RNA structure including pseudoknots. They have improved the quality of the former predic tion algorithms has become and their algorithm often performs better than the genetic algorithm from the STAR package and PKnotsRG-mfe algorithm by Reeder and Giegerich (15). Uemura et al. (16), Akutsu (14), and Kato et al. (22) demonstrated, Tree Adjoining Grammars (TAG) are suitable data structure for a capturing an RNA secondary structure. A TAG is a kind of formal rewriting system to generate a set of trees associated to strings (23). More specially, the structured information of the derived tree in TAGs is of great signifiance in both syntactical and semanti cal concepts. Uemura et al. (16) and Kato et al. (22) illus trated wide ranges of crossed structures which could be captured by a subclass of TAG, called Extended Simple Linear Tree Adjoining Grammars (ESL-TAG), where their corresponding parsing algorithm uses O(n5) time and O(n3) space. There may be numerous adjunct trees in the grammar which represent secondary structures of an RNA. This means that for an RNA sequence, many structures might be possibly found. Thus, the problem is like as a search problem for an optimal structure with respect to thermodynamics energy parameters. 2. Objectives In this manuscript, we present a new genetic algorithm, PreRkTAG, to predict RNA secondary structures includ ing pseudoknots that is developed based on a sub-class of the adjoining grammars. PreRkTAG uses a practical data structure for capturing RNA crossed structures. Our designed genetic algorithm can search a broad area of feasible solutions of the given RNA structure. This genetic algorithm is composed of two evolution ary functions which allow free pairing of nucleotides in structures. This increases the probability of discovering new structures that cannot be represented by the regu lar ESL-TAG. This ability yields to more coverage area of the feasible solutions rather than ESL-TAG covers. On the other hand, a common problem of heuristic algorithms is converging to local minimums, which is highly pos sible especially with a wide range of feasible solutions. In PreRkTAG we attempt to tackle this problem by using novel evolutionary functions. The evaluation of PreRk TAG against celebrated energy-based prediction algo rithms, namely, the STAR, Hotknots, and Pknots, shows high sensitivity and high specifiity of PreRkTAG in the most cases. The evaluation was performed on 24 RNA sequences chosen from the RNAseP database. Our algo rithm correctly identifis more than 80% of base pairs. 3. Materials and Methods The employed data structure for capturing RNA crossed structures was introduced in preceding research (14) and in order to synchronize the notifiations the fist part of this section is focused on ESL-TAG. Consequently, the details of PreRkTAG algorithm are described includ ing the employed inner functions. 3.1. RNA Secondary Structure Modeling Generally, an ESL-TAG is a 5-tuple grammar G = (N, T, S, I, A) where N and T are fiite sets of non-terminal and ter minal symbols, respectively. S is the start symbol and I is fiite set of initial trees (center trees). Finally, the fiite set A is a set of adjunct trees (auxiliary trees). The mem bers of the set N are divided into two subsets; Active and Inactive. During transition process of the grammar G, the active nodes can be replaced by a member of the set A, where inactive nodes do not. Based on the division, when a tree does not have any active node, it would be a matured tree. By adjoining two trees, s and t, a tree t' would be obtained, and t →s t' represents the adjuncted transition. The transitive closure of → is →* and a tree t' is derived from t, where t →* t' for some t ϵ (A ᴜ I). A set of trees of an ESL-TAG G is defied as T (G) = {t | s →*t, s ϵ I where t is matured tree}. For modeling the secondary structure of an RNA with pseudoknot, the tree adjoining grammar G = (N, T, S, I, A) is defied as follows. The set N is a fiite set of non-terminals {X, Y, Z}, where active non terminals are marked with asterisk {X*, Y*, Z*}. The set T is fiite set of terminals {a, u, c, g} as abbreviation of nu cleotide acids. Moreover, S is a start symbol and the set of initial trees I is composed of just one tree as depicted in Figure 2 (this tree is tree of type 1 and denoted by cen ter tree). The last parameter A, is a fiite set of adjunct trees, such as given in Figure 2. This set is composed of four types. As depicted in the Figure 2 , type 2 and type 3 represent base pairs. Moreover via preorder traversal of trees of these types, the type 2 represents crossed base pairs denoted by T2u and T2d, and the type 3 represents nested base pairs denoted by T3L and T3R. In addition, type 4 is introduced to generate a single base without any pairing. Trees of type 5 are used to combine sub structures by using trees of type T5Ld and T5Rd, and inserting a tree into the current one by using trees of type T5Lu and T5Ru. As mentioned, the key idea of this modeling is from Uemura et al. (16). Figure 3 depicts the process of constructing the crossed base pairs "agacuu" which is not generated by any context free grammar. Uemura et al. (16) described a computational obstacle of ESL-TAG trees via an example that is shown in Figure 4 . In this example, it is shown that two dissimilar sequenc es can be captured by one ESL-TAG structure. 3.2. Genetic Algorithm Genetic Algorithm (GA) is kind of Evolutionary Algo rithms (EA) which via broad wide searching inside of probable solution space, analyzes a population of solu tions instead of a single candidate. Goldberg (24) proved the GA convergence theory. Thereafter applicability of

the solutions is a critical step in designing the evolu tionary functions. Novel evolutionary functions are de signed according to tree adjoining grammars and equip the PreRkTAG algorithm towards global convergence. In advance to illustrating the evolutionary functions, a short description of feasible population and data struc ture is presented as follows. 3.3. Generating Population Population of a GA is a set of feasible solutions and a so lution is feasible if the length of its corresponding struc ture is the same as the length of the given sequence. In dividuals of the population in PreRkTAG are generated the algorithm was extended in wide range of optimiza tion problems. The principle of GA is competition of the solutions where a scoring function (finess function) measures their optimality. GA methodology is composed of two evolutionary processes; the fist one is applied on to a solution and by randomly modifying the solution’s properties creates a new solution (Mutation). Another one works on two diffrent randomly selected solutions and generates two new solutions by combining prede termined features of the randomly selected solutions (Crossover). These two evolutionary functions are ap plied during the progress of the algorithm, where both these functions are derived from biological concepts (24). In this optimization method, the evolutionary functions are supposed to inhibit solutions to converge to local minimums. Therefore, designing effient and practical algorithms for these functions play signifiant role as they converge to desirable results. Since these functions apply on the domain of feasible solutions, consideration of the data structure which represents randomly by using ESL-TAG. As described in the previous section, an ESA-TAG starts with an initial state and could be extended easily to cover as many nodes as required. Hence, using ESL-TAG we are able to generate a tree with any given numbers of residues. To construct each ele ment of population, we randomly select a transitive clo sure operator and apply it on initial trees. In this way a matured tree can be constructed in length of a given RNA sequence. Hence, a tree shows a probable structure for RNA sequences with the same length. Generating the population only depends on the length of the given RNA sequence and the rest of the algorithm adjusts these structures to nucleotide arrangement of the given RNA sequence. It is noted that outcome of the algorithm is an optimal structure for the given RNA sequence, which is a feasible structure for all RNA sequences with the same length but is not the optimal structure for them. Generating the population in this way leads to a wide range of feasible solutions where the finess function will narrow them down to the optimal solution of the given RNA sequence. Figure 5 shows an example of a constructed structure for a sequence of length 5. 3.4. Evolutionary Functions We present a detailed description of the evolutionary functions. These functions perform blindly in terms of considering the context of the given RNA. The muta tion function takes a structure from the population and based on its length calculates the number of terminal nodes that might be mutated. Subsequently the mutat ed terminal nodes are selected by uniform distribution. Terminal nodes inside of adjunct trees have three diffr ent types, thus this function acts in three diffrent ways. These treatments are discussed as follows: 1) If the selected node is a single node, then the mutation function selects another single node uniformly at ran dom and pairs them up. The novel structure has a new pair which its ESL-TAG transitive closure is unknown. Hence, the new structure must be parsed by ESL-TAG parser to obtain the corresponding transitive closure. 2) If the selected node is a member of nested pair, then the tree reconstruction is not needed and just their ad junct tree is replaced with two adjunct trees with single terminals. 3) If the selected node is a member of crossed pair, then breaking up this pair is equal to replacing the corre sponding adjunct tree by two adjunct trees; each of them has a terminal node in opposite side of the tree. Algorithm 1 shows the mutation function. This algorithm includes two sub-functions; the "Random (L)" returns a random value L' (0 < L' < L), and the "Length (tree)" re turns the number of terminal nodes of the given tree. In addition, as illustrated (Algorithm 1), in some cases new obtained structures must be parsed again to recon struct the ESL-TAG tree, so in this algorithm the "Recon struct" function is employed to fid ESL-TAG tree of new structures. The next evolutionary function is Crossover that crosses two selected solutions and merges com plimentary parts. The crossover function selects two solutions via normal distribution from the population. Thereafter this function extracts two sub-trees with the same length (number of terminal nodes) and random starting points from each of the selected trees and the crossover function swaps the sub-trees. Removing a sub tree from a tree might come with breaking pairs when ever a pair contains a node inside and another node out side of the sub-tree (this pair is called an exteriorly pair). These nodes will remain single when the swapping process is done. Constructing the trees blindly, with re spect to context of the given RNA sequence, makes the crossover function much easier and helps to avoid lo cal minimums. The crossover algorithm is presented in Algorithm 2. Two introduced inner functions in the Mu tation function (Algorithm 1) are used in this algorithm as well. In addition, two other functions are applied; "Move Down (node, length)" is a function for traversing the given tree from its node to attain length number of terminal nodes, and "Swap (Sub Tree0, Sub Tree1)" is a function for swapping sub-trees which results in two new trees. Figure 6 demonstrates a sample of the cross over process. 3.5. Fitness Function To measure the optimality of a predicted RNA structure, as it has been recently proposed by most methods, ther modynamics rules (17, 18, 25) are applied as the finess function. In this case, the finess function estimates the dis crepancy of simulated structures by measuring energy of structures, where better structure has lower energy. Upon thermodynamics rules, constructing a base pair needs a specifi amount of energy, where in some cases the base pair releases energy (negative value) and in other cases it gains energy (positive energy). In the thermodynamics rules, many parameters effct on the amount of energy and in a simpli fid model, this amount depends on nucleotide acids that compose a pair and also their neighbor base pairs. For exam ple when ‘AA’ in 5’ - 3’ and ‘UU’ in 3’ - 5’, the orientation(AAUU ) Algorithm 1. Mutation Function Algorithm 2. Crossover Function This function includes six inner functions; “Random (L)” returns a random value L’ (0 < L’ < L), the “Length (Sequence)” function that counts the number of terminal nodes of Sequence, “Popu lationGeneration (Pop, Number, S)” that generates Number trees of length of S using ESL-TAG, “GetRand ()” that returns a random value between [0:1], our fitness function which is called “ComputScore (Pop)” function by using thermodynamic roles computes the Pop’s members energy value, and finally the “RouletteWheel (N, Pi, Pj)” via the roulette wheel method copies ((N*PopulationNumber) / 100) members ofPi into Pj. makes two consecutive pairs, and the energy of correspond ing structure will be reduced by -0.9. One example of assign ing energy to two consecutive pairs is borrowed from (5) that is shown in Table 1. In this table, every row and every column shows a pair and the entry (i, j) shows amount of energy of appearing j’th pair after i’th pair. Since these are thermody namics energies and also were assessed experimentally, there is no need to have a symmetric table. We also used the same energy values in our finess function. For a given sequence S, with no base pair, the energy value is equal to zero. Assume e (i, j) denotes the energy value of base pair between i th and j th nucleotides, thus ∑i,jϵSe(i,j) shows the energy value of se quence S. The energy of RNA sequences obtained by the ther modynamic rules implies that number of single bases (with out pair), coaxial, and stacked base pairs have impressive effcts on the energy value of the corresponding sequence (17, 18). Our finess function uses the current version of Turner group thermodynamic rules (5). Within the ith iteration of the genetic algorithm, ten percent of highest score of the current population (that is called Pi) are directly copied into the next population (Pi+1). Then remaining 90 percent of Pi+1 are produced by applying the roulette wheel method on the union of P i and evaluated individuals (Emem bers) of Pi. The evaluated individuals (Emembers) are new mem bers generated by performing the evolution functions on the members of P i. Function Mutation ( t: Tree) Begin MutationNumber = Random (Length(t)) For i=1 to MutationNumber Do Begin node0 = A random node from t; If node0 is a bases of a pair Then Begin node1 = pair of node0; If pair (node0, node1) is a crossed pair Then Begin Break up pair (node0, node1); Reconstruct t; End Else Mark bases node0 and node1 as single bases; End Else Begin node1 = A random single node from t; Mark node0 and node1 as a base pair; Reconstruct t; End; End; Return (t); End. Function Crossover ( t0, t1: Tree) Begin CrossLength = Random (Length (t0)); node0 = A random node fromt0 ; node1 = A random node from t1 ; SubTree0 = MoveDown (node0, CrossLength); SubTree1 = MoveDown (node1, CrossLength); If sub-trees have exteriorly pair Then Break them up; Swap (SubTree0, SubTree1); Return (t0, t1); End.

4. Results To assess our algorithm its accuracy has been evaluated and compared with three common thermodynamic based algorithms: Hotknot (21), Pknots (9), and the STAR (1). To compare PreRkTAG with these algorithms, we measured the accuracy of these algorithms on a set of RNAs from the RNAseP data base. Since genetic algo rithms have random behavior, every execution of our algorithm on a given sequence may result in diffrent structure. Hence, the PreRkTAG algorithm was applied three times on every sequence and the best structure of each run was stored. Then one of these three with the minimum energy was reported as the predicted struc ture. Because of the proposed evolution algorithms, in many cases, the mean of the three energy values were close to the picked minimum energy. The RNAseP data base includes natural structures of this set, therefore in this study the accuracy is measured by considering two diffrent parameters; sensitivity and specifiity of folded sequences. Sensitivity is defied by the number of base pairs that are predicted correctly per number of base pairs in natural structure. The specifiity is equal to the number of true predicted base pairs per number of predicted base pairs. Predictors were employed on 24 sequences from RNAseP data base, which are tabulated in Table 2 . The comparison of sensitivity between our al gorithm and the other algorithms are given in Table 3 . Also Table 4 depicts results of comparing specifiity of these algorithms. Unfortunately because of the length of the test sequences, which are fairly long, the Pknots algorithm could not obtain results within a reasonable timeframe on our hardware. Therefore, in Table 3 and Table 4 the Pknots results are not shown. We also provide running time complexity of the examined algorithms in the Table 5. As mentioned, for every RNA sequence, the PreRkTAG has been applied three times and we indicat ed outcome of the one with minimum energy. For calcu lating the running time of the algorithm, we followed the same idea and considered the running time of the one with minimum energy. It is well known that choos ing suitable parameter values for genetic algorithms is very diffilt (24). There is an inverse relationship be tween the number of generations and the size of popu lation, so that large population size converges fast and Algorithm 3. PreRkTAG Algorithm Function PreRkTAG ( S: RNA Sequences) Begin PNumber = Random (Length( S)); PopulationGeneration ( P0, PNumber, S); i= 0; stop-condition = False ; While (stop-condition = False) Do Begin Emembers = { } ; ComputScore(Pi) ; Copy ten percent of high scores Pi into Pi1+; For each t in Pi Do Begin If GetRand() > Mutation Probability Then Begin t' = Mutation (t) ; Emembers = Emembers t'; End; End; For each t0 in Pi Do Begin If GetRand() > Crossover Probability Then Begin t1 = A random member of Pi ; (t'0, t'1) = Crossover(t0, t1); Emembers = Emembers (t'0, t'1); End; End ; ComputScore(Emembers) ; RouletteWheel (90, Pi Emembers, Pi1+); If Three previous iterations had equal best score Then stop-condition = True ; If i exceeds flip flops limitation Then stop-condition=True ; i++; End; End. requires the consideration of a few number of genera tions. On the other hand, large population size requires more evaluations per generation so as to avoid the de cline of the rate of convergence. Generally, the optimal size of population depends on the domain complexity of the feasible solutions. In addition, the probabilities of performing evolutionary functions play a critical role to avoid trapping in local minimums. Small values for the crossover and mutation rates yield to copying most of the individuals to the next population directly which is benefiial in terms of decreasing the running time com plexity as it is unnecessary to compute finess values for most of the individuals. However, the small rates may

cause to converge to local minimums. Considering the behavior of genetic algorithms according to these rates shows that high probabilities for performing the muta tion function allow for a greater diversity of solutions at very little expense. However, these high probabilities change the concept of genetic algorithms and convert them into random search procedures. The probability of performing the crossover function also has its own diffilties. The high value of the crossover rate helps the evolution progress and causes to it to jump out of local minimums while increasing the chance of con verging to the global minimum. On the other hand, it increases the running time complexity dramatically. A theoretical investigation of the optimal size of popula tion, crossover and mutation rates has been carried out by Goldberge (24) who has found that the best values for the population size is a value base on the length of the solutions. Goldberge showed that a practical crossover rate is in the range of [0.6, 0.95] and for the mutation rate should be in [0.001, 0.2]. In order to evaluate PreRk TAG algorithm, we used Goldberg’s investigation for de termining our GA’s control parameter values. We tested diffrent values for the size of population, crossover and mutation rates on a training set of RNAs. We empirically

determined the best value for “population size” as a func tion of length of RNA sequences. In addition, we found that the best “mutation rate” is 0.2 and “crossover rate” is 0.8. 5. Discussion In this manuscript, we proposed a new genetic algorithm for RNA secondary structure prediction based on a sub class of Tree Adjoining Grammars. This algorithm gives accurate results in a reasonable amount of time and has specifiity and sensitivity comparable with existing tools such as Hotknot, Pknots, and the STAR algorithms. Be cause of the capability of regular ESL-TAG to cover a large class of RNAs' knotted structures, the time complexity of the parsing algorithm damages its integrity, where in our study, we modifid implementation of the parsing algo rithm to reduce running complexity. Since using thermodynamics rules has been demonstrat ed as the most successful approach for prediction of sin gle RNA sequence, we used these rules in finess function. We designed novel ideas for the evolutionary functions which yielded to accurate predictions close to the natural structure. The eligibility of tree adjoining grammar for capturing RNAs with knotted structure and also its ef iient data structure led us to employ this representation of RNA in the genetic algorithm. It is strongly believed this data structure could be improved and extended to ward covering bigger class of knotted structures and a heuristic algorithm like genetic algorithm can use it to predict complicated knotted structures. The algorithm, PreRkTAG, was implemented and tested on a set of 24 diffrent RNA sequences and compared with three well known algorithms. The results of the algorithm show that sensitivity of sev enteen out of twenty-four samples and specifiity of eigh teen out of the twenty-four samples are higher than three well-known algorithms. These results show the practical ity of the algorithm. This algorithm handles certain class of pseudoknots captured by ESL-TAG and in future work, we intend to extend the algorithm for larger class of pseudoknots. Acknowledgements We would like to express our sincere thanks to Profes sor C.W. Pleij for his kindly helps and guidance on dis

cussions that led to an improved understanding of the problem characteristics. The authors are also very grate ful to Professor D.H. Turner for his guidance on solving our biophysical problems. Finally we thank Professors F.H.D. van Batenburg, M. Zuker, R. Markham, and E. Rivas for their helps for comparing our method vs. theirs. The authors would like to thank the anonymous referees for their helpful suggestions. Authors’ Contribution Initial idea of the research was from Torabi Dashti, Ah rabian, Nowzari-Dalini, and Zare-Mirakabad. All authors participated in algorithm design, and the structure and organization of the manuscript. Torabi Dashti and Aghaeepour implemented the algorithms and tested on diffrent data sets. All authors contributed to read and approved the fial manuscript. Financial Disclosure None declared. Funding/Support The authors would like to acknowledge the fiancial support of University of Tehran for this research under grant number 6103010/1/02.

References

1. van Batenburg FH, Gultyaev AP, Pleij CW. An APL-programmed genet ic algorithm for the prediction of RNA secondary structure. J Theor Biol. 1995;174(3):269-80. 2. Hu YJ. GPRM: A genetic programming approach to fiding common RNA secondary structure elements. Nucleic Acids Res. 2003;31(13):3446-9. 3. Lyngso RB, Pedersen CN. RNA pseudoknot prediction in energy based models. J Comput Biol. 2000;7(3-4):409-27. 4. Markham NR, Zuker M. DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res. 2005;33(Web Server issue):W577-81. 5. Mathews DH, Turner DH. Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol. 2006;16(3):270-8. 6. Rivas E, Eddy SR. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol. 1999;285(5):2053-68. 7. Ruan J, Stormo GD, Zhang W. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots.

Bioinformatics. 2004;20(1):58-66. 8. Zuker M. Mfold web server for nucleic acid folding and hybridiza tion prediction. Nucleic Acids Res. 2003;31(13):3406-15. 9. Eddy SR, Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994;22(11):2079-88. 10. Zare-Mirakabad F, Sadeghi M, Ahrabian H, Nowzari-Dalini A. RNA Comp: a new method for RNA secondary structure alignment. MATCH Commun. Math Comput Chem. 2009;61(3):789-816. 11. Gorodkin J, Stricklin SL, Stormo GD. Discovering common stem loop motifs in unaligned RNA sequences. Nucleic Acids Res. 2001;29(10):2135-44. 12. Gutell RR, Power A, Hertz GZ, Putz EJ, Stormo GD. Identifying con straints on the higher-order structure of RNA: continued develop ment and application of comparative sequence analysis methods. Nucleic Acids Res. 1992;20(21):5785-95. 13. Tahi F, Gouy M, Regnier M. Automatic RNA secondary structure pre diction with a comparative approach. Comput Chem. 2002;26(5):521- 30. 14. Akutsu T. Dynamic programming algorithms for RNA second ary structure prediction with pseudoknots. Discrete App Math. 2000;104(1–3):45-62. 15. Reeder J, Giegerich R. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics. 2004;5:104. 16. Uemura Y, Hasegawa A, Kobayashi S, Yokomori T. Tree adjoin ing grammars for RNA structure prediction. Theoret Comp Sci. 1999;210(2):277-303. 17. Zuker M, Jaeger JA, Turner DH. A comparison of optimal and subop timal RNA secondary structures predicted by free energy minimi zation with structures determined by phylogenetic comparison. Nucleic Acids Res. 1991;19(10):2707-14. 18. Walter AE, Turner DH, Kim J, Lyttle MH, Muller P, Mathews DH, et al. Coaxial stacking of helixes enhances binding of oligoribonucleo tides and improves predictions of RNA folding. Proc Natl Acad Sci U S A. 1994;91(20):9218-22. 19. Cai L, Malmberg RL, Wu Y. Stochastic modeling of RNA pseu doknotted structures: a grammatical approach. Bioinformatics. 2003;19(Suppl 1):i66-73. 20. Shapiro BA, Navetta J. A massively parallel genetic algorithm for RNA secondary structure prediction. J Supercomp. 1994;8(3):195-207. 21. Ren J, Rastegari B, Condon A, Hoos HH. HotKnots: heuristic predic tion of RNA secondary structures including pseudoknots. RNA. 2005;11(10):1494-504. 22. Kato Y, Seki H, Kasami T. On the generative power of grammars for RNA secondary structure. IEICE Trans Inf Syst. 2005;88(1):53-64. 23. Joshi AK, Levy LS, Takahashi M. Tree adjunct grammars. J Comp Sys Sci. 1975;10(1):136-63. 24. Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley. 1989 25. Freier SM, Kierzek R, Jaeger JA, Sugimoto N, Caruthers MH, Neilson T, et al. Improved free-energy parameters for predictions of RNA du plex stability. Proc Natl Acad Sci U S A. 1986;83(24):9373-7.