Resumen
Uno de los problemas más importantes en la Bioinformática y la Biología Computacional es la búsqueda y reconstrucción de árboles filogenéticos, que describan, lo más realmente posible, la evolución de las especies.La Inferencia filogenética es considerada como un problema de complejidad NP-completo por la exploración del espacio de búsqueda conformado por todas las posibles topologías existentes según el número de especies en análisis, cuyo tamaño incrementa exponencialmente por cada una de ellas, convirtiéndolo en un caso de estudio para ser abordado con técnicas metaheurísticas. El problema de la inferencia filogenética se puede formular en base a dos objetivos a optimizar de forma simultánea (La Máxima Verosimitud y la Máxima Parsimonia).
Por esta razón hemos adaptado una de las técnicas metaheurísticas de mayor referencia en el campo de la optimización multiobjetivo, el algoritmo Nondominated Sorting Genetic Algorithm-II (NSGA-II) a la inferencia de árboles filogenéticos incorporando nuevas estrategias de exploración, con el objetivo de conocer cuál es su rendimiento al intentar resolver este tipo de problemas.
Para esta implementación hemos integrado las funcionalidades del framework de optimización multiobjetivo jMetalCpp, el conjunto de librerías bioinformáticas BIO++ y la funciones filogenéticas de la librería PLL (Phylogenetic Likelihood Library).
Los resultados obtenidos demuestran un rendimiento competitivo tanto bajo un enfoque biológico como de optimización frente a los resultados publicados en el estado del arteCitas
Beume, N., Naujoks, B., and Emmerich, M. (2007). Sms-emoa: Multiobjective selection based on dominated hypervolume. European Journal of Operational Research, 181(3), 1653 – 1669.
Brent, R.P. (1973) Algorithms for minimization without derivatives. Engelwood Clifts, New Jersey.
Cancino, W. and Delbem, A. C. (2007). A multi-objective evolutionary approach for phylogenetic inference. In S. Obayashi, K. Deb, C. Poloni, T. Hiroyasu, and T. Murata, editors, Evolutionary Multi-Criterion Optimization, volume 4403 of Lecture Notes in Computer Science, pages 428–442. Springer Berlin Heidelberg.
Congdon, C.B. (2002) Gaphyl: An evolutionary algorithms approach for the study of natural evolution. GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1057-1064.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: Nsga-ii. Evolutionary Computation, IEEE Transactions on, 6(2), 182–197.
Dutheil, J., Gaillard, S., Bazin, E., Gl ´emin, S., Ranwez, V., Galtier, N., and Belkhir, K. (2006). Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC bioinformatics, 7, 188.
Felsenstein, J. (2004). Inferring Phylogenies. Palgrave Macmillan. Flouri, T., Darriba, D., and Aberer, A. J. (2014). The Phylogenetic Likelihood Library.
Flouri, T., Darriba, D. & Aberer, A.J. (2014) The Phylogenetic Likelihood Library.
Göeffon, A., Richer, J.-M., and Hao, J.-K. (2008). Progressive tree neighborhood applied to the maximum parsimony problem. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 5(1), 136–45.
Guindon, S. et al., 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic biology, 59(3), pp.307–21.
Handl, J., Kell, D. B., and Knowles, J. (2007). Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 4(2), 279–92.
Knowles, J.; Corne, D., The Pareto archived evolution strategy: a new baseline algorithm for Pareto multiobjective optimization, Evolutionary Computation, 1999. Proceedings of the 1999 Congress on , vol.1, no., pp.,105 Vol. 1, 1999.
Katoh, K., Kuma, K.i. & Miyata, T. (2001) Genetic algorithm-based maximum-likelihood analysis for molecular phylogeny. Journal of Molecular Evolution, 53, 477-484.
Lemmon, A.R. & Milinkovitch, M.C. (2002) The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation.
Lewis, P. O., A Genetic Algorithm for Maximum-Likelihood Phylogeny Inference Using Nucleotide Sequence Data, pp. 277–283, 1996.
López-Camacho, E., García Godoy, M. J., Nebro, A. J., and Aldana-Montes, J. F. (2014). jMetalCpp: optimizing molecular docking problems with a C++ metaheuristic framework. Bioinformatics (Oxford, England), 30(3), 437–8.
Matsuda, H. (1995) Construction of phylogenetic trees from amino acid sequences using a genetic algorithm. Sciences, Computer, p. 560.
Nebro, Antonio J., Durillo, Juan J., Luna, Francisco, Dorronsoro, Bernabé, Alba, Enrique, MOCell: A cellular genetic algorithm for multiobjective optimization, International Journal of Intelligent Systems, vol 24, pp 726--746, 2009.
Poladian, L., Jermiin, L.: Multi-Objective Evolutionary Algorithms and Phylogenetic Inference with Multiple Data Sets. Soft Computing 10(4), p. 359-368. (2006).
Press, W., Flannery, B. Teukolsky, S. y Vetterling, W.T., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge.
Santander-Jiménez, S. and Vega-Rodríguez, M. A. (2013a). Applying a multiobjective metaheuristic inspired by honey bees to phylogenetic inference. Biosystems, 114(1), 39 – 55.
Santander-Jiménez, S. and Vega-Rodríguez, M. A. (2013b). A multiobjective proposal based on the firefly algorithm for inferring phylogenies. In L. Vanneschi, W. Bush, and M. Giacobini, editors, Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, volume 7833 of Lecture Notes in Computer Science, pages 141–152. Springer Berlin Heidelberg.
Santander-Jiménez, S. & Vega-Rodríguez, M.a. (2014) A hybrid approach to parallelize a fast non-dominated sorting genetic algorithm for phylogenetic inference. Concurrency and Computation: Practice and Experience.
Sheneman, L. (2005) A Survey of Evolutionary Crossover Operators as Applied to Phylogenetic Inferencing. pp. 1-13.
Srinivas, N. and Deb, K. “Multi-objective function optimization using non-dominated sorting genetic algorithms,” Evolutionary Computation, vol. 2, no. 3, pp. 221–248, 1994.
Stamatakis, A. (2004). Distributed and Parallel Algorithms and Systems for Inference of Huge Phylogenetic Trees based on the Distributed and Parallel Algorithms and Systems for Inference of Huge Phylogenetic Trees based on the Maximum Likelihood Method. Ph.D. thesis.
Steel, M. and Penny, D. (2000). Parsimony, Likelihood, and the Role of Models in Molecular Phylogenetics. Molecular biology and evolution, pages 839–850.
Tuffley, C. & Steel, M. (1997) Links between Maximun Likelihood and Maximun Parsimony under a simple model of site substitution. 59, 581-607.
Zhang, Q. and Li, H. (2007). Moea/d: A multiobjective evolutionary algorithm based on decomposition. Evolutionary Computation, IEEE Transactions on, 11(6), 712–731.
Zwickl, D.J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under maximum likelihood criterion. Ph.D. Thesis.
Esta obra está bajo una licencia internacional Creative Commons Atribución-NoComercial-CompartirIgual 4.0.
Derechos de autor 2019 Cristian Zambrano-Vega, Antonio J. Nebro, José F. Aldana-Montes