Abstract
One of the more important problems in Bioinformatics and Computational Biology is the search and reconstruction of the best phylogenetic tree that explains, in the best way possible, the evolution of species from a given dataset. Phylogenetic inference is considered as a NP-hard problem by the complex exploration of the tree space of topologies possible, that increases exponentially with the number of species in the input dataset, becoming in an ideal study to be addresses with metaheuristics. The phylogenetic inference can be formulated as a bi-objective optimization problem, having two objectives (Maximum Likelihood and Maximum parsimony) to be optimized at the same time.
For this reason we have applied one of the most popular multi-objective metaheuristics, the Nondominated Sorting Genetic Algorithm-II (NSGA-II) adapted to phylogenetic inference problem, incorporating new exploration strategies with the aim to evaluate the performance of this algorithm.
We have integrated the features of the multi-objective optimization framework jMetalCpp, the set of C++ libraries for Bioinformatics BIO++ and the phylogenetic functions of the Phylogenetic Likelihood Library (PLL).
The obtained results show a high-competitive performance under a biological and optimization perspective.
References
Beume, N., Naujoks, B., and Emmerich, M. (2007). Sms-emoa: Multiobjective selection based on dominated hypervolume. European Journal of Operational Research, 181(3), 1653 – 1669.
Brent, R.P. (1973) Algorithms for minimization without derivatives. Engelwood Clifts, New Jersey.
Cancino, W. and Delbem, A. C. (2007). A multi-objective evolutionary approach for phylogenetic inference. In S. Obayashi, K. Deb, C. Poloni, T. Hiroyasu, and T. Murata, editors, Evolutionary Multi-Criterion Optimization, volume 4403 of Lecture Notes in Computer Science, pages 428–442. Springer Berlin Heidelberg.
Congdon, C.B. (2002) Gaphyl: An evolutionary algorithms approach for the study of natural evolution. GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1057-1064.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: Nsga-ii. Evolutionary Computation, IEEE Transactions on, 6(2), 182–197.
Dutheil, J., Gaillard, S., Bazin, E., Gl ´emin, S., Ranwez, V., Galtier, N., and Belkhir, K. (2006). Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC bioinformatics, 7, 188.
Felsenstein, J. (2004). Inferring Phylogenies. Palgrave Macmillan. Flouri, T., Darriba, D., and Aberer, A. J. (2014). The Phylogenetic Likelihood Library.
Flouri, T., Darriba, D. & Aberer, A.J. (2014) The Phylogenetic Likelihood Library.
Göeffon, A., Richer, J.-M., and Hao, J.-K. (2008). Progressive tree neighborhood applied to the maximum parsimony problem. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 5(1), 136–45.
Guindon, S. et al., 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic biology, 59(3), pp.307–21.
Handl, J., Kell, D. B., and Knowles, J. (2007). Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 4(2), 279–92.
Knowles, J.; Corne, D., The Pareto archived evolution strategy: a new baseline algorithm for Pareto multiobjective optimization, Evolutionary Computation, 1999. Proceedings of the 1999 Congress on , vol.1, no., pp.,105 Vol. 1, 1999.
Katoh, K., Kuma, K.i. & Miyata, T. (2001) Genetic algorithm-based maximum-likelihood analysis for molecular phylogeny. Journal of Molecular Evolution, 53, 477-484.
Lemmon, A.R. & Milinkovitch, M.C. (2002) The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation.
Lewis, P. O., A Genetic Algorithm for Maximum-Likelihood Phylogeny Inference Using Nucleotide Sequence Data, pp. 277–283, 1996.
López-Camacho, E., García Godoy, M. J., Nebro, A. J., and Aldana-Montes, J. F. (2014). jMetalCpp: optimizing molecular docking problems with a C++ metaheuristic framework. Bioinformatics (Oxford, England), 30(3), 437–8.
Matsuda, H. (1995) Construction of phylogenetic trees from amino acid sequences using a genetic algorithm. Sciences, Computer, p. 560.
Nebro, Antonio J., Durillo, Juan J., Luna, Francisco, Dorronsoro, Bernabé, Alba, Enrique, MOCell: A cellular genetic algorithm for multiobjective optimization, International Journal of Intelligent Systems, vol 24, pp 726--746, 2009.
Poladian, L., Jermiin, L.: Multi-Objective Evolutionary Algorithms and Phylogenetic Inference with Multiple Data Sets. Soft Computing 10(4), p. 359-368. (2006).
Press, W., Flannery, B. Teukolsky, S. y Vetterling, W.T., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge.
Santander-Jiménez, S. and Vega-Rodríguez, M. A. (2013a). Applying a multiobjective metaheuristic inspired by honey bees to phylogenetic inference. Biosystems, 114(1), 39 – 55.
Santander-Jiménez, S. and Vega-Rodríguez, M. A. (2013b). A multiobjective proposal based on the firefly algorithm for inferring phylogenies. In L. Vanneschi, W. Bush, and M. Giacobini, editors, Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, volume 7833 of Lecture Notes in Computer Science, pages 141–152. Springer Berlin Heidelberg.
Santander-Jiménez, S. & Vega-Rodríguez, M.a. (2014) A hybrid approach to parallelize a fast non-dominated sorting genetic algorithm for phylogenetic inference. Concurrency and Computation: Practice and Experience.
Sheneman, L. (2005) A Survey of Evolutionary Crossover Operators as Applied to Phylogenetic Inferencing. pp. 1-13.
Srinivas, N. and Deb, K. “Multi-objective function optimization using non-dominated sorting genetic algorithms,” Evolutionary Computation, vol. 2, no. 3, pp. 221–248, 1994.
Stamatakis, A. (2004). Distributed and Parallel Algorithms and Systems for Inference of Huge Phylogenetic Trees based on the Distributed and Parallel Algorithms and Systems for Inference of Huge Phylogenetic Trees based on the Maximum Likelihood Method. Ph.D. thesis.
Steel, M. and Penny, D. (2000). Parsimony, Likelihood, and the Role of Models in Molecular Phylogenetics. Molecular biology and evolution, pages 839–850.
Tuffley, C. & Steel, M. (1997) Links between Maximun Likelihood and Maximun Parsimony under a simple model of site substitution. 59, 581-607.
Zhang, Q. and Li, H. (2007). Moea/d: A multiobjective evolutionary algorithm based on decomposition. Evolutionary Computation, IEEE Transactions on, 11(6), 712–731.
Zwickl, D.J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under maximum likelihood criterion. Ph.D. Thesis.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright (c) 2019 Cristian Gabriel Zambrano Vega