Improvements to the parallel solution based on collective operations for the phylogenetic tree reconstruction bootstrap in PhyML 3.0

Authors

  • Martha Ximena Torres Delgado, Universidade Estadual de Santa Cruz

Keywords:

phylogenetic tree reconstruction; parallel processing; MPI

Abstract

Phylogenetics studies the evolutionary relationships among groups of species, which are represented as a phylogenetic tree. PhyML is among the main programs for reconstructing phylogenetic trees. The bootstrap is a statistical resampling method for assessing the confidence of results derived from a given data set, and it is commonly applied in the analysis of inferred phylogenetic trees. In PhyML this method has two MPI parallel implementations: one based on point-to-point operations and one based on collective operations. The second version is more efficient than the first; however, the number of bootstrap replicates it can handle is limited by its growing memory consumption. To solve this problem, three proposals were developed. The objectives of this work were to validate these versions and to test their performance. The validation showed that the proposed solutions produce results equivalent to those of the point-to-point version. In the performance tests, two solutions proved superior to the point-to-point version, with the best one achieving gains of 28.46% and 39.64% for 32 and 64 processes, respectively. The enhancements therefore provide alternatives to the point-to-point version without the memory limitation.
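As a rough illustration of the idea summarized above, the sketch below shows how a single bootstrap replicate can be drawn from a sequence alignment and how a fixed number of replicates might be divided among parallel processes. It is not PhyML's code and does not use MPI: the function names `bootstrap_alignment` and `replicates_for_rank`, the toy alignment, and the block-partitioning scheme are all assumptions made for this example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns
    (sites) with replacement. Hypothetical helper, not a PhyML function."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def replicates_for_rank(n_replicates, n_procs, rank):
    """Return the replicate indices a given process would handle under a
    simple block partition (an assumed scheme, chosen for illustration)."""
    base, extra = divmod(n_replicates, n_procs)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return range(start, start + count)


# Toy alignment: three sequences, six sites each.
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]

# Divide 10 replicates among 4 processes.
n_replicates, n_procs = 10, 4
shares = [list(replicates_for_rank(n_replicates, n_procs, r))
          for r in range(n_procs)]

# Draw one replicate with a fixed seed so the run is repeatable.
rng = random.Random(0)
replicate = bootstrap_alignment(alignment, rng)
```

In a real MPI program each process would infer a tree for every replicate in its share, and the results would then be combined, either with individual sends and receives or with a collective operation such as a gather. In the collective case the memory needed to hold the combined results grows with the number of replicates, which is the kind of limitation the abstract describes.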

References

ATGC: South of France bioinformatics platform. PhyML 3.0 Benchmarks. Available at: <http://www.atgc-montpellier.fr/phyml/benchmarks>. Accessed: 6 Jul. 2020.

AVILA, R. B.; NAVAUX, P. O. A.; LOMBARD, P.; LEBRE, A. and DENNEULIN, Y. Performance evaluation of a prototype distributed NFS server. In: 16th Symposium on Computer Architecture and High Performance Computing, 2004, Foz do Iguaçu, p. 100-105. doi: 10.1109/SBAC-PAD.2004.33.

FELSENSTEIN, J. Inferring Phylogenies. Sunderland, Massachusetts: Sinauer Associates, 2004. 664 p.

GUINDON, S.; GASCUEL, O. A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology, v. 52, n. 5, p. 696-704, 2003. Available at: <https://pubmed.ncbi.nlm.nih.gov/14530136/>. Accessed: 6 Jul. 2020.

ISMAIL, R.; HAMID, N.; OTHMAN, M. and LATIP, R. Performance analysis of message passing interface collective communication on Intel Xeon quad-core Gigabit Ethernet and InfiniBand clusters. Journal of Computer Science, v. 9, p. 455-462, Jan. 2013. Available at: <https://thescipub.com/abstract/10.3844/jcssp.2013.455.462/>. Accessed: 6 Jul. 2020.

LATHAM, R.; ROSS, R. and THAKUR, R. The Impact of File Systems on MPI-IO Scalability. In: Kranzlmüller D., Kacsuk P., Dongarra J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2004. Lecture Notes in Computer Science, vol 3241. Springer, Berlin, Heidelberg. Available at: <https://link.springer.com/chapter/10.1007/978-3-540-30218-6_18>. Accessed: 6 Jul. 2020.

ROBINSON, D.F. and FOULDS, L.R. Comparison of phylogenetic trees. Mathematical Biosciences, v. 53 (1), p. 131-147, 1981.

SORIA-CARRASCO, V.; TALAVERA, G.; IGEA, J. and CASTRESANA, J. The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees. Bioinformatics, v.23 (21), p. 2954-2956, 2007. doi:10.1093/bioinformatics/btm466

TICONA, W. G. C. Algoritmos evolutivos multi-objetivo para a reconstrução de árvores filogenéticas. 2008. 134 leaves. Thesis (Doctorate in Sciences, in the area of Computer Science and Computational Mathematics) - Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos. Available at: <https://www.teses.usp.br/teses/disponiveis/55/55134/tde-02042008-142554/publico/tese_waldo_corregida.pdf>. Accessed: 6 Jul. 2020.

TREEBASE: A Database of Phylogenetic Knowledge. Available at: <http://treebase.org/treebase-web/home.html>. Accessed: 6 Jul. 2020.

TORRES M. and DA SILVA J.O. Parallel Solution Based on Collective Communication Operations for Phylogenetic Bootstrapping in PhyML 3.0. In: Alves R. (eds) Advances in Bioinformatics and Computational Biology. BSB 2018. Lecture Notes in Computer Science, vol 11228. Springer, Cham.

PATTENGALE, N.D.; ALIPOUR, M.; BININDA-EMONDS, O.R.P.; MORET, B.M.E. and STAMATAKIS, A. How Many Bootstrap Replicates Are Necessary? In: Batzoglou S. (eds) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg.

VERLI, Hugo. Bioinformática: da biologia à flexibilidade molecular. São Paulo: Sbbq, 2014. 282 p. Available at: <http://www.gradadm.ifsc.usp.br/dados/20171/7600011-3/Bioinformatica_1.1.pdf>. Accessed: 6 Jul. 2020.

Published

2021-02-08

Section

Original Scientific Article

How to Cite

Improvements to the parallel solution based on collective operations for the phylogenetic tree reconstruction bootstrap in PhyML 3.0. (2021). Colloquium Exactarum. ISSN: 2178-8332, 12(3), 39-52. https://journal.unoeste.br/index.php/ce/article/view/3597
