On the Scalability of Computing Triplet and Quartet Distances

Morten Kragelund Holt, Jens Johansen, and Gerth Stølting Brodal

In Proc. 16th Workshop on Algorithm Engineering and Experiments, pages 9-19, 2014.

Abstract

In this paper we present an experimental evaluation of the algorithms by Brodal et al. [SODA 2013] for computing the triplet and quartet distance measures between two leaf labelled rooted and unrooted trees of arbitrary degree, respectively. The algorithms count the number of rooted tree topologies over sets of three leaves (triplets) and unrooted tree topologies over four leaves (quartets), respectively, that have different topologies in the two trees.

The algorithms by Brodal et al. maintain a long sequence of variables (hundreds for quartets) for counting different cases to be considered by the algorithm, making it unclear if the algorithms would be of theoretical interest only. In our experimental evaluation of the algorithms the typical overhead per node is about 2 KB and 10 KB per node in the input trees for triplet and quartet computations, respectively. This allows us to compute the distance measures for trees with up to millions of nodes. The limiting factor is the amount of memory available. With 31 GB of memory all our input instances can be solved within a few minutes.

In the algorithm by Brodal et al. a few choices were made, where alternative solutions possibly could improve the algorithm, in particular for quartet distance computations. For quartet computations we expand the algorithm to also consider alternative computations, and make two observations: First we observe that the running time can be improved from O(max(d1, d2) n lg n) to O(min(d1, d2) n lg n), where n is the number of leaves in the two trees, and d1 and d2 are the maximum degrees of the nodes in the two trees, respectively. Secondly, by taking a different approach to counting the number of disagreeing quartets we can reduce the number of calculations needed to calculate the quartet distance, improving both the running time and the space requirement by our algorithm by a constant factor.

Copyright notice

Copyright © 2014 by the Society for Industrial and Applied Mathematics.

Online version

alenex14.pdf (495 Kb)

DOI

10.1137/1.9781611973198.2

Slides

alenex14.pdf (1016 Kb), alenex14.pptx (692 Kb)

BIBTEX entry

@inproceedings{alenex14,
  author = "Morten Kragelund Holt and Jens Johansen and Gerth St{\o}lting Brodal",
  booktitle = "Proc. 16th Workshop on Algorithm Engineering and Experiments",
  doi = "10.1137/1.9781611973198.2",
  isbn = "978-1-61197-319-8",
  pages = "9-19",
  title = "On the Scalability of Computing Triplet and Quartet Distances",
  year = "2014"
}