The table below and the JavaScript calculator following it provide values for the statistical significance of a match between two incongruent phylogenetic trees, reported as P-values. These P-values give the probability that two bifurcating rooted trees, with a given number (or fewer) of mismatching branches, would match by chance.
The number of incongruent branches is determined relative to the maximum agreement subtree (MAST) between two trees. A MAST is the largest "core" subtree common to both trees. The number of incongruent branches equals the minimum number of branches that must be pruned from one of the real trees to obtain the MAST. An example from John Harshman's analysis of crocodile species is given in the figure below (Harshman et al. 2003).
Two incongruent crocodile phylogenies. The tree at left is based upon morphological data; the tree at right on the molecular sequence of the c-myc proto-oncogene (Harshman et al. 2003). The common MAST is shown in black. According to the distance metric described above, the distance between the two trees is one branch, due to the misplaced Gavialis branch indicated in magenta. The significance of the match between these two incongruent phylogenies is P ≤ 0.00077. Additionally, Harshman et al. performed an independent phylogenetic analysis with mitochondrial genes, which gave exactly the same tree as the c-myc proto-oncogene data. The overall significance for these three independent trees is P ≤ 7.4 × 10⁻⁸.
In the table below, the rows list values for a comparison of two trees with increasing numbers of taxa. The columns list the significance for a given number of differences between the two trees. Incongruency of "1 adjacent" refers to the case where a branch is misplaced by only one adjacent node (i.e., two branches next to each other are swapped relative to the other tree). The remaining columns, labelled 1 through 10, refer to the case where x or fewer branches are misplaced anywhere in the tree. High statistical significance (P < 0.01, or greater than 99% confidence) is indicated by light blue. Statistical significance (P < 0.05, or greater than 95% confidence) is indicated by pink. Equivocal values (0.05 < P < 0.50) are indicated by white. Highly insignificant values (P > 0.50) are indicated by red, and impossible values are colored black.
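The significance bands described above can be written down as a small lookup. The following is only an illustrative Python sketch: the function name `significance_band` and the band labels are hypothetical, the assignment of the boundary values P = 0.05 and P = 0.50 to the "equivocal" band is an assumption (the text leaves them unspecified), and the "impossible" category is left out because the text does not define it numerically.

```python
def significance_band(p: float) -> str:
    """Map a P-value to the bands used in the table legend.

    Boundary handling (P exactly 0.05 or 0.50) is an assumption;
    the legend only gives strict inequalities.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p < 0.01:
        return "highly significant"    # light blue in the table
    if p < 0.05:
        return "significant"           # pink
    if p <= 0.50:
        return "equivocal"             # white
    return "highly insignificant"      # red


# Table entry for 8 taxa and 2 misplaced branches (0.030):
print(significance_band(0.030))
```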
Maximum P-value for two trees incongruent by the given number of branches:

Number of taxa | exactly congruent | 1 adjacent | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 0.067 | 0.20 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
5 | 0.0095 | 0.038 | 0.28 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
6 | 0.0011 | 0.0052 | 0.050 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
7 | 9.6 × 10⁻⁵ | 5.8 × 10⁻⁴ | 0.0067 | 0.20 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
8 | 7.4 × 10⁻⁶ | 5.2 × 10⁻⁵ | 6.8 × 10⁻⁴ | 0.030 | 0.53 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
9 | 4.9 × 10⁻⁷ | 3.9 × 10⁻⁶ | 6.2 × 10⁻⁵ | 0.0035 | 0.089 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
10 | 2.9 × 10⁻⁸ | 2.6 × 10⁻⁷ | 4.6 × 10⁻⁶ | 3.3 × 10⁻⁴ | 0.012 | 0.22 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
11 | 1.5 × 10⁻⁹ | 1.5 × 10⁻⁸ | 3.0 × 10⁻⁷ | 2.7 × 10⁻⁵ | 0.0012 | 0.032 | 0.49 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
12 | 7.2 × 10⁻¹¹ | 8.0 × 10⁻¹⁰ | 1.8 × 10⁻⁸ | 1.9 × 10⁻⁶ | 1.1 × 10⁻⁴ | 0.0037 | 0.076 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 |
13 | 3.1 × 10⁻¹² | 3.8 × 10⁻¹¹ | 9.1 × 10⁻¹⁰ | 1.2 × 10⁻⁷ | 8.3 × 10⁻⁶ | 3.5 × 10⁻⁴ | 0.0095 | 0.17 | 1.00 | 1.00 | 1.00 | 1.00 |
14 | 1.2 × 10⁻¹³ | 1.6 × 10⁻¹² | 4.3 × 10⁻¹¹ | 6.6 × 10⁻⁹ | 5.6 × 10⁻⁷ | 2.9 × 10⁻⁵ | 9.9 × 10⁻⁴ | 0.022 | 0.33 | 1.00 | 1.00 | 1.00 |
15 | 4.6 × 10⁻¹⁵ | 6.6 × 10⁻¹⁴ | 1.8 × 10⁻¹² | 3.3 × 10⁻¹⁰ | 3.3 × 10⁻⁸ | 2.1 × 10⁻⁶ | 8.7 × 10⁻⁵ | 0.0025 | 0.048 | 0.62 | 1.00 | 1.00 |
16 | 1.6 × 10⁻¹⁶ | 2.4 × 10⁻¹⁵ | 5.6 × 10⁻¹⁴ | 1.5 × 10⁻¹¹ | 1.8 × 10⁻⁹ | 1.3 × 10⁻⁷ | 6.7 × 10⁻⁶ | 2.3 × 10⁻⁴ | 0.0056 | 0.095 | 1.00 | 1.00 |
17 | 5.2 × 10⁻¹⁸ | 8.3 × 10⁻¹⁷ | 2.1 × 10⁻¹⁵ | 6.4 × 10⁻¹³ | 8.6 × 10⁻¹¹ | 7.5 × 10⁻⁹ | 4.5 × 10⁻⁷ | 1.9 × 10⁻⁵ | 5.6 × 10⁻⁴ | 0.012 | 0.18 | 1.00 |
18 | 1.5 × 10⁻¹⁹ | 2.7 × 10⁻¹⁸ | 7.4 × 10⁻¹⁷ | 2.5 × 10⁻¹⁴ | 3.8 × 10⁻¹² | 3.9 × 10⁻¹⁰ | 2.7 × 10⁻⁸ | 1.4 × 10⁻⁶ | 4.9 × 10⁻⁵ | 0.0013 | 0.024 | 0.32 |
19 | 4.5 × 10⁻²¹ | 8.1 × 10⁻²⁰ | 2.3 × 10⁻¹⁸ | 8.9 × 10⁻¹⁶ | 1.6 × 10⁻¹³ | 1.8 × 10⁻¹¹ | 1.5 × 10⁻⁹ | 8.6 × 10⁻⁸ | 3.7 × 10⁻⁶ | 1.2 × 10⁻⁴ | 0.0027 | 0.046 |
20 | 1.2 × 10⁻²² | 2.3 × 10⁻²¹ | 7.3 × 10⁻²⁰ | 3.0 × 10⁻¹⁷ | 5.9 × 10⁻¹⁵ | 7.8 × 10⁻¹³ | 7.3 × 10⁻¹¹ | 4.9 × 10⁻⁹ | 2.5 × 10⁻⁷ | 9.2 × 10⁻⁶ | 2.5 × 10⁻⁴ | 0.0054 |

[Interactive JavaScript calculator: the input is the number of taxa; the outputs are the P-values for an exact match, 1 adjacent, and 1 through 10 incongruent branches.]
For an exact match between two trees (no incongruence):
$$P = \frac{2^{N-2}\,(N-2)!}{(2N-3)!}$$
or
$$P = \frac{1}{(2N-3)!!}$$
where "!!" is double factorial notation and N is the number of taxa.
For an incongruency of "1 adjacent" branch:
$$P = \frac{2^{N-2}\,(N-1)!}{(2N-3)!}$$
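As a quick check on the two formulas above, here is a minimal Python sketch that evaluates them with exact rational arithmetic. The function names (`double_factorial`, `p_exact_match`, `p_one_adjacent`) are made up for illustration, and the exponent in the leading factor is read as a power of two, 2^(N-2), which is what makes the factorial form agree with the double-factorial form.

```python
from fractions import Fraction
from math import factorial


def double_factorial(n: int) -> int:
    """Return n!! = n * (n-2) * (n-4) * ..., stopping at 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def p_exact_match(n_taxa: int) -> Fraction:
    """P = 1 / (2N-3)!! for two exactly congruent rooted binary trees."""
    return Fraction(1, double_factorial(2 * n_taxa - 3))


def p_one_adjacent(n_taxa: int) -> Fraction:
    """P = 2^(N-2) (N-1)! / (2N-3)! for the "1 adjacent" case."""
    return Fraction(2 ** (n_taxa - 2) * factorial(n_taxa - 1),
                    factorial(2 * n_taxa - 3))


print(float(p_exact_match(4)))   # 1/15, about 0.067
print(float(p_one_adjacent(5)))  # 4/105, about 0.038
```

Both printed values agree with the corresponding table entries for 4 and 5 taxa. Using `Fraction` avoids floating-point loss until the final conversion, which matters for the very small probabilities at larger N.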
For an incongruency of I branches, misplaced anywhere between two trees:
$$P \le \frac{2^{N-I-2}\,(N-I-2)!\,N!}{(2[N-I]-3)!\,(N-I)!\,I!}$$
or
$$P \le \frac{N!\,/\,\bigl((N-I)!\,I!\bigr)}{(2[N-I]-3)!!}$$
where N = # of taxa and I = # of incongruent branches.
This last calculation gives only an upper bound: it overestimates P, and the actual P-value is very likely lower (more significant). P is the ratio of the maximum number of possible incongruent trees to the total number of possible trees. However, in the final equation the calculated maximum number of incongruent trees includes nonunique trees (i.e., some of the incongruent trees have the same topology and thus are counted more than once). For example, for N = 4 and I = 1, this calculation gives P ≤ 1.3333, while the exact P = 0.73333. At large N and I, P converges on the exact value.
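The upper bound for I misplaced branches can be sketched in Python as follows. This is an illustration only: `p_upper_bound` is a hypothetical name, and the function returns the raw bound without capping it at 1, so it reproduces the N = 4, I = 1 value of 4/3 quoted above.

```python
from fractions import Fraction
from math import comb


def double_factorial(n: int) -> int:
    """Return n!! = n * (n-2) * (n-4) * ..., stopping at 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def p_upper_bound(n_taxa: int, n_incongruent: int) -> Fraction:
    """Upper bound C(N, I) / (2[N-I]-3)!!; it can exceed 1."""
    remaining = n_taxa - n_incongruent   # size of the agreement subtree
    if n_incongruent < 0 or remaining < 2:
        raise ValueError("need 0 <= I and at least two taxa left in the subtree")
    return Fraction(comb(n_taxa, n_incongruent),
                    double_factorial(2 * remaining - 3))


print(p_upper_bound(4, 1))          # 4/3, i.e. the 1.3333 in the text
print(float(p_upper_bound(8, 1)))   # 8/10395
```

With I = 0 the bound reduces to the exact-match probability 1 / (2N-3)!!. Any value above 1 carries no information and would presumably be reported as 1.00; that capping step is an assumption and is not applied here.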
These equations can be extended easily to the case of discrepancies between more than two trees, each of the same number of taxa. The probability that k rooted, binary, N-taxa trees have at most I incongruent branches is:
$$P \le \frac{N!\,/\,\bigl((N-I)!\,I!\bigr)}{\bigl((2[N-I]-3)!!\bigr)^{k-1}}$$
Equivalently, this is the probability that two or more N-taxa trees will share the same MAST of size N - I or greater. The JavaScript calculator above uses this equation to determine its P-values.
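The multi-tree bound can be sketched in a few lines of Python. This is not the calculator's actual code (which is JavaScript and is not reproduced in the text); `p_k_trees` is a hypothetical name. The crocodile check at the end assumes eight taxa, a number the text does not state; it is used only because it reproduces the two P-values quoted in the figure caption.

```python
from fractions import Fraction
from math import comb


def double_factorial(n: int) -> int:
    """Return n!! = n * (n-2) * (n-4) * ..., stopping at 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def p_k_trees(n_taxa: int, n_incongruent: int, k_trees: int = 2) -> Fraction:
    """Upper bound C(N, I) / ((2[N-I]-3)!!)^(k-1) for k rooted binary trees."""
    remaining = n_taxa - n_incongruent   # size of the shared subtree
    if k_trees < 2 or n_incongruent < 0 or remaining < 2:
        raise ValueError("need k >= 2, I >= 0 and at least two taxa remaining")
    denominator = double_factorial(2 * remaining - 3) ** (k_trees - 1)
    return Fraction(comb(n_taxa, n_incongruent), denominator)


# Assumed: 8 taxa, 1 misplaced branch (the Gavialis branch).
print(float(p_k_trees(8, 1, 2)))   # two trees, roughly 0.00077
print(float(p_k_trees(8, 1, 3)))   # three trees, roughly 7.4e-08
```

With `k_trees=2` the function reduces to the two-tree bound, so a single routine covers every case in the table's formula family.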
I would appreciate hearing from anyone who has any ideas on how to correct for nonunique trees. I independently derived most of these equations in the summer of 2002. Later I discovered via personal correspondence that Mike Steel had also derived these equations and was soon to publish all but the last in an upcoming book (Bryant et al. 2002). It appears that the final equation was independently derived by both me and Mike Steel, and to my knowledge it remains unpublished.
Li, W.-H. (1997). Molecular Evolution. Sunderland, MA, Sinauer Associates. p. 102.
Bryant, D., MacKenzie, A. and Steel, M. (2002). "The size of a maximum agreement subtree for random binary trees." In: Bioconsensus II. DIMACS Series in Discrete Mathematics and Theoretical Computer Science (American Mathematical Society). ed., M.F. Janowitz.
Harshman, J., Huddleston, C. J., Bollback, J. P., Parsons, T. J., and Braun, M. J. (2003). "True and false gharials: a nuclear gene phylogeny of Crocodylia." Syst Biol. 52: 386-402.