Exercise 6: Fitting models of discrete character evolution

For this exercise we will fit models of discrete character evolution. We can use a dataset from a paper by Brandley et al. (2008).

First, load the data files into R. You can download the files here:

  1. brandley_table.csv
  2. squamate.tre
library(ape)
library(phytools)
sqData<-read.csv("brandley_table.csv")
sqTree<-read.nexus("squamate.tre")
plotTree(sqTree,type="fan",lwd=1,fsize=0.5)

plot of chunk unnamed-chunk-1

The data set that we have is composed of measurements for continuous & discrete characters. E.g.:

head(sqData)
##               Species       Family   SVL   TL   SE FLL HLL Fingers Toes N
## 1 Agamodon anguliceps Amphisbaenia  74.7  8.3  3.1 0.0   0       0    0 1
## 2    Amphisbaena alba Amphisbaenia 474.0 40.1 12.6 0.0   0       0    0 6
## 3       Bipes biporus Amphisbaenia 167.2 16.0  3.6 6.1   0       5    0 6
## 4   Bipes caniculatus Amphisbaenia 182.5 29.6  4.5 6.5   0       4    0 3
## 5   Bipes tridactylus Amphisbaenia 127.5 36.0  3.7 6.0   0       3    0 2
## 6     Blanus cinereus Amphisbaenia 174.9 21.8  4.1 0.0   0       0    0 5
##     Ecology            Source    PC1    PC2    PC3 Morph Ecology.1
## 1 burrowing Zug et al. (2001) -0.880 -0.410 -2.726     1         1
## 2 burrowing Zug et al. (2001) -1.078  2.057 -1.055     1         1
## 3 burrowing Zug et al. (2001) -0.021  0.566 -2.780     1         1
## 4 burrowing Zug et al. (2001) -0.220  0.634 -1.932     1         1
## 5 burrowing Zug et al. (2001) -0.418  0.028 -1.522     1         1
## 6 burrowing Zug et al. (2001) -1.132  0.296 -1.673     1         1

Now we're ready to fit our models. We will do this using the function fitDiscrete in the R package 'geiger'.

library(geiger)

We will model the number of digits in the hindfoot. In class, we can try to use fitMk, just because it runs a little faster (though it may be less accurate as a consequence):

species<-sqData$Species
species<-gsub(" ","_",species)
toes<-setNames(as.character(round(sqData$Toes)),
    species)
fitER<-fitDiscrete(sqTree,toes,model="ER")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
print(fitER,digits=3)
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##               0         1         2         3         4         5
##     0 -0.004580  0.000916  0.000916  0.000916  0.000916  0.000916
##     1  0.000916 -0.004580  0.000916  0.000916  0.000916  0.000916
##     2  0.000916  0.000916 -0.004580  0.000916  0.000916  0.000916
##     3  0.000916  0.000916  0.000916 -0.004580  0.000916  0.000916
##     4  0.000916  0.000916  0.000916  0.000916 -0.004580  0.000916
##     5  0.000916  0.000916  0.000916  0.000916  0.000916 -0.004580
## 
##  model summary:
##  log-likelihood = -235.614300
##  AIC = 473.228600
##  AICc = 473.244225
##  free parameters = 1
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
plot(fitER)
title(main="Fitted 'ER' model\nfor number of digits in hindfoot")

plot of chunk unnamed-chunk-4

Or, we can try an ordered model:

model<-matrix(c(0,1,0,0,0,0,
    2,0,3,0,0,0,
    0,4,0,5,0,0,
    0,0,6,0,7,0,
    0,0,0,8,0,9,
    0,0,0,0,10,0),6,6)
rownames(model)<-colnames(model)<-0:5
fitOrdered<-fitDiscrete(sqTree,toes,model=model)
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
## Warning in fitDiscrete(sqTree, toes, model = model): Parameter estimates appear at bounds:
##  q13
##  q14
##  q15
##  q16
##  q24
##  q25
##  q26
##  q31
##  q35
##  q36
##  q41
##  q42
##  q46
##  q51
##  q52
##  q53
##  q61
##  q62
##  q63
##  q64
print(fitOrdered,digits=3)
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##               0         1         2       3       4      5
##     0 -5.34e-17  5.34e-17  0.00e+00  0.0000  0.0000  0.000
##     1  2.11e-02 -2.11e-02  1.09e-15  0.0000  0.0000  0.000
##     2  0.00e+00  8.02e-02 -3.06e+00  2.9843  0.0000  0.000
##     3  0.00e+00  0.00e+00  4.22e+00 -4.2527  0.0295  0.000
##     4  0.00e+00  0.00e+00  0.00e+00  0.0649 -4.2881  4.223
##     5  0.00e+00  0.00e+00  0.00e+00  0.0000  0.2411 -0.241
## 
##  model summary:
##  log-likelihood = -203.914844
##  AIC = 427.829687
##  AICc = 428.720376
##  free parameters = 10
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.01
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
plot(fitOrdered,show.zeros=FALSE,signif=4)
title(main="Fitted ordered model\nfor number of digits in hindfoot")

plot of chunk unnamed-chunk-5

Finally, the most parameter rich model:

fitARD<-fitDiscrete(sqTree,toes,model="ARD")
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
print(fitARD,digits=3)
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##               0         1         2         3         4         5
##     0 -8.23e-07  7.47e-09  2.68e-07  2.20e-07  1.30e-07  1.98e-07
##     1  1.51e-02 -1.51e-02  7.90e-07  4.34e-07  2.84e-07  5.33e-07
##     2  9.13e-04  3.14e-02 -3.26e-02  1.39e-06  1.29e-05  2.95e-04
##     3  1.39e-02  5.81e-07  5.14e-03 -4.81e-02  1.38e-05  2.90e-02
##     4  2.23e-08  2.10e-02  1.35e-05  1.43e-02 -8.87e-02  5.35e-02
##     5  2.57e-13  2.97e-07  9.17e-04  4.13e-04  4.57e-03 -5.90e-03
## 
##  model summary:
##  log-likelihood = -194.252862
##  AIC = 448.505723
##  AICc = 456.699556
##  free parameters = 30
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.01
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
plot(fitARD,show.zeros=FALSE,signif=4)
title(main="Fitted 'ARD' model\nfor number of digits in hindfoot")

plot of chunk unnamed-chunk-6

And, the AIC values (& weights) from our fitted models:

aicc<-setNames(
    c(fitER$opt$aicc,fitOrdered$opt$aicc,
    fitARD$opt$aicc),
    c("ER","Ordered","ARD"))
aic.w(aicc)
##         ER    Ordered        ARD 
## 0.00000000 0.99999916 0.00000084

This result suggests that a model in which digits can be lost & regained, but in an ordered fashion, fits best by far.

Written by Liam J. Revell & Luke J. Harmon. Last updated 11 December 2017.