Solution to Challenge Problem 5: Fitting Pagel's (1994) model
The challenge problem was as follows.
Download the following two files:
Use the phytools function fitPagel
to fit Pagel's model for correlated binary-trait
evolution on the tree. What do you find?
Visualize the character data on the tree. What does this tell you about the basis for your result? Do you feel confident that x & y have evolved in an evolutionarily correlated manner?
First, let's load packages:
library(phytools)
Now load data & tree from file:
tree<-read.tree("fitPagel-challenge.tre")
tree
##
## Phylogenetic tree with 64 tips and 63 internal nodes.
##
## Tip labels:
## t1, t2, t3, t4, t5, t6, ...
##
## Rooted; includes branch lengths.
X<-read.csv("fitPagel-challenge.csv",row.names=1)
X
## x y
## t1 a 0
## t2 a 0
## t3 a 0
## t4 a 0
## t5 a 0
## t6 a 0
## t7 a 0
## t8 a 0
## t9 a 0
## t10 a 0
## t11 a 0
## t12 a 0
## t13 a 0
## t14 a 0
## t15 a 0
## t16 a 0
## t17 a 0
## t18 a 0
## t19 a 0
## t20 a 0
## t21 a 0
## t22 a 0
## t23 a 0
## t24 a 0
## t25 a 0
## t26 a 0
## t27 a 0
## t28 a 0
## t29 a 0
## t30 a 0
## t31 a 0
## t32 a 0
## t33 b 0
## t34 b 1
## t35 b 0
## t36 b 1
## t37 b 0
## t38 b 1
## t39 b 0
## t40 b 1
## t41 b 0
## t42 b 1
## t43 b 0
## t44 b 1
## t45 b 0
## t46 b 1
## t47 b 0
## t48 b 1
## t49 b 0
## t50 b 1
## t51 b 0
## t52 b 1
## t53 b 0
## t54 b 1
## t55 b 0
## t56 b 1
## t57 b 0
## t58 b 1
## t59 b 0
## t60 b 1
## t61 b 0
## t62 b 1
## t63 b 0
## t64 b 1
Now, let's apply Pagel's (1994) method using fitPagel
:
x<-setNames(X[,1],rownames(X))
y<-setNames(as.factor(X[,2]),rownames(X))
fit.xy<-fitPagel(tree,x,y)
fit.xy
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -49.7805537 49.6784348 0.1021189 0.0000000
## a|1 149.0361017 -149.1382206 0.0000000 0.1021189
## b|0 0.1021208 0.0000000 -49.7805556 49.6784348
## b|1 0.0000000 0.1021208 149.0361017 -149.1382225
##
## Dependent (x & y) model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 0.00000 0.000 0.0000 0.00000
## a|1 58.75552 -117.511 0.0000 58.75549
## b|0 0.00000 0.000 -186.1582 186.15825
## b|1 0.00000 0.000 186.1582 -186.15824
##
## Model fit:
## log-likelihood AIC
## independent -40.21542 88.43084
## dependent -24.95330 65.90660
##
## Hypothesis test result:
## likelihood-ratio: 30.52424
## p-value: 3.827566e-06
##
## Model fitting method used was fitMk
plot(fit.xy,lwd.by.rate=TRUE)
We can also fit the model with dependent "x"
or dependent
"y"
variables, e.g.:
fit.x<-fitPagel(tree,x,y,dep.var="x")
fit.x
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -49.7805537 49.6784348 0.1021189 0.0000000
## a|1 149.0361017 -149.1382206 0.0000000 0.1021189
## b|0 0.1021208 0.0000000 -49.7805556 49.6784348
## b|1 0.0000000 0.1021208 149.0361017 -149.1382225
##
## Dependent (x only) model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -12.91098 12.91098 0.00000 0.0000000
## a|1 38.87035 -39.78071 0.00000 0.9103622
## b|0 0.00000 0.00000 -12.91098 12.9109836
## b|1 0.00000 0.00000 38.87035 -38.8703515
##
## Model fit:
## log-likelihood AIC
## independent -40.21542 88.43084
## dependent -40.10414 92.20828
##
## Hypothesis test result:
## likelihood-ratio: 0.2225532
## p-value: 0.8946912
##
## Model fitting method used was fitMk
plot(fit.x,lwd.by.rate=TRUE)
fit.y<-fitPagel(tree,x,y,dep.var="y")
fit.y
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -49.7805537 49.6784348 0.1021189 0.0000000
## a|1 149.0361017 -149.1382206 0.0000000 0.1021189
## b|0 0.1021208 0.0000000 -49.7805556 49.6784348
## b|1 0.0000000 0.1021208 149.0361017 -149.1382225
##
## Dependent (y only) model rate matrix:
## a|0 a|1 b|0 b|1
## a|0 -0.1021188 0.0000000 0.1021188 0.0000000
## a|1 980.0758543 -980.1779731 0.0000000 0.1021188
## b|0 0.1021168 0.0000000 -3285.9026832 3285.8005663
## b|1 0.0000000 0.1021168 3284.3732413 -3284.4753581
##
## Model fit:
## log-likelihood AIC
## independent -40.21542 88.43084
## dependent -26.40669 64.81338
##
## Hypothesis test result:
## likelihood-ratio: 27.61746
## p-value: 1.006804e-06
##
## Model fitting method used was fitMk
plot(fit.y,lwd.by.rate=TRUE)
Invariably, these 'dependent' models seem to fit better than our independent models. In particular:
aic<-setNames(c(fit.xy$independent.AIC,
fit.x$dependent.AIC,fit.y$dependent.AIC,
fit.xy$dependent.AIC),c("ind","dep:x","dep:y",
"dep:xy"))
aic.w(aic)
## ind dep:x dep:y dep:xy
## 0.00000471 0.00000071 0.63334516 0.36664941
suggests that the weight of evidence favors a model in which either y depends on x or in which x & y are interdependent.
But are they?
Let's now examine the pattern of character variation on the tree:
dotTree(tree,X,fsize=0.6,ftype="i",pts=F)
Here we see that rather than showing a consistent relationship, character y
merely seems to change a lot in the red ("b"
) clade for character
x, but not at all in the black ("a"
) clade.
This shows that the model is sensitive to finding a significant result in the absence of a true correlation between the characters, merely because of a heterogeneous rate in one character or the other.
This is essentially the result identified by Maddisson & Fitzjohn (2015).
Written by Liam J. Revell. Last updated 28 July 2018.