Exercise 3: Phylogenetically independent contrasts

In this tutorial we will learn to apply the independent contrasts method of Felsenstein (1985) for estimating the evolutionary correlation between characters.

Felsenstein (1985) identified (and, to a large extent, solved) a problem that had previously recognized, but that was underappreciated: and that is, that the data for species cannot be treated as independent data points from the point of view of statistical analysis.

Fitting a linear regression model with contrasts in R

To learn how to use the contrasts method to fit linear models in R, we'll first need to learn some basics in linear model.

Here, I will load some data that we have already seen from lizards in the neotropical clade Anolis.

The data files we will use can be downloaded here:

  1. anole.data.csv.
  2. Anolis.tre.
obj<-read.csv("anole.data.csv",row.names=1)
obj
##                     SVL      HL     HLL     FLL     LAM      TL
## ahli            4.03913 2.88266 3.96202 3.34498 2.86620 4.50400
## alayoni         3.81570 2.70212 3.27950 2.80245 3.07527 4.07265
## alfaroi         3.52665 2.37816 3.30542 2.48366 2.73387 4.41601
## aliniger        4.03656 2.89884 3.64623 3.15908 3.15677 4.54173
## allisoni        4.37539 3.35896 3.96069 3.44620 3.23921 5.05911
## allogus         4.04014 2.86103 3.94018 3.33829 2.80827 4.52189
## altitudinalis   3.84299 2.85273 3.25665 2.88466 3.19846 4.16762
## alumina         3.58894 2.41783 3.44101 2.63466 2.69425 4.66676
## alutaceus       3.55489 2.43405 3.35110 2.56994 2.78245 4.55473
## angusticeps     3.78860 2.69018 3.23424 2.74548 2.86502 4.17223
## argenteolus     3.97131 2.75342 3.85752 3.26076 3.25025 4.75503
## argillaceus     3.75787 2.61359 3.35404 2.89988 3.02318 4.37048
## armouri         4.12168 3.06894 3.98845 3.37523 2.86689 4.71407
## bahorucoensis   3.82745 2.77540 3.69687 2.91831 2.80896 4.76335
## baleatus        5.05306 3.87847 4.73633 4.24528 3.44588 5.70632
## baracoae        5.04278 3.92857 4.72552 4.19824 3.69051 5.78458
## barahonae       5.07696 3.88092 4.76629 4.25025 3.43950 5.69561
## barbatus        5.00395 4.07397 4.51634 4.19162 3.39410 5.03053
## barbouri        3.66393 2.46801 3.45810 2.84025 2.41364 4.55342
## bartschi        4.28055 3.09878 4.18102 3.56553 3.28801 5.06832
## bremeri         4.11337 2.86044 3.90039 3.30585 2.97009 4.72996
## breslini        4.05111 3.00184 3.94958 3.33550 2.74677 4.82870
## brevirostris    3.87415 2.63920 3.66806 3.19585 3.03143 4.34161
## caudalis        3.91174 2.69233 3.64868 3.19965 2.94333 4.36260
## centralis       3.69794 2.52205 3.27249 2.80047 2.87894 4.38583
## chamaeleonides  5.04235 4.12856 4.52661 4.18213 3.51925 5.03155
## chlorocyanus    4.27545 3.07128 3.92347 3.37616 3.32593 4.97128
## christophei     3.88465 2.69607 3.74315 3.24156 2.98497 4.52612
## clivicola       3.75873 2.62919 3.59003 2.91615 2.85538 4.60138
## coelestinus     4.29797 3.12764 3.96332 3.41641 3.20854 5.04332
## confusus        3.93844 2.75838 3.76970 3.18221 2.85608 4.42371
## cooki           4.09154 2.93022 3.92244 3.30786 3.01479 4.89025
## cristatellus    4.18982 3.02450 4.01065 3.45393 3.08046 4.76218
## cupeyalensis    3.46201 2.31221 3.17127 2.43147 2.56376 4.52332
## cuvieri         4.87501 3.78908 4.69174 4.13141 3.39324 5.65807
## cyanopleurus    3.63016 2.46678 3.48687 2.67912 2.72007 4.74826
## cybotes         4.21098 3.16044 4.08094 3.50367 2.91083 4.85735
## darlingtoni     4.30204 3.21286 3.80977 3.33861 3.13549 4.49981
## distichus       3.92880 2.71507 3.74769 3.28819 3.03557 4.33953
## dolichocephalus 3.90855 2.88824 3.71678 2.89803 2.94351 4.87903
## equestris       5.11399 3.97656 4.75764 4.26757 3.72262 5.75012
## etheridgei      3.65799 2.50607 3.68383 2.89028 2.76044 4.51655
## eugenegrahami   4.12850 2.81780 4.08143 3.49302 3.22573 4.86316
## evermanni       4.16561 3.00815 3.95661 3.43032 3.31310 4.76766
## fowleri         4.28878 3.09388 4.07720 3.53726 2.97009 5.30472
## garmani         4.76947 3.60137 4.51390 3.97805 3.30958 5.52986
## grahami         4.15427 3.04139 3.92669 3.38041 3.28017 4.81155
## guafe           3.87746 2.67639 3.70675 3.15604 2.88914 4.35295
## guamuhaya       5.03695 4.11969 4.58159 4.14393 3.59647 5.15329
## guazuma         3.76388 2.65309 3.05933 2.58081 2.93307 3.72491
## gundlachi       4.18810 3.05589 4.10680 3.48066 2.79925 4.88040
## haetianus       4.31654 3.29590 4.22252 3.61242 2.93190 5.03960
## hendersoni      3.85983 2.81146 3.69235 2.89748 2.90119 4.86071
## homolechis      4.03281 2.83492 3.84678 3.29391 2.93592 4.52981
## imias           4.09969 2.85293 3.98565 3.41402 2.94375 4.65242
## inexpectatus    3.53744 2.42414 3.32925 2.46873 2.74432 4.60069
## insolitus       3.80047 2.66088 3.30379 2.75910 2.75087 3.93131
## isolepis        3.65709 2.60933 3.10660 2.67816 3.00959 3.84291
## jubar           3.95260 2.76726 3.75630 3.21232 2.91083 4.41338
## krugi           3.88650 2.75824 3.74280 3.05050 2.92595 4.89789
## lineatopus      4.12861 3.08489 4.00563 3.40064 2.86179 4.83357
## longitibialis   4.24210 3.13074 4.16673 3.59074 2.85046 4.97288
## loysiana        3.70124 2.55961 3.29042 2.86581 2.90001 4.07280
## lucius          4.19891 3.01269 4.08725 3.54050 3.31943 4.89860
## luteogularis    5.10109 3.95631 4.76912 4.27415 3.69997 5.73405
## macilentus      3.71576 2.50716 3.50014 2.69124 2.70694 4.67887
## marcanoi        4.07948 2.98856 3.93748 3.33499 2.93827 4.81486
## marron          3.83181 2.66263 3.62530 3.15137 2.92108 4.28998
## mestrei         3.98715 2.78904 3.84586 3.25122 2.89939 4.51303
## monticola       3.77061 2.61602 3.77539 2.99206 2.85608 4.71892
## noblei          5.08347 3.98972 4.77654 4.26465 3.74571 5.81288
## occultus        3.66305 2.52932 2.94088 2.49856 2.75722 3.64910
## olssoni         3.79390 2.53732 3.62697 2.82685 2.68045 4.93759
## opalinus        3.83838 2.67525 3.61507 3.07685 3.03601 4.41582
## ophiolepis      3.63796 2.43599 3.33231 2.72045 2.61274 4.49759
## oporinus        3.84567 2.71668 3.26919 2.87864 3.04452 4.14313
## paternus        3.80296 2.68938 3.35633 2.82450 2.99523 4.38717
## placidus        3.77397 2.62394 3.03639 2.66671 2.79536 3.84734
## poncensis       3.82038 2.63844 3.54908 2.90001 2.63391 4.78582
## porcatus        4.25899 3.23143 3.83813 3.29191 3.35154 4.94983
## porcus          5.03803 4.10583 4.51046 4.14064 3.50572 5.20247
## pulchellus      3.79902 2.72121 3.55535 2.90369 2.85477 4.76540
## pumilis         3.46686 2.28442 3.03187 2.59615 2.81442 4.13262
## quadriocellifer 3.90162 2.69606 3.70555 3.15652 2.84395 4.41491
## reconditus      4.48261 3.35907 4.41143 3.79420 3.05076 5.23530
## ricordii        5.01396 3.84360 4.72415 4.20138 3.41388 5.65557
## rubribarbus     4.07847 2.89425 3.96135 3.35641 2.86751 4.56108
## sagrei          4.06716 2.83515 3.85786 3.24267 2.91872 4.77603
## semilineatus    3.69663 2.56225 3.56689 2.71825 2.70627 4.81083
## sheplani        3.68292 2.49486 2.85877 2.45531 2.71918 3.74943
## shrevei         3.98300 2.91355 3.81600 3.20757 2.76989 4.65644
## singularis      4.05800 2.91768 3.68946 3.17729 3.14975 4.59542
## smallwoodi      5.03510 3.95492 4.75021 4.19821 3.73134 5.74107
## strahmi         4.27427 3.14616 4.21885 3.64446 2.99069 4.94861
## stratulus       3.86988 2.71712 3.57823 3.09671 3.09993 4.36548
## valencienni     4.32152 3.19328 3.83121 3.38065 3.20445 4.61083
## vanidicus       3.62621 2.49031 3.31618 2.53310 2.57858 4.54037
## vermiculatus    4.80285 3.65602 4.61175 3.92550 3.30124 5.55271
## websteri        3.91655 2.69056 3.65170 3.19690 2.97354 4.31559
## whitemani       4.09748 3.04389 3.97375 3.36546 2.78226 4.83677

Now we can fit a model in which hindlimb length (HL) varies as a function of overall body size (SVL).

Without taking phylogeny into account, these characters do seem to be correlated (as one might expect):

plot(HL~SVL,data=obj,
    xlab="log(SVL cm)",
    ylab="log(HL cm)",pch=21,bg="grey",
    cex=1.5)

plot of chunk unnamed-chunk-2

Now let's fit our OLS regression model to see:

fit.ols<-lm(HL~SVL,data=obj)
fit.ols
## 
## Call:
## lm(formula = HL ~ SVL, data = obj)
## 
## Coefficients:
## (Intercept)          SVL  
##      -1.356        1.054
summary(fit.ols)
## 
## Call:
## lm(formula = HL ~ SVL, data = obj)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.177180 -0.050404 -0.004079  0.036935  0.170580 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.35557    0.06543  -20.72   <2e-16 ***
## SVL          1.05378    0.01588   66.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07003 on 98 degrees of freedom
## Multiple R-squared:  0.9782, Adjusted R-squared:  0.978 
## F-statistic:  4403 on 1 and 98 DF,  p-value: < 2.2e-16
plot(HL~SVL,data=obj,
    xlab="log(SVL cm)",
    ylab="log(HL cm)",pch=21,bg="grey",
    cex=1.5)
abline(fit.ols,lwd=2,lty="dashed",col="red")

plot of chunk unnamed-chunk-3

However, we cannot forget that when our data are phylogenetic, the assumption independent and identical distribution of the residual error does not hold. Consequently, we need to take the phylogeny into account. One way to do this is by using PICs:

library(ape)
anolis.tree<-read.tree("Anolis.tre")
hl<-setNames(obj[,"HL"],rownames(obj))
svl<-setNames(obj[,"SVL"],rownames(obj))
pic.hl<-pic(hl,anolis.tree)
pic.svl<-pic(svl,anolis.tree)
fit.pic<-lm(pic.hl~pic.svl+0)
fit.pic
## 
## Call:
## lm(formula = pic.hl ~ pic.svl + 0)
## 
## Coefficients:
## pic.svl  
##   1.031
summary(fit.pic)
## 
## Call:
## lm(formula = pic.hl ~ pic.svl + 0)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.271892 -0.031462  0.001936  0.043760  0.234712 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## pic.svl  1.03120    0.01902   54.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07017 on 98 degrees of freedom
## Multiple R-squared:  0.9677, Adjusted R-squared:  0.9674 
## F-statistic:  2941 on 1 and 98 DF,  p-value: < 2.2e-16
plot(pic.hl~pic.svl,xlab="PICs for log(SVL)",
    ylab="PICs for log(HL)",bg="grey",
    cex=1.5,pch=21)
abline(fit.pic,lwd=2,lty="dashed",col="red")

plot of chunk unnamed-chunk-4

So we can see that in this case, there is not a huge effect of taking phylogeny into consideration.

Exploring the effect of ignoring the phylogeny in statistical analyses

Although in our previous example taking the phylogeny into account (or not) was of relatively little consequence. However, it is easy to imagine (& simulate) conditions under which taking the phylogeny into account when fitting a regression model can be much more important.

Let's do just that.

First load phytools:

## load phytools
library(phytools)

Next, I will set the seed; however, it will be more interesting if each of us use different seeds so that we might obtain different results:

set.seed(21) ## set your own seed - just choose an integer
## simulate a birth-death tree with relatively high extinction
tree<-NULL
while(is.null(tree)) 
    tree<-pbtree(n=100,b=1,d=0.6,extant.only=T)
plotTree(tree,ftype="off")

plot of chunk unnamed-chunk-6

## simulate uncorrelated Brownian evolution
x<-fastBM(tree)
y<-fastBM(tree)
par(mar=c(5.1,4.1,2.1,2.1))
plot(x,y,cex=1.5,pch=21,bg="grey")
fit.ols<-lm(y~x)
abline(fit.ols,lwd=2,lty="dashed",col="red")

plot of chunk unnamed-chunk-6

fit.ols
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      0.6084       0.2717
summary(fit.ols)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6293 -1.7604 -0.1977  1.6363  3.9555 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.60838    0.29285   2.077   0.0404 *  
## x            0.27167    0.06541   4.153    7e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.947 on 98 degrees of freedom
## Multiple R-squared:  0.1497, Adjusted R-squared:  0.141 
## F-statistic: 17.25 on 1 and 98 DF,  p-value: 7.002e-05

Remember, these data were simulated in the absence of an evolutionary correlation between x & y!

We can see from this example that it is not difficult for phylogeny to induce a type I error.

Here it is because clusters of closely related taxa have highly similar phenotypes. In other words, they are not independent data points about the evolutionary process for x and y on the tree:

## this is a projection of the tree into morphospace
phylomorphospace(tree,cbind(x,y),label="off",node.size=c(0,0))
points(x,y,pch=21,bg="grey",cex=1.5)
abline(fit.ols,lwd=2,lty="dashed",col="red")

plot of chunk unnamed-chunk-7

Now let's see if by using Felsenstein's (1985) algorithm to substitute the contrasts between species (& nodes) for the original values we can resolve our type I error in this case:

ix<-pic(x,tree)
iy<-pic(y,tree)
fit.pic<-lm(iy~ix+0)
fit.pic
## 
## Call:
## lm(formula = iy ~ ix + 0)
## 
## Coefficients:
##       ix  
## -0.02307
plot(ix,iy,cex=1.5,pch=21,bg="grey",
    xlab="phylogenetically independent contrasts for x",
    ylab="phylogenetically independent contrasts for y")
abline(fit.pic,lwd=2,lty="dashed",col="red")
## add axes
abline(h=0,lty="dotted",col="grey")
abline(v=0,lty="dotted",col="grey")

plot of chunk unnamed-chunk-8

summary(fit.pic)
## 
## Call:
## lm(formula = iy ~ ix + 0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98240 -0.81763 -0.03328  0.72463  2.83386 
## 
## Coefficients:
##    Estimate Std. Error t value Pr(>|t|)
## ix -0.02307    0.11502  -0.201    0.841
## 
## Residual standard error: 1.027 on 98 degrees of freedom
## Multiple R-squared:  0.0004105,  Adjusted R-squared:  -0.009789 
## F-statistic: 0.04025 on 1 and 98 DF,  p-value: 0.8414

This is not an example of a “real” relationship that has been removed with contrasts! Remember, we know our data were simulated without a genuine evolutionary relationship between x & y. The relationship that we measure in the OLS case is a spurious relationship driven by the phylogeny.

Written by Liam J. Revell. Last updated 26 June 2018