Exercise 4: Phylogenetic generalized least squares regression and phylogenetic generalized ANOVA

First, we need to load the data & tree in R.

As always, we we need certain packages to read the phylogeny & run the analyses.

library(ape)
library(nlme)
library(geiger)
library(phytools)
## Loading required package: maps

Now we have to load the data & tree to memory in R.

You should be able to find the files on the course page:

  1. Barbetdata.csv
  2. BarbetTree.nex
barbets<-read.csv("Barbetdata.csv",header=TRUE,row.names=1)
barbets
##                                          wing    Lnalt     patch   colour
## Calorhamphus_fuliginosus_fuliginosus 4.388257 5.298317  2.000000 1.666667
## Calorhamphus_fuliginosus_hayi        4.427239 5.298317  2.000000 1.666667
## M_armillaris                         4.532599 7.170120  6.333333 4.000000
## M_asiatica_asiatica                  4.611152 6.802395  7.333333 5.000000
## M_asiatica_davisoni                  4.605170 7.003065  6.666667 3.333333
## M_australis_duvaucelli               4.282206 6.620073  9.000000 4.000000
## M_chrysopogon                        4.829113 6.620073  9.000000 5.333333
## M_corvina                            4.824306 7.313220  2.000000 2.000000
## M_eximia                             4.359270 6.620073  7.666667 5.000000
## M_faiostricta                        4.693181 5.298317  5.000000 5.000000
## M_flavifrons                         4.487512 6.907755  5.000000 3.666667
## M_franklinii_auricularis             4.550714 7.313220  8.000000 5.666667
## M_franklinii_franklinii              4.604170 7.313220  7.666667 5.000000
## M_franklinii_ramsayi                 4.570579 7.313220  7.333333 5.000000
## M_haemacephala_indica                4.387014 6.109248  6.666667 4.000000
## M_henricii                           4.557030 6.163315  7.000000 4.666667
## M_incognita                          4.548600 6.802395  9.000000 3.333333
## M_javensis                           4.695925 6.214608  8.666667 4.333333
## M_lagrandieri                        4.898586 7.003065  7.000000 4.333333
## M_lineata_hodgsoni                   4.857484 5.991465  3.666667 2.666667
## M_monticola                          4.616110 6.802395  7.000000 5.000000
## M_mystacophanos                      4.553877 6.745236 10.333333 5.000000
## M_oorti_annamensis                   4.558079 7.313220  9.333333 5.000000
## M_oorti_nuchalis                     4.588024 6.620073  9.666667 5.666667
## M_oorti_oorti                        4.514151 7.170120  9.000000 5.000000
## M_pulcherrima                        4.521789 7.467371  5.666667 3.666667
## M_rafflesi                           4.741448 4.605170  8.333333 5.000000
## M_rubricapillus_malabarica           4.387014 6.109248  6.333333 4.000000
## M_rubricapillus_rubricapillus        4.346399 6.109248  9.000000 5.000000
## M_virens                             4.953712 7.495542  2.000000 2.000000
## M_viridis                            4.587006 6.214608  5.666667 3.000000
## M_zeylanica                          4.682131 5.991465  2.000000 2.333333
## Psilopogon_pyrolophus                4.773224 7.130899 10.000000 6.333333
##                                       Frequency      Length Lnote
## Calorhamphus_fuliginosus_fuliginosus  20.468894   3.1207387 0.260
## Calorhamphus_fuliginosus_hayi         22.483670   3.1371727 0.230
## M_armillaris                          -7.135924 -11.9001412 0.030
## M_asiatica_asiatica                  -10.153448   0.3671016 0.025
## M_asiatica_davisoni                   -9.700025   0.4492719 0.030
## M_australis_duvaucelli                 1.206501  -2.5198843 0.030
## M_chrysopogon                        -17.961279   0.1493287 0.100
## M_corvina                            -16.777077   0.8625655 0.115
## M_eximia                              -2.308277   3.2467332 0.030
## M_faiostricta                        -13.550036   0.3944917 0.045
## M_flavifrons                          -5.997081   1.7302251 0.140
## M_franklinii_auricularis             -10.305768   1.7466592 0.140
## M_franklinii_franklinii               -8.885098   1.7411811 0.135
## M_franklinii_ramsayi                  -8.648623   1.7685712 0.115
## M_haemacephala_indica                -16.782037   3.2248211 0.070
## M_henricii                            -9.201584  -6.4335091 0.125
## M_incognita                           -8.166673   0.3616236 0.040
## M_javensis                           -17.204273  -5.6678766 0.050
## M_lagrandieri                         -5.910024   3.0440464 0.400
## M_lineata_hodgsoni                   -12.746571   0.3287555 0.070
## M_monticola                          -16.093079  -7.7148614 0.060
## M_mystacophanos                      -15.948289  -9.0316506 0.060
## M_oorti_annamensis                    -9.654623  -0.1503694 0.030
## M_oorti_nuchalis                     -11.342388  -6.2060987 0.045
## M_oorti_oorti                        -11.091419   0.7530051 0.140
## M_pulcherrima                         -7.183282  -5.5981665 0.300
## M_rafflesi                           -22.419596 -16.9434720 0.050
## M_rubricapillus_malabarica           -16.255444   3.2357771 0.050
## M_rubricapillus_rubricapillus        -17.699099  -0.6955755 0.030
## M_virens                              -6.799948   2.9892662 0.500
## M_viridis                             -8.780492  -0.0736771 0.040
## M_zeylanica                          -13.327782   0.4273598 0.060
## Psilopogon_pyrolophus                  5.990051 -10.1022207 0.300

These data are from a study in which the purpose was to identify factors that were associated with the evolucion of characteristics of song in an group of Asian bird species called “barbets”.

btree<-read.nexus("BarbetTree.nex")

One useful step that we haven’t covered so far is to ensure that all the species in the data are in the tree & vice versa.

Remeber that if there are any spelling differences, even small, then the tree & dataset will not coincide.

To establish that we have the same species in our data as in the tree, or vice versa, we can use a function in the geiger package called name.check.

obj<-name.check(btree,barbets)
obj
## $tree_not_data
## [1] "M_asiatica_chersonesus"    "M_australis_australis"    
## [3] "M_haemacephala_celestinoi" "M_haemacephala_delica"    
## [5] "M_haemacephala_intermedia" "M_haemacephala_rosea"     
## [7] "M_lineata_lineata"         "M_oorti_faber"            
## [9] "M_oorti_sini"             
## 
## $data_not_tree
## character(0)

An alternative for this can be:

#check if all species in the tree are in the database
setdiff(btree$tip.label, row.names(barbets))
## [1] "M_australis_australis"     "M_haemacephala_intermedia"
## [3] "M_haemacephala_delica"     "M_lineata_lineata"        
## [5] "M_haemacephala_rosea"      "M_asiatica_chersonesus"   
## [7] "M_haemacephala_celestinoi" "M_oorti_faber"            
## [9] "M_oorti_sini"
#check if all species in the database are in the tree
setdiff(row.names(barbets),btree$tip.label)
## character(0)

We should see that there are 9 species in the tree for which we do not have data. We have to prune these species from the tree to proceed with the analysis. We can do this using the function drop.tip as we learned:

btree.cut<-drop.tip(btree,obj$tree_not_data)

Now the data & the tree should coincide in both number of species and names. We can verify this using name.check:

name.check(btree.cut,barbets)
## [1] "OK"

Now that we are sure that the tree and dataset match, we can start to explore the phylogenetic GLS.

There are two primary packages that can be used to conduct PGLS: ape (with nlme) and caper. There are differences between the two packages in how they work and the information each package and corresponding function returns.

First, we will explore phylogenetic GLS in ape.

Phylogenetic GLS is basically a linear model in which the covariance (correlation) structure between species is permitted to match that expected under a Brownian motion process* of evolution on the tree. (*Or other processes.) Consequently, the first step is to define this covariance structure. We do this as follows:

bm<-corBrownian(1,btree.cut)
bm
## Uninitialized correlation structure of class corBrownian

We have defined a variance-covariance structure based on the model of Brownian evolution that we learned earlier.

The first model we will fit is for a simple analysis investigating the relationship between altitude at which a species is found and the length of the note in its song.

model1<-gls(Lnote~Lnalt, data=barbets, correlation=bm)
plot(Lnote~Lnalt,data=barbets,cex=1.5,pch=21,bg="grey")
abline(model1,lwd=2,col="darkgrey",lty="dashed")

summary(model1)
## Generalized least squares fit by REML
##   Model: Lnote ~ Lnalt 
##   Data: barbets 
##         AIC     BIC   logLik
##   -68.33596 -64.034 37.16798
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## 
## Coefficients:
##                   Value  Std.Error   t-value p-value
## (Intercept) -0.17975456 0.14940859 -1.203107  0.2380
## Lnalt        0.05475373 0.02162382  2.532103  0.0166
## 
##  Correlation: 
##       (Intr)
## Lnalt -0.879
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.4790545 -1.0293796 -0.6849539 -0.4535629  2.0893357 
## 
## Residual standard error: 0.1289145 
## Degrees of freedom: 33 total; 31 residual

Look at the information that this object contains. It should be very similar (in fact, almost identical) to what we see after performing a standard, OLS linear regression.

Note that unlike contrasts, though we can readily compute the residuals from a fitted PGLS model, these residuals will be phylogenetically correlated:

residuals(model1)
## Calorhamphus_fuliginosus_fuliginosus        Calorhamphus_fuliginosus_hayi 
##                           0.14965190                           0.11965190 
##                         M_armillaris                  M_asiatica_asiatica 
##                          -0.18283625                          -0.16770195 
##                  M_asiatica_davisoni               M_australis_duvaucelli 
##                          -0.17368942                          -0.15271916 
##                        M_chrysopogon                            M_corvina 
##                          -0.08271916                          -0.10567156 
##                             M_eximia                        M_faiostricta 
##                          -0.15271916                          -0.06534810 
##                         M_flavifrons             M_franklinii_auricularis 
##                          -0.05847083                          -0.08067156 
##              M_franklinii_franklinii                 M_franklinii_ramsayi 
##                          -0.08567156                          -0.10567156 
##                M_haemacephala_indica                           M_henricii 
##                          -0.08474955                          -0.03270993 
##                          M_incognita                           M_javensis 
##                          -0.15270195                          -0.11051843 
##                        M_lagrandieri                   M_lineata_hodgsoni 
##                           0.19631058                          -0.07830049 
##                          M_monticola                      M_mystacophanos 
##                          -0.13270195                          -0.12957231 
##                   M_oorti_annamensis                     M_oorti_nuchalis 
##                          -0.19067156                          -0.13771916 
##                        M_oorti_oorti                        M_pulcherrima 
##                          -0.07283625                           0.07088812 
##                           M_rafflesi           M_rubricapillus_malabarica 
##                          -0.02239570                          -0.10474955 
##        M_rubricapillus_rubricapillus                             M_virens 
##                          -0.12474955                           0.26934566 
##                            M_viridis                          M_zeylanica 
##                          -0.12051843                          -0.08830049 
##                Psilopogon_pyrolophus 
##                           0.08931123 
## attr(,"std")
##  [1] 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145
##  [8] 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145
## [15] 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145
## [22] 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145
## [29] 0.1289145 0.1289145 0.1289145 0.1289145 0.1289145
## attr(,"label")
## [1] "Residuals"
## we'll learn more about phylogenetic signal later
phylosig(btree.cut,residuals(model1))
## [1] 1.472683

It is also possible to conduct this analysis while relaxing the assumption of Brownian motion. The most simple generalization of the Brownian model is one with one additional parameter to scale the expected covariance under pairs of species. This model is called the λ model of Pagel (Pagel’s λ). When λ = 0 the covariance between species is zero and this corresponds to a non-phylogenetic regression. By contrast, when &lambda = 1, the evolution of the residual error is Brownian.

Let’s try, then, to repeat our analysis with the λ model.

model2<-gls(Lnote~Lnalt,data=barbets,correlation=corPagel(1,btree.cut))
summary(model2)
## Generalized least squares fit by REML
##   Model: Lnote ~ Lnalt 
##   Data: barbets 
##         AIC       BIC   logLik
##   -67.82689 -62.09094 37.91345
## 
## Correlation Structure: corPagel
##  Formula: ~1 
##  Parameter estimate(s):
##   lambda 
## 1.015045 
## 
## Coefficients:
##                   Value  Std.Error   t-value p-value
## (Intercept) -0.16751498 0.15009041 -1.116094  0.2730
## Lnalt        0.05278484 0.02146686  2.458899  0.0197
## 
##  Correlation: 
##       (Intr)
## Lnalt -0.869
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.4063402 -0.9813780 -0.6620451 -0.4260508  2.0281622 
## 
## Residual standard error: 0.1340445 
## Degrees of freedom: 33 total; 31 residual

What value of λ did you obtain? Did the value of the slope change between the two models?

We can also conduct multiple regression with the same framework. Let’s now look at the relationship between body size & various different parameters of song.

model3<-gls(Lnote~Lnalt+wing, data=barbets, correlation=corPagel(1, btree.cut))
summary(model3)
## Generalized least squares fit by REML
##   Model: Lnote ~ Lnalt + wing 
##   Data: barbets 
##        AIC       BIC   logLik
##   -68.5887 -61.58271 39.29435
## 
## Correlation Structure: corPagel
##  Formula: ~1 
##  Parameter estimate(s):
##   lambda 
## 1.018194 
## 
## Coefficients:
##                  Value Std.Error   t-value p-value
## (Intercept) -1.4726160 0.5303303 -2.776790  0.0094
## Lnalt        0.0665875 0.0208823  3.188697  0.0033
## wing         0.2723970 0.1062074  2.564766  0.0156
## 
##  Correlation: 
##       (Intr) Lnalt 
## Lnalt -0.487       
## wing  -0.964  0.276
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.7663313 -1.3894968 -0.9822076 -0.5451497  1.4419755 
## 
## Residual standard error: 0.1279261 
## Degrees of freedom: 33 total; 30 residual

Now let’s try to repeat these analyses with caper. This package funcions in a manner that it is a little bit different. We need to combine our data and tree into an object with a special class: “comparative.data”. In addition to the tree, this object contains a data frame in which one column contains the names of the species. The easiest way for us to get this data matrix is just to re-read our dataset from file and change the value of the argument row.names as follows:

library(caper)
## Loading required package: MASS
## Loading required package: mvtnorm
barbets2<-read.csv("Barbetdata.csv",header=TRUE)
comp.data<-comparative.data(btree.cut, barbets2,
    names.col="Species",vcv.dim=2,warn.dropped=TRUE)
model4<-pgls(Lnote~Lnalt,data=comp.data)
summary(model4)
## 
## Call:
## pgls(formula = Lnote ~ Lnalt, data = comp.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.04816 -0.01198  0.01042  0.02527  0.04437 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [Fix]  : 1.000
## delta  [Fix]  : 1.000
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.179755   0.149409 -1.2031  0.23804  
## Lnalt        0.054754   0.021624  2.5321  0.01662 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02551 on 31 degrees of freedom
## Multiple R-squared: 0.1714,  Adjusted R-squared: 0.1446 
## F-statistic: 6.412 on 1 and 31 DF,  p-value: 0.01662
model5<-pgls(Lnote~Lnalt,data=comp.data,lambda="ML")
summary(model5)
## 
## Call:
## pgls(formula = Lnote ~ Lnalt, data = comp.data, lambda = "ML")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.04816 -0.01198  0.01042  0.02527  0.04437 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [ ML]  : 1.000
##    lower bound : 0.000, p = 1.3578e-08
##    upper bound : 1.000, p = 1    
##    95.0% CI   : (0.952, NA)
## delta  [Fix]  : 1.000
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.179755   0.149409 -1.2031  0.23804  
## Lnalt        0.054754   0.021624  2.5321  0.01662 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02551 on 31 degrees of freedom
## Multiple R-squared: 0.1714,  Adjusted R-squared: 0.1446 
## F-statistic: 6.412 on 1 and 31 DF,  p-value: 0.01662
model6<-pgls(Lnote~Lnalt+wing,data=comp.data,lambda="ML")
summary(model6)
## 
## Call:
## pgls(formula = Lnote ~ Lnalt + wing, data = comp.data, lambda = "ML")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.03564 -0.01143  0.00644  0.01945  0.04477 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [ ML]  : 1.000
##    lower bound : 0.000, p = 1.5e-08
##    upper bound : 1.000, p = 1    
##    95.0% CI   : (0.940, NA)
## delta  [Fix]  : 1.000
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -1.338053   0.547467 -2.4441 0.020617 * 
## Lnalt        0.067619   0.021240  3.1836 0.003378 **
## wing         0.240871   0.110005  2.1896 0.036464 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02408 on 30 degrees of freedom
## Multiple R-squared: 0.2856,  Adjusted R-squared: 0.2379 
## F-statistic: 5.995 on 2 and 30 DF,  p-value: 0.006449
model7<-pgls(patch~wing,data=comp.data,lambda="ML")
summary(model7)
## 
## Call:
## pgls(formula = patch ~ wing, data = comp.data, lambda = "ML")
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0708 -0.2744  0.0189  0.5531  1.1698 
## 
## Branch length transformations:
## 
## kappa  [Fix]  : 1.000
## lambda [ ML]  : 0.752
##    lower bound : 0.000, p = 0.017467
##    upper bound : 1.000, p = 0.019576
##    95.0% CI   : (0.183, 0.978)
## delta  [Fix]  : 1.000
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  35.5609    14.4246  2.4653  0.01943 *
## wing         -6.6440     3.1915 -2.0818  0.04571 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6115 on 31 degrees of freedom
## Multiple R-squared: 0.1227,  Adjusted R-squared: 0.09435 
## F-statistic: 4.334 on 1 and 31 DF,  p-value: 0.04571

An interesting feature of caper is that it also permits us to plot the residuals of the model.

First we need to divide the plotting window in 4 smaller windows to visualize all plots at once, then we can plot the residuals:

par(mfrow=c(2,2))
plot(model7)

Also, we can easily visualize, for instance, a likelihood surface for our parameters. Here is the likelihood surface for the parameter λ:

lm.lk<-pgls.profile(model7,which="lambda")
plot(lm.lk)

Written by Alejandro Gonzalez-Voyer & Liam J. Revell. Updated 25 June 2018.