Exercise 2: Introduction to phylogenies in R

This tutorial gives a basic introduction to phylogenies in the R language and statistical computing environment.

R phylogenetics packages

R phylogenetics is built on the contributed packages for phylogenetics in R, and there are many such packages. If you did not install the recommended phylogeny packages, you can begin today by installing a few critical packages, such as ape, phangorn, phytools, and geiger. To ensure that you get the most recent CRAN version of these packages, you will need to have R 3.4.x installed on your computer!

R.version
##                _                           
## platform       x86_64-w64-mingw32          
## arch           x86_64                      
## os             mingw32                     
## system         x86_64, mingw32             
## status                                     
## major          3                           
## minor          5.0                         
## year           2018                        
## month          04                          
## day            23                          
## svn rev        74626                       
## language       R                           
## version.string R version 3.5.0 (2018-04-23)
## nickname       Joy in Playing
install.packages("ape")
install.packages("phangorn")
install.packages("phytools")
install.packages("geiger")

We can verify that we have the same versions of these packages installed by using the base function packageVersion:

packageVersion("ape")
## [1] '5.1'
packageVersion("phangorn")
## [1] '2.4.0'
packageVersion("phytools")
## [1] '0.6.59'
packageVersion("geiger")
## [1] '2.0.6'

I strongly recommend that you use the package versions shown above (or more recent versions) rather than older editions.

Installing automatically from CRAN using install.packages installs not only your target package - but also any libraries on which that package depends, if that package has not yet been installed.

The "phylo" object in R

Now we've installed critical packages (ape, phangorn, phytools, geiger). The most important core package for phylogenies in R is called ape, which stands for Analysis of Phylogenetics and Evolution in R.

## load ape
library(ape)

ape does many different things. To get started let's read a 'toy' example vertebrate tree from a Newick text string:

## read tree from string
text.string<-
    "(((((((cow, pig),whale),(bat,(lemur,human))),(robin,iguana)),coelacanth),gold_fish),shark);"
vert.tree<-read.tree(text=text.string)

We can visualize this phylogeny in R in a number of different ways. Let's plot this phylogeny in three different styles:

plot(vert.tree,no.margin=TRUE,edge.width=2)

plot of chunk unnamed-chunk-6

plot(vert.tree,no.margin=TRUE,edge.width=2,
    type="cladogram")

plot of chunk unnamed-chunk-6

plot(unroot(vert.tree),type="unrooted",no.margin=TRUE,
    lab4ut="axial",edge.width=2)

plot of chunk unnamed-chunk-6

The object created in memory when we simulate or estimate a phylogeny, or read one from an input file, is a list of class "phylo".

Remember, a list is just a customizable object type that can combine different objects of different types. For instance, a list might have a vector of real numbers (with mode "numeric") as its first element; and then a vector of strings (with mode "character") as its second element; and so on. Assigning our tree with a special class, "phylo", is just a convenient way to tell special functions in R how to treat that object.

An object of class "phylo" has at least three parts. These are normally hidden, for instance, just typing the name of your "phylo" object does not give you the structure in memory, as it does for many R objects:

vert.tree
## 
## Phylogenetic tree with 11 tips and 10 internal nodes.
## 
## Tip labels:
##  cow, pig, whale, bat, lemur, human, ...
## 
## Rooted; no branch lengths.

Instead, something called an S3 print method is activate to print to screen a summary of some of the important attributes of that object.

We can, however, reveal the internal structure of our object as follows:

str(vert.tree)
## List of 3
##  $ edge     : int [1:20, 1:2] 12 13 14 15 16 17 18 18 17 16 ...
##  $ Nnode    : int 10
##  $ tip.label: chr [1:11] "cow" "pig" "whale" "bat" ...
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"

To understand the structure of a "phylo" object a little bit more clearly, let's decompose it to its essential components:

library(phytools)
plotTree(vert.tree,offset=1,type="cladogram")
labelnodes(1:(Ntip(vert.tree)+vert.tree$Nnode),
    1:(Ntip(vert.tree)+vert.tree$Nnode),
    interactive=F,cex=0.8)

plot of chunk unnamed-chunk-9

Here I have overlain the 'node numbers' - the indices from the "phylo" object edge which is a matrix containing the starting & ending node indices for each edge:

vert.tree$edge
##       [,1] [,2]
##  [1,]   12   13
##  [2,]   13   14
##  [3,]   14   15
##  [4,]   15   16
##  [5,]   16   17
##  [6,]   17   18
##  [7,]   18    1
##  [8,]   18    2
##  [9,]   17    3
## [10,]   16   19
## [11,]   19    4
## [12,]   19   20
## [13,]   20    5
## [14,]   20    6
## [15,]   15   21
## [16,]   21    7
## [17,]   21    8
## [18,]   14    9
## [19,]   13   10
## [20,]   12   11

We should see: 1) that tree$edge has a number of rows equal to the number of edges (8) in this tree; and (2) that each edge starts and ends with a unique pair of indices. Furthermore, by convention the indices 1 through N for N tips correspond with the tip nodes (species) in the tree.

The other components of importance are the vector tip.label and an integer Nnode which gives the number of interior nodes in the tree:

vert.tree$tip.label
##  [1] "cow"        "pig"        "whale"      "bat"        "lemur"     
##  [6] "human"      "robin"      "iguana"     "coelacanth" "gold_fish" 
## [11] "shark"
vert.tree$Nnode
## [1] 10

An object of class "phylo" also (by definition) has at least one attribute - its class. This is just a value to tell various methods in R what to do with an object of this type. For instance, if we call the generic method plot, R knows to use the method plot.phylo in the R package ape.

An object of class "phylo" can also have other components, the most common of which is edge.length (a vector of class "numeric" containing all the edge lengths). In addition, other elements & attributes can be added for special types of phylogenetic trees.

Writing & reading phylogenetic trees

We can easily write & read trees to & from files, for example if we download the file Anolis.tre:

anolis.tree<-read.tree(file="Anolis.tre")
anolis.tree
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##  ahli, allogus, rubribarbus, imias, sagrei, bremeri, ...
## 
## Rooted; includes branch lengths.
plotTree(anolis.tree,ftype="i",fsize=0.5,lwd=1)

plot of chunk unnamed-chunk-12

This is a tree containing:

Ntip(anolis.tree)
## [1] 100

100 species of lizards in the neotropical lizard genus Anolis.

We can also write trees to file. For instance, we can easily write our vertebrate tree from before to a simple text file in Newick style:

write.tree(vert.tree,file="example.tre")
## this is what our file looks like (you can open it to check)
cat(readLines("example.tre"),sep="\n")
## (((((((cow,pig),whale),(bat,(lemur,human))),(robin,iguana)),coelacanth),gold_fish),shark);

Plotting, & manipulating trees

There are a range of ways in which we can plot trees in R.

For instance, a convenient plotting method for large rooted trees is a circular or 'fan' tree:

plotTree(anolis.tree,type="fan",fsize=0.8,lwd=1,
    ftype="i")

plot of chunk unnamed-chunk-15

It is pretty easy to drop species from the tree, or to extract clades. For instance the anoles from Puerto Rico (in this phylogeny) consist of A. cristatellus, A. cooki, A. poncensis, A. gundlachi, A. pulchellus, A. stratulus, and A. evermanni (which form a clade), as well as A. occultus and A. cuvieri.

First, let's find them:

pr.species<-c("cooki","poncensis",
    "gundlachi","pulchellus","stratulus",
    "krugi","evermanni","occultus","cuvieri",
    "cristatellus")
nodes<-sapply(pr.species,grep,anolis.tree$tip.label)
plotTree(anolis.tree,type="fan",fsize=0.8,lwd=1,
    ftype="i")
add.arrow(anolis.tree,tip=nodes,arrl=0.15,col="blue",
    offset=2)

plot of chunk unnamed-chunk-16

Now let's prune these species from the tree:

anolis.noPR<-drop.tip(anolis.tree,pr.species)
plotTree(anolis.noPR,type="fan",fsize=0.7,lwd=1,
    ftype="i")

plot of chunk unnamed-chunk-17

Or, we can extract the clade that includes all but two of the species:

node<-fastMRCA(anolis.tree,"evermanni",
    "cristatellus")
plot(paintSubTree(anolis.tree,node,"b","a"),
    type="fan",fsize=0.8,lwd=2,
    colors=setNames(c("grey","blue"),c("a","b")),
    ftype="i")
points(get("last_plot.phylo",envir=.PlotPhyloEnv)$xx[node],
    get("last_plot.phylo",envir=.PlotPhyloEnv)$yy[node],
    pch=21,bg="blue",cex=1.5)
arc.cladelabels(anolis.tree,"clade to extract",node,1.35,1.4,
    mark.node=F)

plot of chunk unnamed-chunk-18

pr.clade<-extract.clade(anolis.tree,node)
plotTree(pr.clade,ftype="i")

plot of chunk unnamed-chunk-18

Now, let's prune everything in the tree except these species:

pr.tree<-drop.tip(anolis.tree,
    setdiff(anolis.tree$tip.label,
    pr.species))
plotTree(pr.tree,ftype="i")

plot of chunk unnamed-chunk-19

Finally, we can do it interactively using collapseTree

anolis.pruned<-collapseTree(anolis.tree)
plotTree(anolis.pruned,type="fan",fsize=0.7,lwd=1,
    ftype="i")

Multiple trees

Finally, it is sometimes useful to store multiple phylogenies in a single object. This would be the case, for example, if we had a set of trees in a posterior sample from Bayesian phylogeny inference; or if we wanted to replicate a simulation analysis across a large number of phylogenies.

Multiple trees are stored as an object of class "multiPhylo". This is just a list of objects of class "phylo", with the class attribute "multiPhylo". Many, but not all, functions in ape, phytools, and other R packages can be applied to both "phylo" and "multiPhylo" objects. For instance:

anolis.trees<-c(anolis.tree,anolis.noPR,pr.clade,pr.tree)
print(anolis.trees,details=TRUE)
## 4 phylogenetic trees
## tree 1 : 100 tips
## tree 2 : 90 tips
## tree 3 : 8 tips
## tree 4 : 10 tips
## round the edge lengths of the tree to 1 digits
anolis.trees<-roundBranches(anolis.trees,digits=1)
## write to file
write.tree(anolis.trees,file="example.trees")
## this is what it looks like:
cat(readLines("example.trees"),sep="\n")
## ((((((((ahli:0.1,allogus:0.1):0.1,rubribarbus:0.2):0.3,imias:0.6):0.1,((((sagrei:0.3,(bremeri:0.1,quadriocellifer:0.1):0.1):0.1,ophiolepis:0.3):0.1,mestrei:0.4):0.1,(((jubar:0.1,homolechis:0.1):0.1,confusus:0.2):0,guafe:0.3):0.3):0.2):0.1,((((garmani:0.2,opalinus:0.2):0,grahami:0.2):0.2,valencienni:0.4):0.1,(lineatopus:0.5,reconditus:0.5):0.1):0.3):0.1,(((evermanni:0.2,stratulus:0.2):0.4,(((krugi:0.3,pulchellus:0.3):0.1,(gundlachi:0.4,poncensis:0.4):0.1):0,(cooki:0.4,cristatellus:0.4):0.1):0.1):0.1,(((brevirostris:0.3,(caudalis:0.2,marron:0.2):0.1):0,websteri:0.3):0.1,distichus:0.4):0.3):0.2):0,(((barbouri:0.8,(((alumina:0.3,semilineatus:0.3):0.2,olssoni:0.5):0.3,(etheridgei:0.6,(fowleri:0.4,insolitus:0.4):0.2):0.2):0.1):0.1,((((whitemani:0.3,((haetianus:0.3,breslini:0.3):0.1,((armouri:0.1,cybotes:0.1):0,shrevei:0.2):0.1):0):0.1,(longitibialis:0.3,strahmi:0.3):0.2):0.1,marcanoi:0.5):0.3,((((((baleatus:0,barahonae:0):0.1,ricordii:0.1):0.2,eugenegrahami:0.3):0.1,christophei:0.4):0.1,cuvieri:0.5):0.1,(barbatus:0.1,(porcus:0.1,(chamaeleonides:0.1,guamuhaya:0.1):0):0.1):0.4):0.2):0.1):0.1,((((((((altitudinalis:0.2,oporinus:0.2):0.1,isolepis:0.3):0.3,(allisoni:0.3,porcatus:0.3):0.2):0,(((argillaceus:0.1,centralis:0.1):0,pumilis:0.1):0.2,loysiana:0.4):0.2):0.1,guazuma:0.6):0,((placidus:0.2,sheplani:0.2):0.4,(alayoni:0.4,(angusticeps:0.2,paternus:0.2):0.2):0.2):0.1):0.1,((alutaceus:0.1,inexpectatus:0.1):0.4,(((clivicola:0.3,(cupeyalensis:0.1,cyanopleurus:0.1):0.2):0.1,(alfaroi:0.3,macilentus:0.3):0.2):0,vanidicus:0.5):0.1):0.2):0.1,(argenteolus:0.7,lucius:0.7):0.2):0.1):0):0,(((bartschi:0.5,vermiculatus:0.5):0.2,((((baracoae:0.1,(noblei:0,smallwoodi:0):0):0,luteogularis:0.1):0,equestris:0.1):0.6,(((monticola:0.6,(bahorucoensis:0.4,(dolichocephalus:0.2,hendersoni:0.2):0.2):0.2):0,darlingtoni:0.6):0,(((aliniger:0.2,singularis:0.2):0.1,chlorocyanus:0.3):0.2,coelestinus:0.5):0.1):0.1):0):0.1,occultus:0.9):0.1);
## ((((((((ahli:0.1,allogus:0.1):0.1,rubribarbus:0.2):0.3,imias:0.6):0.1,((((sagrei:0.3,(bremeri:0.1,quadriocellifer:0.1):0.1):0.1,ophiolepis:0.3):0.1,mestrei:0.4):0.1,(((jubar:0.1,homolechis:0.1):0.1,confusus:0.2):0,guafe:0.3):0.3):0.2):0.1,((((garmani:0.2,opalinus:0.2):0,grahami:0.2):0.2,valencienni:0.4):0.1,(lineatopus:0.5,reconditus:0.5):0.1):0.3):0.1,(((brevirostris:0.3,(caudalis:0.2,marron:0.2):0.1):0,websteri:0.3):0.1,distichus:0.4):0.5):0,(((barbouri:0.8,(((alumina:0.3,semilineatus:0.3):0.2,olssoni:0.5):0.3,(etheridgei:0.6,(fowleri:0.4,insolitus:0.4):0.2):0.2):0.1):0.1,((((whitemani:0.3,((haetianus:0.3,breslini:0.3):0.1,((armouri:0.1,cybotes:0.1):0,shrevei:0.2):0.1):0):0.1,(longitibialis:0.3,strahmi:0.3):0.2):0.1,marcanoi:0.5):0.3,(((((baleatus:0,barahonae:0):0.1,ricordii:0.1):0.2,eugenegrahami:0.3):0.1,christophei:0.4):0.2,(barbatus:0.1,(porcus:0.1,(chamaeleonides:0.1,guamuhaya:0.1):0):0.1):0.4):0.2):0.1):0.1,((((((((altitudinalis:0.2,oporinus:0.2):0.1,isolepis:0.3):0.3,(allisoni:0.3,porcatus:0.3):0.2):0,(((argillaceus:0.1,centralis:0.1):0,pumilis:0.1):0.2,loysiana:0.4):0.2):0.1,guazuma:0.6):0,((placidus:0.2,sheplani:0.2):0.4,(alayoni:0.4,(angusticeps:0.2,paternus:0.2):0.2):0.2):0.1):0.1,((alutaceus:0.1,inexpectatus:0.1):0.4,(((clivicola:0.3,(cupeyalensis:0.1,cyanopleurus:0.1):0.2):0.1,(alfaroi:0.3,macilentus:0.3):0.2):0,vanidicus:0.5):0.1):0.2):0.1,(argenteolus:0.7,lucius:0.7):0.2):0.1):0):0,((bartschi:0.5,vermiculatus:0.5):0.2,((((baracoae:0.1,(noblei:0,smallwoodi:0):0):0,luteogularis:0.1):0,equestris:0.1):0.6,(((monticola:0.6,(bahorucoensis:0.4,(dolichocephalus:0.2,hendersoni:0.2):0.2):0.2):0,darlingtoni:0.6):0,(((aliniger:0.2,singularis:0.2):0.1,chlorocyanus:0.3):0.2,coelestinus:0.5):0.1):0.1):0):0.2);
## ((evermanni:0.2,stratulus:0.2):0.4,(((krugi:0.3,pulchellus:0.3):0.1,(gundlachi:0.4,poncensis:0.4):0.1):0,(cooki:0.4,cristatellus:0.4):0.1):0.1);
## ((((evermanni:0.2,stratulus:0.2):0.4,(((krugi:0.3,pulchellus:0.3):0.1,(gundlachi:0.4,poncensis:0.4):0.1):0,(cooki:0.4,cristatellus:0.4):0.1):0.1):0.4,cuvieri:1):0,occultus:1);

Many other tasks are also possible in R phylogenetics. For an overview, read the CRAN Phylogenetics Task View. Also check out my blog: http://blog.phytools.org, or review the PDF manuals of different R packages such as ape, phytools, and phangorn.

Written by Liam J. Revell. Last updated 25 June 2018.