Exercise 2: Introduction to phylogenies in R
This tutorial gives a basic introduction to phylogenies in the R language and statistical computing environment.
R phylogenetics packages
R phylogenetics is built on the contributed packages for phylogenetics in R, and there are many such packages. If you did not install the recommended phylogeny packages, you can begin today by installing a few critical packages, such as ape, phangorn, phytools, and geiger. To ensure that you get the most recent CRAN version of these packages, you will need to have R 3.4.x installed on your computer!
R.version
## _
## platform x86_64-w64-mingw32
## arch x86_64
## os mingw32
## system x86_64, mingw32
## status
## major 3
## minor 5.0
## year 2018
## month 04
## day 23
## svn rev 74626
## language R
## version.string R version 3.5.0 (2018-04-23)
## nickname Joy in Playing
install.packages("ape")
install.packages("phangorn")
install.packages("phytools")
install.packages("geiger")
We can verify that we have the same versions of these packages installed by using
the base function packageVersion
:
packageVersion("ape")
## [1] '5.1'
packageVersion("phangorn")
## [1] '2.4.0'
packageVersion("phytools")
## [1] '0.6.59'
packageVersion("geiger")
## [1] '2.0.6'
I strongly recommend that you use the package versions shown above (or more recent versions) rather than older editions.
Installing automatically from CRAN using install.packages
installs not
only your target package - but also any libraries on which that package depends, if
that package has not yet been installed.
The "phylo"
object in R
Now we've installed critical packages (ape, phangorn, phytools, geiger). The most important core package for phylogenies in R is called ape, which stands for Analysis of Phylogenetics and Evolution in R.
## load ape
library(ape)
ape does many different things. To get started let's read a 'toy' example vertebrate tree from a Newick text string:
## read tree from string
text.string<-
"(((((((cow, pig),whale),(bat,(lemur,human))),(robin,iguana)),coelacanth),gold_fish),shark);"
vert.tree<-read.tree(text=text.string)
We can visualize this phylogeny in R in a number of different ways. Let's plot this phylogeny in three different styles:
plot(vert.tree,no.margin=TRUE,edge.width=2)
plot(vert.tree,no.margin=TRUE,edge.width=2,
type="cladogram")
plot(unroot(vert.tree),type="unrooted",no.margin=TRUE,
lab4ut="axial",edge.width=2)
The object created in memory when we simulate or estimate a phylogeny, or read
one from an input file, is a list of class "phylo"
.
Remember, a list is just a customizable object type that can combine different
objects of different types. For instance, a list might have a vector of real
numbers (with mode "numeric"
) as its first element; and then a vector
of strings (with mode "character"
) as its second element; and so on.
Assigning our tree with a special class, "phylo"
, is just a
convenient way to tell special functions in R how to treat that object.
An object of class "phylo"
has at least three parts. These are normally
hidden, for instance, just typing the name of your "phylo"
object does
not give you the structure in memory, as it does for many R objects:
vert.tree
##
## Phylogenetic tree with 11 tips and 10 internal nodes.
##
## Tip labels:
## cow, pig, whale, bat, lemur, human, ...
##
## Rooted; no branch lengths.
Instead, something called an S3 print
method is activate to print to
screen a summary of some of the important attributes of that object.
We can, however, reveal the internal structure of our object as follows:
str(vert.tree)
## List of 3
## $ edge : int [1:20, 1:2] 12 13 14 15 16 17 18 18 17 16 ...
## $ Nnode : int 10
## $ tip.label: chr [1:11] "cow" "pig" "whale" "bat" ...
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
To understand the structure of a "phylo"
object a little bit more clearly, let's decompose it to its
essential components:
library(phytools)
plotTree(vert.tree,offset=1,type="cladogram")
labelnodes(1:(Ntip(vert.tree)+vert.tree$Nnode),
1:(Ntip(vert.tree)+vert.tree$Nnode),
interactive=F,cex=0.8)
Here I have overlain the 'node numbers' - the indices from the "phylo"
object edge
which
is a matrix containing the starting & ending node indices for each edge:
vert.tree$edge
## [,1] [,2]
## [1,] 12 13
## [2,] 13 14
## [3,] 14 15
## [4,] 15 16
## [5,] 16 17
## [6,] 17 18
## [7,] 18 1
## [8,] 18 2
## [9,] 17 3
## [10,] 16 19
## [11,] 19 4
## [12,] 19 20
## [13,] 20 5
## [14,] 20 6
## [15,] 15 21
## [16,] 21 7
## [17,] 21 8
## [18,] 14 9
## [19,] 13 10
## [20,] 12 11
We should see: 1) that tree$edge
has a number of rows equal to the number of edges (8) in this tree;
and (2) that each edge starts and ends with a unique pair of indices. Furthermore, by convention the indices 1 through
N for N tips correspond with the tip nodes (species) in the tree.
The other components of importance are the vector tip.label
and an integer Nnode
which gives
the number of interior nodes in the tree:
vert.tree$tip.label
## [1] "cow" "pig" "whale" "bat" "lemur"
## [6] "human" "robin" "iguana" "coelacanth" "gold_fish"
## [11] "shark"
vert.tree$Nnode
## [1] 10
An object of class "phylo"
also (by definition) has at least one attribute -
its class. This is just a value to tell various methods in R what to do with an
object of this type. For instance, if we call the generic method plot
, R knows to use the
method plot.phylo
in the R package ape.
An object of class "phylo"
can also have other components, the most
common of which is edge.length
(a vector of class "numeric"
containing all the edge lengths). In addition, other elements & attributes can be added
for special types of phylogenetic trees.
Writing & reading phylogenetic trees
We can easily write & read trees to & from files, for example if we download the file Anolis.tre:
anolis.tree<-read.tree(file="Anolis.tre")
anolis.tree
##
## Phylogenetic tree with 100 tips and 99 internal nodes.
##
## Tip labels:
## ahli, allogus, rubribarbus, imias, sagrei, bremeri, ...
##
## Rooted; includes branch lengths.
plotTree(anolis.tree,ftype="i",fsize=0.5,lwd=1)
This is a tree containing:
Ntip(anolis.tree)
## [1] 100
100 species of lizards in the neotropical lizard genus Anolis.
We can also write trees to file. For instance, we can easily write our vertebrate tree from before to a simple text file in Newick style:
write.tree(vert.tree,file="example.tre")
## this is what our file looks like (you can open it to check)
cat(readLines("example.tre"),sep="\n")
## (((((((cow,pig),whale),(bat,(lemur,human))),(robin,iguana)),coelacanth),gold_fish),shark);
Plotting, & manipulating trees
There are a range of ways in which we can plot trees in R.
For instance, a convenient plotting method for large rooted trees is a circular or 'fan' tree:
plotTree(anolis.tree,type="fan",fsize=0.8,lwd=1,
ftype="i")
It is pretty easy to drop species from the tree, or to extract clades. For instance the anoles from Puerto Rico (in this phylogeny) consist of A. cristatellus, A. cooki, A. poncensis, A. gundlachi, A. pulchellus, A. stratulus, and A. evermanni (which form a clade), as well as A. occultus and A. cuvieri.
First, let's find them:
pr.species<-c("cooki","poncensis",
"gundlachi","pulchellus","stratulus",
"krugi","evermanni","occultus","cuvieri",
"cristatellus")
nodes<-sapply(pr.species,grep,anolis.tree$tip.label)
plotTree(anolis.tree,type="fan",fsize=0.8,lwd=1,
ftype="i")
add.arrow(anolis.tree,tip=nodes,arrl=0.15,col="blue",
offset=2)
Now let's prune these species from the tree:
anolis.noPR<-drop.tip(anolis.tree,pr.species)
plotTree(anolis.noPR,type="fan",fsize=0.7,lwd=1,
ftype="i")
Or, we can extract the clade that includes all but two of the species:
node<-fastMRCA(anolis.tree,"evermanni",
"cristatellus")
plot(paintSubTree(anolis.tree,node,"b","a"),
type="fan",fsize=0.8,lwd=2,
colors=setNames(c("grey","blue"),c("a","b")),
ftype="i")
points(get("last_plot.phylo",envir=.PlotPhyloEnv)$xx[node],
get("last_plot.phylo",envir=.PlotPhyloEnv)$yy[node],
pch=21,bg="blue",cex=1.5)
arc.cladelabels(anolis.tree,"clade to extract",node,1.35,1.4,
mark.node=F)
pr.clade<-extract.clade(anolis.tree,node)
plotTree(pr.clade,ftype="i")
Now, let's prune everything in the tree except these species:
pr.tree<-drop.tip(anolis.tree,
setdiff(anolis.tree$tip.label,
pr.species))
plotTree(pr.tree,ftype="i")
Finally, we can do it interactively using collapseTree
anolis.pruned<-collapseTree(anolis.tree)
plotTree(anolis.pruned,type="fan",fsize=0.7,lwd=1,
ftype="i")
Multiple trees
Finally, it is sometimes useful to store multiple phylogenies in a single object. This would be the case, for example, if we had a set of trees in a posterior sample from Bayesian phylogeny inference; or if we wanted to replicate a simulation analysis across a large number of phylogenies.
Multiple trees are stored as an object of class "multiPhylo"
. This is
just a list of objects of class "phylo"
, with the class attribute
"multiPhylo"
. Many, but not all, functions in ape, phytools, and
other R packages can be applied to both "phylo"
and "multiPhylo"
objects. For instance:
anolis.trees<-c(anolis.tree,anolis.noPR,pr.clade,pr.tree)
print(anolis.trees,details=TRUE)
## 4 phylogenetic trees
## tree 1 : 100 tips
## tree 2 : 90 tips
## tree 3 : 8 tips
## tree 4 : 10 tips
## round the edge lengths of the tree to 1 digits
anolis.trees<-roundBranches(anolis.trees,digits=1)
## write to file
write.tree(anolis.trees,file="example.trees")
## this is what it looks like:
cat(readLines("example.trees"),sep="\n")
## ((((((((ahli:0.1,allogus:0.1):0.1,rubribarbus:0.2):0.3,imias:0.6):0.1,((((sagrei:0.3,(bremeri:0.1,quadriocellifer:0.1):0.1):0.1,ophiolepis:0.3):0.1,mestrei:0.4):0.1,(((jubar:0.1,homolechis:0.1):0.1,confusus:0.2):0,guafe:0.3):0.3):0.2):0.1,((((garmani:0.2,opalinus:0.2):0,grahami:0.2):0.2,valencienni:0.4):0.1,(lineatopus:0.5,reconditus:0.5):0.1):0.3):0.1,(((evermanni:0.2,stratulus:0.2):0.4,(((krugi:0.3,pulchellus:0.3):0.1,(gundlachi:0.4,poncensis:0.4):0.1):0,(cooki:0.4,cristatellus:0.4):0.1):0.1):0.1,(((brevirostris:0.3,(caudalis:0.2,marron:0.2):0.1):0,websteri:0.3):0.1,distichus:0.4):0.3):0.2):0,(((barbouri:0.8,(((alumina:0.3,semilineatus:0.3):0.2,olssoni:0.5):0.3,(etheridgei:0.6,(fowleri:0.4,insolitus:0.4):0.2):0.2):0.1):0.1,((((whitemani:0.3,((haetianus:0.3,breslini:0.3):0.1,((armouri:0.1,cybotes:0.1):0,shrevei:0.2):0.1):0):0.1,(longitibialis:0.3,strahmi:0.3):0.2):0.1,marcanoi:0.5):0.3,((((((baleatus:0,barahonae:0):0.1,ricordii:0.1):0.2,eugenegrahami:0.3):0.1,christophei:0.4):0.1,cuvieri:0.5):0.1,(barbatus:0.1,(porcus:0.1,(chamaeleonides:0.1,guamuhaya:0.1):0):0.1):0.4):0.2):0.1):0.1,((((((((altitudinalis:0.2,oporinus:0.2):0.1,isolepis:0.3):0.3,(allisoni:0.3,porcatus:0.3):0.2):0,(((argillaceus:0.1,centralis:0.1):0,pumilis:0.1):0.2,loysiana:0.4):0.2):0.1,guazuma:0.6):0,((placidus:0.2,sheplani:0.2):0.4,(alayoni:0.4,(angusticeps:0.2,paternus:0.2):0.2):0.2):0.1):0.1,((alutaceus:0.1,inexpectatus:0.1):0.4,(((clivicola:0.3,(cupeyalensis:0.1,cyanopleurus:0.1):0.2):0.1,(alfaroi:0.3,macilentus:0.3):0.2):0,vanidicus:0.5):0.1):0.2):0.1,(argenteolus:0.7,lucius:0.7):0.2):0.1):0):0,(((bartschi:0.5,vermiculatus:0.5):0.2,((((baracoae:0.1,(noblei:0,smallwoodi:0):0):0,luteogularis:0.1):0,equestris:0.1):0.6,(((monticola:0.6,(bahorucoensis:0.4,(dolichocephalus:0.2,hendersoni:0.2):0.2):0.2):0,darlingtoni:0.6):0,(((aliniger:0.2,singularis:0.2):0.1,chlorocyanus:0.3):0.2,coelestinus:0.5):0.1):0.1):0):0.1,occultus:0.9):0.1);
## ((((((((ahli:0.1,allogus:0.1):0.1,rubribarbus:0.2):0.3,imias:0.6):0.1,((((sagrei:0.3,(bremeri:0.1,quadriocellifer:0.1):0.1):0.1,ophiolepis:0.3):0.1,mestrei:0.4):0.1,(((jubar:0.1,homolechis:0.1):0.1,confusus:0.2):0,guafe:0.3):0.3):0.2):0.1,((((garmani:0.2,opalinus:0.2):0,grahami:0.2):0.2,valencienni:0.4):0.1,(lineatopus:0.5,reconditus:0.5):0.1):0.3):0.1,(((brevirostris:0.3,(caudalis:0.2,marron:0.2):0.1):0,websteri:0.3):0.1,distichus:0.4):0.5):0,(((barbouri:0.8,(((alumina:0.3,semilineatus:0.3):0.2,olssoni:0.5):0.3,(etheridgei:0.6,(fowleri:0.4,insolitus:0.4):0.2):0.2):0.1):0.1,((((whitemani:0.3,((haetianus:0.3,breslini:0.3):0.1,((armouri:0.1,cybotes:0.1):0,shrevei:0.2):0.1):0):0.1,(longitibialis:0.3,strahmi:0.3):0.2):0.1,marcanoi:0.5):0.3,(((((baleatus:0,barahonae:0):0.1,ricordii:0.1):0.2,eugenegrahami:0.3):0.1,christophei:0.4):0.2,(barbatus:0.1,(porcus:0.1,(chamaeleonides:0.1,guamuhaya:0.1):0):0.1):0.4):0.2):0.1):0.1,((((((((altitudinalis:0.2,oporinus:0.2):0.1,isolepis:0.3):0.3,(allisoni:0.3,porcatus:0.3):0.2):0,(((argillaceus:0.1,centralis:0.1):0,pumilis:0.1):0.2,loysiana:0.4):0.2):0.1,guazuma:0.6):0,((placidus:0.2,sheplani:0.2):0.4,(alayoni:0.4,(angusticeps:0.2,paternus:0.2):0.2):0.2):0.1):0.1,((alutaceus:0.1,inexpectatus:0.1):0.4,(((clivicola:0.3,(cupeyalensis:0.1,cyanopleurus:0.1):0.2):0.1,(alfaroi:0.3,macilentus:0.3):0.2):0,vanidicus:0.5):0.1):0.2):0.1,(argenteolus:0.7,lucius:0.7):0.2):0.1):0):0,((bartschi:0.5,vermiculatus:0.5):0.2,((((baracoae:0.1,(noblei:0,smallwoodi:0):0):0,luteogularis:0.1):0,equestris:0.1):0.6,(((monticola:0.6,(bahorucoensis:0.4,(dolichocephalus:0.2,hendersoni:0.2):0.2):0.2):0,darlingtoni:0.6):0,(((aliniger:0.2,singularis:0.2):0.1,chlorocyanus:0.3):0.2,coelestinus:0.5):0.1):0.1):0):0.2);
## ((evermanni:0.2,stratulus:0.2):0.4,(((krugi:0.3,pulchellus:0.3):0.1,(gundlachi:0.4,poncensis:0.4):0.1):0,(cooki:0.4,cristatellus:0.4):0.1):0.1);
## ((((evermanni:0.2,stratulus:0.2):0.4,(((krugi:0.3,pulchellus:0.3):0.1,(gundlachi:0.4,poncensis:0.4):0.1):0,(cooki:0.4,cristatellus:0.4):0.1):0.1):0.4,cuvieri:1):0,occultus:1);
Many other tasks are also possible in R phylogenetics. For an overview, read the CRAN Phylogenetics Task View. Also check out my blog: http://blog.phytools.org, or review the PDF manuals of different R packages such as ape, phytools, and phangorn.
Written by Liam J. Revell. Last updated 25 June 2018.