Hi R-sig-phylo readers,
I have lately been thinking about an issue regarding model selection in
trait evolutionary models and was wondering if anyone on the list had any
insight into this question:
It is now commonplace for researchers to use some model selection criterion
such as AIC/AICc/BIC to select a model of trait evolution. In their book,
Burnham and Anderson discuss how the derivation of AIC only holds if the
number of data points is much larger than the number of parameters (they
suggest, roughly, that n/k > 40). If this is not true, they provide a
small-sample correction for the AIC (AICc), which explicitly takes into
account the number of observations. Similarly, BIC includes the number of
observations in its formulation.
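
For concreteness, here is a minimal Python sketch of the three criteria in their usual textbook form, just to show where n enters. The function and argument names are my own and purely illustrative; log_lik is the maximized log-likelihood, k the number of estimated parameters, and n whatever we decide counts as the number of observations.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: no dependence on n."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """Small-sample corrected AIC; the correction term depends on n."""
    if n - k - 1 <= 0:
        # The correction term is undefined (or negative) in this case.
        raise ValueError("AICc requires n > k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion; the penalty grows with log(n)."""
    return k * math.log(n) - 2 * log_lik
```

Note that AIC itself never sees n, whereas the AICc correction and the BIC penalty both change with it, which is why the definition of n matters.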
My question is: how many observations do we have when we compare trait
evolutionary models? People tend to use the number of tips (taxa) for
which we have trait values. However, this may not be technically accurate.
First, of course, both the branch lengths and the tip values factor into
the likelihood equations, so it seems sensible that both should somehow be
counted as observations. Second, the trait values we observe are of course
not independent (that is the whole reason we are using a phylogeny in the
first place!). It is unclear whether or how this fact should factor into our
calculation of n. I know that in phylogenetics, when people do model
selection for the model of sequence evolution, they use the number of sites
in the alignment, though I am not sure there is a clear justification for
this either. I was just wondering what people thought about this. Boettiger
et al. (2012) showed that the choice of the evolutionary model for
moderately sized phylogenies is very different when using AIC vs AICc, so I
think this may be worth some serious consideration.
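
To illustrate why the choice of n can matter, here is a small Python sketch with made-up numbers. The log-likelihoods, parameter counts, and candidate values of n below are hypothetical, chosen only for illustration; they are not taken from Boettiger et al. or from any real dataset. It compares a 2-parameter model (labelled "BM") against a 3-parameter model (labelled "OU") under AICc for several candidate definitions of n.

```python
def aicc(log_lik, k, n):
    """Small-sample corrected AIC for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical fits: (maximized log-likelihood, number of parameters).
# These values are invented for illustration only.
fits = {
    "BM": (-50.0, 2),
    "OU": (-48.5, 3),
}


def preferred_model(n):
    """Return the name of the model with the lowest AICc for a given n."""
    scores = {name: aicc(ll, k, n) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)


# Hypothetical candidate values of n (e.g. some discounted "effective"
# count, the number of tips, or a larger count including other data).
for n in (10, 20, 40):
    print(n, preferred_model(n))
```

With these invented numbers the preferred model switches between the smallest and the larger candidate values of n, so the ranking is not robust to how we count observations.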
Any thoughts?
cheers,
matt
_______________________________________________
R-sig-phylo mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo