On 12-05-02 12:29 PM, Matt Pennell wrote:
> Hi R-sig-phylo readers,
>
> I have lately been thinking about an issue regarding model
> selection in trait evolutionary models and was wondering if anyone
> on the list had any insight into this question:
>
> It is now commonplace for researchers to use some model selection
> criterion such as AIC/AICc/BIC to select a model of trait
> evolution. In their book, Burnham and Anderson discuss how the
> derivation of AIC only holds if the number of data points is much
> larger than the number of parameters (they suggest, roughly, that
> n/k > 40). If this is not true, they provide a small-sample-size
> correction for the AIC (AICc) which explicitly takes into account
> the number of observations. Similarly, BIC includes the number of
> observations in its formulation.
>
> My question is: how many observations do we have when we compare
> trait evolutionary models? People tend to use the number of tips of
> taxa for which we have trait values. However, this may not be
> technically accurate. First, of course, both the branch lengths and
> the tip values factor into the likelihood equations, so it seems
> sensible that these are both somehow included as observations.
> Second, the trait values we observe are of course not independent
> (that is the whole reason we are using a phylogeny in the first
> place!!). It is unclear whether/how this fact should factor into
> our calculation of n. I know that in phylogenetics, when people
> do model selection for the model of sequence evolution, they use
> the number of sites in the alignment, though I am not sure there is
> a clear justification for this either. I was just wondering what
> people thought about this. Boettiger et al. (2012) showed that the
> choice of the evolutionary model for moderately sized phylogenies
> is very different when using AIC vs AICc, so I think this may be
> worth some serious consideration.
>
> Any thoughts?
>
> cheers, matt
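
To make the role of n concrete, here is a minimal Python sketch of the three criteria using their standard textbook forms. The two models ("BM" with k = 2 and "OU" with k = 3), their log-likelihoods, and the candidate values of n are all hypothetical numbers chosen for illustration; they are not taken from Boettiger et al. or from any real fit.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fits: name -> (log-likelihood, number of parameters).
fits = {"BM": (-50.0, 2), "OU": (-48.8, 3)}


def preferred(criterion, n=None):
    """Return the model name with the lowest score under `criterion`."""
    def score(name):
        log_lik, k = fits[name]
        return criterion(log_lik, k) if n is None else criterion(log_lik, k, n)
    return min(fits, key=score)


if __name__ == "__main__":
    print("AIC prefers:", preferred(aic))
    # Same likelihoods, different assumed "number of observations".
    for n in (20, 200):
        print(f"n={n}: AICc prefers {preferred(aicc, n)}, "
              f"BIC prefers {preferred(bic, n)}")
```

With these made-up numbers the AICc ranking of the two models depends on which n is plugged in, while the likelihoods stay fixed, which is why the definition of n matters.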

Extremely interesting, extremely giant can of worms. Over on the
mixed model side of the world people have been discussing this for
years ...

* "effective number of observations" is probably not always a
  precisely defined concept (it may depend on what you're trying to
  use the number for)

* it may depend on the scale at which you're defining the 'best'
  model. In particular, with both the conditional AIC defined by
  Vaida and Blanchard (I think the ref. is 2005), and with the
  deviance information criterion (Spiegelhalter et al), both of which
  are attempting to measure 'effective number of parameters' at some
  scale, the correct definition depends on whether you are trying to
  maximize predictive accuracy at the scale of individual units
  (taxa) or at the scale of the population (e.g. predictions for
  as-yet-unmeasured taxa, or for the expected effect of a change in
  some covariate applied to a randomly sampled taxon) -- in DIC this
  is referred to as the "level of focus". There is a nice blog post
  by Bob O'Hara on the topic.

* Haven't looked at Boettiger et al 2012, but we should be reminded
  that the AICc was derived in a very particular context (linear
  models) and has been extrapolated *far* beyond that context -- it's
  reasonable as a rule of thumb, but we also shouldn't be surprised
  if it fails sometimes (Shane Richards has an Ecology paper where he
  shows that it can do poorly in a GLM context)

  Ben Bolker

_______________________________________________
R-sig-phylo mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
