Dear Ecolog readers,

As Brian and others have pointed out, I made a poor choice of words when I used the phrase "future changes". Rsq is powerful for predicting responses within your range of data, but is completely invalid for predicting outside the observed range of data. For example, suppose you have growth data for fish of ages 2, 6, 7, and 10. Rsq will help you choose the model that most accurately estimates the size of an age 3 fish, but it would be invalid to use the same model to predict the size of a fish at age 11. I hope this clarifies what I meant by "predict".
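As a rough illustration of the within-range versus out-of-range point, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the saturating "true" growth curve, its constants, and the choice of a straight-line model are invented, and only the ages 2, 6, 7, 10, 3 and 11 come from the example above.

```python
import math

def true_length(age):
    # Hypothetical growth curve (assumption): length levels off near 100.
    return 100.0 * (1.0 - math.exp(-0.3 * age))

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Observed ages from the example; lengths come from the made-up curve.
ages = [2, 6, 7, 10]
lengths = [true_length(a) for a in ages]

b0, b1 = fit_line(ages, lengths)

def predict(age):
    return b0 + b1 * age

# Rsq of the straight line on the observed data.
mean_len = sum(lengths) / len(lengths)
rss = sum((y - predict(a)) ** 2 for a, y in zip(ages, lengths))
tss = sum((y - mean_len) ** 2 for y in lengths)
r2 = 1.0 - rss / tss

# Absolute error inside the observed age range (age 3) and outside it (age 11).
err_inside = abs(predict(3) - true_length(3))
err_outside = abs(predict(11) - true_length(11))

print(f"Rsq on observed data: {r2:.3f}")
print(f"error at age 3 (inside range):   {err_inside:.2f}")
print(f"error at age 11 (outside range): {err_outside:.2f}")
```

Under these assumptions the line has a high Rsq on the observed ages, yet its error at age 11 is several times larger than at age 3, because nothing in the observed data constrains the model beyond age 10.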
With regard to AIC, you still have the same "predictive" issues that you would with Rsq. Any measure of model appropriateness is relative to your current dataset. As a measure of the predictive quality of a model, I would argue that AIC is very inappropriate. Although it is true that minimizing AIC will help select the best variables to describe the dataset without selecting an excessive number of variables, it doesn't describe how well the model generated from these parameters "fits" the data. I can't tell you what an AIC of 300 means with regard to the data, but I know that an Rsq value of 0.89 means the model explains about 89% of the variance in the data. Similarly, a model with an AIC of 300 might have an Rsq of 0.40 or an Rsq of 0.90; it comes down to how the variables chosen by AIC are used to build a model/predictive equation. Finally, unlike Rsq, the order in which variables enter the model affects the AIC tremendously.

I hope this clarifies my earlier comments.

Thanks,
Sarah Fann

"Education is what survives when what has been learnt has been forgotten." - Fortune cookie

________________________________________
From: Ecological Society of America: grants, jobs, news [[email protected]] On Behalf Of Brian R. Mitchell [[email protected]]
Sent: Thursday, February 11, 2010 10:03 PM
To: [email protected]
Subject: Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats

Hello ecolog,

I disagree with the suggestion that maximizing R2 is a good way to predict future changes to a system... maximizing R2 may produce a perfect fit to your current data set, but you are fitting to the noise as well as the signal, and such a model will likely perform poorly with new data. I think that if you want to have predictive power, you should probably still use a parsimonious approach like AIC, since this will tend to reject covariates that only have a small impact on the model's predictive power.
Brian Mitchell

> Date: Wed, 10 Feb 2010 16:36:18 -0500
> From: "Fann, Sarah Lynn" <[email protected]>
> Subject: Re: AIC, data-dredging, and inappropriate stats
>
> Dear ecology,
>
> AIC = model deviance + 2*(# of parameters).
>
> In essence, AIC is calculated to favor the model that "best" balances
> decreasing the deviance of the model from the data (we want this) against
> keeping the model simple and/or relevant. The deviance will be small if the
> covariates (explanatory variables) are "good" or if we have a ton of lousy
> covariates. Thus AIC penalizes excessive covariates by adding 2*(# of
> parameters), i.e. your Betas, which are estimated for each covariate and
> covariate interaction.
>
> Whether to use AIC, Rsq, or both comes down to the model design and the
> results you are after. Do you want to explain the current state of a
> system and show which covariates are important? Minimize AIC. Do you want to
> predict future changes in the system? Maximize R2.
>
> This is my view from a Statistics perspective, since I haven't studied model
> selection in a biological setting.
>
> Thank you very much,
>
> Sarah Fann
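To make the quoted formula concrete, the following is a minimal Python sketch of "AIC = deviance + 2*(# of parameters)" for a least-squares fit. It rests on several assumptions not stated in the thread: the deviance is taken to be -2 times the maximized Gaussian log-likelihood, the error variance is counted as one extra parameter, and the data and function names are invented for illustration. It also shows why a raw AIC value is hard to interpret on its own: rescaling the response changes AIC but leaves Rsq untouched.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def residual_ss(y, fitted):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def gaussian_aic(rss, n, n_coef):
    """AIC = -2*logLik + 2*k, with Gaussian errors and variance at its MLE.

    k = regression coefficients + 1 for the error variance (assumption).
    """
    k = n_coef + 1
    neg2_loglik = n * (math.log(2 * math.pi * rss / n) + 1)
    return neg2_loglik + 2 * k

def summarize(x, y):
    """Return (AIC of intercept-only model, AIC of line, Rsq of line)."""
    n = len(y)
    mean_y = sum(y) / n
    tss = residual_ss(y, [mean_y] * n)      # intercept-only residuals
    b0, b1 = fit_line(x, y)
    rss = residual_ss(y, [b0 + b1 * xi for xi in x])
    return gaussian_aic(tss, n, 1), gaussian_aic(rss, n, 2), 1 - rss / tss

# Made-up data, roughly linear.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

aic_null, aic_line, r2 = summarize(x, y)

# Same data with the response in different units (multiplied by 10).
y_scaled = [10 * yi for yi in y]
aic_null_s, aic_line_s, r2_s = summarize(x, y_scaled)

print(f"original units: AIC null={aic_null:.2f}, AIC line={aic_line:.2f}, Rsq={r2:.4f}")
print(f"rescaled units: AIC null={aic_null_s:.2f}, AIC line={aic_line_s:.2f}, Rsq={r2_s:.4f}")
```

In this sketch the line has the lower AIC in both unit systems, so the ranking of models is preserved, while the absolute AIC values shift by a constant that depends only on the rescaling and the sample size. That is consistent with reading AIC as a relative measure for comparing models fitted to the same data, and Rsq as a unit-free description of fit.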
