Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats

Fann, Sarah Lynn Fri, 12 Feb 2010 12:16:32 -0800

Dear Ecolog readers,

As Brian and others have pointed out, I made a poor choice of words when I used 
the phrase "future changes". Rsq is useful for predicting responses within 
the range of your data, but it is not valid for predicting outside the 
observed range. For example, suppose you have growth data for fish of ages 2, 
6, 7, and 10. Rsq will help you choose a model that most accurately estimates 
the size of an age-3 fish, but it would be invalid to use the same model to 
predict the size of a fish at age 11. I hope this clarifies what I meant by 
"predict".


With regard to AIC, you still have the same "predictive" issues that you would 
with Rsq. Any measure of model appropriateness will be with respect to your 
current dataset. As a measure of the predictive quality of a model, I would 
argue that AIC is very inappropriate. Although it is true that minimizing AIC 
will help select the best variables to describe the dataset, without selecting 
copious numbers of variables, it doesn't describe how well the model generated 
from these parameters "fits" the data. I can't tell you what an AIC of 300 
means with regard to the data, but I know that an Rsq value of .89 means the 
model explains about 89% of the variance in the response. Similarly, a model 
with an AIC of 300 might have an Rsq of .40 or an Rsq of .90; it comes down to 
how the variables AIC selects are being used to make a model/predictive 
equation. Finally, unlike Rsq, the order in which variables enter the model 
affects the AIC tremendously. 
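
To make the point about interpreting a raw AIC value concrete, here is a 
minimal sketch in plain Python. It assumes an ordinary least-squares line with 
Gaussian errors, so that the deviance is -2 * log-likelihood, and it counts the 
residual variance as a parameter. The ages and lengths are made-up numbers for 
illustration, not data from this thread. The same fish measured in cm and in mm 
give the same Rsq but a different AIC, which is one reason an AIC of 300 says 
little on its own.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r2_and_aic(x, y):
    """Return (Rsq, AIC) for a straight-line fit with Gaussian errors."""
    n = len(x)
    a, b = fit_line(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    my = sum(y) / n
    tss = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - rss / tss
    k = 3  # assumption: intercept, slope, and residual variance are counted
    deviance = n * (math.log(2 * math.pi * rss / n) + 1)  # -2 * max log-lik
    return r2, deviance + 2 * k

# Hypothetical data: fish age (years) and length (cm).
ages = [1, 2, 3, 4, 5, 6, 7, 8]
length_cm = [10.2, 14.1, 19.8, 23.5, 27.9, 30.4, 35.2, 37.8]
length_mm = [10.0 * v for v in length_cm]  # same fish, different units
n = len(ages)

r2_cm, aic_cm = r2_and_aic(ages, length_cm)
r2_mm, aic_mm = r2_and_aic(ages, length_mm)

print("Rsq (cm, mm):", round(r2_cm, 4), round(r2_mm, 4))
print("AIC (cm, mm):", round(aic_cm, 2), round(aic_mm, 2))
```

Under these assumptions the two AIC values differ by exactly 2*n*ln(10), so 
only differences in AIC between models fitted to the same response are 
meaningful.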

I hope this clarifies my earlier comments. 

Thanks,

Sarah Fann

"Education is what survives when what has been learnt has been forgotten."

-   Fortune cookie
________________________________________
From: Ecological Society of America: grants, jobs, news 
[[email protected]] On Behalf Of Brian R. Mitchell 
[[email protected]]
Sent: Thursday, February 11, 2010 10:03 PM
To: [email protected]
Subject: Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats

Hello ecolog,

I disagree with the suggestion that maximizing R2 is a good way to
predict future changes to a system... maximizing R2 may produce a
perfect fit to your current data set, but you are fitting to the noise
as well as the signal, and such a model will likely perform poorly with
new data.  I think that if you want to have predictive power, you should
probably still use a parsimonious approach like AIC, since this will
tend to reject covariates that only have a small impact on the model's
predictive power.
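
The argument above can be sketched with a small simulation. This is only an 
illustration under stated assumptions: simulated data with one real covariate 
and eight pure-noise covariates, ordinary least squares solved through the 
normal equations, and Gaussian AIC with the residual variance counted as a 
parameter. All names here are hypothetical. R2 can never decrease as noise 
covariates are added, while AIC charges 2 for each extra parameter.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols_r2_aic(columns, y):
    """Fit y on an intercept plus the given columns; return (R2, AIC)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
              for i in range(n))
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)
    r2 = 1.0 - rss / tss
    # Gaussian deviance (-2 * max log-lik) plus 2 per parameter (betas + variance)
    aic = n * (math.log(2 * math.pi * rss / n) + 1) + 2 * (p + 1)
    return r2, aic

random.seed(1)
n = 30
signal = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * s + random.gauss(0, 1) for s in signal]
noise_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(8)]

# Model k uses the real covariate plus the first k noise covariates.
results = [ols_r2_aic([signal] + noise_cols[:k], y) for k in range(9)]
for k, (r2, aic) in enumerate(results):
    print(f"noise covariates: {k}  R2: {r2:.4f}  AIC: {aic:.2f}")
```

In runs like this the model with the highest R2 is always the largest one, 
whereas AIC will usually, though not always, favor a model with few or no 
noise covariates.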

Brian Mitchell
> Date:    Wed, 10 Feb 2010 16:36:18 -0500
> From:    "Fann, Sarah Lynn" <[email protected]>
> Subject: Re: AIC, data-dredging, and inappropriate stats
>
> Dear ecology,
>
> AIC =  model deviance + 2*(# of parameters).
>
> In essence, AIC is calculated so that the "best" model balances decreasing 
> the deviance of the model from the data (we want this) against keeping the 
> model simple and/or relevant. The deviance will be small if the covariates 
> (explanatory variables) are "good" or if we have a ton of lousy covariates. 
> Thus AIC penalizes excessive covariates by adding 2*(# of parameters), i.e. 
> your Betas, which are estimated for each covariate and covariate interaction.
>
> Whether or not to use AIC, Rsq, or both comes down to the model design, and 
> the results you are after. Do you want to explain the current state of a 
> system and show which covariates are important? Minimize AIC. Do you want to 
> predict future changes in the system? Maximize R2.
>
> This is my view from a Statistics perspective since I haven't studied model 
> selection in a biological setting.
>
> Thank you very much,
>
> Sarah Fann
