A few general comments about stepwise AIC selection and a suggestion for how to 
select models

a) Apart from the problem that stepwise selection is not guaranteed to find the 
best model, you need a lot of data to avoid overfitting if your model includes 
7 parameters plus interactions (ideally, more than 10 observations per 
estimated parameter).
b) Have a look at Burnham and Anderson's 2002 book on multimodel inference 
(Model Selection and Multimodel Inference) if you want to understand how to use 
AIC properly.
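As a rough illustration of point a), this R snippet counts the coefficients to be estimated and applies the 10-observations-per-parameter rule of thumb to two cases: all two-way interactions among 7 variables, and the 7-variable, 5-interaction model described in the question below. It assumes every predictor is numeric (one coefficient each); a factor would add one coefficient per non-reference level.

```r
# Rule-of-thumb check: about 10 observations per estimated coefficient.
# Assumption: every predictor is numeric, so each term costs one coefficient.
n_vars <- 7
k_all <- 1 + n_vars + choose(n_vars, 2)  # intercept + 7 main effects + all 21 two-way interactions
k_selected <- 1 + 7 + 5                  # intercept + 7 main effects + 5 interactions

obs_wanted <- 10 * c(all_two_way = k_all, selected = k_selected)
obs_wanted  # 290 observations for all_two_way, 130 for selected
```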

What I'm doing for my analysis at the moment (count data of two species, host 
and herbivore):

1) I checked which of my parameters explained the abundance of each species, 
using GLMs and a bootstrapped likelihood-ratio (LR) test to check whether the 
model with the parameter is better than the one without it (one way to deal 
with outliers and extreme values).
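Step 1 can be sketched roughly as follows. This is one possible reading of "bootstrapping of an LR-test", not necessarily the exact procedure used here: rows are resampled with replacement, the LR test for one predictor is repeated on each resample, and the share of resamples with p < 0.05 is recorded. The data, the variable names (abund, temp, moist) and the number of replicates B are hypothetical, and a Poisson GLM is assumed for the counts.

```r
# Sketch: case-resampling bootstrap of a likelihood-ratio test for one predictor.
# Hypothetical data: abund (counts), temp and moist (predictors).
set.seed(1)
n <- 60
d <- data.frame(temp = runif(n, 10, 25), moist = runif(n, 0, 1))
d$abund <- rpois(n, lambda = exp(0.5 + 0.08 * d$temp))  # simulated counts

# LR test of 'temp': model with vs. without the predictor (Poisson GLM)
lr_p <- function(dat) {
  full    <- glm(abund ~ temp + moist, family = poisson, data = dat)
  reduced <- glm(abund ~ moist,        family = poisson, data = dat)
  anova(reduced, full, test = "Chisq")[2, "Pr(>Chi)"]
}

p_obs <- lr_p(d)  # p-value on the original data

# Repeat the test on B data sets resampled row-wise with replacement
B <- 500
p_boot <- replicate(B, lr_p(d[sample(nrow(d), replace = TRUE), ]))

# Share of bootstrap replicates in which 'temp' is significant at the 5% level
prop_sig <- mean(p_boot < 0.05)
c(p_observed = p_obs, prop_significant = prop_sig)
```

A parametric bootstrap of the LR statistic (simulating from the reduced model) would be another reasonable reading; which one suits your data is a judgment call.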

2) Then I built all combinations of those parameters that predicted the two 
species well (p-values < 0.05, and > 95% successful bootstrap replicates).

3) I wrote down all the combined models with decent p-values and calculated 
AICc (AICc corrects AIC for small data sets; it can be used routinely, because 
for very large N, AICc is almost equal to AIC).

(The package glmulti fits all the combination models, and you can set limits on 
the number of parameters, interactions, etc.)
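Without glmulti, the combinations from step 2 can also be enumerated in base R. A minimal sketch, assuming three hypothetical predictors x1, x2 and x3 that passed step 1, main effects only, and a Poisson GLM:

```r
# Sketch: fit every non-empty combination of candidate predictors (base R only).
# Data and predictor names are hypothetical.
set.seed(2)
n <- 50
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rpois(n, lambda = exp(1 + 0.4 * d$x1))

preds <- c("x1", "x2", "x3")

# All subsets of size 1..p, flattened into one list of character vectors
subsets <- unlist(lapply(seq_along(preds),
                         function(k) combn(preds, k, simplify = FALSE)),
                  recursive = FALSE)

# One formula per subset, e.g. "y ~ x1 + x3", then one Poisson GLM per formula
forms <- vapply(subsets, function(s) paste("y ~", paste(s, collapse = " + ")),
                character(1))
fits <- lapply(forms, function(f) glm(as.formula(f), family = poisson, data = d))
names(fits) <- forms

length(fits)  # 2^3 - 1 = 7 candidate models
```

With p predictors this gives 2^p - 1 models (more once interactions are allowed), so the candidate set grows quickly; that is why limits on the number of terms are useful.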

4) I manually calculated the Akaike weight of each well-performing model from 
its AICc. This gives you a good idea of which model is the best and how good it 
is compared with all the other models considered. You can also calculate 
weights for each parameter, which is very useful if several models are equally 
good. In my case, the better models had only one or two parameters, but were 
ecologically meaningful and not just the result of data dredging.
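Steps 3 and 4 boil down to a few formulas: AICc = AIC + 2k(k + 1)/(n - k - 1) for a model with k estimated parameters and n observations; delta_i = AICc_i - min(AICc); Akaike weight w_i = exp(-delta_i/2) / sum_j exp(-delta_j/2); and a per-parameter weight obtained by summing w_i over all models that contain that parameter. A minimal base-R sketch on hypothetical simulated data with a hypothetical candidate set:

```r
# Sketch: AICc, Akaike weights and summed per-predictor weights in base R.
# Data, predictors (x1, x2, x3) and the candidate set are hypothetical.
set.seed(3)
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rpois(n, lambda = exp(1 + 0.5 * d$x1))

cands <- list(
  "y ~ x1"           = glm(y ~ x1,           family = poisson, data = d),
  "y ~ x2"           = glm(y ~ x2,           family = poisson, data = d),
  "y ~ x1 + x2"      = glm(y ~ x1 + x2,      family = poisson, data = d),
  "y ~ x1 + x3"      = glm(y ~ x1 + x3,      family = poisson, data = d),
  "y ~ x1 + x2 + x3" = glm(y ~ x1 + x2 + x3, family = poisson, data = d)
)

# AICc = AIC + 2k(k + 1) / (n - k - 1), k = number of estimated parameters
aicc <- function(m) {
  k <- attr(logLik(m), "df")
  n <- nobs(m)
  AIC(m) + 2 * k * (k + 1) / (n - k - 1)
}

aicc_vals <- vapply(cands, aicc, numeric(1))
delta <- aicc_vals - min(aicc_vals)               # difference to the best model
w <- exp(-delta / 2) / sum(exp(-delta / 2))       # Akaike weights, sum to 1

tab <- data.frame(AICc = round(aicc_vals, 2), delta = round(delta, 2),
                  weight = round(w, 3))
tab[order(tab$delta), ]                           # best model first

# Summed weight per predictor: total weight of the models that contain it
preds <- c("x1", "x2", "x3")
param_w <- vapply(preds, function(p) {
  has_p <- vapply(cands, function(m) p %in% attr(terms(m), "term.labels"),
                  logical(1))
  sum(w[has_p])
}, numeric(1))
param_w
```

Note that summed weights are only comparable between predictors that appear in a similar number of candidate models.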

Hope this helps,

Cheers


Claas Damken
PhD candidate
School of Environment
The University of Auckland | Te Whare Wananga o Tamaki Makaurau
New Zealand


________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of 
r-help-requ...@r-project.org [r-help-requ...@r-project.org]
Sent: Wednesday, 19 September 2012 22:00
To: r-help@r-project.org
Subject: R-help Digest, Vol 115, Issue 19

Today's Topics:

  12. Lowest AIC after stepAIC can be lowered by manual reduction
      of variables (Florian Moser)

------------------------------

Message: 12
Date: Tue, 18 Sep 2012 14:27:34 +0100 (BST)
From: Florian Moser <flose...@yahoo.de>
To: r-help@r-project.org
Subject: [R] Lowest AIC after stepAIC can be lowered by manual
        reduction of variables
Message-ID:
        <1347974854.4978.yahoomailclas...@web28904.mail.ir2.yahoo.com>
Content-Type: text/plain

Hello
I am not really a statistics person, so it's possible I did something completely 
wrong... if so, sorry.
I am trying to find the best GLM (the one with the lowest AIC) for my dataset.
To do so, I ran stepAIC (from the "MASS" package) on my GLM, allowing only 
two-way interactions.
In the output (summary) I got a model with 7 (of 8) variables, 5 interactions 
and AIC = 40.008.
BUT: when I take this model and manually remove further variables one at a time 
(starting with the one with the highest p-value, and removing all interactions 
of a variable before removing the variable itself) until I can't remove any more 
variables because all of them (or their interactions) have a p-value < 0.1, I 
get a model with 4 variables, 2 interactions and an AIC of 33.879.
So my questions: why didn't stepAIC give me the model with AIC = 33.879?
And which model should I consider the best?

For my calculations I used this code:
library(MASS)  # provides stepAIC()
gm1 <- glm(cpi ~ time + tank + ..., data = d1)
gm2 <- stepAIC(gm1)
summary(gm2)
# to get the "best" model -> AIC = 40.008
# afterwards I reduced the model manually, using:
summary(glm(cpi ~ time + tank + ..., data = d1))
# giving me a model with AIC = 33.879

Hope you understand what I did, and that you can help me.
Thanks
Florian
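On the first question: stepAIC() is a greedy search. At each step it tries dropping (and, with direction = "both", adding) single terms, dropping a main effect only once none of its interactions remain, and it stops as soon as no single move lowers the AIC. It can therefore stop at a local minimum, whereas a sequence of manual removals may pass through models with a higher AIC and end at a lower one. It is also worth checking that both models are fitted to the same rows, because AIC values are only comparable on identical data. The sketch below runs stepAIC() with an explicit two-way-interaction scope on hypothetical simulated data (not the d1 above) and compares the result with a hand-specified smaller model; which one has the lower AIC depends on the data.

```r
# Sketch: stepAIC() as a one-term-at-a-time search vs. a hand-reduced model.
# Data and variable names (y, x1..x4) are hypothetical.
library(MASS)  # stepAIC()

set.seed(4)
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 0.8 * d$x1 - 0.5 * d$x2 + 0.6 * d$x1 * d$x2 + rnorm(n)

# Start from all main effects plus all two-way interactions (gaussian, glm()'s default)
full <- glm(y ~ (x1 + x2 + x3 + x4)^2, data = d)

# Greedy search: drop or add one term per step, stop when no single move lowers AIC
sel <- stepAIC(full,
               scope = list(upper = ~ (x1 + x2 + x3 + x4)^2, lower = ~ 1),
               direction = "both", trace = 0)

# A hand-specified smaller model fitted to the same rows
manual <- glm(y ~ x1 * x2, data = d)

c(stepAIC = AIC(sel), manual = AIC(manual))
formula(sel)  # terms retained by stepAIC()
```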





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
