Re: [R] Cross-validation for parameter selection (glm/logit)

JLucke Fri, 02 Apr 2010 08:18:14 -0700

Jay
Unless I have misunderstood some statistical subtleties, you can use the 
AIC in place of actual cross-validation, as the AIC is asymptotically 
equivalent to leave-one-out cross-validation when each model is fitted by 
maximum likelihood (see Stone, 1977, cited below).
Joe


Stone, M.
An asymptotic equivalence of choice of model by cross-validation and 
Akaike's criterion
Journal of the Royal Statistical Society. Series B (Methodological), 1977, 
39, 44-47
Abstract: A logarithmic assessment of the performance of a predicting 
density is found to lead to asymptotic equivalence of choice of model by 
cross-validation and Akaike's criterion, when maximum likelihood 
estimation is used within each model. 
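
As a rough illustration of the equivalence described in the abstract, the sketch below compares the AIC of a logistic glm() fit with -2 times the leave-one-out cross-validated log-likelihood. The simulated data, the variable names (x1, x2, y) and the helper loo_deviance() are assumptions made for this example only; they are not from the original thread. The equivalence is asymptotic, so the two numbers should be close but not identical.

```r
# Simulated data (hypothetical): y depends on x1 only, x2 is noise.
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + dat$x1))

# Hypothetical helper: -2 * sum of log predictive densities, where each
# observation is predicted from a model fitted without it.
loo_deviance <- function(formula, data) {
  resp <- all.vars(formula)[1]            # name of the response variable
  ll <- vapply(seq_len(nrow(data)), function(i) {
    fit <- glm(formula, family = binomial, data = data[-i, ])
    p   <- as.numeric(predict(fit, newdata = data[i, , drop = FALSE],
                              type = "response"))
    dbinom(data[[resp]][i], size = 1, prob = p, log = TRUE)
  }, numeric(1))
  -2 * sum(ll)
}

fit_small <- glm(y ~ x1,      family = binomial, data = dat)
fit_full  <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Side-by-side comparison of the two criteria for both candidate models.
comparison <- rbind(
  small = c(AIC = AIC(fit_small), LOO = loo_deviance(y ~ x1, dat)),
  full  = c(AIC = AIC(fit_full),  LOO = loo_deviance(y ~ x1 + x2, dat))
)
print(comparison)
```

The leave-one-out loop refits the model n times, which is why the AIC is attractive as a shortcut for large datasets.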





Jay <josip.2...@gmail.com> 
Sent by: r-help-boun...@r-project.org
04/02/2010 09:14 AM

To: r-help@r-project.org
Subject: [R] Cross-validation for parameter selection (glm/logit)

If my aim is to select a good subset of parameters for my final logit
model built using glm(), what is the best way to cross-validate the
results so that they are reliable?

Let's say that I have a large dataset of thousands of observations. I
split this data into two groups, one that I use for training and
another for validation. First I use the training set to build a model,
and then I run stepAIC() with a forward-backward search. BUT, if I base
my parameter selection purely on this result, I suppose it will be
somewhat skewed due to the one-time data split (I use only one training
dataset).

What is the correct way to perform this variable selection? And are
there readily available packages for this?
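
One way to see how much a stepwise result depends on a single data split is to repeat the search on resampled data and count how often each variable is chosen. The sketch below does this with stepAIC() from the MASS package on bootstrap resamples. It is only an illustration under stated assumptions: the simulated data, the candidate names x1 to x4, and the choice of 50 resamples are all made up for the example.

```r
library(MASS)  # provides stepAIC()

# Simulated data (hypothetical): only x1 and x2 truly affect y.
set.seed(42)
n   <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$x1 - 0.8 * dat$x2))

candidates <- c("x1", "x2", "x3", "x4")
upper <- reformulate(candidates)          # ~ x1 + x2 + x3 + x4
B <- 50                                   # number of bootstrap resamples
picked <- matrix(FALSE, nrow = B, ncol = length(candidates),
                 dimnames = list(NULL, candidates))

for (b in seq_len(B)) {
  boot_dat <- dat[sample(n, replace = TRUE), ]
  null_fit <- glm(y ~ 1, family = binomial, data = boot_dat)
  # Forward-backward search starting from the intercept-only model.
  sel <- stepAIC(null_fit, scope = list(lower = ~ 1, upper = upper),
                 direction = "both", trace = 0)
  picked[b, ] <- candidates %in% attr(terms(sel), "term.labels")
}

# Proportion of resamples in which each candidate was selected.
selection_freq <- colMeans(picked)
print(selection_freq)
```

Variables that are selected in nearly every resample are less likely to be artifacts of one particular split than variables that come and go.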

Similarly, when I have my final parameter set, how should I go about
making the final assessment of the model's predictive ability? CV? What
package?
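
For the final assessment, one readily available option is cv.glm() from the boot package, which performs K-fold cross-validation for a fitted glm. The sketch below is a minimal example under assumptions: the data are simulated, the model y ~ x1 + x2 stands in for whatever final parameter set was chosen, and the 0.5 classification threshold in the cost function is an arbitrary choice for illustration.

```r
library(boot)  # provides cv.glm()

# Simulated data (hypothetical) standing in for the real dataset.
set.seed(7)
n   <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x1 - 0.8 * dat$x2))

final_fit <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Cost function: misclassification rate at a 0.5 probability threshold.
misclass <- function(obs, pred) mean(abs(obs - pred) > 0.5)

# 10-fold cross-validation; delta[1] is the raw estimate of prediction
# error, delta[2] is a bias-adjusted version.
cv_err <- cv.glm(dat, final_fit, cost = misclass, K = 10)$delta
print(cv_err)
```

Note that this estimate is optimistic if the variables were selected on the same data. To get an honest figure, the selection step would need to be repeated inside each fold, or the error measured on data that played no part in the selection.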


Thank you in advance,
Jay

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

