On 29.04.2010 15:01, Eric Elguero wrote:
Hi,
a colleague ran a stepwise discriminant analysis
twice in a row and got different results, suggesting
some "stochasticity" in the algorithms involved.
I looked at her data and found that there was a lot
of collinearity, so that I reckoned that maybe "stepclass"
(klaR) cannot find a clear winner when trying to include a
new variable and makes a random choice. Is that true?
Yes, since cross-validation is involved.
If you want stable results, you could try leave-one-out or set a seed.
Anyway, if your variables are collinear, I wonder whether the stepwise approach
is the smartest solution here.
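For illustration, here is a minimal sketch of both suggestions (setting a seed, or using leave-one-out). The poster's data (f.mes, f.gp4) are not available, so iris stands in for them; the seed value 42 is arbitrary, and the names x, grp, fit1, fit2 and fit_loo are made up for this example. It assumes that klaR and MASS are installed and that leave-one-out can be requested by setting the fold argument of stepclass to the number of observations.

```r
library(MASS)   # provides lda()
library(klaR)   # provides stepclass()

# Stand-in data (assumption): iris replaces the poster's f.mes / f.gp4.
x   <- iris[, 1:4]
grp <- iris$Species

# Option 1: reset the RNG to the same state before each call, so that
# the random cross-validation folds are drawn identically both times.
set.seed(42)  # arbitrary seed value
fit1 <- stepclass(x, grp, method = "lda", improvement = 0.01, output = FALSE)

set.seed(42)
fit2 <- stepclass(x, grp, method = "lda", improvement = 0.01, output = FALSE)

# The selected variables should now agree between the two runs.
print(identical(fit1$model, fit2$model))

# Option 2: leave-one-out, i.e. as many folds as observations.
# Every observation is held out exactly once, so the random split
# should no longer influence the estimated correctness rate.
fit_loo <- stepclass(x, grp, method = "lda", improvement = 0.01,
                     fold = nrow(x), output = FALSE)
print(fit_loo$model)
```

Note that a seed only makes a single run repeatable; it does not make the selection any less sensitive to the particular split that seed happens to produce.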
another possibility is that "lda" (from MASS) computes
CV classification rates from a random subsample instead of
using all the data (?) That might be a sensible choice
with a very large sample.
I advised her to run the function several times and
see if a consensus emerges, but that doesn't seem to
be the case, and besides, I would like to know what
really is going on.
Well, it is called cross-validation, which is based on random sampling unless
you use k = n-fold CV (= leave-one-out).
Again, to get reproducible results, you will need to set a seed.
If the results are that unstable: Do you really have a sufficient number
of observations for your classification problem?
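To make the point about random sampling concrete, here is a small base-R sketch of how a k-fold partition is typically drawn. This shows the general idea only; the helper make_folds is hypothetical and klaR's internal partitioning may differ in detail. The values n = 89 and k = 10 are taken from the output quoted further down.

```r
n <- 89   # observations, as reported in the stepclass output
k <- 10   # folds, as reported in the stepclass output

# Hypothetical helper: assign each of the n rows to one of k folds
# by shuffling the fold labels 1..k.
make_folds <- function(n, k) sample(rep(seq_len(k), length.out = n))

# Two unseeded draws will almost always give different partitions,
# hence different cross-validated correctness rates.
folds_a <- make_folds(n, k)
folds_b <- make_folds(n, k)
print(identical(folds_a, folds_b))

# With the same seed the partition is reproduced exactly.
set.seed(1); folds_c <- make_folds(n, k)
set.seed(1); folds_d <- make_folds(n, k)
print(identical(folds_c, folds_d))

# With k = n every observation forms its own fold, so the set of
# train/test splits is the same whatever the shuffling.
loo <- make_folds(n, n)
print(all(table(loo) == 1))
```

With 89 observations in 4 classes, each of the 10 folds holds only 8 or 9 cases, so a different shuffle can easily move the estimated rate by a few percentage points, which is more than the 1% improvement threshold used in the calls below.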
Uwe Ligges
thanks
Eric Elguero
Laboratory Genetics and Evolution of Infectious Diseases,
Team: Genetics and Adaptation of Plasmodium
UMR 2724 CNRS-IRD,
IRD Montpellier,
911 Avenue Agropolis, BP 64501,
34394 Montpellier Cedex 5,
France
f4.U.spDA<- stepclass(f.mes, f.gp4,
"lda",improvement=0.01,prior=rep(0.25,4))
`stepwise classification', using 10-fold cross-validated correctness
rate of method `lda'.
89 observations of 31 variables in 4 classes; direction: both
stop criterion: improvement less than 1%.
correctness rate: 0.58333; in: "X2"; variables (1): X2
correctness rate: 0.66389; in: "X9"; variables (2): X2, X9
correctness rate: 0.69583; in: "X27"; variables (3): X2, X9, X27
 hr.elapsed min.elapsed sec.elapsed
       0.00        0.00       20.77
f4.U.spDA<- stepclass(f.mes, f.gp4,
"lda",improvement=0.01,prior=rep(0.25,4))
`stepwise classification', using 10-fold cross-validated correctness
rate of method `lda'.
89 observations of 31 variables in 4 classes; direction: both
stop criterion: improvement less than 1%.
correctness rate: 0.60556; in: "X2"; variables (1): X2
correctness rate: 0.71806; in: "X6"; variables (2): X2, X6
 hr.elapsed min.elapsed sec.elapsed
       0.00        0.00       15.14
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.