From: Dimitri Liakhovitski
> >> Andy, I'll explain why I am asking. I probably should have done it in
> >> the beginning:
> >> I am asking not in order to figure out how to do it. I am asking in
> >> order to figure something that was done around November 01, 2008.
> >> Back then, a piece of code was run where from the object of
> >> randomForest(.... importance=T...) the importances
> >> ($importance) were extracted (just by referring to
> >> $importance) and the first column was used.
> >> Do you happen to know what they were back then? Standardized or not?
> >
> > The change coincided with the introduction of the importanceSD
> > component, due to the change in how the importance is measured.
> > The "importance" component are just mean(d[i]), and importanceSD
> > are sd(d[i])/sqrt(ntree). The importance() function by default
> > (scale=TRUE) does the normalization, and that's what you should use.
> > Leo found that this normalization will greatly reduce the "bias" due
> > to different number of possible splits in different predictors.
>
> Actually, it looks like if one extracts incorrectly (by
> looking just at $importance) - then one gets unscaled
> results. Hope it was the same in 2008.

Yes. The NEWS file (what you see when you type rfNews()) shows the
following for version 4.3-0:

    * The `importance' component of randomForest object has been
      changed: The permutation-based measures are not divided by their
      `standard errors'. Instead, the `standard errors' are stored in
      the `importanceSD' component. One should use the importance()
      extractor function rather than something like rf.obj$importance
      for extracting the importance measures.

and version 4.3-0 is dated 2004-07-07.

Andy

> I've just run an example randomForest for a case with 6
> predictors (importance = T). My randomForest object is "rftest".
> Below are some results:
>
> Looking at importances the way it was done in November 2008:
> as.data.frame(rftest$importance)[1]
> I am getting:
>
>      %IncMSE
> v1 1.3900833
> v2 1.2219338
> v3 0.6337521
> v4 1.4101760
> v5 1.4474130
> v6 0.7583074
>
> Extracting as you recommended one should - looking for unscaled
> results: importance(rftest, scale=F)
> I am getting exactly the same results as above:
>
>      %IncMSE IncNodePurity
> v1 1.3900833     147.31267
> v2 1.2219338     147.51669
> v3 0.6337521      97.11210
> v4 1.4101760     149.48934
> v5 1.4474130     149.61458
> v6 0.7583074      97.74933
>
> Now, I am extracting scaled importances: importance(rftest, scale=T)
> I am getting:
>
>     %IncMSE IncNodePurity
> v1 16.97155     147.31267
> v2 17.04288     147.51669
> v3 10.19135      97.11210
> v4 18.22732     149.48934
> v5 18.36879     149.61458
> v6 10.46555      97.74933
>
> This is the same as what I get when I do this the way it was done in
> 2008:
> as.data.frame(rftest$importance)[1]/as.data.frame(rftest$importanceSD)
> Resulting in:
>
>     %IncMSE
> v1 16.97155
> v2 17.04288
> v3 10.19135
> v4 18.22732
> v5 18.36879
> v6 10.46555
>
> Dimitri
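
For anyone who wants to repeat the comparison discussed in this thread, here is a minimal sketch. It assumes a regression forest fitted with importance = TRUE; the mtcars data, the seed, and ntree = 500 are illustrative choices only and are not the data or settings behind the numbers quoted above. It checks that the first column of $importance matches importance(..., scale = FALSE), and that dividing it by $importanceSD matches importance(..., scale = TRUE).

```r
# Sketch only: the dataset, seed, and ntree are assumptions for illustration.
library(randomForest)

set.seed(1)
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# Direct slot access: the unscaled permutation importance (%IncMSE column).
raw <- rf$importance[, "%IncMSE"]

# Extractor function without scaling; type = 1 selects the permutation measure.
unscaled <- importance(rf, type = 1, scale = FALSE)[, 1]

# Extractor function with scaling (the default behaviour).
scaled <- importance(rf, type = 1, scale = TRUE)[, 1]

# Manual scaling: divide the raw values by their stored 'standard errors'.
manual <- raw / rf$importanceSD

# Show the four versions side by side, then compare them numerically.
print(cbind(raw, unscaled, scaled, manual))
print(all.equal(unname(raw), unname(unscaled)))
print(all.equal(unname(scaled), unname(manual)))
```

The exact values will differ from run to run and from data set to data set; only the agreement between the pairs of columns is the point of the check.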