>> Andy, I'll explain why I am asking. I probably should have
>> done it in the beginning:
>> I am asking not in order to figure out how to do it. I am
>> asking in order to figure out something that was done around
>> November 01, 2008.
>> Back then, a piece of code was run where from the object of
>> randomForest(.... importance=T...) the importances
>> ($importance) were extracted (just by referring to
>> $importance) and the first column was used.
>> Do you happen to know what they were back then? Standardized or not?
>
> The change coincided with the introduction of the importanceSD component, due
> to the change in how the importance is measured. The "importance" component
> is just mean(d[i]), and importanceSD is sd(d[i])/sqrt(ntree). The
> importance() function by default (scale=TRUE) does the normalization, and
> that's what you should use. Leo found that this normalization will greatly
> reduce the "bias" due to different number of possible splits in different
> predictors.
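
For reference, the relationship described in the quoted text can be sketched in base R without fitting a forest. This is only an illustration under stated assumptions: the per-tree values d are simulated here (in a real forest they would be the per-tree increases in out-of-bag error after permuting one predictor), the names d, raw, se and scaled are made up for the example, and R's sd() may differ slightly from whatever variance estimator the package uses internally.

```r
# Minimal sketch, assuming d[i] is the per-tree importance measure for ONE
# predictor. The values are simulated, not taken from a real randomForest fit.
set.seed(1)
ntree <- 500
d <- rnorm(ntree, mean = 1.4, sd = 1.8)  # hypothetical per-tree values

raw    <- mean(d)               # analogue of the unscaled $importance entry
se     <- sd(d) / sqrt(ntree)   # analogue of the $importanceSD entry
scaled <- raw / se              # analogue of importance(rf, scale = TRUE)

print(c(raw = raw, se = se, scaled = scaled))
```

Dividing the unscaled value by its standard error is the same operation as dividing $importance by $importanceSD by hand, which is why the two approaches should agree.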

Actually, it looks like if one extracts incorrectly (by looking just at
$importance), then one gets unscaled results. Hope it was the same in 2008.

I've just run an example randomForest for a case with 6 predictors
(importance = T). My randomForest object is "rftest". Below are some results.

Looking at importances the way it was done in November 2008:

    as.data.frame(rftest$importance)[1]

I am getting:

         %IncMSE
    v1 1.3900833
    v2 1.2219338
    v3 0.6337521
    v4 1.4101760
    v5 1.4474130
    v6 0.7583074

Extracting the way you recommended, but asking for unscaled results:

    importance(rftest, scale=F)

I am getting exactly the same results as above:

         %IncMSE IncNodePurity
    v1 1.3900833     147.31267
    v2 1.2219338     147.51669
    v3 0.6337521      97.11210
    v4 1.4101760     149.48934
    v5 1.4474130     149.61458
    v6 0.7583074      97.74933

Now, I am extracting scaled importances:

    importance(rftest, scale=T)

I am getting:

        %IncMSE IncNodePurity
    v1 16.97155     147.31267
    v2 17.04288     147.51669
    v3 10.19135      97.11210
    v4 18.22732     149.48934
    v5 18.36879     149.61458
    v6 10.46555      97.74933

This is the same as what I get when I do this the way it was done in 2008:

    as.data.frame(rftest$importance)[1]/as.data.frame(rftest$importanceSD)

Resulting in:

        %IncMSE
    v1 16.97155
    v2 17.04288
    v3 10.19135
    v4 18.22732
    v5 18.36879
    v6 10.46555

Dimitri