I've run the function randomForest with importance=TRUE. All my variables (predictors and the dependent variable) are numeric:

rf <- randomForest(formula, data=mydata, importance=TRUE, etc.)
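For reproducibility, here is a minimal sketch of the kind of call I mean. The simulated data frame, the response name y, and the seed are made up for illustration and are not my real data; the sketch only assumes the randomForest package and its importance() extractor, whose type argument selects the permutation-based (1) or node-impurity-based (2) measure.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical data: 9 numeric predictors (named V1..V9 by as.data.frame)
# and a numeric response that depends mostly on V4.
mydata <- as.data.frame(matrix(rnorm(200 * 9), ncol = 9))
mydata$y <- 2 * mydata$V4 + rnorm(200)

# Regression forest with permutation importance switched on
rf <- randomForest(y ~ ., data = mydata, importance = TRUE)

# Column labels of the stored importance matrix
colnames(rf$importance)

# Each measure on its own: type = 1 is permutation-based,
# type = 2 is based on node impurity
importance(rf, type = 1, scale = FALSE)
importance(rf, type = 2)
```

Calling importance() with an explicit type returns one measure at a time, which makes it easier to match each column label to a description in the help page.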
My results object "rf" contains the predictor importances in rf$importance, where I see two columns:

       %IncMSE IncNodePurity
V1 -0.01683558      58.10910
V2  0.04000299      71.27579
V3  0.01974636      67.22586
V4  0.25020393     113.69823
V5  0.03146358      67.11151
V6  0.01717313      66.57246
V7 -0.00500985      62.37103
V8 -0.02862065      66.15369
V9 -0.02431507      54.50013

They seem to be clearly labeled %IncMSE and IncNodePurity.

However, when I look in ?randomForest, I read the following about importance as a component of my rf object:

  A matrix with nclass + 2 (for classification) or two (for regression) columns. For classification, the first nclass columns are the class-specific measures computed as mean decrease in accuracy. The nclass + 1st column is the mean decrease in accuracy over all classes. The last column is the mean decrease in Gini index. *For Regression, the first column is the mean decrease in accuracy and the second the mean decrease in MSE. If importance=FALSE, the last measure is still returned as a vector.*

Maybe I am confused for no reason, but which column is which? Is %IncMSE = mean decrease in accuracy?

Thanks a lot for clarifying!

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.