[R] Which column in randomForest importances (for regression) is MSE and which IncNodePurity

Dimitri Liakhovitski Wed, 05 May 2010 17:04:27 -0700

I've run the function randomForest with importance=TRUE. All my variables
(predictors and the dependent variable) are numeric.
rf <- randomForest(formula, data=mydata, importance=TRUE)  # plus other arguments
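
A minimal, self-contained sketch of the call above, assuming the randomForest package is installed. The simulated data frame, the variable names V1 to V9 and y, and the formula y ~ . are hypothetical stand-ins for the real formula and mydata, which are not shown in this message.

```r
library(randomForest)

set.seed(1)
n <- 200

# Hypothetical stand-in for mydata: nine numeric predictors, V1..V9.
mydata <- data.frame(matrix(rnorm(n * 9), ncol = 9))
names(mydata) <- paste0("V", 1:9)

# Hypothetical numeric response that depends only on V4.
mydata$y <- 2 * mydata$V4 + rnorm(n)

# Regression forest with permutation importance switched on.
rf <- randomForest(y ~ ., data = mydata, importance = TRUE)

# Column labels of the importance matrix stored in the fitted object.
imp_cols <- colnames(rf$importance)
print(imp_cols)
```

The values will differ from the table in this message, since the data are simulated; only the shape and labels of the matrix are meant to match.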


My results object "rf" contains the predictor importances:
rf$importance

I am seeing two columns:

       %IncMSE IncNodePurity
V1 -0.01683558      58.10910
V2  0.04000299      71.27579
V3  0.01974636      67.22586
V4  0.25020393     113.69823
V5  0.03146358      67.11151
V6  0.01717313      66.57246
V7 -0.00500985      62.37103
V8 -0.02862065      66.15369
V9 -0.02431507      54.50013

They seem to be clearly labeled %IncMSE and IncNodePurity.

However, when I look in ?randomForest, I am reading about importance as a
component of my rf object:
A matrix with nclass + 2 (for classification) or two (for regression)
columns. For classification, the first nclass columns are the class-specific
measures computed as mean decrease in accuracy. The nclass + 1st column is
the mean decrease in accuracy over all classes. The last column is the mean
decrease in Gini index. *For Regression, the first column is the mean
decrease in accuracy and the second the mean decrease in MSE. If
importance=FALSE, the last measure is still returned as a vector.*

Maybe I am confused for no reason - but which column is which?
Is %IncMSE = mean decrease in accuracy?
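
One way to check the mapping empirically is to ask for each measure by type through the package's importance() extractor, whose help page describes type = 1 as the mean decrease in accuracy and type = 2 as the mean decrease in node impurity. The sketch below assumes the randomForest package is installed; the simulated data and the names sim, fit, perm_imp and purity_imp are hypothetical.

```r
library(randomForest)

set.seed(42)
n <- 150

# Hypothetical simulated data: only x1 drives the response.
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 * sim$x1 + rnorm(n)

fit <- randomForest(y ~ ., data = sim, importance = TRUE)

# type = 1 requests the permutation-based (accuracy) measure,
# type = 2 requests the node-impurity measure.
# scale = FALSE asks for the unscaled values, so they can be
# compared directly with what is stored in fit$importance.
perm_imp   <- importance(fit, type = 1, scale = FALSE)
purity_imp <- importance(fit, type = 2)

# The column name of each result shows which label goes with which type.
print(colnames(perm_imp))
print(colnames(purity_imp))
```

Comparing the printed column names with those of rf$importance shows which labeled column corresponds to which documented measure, without relying on the wording of the help text.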

Thanks a lot for clarifying!

-- 
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
