Re: [R] randomForest: predictor importance (for regressions)

Dimitri Liakhovitski Thu, 06 May 2010 08:16:36 -0700

>> Andy, I'll explain why I am asking. I probably should have
>> done it in the beginning:
>> I am asking not in order to figure out how to do it. I am
>> asking in order to figure out something that was done around
>> November 01, 2008.
>> Back then, a piece of code was run where from the object of
>> randomForest(.... importance=T...) the importances
>> ($importance) were extracted (just by referring to
>> $importance) and the first column was used.
>> Do you happen to know what they were back then? Standardized or not?
>
> The change coincided with the introduction of the importanceSD component, due
> to the change in how the importance is measured.  The "importance" component
> is just mean(d[i]), and importanceSD is sd(d[i])/sqrt(ntree).  The
> importance() function by default (scale=TRUE) does the normalization, and
> that's what you should use.  Leo found that this normalization will greatly
> reduce the "bias" due to different numbers of possible splits in different
> predictors.


Actually, it looks like if one extracts the importances incorrectly (by
looking just at $importance), one gets the unscaled results. I hope it
was the same in 2008.
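
To make the arithmetic in Andy's description concrete, here is a minimal
base-R sketch. It does not use randomForest at all: the matrix d is
simulated and only stands in for the per-tree increases in MSE, so the
names and numbers are my own assumptions, not output from a real forest.

```r
# Hypothetical per-tree values: d[t, j] stands in for the increase in
# out-of-bag MSE in tree t after permuting predictor j (simulated data).
set.seed(1)
ntree <- 500
d <- cbind(v1 = rnorm(ntree, mean = 1.4, sd = 1.8),
           v3 = rnorm(ntree, mean = 0.6, sd = 1.4))

imp    <- colMeans(d)                    # mean(d[i]), as Andy describes $importance
impSD  <- apply(d, 2, sd) / sqrt(ntree)  # sd(d[i])/sqrt(ntree), as for $importanceSD
scaled <- imp / impSD                    # the normalization applied when scale=TRUE

print(rbind(unscaled = imp, SD = impSD, scaled = scaled))
```

The point is only that the scaled value is the unscaled mean divided by
its standard error, so the two versions can rank predictors differently.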

I've just run an example randomForest for a case with 6 predictors
(importance = TRUE). My randomForest object is "rftest".
Below are some results:

Looking at importances the way it was done in November 2008:
as.data.frame(rftest$importance)[1]
I am getting:

     %IncMSE
v1 1.3900833
v2 1.2219338
v3 0.6337521
v4 1.4101760
v5 1.4474130
v6 0.7583074

Extracting the way you recommend, but asking for unscaled
results:  importance(rftest, scale=FALSE)
I am getting exactly the same results as above:

     %IncMSE IncNodePurity
v1 1.3900833     147.31267
v2 1.2219338     147.51669
v3 0.6337521      97.11210
v4 1.4101760     149.48934
v5 1.4474130     149.61458
v6 0.7583074      97.74933

Now, I am extracting scaled importances:  importance(rftest, scale=TRUE)
I am getting:

    %IncMSE IncNodePurity
v1 16.97155     147.31267
v2 17.04288     147.51669
v3 10.19135      97.11210
v4 18.22732     149.48934
v5 18.36879     149.61458
v6 10.46555      97.74933

This is the same as what I get when I take the raw first column, as it
was extracted in 2008, and divide it by the SDs myself:
as.data.frame(rftest$importance)[1]/as.data.frame(rftest$importanceSD)
Resulting in:

    %IncMSE
v1 16.97155
v2 17.04288
v3 10.19135
v4 18.22732
v5 18.36879
v6 10.46555
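
For anyone who wants to repeat the comparison, here is a small
self-contained sketch. My own data are not attached, so it assumes the
built-in mtcars data with mpg as the response; the object name rf and
the seed are arbitrary choices of mine. It requires the randomForest
package.

```r
library(randomForest)

set.seed(42)
# Regression forest on a built-in data set (an assumption; not my data).
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)

raw      <- rf$importance[, "%IncMSE"]                # direct extraction
unscaled <- importance(rf, scale = FALSE)[, "%IncMSE"]
scaled   <- importance(rf, scale = TRUE)[, "%IncMSE"]
manual   <- raw / rf$importanceSD                     # dividing by the SDs by hand

# Both comparisons should come back TRUE if the reasoning above is right.
print(isTRUE(all.equal(unname(raw), unname(unscaled))))
print(isTRUE(all.equal(unname(manual), unname(scaled))))
```

The actual importance values will differ from run to run unless the seed
is fixed, but the two equalities should not depend on the seed.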

Dimitri

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
