Hello!

I think I am relatively clear on how predictor importance (the first
one) is calculated by Random Forests for a Classification tree:

Importance of predictor P1 when the response variable is categorical:

1. For out-of-bag (oob) cases, randomly permute their values on
predictor P1 and then put them down the tree
2. For a given tree, subtract the number of votes for the correct
class in the predictor-P1-permuted oob dataset from the number of
votes for the correct class in the untouched oob dataset: if P1 is
important, this number will be large.
3. The average of this number over all trees in the forest is the raw
importance score for predictor P1.
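
To make steps 1 to 3 concrete, here is a minimal Python sketch of how I understand them. It is not taken from any Random Forests implementation: the function name, the representation of a tree as a plain function from a case to a predicted class, and the representation of oob cases as (dict, label) pairs are all assumptions made for illustration.

```python
import random

def raw_importance_classification(trees, oob_sets, predictor, seed=0):
    """Hypothetical helper: raw permutation importance for one predictor.

    trees     -- list of functions mapping a case (dict) to a predicted class
    oob_sets  -- for each tree, its oob cases as a list of (case, true_class)
    predictor -- key of the predictor to permute, e.g. "P1"
    """
    rng = random.Random(seed)
    differences = []
    for predict, oob in zip(trees, oob_sets):
        cases = [case for case, _ in oob]
        labels = [label for _, label in oob]

        # Votes for the correct class on the untouched oob cases.
        correct_untouched = sum(predict(c) == y for c, y in zip(cases, labels))

        # Step 1: permute the predictor's values among the oob cases.
        shuffled = [c[predictor] for c in cases]
        rng.shuffle(shuffled)
        permuted = [{**c, predictor: v} for c, v in zip(cases, shuffled)]

        # Step 2: untouched votes minus permuted votes for this tree.
        correct_permuted = sum(predict(c) == y for c, y in zip(permuted, labels))
        differences.append(correct_untouched - correct_permuted)

    # Step 3: average over all trees in the forest.
    return sum(differences) / len(differences)
```

In this sketch a predictor that a tree never splits on always yields a difference of zero for that tree, because permuting it cannot change any prediction.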

I am wondering what step 2 above looks like if the response variable
is continuous rather than categorical, in other words, for a regression
tree. Could you please correct me if what I wrote below is wrong? Thank
you very much!

Importance of predictor P1 when the response variable is continuous:

1. For out-of-bag (oob) cases, randomly permute their values on
predictor P1 and then put them down the tree
2. For a given tree, calculate the mean squared error, i.e. the mean
of (observed y minus predicted y)^2, for (a) the untouched oob dataset
and for (b) the predictor-P1-permuted oob dataset. Subtract (a) from
(b): if P1 is important, this number will be large.
3. The average of this number over all trees in the forest is the raw
importance score for predictor P1.
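
The regression version as I have described it could be sketched in Python as follows. Again, this only illustrates my understanding and is not the actual Random Forests code: the function names, the tree-as-function representation and the (dict, y) layout of the oob cases are assumptions.

```python
import random

def mean_squared_error(predict, cases, ys):
    # Mean of (observed y - predicted y)^2 over the given cases.
    return sum((y - predict(c)) ** 2 for c, y in zip(cases, ys)) / len(ys)

def raw_importance_regression(trees, oob_sets, predictor, seed=0):
    """Hypothetical helper: raw permutation importance for one predictor.

    trees     -- list of functions mapping a case (dict) to a predicted y
    oob_sets  -- for each tree, its oob cases as a list of (case, observed_y)
    predictor -- key of the predictor to permute, e.g. "P1"
    """
    rng = random.Random(seed)
    increases = []
    for predict, oob in zip(trees, oob_sets):
        cases = [case for case, _ in oob]
        ys = [y for _, y in oob]

        # (a) error on the untouched oob cases.
        mse_untouched = mean_squared_error(predict, cases, ys)

        # Step 1: permute the predictor's values among the oob cases.
        shuffled = [c[predictor] for c in cases]
        rng.shuffle(shuffled)
        permuted = [{**c, predictor: v} for c, v in zip(cases, shuffled)]

        # (b) error on the permuted oob cases; step 2 is (b) minus (a).
        mse_permuted = mean_squared_error(predict, permuted, ys)
        increases.append(mse_permuted - mse_untouched)

    # Step 3: average over all trees in the forest.
    return sum(increases) / len(increases)
```

The sign convention follows step 2 above: a positive value means the tree predicts worse once P1 has been permuted.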

-- 
Dimitri Liakhovitski
MarketTools, Inc.
dimitri.liakhovit...@markettools.com
