Hello! I think I am fairly clear on how predictor importance (the first, permutation-based measure) is calculated by Random Forests with classification trees:
Importance of predictor P1 when the response variable is categorical:
1. For the out-of-bag (oob) cases, randomly permute their values on predictor P1 and then put them down the tree.
2. For a given tree, subtract the number of votes for the correct class in the P1-permuted oob data from the number of votes for the correct class in the untouched oob data. If P1 is important, this difference will be large.
3. The average of this difference over all trees in the forest is the raw importance score for predictor P1.

I am wondering what step 2 looks like when the response variable is continuous rather than categorical, in other words, for a regression tree. Could you please correct me if what I wrote below is wrong? Thank you very much!

Importance of predictor P1 when the response variable is continuous:
1. For the out-of-bag (oob) cases, randomly permute their values on predictor P1 and then put them down the tree.
2. For a given tree, calculate the mean squared error, i.e. the mean of (observed y - predicted y)^2, for (a) the untouched oob data and (b) the P1-permuted oob data. Subtract (a) from (b).
3. The average of this difference over all trees in the forest is the raw importance score for predictor P1.

--
Dimitri Liakhovitski
MarketTools, Inc.
dimitri.liakhovit...@markettools.com
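For illustration only, here is a minimal Python sketch of the two procedures described in the post. It assumes each tree is represented as a plain prediction function paired with its own out-of-bag rows; this is a hypothetical structure, not the internals of R's randomForest package. The names `per_tree_importance` and `raw_importance`, the stand-in trees, and the toy data are all made up for the example. The only difference between the two cases is the per-tree score in step 2: correct votes for classification, mean squared error for regression.

```python
import random
from statistics import mean


def per_tree_importance(predict, X_oob, y_oob, j, task, rng):
    """Step 1 + step 2 for one tree and predictor column j.

    predict: callable mapping a row (list of numbers) to a prediction.
    task: "classification" or "regression".
    """
    # Step 1: permute column j among this tree's oob cases only.
    col = [row[j] for row in X_oob]
    rng.shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_oob, col)]

    if task == "classification":
        # Step 2: correct votes on untouched oob data minus correct votes on permuted data.
        correct = sum(predict(x) == y for x, y in zip(X_oob, y_oob))
        correct_perm = sum(predict(x) == y for x, y in zip(X_perm, y_oob))
        return correct - correct_perm

    # Step 2 (regression): MSE on permuted data (b) minus MSE on untouched data (a).
    mse = mean((y - predict(x)) ** 2 for x, y in zip(X_oob, y_oob))
    mse_perm = mean((y - predict(x)) ** 2 for x, y in zip(X_perm, y_oob))
    return mse_perm - mse


def raw_importance(trees, j, task, seed=0):
    """Step 3: average the per-tree differences over all trees.

    trees: list of (predict, X_oob, y_oob) triples (hypothetical format).
    """
    rng = random.Random(seed)
    return mean(per_tree_importance(p, X, y, j, task, rng) for p, X, y in trees)


# Hypothetical toy data: y depends on x0 only; x1 is noise.
rows = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y_cls = [int(r[0] > 0.5) for r in rows]
y_reg = [r[0] for r in rows]


def clf_tree(x):  # stand-in for a fitted classification tree
    return int(x[0] > 0.5)


def reg_tree(x):  # stand-in for a fitted regression tree
    return x[0]


# For brevity every toy tree reuses the same oob rows; in a real forest
# each tree has its own oob set.
clf_forest = [(clf_tree, rows, y_cls)] * 3
reg_forest = [(reg_tree, rows, y_reg)] * 3

for j in (0, 1):
    print("predictor", j,
          raw_importance(clf_forest, j, "classification"),
          raw_importance(reg_forest, j, "regression"))
```

Because the stand-in trees never look at x1, permuting it leaves every prediction unchanged and its raw importance is exactly zero under both definitions, while permuting x0 lowers the correct-vote count and raises the MSE.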