Thank you very much, Andy. I did turn off HTML - hope it'll solve the problem!
> Andy, but it is the FIRST column in $importance (not the SECOND) that is
> labeled "%IncMSE". The second column is labeled "IncNodePurity". So, I
> am confused - which one is the mean decrease in accuracy?
> Or, maybe I should ask again: In a case of regression trees, which of
> the two columns in $importance contains the predictor importances
> calculated by randomly permuting values and looking at how much worse
> the prediction has become?
> I assume it's the first column (labeled "%IncMSE"). Is this correct?
>
> [AL]: Note I said "reduction in node impurity", which is another way of
> saying "increase in node purity" 8-). I should think from the help page
> for importance() it should be clear which is which. When you permute
> the value of a variable in OOB data and make prediction, the expectation
> is that the MSE will increase, especially if the variable has some
> importance, thus the label "%IncMSE". Why do you need to assume?

Great, thanks for confirming!

> [AL]: As I said, you are recommended to use importance() to extract
> variable importance. The recommendation is for avoiding confusions like
> yours. If you want to know what the components in the objects give you,
> compare to what the extractor function returns, you can look inside the
> extractor function to find out for yourself. Really, I'm not trying to
> be difficult, but there are very good reasons for not accessing the
> components directly when extractor functions exist. If the underlying
> components are somehow changed in the future, only the extractor
> functions are guaranteed to give you the "right thing". I added the
> extractor function for importance measures precisely because the way
> they are computed changed.

Andy, I'll explain why I am asking. I probably should have done it in
the beginning: I am not asking in order to figure out how to do it. I am
asking in order to figure out something that was done around November 01,
2008. Back then, a piece of code was run in which the importances were
extracted from the object returned by randomForest(.... importance=T...)
simply by referring to $importance, and the first column was used.
Do you happen to know what the values in that column were back then?
Standardized or not?

Thank you!
Dimitri

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
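
For readers following the thread, here is a minimal sketch of the two routes being discussed: reading the $importance component directly versus calling the importance() extractor. It assumes a recent version of the randomForest package and uses the built-in mtcars data purely for illustration; the object names (rf, imp_scaled, imp_unscaled) are made up for this example. It only shows how to check what your installed version stores, and it does not settle what the component contained in the 2008 release.

```r
library(randomForest)

set.seed(1)
# mtcars is a stand-in dataset, not the data from the original analysis
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)

# Direct component access: for regression the columns are
# "%IncMSE" (permutation measure) and "IncNodePurity"
print(colnames(rf$importance))

# Extractor: type = 1 selects the permutation measure only
imp_scaled   <- importance(rf, type = 1, scale = TRUE)   # divided by its standard error
imp_unscaled <- importance(rf, type = 1, scale = FALSE)  # raw mean increase in MSE

# Compare the stored first column against both extractor results
# to see which one it matches in the installed version
print(all.equal(unname(rf$importance[, "%IncMSE"]), unname(imp_unscaled[, 1])))
print(all.equal(unname(rf$importance[, "%IncMSE"]), unname(imp_scaled[, 1])))
```

Running the same comparison under the package version that was installed at the time (if it can still be obtained) would be the direct way to answer the question for old results.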