Andy, thank you - and sorry for being a bit slow (see my questions below):

On Thu, May 6, 2010 at 8:37 AM, Liaw, Andy <andy_l...@merck.com> wrote:

> See reply inline below.
>
> Andy
>
> From: Dimitri Liakhovitski
> >
> > I have a question about predictor importances in randomForest.
> >
> > Once I've run randomForest and got my object, I get their importances:
> > rfresult$importance
> > I also get the "standard errors" of the permutation-based importance
> > measure: rfresult$importanceSD
> >
> > I have 2 questions:
> >
> > 1. Because I am dealing with regressions, I am getting an importance
> > object (rfresult$importance) with two columns, labeled "%IncMSE" (the
> > first column) and "IncNodePurity" (the second column). I assume it's
> > the first one that is the mean decrease in accuracy due to permutation.
> > Am I correct or am I wrong? I am confused because ?randomForest says:
> > "For Regression, the first column is the mean decrease in accuracy and
> > the second the mean decrease in MSE." - but it is the first column, not
> > the second, that has "MSE" in its header.
>
> In regression trees, node impurity is measured by MSE, therefore the
> second measure that averages cumulative reduction in node impurity due
> to splits by a variable over all trees is labelled as "mean decrease in
> MSE".
>

Andy, but it is the FIRST column in $importance (not the SECOND) that is
labeled "%IncMSE". The second column is labeled "IncNodePurity". So, I am
confused - which one is the mean decrease in accuracy?
Or, maybe I should ask again: in the case of regression trees, which of the
two columns in $importance contains the predictor importances calculated by
randomly permuting values and looking at how much worse the prediction has
become?
I assume it's the first column (labeled "%IncMSE"). Is this correct?


>
> > 2. According to this thread
> > (http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg94873.html),
> > the overall importance measure is mean(d[i]) / se(d[i]), where se(d[i])
> > is sd(d[i])/sqrt(ntree) (the "standard error").
> > So, in order to get at the importance of predictors (and I want to use
> > the permutation-based importance) - should I just take the first column
> > of rfresult$importance, or should I first divide rfresult$importance by
> > rfresult$importanceSD - to get something analogous to z-scores - and
> > use those?
>
> See the "scale" argument in ?importance.  The recommended way of
> extracting components of an object in R is to use the extractor
> functions when they exist.
>

Andy, I've run randomForest (for regression) and just set importance =
TRUE. Now I am just looking at $importance (without specifying anything
else at all, not scale either). So, if I do it that way, then to get the
standardized permutation-based importances, should I divide the first
column of $importance by $importanceSD - or has that been done by default,
so that the first column of $importance already contains the standardized
importances?
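
To make the question concrete, here is a minimal sketch of the comparison I
have in mind. The mtcars data, the seed, and the object name rf are only
stand-ins chosen for illustration. The expectation that importance() with
scale = TRUE equals the raw column divided by $importanceSD is my reading of
?importance, not something confirmed in this thread.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only for reproducibility
# mtcars is a stand-in data set; importance = TRUE requests the permutation measure
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# Raw component of the fitted object: the first column is labeled "%IncMSE"
raw <- rf$importance[, "%IncMSE"]

# "By hand" standardization: raw importance divided by $importanceSD
by_hand <- raw / rf$importanceSD

# Extractor function, permutation-based measure only (type = 1)
unscaled <- importance(rf, type = 1, scale = FALSE)[, 1]
scaled   <- importance(rf, type = 1, scale = TRUE)[, 1]

# First comparison: is $importance the same as the unscaled extractor output?
print(all.equal(unname(raw), unname(unscaled)))
# Second comparison: does scale = TRUE match the division done by hand?
print(all.equal(unname(by_hand), unname(scaled)))
```

If both comparisons print TRUE, that would suggest $importance holds the raw
(unstandardized) values and that the division is what scale = TRUE does.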


Thank you!
Dimitri


> > Thank you very much!
> >
> > --
> > Dimitri Liakhovitski
> > Ninah.com
> > dimitri.liakhovit...@ninah.com
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
