Hi Sonja,

How did you build the rpart tree (i.e., what settings did you use in
rpart.control)?  rpart by default uses cross-validation to prune the tree
back, whereas randomForest doesn't do that.  There are other, more subtle
differences as well.  If you want to compare single-tree results, you really
want to make sure the settings in the two are as close as possible.  Also, how
did you compute the pseudo-R2: on a test set, or some other way?
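As a rough illustration of the kind of settings alignment meant here (the data-frame name "train" and response "y" are made-up placeholders, not from the thread), one could grow the rpart tree unpruned so it behaves more like the fully grown trees randomForest builds:

```r
library(rpart)

## Sketch: turn off cost-complexity pruning and cross-validation so the
## rpart tree is grown out fully, as randomForest grows its trees.
fit <- rpart(y ~ ., data = train,
             control = rpart.control(cp = 0,         # no complexity penalty
                                     xval = 0,       # no cross-validation
                                     minsplit = 2,
                                     minbucket = 5)) # randomForest's regression
                                                     # default nodesize is 5
```

Even with these settings the two trees need not be identical, but they should be far closer than comparing a pruned rpart tree against an unpruned randomForest tree.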

Best,
Andy

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Schillo, Sonja
Sent: Thursday, April 03, 2014 3:58 PM
To: Mitchell Maltenfort
Cc: r-help@r-project.org
Subject: Re: [R] rpart and randomforest results

Hi,

the random forest should do that, you're totally right. As far as I know it
does so by randomly selecting the variables considered for a split (but here we
set the option for how many variables to consider at each split to the total
number of variables available, so we thought the random forest would not have
the chance to select variables at random). The other thing randomForest does is
bootstrapping. But here again we set the sample-size option to the number of
cases in the data set, so that no bootstrapping should be done.
We tried to take all the "randomness" out of the randomForest.
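A sketch of the settings described above (again, "train" and "y" are hypothetical names). One caveat worth noting: randomForest samples with replacement by default, so setting sampsize to the number of cases does not by itself remove the resampling randomness; replace = FALSE is also needed so each tree sees every case exactly once:

```r
library(randomForest)

p <- ncol(train) - 1                       # number of predictor variables
rf <- randomForest(y ~ ., data = train,
                   mtry = p,               # consider every variable at each split
                   sampsize = nrow(train), # use all cases...
                   replace = FALSE,        # ...and draw each exactly once
                   ntree = 1)              # single tree, for comparison with rpart
```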

Is that plausible and does anyone have another idea?

Thanks
Sonja


From: Mitchell Maltenfort [mailto:mmal...@gmail.com]
Sent: Tuesday, April 1, 2014 1:32 PM
To: Schillo, Sonja
Cc: r-help@r-project.org
Subject: Re: [R] rpart and randomforest results


Is it possible that the random forest is somehow adjusting for optimism or 
overfitting?
On Apr 1, 2014 7:27 AM, "Schillo, Sonja"
<sonja.schi...@uni-due.de> wrote:
Hi all,

I have a question on rpart and randomforest results:

We calculated a single regression tree using rpart and got a pseudo-R2 of
roughly 10% (which is not too bad compared to a linear regression on these
data). Encouraged by this, we grew a whole regression forest on the same data
set using randomForest. But we got pretty bad pseudo-R2 values for the
randomForest (even negative values for some option settings).
We then thought that if we built only a single tree with the randomForest
routine, we should get a result similar to that of rpart. So we set the
randomForest options to grow only one tree, but the resulting pseudo-R2 value
was negative as well.
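For reference, one common definition of a pseudo-R2 for regression (the thread does not say which definition was used, so this is only an illustrative sketch) is 1 - SSE/SST; it goes negative exactly when the model predicts worse than simply using the mean of the response. Note also that randomForest's printed "% Var explained" is based on out-of-bag predictions, which is a harsher, out-of-sample measure than an in-sample rpart fit:

```r
## Illustrative pseudo-R^2: 1 - SSE/SST.  Negative values mean the
## predictions have larger squared error than the mean of "actual".
pseudo_r2 <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}
```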

Does anyone have a clue as to why the randomForest results are so bad, whereas
the rpart result is quite OK?
Is our assumption wrong that a single tree grown by randomForest should give
results similar to a tree grown by rpart?
What am I missing here?

Thanks a lot for your help!
Sonja

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
