The OOB error estimate in RF is one really nifty feature that alleviates the need for additional cross-validation or resampling. I've done some empirical comparisons between OOB estimates and 10-fold CV estimates, and they are basically the same.
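As a rough illustration of why the OOB estimate comes essentially for free, here is a minimal sketch in plain Python (not R, and not the randomForest implementation). It only simulates the bootstrap step: each tree is grown on a sample drawn with replacement, so any given observation is left out of roughly 1/e of the trees, and those trees can vote on it as if it were test data. The function name `oob_fraction` and its parameters are made up for this example.

```python
import random


def oob_fraction(n_samples, n_trees, seed=0):
    """Average fraction of trees for which a given sample is out-of-bag.

    Hypothetical helper for illustration only: no trees are actually grown,
    we just draw the bootstrap samples a forest would use.
    """
    rng = random.Random(seed)
    oob_counts = [0] * n_samples
    for _ in range(n_trees):
        # One bootstrap sample: n_samples draws with replacement.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        for i in range(n_samples):
            if i not in in_bag:
                oob_counts[i] += 1
    return sum(oob_counts) / (n_samples * n_trees)


frac = oob_fraction(n_samples=200, n_trees=500)
# Should land near (1 - 1/200)**200, i.e. close to 1/e (about 0.37).
print(round(frac, 3))
```

With a few hundred trees, every observation therefore has on the order of a hundred or more trees that never saw it, which is what makes the OOB vote a usable stand-in for a held-out prediction.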
Andy

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Claudia Beleites
> Sent: Saturday, October 23, 2010 3:39 PM
> To: r-help@r-project.org
> Subject: Re: [R] Random Forest AUC
>
> Dear List,
>
> Just curiosity (disclaimer: I never used random forests till now for
> more than a little playing around):
>
> Is there no out-of-bag estimate available?
> I mean, there are already ca. 1/e of the trees where a (one) given sample is
> out-of-bag, as Andy explained. If now the voting is done only over the
> oob trees, I should get a classical oob performance measure.
> Or is the oob estimate internally used up by some kind of optimization
> (what would that be, given that the trees are grown till the end?)?
>
> Hoping that I do not spoil the pedagogic efforts of the list in teaching
> Ravishankar to do his homework reasoning himself...
>
> Claudia
>
> On 23.10.2010 20:49, Changbin Du wrote:
> > I think you should use 10 fold cross validation to judge your performance on
> > the validation parts. What you did will be overfitted for sure, you test on
> > the same training set used for your model building.
> >
> > On Sat, Oct 23, 2010 at 6:39 AM, mxkuhn<mxk...@gmail.com> wrote:
> >
> >> I think the issue is that you really can't use the training set to judge
> >> this (without resampling).
> >>
> >> For example, k nearest neighbors are not known to over fit, but a 1nn
> >> model will always perfectly predict the training data.
> >>
> >> Max
> >>
> >> On Oct 23, 2010, at 9:05 AM, "Liaw, Andy"<andy_l...@merck.com> wrote:
> >>
> >>> What Breiman meant is that as the model gets more complex (i.e., as the
> >>> number of trees tends to infinity) the generalization error (test set
> >>> error) does not increase. This does not hold for boosting, for example;
> >>> i.e., you can't "boost forever", which necessitates finding the
> >>> optimal number of iterations. You don't need that with RF.
> >>>
> >>>> -----Original Message-----
> >>>> From: r-help-boun...@r-project.org
> >>>> [mailto:r-help-boun...@r-project.org] On Behalf Of vioravis
> >>>> Sent: Saturday, October 23, 2010 12:15 AM
> >>>> To: r-help@r-project.org
> >>>> Subject: Re: [R] Random Forest AUC
> >>>>
> >>>> Thanks Max and Andy. If the Random Forest is always giving an
> >>>> AUC of 1, isn't it over fitting??? If not, how do you differentiate this
> >>>> from over fitting??? I believe Random forests are claimed to never over
> >>>> fit (from the following link).
> >>>>
> >>>> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#features
> >>>>
> >>>> Ravishankar R
> >>>> --
> >>>> View this message in context:
> >>>> http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3008157.html
> >>>> Sent from the R help mailing list archive at Nabble.com.
> >>>>
> >>>> ______________________________________________
> >>>> R-help@r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.