Hi there,

Rather than cross-validating or bootstrapping to prune a single tree, you could use a random forest instead. See the overview at http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
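As a rough sketch of what this looks like in R (not a recipe for your data): it assumes the randomForest package from CRAN is installed, and it uses the built-in mtcars data purely as a stand-in. Because mpg is a continuous outcome, randomForest() fits a regression forest here. The names fit, oob_pred and inbag_pred are just illustrative.

```r
# Sketch only: assumes install.packages("randomForest") has been run.
# mtcars is a stand-in data set, not the data discussed in this thread.
library(randomForest)

set.seed(42)  # forests are random, so fix the seed for reproducibility

# Continuous response (mpg) -> regression forest.
# importance = TRUE also computes permutation-based importance.
fit <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# predict() without newdata returns the out-of-bag (OOB) predictions:
# each observation is predicted only by trees that did not see it.
oob_pred <- predict(fit)
oob_mse  <- mean((mtcars$mpg - oob_pred)^2)

# Predicting the training data with all trees gives the optimistic
# in-sample error, shown here only for comparison.
inbag_pred <- predict(fit, newdata = mtcars)
inbag_mse  <- mean((mtcars$mpg - inbag_pred)^2)

print(c(oob = oob_mse, in_sample = inbag_mse))

# Variable importance: one row per predictor.
print(importance(fit))
```

The OOB error is the one to look at when judging predictive accuracy, since no separate pruning or cross-validation step is needed to get it.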
There is a package in R for doing this called randomForest (load it with library(randomForest)). I have found it to be an excellent method which produces better forecasts (in-bag and out-of-bag) than a single tree. It also still allows you to interpret which variables are most important. It handles continuous variables as well as categorical (classification) variables.

Regards
Wayne

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Fiona Callaghan
Sent: 13 September 2007 15:31
To: Terry Therneau
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] Bootstrap tree selection in rpart

Thanks very much for replying -- just one final question: does this hold when the outcome is continuous (and not discrete), e.g., instead of the outcome being multinomial we have a continuous outcome like residuals?

Thanks again
Fiona

> Fiona Callaghan asked about using the bootstrap instead of
> cross-validation in the tree pruning step.
> It turns out that cross-validation works better than the bootstrap
> for trees. The issue is a subtle one. The bootstrap can be thought
> of as 2 steps.
>
> 1. Deduction: Evaluate the behavior of some statistic "zed" under
> repeated sampling from the discrete distribution F-hat, i.e., the
> original data. This gives a direct evaluation of how zed behaves
> under F-hat.
>
> 2. Induction: Assume that (behavior of zed under sampling from F) =
> (behavior under sampling from F-hat).
>
> It turns out that trees behave differently under discrete
> distributions than they do under continuous ones, so step 2 fails.
> Essentially, there are fewer places to split in the discrete case,
> tree creation is less noisy, and the bootstrap gives an
> overoptimistic view. I remember Brad Efron giving a talk on this
> long ago (I was still a student!), so the details are fuzzy; I think
> that he solved it by sampling from a smoothed version of the
> empirical CDF.
>
> Terry Therneau
>

--
Fiona Callaghan, MA MS
A432 Crabtree Hall
Department of Biostatistics
Graduate School of Public Health
University of Pittsburgh
Phone 412 624 3063

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.