Thanks very much for replying. Just one final question: does this also hold when the outcome is continuous rather than discrete, e.g., when the outcome is something like residuals instead of multinomial?
Thanks again,
Fiona

> Fiona Callaghan asked about using the bootstrap instead of cross-validation in
> the tree pruning step.
> It turns out that cross-validation works better than the bootstrap for trees.
> The issue is a subtle one. The bootstrap can be thought of as 2 steps.
>
> 1. Deduction: Evaluate the behavior of some statistic "zed" under repeated
> sampling from the discrete distribution F-hat, i.e., the original data. This
> gives a direct evaluation of how zed behaves under F-hat.
>
> 2. Induction: Assume that (behavior of zed under sampling from F) =
> (behavior under sampling from F-hat).
>
> It turns out that trees behave differently under discrete distributions than
> they do under continuous ones, so step 2 fails. Essentially, there are fewer
> places to split in the discrete case, tree creation is less noisy, and the
> bootstrap gives an overoptimistic view. I remember Brad Efron giving a talk on
> this long ago (I was still a student!), so the details are fuzzy; I think that
> he solved it by sampling from a smoothed version of the empirical CDF.
>
> Terry Therneau

--
Fiona Callaghan, MA MS
A432 Crabtree Hall
Department of Biostatistics
Graduate School of Public Health
University of Pittsburgh
Phone 412 624 3063
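
The quoted remark that there are "fewer places to split in the discrete case" can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the method from the talk mentioned above: the function names, the Gaussian kernel, and the bandwidth of 0.1 are all hypothetical choices made here. It compares an ordinary bootstrap resample (drawn from F-hat, so values repeat) with a smoothed one (F-hat plus small kernel noise) by counting the candidate split points each offers for a single continuous predictor.

```python
import random


def ordinary_bootstrap(x, rng):
    """Resample with replacement from the empirical distribution F-hat."""
    return [rng.choice(x) for _ in x]


def smoothed_bootstrap(x, rng, bandwidth):
    """Resample from F-hat, then add Gaussian noise to each draw.

    This is one simple way to sample from a smoothed version of the
    empirical CDF; the bandwidth is an arbitrary illustrative choice.
    """
    return [rng.choice(x) + rng.gauss(0.0, bandwidth) for _ in x]


def n_split_points(sample):
    """Candidate splits on one predictor: gaps between distinct values."""
    return len(set(sample)) - 1


rng = random.Random(1)                      # fixed seed for repeatability
x = [rng.gauss(0.0, 1.0) for _ in range(200)]

plain = ordinary_bootstrap(x, rng)
smooth = smoothed_bootstrap(x, rng, bandwidth=0.1)

print("original data     :", n_split_points(x))
print("ordinary bootstrap:", n_split_points(plain))
print("smoothed bootstrap:", n_split_points(smooth))
```

The ordinary resample contains ties, so it offers noticeably fewer distinct split points than the original sample, while the smoothed resample has as many as the original. Whether smoothing is enough to make bootstrap pruning competitive with cross-validation is not something this sketch shows.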