Re: [R] Bootstrap tree selection in rpart

Fiona Callaghan Thu, 13 Sep 2007 07:36:56 -0700

Thanks very much for replying -- just one final question: does this hold
when the outcome is continuous rather than discrete, e.g., when instead of a
multinomial outcome we have a continuous outcome such as residuals?


Thanks again
Fiona
> Fiona Callaghan asked about using the bootstrap instead of
> cross-validation in the tree pruning step.
>    It turns out that cross-validation works better than the bootstrap for
> trees.
> The issue is a subtle one.  The bootstrap can be thought of as 2 steps.
>
> 1. Deduction: Evaluate the behavior of some statistic "zed" under repeated
> sampling from the discrete distribution F-hat, i.e., the original data.
> This gives a direct evaluation of how zed behaves under F-hat.
>
> 2. Induction: Assume that (behavior of zed under sampling from F) =
> (behavior under sampling from F-hat).
>
>   It turns out that trees behave differently under discrete distributions
> than they do under continuous ones, so step 2 fails.  Essentially, there are
> fewer places to split in the discrete case, tree creation is less noisy, and
> the bootstrap gives an overoptimistic view.  I remember Brad Efron giving a
> talk on this long ago (I was still a student!), so the details are fuzzy; I
> think that he solved it by sampling from a smoothed version of the empirical
> CDF.
>
>    Terry Therneau
>
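
The "fewer places to split" point quoted above can be illustrated with a
minimal base-R sketch. It is not taken from the thread and does not use rpart:
the simulated predictor x, the helper n_splits(), and the Gaussian kernel with
the bw.nrd0() bandwidth are all assumptions made for illustration, and the
smoothing step is only one plausible reading of "a smoothed version of the
empirical CDF", not necessarily what Efron proposed.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
n <- 200
x <- rnorm(n)                # hypothetical continuous predictor drawn from F

# A tree can place a split between adjacent distinct values of a predictor,
# so (number of distinct values - 1) counts the candidate split points.
n_splits <- function(v) length(unique(v)) - 1

# Ordinary bootstrap: sample with replacement from F-hat (the observed data).
# Ties are unavoidable, so there are fewer distinct values to split between.
x_boot <- sample(x, n, replace = TRUE)

# Smoothed bootstrap (assumed form): resample, then add Gaussian kernel noise,
# which is the same as drawing from a kernel density estimate of the data.
h <- bw.nrd0(x)              # default bandwidth rule of density(); an assumption
x_smooth <- sample(x, n, replace = TRUE) + rnorm(n, sd = h)

splits <- c(original  = n_splits(x),
            bootstrap = n_splits(x_boot),
            smoothed  = n_splits(x_smooth))
print(splits)
```

On average a bootstrap resample contains only about 63% of the original
observations as distinct values, so it offers noticeably fewer candidate
splits than the original or the smoothed sample. This concerns the
predictors only; it does not by itself answer the question about a
continuous outcome.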


-- 
Fiona Callaghan, MA MS
A432 Crabtree Hall
Department of Biostatistics
Graduate School of Public Health
University of Pittsburgh
Phone 412 624 3063

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
