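One way to avoid hacking rpart is to do the K-fold split yourself, outside rpart, assigning whole sampling units (fish) to folds rather than individual observations. Then build a tree on each training set and summarize the prediction error on the corresponding held-out test set.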
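A rough sketch of what that could look like, not a definitive implementation. The data are simulated, and the `fish` column (the sampling-unit ID), the number of folds `K`, and the `cp_grid` of candidate complexity values are all assumptions made for illustration; only the formula `Y ~ X1 + X2 + X3` and the name `data.env` come from the original post.

```r
library(rpart)

# Hypothetical toy data: 20 fish with 10 repeated observations each.
set.seed(1)
n_fish <- 20
n_obs  <- 10
data.env <- data.frame(
  fish = rep(seq_len(n_fish), each = n_obs),  # assumed sampling-unit ID
  X1 = rnorm(n_fish * n_obs),
  X2 = rnorm(n_fish * n_obs),
  X3 = rnorm(n_fish * n_obs)
)
fish_effect <- rnorm(n_fish)  # shared within fish, so observations are dependent
data.env$Y <- 2 * (data.env$X1 > 0) + fish_effect[data.env$fish] +
  rnorm(nrow(data.env), sd = 0.5)

# Assign whole fish (not single rows) to K folds.
K <- 5
fish_ids     <- unique(data.env$fish)
fold_of_fish <- sample(rep(seq_len(K), length.out = length(fish_ids)))
fold         <- fold_of_fish[match(data.env$fish, fish_ids)]

# Candidate complexity parameters (an arbitrary illustrative grid).
cp_grid <- c(0.2, 0.1, 0.05, 0.02, 0.01, 0.005)
sse <- matrix(0, nrow = K, ncol = length(cp_grid))

for (k in seq_len(K)) {
  train <- data.env[fold != k, ]
  test  <- data.env[fold == k, ]
  # Grow a large tree on the training fish; xval = 0 turns off rpart's own CV.
  full <- rpart(Y ~ X1 + X2 + X3, data = train, xval = 0, cp = min(cp_grid))
  for (j in seq_along(cp_grid)) {
    pruned <- prune(full, cp = cp_grid[j])
    pred   <- predict(pruned, newdata = test)
    sse[k, j] <- sum((test$Y - pred)^2)  # squared error on held-out fish
  }
}

# Cross-validated mean squared error for each cp, then refit on all data.
cv_mse  <- colSums(sse) / nrow(data.env)
best_cp <- cp_grid[which.min(cv_mse)]
Tree <- prune(
  rpart(Y ~ X1 + X2 + X3, data = data.env, xval = 0, cp = min(cp_grid)),
  cp = best_cp
)
```

Because every observation from a given fish falls in the same fold, no test fish contributes to the tree it is scored against. `cv_mse` then plays the role that the `xerror` column of rpart's cp table would play under ordinary row-wise cross-validation.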
Weidong Gu

On Mon, Jul 4, 2011 at 9:22 AM, Katerine Goyer <katerine.go...@uqtr.ca> wrote:
> Hello,
>
> I am using the rpart function (from the rpart package) to do a regression
> tree that would describe the behaviour of a fish species according to
> several environmental variables. For each fish (sampling unit), I have
> repeated observations of the response variable, which means that the data
> are not independent. Normally, in this case, V-fold cross-validation needs
> to be modified to prevent over-optimistic predictions of error rates by
> cross-validation and overestimation of the tree size. A way to overcome
> this problem is by selecting only whole sampling units in our subsets of
> cross-validation. My problem is that I don’t know how to perform this
> modification of the cross-validation process in the rpart function.
>
> Is there a way to do this modification in rpart or is there any other
> function I could use that would consider interdependence in the response
> variable?
>
> Here is an example of the code I am using (“Y” being the response variable
> and “data.env” being a data frame of the environmental variables):
>
> Tree = rpart(Y ~ X1 + X2 + X3,xval=100,data=data.env)
>
> Thanks
>
> Katerine
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.