Hello,

I am using the rpart function (from the rpart package) to fit a regression tree that describes the behaviour of a fish species as a function of several environmental variables. For each fish (the sampling unit), I have repeated observations of the response variable, which means that the observations are not independent.

Normally, in this case, V-fold cross-validation needs to be modified. Otherwise, observations from the same fish end up in both the training and the held-out subsets, so cross-validation gives over-optimistic error estimates and overestimates the tree size. A way to overcome this problem is to assign only whole sampling units to each cross-validation subset, so that all observations of a given fish fall in the same subset.

My problem is that I don't know how to make this modification to the cross-validation process in the rpart function. Is there a way to do this in rpart, or is there another function I could use that would account for the dependence among observations of the response variable?

Here is an example of the code I am using (Y being the response variable and data.env being a data frame of the environmental variables):

    Tree = rpart(Y ~ X1 + X2 + X3, xval = 100, data = data.env)

Thanks,
Katerine
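
P.S. To make the idea of "whole sampling units per subset" concrete, here is a minimal sketch of the fold assignment I have in mind. Everything in it is illustrative: the toy data frame, the `fish` identifier column, and the helper `make_group_folds` are hypothetical names, not part of my real data or of rpart. The last step assumes that rpart accepts a vector of fold labels (one per observation) as `xval`; this seems to be how `xpred.rpart` documents its own `xval` argument, but I have not confirmed that it is supported for `rpart` itself, so please treat it as an assumption.

```r
# Assign each sampling unit (fish) to exactly one fold, so that all
# repeated observations of a fish share the same fold.
#   ids : unit identifier, one entry per observation (hypothetical name)
#   k   : number of folds (cannot exceed the number of distinct units)
make_group_folds <- function(ids, k) {
  units <- unique(ids)
  stopifnot(k >= 2, k <= length(units))
  # Shuffle the fold labels across units (not across observations)
  unit_fold <- sample(rep(seq_len(k), length.out = length(units)))
  names(unit_fold) <- as.character(units)
  # Map the unit-level fold label back onto every observation
  unname(unit_fold[as.character(ids)])
}

set.seed(1)
# Toy data standing in for the real data: 12 fish, 5 observations each
toy <- data.frame(
  fish = rep(paste0("fish", 1:12), each = 5),
  X1 = rnorm(60), X2 = rnorm(60), X3 = rnorm(60)
)
toy$Y <- 2 * toy$X1 + rnorm(60)

folds <- make_group_folds(toy$fish, k = 4)

# Check: every fish should map to a single fold
table(tapply(folds, toy$fish, function(f) length(unique(f))))

# Assumption: rpart takes a per-observation vector of fold labels as xval.
# Verify this against the installed rpart version before relying on it.
if (requireNamespace("rpart", quietly = TRUE)) {
  tree <- rpart::rpart(Y ~ X1 + X2 + X3, data = toy, xval = folds)
  rpart::printcp(tree)
}
```

With real data, `toy$fish` would be replaced by whatever column identifies the individual fish, and `k` could be set as high as the number of fish, which would give leave-one-fish-out cross-validation.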
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.