Hello,

I am using the rpart function (from the rpart package) to fit a regression tree that describes the behaviour of a fish species as a function of several environmental variables.
For each fish (sampling unit), I have repeated observations of the response variable, so the observations are not independent. In this situation, V-fold cross-validation normally needs to be modified; otherwise the cross-validated error rates are over-optimistic and the selected tree is too large. One way to overcome this problem is to build the cross-validation subsets from whole sampling units only, so that all observations of a given fish fall in the same subset. My problem is that I don’t know how to make this modification of the cross-validation process in the rpart function.
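
A minimal sketch of what "selecting only whole sampling units" could look like in R, assuming a vector `fish` that identifies the fish behind each observation. The helper `group_folds()` and the toy fish IDs are hypothetical illustrations, not part of rpart: each fish is assigned to one fold, and every observation inherits the fold of its fish.

```r
# Hypothetical helper: deal whole sampling units (fish) into v folds so that
# all repeated observations of a given fish land in the same fold.
group_folds <- function(unit, v = 10) {
  units <- unique(as.character(unit))
  if (length(units) < v) stop("need at least as many sampling units as folds")
  # shuffle the fish and deal them into folds 1..v as evenly as possible
  fold_of_unit <- sample(rep(seq_len(v), length.out = length(units)))
  names(fold_of_unit) <- units
  # each observation inherits the fold of its fish
  unname(fold_of_unit[as.character(unit)])
}

# Toy example with made-up IDs: 12 fish, each observed 5 times
set.seed(1)
fish <- rep(sprintf("fish%02d", 1:12), each = 5)
folds <- group_folds(fish, v = 4)

table(fold = folds)  # 15 observations (3 whole fish) per fold
# every fish maps to exactly one fold
all(tapply(folds, fish, function(f) length(unique(f))) == 1)
```

Because whole fish are dealt out rather than single rows, folds can contain different numbers of observations when fish have different numbers of repeated measures.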


Is there a way to make this modification in rpart, or is there another function I could use that would account for the dependence among repeated observations of the response variable?


Here is an example of the code I am using (“Y” being the response variable and “data.env” a data frame of the environmental variables):


library(rpart)
Tree = rpart(Y ~ X1 + X2 + X3, xval = 100, data = data.env)
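
A sketch of how grouped folds could be plugged into the call above, assuming `data.env` also has a column `fish` identifying the individual (a hypothetical column name; the data below are simulated, not real). As far as I can tell from the rpart source, `xval` may be given either as a number of folds or as a vector with one fold label per observation, and `xpred.rpart()` documents this vector form explicitly. To be safe, the sketch uses fold labels 1 to V.

```r
library(rpart)

# Simulated stand-in for data.env (hypothetical): 20 fish, 6 repeated
# observations each, three environmental variables and a response Y.
set.seed(42)
n_fish <- 20
n_obs <- 6
data.env <- data.frame(
  fish = factor(rep(seq_len(n_fish), each = n_obs)),  # hypothetical fish ID column
  X1 = runif(n_fish * n_obs),
  X2 = runif(n_fish * n_obs),
  X3 = runif(n_fish * n_obs)
)
# A fish-level effect makes observations of the same fish correlated.
fish_effect <- rnorm(n_fish)[as.integer(data.env$fish)]
data.env$Y <- 2 * (data.env$X1 > 0.5) + fish_effect +
  rnorm(nrow(data.env), sd = 0.5)

# Deal whole fish into V folds (labels 1..V); each row inherits its fish's fold.
V <- 10
fold_of_fish <- sample(rep(seq_len(V), length.out = n_fish))
xgroups <- fold_of_fish[as.integer(data.env$fish)]

# A vector of fold labels (one per row) instead of a single number
# makes rpart cross-validate over these fish-level groups.
Tree <- rpart(Y ~ X1 + X2 + X3, data = data.env, xval = xgroups)
printcp(Tree)  # xerror is now based on folds made of whole fish
```

The xerror column can then be used as usual, for example with prune(), to choose the tree size.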

Thanks

Katerine

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
