Thank you, Dennis, I've got the idea now. However, I have a follow-up question, to make sure I'm not wasting my time.
If I specify the precise CV folds to use, should I not get the same tree every time?

For example, here I have a hypothetical time sequence observed with error at three sites (coded in 's'). If I specify that one site is left out each time in a 3-fold CV (leaving aside whether 3-fold CV is a good idea), should I not get the same tree each time?

library(mvpart)
library(lattice)

y <- rep(sin(seq(0.1, 6, 0.1)), 3)        # the same sine curve at each of the 3 sites
y1 <- y + rnorm(length(y), sd = 0.5)     # add observation error
x <- rep(1:(length(y)/3), 3)             # time index within each site
s <- rep(1:3, each = (length(y)/3))      # site label: 1, 2 or 3
dat <- data.frame(x, y1, s)

xyplot(y1 ~ x | s, data = dat)           # one panel per site

# xval = s: the observations from site k are left out in CV fold k
(mvpart(y1 ~ x, data = dat, xv = "1se", xval = s))

Thank you for your help.

andydol...@gmail.com

On 12 March 2010 18:03, Dennis Murphy <djmu...@gmail.com> wrote:
> Hi:
>
> See inline...
>
> On Fri, Mar 12, 2010 at 4:15 AM, Andrew Dolman <andydol...@gmail.com> wrote:
>>
>> Dear R's
>>
>> I'm trying to use specific rather than random cross-validation groups
>> in mvpart.
>>
>> The man page says:
>> xval Number of cross-validations or vector defining cross-validation
>> groups.
>>
>>
>> And I found this reply to the list by Terry Therneau from 2006
>>
>> The rpart function allows one to give the cross-validation groups
>> explicitly.
>> So if the number of observations was 10, you could use
>> > rpart( y ~ x1 + x2, data=mydata, xval=c(1,1,2,2,3,3,1,3,2,1))
>> which causes observations 1,2,7, and 10 to be left out of the first xval
>> sample, 3,4, and 9 out of the second, etc.
>>
>> Terry Therneau
>>
>>
>> I can't see how this string of values, c(1,1,2,2,3,3,1,3,2,1), codes
>> for observations 1,2,7,10 being left out of the 1st and so on.
>
>
>> x <- c(1,1,2,2,3,3,1,3,2,1)
>> which(x == 1)    # elements left out of the first xval sample
> [1]  1  2  7 10
>> which(x == 2)    # elements left out of the second xval sample
> [1] 3 4 9
>> which(x == 3)    # elements left out of the third xval sample
> [1] 5 6 8
>
> This vector is used to index a response vector/model matrix.
>
> To see how this is applied, consider the following.
> y is a vector of length 10, the same as x:
>> y <- rpois(10, 15)
>> y
> [1] 12 15 17 11 14 14 12 12 16 16
>> y[x != 1]    # first xval sample (y[1], y[2], y[7], y[10] removed)
> [1] 17 11 14 14 12 16
>> y[x != 2]    # second xval sample (y[3], y[4], y[9] removed)
> [1] 12 15 14 14 12 12 16
>> y[x != 3]    # third xval sample (y[5], y[6], y[8] removed)
> [1] 12 15 17 11 12 16 16
>
> Indexing is one of the most important and powerful features of R.
>
> HTH,
> Dennis
>
>> Can anyone fill me in please?
>>
>> Thanks,
>>
>> andydol...@gmail.com
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
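Dennis's point, that the group vector simply indexes which observations are held out in each fold, can be sketched in base R. The helper `fold_indices()` below is hypothetical (it is not a function from rpart or mvpart). It only reproduces the indexing shown in the thread, and it assumes the group labels are given in observation order, as in Therneau's `xval=c(1,1,2,2,3,3,1,3,2,1)` example. Because the folds are computed from the label vector alone, with no random draw, calling it twice on the same labels gives identical folds.

```r
# Hypothetical helper (not part of rpart or mvpart): for each distinct
# group label k, return the indices held out in fold k and the indices
# left to fit the model in that fold.
fold_indices <- function(groups) {
  all_idx <- seq_along(groups)
  lapply(split(all_idx, groups), function(held_out) {
    list(held_out = held_out,
         training = setdiff(all_idx, held_out))
  })
}

# Terry Therneau's example: 10 observations in 3 groups
xv <- c(1, 1, 2, 2, 3, 3, 1, 3, 2, 1)
folds <- fold_indices(xv)
folds[["1"]]$held_out   # 1 2 7 10
folds[["2"]]$held_out   # 3 4 9
folds[["3"]]$held_out   # 5 6 8

# Leave-one-site-out folds for the three-site example
# (60 time points per site, sites coded 1, 2, 3)
s <- rep(1:3, each = 60)
site_folds <- fold_indices(s)
sapply(site_folds, function(f) range(f$held_out))  # rows 1-60, 61-120, 121-180

# No random draw is involved, so repeated calls give identical folds
identical(fold_indices(s), fold_indices(s))        # TRUE
```

For the three-site data, each fold holds out one block of 60 consecutive rows, i.e. one whole site, which is the leave-one-site-out scheme described in the question.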