Hi: I was curious to see how to do this. I generated two versions of the same function - one for 10-fold predictions when the number of observations is an exact multiple of 10, returning a matrix, and another that lets the user define the number of folds and works with lists. The function also returns a list. A couple of ways to show how to replicate the process are provided.
set.seed(100) x<-rnorm(100) y<-sample(rep(0:1,50),replace=T) dat<-data.frame(x,y) library(rms) # A function to perform one iteration of 10-fold predictions # Just computes the predictions for each subset, doesn't average # them or anything else. cvfit.fold10 <- function(x) { # For 10-fold prediction on a matrix whose number of rows is # a multiple of 10. # Permute the row numbers of the input matrix and put them # into ten columns xdiv <- matrix(sample(nrow(x)), 10) # Prealllocate an empty matrix for columnwise predictions using each of the 10 folds predf <- matrix(NA, nrow(x), 10) predfun <- function(i) { train <- x[as.vector(xdiv[, -i ]), ] test <- x[as.vector(xdiv[, i]), 1] predict(lrm(y ~ x, data = train), newdata = test) } for(i in seq_len(ncol(xdiv))) predf[, i] <- predfun(i) predf # returns a matrix } ## Returns a 10 x 10 x 200 array (takes about 16 s on my machine) ## replicate() executes the same function N times u <- replicate(200, cvfit.fold10(dat)) ## A more general version using lists for holding data subsets - it returns a list ## Takes a data frame x and the number of folds nfolds as arguments cvfit <- function(x, nfolds) { xp <- data.frame(x[sample(nrow(x)), ], gp = seq_len(nfolds)) xdiv <- split(xp, xp$gp) predf <- vector('list', nfolds) # Function to generate predictions for a generic fold of the data predfun <- function(i) { train <- do.call(rbind, xdiv[-i]) test <- xdiv[[i]][1] predict(lrm(y ~ x, data = train), newdata = test) } lapply(seq_len(nfolds), predfun) } # One rep: cvfit(dat) A relatively easy way to replicate a process N times and return the result as a particular type of object is to use the plyr package. For example, one way to redo the replication of cvfit.fold10 is as follows: library(plyr) v <- raply(200, cvfit.fold10(dat)) # returns a 200 x 10 x 10 array # For the more general function, returns a list of length 200 w <- rlply(200, cvfit(dat, 10)) w returns a list of length 200, each of which contains 10 sublists of length 10 corresponding to the 10-fold predictions from each iteration of cvfit(). There are ways to do this much faster with lm() using a little ingenuity with matrix indexing, but hopefully this is somewhat faithful to the approach you had in mind. I wanted to show you some alternatives to for loops as well. HTH, Dennis On Sun, Jun 19, 2011 at 10:34 PM, zhu yao <mailzhu...@gmail.com> wrote: > Dear R users: > > Recently, I tried to write a program to calculate cross-validated predicted > value. > My sources are as follows. However, the R reported an error. > Could you please check the sources? Thanks. > > set.seed(100) > x<-rnorm(100) > y<-sample(rep(0:1,50),replace=T) > dat<-data.frame(x,y) > > library(rms) > > fito<-lrm(y~x) > preo<-predict(fito) > > pre<-matrix(NA,nrow=100,ncol=200) > > for (i in 1:200) > { > sam<-sample(1:nrow(dat)) > sam<-split(sam,1:10) > for (j in 1:10) > { > fit<-lrm(y~x,data=dat[-sam[[j]],]) > pre[sam[[j]],i]<-predict(fit,data=dat[sam[[j]],]) > } > } > > > > > > > > > *Yao Zhu* > *Department of Urology > Fudan University Shanghai Cancer Center > Shanghai, China* > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.