I'm trying to speed up a script that otherwise takes days to handle larger data sets. So, is there a way to completely vectorize or paralellize the following script:
*# k-fold cross validation* df <- trees # a data frame 'trees' from R. df <- df[sample(nrow(df)), ] # randomly shuffles the data. k <- 10 # Number of folds. Note k=nrow(df) in the leave-one-out cross validation. folds <- cut(seq(from=1, to=nrow(df)), breaks=k, labels=FALSE) # creates unique numbers for k equally size folds. df$ID <- folds # adds fold IDs. df[paste("pred", 1:3, sep="")] <- NA # adds multiple columns "pred1" "pred2" "pred3" to speed up the following loop. library(mgcv) for(i in 1:k) { # looping for different models: m1 <- gam(Volume ~ s(Height), data=df, subset=(ID != i)) m2 <- gam(Volume ~ s(Girth), data=df, subset=(ID != i)) m3 <- gam(Volume ~ s(Girth) + s(Height), data=df, subset=(ID != i)) # looping for predictions: df[df$ID==i, "pred1"] <- predict(m1, df[df$ID==i, ], type="response") df[df$ID==i, "pred2"] <- predict(m2, df[df$ID==i, ], type="response") df[df$ID==i, "pred3"] <- predict(m3, df[df$ID==i, ], type="response") } # calculating residuals: df$res1 <- with(df, Volume - pred1) df$res2 <- with(df, Volume - pred2) df$res3 <- with(df, Volume - pred3) Model <- paste("m", 1:3, sep="") # creates a vector of model names. # creating a vector of mean-square errors (MSE): MSE <- with(df, c( sum(res1^2) / nrow(df), sum(res2^2) / nrow(df), sum(res3^2) / nrow(df) )) model.mse <- data.frame(Model, MSE) # creates a data frame of model names and mean-square errors. model.mse <- model.mse[order(model.mse$MSE), ] # rearranges the previous data frame in order of increasing mean-square errors. I'd appreciate any help. This code takes several days if run on >=30,000 different GAM models and 3 predictors. Could you please help with re-writing the script into sapply() or foreach()/doParallel format? Thanks Lexo [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.