Before posting on the r-help list I did run Rprof(). In my posting I asked for help with re-writing the specific script into sapply() or foreach()/doParallel format.
Thanks anyway for your time and suggestions, Lexo On Thu, Jan 24, 2019 at 12:38 AM Eric Berger <ericjber...@gmail.com> wrote: > Charles writes about saving execution time by eliminating redundancies. > If you see redundancies related to calling a time-consuming function > multiple times with the same arguments, a very easy way to speed up your > program is to memoise the functions using the package memoise. > > HTH, > Eric > > > > > On Wed, Jan 23, 2019 at 8:34 PM Berry, Charles <ccbe...@ucsd.edu> wrote: > >> See inline. >> >> > On Jan 23, 2019, at 2:17 AM, Aleksandre Gavashelishvili < >> aleksandre.gavashelishv...@iliauni.edu.ge> wrote: >> > >> > I'm trying to speed up a script that otherwise takes days to handle >> larger >> > data sets. So, is there a way to completely vectorize or paralellize the >> > following script: >> > >> > *# k-fold cross validation* >> > >> > df <- trees # a data frame 'trees' from R. >> > df <- df[sample(nrow(df)), ] # randomly shuffles the data. >> > k <- 10 # Number of folds. Note k=nrow(df) in the leave-one-out cross >> > validation. >> > folds <- cut(seq(from=1, to=nrow(df)), breaks=k, labels=FALSE) # creates >> > unique numbers for k equally size folds. >> > df$ID <- folds # adds fold IDs. >> > df[paste("pred", 1:3, sep="")] <- NA # adds multiple columns "pred1" >> > "pred2" "pred3" to speed up the following loop. >> > >> > library(mgcv) >> > >> >> Rprof() >> >> replicate(100, { >> >> >> > for(i in 1:k) { >> > # looping for different models: >> > m1 <- gam(Volume ~ s(Height), data=df, subset=(ID != i)) >> > m2 <- gam(Volume ~ s(Girth), data=df, subset=(ID != i)) >> > m3 <- gam(Volume ~ s(Girth) + s(Height), data=df, subset=(ID != i)) >> > >> > # looping for predictions: >> > df[df$ID==i, "pred1"] <- predict(m1, df[df$ID==i, ], type="response") >> > df[df$ID==i, "pred2"] <- predict(m2, df[df$ID==i, ], type="response") >> > df[df$ID==i, "pred3"] <- predict(m3, df[df$ID==i, ], type="response") >> > } >> > >> >> }) >> >> Rprof(NULL) >> >> summaryRprof() >> >> ## read ?Rprof to get a sense of what it does >> >> ## read the summary to determine where time is being spent. >> >> ## the result was surprising to me. YMMV. >> >> ## there may be redundancies that you can eliminate by >> ## - doing the setup within gam() one time and saving it >> ## - calling the worker functions by modifying the setup >> ## in a loop or function and saving the results >> >> >> > # calculating residuals: >> > df$res1 <- with(df, Volume - pred1) >> > df$res2 <- with(df, Volume - pred2) >> > df$res3 <- with(df, Volume - pred3) >> > >> > Model <- paste("m", 1:3, sep="") # creates a vector of model names. >> > >> > # creating a vector of mean-square errors (MSE): >> > MSE <- with(df, c( >> > sum(res1^2) / nrow(df), >> > sum(res2^2) / nrow(df), >> > sum(res3^2) / nrow(df) >> > )) >> > >> > model.mse <- data.frame(Model, MSE) # creates a data frame of model >> names >> > and mean-square errors. >> > model.mse <- model.mse[order(model.mse$MSE), ] # rearranges the previous >> > data frame in order of increasing mean-square errors. >> > >> > I'd appreciate any help. This code takes several days if run on >=30,000 >> > different GAM models and 3 predictors. Could you please help with >> > re-writing the script into sapply() or foreach()/doParallel format? >> > >> >> This is something you should learn to do. It is pretty standard practice. >> Use the body of your for loop as the body of a function, add arguments, and >> create a suitable return value. The something like >> >> lapply( 1:k, your.loop.body.function, other.arg1, other.arg2, ...) >> >> should work. If it does, then parallel::mclapply(...) should also work. >> >> HTH, >> >> Chuck >> >> >> ______________________________________________ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.