If you have the same set of predictors for multiple outcomes, you can speed things up considerably, because the design matrix is the same for every outcome and only needs to be decomposed once. As an example:

set.seed(10)
y <- matrix(rnorm(10000 * 14), ncol = 14)
x <- matrix(rnorm(10000 * 2), ncol = 2)

system.time(res <- lapply(1:14, function(i) lm(y[, i] ~ x)))
##    user  system elapsed
##    0.34    0.00    0.34

system.time(res2 <- lm(y ~ x))
##    user  system elapsed
##    0.05    0.02    0.06

lm can accept a matrix as the dependent variable. So if various combinations of variables predict all 14 outcomes, do not fit 14 * (number of predictor combinations) models separately; fit them in chunks, one call per combination, for substantial speedups.

Finally, as long as you are not using factors or any other fancy things, and can work with just raw data matrices, use lm.fit() instead of lm(), which is a high-level function. lm.fit() is not especially clever; it just expects the design matrix and the response matrix. It will not add an intercept by default, so column bind a vector of 1s onto your data.

system.time(res3 <- lm.fit(cbind(1, x), y))
##    user  system elapsed
##    0.02    0.00    0.01

These three methods produce identical results:

res[[1]]
## Call:
## lm(formula = y[, i] ~ x)
## Coefficients:
## (Intercept)           x1           x2
##   0.0014401   -0.0198232   -0.0005721

res2[[1]][, 1]
##   (Intercept)            x1            x2
##  0.0014401149 -0.0198232209 -0.0005720764

res3[[1]][, 1]
##            x1            x2            x3
##  0.0014401149 -0.0198232209 -0.0005720764

However, fitting all responses at once (so the design matrix can be reused) instead of one response at a time, and using lower-level functions when you have a simple, specific, repetitive task, takes the estimation time from .34 down to .01. You would need 34 cores to achieve that by simply throwing more hardware at the problem as opposed to using more optimized code. Of course you can do both, and probably get the results pretty quickly.

Cheers,

Josh

On Mon, Feb 18, 2013 at 8:25 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> On Monday, 18 February 2013 at 17:09 +0100, Jamora, Nelissa wrote:
>> Hi Milan,
>>
>> Thanks for responding to my question.
>> I'm actually not interested in LM, it
>> was more for example.
>>
>> You are right, I'm trying an enormous set of model runs. My Var1 n=14; Var2
>> n=255 ==> 3570!
>> But first, I need be able to set up 2 variables in each model run. Those 2
>> variables are different in each case. I can set this up 1-by-1 but it will be
>> tedious and not efficient.
>>
>> To describe in more details
>> I have a data frame with 269 variables.
>> 1. individual columns 1-14 can be my first variable
>> 2. individual columns 15-269 can be my second variable.
>>
>> Variable1 and variable2 are different in each case. For e.g.
>> Model 1: var1 and var15
>> Model 2: var1 and var16
>> Model 3: var1 and var17...
>> ....
>> Model 3570: var14 and var269
>> So I need to write a loop command that calls for different sets of variable1
>> and variable2 in each run.
>>
>> What do I intend to do with this? I'm running threshold vecm (package tsDyn),
>> and I need to summarize threshold estimates in each model run (or market
>> pairs, var1 and var2). The goal is to extract N=3,570 threshold estimates.
>> I did similar linear VECM estimates in Stata using my foreach loop, but now I
>> need to make parallel run in R but using threshold model.
> Ah, if you need parallelism, you can likely try something like this
> (untested) :
>
> # Create cluster cl before and export yourData to them
> parLapply(cl, paste0("p", 1:14), function(var) {
>     lapply(paste0("p", 15:269), function(y) {
>         lm(yourData[[var]] ~ yourData[[y]])
>     })
> })
>
> This will only be optimal if you have less than 14 cores.
>
>
> Regards
>
>
>> Hope this clears things.
>> Nelissa
>>
>> -----Original Message-----
>> From: Milan Bouchet-Valat [mailto:nalimi...@club.fr]
>> Sent: Monday, February 18, 2013 3:44 PM
>> To: Jamora, Nelissa
>> Cc: r-help@r-project.org
>> Subject: Re: [R] foreach loop, stata equivalent
>>
>> On Monday, 18 February 2013 at 13:48 +0100, Jamora, Nelissa wrote:
>> > Hi!
>> > I'm a recent convert from Stata, so forgive my ignorance.
>> >
>> > In Stata, I can write foreach loops (example below)
>> >
>> > foreach var of varlist p1-p14 {
>> >     foreach y of varlist p15-p269 {
>> >         reg `var' `y'
>> >     }
>> > }
>> >
>> > It's looping p1-p15, p1-p16...., p1-p269, p2-p15, p2-p16,... p2-p269,...
>> > variable pairs.
>> >
>> > How can I write something similar in R?
>> >
>> > I 'tried' understanding the package.foreach but can't get it to work.
>> You do not need package foreach, which is intended at a completely different
>> problem.
>>
>> R does not really have the syntactic equivalent of "varlist", but you can
>> easily do something like:
>> for(var in paste0("p", 1:14)) {
>>     for(y in paste0("p", 15:269))
>>         lm(yourData[[var]] ~ yourData[[y]])
>> }
>>
>> provided that yourData is the data frame in which the p* variables are
>> stored.
>>
>> There are probably more direct ways of doing the same thing and storing the
>> resulting lm objects in a list, but you did not state what you intend to do
>> with this enormous set of regressions...
>>
>>
>> Regards
>>
>> > Thanks for any help
>> >
>> > Nelissa
>> >
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
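
P.S. A minimal sketch of how the lm.fit() approach might map onto the 14 x 255 plain regressions from the original question. It assumes the data frame is called yourData with numeric columns p1 to p269 (the names used earlier in the thread); the simulated data and the slopes object are hypothetical stand-ins for illustration only. It covers ordinary regressions, not the threshold VECM models from tsDyn.

```r
# Hypothetical stand-in for the real data: 269 numeric columns p1..p269
set.seed(1)
yourData <- as.data.frame(matrix(rnorm(200 * 269), ncol = 269))
names(yourData) <- paste0("p", 1:269)

# The 14 outcomes go into one response matrix so they are fitted together
Y <- as.matrix(yourData[, paste0("p", 1:14)])
predictors <- paste0("p", 15:269)

# One lm.fit() call per predictor covers all 14 outcomes:
# 255 fits instead of 3570
slopes <- sapply(predictors, function(p) {
  fit <- lm.fit(cbind(1, yourData[[p]]), Y)
  fit$coefficients[2, ]  # row 1 is the intercept, row 2 the slope
})

dim(slopes)  # 14 outcomes (rows) by 255 predictors (columns)

# Spot check one pair against the equivalent lm() call
all.equal(unname(slopes["p3", "p20"]),
          unname(coef(lm(p3 ~ p20, data = yourData))[2]))
```

Each column of slopes holds the 14 slopes for one predictor, so slopes["p3", "p20"] corresponds to the Stata call reg p3 p20. If standard errors or residuals are needed as well, they would have to be computed from the other components that lm.fit() returns.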