Dear R wizards: apologies for two queries in one day. I have a long-form
data set that defines about 5,000 regressions, each with about 1,000
observations:

unit   date      y  x
1      20060101  <two values>
1      20060102  <two values>
...
5000   20081230  <two values>
5000   20081231  <two values>

I need to run these regressions many, many times, because they are part of
an optimization, so getting my code to be fast is paramount. I need to pick
off the 5,000 coefficients on x (i.e., the b's) and their standard errors;
I can ignore the 5,000 intercepts.

    by(dataset, dataset$unit, function(d) coef(lm(y ~ x, data = d)))
gives me the coefficients. (I renamed the function argument from x to d so
it no longer shadows the regressor column.) Of course, I could use the
summary method for lm to pick off the coefficient standard errors, too; my
guess is that this would be slow.
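For concreteness, this is the summary()-based version I have in mind (a
minimal sketch, assuming the column names shown above):

    ## slow baseline: one lm() + summary() per unit
    slow <- by(dataset, dataset$unit, function(d) {
        s <- summary(lm(y ~ x, data = d))
        s$coefficients["x", c("Estimate", "Std. Error")]
    })
    slow <- do.call(rbind, slow)   # 5,000 x 2 matrix: b and se(b)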

I think the alternative is to delete all NAs first and then use a
building-block function such as lm.fit() or solve(qr(X), y). That would be
fast for getting the coefficients, but I wonder whether there is a *FAST*
way to obtain the standard error of b. (I do know slow ways, but they would
defeat the purpose.) Is this the right idea, or will I just end up with
more code but no more speed than with summary(lm())? Can someone tell me
the "fastest" way to generate b and se(b)?
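To make the question concrete, here is the kind of thing I am imagining,
recovering se(b) from the QR decomposition that lm.fit() already returns (a
sketch only; the names are mine, NAs are assumed to be gone, and X is
assumed to have full rank):

    ## per-unit fit via lm.fit(); se(b) from the stored QR, no summary()
    fast_b_se <- function(d) {
        X   <- cbind(1, d$x)                  # intercept plus regressor
        fit <- lm.fit(X, d$y)
        s2  <- sum(fit$residuals^2) / fit$df.residual  # residual variance
        XtXinv <- chol2inv(qr.R(fit$qr))      # (X'X)^{-1} from R of the QR
        c(b = fit$coefficients[2], se = sqrt(s2 * XtXinv[2, 2]))
    }
    fast <- do.call(rbind, by(dataset, dataset$unit, fast_b_se))

This skips the model-frame and summary() machinery, but it still loops over
the 5,000 units in R.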

Is there anything else that comes to mind as a recommended way to speed
this up in R, short of writing everything in C?
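For instance, one idea I had (untested for speed): with a single regressor,
the textbook closed-form expressions for b and se(b) vectorize across all
units at once via rowsum(), with no per-unit loop at all (again assuming no
NAs):

    ## fully vectorized: closed-form simple-regression b and se(b) per unit
    g   <- factor(dataset$unit)
    n   <- as.vector(table(g))                    # observations per unit
    mx  <- rowsum(dataset$x, g) / n               # per-unit means
    my  <- rowsum(dataset$y, g) / n
    sxx <- rowsum(dataset$x^2, g) - n * mx^2      # sum (x - xbar)^2
    sxy <- rowsum(dataset$x * dataset$y, g) - n * mx * my
    syy <- rowsum(dataset$y^2, g) - n * my^2
    b   <- sxy / sxx                              # slopes
    s2  <- (syy - b * sxy) / (n - 2)              # residual variances
    se  <- sqrt(s2 / sxx)                         # se(b)

(I realize the sums-of-squares form is less numerically stable than QR, but
with well-behaved data that should not matter.)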

As always, advice is highly appreciated.

/iaw
-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
