Hi Thomas,

What you just wrote is very interesting to me. Do you have any suggestions as to how to use leaps (or any other package/code) to iterate over the final lm model produced by biglm? Any advice would be very welcome!

P.S. My purpose is to use a different criterion than AIC, so my end goal is to implement something that IS correctly calibrated for searching large model spaces.

Thanks!
Tal

On Sun, Feb 22, 2009 at 12:46 PM, Thomas Lumley <tlum...@u.washington.edu> wrote:

> On Sat, 21 Feb 2009, Charles C. Berry wrote:
>
>> On Sat, 21 Feb 2009, Tal Galili wrote:
>>
>>> Hello dear R mailing list members.
>>>
>>> I have recently become curious about the possibility of applying model
>>> selection algorithms (even as simple as AIC) to regressions of large
>>> datasets.
>>
>> Large in the sense of many observations, one assumes.
>>
>> But how large in terms of the number of variables??
>>
>> If not too many variables, then you can form the regression sums of
>> squares for all 2^p combinations of regressors from a biglm() fit of all
>> variables as biglm provides coef() and vcov() methods.
>>
>> If it is large, then you most likely will need to do subsampling to reduce
>> the number to 'not too many' via lm() and friends and then apply the above
>> strategy.
>
> If you can fit the complete p-variable model (so you have more observations
> than variables) the search algorithms then don't require the raw data so the
> search time depends on p but not on n. That's how the leaps package works,
> for example. This is only for lm(), but you get a pretty good approximation
> for glm() by doing the search using the weighted linear model from the last
> iteration of IWLS, finding a reasonably large collection of best models, and
> then refitting them in glm() to see which is really best.
>
> Of course, none of this solves the problem that AIC isn't correctly
> calibrated for searching large model spaces.
>
> -thomas
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlum...@u.washington.edu        University of Washington, Seattle
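
To make the quoted point concrete (that the search depends on p but not on n once the full model can be fitted), here is a minimal sketch in plain Python rather than R. It is not how leaps or biglm are implemented. It assumes the cross-products X'X, X'y and y'y have been accumulated in one pass over the data (which could be done chunk by chunk), and then scores every subset of regressors from those summaries alone. All function names (`cross_products`, `rss_for_subset`, `best_subsets`, `bic`) are hypothetical, and the BIC-style score is only a placeholder for whatever criterion one wants to plug in, not a claim that it is correctly calibrated. Working from raw cross-products can lose precision on badly conditioned data, so a QR-based approach would be the more careful choice in practice.

```python
from itertools import combinations
import math
import random


def cross_products(rows):
    """Accumulate X'X, X'y, y'y and n in one pass; each x starts with 1.0 (intercept)."""
    p = len(rows[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    yty, n = 0.0, 0
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
        yty += y * y
        n += 1
    return xtx, xty, yty, n


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    beta = [0.0] * k
    for c in reversed(range(k)):
        s = m[c][k] - sum(m[c][j] * beta[j] for j in range(c + 1, k))
        beta[c] = s / m[c][c]
    return beta


def rss_for_subset(xtx, xty, yty, idx):
    """Residual sum of squares for columns idx, using RSS = y'y - beta' X_s'y."""
    a = [[xtx[i][j] for j in idx] for i in idx]
    b = [xty[i] for i in idx]
    beta = solve(a, b)
    return yty - sum(bi * ci for bi, ci in zip(beta, b))


def bic(rss, n, k):
    """Placeholder score (Gaussian BIC up to a constant); swap in your own criterion."""
    return n * math.log(rss / n) + k * math.log(n)


def best_subsets(xtx, xty, yty, n, criterion):
    """Score all 2^(p-1) subsets that keep the intercept; no raw data is touched."""
    p = len(xty)
    results = []
    for size in range(p):
        for extra in combinations(range(1, p), size):
            idx = (0,) + extra
            rss = rss_for_subset(xtx, xty, yty, idx)
            results.append((criterion(rss, n, len(idx)), idx))
    return sorted(results)


# Simulated data (an assumption for illustration): x2 has no effect on y.
random.seed(1)
rows = []
for _ in range(500):
    x1, x2, x3 = (random.gauss(0, 1) for _ in range(3))
    y = 2.0 + 3.0 * x1 - 1.5 * x3 + random.gauss(0, 0.5)
    rows.append(([1.0, x1, x2, x3], y))

xtx, xty, yty, n = cross_products(rows)
ranked = best_subsets(xtx, xty, yty, n, bic)
print(ranked[:3])  # the three best-scoring subsets as (score, column indices)
```

Only `cross_products` reads the rows, so the cost of the search itself grows with the number of candidate subsets (2^(p-1) here) and not with the number of observations. For a glm, the same search could be run on the weighted cross-products from the last IWLS iteration, with the shortlisted models then refitted properly, as described in the quoted reply.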