At 02:23 29/12/2010, Entropi ntrp wrote:
Hi,
I have been examining large data and need to do simple linear regression
with the data which is grouped based on the values of a particular
attribute. For instance, consider three columns : ID, x, y, and I need to
regress x on y for each distinct value of ID. Specifically, for the set of
data corresponding to each of the 4 values of ID (76, 111, 121, 168) in the
data below, I should invoke linear regression 4 times. The challenge is
that the ID vector has around 20000 entries, so linear regression must
be done automatically for each distinct value of ID.
ID    x      y
76    36476  15.8
76    36493  66.9
76    36579  65.6
111   35465  10.3
111   35756  4.8
121   38183  16
121   38184  15
121   38254  9.6
121   38255  7
168   37727  21.9
168   37739  29.7
168   37746  97.4
I was wondering whether there is an easy way to group data based on the
values of ID in R so that linear regression can be done easily for each
group determined by each value of ID. Or, is the only way to construct
loops with 'for' or 'while' in which a matrix is generated for each
distinct value of ID that stores corresponding values of x and y by
screening the entire ID vector?
The advantage of using lmList from nlme is that
a) it gives you access to a range of functions already written to
operate on such objects
b) you can easily write your own extractor function and then call it
using lapply
If you do it yourself you can still do (b) but you lose (a)
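To make (a) and (b) concrete, here is a minimal sketch using the sample data from the question. It assumes the intended model is y ~ x (swap the formula if x is meant to be the response), and the names dat, slope and fits_manual are made up for illustration. The last part shows one possible do-it-yourself route with split and lapply, which needs no explicit for or while loop.

```r
library(nlme)  # provides lmList

# Sample data from the question; ID is stored as a factor so that
# each distinct value defines one group.
dat <- data.frame(
  ID = factor(c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168)),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# One linear regression per distinct ID (assumed model: y ~ x)
fits <- lmList(y ~ x | ID, data = dat)

# (a) an existing method for such objects: one row of coefficients per ID
coef(fits)

# (b) a hypothetical extractor of your own, applied to each fitted model
slope <- function(fit) coef(fit)[["x"]]
sapply(fits, slope)

# Doing it yourself: split the data frame by ID and fit lm() to each piece.
# The extractor from (b) still works, but the lmList methods are not available.
fits_manual <- lapply(split(dat, dat$ID), function(d) lm(y ~ x, data = d))
sapply(fits_manual, slope)
```

Note that ID 111 has only two observations in this sample, so its line fits exactly and its standard errors are not meaningful.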
Thanks in advance,
Yasin
Michael Dewey
http://www.aghmed.fsnet.co.uk