Hi Greg,

Thanks for the help; it works perfectly. To answer your question,
there are 339 independent variables, but only 10 are used at any one
time. So on any given row of the data set there will be 10 nonzero
entries for the independent variables, and the rest will be zeros.

One more question:

1. I still want to find a way to look at the interactions of the
independent variables.

The regression would look like this:

y = b12*X1*X2 + b13*X1*X3 + ... + b(k-1)k*X(k-1)*Xk   (one term for every pair i < j)

So I think the regression in R would look like this:

lm(MARGIN ~ P235:P236 + P236:P237 + ..., weights = Poss, data = adj0708)

My problem is that since I technically have 339 independent variables,
this regression would have choose(339, 2) = 57,291 independent variables
(the vast majority of them will be 0s, though), so I don't
want to have to write all of these out. Is there a way to do this
quickly in R?
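A minimal sketch of one way to avoid typing the terms out: list the predictor columns, generate every pair with base R's combn(), and turn the pairs into a formula with reformulate(). The data frame `toy` and its columns P1 to P4 are hypothetical stand-ins for adj0708. Note that the shorthand `MARGIN ~ .^2` also expands to all pairs, but it keeps the main effects and would pair Poss as well unless that column is removed first. With the real data, 57,291 interaction columns is more than the 35,657 rows, so lm() can estimate at most 35,657 coefficients (the rest come back NA), pairs that never appear together give all-zero columns, and a dense model matrix of that size needs roughly 16 GB of memory (35,657 x 57,291 doubles at 8 bytes each).

```r
# Hypothetical toy data: a response, a weight column and four sparse predictors.
set.seed(2)
n <- 100
toy <- data.frame(MARGIN = rnorm(n), Poss = sample(2:30, n, replace = TRUE))
for (v in c("P1", "P2", "P3", "P4")) {
  toy[[v]] <- sample(c(-1, 0, 1), n, replace = TRUE, prob = c(0.2, 0.6, 0.2))
}

# Every predictor column, i.e. everything except the response and the weight.
preds <- setdiff(names(toy), c("MARGIN", "Poss"))

# All unordered pairs as "Pi:Pj" strings: choose(4, 2) = 6 terms here,
# choose(339, 2) = 57291 for the real data.
pair_terms <- combn(preds, 2, FUN = paste, collapse = ":")

# Build MARGIN ~ P1:P2 + P1:P3 + ... + P3:P4 without typing it out.
f <- reformulate(pair_terms, response = "MARGIN")

# weights = Poss is looked up in 'data', just like the response and predictors.
fit_int <- lm(f, weights = Poss, data = toy)
coef(fit_int)
```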

Also, just out of curiosity, a question I can't seem to find an answer to online:
is there a more efficient alternative to lm() for very sparse data sets
like mine?
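One possible approach, sketched here as an assumption rather than the definitive answer, is to build the design matrix with sparse.model.matrix() from the Matrix package (a recommended package that ships with standard R installations) and solve the weighted least-squares normal equations with sparse algebra. Since each row has at most 10 nonzero predictors, it has at most choose(10, 2) = 45 nonzero pairwise products, so even the full interaction matrix would hold at most about 1.6 million nonzero entries (35,657 x 45). The toy data below is hypothetical. Caveats: the normal equations are less numerically stable than the QR decomposition lm() uses, and they require X'WX to be invertible, which fails when there are more columns than rows (as with the full interaction model); in that case you would need to drop all-zero or redundant columns first, or use a penalized fit such as glmnet, which also accepts sparse matrices.

```r
library(Matrix)  # recommended package shipped with R

# Hypothetical toy data in the same shape as adj0708.
set.seed(3)
n <- 200
toy <- data.frame(MARGIN = rnorm(n, sd = 50), Poss = sample(2:30, n, replace = TRUE))
for (v in paste0("P", 1:6)) {
  toy[[v]] <- sample(c(-1, 0, 1), n, replace = TRUE, prob = c(0.15, 0.7, 0.15))
}

# Sparse design matrix (dgCMatrix) with an intercept column; the usual
# formula syntax, including "." and ":" interactions, is accepted.
X <- sparse.model.matrix(MARGIN ~ . - Poss, data = toy)
y <- toy$MARGIN
w <- toy$Poss

# Weighted least squares via the normal equations:
# beta = (X' W X)^{-1} X' W y with W = diag(w); rows of X are scaled by sqrt(w)
# so that crossprod(Xw) = X' W X, and the matrices stay sparse.
Xw <- Diagonal(x = sqrt(w)) %*% X
beta <- solve(crossprod(Xw), crossprod(Xw, sqrt(w) * y))

# Cross-check against lm() on the same toy data.
fit <- lm(MARGIN ~ . - Poss, weights = Poss, data = toy)
all.equal(as.vector(beta), unname(coef(fit)))
```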

Thanks,
Matt


On Mon, Feb 28, 2011 at 4:30 PM, Greg Snow <greg.s...@imail.org> wrote:
> Don't put the name of the dataset in the formula, use the data argument to lm 
> to provide that.  A single period (".") on the right hand side of the formula 
> will represent all the columns in the data set that are not on the left hand 
> side (you can then use "-" to remove any other columns that you don't want 
> included on the RHS).
>
> For example:
>
>> lm(Sepal.Width ~ . - Sepal.Length, data=iris)
>
> Call:
> lm(formula = Sepal.Width ~ . - Sepal.Length, data = iris)
>
> Coefficients:
>      (Intercept)       Petal.Length        Petal.Width  Speciesversicolor
>           3.0485             0.1547             0.6234            -1.7641
>  Speciesvirginica
>          -2.1964
>
>
> But, are you sure that a regression model with 339 predictors will be 
> meaningful?
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Matthew Douglas
>> Sent: Monday, February 28, 2011 1:32 PM
>> To: r-help@r-project.org
>> Subject: [R] Regression with many independent variables
>>
>> Hi,
>>
>> I am trying to use lm() on some data. The code works fine, but I would
>> like a more efficient way to do this.
>>
>> The data looks like this (the data is very sparse with a few 1s, -1s
>> and the rest 0s):
>>
>> > head(adj0708)
>>       MARGIN Poss P235 P247 P703 P218 P430 P489 P83 P307 P337....
>> 1   64.28571   29    0    0    0    0    0    0   0    0    0    0    0    0    0
>> 2 -100.00000    6    0    0    0    0    0    0   0    1    0    0    0    0    0
>> 3  100.00000    4    0    0    0    0    0    0   0    1    0    0    0    0    0
>> 4  -33.33333    7    0    0    0    0    0    0   0    0    0    0    0    0    0
>> 5  200.00000    2    0    0    0    0    0    0   0    0    0    0   -1    0    0
>> 6  -83.33333   12    0   -1    0    0    0    0   0    0    0    0    0    0    0
>>
>> adj0708 is actually a 35657x341 data set. Each column after "Poss" is
>> an independent variable; the dependent variable is "MARGIN", and it is
>> weighted by "Poss".
>>
>>
>> The regression is below:
>> fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235 + adj0708$P247 +
>> adj0708$P703 + adj0708$P430 + adj0708$P489 + adj0708$P218 +
>> adj0708$P605 + adj0708$P337 + .... +
>> adj0708$P510,weights=adj0708$Poss)
>>
>> I have two questions:
>>
>> 1. Is there a way to condense how I write the independent variables
>> in the lm() call, instead of having such a long line of code (I have 339
>> independent variables, to be exact)?
>> 2. I would like to pair the data to look at a regression of the
>> interactions between two independent variables. I think it would look
>> something like this:
>> fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235:adj0708$P247 +
>> adj0708$P703:adj0708$P430 + adj0708$P489:adj0708$P218 +
>> adj0708$P605:adj0708$P337 + ....,weights=adj0708$Poss)
>> but there will be 339 choose 2 combinations, so a lot of independent
>> variables! Is there a more efficient way of writing this code?
>>
>> Thanks,
>> Matt
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
