Hi Greg,

Thanks for the help, it works perfectly. To answer your question: there are 339 independent variables, but only 10 are used at any one time. So in any given row of the data set there are 10 nonzero entries for the independent variables, and the rest are zeros.
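Because each row has only 10 nonzero predictors out of 339, the predictors fit naturally into a sparse matrix, which stores only the nonzero entries. For the real data that is about 35657 x 10 = 356,570 stored values instead of 35657 x 339, or roughly 12.1 million doubles (about 97 MB dense). Below is a minimal sketch using the Matrix package (a recommended package that ships with R) on hypothetical toy data shaped like adj0708's P* columns; the sizes and object names are illustrative assumptions.

```r
library(Matrix)  # sparse matrix classes; a recommended package that ships with R

set.seed(42)
n <- 1000; p <- 339; k <- 10  # toy sizes: rows, predictors, active predictors per row

# Hypothetical stand-in for the P* columns of adj0708: in every row exactly
# k predictors are +1 or -1 and the remaining p - k are 0.
rows <- rep(seq_len(n), each = k)
cols <- as.vector(replicate(n, sample(p, k)))  # k distinct columns per row
vals <- sample(c(-1, 1), n * k, replace = TRUE)
X <- sparseMatrix(i = rows, j = cols, x = vals, dims = c(n, p),
                  dimnames = list(NULL, paste0("P", seq_len(p))))

active_per_row <- rowSums(abs(X))  # number of nonzero predictors in each row
table(active_per_row)              # every row should show k

# Only the nonzero entries are stored, so memory scales with n * k, not n * p
print(object.size(X), units = "Kb")
print(object.size(as.matrix(X)), units = "Kb")

# Pairwise products that can be nonzero within one row vs. all possible pairs
choose(k, 2)  # 45
choose(p, 2)  # 57291
```

With the real data frame, something like `Matrix(as.matrix(adj0708[, -(1:2)]), sparse = TRUE)` should give the same kind of object, assuming columns 1 and 2 are MARGIN and Poss as in the head() output.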
One more question, actually two:

1. I still want to find a way to look at the interactions of the independent variables. The regression would look like this:

y = b12*X1X2 + b13*X1X3 + ... + b(k-1)k*X(k-1)Xk

with one term for every pair of predictors, so I think the call in R would look like this:

lm(MARGIN ~ P235:P236 + P236:P237 + ..., weights = Poss, data = adj0708)

My problem is that, since I technically have 339 independent variables, this regression would have 339 choose 2 = 57,291 independent variables (the vast majority of them 0s, though), so I don't want to have to write them all out. Is there a way to do this quickly in R?

2. Also, just a curiosity question that I can't seem to find an answer to online: is there a model-fitting function other than lm() that is more efficient for very sparse data sets like mine?

Thanks,
Matt

On Mon, Feb 28, 2011 at 4:30 PM, Greg Snow <greg.s...@imail.org> wrote:
> Don't put the name of the dataset in the formula, use the data argument to lm
> to provide that. A single period (".") on the right hand side of the formula
> will represent all the columns in the data set that are not on the left hand
> side (you can then use "-" to remove any other columns that you don't want
> included on the RHS).
>
> For example:
>
> > lm(Sepal.Width ~ . - Sepal.Length, data=iris)
>
> Call:
> lm(formula = Sepal.Width ~ . - Sepal.Length, data = iris)
>
> Coefficients:
>       (Intercept)       Petal.Length        Petal.Width  Speciesversicolor
>            3.0485             0.1547             0.6234            -1.7641
>  Speciesvirginica
>           -2.1964
>
>
> But, are you sure that a regression model with 339 predictors will be
> meaningful?
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Matthew Douglas
>> Sent: Monday, February 28, 2011 1:32 PM
>> To: r-help@r-project.org
>> Subject: [R] Regression with many independent variables
>>
>> Hi,
>>
>> I am trying to use lm() on some data. The code works fine, but I would
>> like to find a more efficient way to do this.
>>
>> The data looks like this (the data is very sparse, with a few 1s and -1s
>> and the rest 0s):
>>
>> > head(adj0708)
>>       MARGIN Poss P235 P247 P703 P218 P430 P489 P83 P307 P337....
>> 1   64.28571   29    0    0    0    0    0    0   0    0    0    0    0    0    0
>> 2 -100.00000    6    0    0    0    0    0    0   0    1    0    0    0    0    0
>> 3  100.00000    4    0    0    0    0    0    0   0    1    0    0    0    0    0
>> 4  -33.33333    7    0    0    0    0    0    0   0    0    0    0    0    0    0
>> 5  200.00000    2    0    0    0    0    0    0   0    0    0    0   -1    0    0
>> 6  -83.33333   12    0   -1    0    0    0    0   0    0    0    0    0    0    0
>>
>> adj0708 is actually a 35657x341 data set. Each column after "Poss" is
>> an independent variable; the dependent variable is "MARGIN", and it is
>> weighted by "Poss".
>>
>>
>> The regression is below:
>> fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235 + adj0708$P247 +
>> adj0708$P703 + adj0708$P430 + adj0708$P489 + adj0708$P218 +
>> adj0708$P605 + adj0708$P337 + .... +
>> adj0708$P510,weights=adj0708$Poss)
>>
>> I have two questions:
>>
>> 1. Is there a way to condense how I write the independent variables
>> in the lm(), instead of having such a long line of code (I have 339
>> independent variables, to be exact)?
>> 2. I would like to pair the data to look at a regression of the
>> interactions between two independent variables. I think it would look
>> something like this....
>> fit.adj0708 <- lm( adj0708$MARGIN~adj0708$P235:adj0708$P247 +
>> adj0708$P703:adj0708$P430 + adj0708$P489:adj0708$P218 +
>> adj0708$P605:adj0708$P337 + ....,weights=adj0708$Poss)
>> but there will be 339 choose 2 combinations, so a lot of independent
>> variables! Is there a more efficient way of writing this code? Is
>> there a way I can do this?
>>
>> Thanks,
>> Matt
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
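On Greg's "." shortcut: "." also picks up Poss, so the main-effects model would be lm(MARGIN ~ . - Poss, data = adj0708, weights = Poss). For all pairwise interactions, the labels need not be typed. For example, combn(names, 2, FUN = paste, collapse = ":") produces every "Pa:Pb" label, and reformulate() turns them into a formula. A dense lm() fit is impractical at this size, though: 35657 rows by 57,291 interaction columns is about 2.04 billion doubles (roughly 16 GB). With 10 active predictors per row, at most choose(10, 2) = 45 products per row are nonzero, so a sparse design holds at most about 1.6 million nonzeros.

The sketch below touches both questions. It keeps only the pairs that actually co-occur, builds the interaction columns as a sparse matrix with the Matrix package, and solves the weighted least-squares normal equations directly. The toy data and every object name (X, y, w, Z, ...) are hypothetical assumptions, and this is one possible approach rather than an established recipe from the thread.

```r
library(Matrix)  # sparse matrices plus crossprod(), Diagonal() and solve() methods

set.seed(1)
n <- 500; p <- 12; k <- 3  # toy sizes; the real data have 35657 rows, 339 predictors, 10 active

# Hypothetical toy stand-in for adj0708: k active predictors (+1/-1) per row
rows <- rep(seq_len(n), each = k)
cols <- as.vector(replicate(n, sample(p, k)))
vals <- sample(c(-1, 1), n * k, replace = TRUE)
X <- sparseMatrix(i = rows, j = cols, x = vals, dims = c(n, p),
                  dimnames = list(NULL, paste0("P", seq_len(p))))
y <- rnorm(n, sd = 50)                            # plays the role of MARGIN
w <- as.numeric(sample(2:30, n, replace = TRUE))  # plays the role of Poss (weights)

# 1. Find the pairs (a, b), a < b, that are ever nonzero in the same row;
#    any other pair would give an all-zero column (an NA coefficient in lm()).
cooc <- as.matrix(crossprod(abs(X)))  # cooc[a, b] = rows where Pa and Pb are both nonzero
idx <- which(cooc > 0 & upper.tri(cooc), arr.ind = TRUE)
a <- idx[, 1]
b <- idx[, 2]

# 2. Sparse interaction columns Pa * Pb (at most choose(k, 2) nonzeros per row)
Z <- X[, a, drop = FALSE] * X[, b, drop = FALSE]
colnames(Z) <- paste(colnames(X)[a], colnames(X)[b], sep = ":")
Z1 <- cbind(Matrix(1, nrow = n, ncol = 1, sparse = TRUE), Z)  # intercept, as lm() adds

# 3. Weighted least squares via the normal equations (Z1' W Z1) beta = Z1' W y
W <- Diagonal(x = w)
beta <- solve(crossprod(Z1, W %*% Z1), crossprod(Z1, w * y))
beta <- setNames(as.vector(as.matrix(beta)), c("(Intercept)", colnames(Z)))

# Cross-check against lm() on the same terms (feasible only because the toy is small)
df <- data.frame(MARGIN = y, Poss = w, as.matrix(X))
f <- reformulate(colnames(Z), response = "MARGIN")  # MARGIN ~ P1:P2 + P1:P3 + ...
fit <- lm(f, data = df, weights = Poss)
all.equal(unname(coef(fit)[names(beta)]), unname(beta))
```

For the real data one would presumably build X from the P* columns of adj0708 (for example Matrix(as.matrix(adj0708[, -(1:2)]), sparse = TRUE)), take y and w from MARGIN and Poss, and skip the lm() cross-check. Solving the normal equations squares the condition number, so it is less numerically stable than the QR decomposition lm() uses. If solve() reports a singular system, which can happen when rare pairs are collinear, a penalized fitter that accepts sparse matrices and weights, such as glmnet, may be worth investigating.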