Hi Daniel, Thanks for your thorough response. You are indeed correct that I was looking for (a), and the Chow test fits the bill exactly. However, I believe my method is equivalent to the Chow test. Rather than summing the errors across regressions on different datasets, I formulate a single specification with errors equivalent to that sum. (Except there would be one difference: the reduced and full models include dummies for the grouping variable, and the Chow test doesn't.) What interfered with the method was the fact that my factor was ordered and therefore was not treated as a dummy variables by R in my model specification. Thanks to David who helped clarify R's behaviour on this point.
Once I unordered my factor, I found that the coefficient estimates were the same. > data.ex$t = factor(data.ex$t, ordered = F) > coef(lm.together) (Intercept) t2 t3 t4 x1 x2 2.691272263 -0.975915716 -0.811480858 -0.763039039 0.107520721 0.054694784 t2:x1 t3:x1 t4:x1 t2:x2 t3:x2 t4:x2 -0.060271900 -0.063503572 -0.087683111 -0.015070147 0.005671774 -0.024606167 x1:x2 t2:x1:x2 t3:x1:x2 t4:x1:x2 0.002180749 -0.001815363 -0.002329575 -0.002921691 You can see that these are equivalent to the estimates from lm.separate. Demonstrating with the intercept: > sapply(lm.separate, coef)[1,] 1 2 3 4 2.691272 1.715357 1.879791 1.928233 > coef(lm.together)[1] + c(0, coef(lm.together)[2:4]) t2 t3 t4 2.691272 1.715357 1.879791 1.928233 For reference, the method I'm using is described here: www.stat.wisc.edu/~mchung/teaching/stat324/324.24.pdf Thanks very much for everyone's help, Clara ________________________________________ From: Daniel Malter [dan...@umd.edu] Sent: Friday, December 18, 2009 4:22 AM To: Clara Yuan; 'r-h...@lists.r-project.org' Subject: RE: [R] Testing equality of regression model on multiple groups Hi, your question is unclear. It is not clear whether you want to compare a.) whether two subsets of data have the same coefficients for an identical model or b.) whether a couple of coefficients are different once you include additional regressors. The reason for this confusion is that your lm.separate and lm.together are not the same models (i.e., they do not include the same regressors). The former is (y~x1*x2); the latter is (y~t*x1*x2), which is obviously different. If a.) is your goal, you want to run a Chow test. For that, you run the regression (y~x1*x2) on the entire dataset and on the two subsets of the data separately. If the model on the entire data produces much larger errors (in sum) than the two regressions on the subsets of data (in sum), then this is evidence that the data-generating processes for the two subsets differ significantly. In other words, the coefficients for the subsets will jointly be significantly different from each other. This is, as you said, an F-Test. Look here for further details: http://en.wikipedia.org/wiki/Chow_test It just beckons me that you probably want to compare multiple groups. If you want to run (y~x1*x2) interacted with the group indicator, i.e. (y~group*x1*x2), you should code your group indicator as "treatment contrasts." The t.L, t.Q, and t.C indicate a.) that t is a factor variable and b.) that this factor variable is coded as orthogonal polynomial contrasts. This tests for linear, quadractic, cubic, etc. influence of t on the dependent variable. This makes sense if t has constant differences between the factor levels (like 10 cm, 15 cm, 20 cm, 25 cm). Otherwise, you probably want dummy-variable coding, which is called "treatment contrasts" in R lingo. HTH, Daniel ------------------------- cuncta stricte discussurus ------------------------- -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Clara Yuan Sent: Thursday, December 17, 2009 5:15 PM To: r-h...@lists.r-project.org Subject: [R] Testing equality of regression model on multiple groups Hello, I'm trying to test for the joint equality of coefficients of the same model across different subsets of the data (ie, is it necessary to estimate the same model on these different populations, or can I just estimate the model once on the whole dataset?). My plan is to use the F-test on the reduced model and the full model. By full model, I mean a specification that mimics my regressions on separate subsets of data, but I have found that the full model's coefficient estimates don't correspond to my original model's estimates. I was under the impression that they would be identical. Original model: > lm.separate = by(data.ex, data.ex$t, function(x) lm(y ~ x1 * x2, data = x)) Full model: > lm.together = lm(y ~ t * x1 * x2, data = data.ex) The data are grouped by t. When I examine the coefficients, I find that they are roughly in the same ballpark, but not nearly identical: > sapply(lm.separate, coef) 1 2 3 4 (Intercept) 2.691272263 1.7153565472 1.8797914048 1.9282332240 x1 0.107520721 0.0472488208 0.0440171489 0.0198376096 x2 0.054694784 0.0396246366 0.0603665574 0.0300886164 x1:x2 0.002180749 0.0003653858 -0.0001488267 -0.0007409421 > coef(lm.together) (Intercept) t.L t.Q t.C x1 2.0536633597 -0.4750933962 0.5121787674 -0.2809269719 0.0546560750 x2 t.L:x1 t.Q:x1 t.C:x1 t.L:x2 0.0461936485 -0.0595422428 0.0180461803 -0.0174386682 -0.0118682844 t.Q:x2 t.C:x2 x1:x2 t.L:x1:x2 t.Q:x1:x2 -0.0076038969 -0.0194162097 0.0004140914 -0.0020749112 0.0006116237 t.C:x1:x2 -0.0003083657 (Also, why are the coefficients renamed to t.L, t.Q, etc instead of t.1, t.2?) What am I missing? Thanks for the help, Clara ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.