Alexandra <a...@imm.dtu.dk> writes: > Dear all, > > I have encountered some strange things when creating lm objects in R: model > depends on the order of the terms specified in a formula. > Let us consider the following simple example: > >> dat <- expand.grid(A = factor(c("a1", "a2")), > + B = factor(paste("b", 1:4, sep="")), > + rep = factor(1:2)) >> set.seed(12345) >> dat$x <- 1:16 * .5 + runif(16) >> dat$Y <- rnorm(16) > >> dat > A B rep x Y > 1 a1 b1 1 1.220904 -0.2841597 > 2 a2 b1 1 1.875773 -0.9193220 > 3 a1 b2 1 2.260982 -0.1162478 > 4 a2 b2 1 2.886125 1.8173120 > 5 a1 b3 1 2.956481 0.3706279 > 6 a2 b3 1 3.166372 0.5202165 > 7 a1 b4 1 3.825095 -0.7505320 > 8 a2 b4 1 4.509224 0.8168998 > 9 a1 b1 2 5.227705 -0.8863575 > 10 a2 b1 2 5.989737 -0.3315776 > 11 a1 b2 2 5.534535 1.1207127 > 12 a2 b2 2 6.152373 0.2987237 > 13 a1 b3 2 7.235685 0.7796219 > 14 a2 b3 2 7.001137 1.4557851 > 15 a1 b4 2 7.891203 -0.6443284 > 16 a2 b4 2 8.462495 -1.5531374 > > >> logLik(m0 <- lm(Y ~ A:B + x:A, dat)) > 'log Lik.' -13.22186 (df=11) > >> logLik(m1 <- lm(Y ~ x:A + A:B, dat)) > 'log Lik.' -13.66822 (df=10) > > To me it is a bit strange that m0 and m1 models appear to have different > loglikelihood (only the order of the terms in a formula was changed!) > > My guess is that the problem lies in model.matrix: > > X1 <- model.matrix(~ x:A + A:B, dat) > X2 <- model.matrix(~ A:B + x:A, dat)
Close, but not quite. The problem lies in terms() Here are the attr(terms(...),"factors") matrices: > attributes(terms(Y ~ x:A + A:B,data=dat))$factors x:A A:B Y 0 0 x 2 0 A 2 2 B 0 1 > attributes(terms(Y ~ A:B + x:A ,data=dat))$factors A:B A:x Y 0 0 A 2 2 B 2 0 x 0 1 As you see, the encoding of x and B are treated differently under the two orderings. See ?terms.object for what those codes mean. Same deal for these seemingly equivalent formulae: > attributes(terms(Y ~ (x + A + B)^2-A,data=dat))$factors x B x:A x:B A:B Y 0 0 0 0 0 x 1 0 2 1 0 A 0 0 1 0 1 B 0 1 0 1 1 > attributes(terms(Y ~ (A + B + x)^2-A,data=dat))$factors B x A:B A:x B:x Y 0 0 0 0 0 A 0 0 1 1 0 B 1 0 2 0 1 x 0 1 0 1 1 > AFAICS, this is a bug. HTH, Chuck > > > ## number of columns: > ncol(X1) ## 9 > >> ncol(X2) ## 11 > > ## rank of design matrices: > qr(X1)$rank ## 9 > > qr(X2)$rank ## 10 > > > Will be very grateful if someone could help me here! > > Thanks, > > Alexandra > > > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Why-does-the-order-of-terms-in-a-formula-translate-into-different-models-model-matrices-tp4333408p4333408.html > Sent from the R help mailing list archive at Nabble.com. > -- Charles C. Berry Dept of Family/Preventive Medicine cberry at ucsd edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.