Dear all, I have encountered some strange things when creating lm objects in R: model depends on the order of the terms specified in a formula. Let us consider the following simple example:
> dat <- expand.grid(A = factor(c("a1", "a2")), + B = factor(paste("b", 1:4, sep="")), + rep = factor(1:2)) > set.seed(12345) > dat$x <- 1:16 * .5 + runif(16) > dat$Y <- rnorm(16) > dat A B rep x Y 1 a1 b1 1 1.220904 -0.2841597 2 a2 b1 1 1.875773 -0.9193220 3 a1 b2 1 2.260982 -0.1162478 4 a2 b2 1 2.886125 1.8173120 5 a1 b3 1 2.956481 0.3706279 6 a2 b3 1 3.166372 0.5202165 7 a1 b4 1 3.825095 -0.7505320 8 a2 b4 1 4.509224 0.8168998 9 a1 b1 2 5.227705 -0.8863575 10 a2 b1 2 5.989737 -0.3315776 11 a1 b2 2 5.534535 1.1207127 12 a2 b2 2 6.152373 0.2987237 13 a1 b3 2 7.235685 0.7796219 14 a2 b3 2 7.001137 1.4557851 15 a1 b4 2 7.891203 -0.6443284 16 a2 b4 2 8.462495 -1.5531374 > logLik(m0 <- lm(Y ~ A:B + x:A, dat)) 'log Lik.' -13.22186 (df=11) > logLik(m1 <- lm(Y ~ x:A + A:B, dat)) 'log Lik.' -13.66822 (df=10) To me it is a bit strange that m0 and m1 models appear to have different loglikelihood (only the order of the terms in a formula was changed!) My guess is that the problem lies in model.matrix: X1 <- model.matrix(~ x:A + A:B, dat) X2 <- model.matrix(~ A:B + x:A, dat) ## number of columns: ncol(X1) ## 9 > ncol(X2) ## 11 ## rank of design matrices: qr(X1)$rank ## 9 qr(X2)$rank ## 10 Will be very grateful if someone could help me here! Thanks, Alexandra -- View this message in context: http://r.789695.n4.nabble.com/Why-does-the-order-of-terms-in-a-formula-translate-into-different-models-model-matrices-tp4333408p4333408.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.