Dear Ravi,

I always hesitate to address this question because it seems to generate much more heat than light. I know that you've been told that the question has been asked and answered on the list before, but simply pointing to unilluminating answers isn't helpful, I believe. It's also hard to answer the question briefly, because a careful explanation takes more space than is reasonable on an email list, so I'll just make a few general points:
(1) What's important in formulating tests are the hypotheses being tested. These should be sensible and of interest. Take a crossed two-way ANOVA with factors A and B, for example. In that context, hypotheses should address the pattern of cell means.

(2) The null hypothesis of no interaction is that the profiles of cell means for A (say) across the levels of B are parallel. If there is no interaction, then testing the equality of any weighted average of the profiles of cell means across the levels of B tests the null hypothesis of no A main effects. The most powerful tests for the main effects are the type-II tests: for A after B, ignoring the AB interaction; (and continuing) for B after A, ignoring AB; and for AB after A and B. These tests are independent of the contrasts chosen to represent A and B, which is generally the case when one restricts consideration to models that conform to the principle of marginality.

(3) If there is interaction, then what one means by "main effects" is ambiguous, but one possibility is to formulate the main effects for A in terms of the marginal means for A, averaged across the levels of B. The null hypothesis of no A main effects is then that the A marginal means are all equal. This is the type-III test: for A after B and AB; (and continuing) for B after A and AB; and for AB after A and B. Because the tests for the main effects violate the principle of marginality, computing them properly requires contrasts that are orthogonal in the row basis of the model -- in R, e.g., contr.sum or contr.helmert, but not the default contr.treatment. The type-III tests have the attraction that they also test for main effects when the interactions are absent, though they are not maximally powerful in that circumstance. There's also a serious question about whether one would be interested in main effects defined as averages over the levels of the other factor when interactions are present.
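These two properties -- type-II tests are invariant to the factor coding, type-III tests are not -- can be seen directly in R. Here is a minimal sketch, assuming the car package is installed; the data are simulated and the factor/level names are just illustrative:

```r
# An unbalanced (non-proportional) two-way layout, simulated for illustration
set.seed(123)
dat <- data.frame(
  A = factor(rep(c("a1", "a2"), c(9, 14))),
  B = factor(rep(c("b1", "b2", "b3", "b1", "b2", "b3"), c(4, 2, 3, 5, 3, 6)))
)
dat$y <- rnorm(nrow(dat)) + as.numeric(dat$A) + as.numeric(dat$B)

library(car)  # for Anova()

# The same model under two codings of the factors:
m.trt <- lm(y ~ A * B, data = dat,
            contrasts = list(A = contr.treatment, B = contr.treatment))
m.sum <- lm(y ~ A * B, data = dat,
            contrasts = list(A = contr.sum, B = contr.sum))

# Type-II sums of squares do not depend on the coding:
Anova(m.trt, type = "II")
Anova(m.sum, type = "II")

# Type-III tests are correct only with contrasts orthogonal in the row
# basis (e.g., contr.sum); with contr.treatment they answer a different,
# usually uninteresting, question, and the sums of squares differ:
Anova(m.sum, type = "III")
Anova(m.trt, type = "III")  # not the intended tests
```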
If, however, interactions are present, then the type-II tests for A and B are not tests of main effects in a reasonable interpretation of that term.

(4) The type-I tests are sequential: for A ignoring B and their interaction; for B after A, ignoring the interaction; and for AB after A and B. These tests compare models that conform to marginality and thus are independent of the contrasts selected to represent A and B. If A and B are related (as happens when the cell counts are unequal), then the test for A does not test the A main effect in any reasonable interpretation of that term -- i.e., as the partial relationship between the response and A conditional on B.

(5) Other, similar issues arise in models with factors and covariates. These are not typically handled reasonably for type-III tests in software such as SAS and SPSS, which, e.g., test for differences across the levels of a factor A when a covariate X is 0.

As you suggest, the anova function in R produces type-I tests. The Anova function in the car package produces type-II tests by default, and type-III tests optionally. If you select the latter, then you must be careful to use, say, contr.sum, and not contr.treatment, to encode the factors.

I know that there are objections to the use of the terms type-I, -II and -III, but I find them an innocuous shorthand once the issues distinguishing the tests are understood. My preference is for the type-II tests, which are hard to screw up because they conform to the principle of marginality and are maximally powerful in the context in which they are interesting.
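To make point (4) concrete, here is a small sketch (simulated unbalanced data again; the car package is assumed to be installed) showing that the sequential type-I sums of squares from anova() change when the factors are reordered in the formula, while the type-II sums of squares from Anova() do not:

```r
# Unbalanced two-way layout, simulated for illustration only
set.seed(123)
dat <- data.frame(
  A = factor(rep(c("a1", "a2"), c(9, 14))),
  B = factor(rep(c("b1", "b2", "b3", "b1", "b2", "b3"), c(4, 2, 3, 5, 3, 6)))
)
dat$y <- rnorm(nrow(dat)) + as.numeric(dat$A) + as.numeric(dat$B)

# Type-I (sequential) tests: the SS for A is "A ignoring B" in the first
# fit but "A after B" in the second, so the two tables disagree:
anova(lm(y ~ A * B, data = dat))["A", "Sum Sq"]
anova(lm(y ~ B * A, data = dat))["A", "Sum Sq"]

library(car)

# Type-II tests condition on the other main effect either way, so the
# order in which the factors enter the formula is irrelevant:
Anova(lm(y ~ A * B, data = dat), type = "II")["A", "Sum Sq"]
Anova(lm(y ~ B * A, data = dat), type = "II")["A", "Sum Sq"]
```

(With equal -- or, more generally, proportional -- cell counts, A and B are orthogonal and the two type-I tables would agree as well.)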
I hope this helps,
 John

--------------------------------
John Fox
Senator William McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Ravi Kulkarni
> Sent: March-01-10 10:13 AM
> To: r-help@r-project.org
> Subject: [R] Type-I v/s Type-III Sum-Of-Squares in ANOVA
>
> Hello,
>   I believe the aov() function in R uses a "Type-I sum-of-squares" by
> default as against "Type-III". This is relevant for me because I am trying
> to understand ANOVA in R using my knowledge of ANOVA in SPSS. I can only
> reproduce the results of an ANOVA done using R through SPSS if I specify
> that SPSS uses a Type-I sum-of-squares. (And yes, I know that when the
> sample sizes of all groups are equal, Type-I and Type-III produce the same
> answers.)
>
> My questions:
> 1) Exactly what is the difference between the two types of sums-of-squares?
> 2) How can I make R use a Type-III s-o-s? Should I? R must have some reason
>    for using Type-I as default rather than Type-III. (Given a choice,
>    believe R!)
>
> A reference (stats book, URL...) would be helpful...
>
> Ravi
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.