Dear Peter,

Thank you very much for that excellent answer to a rather stupid question :) I did not notice that the RSS actually increased for the model with more parameters, so in this case the F-statistic is negative and a p-value from the F-distribution is meaningless. But I guess your answer also clarifies that as long as the F-statistic is in the valid range (>= 0), anova() will calculate it and return a p-value, whether or not the models are nested.
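For the record, here is a minimal sketch of that arithmetic, recomputing the F-statistic by hand from the two residual sums of squares. The seed and the helper name f_by_hand are my own assumptions for illustration (neither is in the original post), and the scaling by the residual mean square of the model with fewer residual degrees of freedom is how I understand anova() to work for lm fits, not something quoted from its source.

```r
# Sketch only: recompute the extra-sum-of-squares F statistic for two lm fits.
set.seed(1)                       # assumed seed, so the numbers differ from the post
x <- seq(1, 100, 1)
y <- 3 * x + rnorm(100)

# Hypothetical helper, not part of R.
f_by_hand <- function(small, big) {
  rss_small <- deviance(small)    # residual sum of squares of the smaller model
  rss_big   <- deviance(big)
  df_small  <- df.residual(small)
  df_big    <- df.residual(big)
  # (drop in RSS per extra parameter) / (residual mean square of the bigger model)
  f <- ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)
  # A negative statistic cannot come from an F distribution, so no p-value.
  p <- if (f >= 0) pf(f, df_small - df_big, df_big, lower.tail = FALSE) else NA_real_
  c(F = f, p = p)
}

m_lin  <- lm(y ~ x)
m_quad <- lm(y ~ x + I(x^2))        # nested: contains m_lin
m_cub  <- lm(y ~ I(x^2) + I(x^3))   # more parameters, but not nested

nested_res     <- f_by_hand(m_lin, m_quad)
non_nested_res <- f_by_hand(m_lin, m_cub)
print(nested_res)
print(non_nested_res)
```

In the second call the RSS goes up while the residual degrees of freedom go down, so the statistic comes out negative and the sketch returns NA for the p-value, which corresponds to the blank cells in the anova() table.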

Best,
Suresh


Peter Dalgaard-2 wrote
>
> On Jul 9, 2012, at 15:40 , Suresh Krishna wrote:
>
>>
>> Hello,
>>
>> Why does anova.lm sometimes return a p-value and at other times not ? Is
>> it because it recognizes nested models from non-nested ones ?
>>
>>> x<-seq(1,100,1)
>>> y<-3*x+rnorm(100)
>>> anova(lm(y~x),lm(y~x+I(x^2)),test="F")
>> Analysis of Variance Table
>>
>> Model 1: y ~ x
>> Model 2: y ~ x + I(x^2)
>>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
>> 1     98 90.449
>> 2     97 90.288  1   0.16117 0.1732 0.6782
>>
>>> anova(lm(y~x),lm(y~I(x^2)+I(x^3)),test="F")
>> Analysis of Variance Table
>>
>> Model 1: y ~ x
>> Model 2: y ~ I(x^2) + I(x^3)
>>   Res.Df    RSS Df Sum of Sq F Pr(>F)
>> 1     98   90.4
>> 2     97 7345.7  1   -7255.3
>>
>
> You have Df and Sum of Sq with opposite sign, so more parameters with a
> worse fit. The models are not nested, so the F test makes no sense.
>
> I'd say that the real question is why anova.lm doesn't protest loudly when
> detecting this? One possible answer is that it also misses other
> non-nested cases where the signs do not clash, and warning only in some of
> the incorrect cases could lead the naive user to believe that the other
> ones are OK. E.g. this F test is equally meaningless
>
> > anova(lm(y~I(x^4)),lm(y~I(x^2)+I(x^3)),test="F")
> Analysis of Variance Table
>
> Model 1: y ~ I(x^4)
> Model 2: y ~ I(x^2) + I(x^3)
>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
> 1     98 186639
> 2     97   7101  1    179538 2452.4 < 2.2e-16 ***
>
> (Non-nestedness could in principle be determined by checking whether
> cbind(model.matrix(m1), model.matrix(m2)) has higher rank than both of its
> constituents, but numerical rank determination is a bit error-prone and
> slow, so this was not implemented).
>
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes@ Priv: PDalgd@
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
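
The rank check Peter describes in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: is_nested is a hypothetical helper name (nothing like it ships with R as far as I know), the seed is arbitrary, and the tolerance passed to qr() is simply its default. As Peter notes, numerical rank determination is error-prone, so the result should be read as a hint rather than a guarantee.

```r
# Sketch only: flag non-nested lm fits by comparing design-matrix ranks.
set.seed(1)                       # assumed seed, not from the original post
x <- seq(1, 100, 1)
y <- 3 * x + rnorm(100)

# Hypothetical helper, not part of R.
# Returns TRUE when the combined design matrix has no higher rank than the
# larger of the two individual design matrices, i.e. one column space
# contains the other.
is_nested <- function(m1, m2, tol = 1e-07) {
  X1  <- model.matrix(m1)
  X2  <- model.matrix(m2)
  r1  <- qr(X1, tol = tol)$rank
  r2  <- qr(X2, tol = tol)$rank
  r12 <- qr(cbind(X1, X2), tol = tol)$rank
  r12 == max(r1, r2)
}

m_lin   <- lm(y ~ x)
m_quad  <- lm(y ~ x + I(x^2))
m_cub   <- lm(y ~ I(x^2) + I(x^3))
m_quart <- lm(y ~ I(x^4))

nested_ok    <- is_nested(m_lin, m_quad)    # y ~ x sits inside y ~ x + I(x^2)
clash_case   <- is_nested(m_lin, m_cub)     # the case with the negative Sum of Sq
silent_case  <- is_nested(m_quart, m_cub)   # the case where anova() still prints a p-value
print(c(nested_ok, clash_case, silent_case))
```

The third comparison is the interesting one: the signs of Df and Sum of Sq agree there, so anova() reports an F test, yet the rank check still indicates that the models are not nested.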