Consider this code fragment:

---------------------------------------------------------------------------
set.seed(42)
x <- runif(20)
y <- 2 + 3*x + rnorm(20)
m1 <- lm(y ~ x)
m2 <- lm(y ~ -1 + x)
summary(m1)
summary(m2)
cor(y, fitted.values(m1))^2
cor(y, fitted.values(m2))^2
---------------------------------------------------------------------------

m1 is the true model and all is well. m2 is a false model: the intercept is truly 2, but it has been omitted. The R^2 for m1 shows as 0.4953, while for m2 it shows 0.8983.

I am aware that there are difficulties with the standard formulas for R^2 when there is no intercept, so the fact that the R^2 of m2 is much higher (even though it is the wrong model) probably flows from that.

What surprised me was that both correlations (between y and the fitted values of either m1 or m2) are identical. I am unable to understand how this could be, since the estimated coefficient of x is quite different between the two cases. There must be an interesting theoretical angle to this.

I would greatly appreciate some help in understanding this, and (more generally) in interpreting the R^2 of regressions where the intercept is absent.

--
Ajay Shah                                     http://www.mayin.org/ajayshah
ajays...@mayin.org                            http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
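[For reference, a small sketch of the two mechanisms at play, reusing the simulation from the post. The hand-computed R^2 formulas below reflect my understanding of what summary.lm() does with and without an intercept; the fact about correlations is just its invariance to positive affine transformations.]

```r
## Sketch: both sets of fitted values are increasing linear functions of x,
## and correlation is invariant to positive affine transformations, so
## cor(y, a + b*x) == cor(y, b2*x) == cor(y, x) whenever b, b2 > 0.
set.seed(42)
x <- runif(20)
y <- 2 + 3*x + rnorm(20)
m1 <- lm(y ~ x)        # with intercept: fitted(m1) = a + b*x
m2 <- lm(y ~ -1 + x)   # through the origin: fitted(m2) = b2*x

cor(y, fitted(m1))^2   # all three of these agree,
cor(y, fitted(m2))^2   # even though the slopes differ
cor(y, x)^2

## summary()'s R^2 uses different baselines in the two cases:
## with an intercept, the total SS is taken about mean(y);
## without one, the comparison model is y = 0, so the total SS is sum(y^2).
1 - sum(resid(m1)^2) / sum((y - mean(y))^2)   # matches summary(m1)$r.squared
1 - sum(resid(m2)^2) / sum(y^2)               # matches summary(m2)$r.squared
```

The second pair of lines is why the no-intercept R^2 can jump even for a worse model: it is being measured against a much weaker baseline.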