As Kehl pointed out, any non-constant linear function of the independent variable (speed) has the same squared correlation with the dependent variable (dist), but only one linear function minimizes the sum of squared deviations between the fitted values and the observed values. The equation you are using (1 minus the ratio of the residual sum of squares to the total sum of squares) equals the squared correlation only for that least-squares function, not for any of the others. In fact, for some linear functions it produces negative values:

> fitted.new <- 6*cars$speed
> cor(cbind(fitted.new, fitted.right, fitted.wrong, cars$dist))
             fitted.new fitted.right fitted.wrong
fitted.new    1.0000000    1.0000000    1.0000000 0.8068949
fitted.right  1.0000000    1.0000000    1.0000000 0.8068949
fitted.wrong  1.0000000    1.0000000    1.0000000 0.8068949
              0.8068949    0.8068949    0.8068949 1.0000000
> 1-sum((cars$dist-fitted.new)^2)/sum((cars$dist-mean(cars$dist))^2)
[1] -3.281849

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Thayn
Sent: Sunday, February 22, 2015 12:01 AM
To: Kehl Dániel
Cc: r-help@r-project.org
Subject: Re: [R] Correlation question

Of course! Thank you, I knew I was missing something painfully obvious.

It seems, then, that this line

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)

is finding something other than the traditional correlation. I found this in a lecture introducing correlation, but now I'm not sure what it is. It does do a better job of showing that the fitted.wrong variable is not a good prediction of the distance.

On Feb 21, 2015, at 4:36 PM, Kehl Dániel wrote:

> Hi,
>
> try
>
> cor(fitted.right,fitted.wrong)
>
> should give 1 as both are a linear function of speed! Hence
> cor(cars$dist,fitted.right)^2 and cor(x=cars$dist,y=fitted.wrong)^2 must be
> the same.
>
> HTH
> d
> ________________________________________
> From: R-help [r-help-boun...@r-project.org] on behalf of Jonathan
> Thayn [jth...@ilstu.edu]
> Sent: February 21, 2015, 22:42
> To: r-help@r-project.org
> Subject: [R] Correlation question
>
> I recently compared two different approaches to calculating the correlation
> of two variables, and I cannot explain the different results:
>
> data(cars)
> model <- lm(dist~speed,data=cars)
> coef(model)
> fitted.right <- model$fitted
> fitted.wrong <- -17+5*cars$speed
>
>
> When using the OLS fitted values, the lines below all return the same R2
> value:
>
> 1-sum((cars$dist-fitted.right)^2)/sum((cars$dist-mean(cars$dist))^2)
> cor(cars$dist,fitted.right)^2
> (sum((cars$dist-mean(cars$dist))*(fitted.right-mean(fitted.right)))/(49*sd(cars$dist)*sd(fitted.right)))^2
>
>
> However, when I use my estimated parameters to find the fitted values,
> "fitted.wrong", the first equation returns a much lower R2 value, which I
> would expect since the fit is worse, but the other lines return the same R2
> that I get when using the OLS fitted values.
>
> 1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)
> cor(x=cars$dist,y=fitted.wrong)^2
> (sum((cars$dist-mean(cars$dist))*(fitted.wrong-mean(fitted.wrong)))/(49*sd(cars$dist)*sd(fitted.wrong)))^2
>
>
> I'm sure I'm missing something simple, but can someone explain the difference
> between these two methods of finding R2? Thanks.
>
> Jon
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
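
For anyone rerunning the thread, the comparison can be collected into a few lines of R. This is only a sketch and not code from any of the posts: r2_resid and r2_cor are helper names made up here for illustration, and the three candidate lines are the OLS fit, the -17+5*speed line from the question, and the 6*speed line from the reply. It uses the built-in cars data and nothing beyond base R.

```r
# Sketch: two ways of computing "R-squared" on the built-in cars data.
# r2_resid and r2_cor are hypothetical helper names, not base R functions.
fit <- lm(dist ~ speed, data = cars)

# 1 - SSE/SST: depends on how close the predictions are to the data.
r2_resid <- function(y, yhat) 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)

# Squared Pearson correlation: unchanged by any non-constant linear
# rescaling of yhat, so it cannot tell the three lines apart.
r2_cor <- function(y, yhat) cor(y, yhat)^2

candidates <- list(
  ols   = fitted(fit),           # least-squares fitted values
  wrong = -17 + 5 * cars$speed,  # hand-picked line from the question
  new   = 6 * cars$speed         # line through the origin from the reply
)

# One row per candidate line, one column per formula.
res <- t(sapply(candidates, function(yhat)
  c(resid_based = r2_resid(cars$dist, yhat),
    cor_squared = r2_cor(cars$dist, yhat))))
print(round(res, 4))
```

The cor_squared column should be identical in all three rows, while resid_based should agree with it only in the ols row and come out at about -3.28 for the new row, matching the value reported in the reply.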