I am struggling a bit with the function 'hatvalues'. I would like a little more understanding than treating it as a black box and just using the values. I looked at the Fortran source and it is quite opaque to me, so I am asking for some help in understanding the theory. First, I take the simplest case of a single variable. For this I turn to John Fox's book, "Applied Regression Analysis and Generalized Linear Models", p. 245, and generate this 'R' code:

> library(car)
> attach(Davis)
> # remove the NA's
> narepwt <- repwt[!is.na(repwt)]
> meanrw <- mean(narepwt)
> drw <- narepwt - meanrw
> ssrw <- sum(drw * drw)
> h <- 1/length(narepwt) + (drw * drw)/ssrw
> h

This gives me an array of values. Ordering them shows where the largest ones are:

> order(h, decreasing=TRUE)
[1] 21 52 17 93 30 62 158 113 175 131 182 29 106 125 123 146 91 99

So the largest "hatvalue" is

> h[21]
[1] 0.1041207

This doesn't match the 0.714 value that is reported in the book, but I will probably take that up with the author later. Then I use more of 'R' and get

fit <- lm(weight ~ repwt)
hr <- hatvalues(fit)
hr[21]
       21
0.1041207

So this matches, which is reassuring. My question is this: given the QR transformation and the residuals derived from that transformation, what is a simple matrix formula for the hat values?

From http://en.wikipedia.org/wiki/Linear_regression I get

residuals = y - Hy = (I - H)y, or H = -(residuals/y - I)

> fit <- lm(weight ~ repwt)
> h <- -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] -
+     diag(1, length(residuals(fit)), length(residuals(fit))))

This generates a matrix, but I cannot see any correlation between this "hat matrix" and the values returned by 'hatvalues'.

Comments?

Thank you.

Kevin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
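A short derivation links the Wikipedia relation to the QR factors the question asks about. It assumes X is the n x p model matrix with full column rank and X = QR is the thin QR decomposition, with Q an n x p matrix with orthonormal columns and R a p x p invertible upper-triangular matrix. The QR stored in an lm fit may also pivot columns, but pivoting does not change the column space of X and therefore does not change H. The hat values are the diagonal of H, and for a single predictor with an intercept they reduce to the 1/n + (x_i - mean)^2 / SS formula used in the code above:

```latex
\begin{aligned}
\hat{y} &= X\hat{\beta} = X (X^\top X)^{-1} X^\top y = H y,
  \qquad e = y - H y = (I - H)\, y \\[4pt]
X &= QR, \qquad Q^\top Q = I_p
  \;\Longrightarrow\; X^\top X = R^\top Q^\top Q R = R^\top R \\[4pt]
H &= Q R \,(R^\top R)^{-1} R^\top Q^\top
   = Q R R^{-1} (R^\top)^{-1} R^\top Q^\top
   = Q Q^\top \\[4pt]
h_{ii} &= \sum_{j=1}^{p} q_{ij}^{2} \\[4pt]
\text{single predictor: } q_{i1} &= \tfrac{1}{\sqrt{n}}, \quad
  q_{i2} = \pm \frac{x_i - \bar{x}}{\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2}}
  \;\Longrightarrow\;
  h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{k=1}^{n} (x_k - \bar{x})^2}
\end{aligned}
```

The single vector equation e = (I - H)y gives only n equations for the n^2 entries of H, so it cannot be solved for H. In R, residuals/y is an elementwise ratio and produces a vector. When that vector is subtracted from an n x n identity matrix, R recycles it down the columns, so the result is not the hat matrix.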
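The following base-R sketch computes the hat values of a simple regression three ways and compares each with hatvalues(). It uses the single-predictor closed form, the diagonal of H = X(X'X)^{-1}X', and the row sums of the squared Q factor from the fit's QR decomposition. The built-in 'cars' data (dist ~ speed) stand in for car::Davis, so the example runs without the car package. With lm(weight ~ repwt, data = Davis) the same steps should apply, as long as the x vector used in the closed form has the NA rows dropped, as lm drops them. All variable names are illustrative.

```r
# Stand-in data: built-in 'cars' (no NAs) instead of car::Davis
fit <- lm(dist ~ speed, data = cars)
X   <- model.matrix(fit)            # n x p design matrix: intercept + speed
n   <- nrow(X)

# 1. Closed form for one predictor plus intercept: 1/n + (x - xbar)^2 / Sxx
x <- cars$speed
h_formula <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

# 2. Explicit hat matrix H = X (X'X)^{-1} X'; hat values are its diagonal
H <- X %*% solve(crossprod(X)) %*% t(X)
h_matrix <- unname(diag(H))

# 3. QR route: X = QR (thin Q, n x p), so H = Q Q' and h_ii = sum_j Q[i, j]^2
Q <- qr.Q(fit$qr)
h_qr <- unname(rowSums(Q^2))

# Reference values from stats::hatvalues()
h_r <- unname(hatvalues(fit))

# Residuals are the matrix (I - H) applied to the response vector y
y <- cars$dist
e <- unname(drop((diag(n) - H) %*% y))

all.equal(h_r, h_formula)
all.equal(h_r, h_matrix)
all.equal(h_r, h_qr)
all.equal(unname(residuals(fit)), e)
sum(h_qr)                           # trace(H) equals ncol(X), here 2
```

Forming H explicitly costs an n x n matrix, which is fine for a few hundred observations. The QR route needs only the n x p matrix Q, which makes it the more economical way to get the diagonal.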