Dear Kevin,

If you do the same regression as in the text then you'll get the same
hat-values; the regression is the one on the top of p. 244:

> mod <- lm(repwt ~ weight*sex, data=Davis)
> max(hatvalues(mod))
[1] 0.7141856

As to making sense of the computations:

> X <- model.matrix(mod)
> head(X)
  (Intercept) weight sexM weight:sexM
1           1     77    1          77
2           1     58    0           0
3           1     53    0           0
4           1     68    1          68
5           1     59    0           0
6           1     76    1          76
> H <- X %*% solve( t(X) %*% X ) %*% t(X)
> h <- diag(H)
> max(h)
[1] 0.7141856

I hope this helps,
 John

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of rkevinbur...@charter.net
> Sent: March-05-09 11:40 AM
> To: r-h...@stat.math.ethz.ch
> Subject: [R] hatvalues?
>
> I am struggling a bit with this function 'hatvalues'. I would like a little
> more understanding than taking the black-box and using the values. I looked
> at the Fortran source and it is quite opaque to me. So I am asking for some
> help in understanding the theory. First, I take the simplest case of a single
> variant. For this I turn to John Fox's book, "Applied Regression Analysis and
> Generalized Linear Models", p. 245, and generate this 'R' code:
>
> > library(car)
> > attach(Davis)
> # remove the NA's
> > narepwt <- repwt[!is.na(repwt)]
> > meanrw <- mean(narepwt)
> > drw <- narepwt - meanrw
> > ssrw <- sum(drw * drw)
> > h <- 1/length(narepwt) + (drw * drw)/ssrw
> > h
>
> This gives me an array of values, the largest of which is
>
> > order(h, decreasing=TRUE)
> [1] 21 52 17 93 30 62 158 113 175 131 182 29 106 125 123 146 91 99
>
> So the largest "hatvalue" is
>
> > h[21]
> [1] 0.1041207
>
> Which doesn't match the 0.714 value that is reported in the book but I will
> probably take that up with the author later.
>
> Then I use more of 'R' and I get
>
> fit <- lm(weight ~ repwt)
> hr <- hatvalues(fit)
> hr[21]
> 21
> 0.1041207
>
> So this matches, which is reassuring. My question is this: given the QR
> transformation and the residuals derived from that transformation, what is a
> simple matrix formula for the hatvalues?
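
On the QR question just above, a minimal sketch, assuming a full-rank model
matrix X with thin QR decomposition X = QR: then X'X = R'R, so
H = X (X'X)^{-1} X' = Q Q', and each hat-value is the sum of squares of the
corresponding row of Q. The data below are hypothetical (simulated, not the
Davis data) so that the code runs in base R without the car package:

```r
# Hypothetical toy data with a numeric predictor and a two-level factor
set.seed(1)
d <- data.frame(x = rnorm(20), g = gl(2, 10, labels = c("F", "M")))
d$y <- 1 + 2 * d$x + rnorm(20)

mod <- lm(y ~ x * g, data = d)
X <- model.matrix(mod)

# Thin Q factor (n x p) of the QR decomposition of X
Q <- qr.Q(qr(X))

# H = Q Q', so diag(H) is the row-wise sum of squares of Q
h_qr <- rowSums(Q^2)

# Same values from the explicit formula X (X'X)^{-1} X'
h_xx <- diag(X %*% solve(t(X) %*% X) %*% t(X))

# Both should agree with hatvalues() up to rounding error
all.equal(unname(hatvalues(mod)), unname(h_qr))
all.equal(unname(hatvalues(mod)), unname(h_xx))
```

The row-sum form avoids building the full n x n matrix H when only its
diagonal is wanted.
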
>
> From http://en.wikipedia.org/wiki/Linear_regression I get
>
> residuals = y - Hy = y(I - H)
> or
> H = -(residuals/y - I)
>
> > fit <- lm(weight ~ repwt)
> > h <- -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] -
> diag(1,length(residuals(fit)), length(residuals(fit))))
>
> This generates a matrix but I cannot see any correlation between this
> "hat-matrix" and the return from "hatvalues".
>
> Comments?
>
> Thank you.
>
> Kevin
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
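
On the Wikipedia identity quoted above: in matrix terms the residuals are
e = y - Hy = (I - H) y, with the matrix to the left of the vector y. That is
a matrix-vector product, so dividing e elementwise by y does not recover H.
The diagonal of the elementwise construction works out to 1 - e_i/y_i, i.e.
fitted_i / y_i, which is not the hat-value. A small sketch with hypothetical
simulated data (not the Davis data), base R only:

```r
# Hypothetical toy data for a simple regression
set.seed(2)
x <- rnorm(15)
y <- 3 + 0.5 * x + rnorm(15)
fit <- lm(y ~ x)

# Hat matrix from the model matrix
X <- model.matrix(fit)
H <- X %*% solve(crossprod(X)) %*% t(X)

# Residuals as the matrix product (I - H) y
e <- drop((diag(length(y)) - H) %*% y)
all.equal(unname(e), unname(residuals(fit)))

# Elementwise construction from the original post, for comparison
H_wrong <- -(residuals(fit) / y - diag(1, length(y)))

# Its diagonal equals fitted/y rather than the hat-values
all.equal(unname(diag(H_wrong)), unname(fitted(fit) / y))
head(cbind(wrong = diag(H_wrong), hat = hatvalues(fit)))
```
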