I am struggling a bit with the function 'hatvalues'. I would like a little 
more understanding than treating it as a black box and using the values. I 
looked at the Fortran source and it is quite opaque to me, so I am asking for 
some help in understanding the theory. First, I take the simplest case of a 
single explanatory variable. For this I turn to John Fox's book, "Applied 
Regression Analysis and Generalized Linear Models", p. 245, and generate this 
'R' code:

> library(car)
> attach(Davis)
# remove the NA's
> narepwt <- repwt[!is.na(repwt)]
> meanrw <- mean(narepwt)
> drw <- narepwt - meanrw
> ssrw <- sum(drw * drw)
> h <- 1/length(narepwt) + (drw * drw)/ssrw
> h

This gives me an array of values. Ordering them from largest to smallest:

> order(h, decreasing=TRUE)
  [1]  21  52  17  93  30  62 158 113 175 131 182  29 106 125 123 146  91  99

So the largest "hatvalue" is 

> h[21]
[1] 0.1041207

Which doesn't match the 0.714 value that is reported in the book but I will 
probably take that up with the author later.

Then I use more of 'R' and I get

fit <- lm(weight ~ repwt)
hr <- hatvalues(fit)
hr[21]
       21 
0.1041207 

So this matches, which is reassuring. My question is this: given the QR 
decomposition and the residuals derived from it, what is a simple matrix 
formula for the hatvalues?
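
One candidate formula, offered as a sketch rather than a statement of what the 
Fortran code does internally: if X = QR with Q the thin n x p factor, then 
H = Q Q^T, so hat value i is the sum of squares of row i of Q. The data below 
are simulated and the names x, y, h_qr and h_formula are made up for 
illustration; they are not the Davis variables.

```r
# Simulated stand-in data; x and y are illustrative, not the Davis variables.
set.seed(1)
x <- rnorm(50, mean = 65, sd = 12)
y <- 2 + 0.95 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

# Thin Q factor (n x p) of the QR decomposition stored in the lm fit.
Q <- qr.Q(fit$qr)

# H = Q Q^T, so the i-th hat value is the sum of squares of row i of Q.
h_qr <- rowSums(Q^2)

# Single-predictor formula, same form as the hand computation earlier.
h_formula <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)

# Both comparisons are expected to agree up to floating-point tolerance.
all.equal(h_qr, unname(hatvalues(fit)))
all.equal(h_formula, unname(hatvalues(fit)))
```

If this holds, rowSums(qr.Q(fit$qr)^2) applied to the weight ~ repwt fit should 
reproduce hatvalues(fit) as well, though that has not been run here.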

From http://en.wikipedia.org/wiki/Linear_regression I get

residuals = y - Hy = y(I - H)
or
H = -(residuals/y - I)

> fit <- lm(weight ~ repwt)
> h <- -(residuals(fit)/weight[as.numeric(names(residuals(fit)))] - 
+        diag(1, length(residuals(fit)), length(residuals(fit))))

This generates a matrix, but I cannot see any relationship between this 
"hat-matrix" and the values returned by "hatvalues".

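A possible explanation, hedged: the relation residuals = (I - H) y is a 
matrix-vector product, so dividing elementwise by y does not recover H. The hat 
matrix depends only on the design matrix X, as H = X (X'X)^{-1} X', and 
"hatvalues" returns its diagonal. The sketch below uses simulated data, and the 
names X, H, h_diag and res_from_H are illustrative only.

```r
# Simulated data; not the Davis data set.
set.seed(2)
x <- runif(30, 40, 100)
y <- 5 + 0.9 * x + rnorm(30, sd = 3)
fit <- lm(y ~ x)

X <- model.matrix(fit)                  # n x 2: intercept column and x
H <- X %*% solve(crossprod(X), t(X))    # H = X (X'X)^{-1} X', built from X alone

h_diag <- unname(diag(H))               # diagonal of H: the hat values
res_from_H <- as.vector((diag(nrow(X)) - H) %*% y)   # (I - H) y

all.equal(h_diag, unname(hatvalues(fit)))
all.equal(res_from_H, unname(residuals(fit)))
sum(h_diag)   # trace of H, which equals the number of coefficients
```

Forming the full n x n matrix is only for illustration; for large n the 
diagonal alone is much cheaper to compute.
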
Comments?

Thank you.

Kevin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
