Re: [R] R's basic lm() and summary.lm functions

Achim Zeileis Sat, 11 May 2013 12:57:57 -0700

On Sat, 11 May 2013, ivo welch wrote:

coeftest with sandwich is indeed powerful and probably faster.  I see
one drawback: it requires at least three more packages: lmtest,
sandwich, and in turn zoo, which do not come with standard R.


But they should be easy enough to install...

I also wonder whether I committed a bug in my own code, or whether thereis another parameter I need to specify in sandwich. my standard errorestimates are very similar but not the same.


Hmmm, I get the same results (up to machine precision):

## data and model
set.seed(1)
x <- rnorm(12); y <- rnorm(12); z <- rnorm(12); x[2] <-NA
m <- lm(y ~ x + z)

## standard errors
nw1 <- ols(y ~ x + z)$coefficients[, 6]
nw2 <- sqrt(diag(NeweyWest(m, lag = 0, prewhite = FALSE)))

And then I get:

R> nw1 - nw2
  (Intercept)             x             z
 0.000000e+00 -1.110223e-16 -2.775558e-17

By the way: With lag=0, these are equivalent to the basic sandwichstandard errors and to the HC0 standard errors:


nw3 <- sqrt(diag(sandwich(m)))
nw4 <- sqrt(diag(vcovHC(m, type = "HC0")))

And then:

R> nw3 - nw2
(Intercept)           x           z
          0           0           0
R> nw4 - nw2
(Intercept)           x           z
          0           0           0

if anyone wants to use my code, I am dropping a revised version in
below.  it also uses a better way to document the function inline.
eval(parse(text=(docs[["lm"]][["examples"]]))) is nice.  you get what
you pay for.

more generally, I am still wondering why we have an lm and a
summary.lm function, rather than just one function and one resulting
object for parsimony, given that the summary.lm function is fast and
does not increase the object size.

When running many regressions, it may still be useful to not run thesummary every time, I guess.


Best,
Z

----
Ivo Welch (ivo.we...@gmail.com)



docs[["lm"]] <- c(
 author='ivo.we...@gmail.com',
 date='2013',
 description='add newey-west and standardized coefficients to lm(),
and return the summary.lm',
 usage='lm(..., newey.west=0, stdcoefs=TRUE)',
 arguments='',
 details='
   Note that when y or x have different observations, the
coef(lm(y~x))*sd(x)/sd(y)
   calculations are different from a scale(y) ~ scale(x) regression.

   Note that the NeweyWest statistics I get from the sandwich package
are different.
   could be my bug, or could be my problem specifying an additional parameter.
   Here is my code

     library(sandwich)
     library(lmtest)
     print(coeftest(lmo, vcov = NeweyWest, lag=0, prewhite=FALSE))
',
 seealso='stats:::lm, stats:::summary.lm',
 examples='
   x <- rnorm(12); y <- rnorm(12); z <- rnorm(12); x[2] <-NA;
   lm( y ~ x + z )
',
 test='eval(parse(text=(docs[["lm"]][["examples"]])))',
 changes= '',
 version='0.91'
)

################################################################

lm <- function (..., x = FALSE, newey.west=(0), stdcoefs=TRUE) {

 ## R is painfully error-tolerant. I prefer reasonable and immediate
error warnings.
 stopifnot( 
(is.vector(newey.west))&(length(newey.west)==1)|(is.numeric(newey.west))
)
 stopifnot( (is.vector(stdcoefs))&(length(stdcoefs)==1)|(is.logical(stdcoefs))
)
 stopifnot( (is.vector(x))&(length(x)==1)|(is.logical(x)) )
 ## I wish I could check lm()'s argument, but I cannot.

 lmo <- stats:::lm(..., x=TRUE)
 ## note that both the x matrix and the residuals from the model have
their NA's omitted by default

 if (newey.west>=0) {
   resids <- residuals(lmo)
   diagband.matrix <- function(m, ar.terms) {
     nomit <- m - ar.terms - 1
     mm <- matrix(TRUE, nrow = m, ncol = m)
     mm[1:nomit, (ncol(mm) - nomit + 1):ncol(mm)] <-
(lower.tri(matrix(TRUE, nrow = nomit, ncol = nomit)))
     mm[(ncol(mm) - nomit + 1):ncol(mm), 1:nomit] <-
(upper.tri(matrix(TRUE, nrow = nomit, ncol = nomit)))
     mm
   }
   invx <- chol2inv(chol(crossprod(lmo$x)))
   invx.x <- invx %*% t(lmo$x)
   if (newey.west==0)
     resid.matrix <- diag(resids^2)
   else {
     full <- resids %*% t(resids)
     maskmatrix <- diagband.matrix(length(resids), newey.west)
     resid.matrix <- full * maskmatrix
   }
   vmat <- invx.x %*% resid.matrix %*% t(invx.x)

   nw <- newey.west  ## the number of AR terms
   nw.se <- sqrt(diag(vmat))  ## the standard errors
 }

 if (stdcoefs) {
   xsd <- apply( lmo$x, 2, sd)
   ysd <- sd( lmo$model[,1] )
   stdcoefs.v <- lmo$coefficients*xsd/ysd
 }

 full.x.matrix <- if (x) lmo$x else NULL
 lmo <- stats:::summary.lm(lmo)  ## add the summary.lm object
 if (x) lmo$x <- full.x.matrix

 if (stdcoefs) {
   lmo$coefficients <- cbind(lmo$coefficients, stdcoefs.v )
   colnames(lmo$coefficients)[ncol(lmo$coefficients)] <- "stdcoefs"
 }

 if (newey.west>=0) {
   lmo$coefficients <- cbind(lmo$coefficients, nw.se)
   colnames(lmo$coefficients)[ncol(lmo$coefficients)] <-
paste0("se.nw(",newey.west,")")
   lmo$coefficients <- cbind(lmo$coefficients, lmo$coefficients[,1]/nw.se)
   colnames(lmo$coefficients)[ncol(lmo$coefficients)] <-
paste0("T.nw(",newey.west,")")
 }

 lmo

}

attr(lm, "original") <- stats:::lm


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R's basic lm() and summary.lm functions

Reply via email to