On 14-04-2012, at 21:45, peter dalgaard wrote:

> On Apr 14, 2012, at 14:40 , Berend Hasselman wrote:
>
>> On 13-04-2012, at 22:20, Gene Leynes wrote:
>>
>>> I can't figure out why this is returning an NA for the slope in one
>>> case, but not in the other.
>>>
>>> I can tell that R thinks the first case is singular, but why isn't
>>> the second?
>>>
>>> ## Define X and Y
>>> ## There are two versions of x
>>> ## 1) "as is"
>>> ## 2) shifted to start at 0
>>> y = c(58, 57, 57, 279, 252, 851, 45, 87, 47)
>>> x1 = c(1334009411.437, 1334009411.437, 1334009411.437, 1334009469.297,
>>>        1334009469.297, 1334009469.297, 1334009485.697, 1334009485.697,
>>>        1334009485.697)
>>> x2 = x1 - min(x1)
>>>
>>> ## Why doesn't the LM model work for the "as is" x?
>>> lm(y ~ x1)
>>> lm(y ~ x2)
>>
>> With your data the matrix t(X) %*% X is extremely ill-conditioned for
>> the case with x1. See
>>
>> http://en.wikipedia.org/wiki/Condition_number
>> http://mathworld.wolfram.com/ConditionNumber.html
>> http://en.wikipedia.org/wiki/Numerical_methods_for_linear_least_squares
>>
>> You can check it with
>>
>> makeXX <- function(x) {
>>   matrix(data = c(x, rep(1, length(x))), nrow = length(x), byrow = FALSE)
>> }
>>
>> X1 <- makeXX(x1)
>> (XTX1 <- t(X1) %*% X1)
>> svd(XTX1)
>>
>> and similarly for x2.
>
> <lecture on numerical linear algebra>
>
> lm() is actually a bit smarter than to use the textbook formula
> (X'X)^{-1} X'Y; internally it is based on a QR decomposition, which is
> numerically far more stable. What it does, effectively, is orthogonalize
> the columns of X successively.
>
>> x <- cbind(1, x1)
>> qr(x)
> $qr
>                         x1
> [1,] -3.0000000 -4.002028e+09
> [2,]  0.3333333  9.555777e+01
> [3,]  0.3333333  3.456548e-01
> [4,]  0.3333333 -2.598428e-01
> [5,]  0.3333333 -2.598428e-01
> [6,]  0.3333333 -2.598428e-01
> [7,]  0.3333333 -4.314668e-01
> [8,]  0.3333333 -4.314668e-01
> [9,]  0.3333333 -4.314668e-01
>
> $rank
> [1] 1
>
> $qraux
> [1] 1.333333 1.345655
>
> $pivot
> [1] 1 2
>
> attr(,"class")
> [1] "qr"
>
> However, notice the rank of 1. That's because lm() wants to protect the
> user against unwittingly plugging in a singular design matrix, so when
> the algorithm encounters a variable that looks like it is reduced to
> zero after orthogonalization, it basically throws it out.
>
> You can defeat this mechanism by setting the tol= argument sufficiently
> low. In fact, you can do the same with lm() itself:
>
>> lm(y ~ x1, tol = 0)
> ...
> (Intercept)           x1
>  -2.474e+09    1.854e+00
>
> Notice that the slope is consistent with
>
>> lm(y ~ x2)
> ...
> (Intercept)           x2
>     110.884        1.854
>
> In contrast, if you try the textbook way:
>
>> solve(crossprod(x), crossprod(x, y))
> Error in solve.default(crossprod(x), crossprod(x, y)) :
>   system is computationally singular: reciprocal condition number = 2.67813e-34
>
> Ah, well, let's try and defeat that:
>
>> solve(crossprod(x), crossprod(x, y), tol = 0)
>             [,1]
>    -2.959385e+09
> x1  2.218414e+00
>
> Which, as you'll notice, is, er, somewhat off the mark.
>
> </lecture on numerical linear algebra>
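To put numbers on the ill-conditioning that both replies describe, one can compare the condition numbers of the two design matrices directly; kappa() is the base R estimator for this. A minimal sketch, using only Gene's data from above (the names X1 and X2 are mine, and the magnitudes in the comments are approximate):

    ## Sketch: condition numbers of the two design matrices (base R only)
    x1 <- c(1334009411.437, 1334009411.437, 1334009411.437, 1334009469.297,
            1334009469.297, 1334009469.297, 1334009485.697, 1334009485.697,
            1334009485.697)
    x2 <- x1 - min(x1)

    X1 <- cbind(1, x1)       # design matrix "as is"
    X2 <- cbind(1, x2)       # design matrix with x shifted to start at 0

    kappa(X1)                # enormous, on the order of 1e16
    kappa(X2)                # modest by comparison
    kappa(crossprod(X1))     # roughly kappa(X1)^2; its reciprocal matches the
                             # 2.67813e-34 reported in the error message above

The normal equations work with crossprod(X1), whose condition number is roughly the square of kappa(X1); that squaring is what pushes the textbook formula past what double precision can represent.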
Thank you. Indeed. I'm perfectly well aware of this. I was trying to illustrate the problem.

Berend
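As a closing illustration of the thread's point, here is a sketch of the two solution routes Peter contrasts, both with the rank check switched off. qr() and qr.coef() are the base R primitives that lm() builds on; the slopes in the comments are the values quoted above:

    ## Sketch: the QR route lm() uses internally vs. the textbook normal
    ## equations, both with the rank check defeated (base R only)
    y  <- c(58, 57, 57, 279, 252, 851, 45, 87, 47)
    x1 <- c(1334009411.437, 1334009411.437, 1334009411.437, 1334009469.297,
            1334009469.297, 1334009469.297, 1334009485.697, 1334009485.697,
            1334009485.697)
    x  <- cbind(1, x1)

    qx <- qr(x, tol = 0)     # keep both columns despite near-singularity
    qr.coef(qx, y)           # slope ~1.854, agreeing with lm(y ~ x2)

    ## Normal equations square the condition number and so lose roughly
    ## half the significant digits, hence the bogus slope:
    solve(crossprod(x), crossprod(x, y), tol = 0)   # slope ~2.218

The QR route orthogonalizes X before anything gets squared, which is why it survives a problem that defeats the normal equations at the same tolerance.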