Re: [R] naive "collinear" weighted linear regression

David Winsemius Sat, 14 Nov 2009 13:53:13 -0800


On Nov 14, 2009, at 1:50 PM, Mauricio O Calvao wrote:

David Winsemius <dwinsemius <at> comcast.net> writes:
Which means those x, y, and "error" figures did not come from an
experiment, but rather from theory???
The fact is I am trying to compare the results of:
(1) lm under R and
(2) the Java applet at http://omnis.if.ufrj.br/~carlos/applets/reta/reta.html
(3) the Fit method of the ROOT system used by CERN,
(4) the Numerical Recipes functions for weighted linear regression
The last three all provide, for the "fake" data set I furnished in my
first post in this thread, the same results; in particular, they give
errors or uncertainties in the estimated intercept and slope which, as
seems intuitive, are not zero at all, but of the order 0.1 or 0.2,
whereas the method lm under R issues a "Std. Error" which is zero.
Independently of terminology, which sure is of utmost importance, the
data I provided should give rise to a best fit straight line with
intercept zero and slope 2, but also with non-vanishing errors
associated with them. How do I get this from lm????
I only want, for instance, calculation of the so-called covariance
matrix for the estimated coefficients, as given, for instance, in
Equation (2.3.2) of the second edition of Draper and Smith, "Applied
regression analysis"; this is a standard statistical result, right? So
why does R not directly provide it as a summary from an lm object???

It's really not that difficult to get the variance covariance matrix.
What is not so clear is why you think differential weighting of a set
that has a perfect fit should give meaningfully different results than
a fit that has no weights.


?lm
?vcov

>  y <- c(2,4,6,8) # response vect
> fit_mod <- lm(y~x,weights=1/error^2)
Error in eval(expr, envir, enclos) : object 'error' not found
> error <- c(0.3,0.3,0.3,0.3)
> fit_mod <- lm(y~x,weights=1/error^2)
> vcov(fit_mod)
              (Intercept)             x
(Intercept)  2.396165e-30 -7.987217e-31
x           -7.987217e-31  3.194887e-31

Numerically those are effectively zero.

> fit_mod <- lm(y~x)
> vcov(fit_mod)
            (Intercept) x
(Intercept)           0 0
x                     0 0
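
A hedged sketch of where the non-zero uncertainties reported by the other
tools presumably come from: if the `error` values are treated as known
absolute standard deviations of y rather than as relative weights, the
covariance matrix of the coefficients is (X'WX)^-1 with W = diag(1/error^2),
with no residual-based scale factor. lm() multiplies that matrix by the
estimated residual variance, which is essentially zero for a perfect fit.
The vector x below is an assumption (it is not shown in the transcript);
1:4 is chosen so that y = 2*x holds exactly.

```r
# Assumption: x is not shown in the transcript; 1:4 makes y = 2*x exact.
x     <- c(1, 2, 3, 4)
y     <- c(2, 4, 6, 8)
error <- c(0.3, 0.3, 0.3, 0.3)   # treated here as known standard deviations of y

X <- cbind("(Intercept)" = 1, x = x)   # design matrix for y ~ x
W <- diag(1 / error^2)                 # weights = inverse variances

# Covariance matrix of the coefficients under known absolute errors
V_abs  <- solve(t(X) %*% W %*% X)
se_abs <- sqrt(diag(V_abs))
print(V_abs)
print(se_abs)
```

For a fitted weighted lm object, the same matrix is what summary() stores
as `cov.unscaled`; vcov() is that matrix times the squared residual
standard error.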


--
David.


Of course the best fit coefficients should be 0 for the intercept
and 2 for the slope. Furthermore, it seems completely plausible (or
not?) that, since the y_i have associated non-vanishing
``errors'' (dispersions), there should be corresponding non-
vanishing ``errors'' associated to the best fit coefficients, right?

When I try:

fit_mod <- lm(y~x,weights=1/error^2)


I get

Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
extra arguments weigths are just disregarded.
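
The warning appears to stem from the spelling `weigths` in the call (it is
visible in the Call: line of the summary): lm() has no argument of that
name, so it is passed along through `...` and dropped, and the fit is
unweighted. A small sketch with made-up data (the y values and the unequal
errors are hypothetical, chosen only so that weighting changes the result):

```r
# Hypothetical, non-collinear data; not the data from this thread.
x   <- c(1, 2, 3, 4)
y   <- c(2.1, 3.9, 6.2, 7.8)
err <- c(0.1, 0.3, 0.3, 0.6)
w   <- 1 / err^2

fit_plain <- lm(y ~ x)                                 # no weights at all
fit_ok    <- lm(y ~ x, weights = w)                    # correctly spelled argument
fit_typo  <- suppressWarnings(lm(y ~ x, weigths = w))  # misspelled: silently unweighted

# The misspelled call reproduces the unweighted coefficients
print(rbind(plain = coef(fit_plain),
            typo  = coef(fit_typo),
            ok    = coef(fit_ok)))
```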


 (Actually the weights are for adjusting for sampling, and I do not
see any sampling in your "design".)


Keeping on, despite the warning message, which I did not quite
understand, when I type:

summary(fit_mod)


I get

Call:
lm(formula = y ~ x, weigths = 1/error^2)

Residuals:
       1          2          3          4
-5.067e-17  8.445e-17 -1.689e-17 -1.689e-17

Coefficients:
           Estimate Std. Error   t value Pr(>|t|)
(Intercept) 0.000e+00  8.776e-17 0.000e+00        1
x           2.000e+00  3.205e-17 6.241e+16   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.166e-17 on 2 degrees of freedom
Multiple R-squared:     1,      Adjusted R-squared:     1
F-statistic: 3.895e+33 on 1 and 2 DF,  p-value: < 2.2e-16


Naively, should not the column Std. Error be different from zero??
What I have in mind, and sure is not what Std. Error means, is that
if I carried out a large simulation, assuming each response y_i a
Gaussian random variable with mean y_i and standard deviation
2*error=0.6, and then making an ordinary least squares fitting of
the slope and intercept, I would end up with a mean for these
simulated coefficients which should be 2 and 0, respectively,


Well, not precisely 2 and 0, but rather something very close ... i.e,
within "experimental error". Please note that numbers in the range of
10e-17 are effectively zero from a numerical analysis perspective.

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

.Machine$double.eps ^ 0.5

[1] 1.490116e-08
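
A minimal illustration of the FAQ point, using the Std. Error values from
the summary quoted in this message: exact comparison of doubles can fail,
while a comparison within the default all.equal() tolerance treats values
of this size as zero.

```r
tol <- .Machine$double.eps^0.5          # default tolerance used by all.equal()

print(0.1 + 0.2 == 0.3)                 # exact comparison of doubles
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))  # comparison within tolerance

se_reported <- c(8.776e-17, 3.205e-17)  # Std. Error column from the summary
print(all(abs(se_reported) < tol))      # both are far below the tolerance
```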

I know this all too well and it is obviously a trivial supernewbie
issue, which I have already overcome a long time ago...

and, that's the point, a non-vanishing standard deviation for these
fitted coefficients, right?? This somehow is what I expected should
be an estimate or, at least, a good indicator, of the degree of
uncertainty which I should assign to the fitted coefficients; it
seems to me these deviations, thus calculated as a result of the
simulation, will certainly not be zero (or  3e-17, for that matter).
So this Std. Error does not provide what I, naively, think should be
given as a measure of the uncertainties or errors in the fitted
coefficients...
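
The simulation described above can be sketched as follows. This is only an
illustration under stated assumptions: x = 1:4 is assumed (so that the mean
of each y_i is 2*x_i), the seed and the number of replicates are arbitrary,
and the standard deviation is 2*error = 0.6 as stated in the post.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x     <- c(1, 2, 3, 4)            # assumption: predictor values, so that y = 2*x
error <- rep(0.3, 4)
sd_y  <- 2 * error                # 0.6, as described in the post
n_sim <- 2000                     # arbitrary number of simulated data sets

# Each column holds the (intercept, slope) of one ordinary least squares fit
sim_coef <- replicate(n_sim, {
  y_sim <- rnorm(length(x), mean = 2 * x, sd = sd_y)
  coef(lm(y_sim ~ x))
})

sim_mean <- rowMeans(sim_coef)        # should be close to 0 and 2
sim_sd   <- apply(sim_coef, 1, sd)    # spread of the fitted coefficients

# Textbook OLS value for comparison: sigma * sqrt(diag((X'X)^-1))
X         <- cbind(1, x)
theory_sd <- sd_y[1] * sqrt(diag(solve(t(X) %*% X)))
print(rbind(simulated = sim_sd, theory = theory_sd))
```

The simulated spread is governed by the standard deviation fed into rnorm(),
not by the residuals of any single fit, which is why it does not vanish even
though the original data lie exactly on a line.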


You are trying to impose an error structure on a data situation that
you constructed artificially to be perfect.


What am I not getting right??

That if you input "perfection" into R's linear regression program, you
get appropriate warnings?


Thanks and sorry for the naive and non-expert question!


You are a Professor of physics, right? You do experiments, right? You
replicate them.  So perhaps I'm the one who should be puzzled.

Unfortunately you eschewed answering objectively any of my questions; I
insist they do make sense. Don't mention the data are perfect; this does
not help to make any progress in understanding the choice of convenient
summary info the lm method provides, as compared to what, in my humble
opinion and in this specific particular case, it should provide: the
covariance matrix of the estimated coefficients...

--
#######################################
Prof. Mauricio Ortiz Calvao
Federal University of Rio de Janeiro
Institute of Physics, P O Box 68528
CEP 21941-972 Rio de Janeiro, RJ
Brazil


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

