Dear John,

thanks for the clarification! The lesson to be learned is that one should be aware of the fact that weights may mean different things in different functions, and sometimes different things in the same function (glm)!

Göran

On 02/06/2014 02:17 PM, John Fox wrote:
Dear Marco and Goran,

Perhaps the documentation could be clearer, but it is after all a
brief help page. Using weights of 2 to lm() is *not* equivalent to
entering the observation twice. The weights are variance weights, not
case weights.

You can see this by looking at the whole summary() output for the
models, not just the residual standard errors:

------------- snip ---------

summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))

Call: lm(formula = c(1, 2, 3, 1, 2, 3) ~ c(1, 2.1, 2.9, 1.1, 2, 3),
weights = rep(1, 6))

Residuals: 1        2        3        4        5        6 0.06477
-0.08728  0.07487 -0.03996  0.01746 -0.02986

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
-0.11208    0.08066   -1.39    0.237 c(1, 2.1, 2.9, 1.1, 2, 3)
1.04731    0.03732   28.07 9.59e-06 *** --- Signif. codes:  0 ‘***’
0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.07108 on 4 degrees of freedom Multiple
R-squared:  0.9949,     Adjusted R-squared:  0.9937 F-statistic: 787.6 on
1 and 4 DF,  p-value: 9.59e-06

summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(2,6)))

Call: lm(formula = c(1, 2, 3, 1, 2, 3) ~ c(1, 2.1, 2.9, 1.1, 2, 3),
weights = rep(2, 6))

Residuals: 1        2        3        4        5        6 0.09160
-0.12343  0.10589 -0.05652  0.02469 -0.04223

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
-0.11208    0.08066   -1.39    0.237 c(1, 2.1, 2.9, 1.1, 2, 3)
1.04731    0.03732   28.07 9.59e-06 *** --- Signif. codes:  0 ‘***’
0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1005 on 4 degrees of freedom Multiple
R-squared:  0.9949,     Adjusted R-squared:  0.9937 F-statistic: 787.6 on
1 and 4 DF,  p-value: 9.59e-06

------------- snip -------------

Notice that while the residual standard errors differ, the
coefficients and their standard errors are identical. There are
compensating changes in the residual variance and the weighted sum of
squares and products matrix for X.

In contrast, literally entering each observation twice reduces the
coefficient standard errors by a factor of sqrt((6 - 2)/(12 - 2)),
i.e., the square root of the relative residual df of the models:

------------- snip --------

summary(lm(rep(c(1,2,3,1,2,3),2)~rep(c(1,2.1,2.9,1.1,2,3),2)))

Call: lm(formula = rep(c(1, 2, 3, 1, 2, 3), 2) ~ rep(c(1, 2.1, 2.9,
1.1, 2, 3), 2))

Residuals: Min        1Q    Median        3Q       Max -0.087276
-0.039963 -0.006201  0.064768  0.074874

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
-0.11208    0.05101  -2.197   0.0527 rep(c(1, 2.1, 2.9, 1.1, 2, 3),
2)  1.04731    0.02360  44.374 8.12e-13

(Intercept)                       . rep(c(1, 2.1, 2.9, 1.1, 2, 3), 2)
*** --- Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’
1

Residual standard error: 0.06358 on 10 degrees of freedom Multiple
R-squared:  0.9949,     Adjusted R-squared:  0.9944 F-statistic:  1969 on
1 and 10 DF,  p-value: 8.122e-13

---------- snip -------------

I hope this helps,

John

------------------------------------------------ John Fox, Professor
McMaster University Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/   On Thu, 6 Feb 2014 09:27:22 +0100
Göran Broström <goran.brost...@umu.se> wrote:
On 05/02/14 22:40, Marco Inacio wrote:
Hello all, can help clarify something?

According to R's lm() doc:

Non-NULL weights can be used to indicate that different
observations have different variances (with the values in
weights being inversely *proportional* to the variances); or
equivalently, when the elements of weights are positive
integers w_i, that each response y_i is the mean of w_i
unit-weight observations (including the case that there are w_i
observations equal to y_i and the data have been summarized).

Since the idea here is *proportion*, not equality, shouldn't the
vectors of weights x, 2*x give the same result? And yet they
don't, standard errors differs:

summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))$sigma


[1] 0.07108323
summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(2,6)))$sigma


[1] 0.1005269

The weights are in fact case weights, i.e., a weight of 2 is the
same as including the corresponding item twice. I agree that the
documentation is no wonder of clarity in this respect.

Btw, note that, in your example, (0.1005269 / 0.07108323)^2 = 2,
your constant weight.

Göran Broström


So what if I know a-priori, observation A has variance 2 times
bigger than observation B? Both weights=c(1,2) and weights=c(2,4)
(and so on) represent very well this knowledge, but we get
different regression (since sigma is different).


Also, if we do the same thing with a glm() model, than we get a
lot of other differences like in the deviance.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
posting guide http://www.R-project.org/posting-guide.html and
provide commented, minimal, self-contained, reproducible code.


______________________________________________ R-help@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to