Dear Marco and Goran,
Perhaps the documentation could be clearer, but it is after all a
brief help page. Using weights of 2 to lm() is *not* equivalent to
entering the observation twice. The weights are variance weights, not
case weights.
You can see this by looking at the whole summary() output for the
models, not just the residual standard errors:
------------- snip ---------
summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))
Call: lm(formula = c(1, 2, 3, 1, 2, 3) ~ c(1, 2.1, 2.9, 1.1, 2, 3),
weights = rep(1, 6))
Residuals: 1 2 3 4 5 6 0.06477
-0.08728 0.07487 -0.03996 0.01746 -0.02986
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
-0.11208 0.08066 -1.39 0.237 c(1, 2.1, 2.9, 1.1, 2, 3)
1.04731 0.03732 28.07 9.59e-06 *** --- Signif. codes: 0 ‘***’
0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07108 on 4 degrees of freedom Multiple
R-squared: 0.9949, Adjusted R-squared: 0.9937 F-statistic: 787.6 on
1 and 4 DF, p-value: 9.59e-06
summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(2,6)))
Call: lm(formula = c(1, 2, 3, 1, 2, 3) ~ c(1, 2.1, 2.9, 1.1, 2, 3),
weights = rep(2, 6))
Residuals: 1 2 3 4 5 6 0.09160
-0.12343 0.10589 -0.05652 0.02469 -0.04223
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
-0.11208 0.08066 -1.39 0.237 c(1, 2.1, 2.9, 1.1, 2, 3)
1.04731 0.03732 28.07 9.59e-06 *** --- Signif. codes: 0 ‘***’
0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1005 on 4 degrees of freedom Multiple
R-squared: 0.9949, Adjusted R-squared: 0.9937 F-statistic: 787.6 on
1 and 4 DF, p-value: 9.59e-06
------------- snip -------------
Notice that while the residual standard errors differ, the
coefficients and their standard errors are identical. There are
compensating changes in the residual variance and the weighted sum of
squares and products matrix for X.
In contrast, literally entering each observation twice reduces the
coefficient standard errors by a factor of sqrt((6 - 2)/(12 - 2)),
i.e., the square root of the relative residual df of the models:
------------- snip --------
summary(lm(rep(c(1,2,3,1,2,3),2)~rep(c(1,2.1,2.9,1.1,2,3),2)))
Call: lm(formula = rep(c(1, 2, 3, 1, 2, 3), 2) ~ rep(c(1, 2.1, 2.9,
1.1, 2, 3), 2))
Residuals: Min 1Q Median 3Q Max -0.087276
-0.039963 -0.006201 0.064768 0.074874
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
-0.11208 0.05101 -2.197 0.0527 rep(c(1, 2.1, 2.9, 1.1, 2, 3),
2) 1.04731 0.02360 44.374 8.12e-13
(Intercept) . rep(c(1, 2.1, 2.9, 1.1, 2, 3), 2)
*** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’
1
Residual standard error: 0.06358 on 10 degrees of freedom Multiple
R-squared: 0.9949, Adjusted R-squared: 0.9944 F-statistic: 1969 on
1 and 10 DF, p-value: 8.122e-13
---------- snip -------------
I hope this helps,
John
------------------------------------------------ John Fox, Professor
McMaster University Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/ On Thu, 6 Feb 2014 09:27:22 +0100
Göran Broström <goran.brost...@umu.se> wrote:
On 05/02/14 22:40, Marco Inacio wrote:
Hello all, can help clarify something?
According to R's lm() doc:
Non-NULL weights can be used to indicate that different
observations have different variances (with the values in
weights being inversely *proportional* to the variances); or
equivalently, when the elements of weights are positive
integers w_i, that each response y_i is the mean of w_i
unit-weight observations (including the case that there are w_i
observations equal to y_i and the data have been summarized).
Since the idea here is *proportion*, not equality, shouldn't the
vectors of weights x, 2*x give the same result? And yet they
don't, standard errors differs:
summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))$sigma