Hi:

On Fri, Feb 18, 2011 at 2:49 AM, Jan <jrheinlaen...@gmx.de> wrote:

> Hi,
>
> I am not a statistics expert, so I have this question. A linear model
> gives me the following summary:
>
> Call:
> lm(formula = N ~ N_alt)
>
> Residuals:
>    Min      1Q  Median      3Q     Max
> -110.30  -35.80  -22.77   38.07  122.76
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)  13.5177   229.0764   0.059   0.9535
> N_alt         0.2832     0.1501   1.886   0.0739 .
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 56.77 on 20 degrees of freedom
>  (16 observations deleted due to missingness)
> Multiple R-squared: 0.151, Adjusted R-squared: 0.1086
> F-statistic: 3.558 on 1 and 20 DF,  p-value: 0.07386
>
> The regression is not very good (high p-value, low R-squared).
> The Pr value for the intercept seems to indicate that it is zero with a
> very high probability (95.35%). So I repeat the regression forcing the
> intercept to zero:
>

That's not the interpretation of a p-value. What it means is: *given that
the null hypothesis beta0 = 0 is true*, the probability of observing a
t-statistic *at least as extreme (in absolute value) as the observed 0.059*
is about 0.9535. Presuming that H_0 is true for the purpose of the test is
what allows one to derive a 'reference distribution' (in this case, the
t-distribution with the error degrees of freedom, 20 here) against which
the observed value of the t-statistic is compared. The emphasized clause is
the context in which the p-value is correctly interpreted: it is a tail
probability of that reference distribution, computed under H_0.
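
A quick way to see where the 0.9535 comes from (a sketch; the 0.059 and the
20 error degrees of freedom are taken from your summary output):

## two-sided tail probability of t = 0.059 under a t-distribution
## with 20 df -- roughly 0.9535
2 * pt(-abs(0.059), df = 20)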

You're evidently trying to interpret the p-value as the probability that the
null hypothesis is true. No.

What you can conclude, given the magnitude of the p-value, is that there is
not enough sample evidence to contradict the null hypothesis beta0 = 0.
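
One quick check along the same lines (a sketch, reusing your fitted model):

## 95% confidence intervals for the coefficients; the interval for the
## intercept is very wide and easily covers 0 -- i.e. the data are
## consistent with a zero intercept, not proof that it *is* zero
confint(lm(N ~ N_alt))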


> Call:
> lm(formula = N ~ N_alt - 1)
>
> Residuals:
>    Min      1Q  Median      3Q     Max
> -110.11  -36.35  -22.13   38.59  123.23
>
> Coefficients:
>      Estimate Std. Error t value Pr(>|t|)
> N_alt 0.292046   0.007742   37.72   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 55.41 on 21 degrees of freedom
>  (16 observations deleted due to missingness)
> Multiple R-squared: 0.9855, Adjusted R-squared: 0.9848
> F-statistic:  1423 on 1 and 21 DF,  p-value: < 2.2e-16
>
> 1. Is my interpretation correct?
>
> 2. Is it possible that just by forcing the intercept to become zero, a
> bad regression becomes an extremely good one?
>
No.

> 3. Why doesn't lm suggest a value of zero (or near zero) by itself if
> the regression is so much better with it?
>
Because computer programs don't read minds. You may want a zero intercept;
someone else may not. And your perception that the 'regression is so much
better' with a zero intercept is in error.

If you plotted your data, you would see that whether you fit the 'best'
least squares model or one with a zero intercept, the fit is not very good,
and you would realize that the 0.985 R^2 returned by the no-intercept model
is an illusion. It is mathematically correct, given the linear model theory
behind it and the definition of R^2 as the ratio of the model sum of
squares (SS) to the total SS, but it is not comparable to the R^2 from the
with-intercept fit. If you want to have more fun, sum the residuals from
the zero-intercept fit, and then ask yourself why they don't add to zero.

You need to educate yourself on the difference between regression with and
without intercepts. In particular, the R^2 in the with-intercept model uses
mean corrections before computing sums of squares; in the no-intercept
model, mean corrections are not applied. Since R^2 is a ratio of sums of
squares, this distinction matters. (If my use of 'mean correction' is
confusing, Y is not mean-corrected, but Y - Ybar is. Ditto for X.)
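
Here is a rough sketch of the two R^2 computations (it assumes N and N_alt
are the vectors from your data; na.omit() mimics lm()'s default handling of
the missing cases):

dat  <- na.omit(data.frame(N = N, N_alt = N_alt))  # complete cases only
fit1 <- lm(N ~ N_alt, data = dat)      # with intercept
fit0 <- lm(N ~ N_alt - 1, data = dat)  # intercept forced to zero

## with an intercept: the total SS is mean-corrected (matches summary(fit1))
1 - sum(resid(fit1)^2) / sum((dat$N - mean(dat$N))^2)

## without an intercept: the total SS is NOT mean-corrected, which is what
## inflates the reported R^2
1 - sum(resid(fit0)^2) / sum(dat$N^2)

## and the residuals from the zero-intercept fit need not sum to zero
sum(resid(fit0))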

Try this:
## scatterplot of the data
plot(N_alt, N, pch = 16)
## least squares line with an intercept
abline(coef(lm(N ~ N_alt)))
## line forced through the origin (zero intercept), drawn dashed
abline(c(0, coef(lm(N ~ N_alt + 0))), lty = 'dashed')

Do the data cluster tightly around the dashed line?

HTH,
Dennis

PS: A Google search on 'linear regression zero intercept' might be
beneficial. Here are a couple of hits from such a search:
http://www.bios.unc.edu/~truong/b663/pdf/noint.pdf
http://tltc.ttu.edu/cs/colleges__schools/rawls_college_of_business/f/42/p/288/470.aspx

> Please excuse my ignorance.
>
> Jan Rheinländer
>