On Dec 19, 2011, at 9:09 AM, Brent Pedersen wrote:

Hi, I'm sure this is simple, but I haven't been able to find this in TFM.
Say I have some data in R like this (pasted here:
http://pastebin.com/raw.php?i=sjS9Zkup):

One of the reasons this is not in TFM is that the answers to these questions should be available in any introductory regression textbook.


head(df)
   gender age smokes disease    Y
 1 female  65   ever control 0.18
 2 female  77  never control 0.12
 3   male  40         state1 0.11
 4 female  67   ever control 0.20
 5   male  63   ever  state1 0.16
 6 female  26  never  state1 0.13

where unique(disease) == c("control", "state1", "state2")
and unique(smokes) == c("ever", "never", "", "current")

I then fit a linear model like:

model = lm(Y ~ smokes + disease + age + gender, data=df)

And I want to understand the difference between:

print(summary(model))
   Call:
   lm(formula = Y ~ smokes + disease + age + gender, data = df)

   Residuals:
        Min       1Q   Median       3Q      Max
   -0.22311 -0.08108 -0.03483  0.05604  0.46507

   Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
   (Intercept)    0.1206825  0.0521368   2.315   0.0211 *
   smokescurrent  0.0150641  0.0444466   0.339   0.7348
   smokesever     0.0498764  0.0326254   1.529   0.1271
   smokesnever    0.0394109  0.0349142   1.129   0.2597
   diseasestate1  0.0018739  0.0176817   0.106   0.9157
   diseasestate2 -0.0009858  0.0178651  -0.055   0.9560
   age            0.0002841  0.0006290   0.452   0.6518
   gendermale     0.1164889  0.0128748   9.048   <2e-16 ***
   ---
   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

   Residual standard error: 0.1257 on 397 degrees of freedom
   Multiple R-squared: 0.1933, Adjusted R-squared: 0.1791
   F-statistic: 13.59 on 7 and 397 DF,  p-value: 8.975e-16

and:

anova(model)
 Analysis of Variance Table

 Response: Y
            Df Sum Sq Mean Sq F value  Pr(>F)
 smokes      3 0.1536 0.05120  3.2397 0.02215 *
 disease     2 0.0129 0.00647  0.4096 0.66420
 age         1 0.0431 0.04310  2.7270 0.09946 .
 gender      1 1.2937 1.29373 81.8634 < 2e-16 ***
 Residuals 397 6.2740 0.01580
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I understand (hopefully correctly) that anova() tests each covariate by adding
it to the model in the order in which it is specified in the formula.
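
A small sketch of that sequential behaviour, using simulated data rather than the poster's df (the names x1, x2, y, fit_a and fit_b are made up for illustration). With correlated predictors the anova() table changes when the formula order changes, while the last line of each table agrees with the corresponding t-test in summary(), since F = t^2 for a single-df term entered last. The output above is consistent with this: gender is entered last, and 9.048^2 is about 81.86.

```r
# Simulated data; every name here is illustrative, not taken from the thread.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)                    # deliberately correlated with x1
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)

fit_a <- lm(y ~ x1 + x2)               # x1 entered first
fit_b <- lm(y ~ x2 + x1)               # x2 entered first

tab_a <- anova(fit_a)                  # x1 alone, then x2 given x1
tab_b <- anova(fit_b)                  # x2 alone, then x1 given x2
print(tab_a)
print(tab_b)

# For the term entered last, the sequential F equals the squared t value
# reported by summary() for the same coefficient.
t_x2 <- coef(summary(fit_a))["x2", "t value"]
print(c(F_last = tab_a["x2", "F value"], t_squared = t_x2^2))
```

The two fits have identical coefficients and residuals; only the sequential sums of squares in the anova() tables depend on the order.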


More specific questions are:

All of which are general statistics questions, which you are asked to post to forums or lists that expect such questions ... and not to r-help.


1) How do the p-values for smokes* in summary(model) relate to the
  Pr(>F) for smokes in anova(model)?
2) What do the p-values for each of those smokes* mean exactly?
3) The summary above shows the values for diseasestate1 and diseasestate2. How can I get the p-value for diseasecontrol (or, e.g., genderfemale)?
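
For readers following along, here is a hedged sketch on simulated data (d, g, x, y, fit and fit2 are invented names, not the poster's objects) of how R's default treatment contrasts behave. Each factor coefficient in summary() compares one level with the reference level, so the reference level has no row of its own; drop1() with test = "F" is one way to get a single joint test per term, adjusted for the other terms; relevel() changes which level serves as the reference.

```r
# Simulated data; names and effect sizes are assumptions for illustration only.
set.seed(2)
n <- 300
g <- factor(sample(c("control", "state1", "state2"), n, replace = TRUE))
x <- rnorm(n)
y <- 0.2 + 0.1 * (g == "state2") + 0.05 * x + rnorm(n, sd = 0.1)
d <- data.frame(y = y, g = g, x = x)

# Levels sort alphabetically, so "control" is the reference level.
fit <- lm(y ~ g + x, data = d)
print(coef(summary(fit)))        # gstate1 and gstate2, each versus "control"

# One F-test per term, each adjusted for all other terms in the model.
print(drop1(fit, test = "F"))

# Choose a different reference level; the fit is the same, the contrasts differ.
d$g2 <- relevel(d$g, ref = "state1")
fit2 <- lm(y ~ g2 + x, data = d)
print(coef(summary(fit2)))       # g2control and g2state2, each versus "state1"
```

Under this coding the reference level is absorbed into the intercept, which is why a row such as diseasecontrol or genderfemale does not appear in the summary shown above.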


^^^^^^^^^^^^^^^^^^^^^^^
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
-------------------

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
