Dear R users,

I think I have a simple question, which I want to explain with an example.

I have several 2-digit industry codes that I want to use for a by-industry
analysis, but I think there is a problem with the degrees of freedom.

For example, when I run my analysis without any 2-digit industry code, I get the
following summary (I have 146574 observations in total):
> abc<-lm(lnQ~lnC+lnM+lnL+lnE+eco+inno, data=ds)
> summary(abc)

Call:
lm(formula = lnQ ~ lnC + lnM + lnL + lnE + eco + inno, data = ds)

Residuals:
      Min        1Q    Median        3Q       Max 
-11.01340  -0.17637  -0.02217   0.14974   7.79005 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) 0.8870369  0.0050646  175.144   <2e-16 ***
lnC         0.0658922  0.0006549  100.614   <2e-16 ***
lnM         0.8027478  0.0006549 1225.764   <2e-16 ***
lnL         0.0173622  0.0004025   43.138   <2e-16 ***
lnE         0.0657710  0.0006745   97.516   <2e-16 ***
ecoTRUE     0.0101649  0.0045892    2.215   0.0268 *  
innoTRUE    0.0945100  0.0030317   31.174   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.294 on 146160 degrees of freedom
  (407 observations deleted due to missingness)
Multiple R-squared: 0.9705,     Adjusted R-squared: 0.9705 
F-statistic: 8.027e+05 on 6 and 146160 DF,  p-value: < 2.2e-16 

As we can see from the last row, there are 146160 DF (407 observations were deleted due to missingness). This is OK.
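As a worked check that uses only the numbers printed above: the residual DF reported by `lm()` is the number of observations actually used minus the number of estimated coefficients, intercept included.

```latex
\begin{aligned}
n_{\text{used}} &= 146574 - 407 = 146167,\\
p &= 7 \quad (\text{intercept}, \texttt{lnC}, \texttt{lnM}, \texttt{lnL}, \texttt{lnE}, \texttt{ecoTRUE}, \texttt{innoTRUE}),\\
\text{df}_{\text{res}} &= n_{\text{used}} - p = 146167 - 7 = 146160.
\end{aligned}
```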




But when I want to use just one of the industries, let's say the 11th
industry, I first create a dummy for this industry:


> ind <- (ind_2d == 11)  # I expected R to consider only the 11th industry here
> abc<-lm(lnQ~lnC+lnM+lnL+lnE+eco+inno+ind, data=ds)
> summary(abc)

Call:
lm(formula = lnQ ~ lnC + lnM + lnL + lnE + eco + inno + ind, 
    data = ds)

Residuals:
      Min        1Q    Median        3Q       Max 
-11.03392  -0.17647  -0.02301   0.14901   7.74957 

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept)  0.8980397  0.0050451  178.001  < 2e-16 ***
lnC          0.0672255  0.0006523  103.065  < 2e-16 ***
lnM          0.7990819  0.0006579 1214.596  < 2e-16 ***
lnL          0.0171633  0.0004004   42.870  < 2e-16 ***
lnE          0.0670030  0.0006716   99.770  < 2e-16 ***
ecoTRUE      0.0162249  0.0045672    3.552 0.000382 ***
innoTRUE     0.0966967  0.0030160   32.062  < 2e-16 ***
indTRUE     -0.1251466  0.0031509  -39.717  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.2924 on 146159 degrees of freedom
  (407 observations deleted due to missingness)
Multiple R-squared: 0.9709,     Adjusted R-squared: 0.9709 
F-statistic: 6.957e+05 on 7 and 146159 DF,  p-value: < 2.2e-16 

But as we can see, it again includes the observations of all the industries, so the DF is 146159!
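The same count for the model with the dummy, again using only the printed numbers. Because `ind` enters the formula as an extra regressor rather than as a filter, no rows are dropped; the observation count is unchanged and there is one more coefficient:

```latex
\begin{aligned}
n_{\text{used}} &= 146574 - 407 = 146167 \quad (\text{unchanged}),\\
p &= 7 + 1 = 8 \quad (\text{the extra coefficient is } \texttt{indTRUE}),\\
\text{df}_{\text{res}} &= n_{\text{used}} - p = 146167 - 8 = 146159.
\end{aligned}
```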


So I just wonder: where did I make a mistake? Or is there no mistake at all, and
I just misunderstood the DF issue?
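A minimal sketch of the difference between adding an industry dummy and restricting the sample to one industry. It runs on simulated data, since the original `ds` is not available: the column names `ind_2d`, `lnC`, `lnM`, `lnQ` mimic the ones above, but the values and the reduced set of regressors are made up for illustration. A dummy keeps every row and adds one coefficient; `lm()`'s `subset` argument (or equivalently `data = ds[ds$ind_2d == 11, ]`) drops the rows of the other industries. The last part assumes that "by-industry analysis" means one separate regression per 2-digit code.

```r
# Simulated stand-in for the poster's data (hypothetical values;
# only the column names mimic the original `ds`).
set.seed(1)
n <- 1000
ds <- data.frame(
  ind_2d = sample(c(10, 11, 12), n, replace = TRUE),
  lnC    = rnorm(n),
  lnM    = rnorm(n)
)
ds$lnQ <- 0.5 + 0.1 * ds$lnC + 0.8 * ds$lnM + rnorm(n, sd = 0.3)

# (a) Dummy: TRUE for industry 11, FALSE otherwise. All rows stay in
#     the model; the dummy only shifts the intercept for industry 11.
ds$ind <- ds$ind_2d == 11
fit_dummy <- lm(lnQ ~ lnC + lnM + ind, data = ds)

# (b) Subset: only the rows of industry 11 enter the model.
#     Equivalent to data = ds[ds$ind_2d == 11, ].
fit_sub <- lm(lnQ ~ lnC + lnM, data = ds, subset = ind_2d == 11)

# Residual DF = observations used - estimated coefficients
nobs(fit_dummy); df.residual(fit_dummy)  # 1000, then 996 (= 1000 - 4)
nobs(fit_sub);   df.residual(fit_sub)    # industry-11 rows, then rows - 3

# (c) One regression per 2-digit industry code
fits <- lapply(split(ds, ds$ind_2d),
               function(d) lm(lnQ ~ lnC + lnM, data = d))
sapply(fits, df.residual)
```

The two approaches answer different questions: the dummy estimates how much industry 11 differs on average while forcing common slopes across industries, whereas the subset (or the `split()` loop) lets every coefficient vary by industry, at the cost of far fewer observations per regression.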

Any answer would be appreciated.
Thanks in advance

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
