Hélène,

Thanks for spotting this! This is a bug in "AER". I had only tested the new diagnostics for regressions with 1 endogenous variable and hence never noticed the problem. But if there is more than 1 endogenous variable, the df used in ivreg() (and hence the associated p-values) are too large.

I've fixed the problem in AER's devel-version and will release it on CRAN in the next few days.
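
Until the fixed version is available, the p-value can be recomputed by hand from the output quoted below. This is only a rough sketch: it assumes that the correct df is the number of overidentifying restrictions (excluded instruments minus endogenous regressors), takes the Sargan statistic 3.868 from the quoted summary, and uses illustrative object names that are not part of AER.

```r
# Sketch only: recompute the Sargan p-value by hand.
# Assumption: df = number of excluded instruments - number of endogenous regressors.
endogenous <- c("ed76", "exp76", "exp762")
excluded   <- c("daded", "momed", "libcrd14", "age76", "age762", "nearc4")

sargan_stat <- 3.868   # statistic copied from the quoted summary() output
df_reported <- 5       # df shown in the quoted output
df_overid   <- length(excluded) - length(endogenous)   # 6 - 3 = 3

# Upper-tail chi-squared probabilities under the reported and the corrected df
p_reported  <- pchisq(sargan_stat, df = df_reported, lower.tail = FALSE)
p_corrected <- pchisq(sargan_stat, df = df_overid,   lower.tail = FALSE)

print(c(reported = p_reported, corrected = p_corrected))
```

Since the statistic itself is unaffected, only the reference distribution changes, and the p-value under 3 df is smaller than the one reported under 5 df.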

Thanks & best regards,
Z

On Thu, 7 Nov 2013, Hélène Huber-Yahi wrote:

Hello,
I'm new to R and I'm currently learning to use package AER, which is
extremely comprehensive and useful. I have one question related to the
diagnostics after ivreg: if I understood correctly, the Sargan test provided
assumes that the statistic follows a chi-squared distribution with degrees of
freedom equal to the number of excluded instruments minus one. But I have
read many times that the degrees of freedom of this statistic should equal
the number of overidentifying restrictions, i.e. the number of excluded
instruments minus the number of endogenous variables. When comparing
with Stata results (estat overid after ivreg, same with ivreg2 output), the
statistic is the same as the one provided by R; only the p-value changes
because the reference distribution is different. Is this command using a
different flavor of the Sargan test? I did not find the details in the AER
PDF.
I'm using RStudio with R 3.0.2 (Windows 7) and AER is up to date. The
output I get from R is the following, where the Sargan df is equal to 5,
while I expected it to be 6 - 3 = 3. The data come from Verbeek's
econometrics textbook and the example replicates the one in the book.
The dependent variable is the log of wage, the endogenous variables are
education, experience and its square (3 of them), and the excluded
instruments are parents' education etc. (6 of them).

> ivmodel <- ivreg(lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 + south76 |
+     daded + momed + libcrd14 + age76 + age762 + nearc4 + black + smsa76 + south76,
+     data = school)
> summary(ivmodel, diagnostics = TRUE)

Call:
ivreg(formula = lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 +
   south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 +
   black + smsa76 + south76, data = school)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63375 -0.22253  0.02403  0.24350  1.32911

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.6064811  0.1126195  40.903  < 2e-16 ***
ed76         0.0848507  0.0066061  12.844  < 2e-16 ***
exp76        0.0796432  0.0164406   4.844 1.34e-06 ***
exp762      -0.0020376  0.0008257  -2.468   0.0136 *
black       -0.1726723  0.0195231  -8.845  < 2e-16 ***
smsa76       0.1521693  0.0165207   9.211  < 2e-16 ***
south76     -0.1204765  0.0154904  -7.778 1.01e-14 ***

Diagnostic tests:
                  df1  df2 statistic p-value
Weak instruments    6 2987   965.450  <2e-16 ***
Wu-Hausman          2 2988     1.949   0.143
Sargan              5   NA     3.868   0.569
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3753 on 2990 degrees of freedom
Multiple R-Squared: 0.2868,     Adjusted R-squared: 0.2854
Wald test: 178.6 on 6 and 2990 DF,  p-value: < 2.2e-16


Could this be caused by the fact that I'm using 2SLS and not GMM (at least
I suppose so) to estimate the IV model? I apologize if this comes from a
misunderstanding on my part, and I thank you in advance for your help.

Best,

H. Huber


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
