Jessica L Hite/hitejl/O/VCU <hitejl <at> vcu.edu> writes: > I am attempting to run a glm with a binomial model to analyze proportion > data. > I have been following Crawley's book closely and am wondering if there is > an accepted standard for how much is too much overdispersion? (e.g. change > in AIC has an accepted standard of 2).
In principle, in the null case (i.e. data are really binomial) the deviance is chi-squared distributed with the df equal to the residual df. For example: example(glm) deviance(glm.D93) ## 5.13 summary(glm.D93)$dispersion ## 1 (by definition) dfr <- df.residual(glm.D93) deviance(glm.D93)/dfr ## 1.28 d2 <- sum(residuals(glm.D93,"pearson")^2) ## 5.17 (disp2 <- d2/dfr) ## 1.293 gg2 <- update(glm.D93,family=quasipoisson) summary(gg2)$dispersion ## 1.293, same as above pchisq(d2,df=dfr,lower.tail=FALSE) all.equal(coef(glm.D93),coef(gg2)) ## TRUE se1 <- coef(summary(glm.D93))[,"Std. Error"] se2 <- coef(summary(gg2))[,"Std. Error"] se2/se1 # (Intercept) outcome2 outcome3 treatment2 treatment3 # 1.137234 1.137234 1.137234 1.137234 1.137234 sqrt(disp2) # [1] 1.137234 > My code and output are below, given the example in the book, these data are > WAY overdispersed .....do I mention this and go on or does this signal the > need to try a different model? If so, any suggestions on the type of > distribution (gamma or negative binomial ?)? Way overdispersed may indicate model lack of fit. Have you examined residuals/data for outliers etc.? quasibinomial should be fine, or you can try beta-binomial (see the aod package) ... > attach(Clutch2) > y<-cbind(Total,Size-Total) > glm1<-glm(y~Pred,"binomial") > summary(glm1) > > Call: > glm(formula = y ~ Pred, family = "binomial") > > Deviance Residuals: > Min 1Q Median 3Q Max > -9.1022 -2.7899 -0.4781 2.6058 8.4852 > > Coefficients: > Estimate Std. Error z value Pr(>|z|) > (Intercept) 1.35095 0.06612 20.433 < 2e-16 *** > PredF -0.34811 0.11719 -2.970 0.00297 ** > PredSN -3.29156 0.10691 -30.788 < 2e-16 *** > PredW -1.46451 0.12787 -11.453 < 2e-16 *** > PredWF -0.56412 0.13178 -4.281 1.86e-05 *** > --- > #### the output for residual deviance and df does not change even when I > use quasibinomial, is this ok? ##### That's as expected. > library(MASS) you don't really need MASS for quasibinomial. > > glm2<-glm(y~Pred,"quasibinomial") > > summary(glm2) > > Call: > glm(formula = y ~ Pred, family = "quasibinomial") > > Deviance Residuals: > Min 1Q Median 3Q Max > -9.1022 -2.7899 -0.4781 2.6058 8.4852 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 1.3510 0.2398 5.633 1.52e-07 *** > PredF -0.3481 0.4251 -0.819 0.41471 > PredSN -3.2916 0.3878 -8.488 1.56e-13 *** > PredW -1.4645 0.4638 -3.157 0.00208 ** > PredWF -0.5641 0.4780 -1.180 0.24063 > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > (Dispersion parameter for quasibinomial family taken to be 13.15786) > > Null deviance: 2815.5 on 108 degrees of freedom > Residual deviance: 1323.5 on 104 degrees of freedom > (3 observations deleted due to missingness) > AIC: NA > > Number of Fisher Scoring iterations: 5 > > > anova(glm2,test="F") > Analysis of Deviance Table > > Model: quasibinomial, link: logit > > Response: y > > Terms added sequentially (first to last) > > Df Deviance Resid. Df Resid. Dev F Pr(>F) > NULL 108 2815.5 > Pred 4 1492.0 104 1323.5 28.349 6.28e-16 *** > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > model1<-update(glm2,~.-Pred) > > anova(glm2,model1,test="F") > Analysis of Deviance Table > > Model 1: y ~ Pred > Model 2: y ~ 1 > Resid. Df Resid. Dev Df Deviance F Pr(>F) > 1 104 1323.5 > 2 108 2815.5 -4 -1492.0 28.349 6.28e-16 *** > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > coef(glm2) > (Intercept) PredF PredSN PredW PredWF > 1.3509550 -0.3481096 -3.2915601 -1.4645097 -0.5641223 > > Thanks > Jessica > hitejl <at> vcu.edu ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.