On Tue, 17 Feb 2009, Ben Bolker wrote:
Jessica L Hite/hitejl/O/VCU <hitejl <at> vcu.edu> writes:
I am attempting to run a glm with a binomial model to analyze proportion
data.
I have been following Crawley's book closely and am wondering if there is
an accepted standard for how much is too much overdispersion? (e.g. change
in AIC has an accepted standard of 2).
In principle, in the null case (i.e. data are really binomial)
the deviance is chi-squared distributed with the df equal
to the residual df.
*Approximately*, provided the expected counts are not near or below
one. See MASS ยง7.5 for an analysis of the size of the approximation
errors (which can be large and in both directions).
Given that I once had a consulting job where the over-dispersion was
causing something close ot panic and was entirely illusory, the lack
of the 'approximately' can have quite serious consequences.
For example:
example(glm)
deviance(glm.D93) ## 5.13
summary(glm.D93)$dispersion ## 1 (by definition)
dfr <- df.residual(glm.D93)
deviance(glm.D93)/dfr ## 1.28
d2 <- sum(residuals(glm.D93,"pearson")^2) ## 5.17
(disp2 <- d2/dfr) ## 1.293
gg2 <- update(glm.D93,family=quasipoisson)
summary(gg2)$dispersion ## 1.293, same as above
pchisq(d2,df=dfr,lower.tail=FALSE)
all.equal(coef(glm.D93),coef(gg2)) ## TRUE
se1 <- coef(summary(glm.D93))[,"Std. Error"]
se2 <- coef(summary(gg2))[,"Std. Error"]
se2/se1
# (Intercept) outcome2 outcome3 treatment2 treatment3
# 1.137234 1.137234 1.137234 1.137234 1.137234
sqrt(disp2)
# [1] 1.137234
My code and output are below, given the example in the book, these data are
WAY overdispersed .....do I mention this and go on or does this signal the
need to try a different model? If so, any suggestions on the type of
distribution (gamma or negative binomial ?)?
Way overdispersed may indicate model lack of fit. Have
you examined residuals/data for outliers etc.?
quasibinomial should be fine, or you can try beta-binomial
(see the aod package) ...
attach(Clutch2)
y<-cbind(Total,Size-Total)
glm1<-glm(y~Pred,"binomial")
summary(glm1)
Call:
glm(formula = y ~ Pred, family = "binomial")
Deviance Residuals:
Min 1Q Median 3Q Max
-9.1022 -2.7899 -0.4781 2.6058 8.4852
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.35095 0.06612 20.433 < 2e-16 ***
PredF -0.34811 0.11719 -2.970 0.00297 **
PredSN -3.29156 0.10691 -30.788 < 2e-16 ***
PredW -1.46451 0.12787 -11.453 < 2e-16 ***
PredWF -0.56412 0.13178 -4.281 1.86e-05 ***
---
#### the output for residual deviance and df does not change even when I
use quasibinomial, is this ok? #####
That's as expected.
library(MASS)
you don't really need MASS for quasibinomial.
glm2<-glm(y~Pred,"quasibinomial")
summary(glm2)
Call:
glm(formula = y ~ Pred, family = "quasibinomial")
Deviance Residuals:
Min 1Q Median 3Q Max
-9.1022 -2.7899 -0.4781 2.6058 8.4852
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.3510 0.2398 5.633 1.52e-07 ***
PredF -0.3481 0.4251 -0.819 0.41471
PredSN -3.2916 0.3878 -8.488 1.56e-13 ***
PredW -1.4645 0.4638 -3.157 0.00208 **
PredWF -0.5641 0.4780 -1.180 0.24063
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
(Dispersion parameter for quasibinomial family taken to be 13.15786)
Null deviance: 2815.5 on 108 degrees of freedom
Residual deviance: 1323.5 on 104 degrees of freedom
(3 observations deleted due to missingness)
AIC: NA
Number of Fisher Scoring iterations: 5
anova(glm2,test="F")
Analysis of Deviance Table
Model: quasibinomial, link: logit
Response: y
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev F Pr(>F)
NULL 108 2815.5
Pred 4 1492.0 104 1323.5 28.349 6.28e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
model1<-update(glm2,~.-Pred)
anova(glm2,model1,test="F")
Analysis of Deviance Table
Model 1: y ~ Pred
Model 2: y ~ 1
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 104 1323.5
2 108 2815.5 -4 -1492.0 28.349 6.28e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
coef(glm2)
(Intercept) PredF PredSN PredW PredWF
1.3509550 -0.3481096 -3.2915601 -1.4645097 -0.5641223
Thanks
Jessica
hitejl <at> vcu.edu
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.