On Sun, 14 Oct 2012, Eiko Fried wrote:
I would like to test in R what regression fits my data best. My dependent
variable is a count, and has a lot of zeros.
And I would need some help to determine what model and family to use
(poisson or quasipoisson, or zero-inflated poisson regression), and how to
test the assumptions.
1) Poisson Regression: as far as I understand, the strong assumption is
that dependent variable mean = variance. How do you test this? How close
together do they have to be? Are unconditional or conditional mean and
variance used for this? What do I do if this assumption does not hold?
The assumption concerns the conditional mean and variance, i.e., given the
regressors. There are various formal tests for this, e.g., dispersiontest()
in package "AER". Alternatively, you can use a simple likelihood-ratio test
(e.g., by means of lrtest() in "lmtest") between a Poisson and a negative
binomial (NB) fit. The p-value can even be halved because the Poisson is on
the boundary of the NB theta parameter range (theta = infinity).
However, overdispersion can already matter before this is detected by a
significance test. Hence, if in doubt, I would simply use an NB model and
you're on the safe side. And if the NB's estimated theta parameter turns
out to be extremely large (say beyond 20 or 30), then you can still switch
back to Poisson if you want.
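
As a rough illustration of the likelihood-ratio comparison described above,
here is a minimal sketch. The data are simulated (an assumption for
illustration only, not the poster's data), and the test statistic is computed
by hand from logLik() so that only "MASS" is needed; lrtest() from "lmtest" or
dispersiontest() from "AER" could be used instead if those packages are
installed.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
n <- 500
x <- rnorm(n)
# Hypothetical overdispersed counts: NB with mean exp(0.5 + 0.3 * x), theta = 1.5
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 1.5)

m_pois <- glm(y ~ x, family = poisson)
m_nb   <- glm.nb(y ~ x)

# Likelihood-ratio statistic for Poisson (restricted) vs. NB (unrestricted)
lr_stat <- as.numeric(2 * (logLik(m_nb) - logLik(m_pois)))

# Naive chi-squared(1) p-value, then halved because the Poisson sits on the
# boundary of the theta range (theta = infinity)
p_naive <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
p_half  <- p_naive / 2

lr_stat
p_half
m_nb$theta  # a very large estimate would suggest Poisson is adequate
```

With real data, replace the simulated y and x by your own response and
regressors.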
2) I read that if variance is greater than mean we have overdispersion,
and a potential way to deal with this is including more independent
variables, or family=quasipoisson. Does this distribution have any other
requirements or assumptions? What test do I use to see whether 1) or 2)
fits better - simply anova(m1,m2)?
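quasipoisson yields the same parameter estimates as poisson; only the
inference is adjusted: the standard errors are scaled by the square root of
the estimated dispersion parameter. Because the coefficients and the
deviance are identical, anova(m1, m2) cannot discriminate between the two.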
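
A small sketch of that point, using simulated data (hypothetical, for
illustration only) and base R: the coefficients of the two fits coincide, and
the quasipoisson standard errors equal the poisson ones multiplied by the
square root of the estimated dispersion.

```r
set.seed(2)
n <- 300
x <- rnorm(n)
# Hypothetical overdispersed counts
y <- rnbinom(n, mu = exp(1 + 0.4 * x), size = 2)

m_pois  <- glm(y ~ x, family = poisson)
m_quasi <- glm(y ~ x, family = quasipoisson)

# Same parameter estimates
same_coef <- all.equal(coef(m_pois), coef(m_quasi))

# Estimated dispersion (Pearson chi-squared / residual df)
phi <- summary(m_quasi)$dispersion

# Standard errors differ by the factor sqrt(phi)
se_pois  <- coef(summary(m_pois))[, "Std. Error"]
se_quasi <- coef(summary(m_quasi))[, "Std. Error"]
same_se  <- all.equal(se_quasi, se_pois * sqrt(phi))

same_coef
phi
same_se
```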
3) I also read that negative-binomial distribution can be used when
overdispersion appears. How do I do this in R?
glm.nb() in "MASS" is one of the standard options.
What is the difference to quasipoisson?
The NB is a likelihood-based model while the quasipoisson is not
associated with a likelihood (but has the same conditional mean equation).
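
To make the distinction concrete, here is a minimal sketch on simulated data
(hypothetical, for illustration only). The NB fit has a log-likelihood and
hence an AIC, whereas the quasipoisson fit reports NA for both. The two also
imply different variance functions: phi * mu for the quasipoisson and
mu + mu^2 / theta for the NB.

```r
library(MASS)  # provides glm.nb()

set.seed(3)
n <- 300
x <- rnorm(n)
# Hypothetical overdispersed counts
y <- rnbinom(n, mu = exp(1 + 0.4 * x), size = 2)

m_quasi <- glm(y ~ x, family = quasipoisson)
m_nb    <- glm.nb(y ~ x)

# Likelihood-based quantities exist only for the NB fit
aic_quasi <- AIC(m_quasi)  # NA: no likelihood behind the quasi family
aic_nb    <- AIC(m_nb)
logLik(m_nb)

# Both models use a log link for the conditional mean
cbind(quasipoisson = coef(m_quasi), negbin = coef(m_nb))
```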
4) Zero-inflated Poisson Regression: I read that using the Vuong test
checks which model fits better.
vuong(model.poisson, model.zero.poisson)
Is that correct?
It's one of the possibilities.
5) ats.ucla.edu has a section about zero-inflated Poisson regressions, and
tests the zero-inflated model (a) against the standard Poisson model (b):
m.a <- zeroinfl(count ~ child + camper | persons, data = zinb)
m.b <- glm(count ~ child + camper, family = poisson, data = zinb)
vuong(m.a, m.b)
I don't understand what the "| persons" part of the first model does, and
why you can compare these models. I had expected the regression to be
the same and just use a different family.
I recommend you read the associated documentation. See
vignette("countreg", package = "pscl")
For glm.nb() I recommend its accompanying documentation, namely the MASS
book.
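
For orientation, a minimal sketch of the two-part formula in zeroinfl(): the
regressors before "|" enter the count component, and those after "|" enter
the zero-inflation component. The data below are simulated stand-ins
(hypothetical; they only borrow the variable names from the example above and
are not the UCLA data), and the code is skipped if "pscl" is not installed.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  set.seed(4)
  n <- 400
  # Hypothetical regressors named after the example above
  d <- data.frame(
    child   = rbinom(n, 3, 0.3),
    camper  = rbinom(n, 1, 0.5),
    persons = sample(1:4, n, replace = TRUE)
  )
  # Excess zeros whose probability depends on persons
  always_zero <- rbinom(n, 1, plogis(1 - 0.8 * d$persons))
  d$count <- ifelse(always_zero == 1, 0L,
                    rpois(n, exp(1 - 0.5 * d$child + 0.6 * d$camper)))

  # Count part: child + camper; zero-inflation part: persons
  m.a <- pscl::zeroinfl(count ~ child + camper | persons, data = d)
  m.b <- glm(count ~ child + camper, family = poisson, data = d)

  # Coefficients are reported separately for the two components
  print(coef(m.a))

  # Vuong test of the non-nested models
  pscl::vuong(m.a, m.b)
}
```

The zero-inflated model is thus not the same regression with another family:
it adds a second, binary model for the excess zeros, which is why a
non-nested test is used for the comparison.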
hth,
Z