On Sun, 21 Oct 2012, Eiko Fried wrote:
I am running nine negative binomial regressions on count data.
The nine models use nine different dependent variables - items of a clinical
screening instrument - and the same set of five predictors. The goal is to
find out whether these predictors have differential effects on the items.
For various reasons, one being that I want to avoid overfitting,
I need to use the same type of model for all nine regressions.
The problem is that some of my dependent variables are overdispersed and
others are not (dispersion estimates obtained by fitting with family=quasipoisson):
     dispersion      p     theta
S1        1.084  0.102      30.0
S2        0.903  0.000    4125.0   PROBLEM
S3        0.997  0.926    3754.0   PROBLEM
S4        0.784  0.000   10283.0   PROBLEM
S5        1.108  0.003       8.6
S6        1.010  0.772    1287.0   PROBLEM
S7        1.228  0.001       1.7
S8        1.222  0.005       0.4
S9        2.120  0.283       1.2
So I thought that maximum likelihood Poisson models would give wrong
results, and that negative binomial models would be the best way to go.
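For reference, a dispersion estimate like those in the table can be read off a quasi-Poisson fit. A minimal sketch (simulated counts stand in for the real screening items, which are not available here):

```r
## Sketch: extracting the quasi-Poisson dispersion estimate.
## Simulated equidispersed Poisson data stand in for the real items.
set.seed(42)
d <- data.frame(x = rnorm(500))
d$y <- rpois(500, lambda = exp(0.5 + 0.3 * d$x))
fit.qp <- glm(y ~ x, family = quasipoisson, data = d)
## equidispersed data give a dispersion estimate close to 1
summary(fit.qp)$dispersion
```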
Three models - the ones that are not overdispersed - fail with the following error:
m2.nb <- glm.nb(t0s2 ~ Sex + HisDep + FamHis + ZEFE + ZNeuro, data=data)
Error in while ((it <- it + 1) < limit && abs(del) > eps) { :
missing value where TRUE/FALSE needed
What happens is that, looking at the theta iterations, the values grow
without bound, overflow to infinity, and then become NaN.
This is an example for model S2 above, with a theta of 4125 and a
dispersion estimate of 0.903, significantly smaller than 1 (p < .001):
theta.ml: iter76 theta =1.34118e+16
theta.ml: iter77 theta =3.0232e+16
theta.ml: iter78 theta =-Inf
theta.ml: iter79 theta =NaN
The failure is accompanied by one warning:
1: In sqrt(1/i) : NaNs produced
How would you recommend dealing with these problems? Are the results
reliable even though these errors occur?
Again, I need one type of model that fits all nine regressions reasonably
well overall - maybe I should default to maximum likelihood Poisson?
From the description of your response variable in earlier e-mails on this
list, my feeling is that this is caused by the type of response you have.
It is not really a count response and might be better modeled by an ordered
response model. Below I've simulated data from an ordered probit model and
then pretended they are count data. The results are the symptoms you describe:
underdispersion and non-convergence for theta in the NB fit. Fitting an ordered
probit model avoids these problems for these data and (not surprisingly)
recovers the true parameters reasonably well.
hth,
Z
## MASS provides glm.nb() and polr()
library(MASS)

## random regressor and latent Gaussian variable
set.seed(1)
x <- runif(200)
y0 <- 5 * x + rnorm(200)
## collapse to factor with 5 levels
yf <- cut(y0, c(-Inf, 3, 4, 5, 6, Inf), labels = 0:4)
## pretend levels are count variables
yc <- as.numeric(as.character(yf))
## quasi-Poisson with underdispersion
summary(glm(yc ~ x, family = quasipoisson))
## NB with problems in theta estimation
summary(glm.nb(yc ~ x))
## ordered probit recovers true parameters
summary(polr(yf ~ x, method = "probit", Hess = TRUE))
## compare with the latent linear model
summary(lm(y0 ~ x))
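As a further cross-check on the diverging-theta symptom: the negative binomial collapses to the Poisson as theta -> Inf, so when glm.nb's theta runs away - as in the S2 fit above - a plain Poisson fit should give essentially the same coefficients. A sketch on simulated equidispersed counts (glm.nb may warn or even fail here, which is exactly the symptom in question):

```r
library(MASS)  # glm.nb()
set.seed(2)
x2 <- runif(300)
yp <- rpois(300, exp(1 + 0.5 * x2))  # equidispersed Poisson counts
fit.pois <- glm(yp ~ x2, family = poisson)
## glm.nb() drives theta towards infinity on such data; wrap in try()
## because it can hit the very error discussed above
fit.nb <- try(glm.nb(yp ~ x2), silent = TRUE)
coef(fit.pois)
if (!inherits(fit.nb, "try-error")) coef(fit.nb)  # essentially identical
```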
Thank you
Torvon
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.