The other possibilities are: (1) you're missing a necessary interaction term (2) one of the variables affecting output just isn't in your data set. (3) you need to transform 'hosp_days' or 'age' -- are those the only two continuous variables? Might be worth trying to plot 'em versus # of hospitalizations just to see if a pattern emerges that gives you a hint.
"Lack of fit" just means that your model isn't as good as the model created by taking each combination of factors as a separate variable. I'm skeptical whether that means you necessarily have a bad model. On Wed, Nov 10, 2010 at 2:43 AM, avsha38 <avsha...@post.tau.ac.il> wrote: > > Hello All, > > I have a question regarding using Poission Regression, I would like to Model > the number of hospitalizations by a set of covariates. > > The issue I ran into is "lack of fit" even after I tried to solve the > "overdispersion" problem with negetive binomial. > > What would you suggest? > 1. Is the Poission Regression the right Model? > 2. is there another Model that is better for my needs? > > Any ideas will be more the welcome. > Thank you, > Avi > > The File is attached and the code is below: > Please note that the "offset" is for dealingwith different followup times of > the subjects. > > pr2<-read.csv("c:/rfiles/pr/pr_na.csv",header=TRUE) > head(pr2) > # distribution of the Response variable "zb44" , number of hospitalization. > tab <- table(pr2$zb44) > tab > x <- as.numeric(names(tab)) > plot(x, tab, type='h', xlab='Number of hospitalizations', ylab='Frequency', > main="Distribution of hospitalization") > points(x, tab, pch=16) > mean(pr2$zb44) > # find the "best model" according to AIC > fit.pr0<- glm(zb44 ~1+ offset(log(timem)),data=pr2, family=poisson) > fit.full<- glm(zb44 ~ > ESHKOL+factor(Phys_activity)+factor(PTCAwin45d)+factor(White_collar)+factor > (Asia_Africa)+factor(Sex)+Age+ factor(SteadyPartner) > + factor(RelIncome)+ factor(Employment)+ > Education+factor(CA_blockers)+Health_Prob+factor(PerceivHealth)+factor(Diuretics)+ > factor(ACE) > + factor(DiabTreat)+ factor(Insulin)+ factor(LLD)+ factor(Aspirin)+ > factor(Smoking)+factor(CHD_duration)+ factor(HTN)+ factor(Diabetes) > + factor(Hyperchol)+ factor(Intens_care)+ factor(Thromb)+ > factor(AP)+ factor(QWave)+factor(AntMI)+hosp_days+factor(COPD)+ > factor(B_blockers) > + factor(Cancer)+factor(Ulcer)+factor(kilip2)+ factor(HF_index)+ > factor(CVA)+Charlson_cat > + factor(Cardiac_death)+ > factor(CABGwin45d)+offset(log(timem)),data=pr2,family=poisson) > Forw <- step(fit.pr0, scope = list(lower =fit.pr0, upper= > fit.full),direction = "both") > summary(Forw) > anova.glm(Forw ,test = "Chisq") # display significate factors > # checking model fit > 1-pchisq(Forw$dev, Forw$df.residual) > # p value is nearly zero, There is a lack of fit!!! > > # Residuals plots > dev.new() > par(mfrow=c(2,2)) > plot(Forw , pch=23, bg='red', cex=2) > > # Examine (over) dispersion > # Pearson’s residuals > (dp <- sum(resid(Forw , type = "pearson")^2)/Forw$df.resid) > summary(Forw,dispersion=dp) > # Deviance > (dp1<-Forw$deviance/Forw$df.resid) > summary(Forw,dispersion=dp1) > # Model is over dispersed > > # Consider a negative binomial to deal with overdispersion > > NB.model<- step(glm.nb(hosp_total~ > ESHKOL+factor(Phys_activity)+factor(PTCAwin45d)+factor(White_collar)+factor(Asia_Africa)+factor(Sex)+Age+ > factor(SteadyPartner) > + factor(RelIncome)+ factor(Employment)+ > Education+factor(CA_blockers)+Health_Prob+factor(PerceivHealth)+factor(Diuretics)+ > factor(ACE) > + factor(DiabTreat)+ factor(Insulin)+ factor(LLD)+ factor(Aspirin)+ > factor(Smoking)+factor(CHD_duration)+ factor(HTN)+ factor(Diabetes) > + factor(Hyperchol)+ factor(Intens_care)+ factor(Thromb)+ > factor(AP)+ factor(QWave)+factor(AntMI)+hosp_days+factor(COPD)+ > factor(B_blockers) > + factor(Cancer)+factor(Ulcer)+factor(kilip2)+ factor(HF_index)+ > factor(CVA)+Charlson_cat > + factor(Cardiac_death)+ > factor(CABGwin45d)+offset(log(timem)),data=pr2)) > summary(NB.model) > 1 - pchisq(NB.model$dev, NB.model$df.residual) > par(mfrow=c(2,2)) > plot(NB.model, pch=23, bg='red', cex=2) > # p value is nearly zero, There is still a lack of fit!!! > http://r.789695.n4.nabble.com/file/n3035613/pr_na.csv pr_na.csv > -- > View this message in context: > http://r.789695.n4.nabble.com/poisson-regression-tp3035613p3035613.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.