Meredith Jantzen <mjantzen <at> uwo.ca> writes:

> Hello everyone,
>
> Apologies in advance, as this is partially a stats question and
> partially an R question. I have been using a GAM to model the
> activity level of bats going into and coming out from a forested
> edge. I had eight microphones set up in a line transect at each of
> eight sites, and I am hoping to construct a model for each of 7
> species.
>
> My count data has a reverse J-shaped skew and is overdispersed with
> a fair amount of zeros, and I haven't found any transformations that
> will completely normalize it (I've tried square roots and logs).
> Meanwhile, the variance in call numbers varies between sites and
> between microphones. I wanted to use a GAMM to incorporate varComb
> and varIdent, but these can only be applied on data with a gaussian
> distribution.
>
> Are there any packages I should be looking into that I don't know
> about that will apply a variance structure on a negative binomial
> distribution? Or is there some transformation that I should be
> using that will solve my normality issues? I've been searching the
> R-help boards, everything in Zuur and Woods, but I haven't found an
> answer yet.

I'm not entirely clear about this, but this question and the previous
question that Simon Wood answered (about neg binom and GAMM) suggest to
me that you might be going in slightly the wrong direction.

If your data are non-normally distributed, your choices are typically
(1) pick an alternative family of distributions to characterize the
variation (e.g. neg binomial or ZINB), (2) use some form of robust
estimation (e.g. rlm in the MASS package), or (3) try to find a
transformation of the data that makes the data normal (and/or
homoscedastic, and/or linear with respect to the predictor variables).
Among ecologists #3 is the classical approach and #1 is the most common
modern approach. Combining #1 and #3 doesn't make that much sense to me.

One doesn't necessarily expect the variance to be constant in a
negative binomial model; are the *standardized* residuals
heteroscedastic? (i.e. does the boxplot of residuals(m,type="pearson")
vs site, microphone, or site*microphone combination look funky?)

It's not absolutely clear whether you need zero-inflation explicitly or
not. There are tests for zero-inflation and overdispersion (see ref
below), but I don't know of any that are implemented in R ...
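For what it's worth, the residual check suggested above might look roughly like the sketch below. The data are simulated and every name in it (calls, site, mic, dist) is a hypothetical stand-in for the real bat data; it also assumes a version of mgcv recent enough to provide the nb() family, which estimates the dispersion parameter along with the rest of the model.

```r
library(mgcv)

set.seed(1)
# Hypothetical stand-in data: 8 sites x 8 microphones, 5 nights each
d <- expand.grid(site = factor(1:8), mic = factor(1:8), night = 1:5)
d$dist <- as.numeric(d$mic)              # microphone position along the transect
mu <- exp(1 + 0.3 * sin(d$dist / 2))     # assumed mean activity profile
d$calls <- rnbinom(nrow(d), mu = mu, size = 1.5)

# Negative binomial GAM; nb() estimates the NB dispersion parameter theta
m <- gam(calls ~ s(dist, k = 5) + site, family = nb(),
         data = d, method = "REML")

# Pearson residuals are already scaled by the fitted NB variance function,
# so roughly equal spread across groups suggests the mean-variance
# relationship is adequate
r <- residuals(m, type = "pearson")

op <- par(mfrow = c(1, 2))
boxplot(r ~ d$site, xlab = "site", ylab = "Pearson residual")
boxplot(r ~ d$mic, xlab = "microphone", ylab = "Pearson residual")
par(op)
```

If the boxes have broadly similar spread, the unequal raw variances between sites and microphones may simply be what the negative binomial implies, rather than something that needs a separate variance structure.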
Your choices seem to be:

* negative binomial in mgcv::gam, without zero-inflation;
* ZINB in pscl, without the sophisticated GAM machinery of mgcv (but
  you can use spline terms via splines::ns(v,n), where v is the
  predictor variable and n sets the degrees of freedom and hence the
  number of knots -- it just won't do all the slick automatic
  complexity selection that mgcv does);
* it looks like the COZIGAM package will do zero-inflated GAMs, but it
  doesn't do negative binomials ...

@article{deng_score_2005,
  title = {Score tests for zero-inflation and over-dispersion in generalized linear models},
  volume = {15},
  url = {http://www3.stat.sinica.edu.tw/statistica/j15n1/j15n115/j15n115.html},
  journal = {Statistica Sinica},
  author = {Deng, D. and Paul, {S.R.}},
  year = {2005},
  pages = {257–276}
}
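As a rough illustration of the first two options in the list above, here is a sketch on simulated data. The variable names (calls, dist), the 30% structural-zero rate, and the constant zero-inflation component ("| 1") are all assumptions made for the example, not anything taken from the real data. Note that the second argument of splines::ns() is df; ns() then places the interior knots itself. The pscl fit is guarded in case that package is not installed.

```r
library(mgcv)
library(splines)

set.seed(2)
# Hypothetical data: counts at 8 microphone positions, with extra zeros
d <- data.frame(dist = rep(1:8, times = 40))
mu <- exp(1.5 - 0.15 * d$dist)
extra_zero <- rbinom(nrow(d), 1, 0.3) == 1      # assumed 30% structural zeros
d$calls <- ifelse(extra_zero, 0L,
                  rnbinom(nrow(d), mu = mu, size = 2))

# Option 1: negative binomial GAM in mgcv, no zero-inflation;
# the smoothness of s(dist) is selected automatically
m_nb <- gam(calls ~ s(dist, k = 5), family = nb(),
            data = d, method = "REML")
summary(m_nb)

# Option 2: ZINB in pscl with a fixed-complexity natural spline;
# df is chosen by hand here, there is no automatic smoothness selection
if (requireNamespace("pscl", quietly = TRUE)) {
  m_zinb <- pscl::zeroinfl(calls ~ ns(dist, df = 3) | 1,
                           data = d, dist = "negbin")
  print(summary(m_zinb))
}
```

With option 2 you would typically fit a few values of df and compare the fits yourself, since that is the part mgcv would otherwise handle for you.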