On Wed, 24 Feb 2010, Nicholas M. Caruso wrote:

I have some questions regarding Zero Inflation Poisson models.

I am using count data to analyze abundance trends of salamanders.  However,
I have surveys which differ in the amount of effort (i.e. the number of
people searching and amount of time - I am using a museum database so not
all surveys were conducted by me).  Therefore I need to account for the
effort.  If I change the count (response variable) then it will have decimals
and not be usable in this model.  So I decided to put this term into the
independent variable.

The usual approach would be the following: If you think that some link function of y/n (response per effort) is linear in a set of covariates x with coefficients b, you would typically write

  log(y/n) = x'b

which can be transformed to

  log(y) - log(n) = x'b
  log(y)          = x'b + log(n)

i.e., the log-effort would be an additional regressor with its coefficient fixed to 1. This is called an offset, so the R formula would be

  y ~ x + offset(log(n))
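
To make the offset concrete, here is a minimal sketch with a plain Poisson glm() on simulated data. The variable names (x, effort) and the true coefficients are assumptions chosen for illustration, not the salamander data.

```r
# Simulated data; all names and true values are illustrative assumptions.
set.seed(1)
N      <- 2000
x      <- rnorm(N)                      # a single covariate
effort <- runif(N, min = 1, max = 10)   # e.g. person-hours per survey
b0 <- 0.5; b1 <- 0.3                    # true coefficients on the rate scale
y <- rpois(N, lambda = effort * exp(b0 + b1 * x))

# log(effort) enters the linear predictor with its coefficient fixed at 1,
# so the remaining coefficients describe the expected count per unit of effort.
fit_offset <- glm(y ~ x + offset(log(effort)), family = poisson)
coef(fit_offset)
```

The response stays an integer count, and the estimated coefficients should come out close to b0 and b1.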

Alternatively, instead of assuming that the coefficient is exactly 1, you can estimate and test it, i.e.

  y ~ x + log(n)
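
A sketch of how the coefficient of log(n) could be tested against 1, again with a plain Poisson glm() on simulated data (names and true values are assumptions). Note that the z test printed by summary() is against 0, so the test against 1 has to be computed by hand or via a likelihood ratio test against the offset model.

```r
# Simulated data in which counts are truly proportional to effort.
set.seed(2)
N      <- 2000
x      <- rnorm(N)
effort <- runif(N, min = 1, max = 10)
y <- rpois(N, lambda = effort * exp(0.5 + 0.3 * x))

# Free coefficient for log(effort)
fit_free <- glm(y ~ x + log(effort), family = poisson)
est <- coef(summary(fit_free))["log(effort)", ]

# Wald test of H0: coefficient of log(effort) equals 1
z_vs_1 <- (est[["Estimate"]] - 1) / est[["Std. Error"]]
p_vs_1 <- 2 * pnorm(-abs(z_vs_1))
c(estimate = est[["Estimate"]], z = z_vs_1, p = p_vs_1)

# Likelihood ratio test: the offset model is the free model with the
# coefficient restricted to 1, so the two models are nested.
fit_offset <- glm(y ~ x + offset(log(effort)), family = poisson)
anova(fit_offset, fit_free, test = "Chisq")
```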

I am analyzing Historic vs. Current surveys.

Here is an example of my code:
require(pscl)
model <- zeroinfl(Sallys~Survey:Person.Hours, dist="poisson", EM=TRUE)
summary(model)

I think I would allow different intercepts as well, i.e.,

  zeroinfl(Sallys ~ Survey * log(Person.Hours))
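
A self-contained sketch of that call on simulated data. The data frame dat, the sample size and the true parameter values are made up for illustration; only the column names follow the code above. The block is skipped if pscl is not installed.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  set.seed(3)
  N <- 600
  dat <- data.frame(
    Survey       = factor(rep(c("Current", "Historic"), each = N / 2)),
    Person.Hours = runif(N, min = 0.5, max = 8)
  )
  # Assumed truth: counts proportional to effort, Historic rate 1.5 times Current
  mu <- dat$Person.Hours * exp(0.2 + log(1.5) * (dat$Survey == "Historic"))
  # Assumed truth: 30% structural zeros, independent of survey and effort
  structural_zero <- rbinom(N, size = 1, prob = 0.3)
  dat$Sallys <- ifelse(structural_zero == 1, 0L, rpois(N, mu))

  # Without a "|" in the formula the same regressors are used in the
  # count part and in the zero-inflation part.
  zip_fit <- pscl::zeroinfl(Sallys ~ Survey * log(Person.Hours),
                            data = dat, dist = "poisson")
  print(summary(zip_fit))
}
```

In the count part, the SurveyHistoric coefficient is the difference in intercepts and SurveyHistoric:log(Person.Hours) is the difference in slopes between the two surveys.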

I have received some very significant results on most of them, and some
that I thought wouldn't be significant turned out to be.  So I am concerned
with the model being appropriate.  I created a simulated database and ran a
simple glm to see if y/b ~ x is the same as y~x:b and it is not (not
surprisingly).  Does anyone have suggestions for how to adjust my model to
allow for these comparisons?  I cannot use a glm with Poisson error because
of overdispersion and a lot of zeroes.  I thought about either rounding up
my ratios or multiplying everything by 100 to eliminate the decimals but to
keep the variation (I am not pleased with either of those options)

On another note, I am having a little trouble interpreting the results (I
think), though this may not matter if I cannot use the ZIP model.  Are the
count model coefficients (poisson with log link) the measure of whether the
sites differ and, if so, what do the estimates for both surveys indicate?  Is
that the mean for both surveys, and is it testing them against zero?  If so, I
want to test them against each other and I don't know exactly how to do that.
Here is the output:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  1.97418    0.06570  30.048   <2e-16 ***
SurveyCurrent:Person.Hours   0.04192    0.07597   0.552    0.581
SurveyHistoric:Person.Hours  0.40221    0.01540  26.110   <2e-16 ***

This forces the intercept to be the same for both the current and the historic sites, which is not so intuitive. The two slopes mean that, for the historic sites, the counts increased clearly with effort, whereas for the current sites they increased only slightly (not significantly).
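
Regarding testing the two surveys against each other: each z test in the table above is against zero, not against the other slope. One way to get the comparison is to reparametrize so that the difference is itself a coefficient. A sketch with a plain Poisson glm() on simulated data (all names and true values are assumptions):

```r
set.seed(4)
N      <- 600
Survey <- factor(rep(c("Current", "Historic"), each = N / 2))
x      <- log(runif(N, min = 0.5, max = 8))        # log effort
slope  <- ifelse(Survey == "Historic", 1.0, 0.6)   # assumed true slopes
y      <- rpois(N, lambda = exp(0.3 + slope * x))

# Parametrization 1: one intercept and one slope per survey,
# each tested against zero.
fit_sep  <- glm(y ~ 0 + Survey + Survey:x, family = poisson)

# Parametrization 2: the same model, but the interaction term is now
# the Historic-minus-Current difference in slopes.
fit_diff <- glm(y ~ Survey * x, family = poisson)

slopes        <- coef(fit_sep)[c("SurveyCurrent:x", "SurveyHistoric:x")]
diff_from_sep <- unname(slopes[2] - slopes[1])
diff_direct   <- coef(fit_diff)[["SurveyHistoric:x"]]

# Wald test of the slope difference against zero
coef(summary(fit_diff))["SurveyHistoric:x", ]
```

Both fits give the same likelihood. Note that the label SurveyHistoric:x appears in both, but it is a slope in the first fit and a slope difference in the second.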

As for the "Zero-inflation model coefficients (binomial with logit link)", I
read that this is a measure of 1) suitability or 2) if the predictor of
excess zeros was significant.  Which one of these (or is it something else)
is correct and how do I interpret this?

Here is a sample of the readout:

Zero-inflation model coefficients (binomial with logit link):
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -1.1625     0.9833  -1.182    0.237
SurveyCurrent:Person.Hours   -1.1787     1.1304  -1.043    0.297
SurveyHistoric:Person.Hours  -0.5050     0.3440  -1.468    0.142

This reflects the probability of additional zeros, which does not seem to depend on either site or effort.
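
To read these coefficients as probabilities, apply the inverse logit. A small sketch using the point estimates from the output above; the effort value of 2 person-hours is a hypothetical choice, and since none of the terms is significant the resulting numbers carry a lot of uncertainty.

```r
# Point estimates copied from the zero-inflation part of the output above
zero_coef <- c("(Intercept)"                 = -1.1625,
               "SurveyCurrent:Person.Hours"  = -1.1787,
               "SurveyHistoric:Person.Hours" = -0.5050)

# plogis() is the inverse logit: linear predictor -> probability of an
# excess (structural) zero.

# At Person.Hours = 0, which is an extrapolation outside any real survey
p0 <- plogis(zero_coef[["(Intercept)"]])

# At a hypothetical effort of 2 person-hours, separately for each survey
hours <- 2
p_current  <- plogis(zero_coef[["(Intercept)"]] +
                     hours * zero_coef[["SurveyCurrent:Person.Hours"]])
p_historic <- plogis(zero_coef[["(Intercept)"]] +
                     hours * zero_coef[["SurveyHistoric:Person.Hours"]])
round(c(p0 = p0, current = p_current, historic = p_historic), 3)
```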

For an introduction to the zero-inflation model and its implementation in R see
  vignette("countreg", package = "pscl")

Also, I would recommend considering hurdle() models as well. They often give similar fits and are slightly easier to interpret (IMO).
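
A sketch of fitting both model types to the same data and comparing them by AIC. The data are simulated and all parameter values are assumptions for illustration; the block is skipped if pscl is not installed.

```r
if (requireNamespace("pscl", quietly = TRUE)) {
  set.seed(5)
  N <- 600
  dat <- data.frame(
    Survey       = factor(rep(c("Current", "Historic"), each = N / 2)),
    Person.Hours = runif(N, min = 0.5, max = 8)
  )
  # Assumed truth: counts proportional to effort plus 30% structural zeros
  mu <- dat$Person.Hours * exp(0.2 + log(1.5) * (dat$Survey == "Historic"))
  dat$Sallys <- ifelse(rbinom(N, size = 1, prob = 0.3) == 1, 0L, rpois(N, mu))

  # Hurdle model: a binary part for zero vs. positive counts and a
  # zero-truncated Poisson part for the positive counts.
  hurdle_fit <- pscl::hurdle(Sallys ~ Survey * log(Person.Hours),
                             data = dat, dist = "poisson")
  # Zero-inflated Poisson with the same regressors, for comparison
  zip_fit <- pscl::zeroinfl(Sallys ~ Survey * log(Person.Hours),
                            data = dat, dist = "poisson")
  print(AIC(hurdle_fit, zip_fit))
}
```

In the hurdle model the zero part describes the probability of observing any salamanders at all, which is often the easier quantity to explain.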

hth,
Z


Thanks for any suggestions/help!!

--
Nicholas M Caruso
Graduate Student
CLFS-Biology
4219 Biology-Psychology Building
University of Maryland, College Park, MD 20742-5815
phone: 301-405-6884



------------------------------------------------------------------
I learned something of myself in the woods today,
and walked out pleased for having made the acquaintance.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

