[R] conducting GAM-GEE within gamm4?

Nathan Furey Thu, 03 May 2012 06:46:34 -0700

Dear R-help users,

I am trying to analyze some visual transect data of organisms to generate a
habitat distribution model. Once organisms are sighted, they are followed
as point data is collected at a given time interval.  Because of the
autocorrelation among these "follows," I wish to utilize a GAM-GEE approach
similar to that of Pirotta et al. 2011, using packages 'yags' and 'splines'
(http://www.int-res.com/abstracts/meps/v436/p257-272/).  Their R scripts
are shown here (
http://www.int-res.com/articles/suppl/m436p257_supp/m436p257_supp1-code.r).
I have used this code with limited success and multiple issues of models
failing to converge.



Below is the structure of my data:

> str(dat2)

'data.frame':   10792 obs. of  4 variables:
 $ dist_slag       : num  26475 26340 25886 25400 24934 ...
 $ Depth           : num  -10.1 -10.5 -16.6 -22.2 -29.7 ...
 $ dolphin_presence: int  0 0 0 0 0 0 0 0 0 0 ...


 $ block           : int  1 1 1 1 1 1 1 1 1 1 ...

> head(dat2)

  dist_slag    Depth dolphin_presence block
1  26475.47 -10.0934                0     1
2  26340.47 -10.4870                0     1
3  25886.33 -16.5752                0     1
4  25399.88 -22.2474                0     1


5  24934.29 -29.6797                0     1
6  24519.90 -26.2370                0     1


Here is the summary of my block variable (indicating the number of
groups for which autocorrelation exists within each block


> summary(dat2$block)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00   39.00   76.00   73.52  111.00  148.00


However, I would like to use the package 'gamm4', as I am more familiar
with Professor Simon Wood's packages and functions, and it appears gamm4
might be the most appropriate. It is important to note that the models have
a binary response (organism presence of absence along a transect), and thus
why I think gamm4 is more appropriate than gamm.  In the gamm help, it
provides the following example for autocorrelation within factors:

## more complicated autocorrelation example - AR errors
## only within groups defined by `fac'
e <- rnorm(n,0,sig)
for (i in 2:n) e[i] <- 0.6*e[i-1]*(fac[i-1]==fac[i]) + e[i]
y <- f + e
b <- gamm(y~s(x,k=20),correlation=corAR1(form=~1|fac))

Following this example, the following is the code I used for my dataset

b <- gamm4(dolphin_presence~s(dist_slag)+s(Depth),random=(form=~1|block),
family=binomial(),data=dat)


However, by examining the output (summary(b$gam)) and specifically
summary(b$mer)), I am either unsure of how to interpret the results,
or do not believe that the autocorrelation within the group is being
taken into consideration.


> summary(b$gam)

Family: binomial
Link function: logit

Formula:
dolphin_presence ~ s(dist_slag) + s(Depth)

Parametric coefficients:

            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -13.968      5.145  -2.715  0.00663 **
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Approximate significance of smooth terms:

               edf Ref.df Chi.sq  p-value
s(dist_slag) 4.943  4.943  70.67 6.85e-14 ***
s(Depth)     6.869  6.869 115.59  < 2e-16 ***
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

R-sq.(adj) =  0.317glmer.ML score =  10504  Scale est. = 1         n = 10792
>

> summary(b$mer)
Generalized linear mixed model fit by the Laplace approximation

   AIC   BIC logLik deviance
 10514 10551  -5252    10504
Random effects:
 Groups Name         Variance Std.Dev.
 Xr     s(dist_slag) 1611344  1269.39
 Xr.0   s(Depth)       98622   314.04
Number of obs: 10792, groups: Xr, 8; Xr.0, 8

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)
X(Intercept)      -13.968      5.145  -2.715  0.00663 **
Xs(dist_slag)Fx1  -35.871     33.944  -1.057  0.29063
Xs(Depth)Fx1        3.971      3.740   1.062  0.28823

---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Correlation of Fixed Effects:
            X(Int) X(_)F1
Xs(dst_s)F1  0.654
Xs(Dpth)Fx1 -0.030  0.000
>


How do I ensure that autocorrelation is indeed being accounted for
within each unique value of the "block" variable? What is the simplest
way to interpret the output for "summary(b$mer)"?

The results do differ from a normal gam (package mgcv) using the same
variables and parameters without the "correlation=..." term,
indicating that *something *different is occurring.

However, when I use a different variable for the correlation term
(season), I get the SAME output:

> dat2 <- data.frame(dist_slag = dat$dist_slag, Depth = dat$Depth, 
> dolphin_presence = dat$dolphin_presence,
+ block = dat$block, season=dat$season)
> head(dat2)
  dist_slag    Depth dolphin_presence block season
1  26475.47 -10.0934                0     1      F
2  26340.47 -10.4870                0     1      F
3  25886.33 -16.5752                0     1      F
4  25399.88 -22.2474                0     1      F
5  24934.29 -29.6797                0     1      F
6  24519.90 -26.2370                0     1      F

> summary(dat2$season)
   F    S
3224 7568


> b <- gamm4(dolphin_presence~s(dist_slag)+s(Depth),correlation=corAR1(1, 
> form=~1 | season), family=binomial(),data=dat2)
> summary(b$gam)

Family: binomial
Link function: logit

Formula:
dolphin_presence ~ s(dist_slag) + s(Depth)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -13.968      5.145  -2.715  0.00663 **
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Approximate significance of smooth terms:
               edf Ref.df Chi.sq  p-value
s(dist_slag) 4.943  4.943  70.67 6.85e-14 ***
s(Depth)     6.869  6.869 115.59  < 2e-16 ***
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

R-sq.(adj) =  0.317glmer.ML score =  10504  Scale est. = 1         n = 10792
> summary(b$mer)
Generalized linear mixed model fit by the Laplace approximation
   AIC   BIC logLik deviance
 10514 10551  -5252    10504
Random effects:
 Groups Name         Variance Std.Dev.
 Xr     s(dist_slag) 1611344  1269.39
 Xr.0   s(Depth)       98622   314.04
Number of obs: 10792, groups: Xr, 8; Xr.0, 8

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)
X(Intercept)      -13.968      5.145  -2.715  0.00663 **
Xs(dist_slag)Fx1  -35.871     33.944  -1.057  0.29063
Xs(Depth)Fx1        3.971      3.740   1.062  0.28823
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Correlation of Fixed Effects:
            X(Int) X(_)F1
Xs(dst_s)F1  0.654
Xs(Dpth)Fx1 -0.030  0.000
>


I just want to make sure it is correctly allowing for correlation
within each value for the "block" variable. How do I formulate the
model to say that autocorrelation can exist within each single value
for block, but assume independence among blocks?


On another note, I am also receiving the following warning message
after model completion for larger models (with many more variables
than 2):

Warning message:
In mer_finalize(ans) : false convergence (8)




Thank you for your time, effort, and patience dealing with a novice.

Sincerely,

Nathan

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] conducting GAM-GEE within gamm4?

Reply via email to