Greetings
This is a long email. 

I'm struggling with a data set comprising 2,278 hydroacoustic estimates of 
fish biomass density made along line transects in two lakes (lakes 
Michigan and Huron, three years in each lake).  The data represent 
lakewide surveys in each year and each data point represents the estimate 
for a horizontal interval 1 km in length.

I'm interested in comparing biomass density and bathymetric distribution 
(bottom depth) in the two lakes and there is graphical evidence of a 
non-linear relationship between biomass density and bottom depth.  Hence 
my interest in GAMs.

Predictors of primary interest are lake (factor) and bottom depth 
(continuous).

The fish data are autocorrelated at varying ranges, depending on species 
and year.  I've tested this using correlog() from package ncf.

The bottom depth data are also highly autocorrelated.

Because of the autocorrelations in data, autocorrelations in GAM residuals 
(up to 20 lags in some cases), patterns in residual plots from GAM models, 
and very narrow confidence intervals for the predictions, I feel that GAM 
results are biased and have attempted to use GAMM.

Data and procedure examples:
#> fish[1:10, ]
   Transect yaoalebiom yaosmeltbiom yaobloaterbiom year     depth lake        x       y interval
1      nn_1  12.019655 34.910370110       2.647370 2005  97.07525    2 526601.8 4850206        1
2      nn_1  12.164686 35.331548810       3.982028 2005  98.37024    2 526742.2 4849339        2
3      nn_1  11.176009 32.460052230       1.646604 2005  99.98218    2 526886.9 4848348        3
4      nn_1   0.000000  0.036457091       5.306225 2005  81.44616    2 526993.4 4850849        4
5      nn_1  40.808118 10.988825410       3.222485 2005 101.45707    2 526997.5 4847359        5
6      nn_1   6.273421 18.176753520      18.832348 2005  98.69197    2 527084.1 4846366        6
7      nn_1   6.225799 16.050983390      66.941892 2005  94.14283    2 527214.7 4845372        7
8      nn_1   7.322910 19.001196850      47.273341 2005  91.21771    2 527331.6 4844636        8
9      nn_1   0.000000  0.067646462      20.912908 2005  87.76123    2 527495.9 4843390        9
10     nn_1   0.000000  0.006012106      26.611785 2005  87.59767    2 527606.6 4842426       10

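# GAM example (gam() and gamm() are from package mgcv)
library(mgcv)
bloat.gam8 <- gam(log10(yaobloaterbiom + 0.00325) ~ lakef + s(depth, by = lakef),
                  data = fish3)

# GAMM example: same mean structure, AR(1) errors along each transect
bloat.gamm1 <- gamm(log10(yaobloaterbiom + 0.00325) ~ lakef + s(depth, by = lakef),
                    correlation = corAR1(form = ~ interval | tranf),
                    data = fish3)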

However, GAMM results from models including a wide variety of correlation 
structures (corExp, corSpher, corLin, corAR1, corARMA) produce autocorrelated 
residuals (similar lag range as GAM), patterns in residual plots, and 
confidence intervals for predictions that are only slightly larger than for 
GAMs.  This suggests to me that GAMM is not performing much better than 
GAM (or that I've not specified the models correctly).
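
For reference, a minimal sketch of one way residual autocorrelation can be checked for a gamm fit. It uses simulated data, so every name and value in it (sim, fit, lag1, the AR coefficient 0.7) is hypothetical and not taken from the survey. In nlme, residuals(..., type = "normalized") returns residuals pre-multiplied by the inverse square-root factor of the estimated error correlation matrix, so these, rather than the raw residuals, are the ones expected to look uncorrelated when the correlation structure is adequate.

```r
library(mgcv)   # gamm(); also loads nlme for corAR1()

set.seed(1)
# Simulated stand-in for the survey: 20 transects of 30 consecutive intervals,
# with AR(1) errors within each transect (all values are made up).
n_tran <- 20
n_int  <- 30
sim <- data.frame(
  tranf    = factor(rep(seq_len(n_tran), each = n_int)),
  interval = rep(seq_len(n_int), times = n_tran)
)
sim$depth <- runif(nrow(sim), 10, 120)
sim$e <- unlist(lapply(seq_len(n_tran), function(i)
  as.numeric(arima.sim(list(ar = 0.7), n = n_int))))
sim$y <- sin(sim$depth / 20) + sim$e

fit <- gamm(y ~ s(depth),
            correlation = corAR1(form = ~ interval | tranf),
            data = sim)

# Raw (response) residuals versus normalized residuals from the lme part
r_raw  <- resid(fit$lme, type = "response")
r_norm <- resid(fit$lme, type = "normalized")

# Mean lag-1 correlation computed within transects, so that lags never
# straddle two different transects
lag1 <- function(r, g) {
  mean(sapply(split(r, g), function(x) cor(x[-1], x[-length(x)])))
}
lag1_raw  <- lag1(r_raw,  sim$tranf)
lag1_norm <- lag1(r_norm, sim$tranf)
print(c(raw = lag1_raw, normalized = lag1_norm))
```

If the raw residuals were the ones inspected, persistent autocorrelation would be expected even under a well-specified correlation structure, so the comparison above may be worth repeating on the real fits.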

Is my assessment of the GAMM performance reasonable?  None of the models 
(GAM or GAMM) explain much of the deviance (~20%).

I'm interested in an information-theoretic approach to selecting the best 
model from a set of possible models (AICc, dAICc, AICc weights), but 
cannot run some of the GAM models with GAMM because they lack a random 
term.  I'm not sure how to use the GAMM output to compare the models I can 
run with this procedure.
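
To make the quantities concrete, here is a minimal sketch of the AICc, dAICc and AICc-weight calculation, written for any fitted objects that support logLik() and nobs(). The helper name aicc_table and the lm() fits on the built-in mtcars data are purely illustrative assumptions. Whether log-likelihoods from gam() and from the $lme component of gamm() can be placed in the same table is exactly the open question, and fits that differ in their fixed effects generally need ML rather than REML estimation to be comparable.

```r
# AICc table for a named list of fitted models.
# AICc = AIC + 2k(k + 1) / (n - k - 1), with k = parameter count from logLik().
aicc_table <- function(models) {
  ll   <- lapply(models, logLik)
  k    <- sapply(ll, attr, "df")        # number of estimated parameters
  n    <- sapply(models, nobs)          # number of observations
  aic  <- -2 * sapply(ll, as.numeric) + 2 * k
  aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)
  d    <- aicc - min(aicc)              # dAICc relative to the best model
  w    <- exp(-d / 2) / sum(exp(-d / 2))  # AICc (Akaike) weights
  out  <- data.frame(model = names(models), k = k,
                     AICc = aicc, dAICc = d, weight = w)
  out[order(out$dAICc), ]
}

# Illustration with ordinary linear models on a built-in data set
fits <- list(
  null  = lm(mpg ~ 1,       data = mtcars),
  wt    = lm(mpg ~ wt,      data = mtcars),
  wt_hp = lm(mpg ~ wt + hp, data = mtcars)
)
tab <- aicc_table(fits)
print(tab)
```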

Finally, as a last resort, I've subsampled the original data set so that I 
have 1 record per transect per lake per year for a total N=99.
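
The subsampling step can be sketched as follows, using a small made-up data frame (toy) whose key columns mirror those shown above. The values and the choice of a random draw within each group are assumptions made for illustration only.

```r
set.seed(42)
# Hypothetical stand-in: 2 lakes x 2 years x 2 transects x 3 intervals
toy <- data.frame(
  lake     = rep(1:2, each = 12),
  year     = rep(rep(2005:2006, each = 6), times = 2),
  Transect = rep(rep(c("a", "b"), each = 3), times = 4),
  interval = rep(1:3, times = 8)
)
toy$depth <- round(runif(nrow(toy), 10, 120), 1)

# Split into lake x year x transect groups, then draw one row per group
groups <- split(toy, list(toy$lake, toy$year, toy$Transect), drop = TRUE)
sub <- do.call(rbind, lapply(groups, function(d) d[sample(nrow(d), 1), ]))
rownames(sub) <- NULL
print(sub)
```

Because the draw is random, repeating it with different seeds gives a rough sense of how much the selected model depends on which record happens to represent each transect.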

I get different "best models" from GAM (original data), GAMM (original data 
but including correlation structure), and GAM (subsetted data).  Selection 
of different models leads to fairly different conclusions about the 
similarities and differences between the lakes.

I'm not sure where to go with this as a result. 

Any thoughts/comments would be appreciated. 
Dave


 



David Warner
Research Fishery Biologist
USGS Great Lakes Science Center
1451 Green Road
Ann Arbor MI 48105
734.214.9392

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.