Greetings,

This is a long email. I'm struggling with a data set comprising 2,278 hydroacoustic estimates of fish biomass density made along line transects in two lakes (Lakes Michigan and Huron, three years in each lake). The data represent lakewide surveys in each year, and each data point is the estimate for a horizontal interval 1 km in length.

I'm interested in comparing biomass density and bathymetric distribution (bottom depth) in the two lakes, and there is graphical evidence of a non-linear relationship between biomass density and bottom depth. Hence my interest in GAMs. Predictors of primary interest are lake (factor) and bottom depth (continuous).

The fish data are autocorrelated at varying ranges, depending on species and year. I've tested this using correlog (package ncf). The bottom depth data are also highly autocorrelated. Because of the autocorrelation in the data, autocorrelation in the GAM residuals (up to 20 lags in some cases), patterns in residual plots from the GAM models, and very narrow confidence intervals for the predictions, I feel that the GAM results are biased and have attempted to use GAMM.

Data and procedure examples:

> fish[1:10, ]
   Transect yaoalebiom yaosmeltbiom yaobloaterbiom year     depth lake        x       y interval
 1     nn_1  12.019655 34.910370110       2.647370 2005  97.07525    2 526601.8 4850206        1
 2     nn_1  12.164686 35.331548810       3.982028 2005  98.37024    2 526742.2 4849339        2
 3     nn_1  11.176009 32.460052230       1.646604 2005  99.98218    2 526886.9 4848348        3
 4     nn_1   0.000000  0.036457091       5.306225 2005  81.44616    2 526993.4 4850849        4
 5     nn_1  40.808118 10.988825410       3.222485 2005 101.45707    2 526997.5 4847359        5
 6     nn_1   6.273421 18.176753520      18.832348 2005  98.69197    2 527084.1 4846366        6
 7     nn_1   6.225799 16.050983390      66.941892 2005  94.14283    2 527214.7 4845372        7
 8     nn_1   7.322910 19.001196850      47.273341 2005  91.21771    2 527331.6 4844636        8
 9     nn_1   0.000000  0.067646462      20.912908 2005  87.76123    2 527495.9 4843390        9
10     nn_1   0.000000  0.006012106      26.611785 2005  87.59767    2 527606.6 4842426       10

library(mgcv)  # provides gam() and gamm()

# GAM example: separate depth smooth for each lake
bloat.gam8 <- gam(log10(yaobloaterbiom + 0.00325) ~ lakef + s(depth, by = lakef),
                  data = fish3)

# GAMM example: same model with AR(1) errors along intervals within each transect
bloat.gamm1 <- gamm(log10(yaobloaterbiom + 0.00325) ~ lakef + s(depth, by = lakef),
                    correlation = corAR1(form = ~ interval | tranf),
                    data = fish3)

However, GAMM results from models including a wide variety of correlation structures (corExp, corSpher, corLin, corAR1, corARMA) produce autocorrelated residuals (similar lag range as GAM), patterns in residual plots, and confidence intervals for predictions that are only slightly larger than for the GAMs. This suggests to me that GAMM is not performing much better than GAM (or that I've not specified the models correctly). Is my assessment of the GAMM performance reasonable?

None of the models (GAM or GAMM) explain much of the deviance (~20%). I'm interested in an information-theoretic approach to selecting the best model from a set of candidate models (AICc, dAICc, AICc weights), but cannot run some of the GAM models with GAMM because they lack a random term. I'm also not sure how to use the GAMM output to compare the models I can run with this procedure.

Finally, as a last resort, I've subsampled the original data set so that I have 1 record per transect per lake per year, for a total N = 99. I get different "best models" from GAM (original data), GAMM (original data but including a correlation structure), and GAM (subsampled data). Selection of different models leads to fairly different conclusions about the similarities and differences between the lakes. I'm not sure where to go with this as a result.

Any thoughts/comments would be appreciated.

Dave

David Warner
Research Fishery Biologist
USGS Great Lakes Science Center
1451 Green Road
Ann Arbor MI 48105
734.214.9392