On Wed, May 19, 2010 at 4:51 PM, Lucia Rueda <lucia.ru...@ba.ieo.es> wrote:
> Hi Joris,
>
> We're using mgcv.
>
> We have data on abundance of groupers on line transects that have the same
> length.

I only now realized groupers are actually fish :-). I should work on my English skills...

> My coworker has selected a bunch of variables and he has calculated
> them in terms of total area in different sizes of buffers around the
> centroid of the transect. He has run gam models (quasipoisson, mgcv) for
> each explanatory variable at each size of buffer.

Here you lost me a bit. How should I picture those buffers? Is it, as Simon said, some area? Would that mean you measure e.g. salinity along the transect and average the numbers using a window of a specific size? Or am I seeing it wrong?

> Then he has selected the
> significant variables. Some variables explain a higher percentage of
> deviance at different sizes of buffers. And now he wants to build a gam
> model trying the different explanatory variables but using the values that
> correspond to the size of the buffer where they explain a higher deviance,
> so one variable might have the values of a smaller scale whereas other
> might correspond to a higher buffer size (I don't know if I made myself
> clear). I am wondering if this is correct.

It does not seem correct to me. Model building in these frameworks, especially when inference is the goal, should be driven by hypotheses, not by whatever correlations show up in the data. With smooths in particular one has to be very careful.

Another issue is the correlation between environmental variables. They often covary along transects, which means you can have confounding and even aliasing in your dataset. This has to be checked and taken into account _before_ building the models. I have the impression that his approach does not take care of this.

Next, I believe that data should be used as raw as possible, so as not to jeopardize the interpretation.
If you use different buffer sizes, you can no longer simply say that variables X and Y contribute significantly to explaining the variation; you can only say that they contribute significantly depending on the scale at which they are measured. It also depends on whether your goal is purely predictive or whether you want to do inference. If you want to conclude something about the significance of the parameters, his approach seems invalid to me. How do you explain that the significance of a variable depends on the scale of measurement? One assumes a continuous relation (unless working with factors), so the scale shouldn't make much of a difference anyway. If you can predict the number of groupers from the number of bald men in Hong Kong, by all means do so. But I wouldn't formulate a scientific conclusion based on the significance of that model, if you get my drift.

> Also I don't know if he should include an offset in spite all the transects
> have the same length.

Do you mean an intercept? In that case I'd always include one, except in very specific cases.

> I'm in charge of looking at the spatial correlation once he builds the
> model. I don't know much about it but I was thinking of doing a Moran test,
> correlogram and variogram and then if there's spatial autocorrelation doing
> gamm, sar or gee.

Gamm is a very powerful tool, but (if I understood Simon's book correctly) you cannot trust the ANOVAs on the gam component of the gamm object when using link functions. LR tests can give some information, but there is not yet a solid statistical framework for formal hypothesis testing of those models.

I also wonder why you would first build a model without the correct variance-covariance structure and then do the same with it. Personally, I'd do it the other way around. Not that it will change much about the predictions, but it definitely will change the inference.

In any case, all of this is my personal opinion on a problem I do not understand fully.
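
As a rough illustration of the Moran test mentioned in the question, here is a minimal sketch in plain Python of how Moran's I and a permutation p-value can be computed. It is not code from this thread and not how mgcv or any R package does it (in R, spdep offers moran.test for this). The function names morans_i and permutation_p_value, the binary distance-threshold weights and the toy data are all assumptions made for the example. In practice the values would be the model residuals and the coordinates the transect centroids.

```python
import math
import random


def morans_i(values, coords, max_dist):
    """Moran's I with binary weights: w_ij = 1 if 0 < dist(i, j) <= max_dist.

    The weighting scheme is an assumption; inverse-distance weights are
    another common choice.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0   # sum over neighbour pairs of dev_i * dev_j
    s0 = 0.0      # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross += dev[i] * dev[j]
                s0 += 1.0

    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("no neighbour pairs, or all values are identical")
    return (n / s0) * (cross / denom)


def permutation_p_value(values, coords, max_dist, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation via random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(values, coords, max_dist)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, coords, max_dist) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: four transect centroids on a line, one unit apart.
    centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    clustered = [1, 1, 5, 5]      # similar values sit next to each other
    alternating = [1, 5, 1, 5]    # neighbours always differ
    print(morans_i(clustered, centroids, 1.0))    # positive, about 0.33
    print(morans_i(alternating, centroids, 1.0))  # negative, about -1
```

A value clearly above zero on the residuals, with a small permutation p-value, would suggest that the model has left spatial structure unexplained.
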
These are just some general considerations; feel free to think differently.

> Thanks,
>
> Lucia
> --
> View this message in context:
> http://r.789695.n4.nabble.com/offset-in-gam-and-spatial-scale-of-variables-tp2222483p2222976.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php