Hi!  I'm learning mgcv, and reading Simon Wood's book on GAMs, as recommended 
to me earlier by some folks on this list.  I've run into a question to which I 
can't find the answer in his book, so I'm hoping somebody here knows.

   My outcome variable is binary, so I'm doing a binomial fit with gam().  I 
have five independent variables, all continuous, all uniformly distributed in 
[0, 1].  (This dataset is the result of a simulation model.)  Let's call them 
a,b,c,d,e for simplicity.  I'm interested in interactions such as a*b, so I'm 
using tensor product smooths such as te(a,b).  So far so good.  But I'm also 
interested in, let's say, a*d.  So ok, I put te(a,d) in as well.  Both of these 
have a as a marginal basis (if I'm using the right terminology; all I mean is, 
both interactions involve a), and I would have expected them to share that 
basis; I would have expected them to be constrained such that the effect of a 
when b=0, for one, would be the same as the effect of a when d=0, for the 
other.  This would be just as, in a GLM with formula a*b + a*d, that formula 
would expand to a + b + d + a:b + a:d, and there is only one "a"; a doesn't get 
to be different for the a*b interaction than it is for the!
  a*d interaction.  But with tensor product smooths in gam(), that does not 
seem to be the case.  I'm still just getting to know mgcv and experimenting 
with things, so I may be doing something wrong; but the plots I have done of 
fits of this type appear to show different marginal effects.

  I tried explicitly including terms for the marginal basis; in my example, I 
tried a formula like te(a) + te(b) + te(d) + te(a, b) + te(a, d).  No dice; in 
this case, the main effect of a is different between all three places where it 
occurs in the model.  I.e. te(a) shows a different effect of a than te(a, b) 
shows at b=0, which is again different from the effect shown by te(a, d) at 
d=0.  I don't even know what that could possibly mean; it seems wrong to me 
that this could even be the case, but what do I know.  :->

   I could move up to a higher-order tensor like te(a,b,d), but there are three 
problems with that.  One, the b:d interaction (in my simplified example) is 
then also part of the model, and I'm not interested in it.  Two, given the set 
of interactions that I *am* interested in, I would actually be forced to do the 
full five-way te(a,b,c,d,e), and with a 300,000 row dataset, I shudder to think 
how long that will take to run, since it would have something like 5^5 free 
parameters to fit; that doesn't seem worth pursuing.  And three, interpretation 
of a five-way interaction would be unpleasant, to say the least; I'd much 
rather be able to stay with just the two-way (and one three-way) interactions 
that I know are of interest (I know this from previous logistic regression 
modelling of the dataset).

  For those who like to see the actual R code, here are two fits I've tried:

gam(outcome ~ te(acl, dispersal) + te(amplitude, dispersal) + te(slope, 
curvature, amplitude), family=binomial, data=rla, method="REML")

gam(outcome ~ te(slope) + te(curvature) + te(amplitude) + te(acl) + 
te(dispersal) + te(slope, curvature) + te(slope, amplitude) + te(curvature, 
amplitude) + te(acl, dispersal) + te(amplitude, dispersal) + te(slope, 
curvature, amplitude), family=binomial, data=rla, method="REML")

  So.  Any advice?  How can I correctly do a gam() fit involving multiple 
interactions that involve the same independent variable?

  Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to