On Jun 9, 2011, at 11:35 AM, Simon Wood wrote:

> I think that the main problem here is that smooths are not constrained to 
> pass through the origin, so the covariate taking the value zero doesn't 
> correspond to no effect in the way that you would like it to. Another way of 
> putting this is that smooths are translation invariant, you get essentially 
> the same inference from the model y_i = f(x_i) + e_i as from y_i = f(x_i + k) 
> + e_i (which implies that x_i=0 can have no special status).

  OK, I understand the translation invariance, I think, and why x_i=0 has no 
special status.  I don't understand the consequence that you are saying follows 
from that, though.  Onward...
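
  A minimal sketch of the translation invariance point, assuming simulated data (the names x, y, k and the sine-shaped truth are made up for illustration and are not from this thread). Fitting the same response against x and against x + k should give essentially the same fitted curve, which is why x = 0 carries no special meaning for a smooth.

```r
library(mgcv)

set.seed(1)
n <- 200
k <- 5                                   # arbitrary shift (assumption)
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)
dat$x_shift <- dat$x + k                 # same covariate, translated by k

# Same model, once on x and once on the translated covariate
fit_orig  <- gam(y ~ s(x),       data = dat, method = "REML")
fit_shift <- gam(y ~ s(x_shift), data = dat, method = "REML")

# Largest disagreement between the two sets of fitted values
max_diff <- max(abs(fitted(fit_orig) - fitted(fit_shift)))
print(max_diff)
```

  If the two fits agree to within numerical tolerance, the location of zero on the x axis has played no role in the estimate.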

> All mgcv does in the case of te(a) + te(b) + te(d) + te(a, b) +
> te(a, d) is to remove the bases for te(a), te(b) and te(d) from the basis of 
> te(a,b) and te(a,d). Further constraining  te(a,b) and te(a,d) so that 
> te(0,b) = te(a,0) = 0 etc wouldn't make much sense (in general 0 might not 
> even be in the range of a and b).

  Hmm.  Perhaps my question would be better phrased as a question of model 
interpretation.  Can I think of the smooth found for te(a) as the "main effect" 
of a?  If so, should I just not be bothered by the fact that te(a,b) at b=0 has 
a different shape from te(a)?  Or is the "main effect" of a really, here, te(a) 
+ te(a,b | b=0) + te(a,d | d=0) (if my notation makes any sense, I'm not much 
at math)?  Or is the whole concept of "main effect" meaningless for these kinds 
of models -- in which case, how do I interpret what te(a) means?  Or perhaps I 
should not be trying to interpret a by itself; perhaps I can only interpret the 
interactions, not the main effects.  In that case, do I interpret te(a,b) by 
itself, or do I need to conceptually "add in" te(a) to understand what te(a,b) 
is telling me?  My head is spinning.  Clearly I just don't understand what 
these GAM models even mean, on a fundamental conceptual level.
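
  One hedged way to make the "add in" reading concrete is to look at the per-term contributions on the link scale. The sketch below uses simulated data (the covariates a, b, d and the true effect are invented for illustration) and uses ti() terms, which current mgcv releases provide for tensor product interactions with the main effects excluded. It is not the fit discussed in this thread. Under this decomposition, the effect of a at a chosen value of b is the sum of the ti(a) and ti(a,b) contributions, up to an additive constant.

```r
library(mgcv)

set.seed(2)
n <- 2000
dat <- data.frame(a = runif(n), b = runif(n), d = runif(n))
# Invented truth with a main effect of a plus a:b and a:d interactions
eta <- with(dat, 2 * (a - 0.5) + sin(2 * pi * a) * (b - 0.5) + 2 * a * d - 1)
dat$outcome <- rbinom(n, 1, plogis(eta))

fit <- gam(outcome ~ ti(a) + ti(b) + ti(d) + ti(a, b) + ti(a, d),
           family = binomial, data = dat, method = "REML")

# Contribution of each smooth term along a, holding b = 0 and d = 0.5
newd <- data.frame(a = seq(0, 1, length.out = 5), b = 0, d = 0.5)
contrib <- predict(fit, newdata = newd, type = "terms")
print(colnames(contrib))

# Effect of a at b = 0: main-effect term plus the a:b interaction slice
a_at_b0 <- contrib[, "ti(a)"] + contrib[, "ti(a,b)"]

# The terms plus the intercept rebuild the full linear predictor
eta_hat <- rowSums(contrib) + attr(contrib, "constant")
```

  Read this way, no single term is "the" effect of a; the terms only have meaning as pieces of the sum.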

> In general I find functional ANOVA not entirely intuitive to think about, but 
> there is a very good book on it by Chong Gu (Smoothing spline ANOVA, 2002, 
> Springer), and the associated package gss is on CRAN.

  Is what I'm doing functional ANOVA?  I realize ANOVA and regression are 
fundamentally related; but I think of ANOVA as involving discrete levels 
(factors) in the independent variables, like treatment groups.  My independent 
variables are all continuous, so I would not have thought of this as ANOVA.  
Anyhow, OK.  I will go get that book today, and see if I can figure all this 
out.
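
  For orientation, a sketch of how the decomposition parallels classical ANOVA, written for the a, b, d example above. The notation is an assumption made here for illustration; the exact side conditions differ between mgcv and the smoothing spline ANOVA framework.

```latex
% Two-way ANOVA with discrete factors:
%   y_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij},
%   \sum_i \alpha_i = 0, \quad \sum_j \beta_j = 0.
% Functional analogue with continuous covariates (link scale):
\begin{align*}
\eta(a,b,d) &= \beta_0 + f_a(a) + f_b(b) + f_d(d)
               + f_{ab}(a,b) + f_{ad}(a,d), \\
\sum_i f_a(a_i) &= 0
  \quad \text{(and similarly for the other components).}
\end{align*}
% Effect of a at fixed b = b_0, d = d_0, up to an additive constant:
%   f_a(a) + f_{ab}(a, b_0) + f_{ad}(a, d_0).
```

  The factor-level effects become functions, and the zero-sum conditions on levels become centering conditions on those functions.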

  Thanks for your help!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/


Begin forwarded message:

> From: Simon Wood <s.w...@bath.ac.uk>
> Date: June 9, 2011 11:35:11 AM EDT
> To: r-help@r-project.org, rh...@sticksoftware.com
> Subject: Re: [R] gam() (in mgcv) with multiple interactions
> 
> best,
> Simon
> 
> 
> 
> On 07/06/11 17:00, Ben Haller wrote:
>> Hi!  I'm learning mgcv, and reading Simon Wood's book on GAMs, as
>> recommended to me earlier by some folks on this list.  I've run into
>> a question to which I can't find the answer in his book, so I'm
>> hoping somebody here knows.
>> 
>> My outcome variable is binary, so I'm doing a binomial fit with
>> gam().  I have five independent variables, all continuous, all
>> uniformly distributed in [0, 1].  (This dataset is the result of a
>> simulation model.)  Let's call them a,b,c,d,e for simplicity.  I'm
>> interested in interactions such as a*b, so I'm using tensor product
>> smooths such as te(a,b).  So far so good.  But I'm also interested
>> in, let's say, a*d.  So ok, I put te(a,d) in as well.  Both of these
>> have a as a marginal basis (if I'm using the right terminology; all I
>> mean is, both interactions involve a), and I would have expected them
>> to share that basis; I would have expected them to be constrained
>> such that the effect of a when b=0, for one, would be the same as the
>> effect of a when d=0, for the other.  This would be just as, in a GLM
>> with formula a*b + a*d, that formula would expand to a + b + d + a:b
>> + a:d, and there is only one "a"; a doesn't get to be different for
>> the a*b interaction than it is for the a*d interaction.  But with
>> tensor product smooths in gam(), that does not seem to be the case.
>> I'm still just getting to know mgcv and experimenting with things, so
>> I may be doing something wrong; but the plots I have done of fits of
>> this type appear to show different marginal effects.
>> 
>> I tried explicitly including terms for the marginal basis; in my
>> example, I tried a formula like te(a) + te(b) + te(d) + te(a, b) +
>> te(a, d).  No dice; in this case, the main effect of a is different
>> between all three places where it occurs in the model.  I.e. te(a)
>> shows a different effect of a than te(a, b) shows at b=0, which is
>> again different from the effect shown by te(a, d) at d=0.  I don't
>> even know what that could possibly mean; it seems wrong to me that
>> this could even be the case, but what do I know.  :->
>> 
>> I could move up to a higher-order tensor like te(a,b,d), but there
>> are three problems with that.  One, the b:d interaction (in my
>> simplified example) is then also part of the model, and I'm not
>> interested in it.  Two, given the set of interactions that I *am*
>> interested in, I would actually be forced to do the full five-way
>> te(a,b,c,d,e), and with a 300,000 row dataset, I shudder to think how
>> long that will take to run, since it would have something like 5^5
>> free parameters to fit; that doesn't seem worth pursuing.  And three,
>> interpretation of a five-way interaction would be unpleasant, to say
>> the least; I'd much rather be able to stay with just the two-way (and
>> one three-way) interactions that I know are of interest (I know this
>> from previous logistic regression modelling of the dataset).
>> 
>> For those who like to see the actual R code, here are two fits I've
>> tried:
>> 
>> gam(outcome ~ te(acl, dispersal) + te(amplitude, dispersal) +
>> te(slope, curvature, amplitude), family=binomial, data=rla,
>> method="REML")
>> 
>> gam(outcome ~ te(slope) + te(curvature) + te(amplitude) + te(acl) +
>> te(dispersal) + te(slope, curvature) + te(slope, amplitude) +
>> te(curvature, amplitude) + te(acl, dispersal) + te(amplitude,
>> dispersal) + te(slope, curvature, amplitude), family=binomial,
>> data=rla, method="REML")
>> 
>> So.  Any advice?  How can I correctly do a gam() fit involving
>> multiple interactions that involve the same independent variable?
>> 
>> Thanks!
>> 
>> Ben Haller McGill University
>> 
>> http://biology.mcgill.ca/grad/ben/
>> 
>> ______________________________________________ R-help@r-project.org
>> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>> read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> -- 
> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
> +44 (0)1225 386603               http://people.bath.ac.uk/sw283
