Dear Simon, I ran an additional analysis using bam (mgcv 1.7-17) with three random intercepts and no non-linearities, and compared these to the results of lmer (lme4). Using bam results in a significant random intercept (even though it has a very low edf-value), while the lmer results show no variance associated to the random intercept of Placename. Should I drop the random intercept of Placename and if so, how is this apparent from the results of bam?
Summaries of both models are shown below. With kind regards, Martijn #### l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) + (1|Placename), data=wrddst); print(l1,cor=F) Linear mixed model fit by REML Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1 | Placename) Data: wrddst AIC BIC logLik deviance REMLdev -44985 -44927 22498 -45009 -44997 Random effects: Groups Name Variance Std.Dev. Word (Intercept) 0.0800944 0.283009 Key (Intercept) 0.0013641 0.036933 Placename (Intercept) 0.0000000 0.000000 Residual 0.0381774 0.195390 Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40 Fixed effects: Estimate Std. Error t value (Intercept) -0.00342 0.01513 -0.23 geogamfit 0.99249 0.02612 37.99 #### m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word,bs="re") + s(Key,bs="re") + s(Placename,bs="re"), data=wrddst,method="REML"); summary(m1,freq=F) Family: gaussian Link function: identity Formula: RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key, bs = "re") + s(Placename, bs = "re") Parametric coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.00342 0.01513 -0.226 0.821 geogamfit 0.99249 0.02612 37.991 <2e-16 *** --- Signif. codes: 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1 Approximate significance of smooth terms: edf Ref.df F p-value s(Word) 3.554e+02 347 634.716 <2e-16 *** s(Key) 2.946e+02 316 23.054 <2e-16 *** s(Placename) 1.489e-04 38 7.282 <2e-16 *** --- Signif. codes: 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1 R-sq.(adj) = 0.693 Deviance explained = 69.4% REML score = -22498 Scale est. = 0.038177 n = 112608 On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R] <ml-node+s789695n4631060...@n4.nabble.com> wrote: > Having looked at this further, I've made some changes in mgcv_1.7-17 to > the p-value computations for terms that can be penalized to zero during > fitting (e.g. s(x,bs="re"), s(x,m=1) etc). > > The Wald statistic based p-values from summary.gam and anova.gam (i.e. > what you get from e.g. anova(a) where a is a fitted gam object) are > quite well founded for smooth terms that are non-zero under full > penalization (e.g. a cubic spline is a straight line under full > penalization). For such smooths, an extension of Nychka's (1988) result > on CI's for splines gives a well founded distributional result on which > to base a Wald statistic. However, the Nychka result requires the > smoothing bias to be substantially less than the smoothing estimator > variance, and this will often not be the case if smoothing can actually > penalize a term to zero (to understand why, see argument in appendix of > Marra & Wood, 2012, Scandinavian Journal of Statistics, 39,53-74). > > Simulation testing shows that this theoretical concern has serious > practical consequences. So for terms that can be penalized to zero, > alternative approximations have to be used, and these are now > implemented in mgcv_1.7-17 (see ?summary.gam). > > The approximate test performed by anova(a,b) (a and b are fitted "gam" > objects) is less well founded. It is a reasonable approximation when > each smooth term in the models could in principle be well approximated > by an unpenalized term of rank approximately equal to the edf of the > smooth term, but otherwise the p-values produced are likely to be much > too small. In particular simulation testing suggests that the test is > not to be trusted with s(...,bs="re") terms, and can be poor if the > models being compared involve any terms that can be penalized to zero > during fitting. (Although the mechanisms are a little different, this is > similar to the problem we would have if the models were viewed as > regular mixed models and we tried to use a GLRT to test variance > components for equality to zero). > > These issues are now documented in ?anova.gam and ?summary.gam... > > Simon > > On 08/05/12 15:01, Martijn Wieling wrote: > >> Dear useRs, >> >> I am using mgcv version 1.7-16. When I create a model with a few >> non-linear terms and a random intercept for (in my case) country using >> s(Country,bs="re"), the representative line in my model (i.e. >> approximate significance of smooth terms) for the random intercept >> reads: >> edf Ref.df F p-value >> s(Country) 36.127 58.551 0.644 0.982 >> >> Can I interpret this as there being no support for a random intercept >> for country? However, when I compare the simpler model to the model >> including the random intercept, the latter appears to be a significant >> improvement. >> >>> anova(gam1,gam2,test="F") >> Model 1: .... >> Model 2: .... + s(BirthNation, bs="re") >> Resid. Df Resid. Dev Df Deviance F Pr(>F) >> 1 789.44 416.54 >> 2 753.15 373.54 36.292 43.003 2.3891 1.225e-05 *** >> >> I hope somebody could help me in how I should proceed in these >> situations. Do I include the random intercept or not? >> >> I also have a related question. When I used to create a mixed-effects >> regression model using lmer and included e.g., an interaction in the >> fixed-effects structure, I would test if the inclusion of this >> interaction was warranted using anova(lmer1,lmer2). It then would show >> me that I invested 1 additional df and the resulting (possibly >> significant) improvement in fit of my model. >> >> This approach does not seem to work when using gam. In this case an >> apparent investment of 1 degree of freedom for the interaction, might >> result in an actual decrease of the degrees of freedom invested by the >> total model (caused by a decrease of the edf's of splines in the model >> with the interaction). In this case, how would I proceed in >> determining if the model including the interaction term is better? >> >> With kind regards, >> Martijn Wieling >> >> -- >> ******************************************* >> Martijn Wieling >> http://www.martijnwieling.nl >> [hidden email] >> +31(0)614108622 >> ******************************************* >> University of Groningen >> http://www.rug.nl/staff/m.b.wieling >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > -- > Simon Wood, Mathematical Science, University of Bath BA2 7AY UK > +44 (0)1225 386603 http://people.bath.ac.uk/sw283 > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > ________________________________ > If you reply to this email, your message will be added to the discussion > below: > http://r.789695.n4.nabble.com/mgcv-inclusion-of-random-intercept-in-model-based-on-p-value-of-smooth-or-anova-tp4617585p4631060.html > To unsubscribe from mgcv: inclusion of random intercept in model - based on > p-value of smooth or anova?, click here. > NAML ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.