Dear all,

I'm using the mgcv library by Simon Wood to fit GAMs with interactions, and I have been reading (and running) the "factor 'by' variable example" on the gam.models help page. The code and output for the first two models, b and b1, are below.

The example says that the two fits are the same: "note that the preceding fit (here b) is the same as (b1)....". I agree that the two models look equivalent, but summary(b) and summary(b1) give quite different results. The edf of the individual s(x2) terms differ, and the deviance explained is 65.3% for b against 75% for b1.

Are the two models (b and b1) really testing the same thing? If so, why are the results so different between them?

Thanks a lot to anyone who can help with this.

Geraldine

dat <- gamSim(4)
## fit model...
b <- gam(y ~ fac+s(x2,by=fac)+s(x0),data=dat)
plot(b,pages=1)
summary(b)

Family: gaussian
Link function: identity

Formula:
y ~ fac + s(x2, by = fac) + s(x0)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.1784     0.1985   5.937 6.59e-09 ***
fac2         -1.2148     0.2807  -4.329 1.92e-05 ***
fac3          2.2012     0.2436   9.034  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
             edf Ref.df      F  p-value
s(x2):fac1 5.364  6.472  2.285   0.0312 *
s(x2):fac2 4.523  5.547 11.396 4.59e-11 ***
s(x2):fac3 8.024  8.741 43.456  < 2e-16 ***
s(x0)      1.000  1.000  0.237   0.6269
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.634   Deviance explained = 65.3%
GCV score = 4.0288  Scale est. = 3.8082    n = 400

## note that the preceding fit is the same as....
b1<-gam(y ~ s(x2,by=as.numeric(fac==1))+s(x2,by=as.numeric(fac==2))+
        s(x2,by=as.numeric(fac==3))+s(x0)-1,data=dat)
## ... the `-1' is because the intercept is confounded with the
## *uncentred* smooths here.
plot(b1,pages=1)
summary(b1)

Family: gaussian
Link function: identity

Formula:
y ~ s(x2, by = as.numeric(fac == 1)) + s(x2, by = as.numeric(fac == 2)) +
    s(x2, by = as.numeric(fac == 3)) + s(x0) - 1

Approximate significance of smooth terms:
                             edf Ref.df       F  p-value
s(x2):as.numeric(fac == 1) 6.341  7.447   6.214 3.38e-07 ***
s(x2):as.numeric(fac == 2) 3.393  3.961  14.727 4.07e-11 ***
s(x2):as.numeric(fac == 3) 9.015  9.737 104.760  < 2e-16 ***
s(x0)                      1.000  1.000   0.266    0.606
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.631   Deviance explained = 75%
GCV score = 4.0345  Scale est. = 3.8353    n = 400
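
To compare the two fits more directly than by reading the two summaries side by side, something like the sketch below might help. It refits both models on one simulated data set and tabulates whole-model quantities (total edf, residual deviance, null deviance, GCV score), then looks at how far apart the fitted values are. The seed and the names `cmp` and `max_diff` are my own choices for illustration and are not part of the help-page example. The null deviance is included because the "Deviance explained" figure is computed relative to it.

```r
library(mgcv)

set.seed(1)        # arbitrary seed (an assumption) so both fits use the same data
dat <- gamSim(4)   # same simulation as in the help-page example

## factor 'by' fit: parametric fac effect plus one s(x2) smooth per level
b <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = dat)

## numeric 'by' fit: one s(x2) smooth per indicator variable, no intercept
b1 <- gam(y ~ s(x2, by = as.numeric(fac == 1)) +
              s(x2, by = as.numeric(fac == 2)) +
              s(x2, by = as.numeric(fac == 3)) +
              s(x0) - 1, data = dat)

## whole-model quantities, one row per fit
cmp <- data.frame(
  model     = c("b", "b1"),
  total_edf = c(sum(b$edf), sum(b1$edf)),
  resid_dev = c(deviance(b), deviance(b1)),
  null_dev  = c(b$null.deviance, b1$null.deviance),
  gcv       = c(b$gcv.ubre, b1$gcv.ubre)
)
print(cmp)

## largest absolute difference between the two sets of fitted values
max_diff <- max(abs(fitted(b) - fitted(b1)))
print(max_diff)
```

If the fitted values and residual deviances are close while the null deviances differ, that would suggest the gap in "Deviance explained" comes from the baseline each summary uses rather than from the fits themselves.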