I have a question about some strange results I get when using lmer to fit a logistic mixed-effects model. I have a data set of about 30k points, and I'm trying to do backward selection to reduce the number of fixed effects in my model. I've got 3 crossed random effects and about 20 or so fixed effects. At a certain point I get a model (m17) whose fixed effects look like this (the full output is at the end of this message):
> print(m17, corr=F)
...
Fixed effects:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)           -1.97887    0.19699 -10.045  < 2e-16 ***
sexM                   0.45553    0.14387   3.166 0.001544 **
...
is_discTRUE            0.24676    0.15204   1.623 0.104576
poly(wfreq, 2)1     -119.72397   11.00516 -10.879  < 2e-16 ***
poly(wfreq, 2)2       17.35646    5.44456   3.188 0.001433 **
poly(wlen_p, 2)1     -13.60798    7.26926  -1.872 0.061208 .
poly(wlen_p, 2)2      -6.43167    5.24119  -1.227 0.219770
...

where poly(wlen_p,2)2 is the least significant factor left in the model. So I then build a model (m18) with exactly the same random and fixed effects except removing poly(wlen_p,2)2. Then I do an anova, and I get:

> anova(m17,m18)
Data:
Models:
m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m17: after_part + first_rep + is_open + is_disc + poly(wfreq,
m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
m17: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
m18: (1 | speaker) + (1 | corpus) + (1 | ref)
m17: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m18: after_part + first_rep + is_open + is_disc + poly(wfreq,
m17: 2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur, 2) +
m18: pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange,
m17: 2) + (1 | speaker) + (1 | corpus) + (1 | ref)
    Df   AIC   BIC logLik  Chisq Chi Df Pr(>Chisq)
m18 27 25928 26153 -12937
m17 28 25925 26159 -12934 5.2136      1    0.02241 *

So my first question is: Should I be concerned that the significance level shown in the original m17 is so different from the one shown by the anova? It's hard for me to see how this could happen. I noticed that there is a post on the FAQ about significance levels in linear mixed models, but I'm not sure whether it applies to the logistic case and if so how.

Now, my second question is a result of removing one more factor (is_disc) from the model, creating m19. I do another anova:

> anova(m19,m18)
Data:
Models:
m19: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m18: after_part + first_rep + is_open + poly(wfreq, 2) + wlen_p +
m19: poly(utt_rate, 2) + poly(dur, 2) + pmean + poly(log_prange,
m18: 2) + poly(imean, 2) + poly(irange, 2) + (1 | speaker) + (1 |
m19: corpus) + (1 | ref)
m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m19: after_part + first_rep + is_open + is_disc + poly(wfreq,
m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
m19: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
m18: (1 | speaker) + (1 | corpus) + (1 | ref)
    Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
m19 26 25925 26142 -12936
m18 27 25928 26153 -12937     0      1          1

and now it seems that m19 (which contains fewer parameters than m18) is a better fit. I don't see how it's possible to remove parameters from a model and get a better likelihood, but I will confess that I don't entirely understand how these kinds of models are estimated. Does this have something to do with approximations that R is making to fit the models, or numerical rounding errors? Could either problem be due to correlations among variables? All my fixed effects are either binary or numeric, and there are some fairly high correlations between a few pairs of them (maybe as high as .6 or .65 using Kendall tau) but I figured that this would be ok given the large number of data points. I think each value of each binary feature is observed at least 70-100 times.
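For what it's worth, here is the kind of direct check I have in mind for the second question. It's only a sketch, and I'm not sure these are even the right things to look at; the cor() call just uses a few of my numeric predictors as examples:

> logLik(m18); logLik(m19)      # compare the two fits' log-likelihoods directly
> deviance(m18); deviance(m19)  # and their deviances
> # Kendall correlations among a few of the numeric predictors
> cor(cbind(wfreq, wlen_p, dur), method = "kendall")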
In case it matters, I'm running R 2.5.1 on Linux, with lme4 0.99875-8, and below is a full printout of all my output:

> m17 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part
> + after_part + first_rep + is_open + is_disc + poly(wfreq,2) + poly(wlen_p,2)
> + poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2)
> + poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")
> print(m17,corr=F)
Generalized linear mixed model fit using Laplace
Formula: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
    after_part + first_rep + is_open + is_disc + poly(wfreq, 2) +
    poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur, 2) + pmean +
    poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
    (1 | speaker) + (1 | corpus) + (1 | ref)
 Family: binomial(logit link)
   AIC   BIC logLik deviance
 25925 26159 -12934    25869
Random effects:
 Groups  Name        Variance Std.Dev.
 ref     (Intercept) 0.434599 0.65924
 speaker (Intercept) 0.327813 0.57255
 corpus  (Intercept) 0.039722 0.19930
number of obs: 31017, groups: ref, 2601; speaker, 72; corpus, 2

Estimated scale (compare to 1 ) 0.9810366

Fixed effects:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)           -1.97887    0.19699 -10.045  < 2e-16 ***
sexM                   0.45553    0.14387   3.166 0.001544 **
starts_turnTRUE        0.25426    0.08087   3.144 0.001666 **
before_hesTRUE         0.50943    0.12364   4.120 3.79e-05 ***
after_hesTRUE          0.26439    0.12162   2.174 0.029712 *
before_partTRUE        1.07972    0.10748  10.046  < 2e-16 ***
after_partTRUE         0.47089    0.12363   3.809 0.000140 ***
first_repTRUE          1.00673    0.13073   7.701 1.35e-14 ***
is_openTRUE           -0.42213    0.09894  -4.267 1.98e-05 ***
is_discTRUE            0.24676    0.15204   1.623 0.104576
poly(wfreq, 2)1     -119.72397   11.00516 -10.879  < 2e-16 ***
poly(wfreq, 2)2       17.35646    5.44456   3.188 0.001433 **
poly(wlen_p, 2)1     -13.60798    7.26926  -1.872 0.061208 .
poly(wlen_p, 2)2      -6.43167    5.24119  -1.227 0.219770
poly(utt_rate, 2)1     5.21605    3.56227   1.464 0.143126
poly(utt_rate, 2)2    22.75593    3.04234   7.480 7.45e-14 ***
poly(dur, 2)1       -110.73779    6.15641 -17.987  < 2e-16 ***
poly(dur, 2)2         38.71283    3.52495  10.983  < 2e-16 ***
pmean                  1.00184    0.17912   5.593 2.23e-08 ***
poly(log_prange, 2)1   3.71200    3.76913   0.985 0.324702
poly(log_prange, 2)2  17.07179    2.79954   6.098 1.07e-09 ***
poly(imean, 2)1      -21.73475    4.75449  -4.571 4.84e-06 ***
poly(imean, 2)2       28.22247    3.32294   8.493  < 2e-16 ***
poly(irange, 2)1      -6.47276    3.95778  -1.635 0.101955
poly(irange, 2)2       6.99340    3.11937   2.242 0.024966 *

#remove wlen^2
> m18 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part
> + after_part + first_rep + is_open + is_disc + poly(wfreq,2) + wlen_p +
> poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2) +
> poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")
> anova(m17,m18)
Data:
Models:
m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m17: after_part + first_rep + is_open + is_disc + poly(wfreq,
m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
m17: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
m18: (1 | speaker) + (1 | corpus) + (1 | ref)
m17: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m18: after_part + first_rep + is_open + is_disc + poly(wfreq,
m17: 2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur, 2) +
m18: pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange,
m17: 2) + (1 | speaker) + (1 | corpus) + (1 | ref)
    Df   AIC   BIC logLik  Chisq Chi Df Pr(>Chisq)
m18 27 25928 26153 -12937
m17 28 25925 26159 -12934 5.2136      1    0.02241 *

#remove is_disc
> m19 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part
> + after_part + first_rep + is_open + poly(wfreq,2) + wlen_p +
> poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2) +
> poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")
> anova(m19,m18)
Data:
Models:
m19: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m18: after_part + first_rep + is_open + poly(wfreq, 2) + wlen_p +
m19: poly(utt_rate, 2) + poly(dur, 2) + pmean + poly(log_prange,
m18: 2) + poly(imean, 2) + poly(irange, 2) + (1 | speaker) + (1 |
m19: corpus) + (1 | ref)
m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
m19: after_part + first_rep + is_open + is_disc + poly(wfreq,
m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
m19: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
m18: (1 | speaker) + (1 | corpus) + (1 | ref)
    Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
m19 26 25925 26142 -12936
m18 27 25928 26153 -12937     0      1          1

--AuH2O

Sharon Goldwater
Postdoctoral Scholar
Department of Linguistics
Stanford University