On Fri, 2011-10-14 at 02:32 -0700, lincoln wrote:
> As you suggested I had a further look at the profile by changing the default
> value of stepsize (I tried to modify the others but apparently there was no
> change).
Have you read ?glm, specifically this bit:

Details:

     ....

     For the background to warning messages about ‘fitted probabilities
     numerically 0 or 1 occurred’ for binomial GLMs, see Venables &
     Ripley (2002, pp. 197-8).

There, V&R say (me paraphrasing) that if there are some large fitted \beta_i, the curvature of the log-likelihood at the fitted \beta can be much less than at \beta_i = 0. The Wald approximation then underestimates the change in the log-likelihood on setting \beta_i = 0, and as the absolute value of the fitted \beta_i becomes large (tends to infinity) the t statistic tends to 0. This is the Hauck-Donner effect. Whilst I am (so very) far from being an expert here, this does seem to fit the results you showed.

Furthermore, did you follow the steps Ioannis Kosmidis took me through with my data in that email thread? I have done so with your data and everything seems to follow the explanation/situation given by Ioannis. Namely, if you increase the number of iterations and tighten the convergence tolerance in the glm() call you get the same fit as with a standard glm() call:

> ## normal glm()
> summary(mod)

Call:
glm(formula = sex ~ twp + hwp + hcp + hnp, family = binomial, 
    data = dat)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9703  -0.1760   0.3181   0.6061   3.5235  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.4362     0.2687   5.345 9.02e-08 ***
twp            5.5673     1.3602   4.093 4.26e-05 ***
hwp           -4.2793     2.3781  -1.799   0.0719 .  
hcp         -450.1918    56.6823  -7.942 1.98e-15 ***
hnp           -4.5302     3.2825  -1.380   0.1676    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 581.51  on 425  degrees of freedom
Residual deviance: 294.00  on 421  degrees of freedom
  (41 observations deleted due to missingness)
AIC: 304

Number of Fisher Scoring iterations: 8

> ## now with control = glm.control(epsilon = 1e-16, maxit = 1000)
> ## in the call
> summary(mod3)

Call:
glm(formula = sex ~ twp + hwp + hcp + hnp, family = binomial, 
    data = dat, control = glm.control(epsilon = 1e-16, maxit = 1000))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9703  -0.1760   0.3181   0.6061   3.5235  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.4362     0.2687   5.345 9.02e-08 ***
twp            5.5673     1.3602   4.093 4.26e-05 ***
hwp           -4.2793     2.3781  -1.799   0.0719 .  
hcp         -450.1918    56.6823  -7.942 1.98e-15 ***
hnp           -4.5302     3.2825  -1.380   0.1676    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 581.51  on 425  degrees of freedom
Residual deviance: 294.00  on 421  degrees of freedom
  (41 observations deleted due to missingness)
AIC: 304

Number of Fisher Scoring iterations: 9

From the thread you linked to, if there were separation we would expect the estimates to diverge as the iterations increase; they do not. Profiling the model shows that the log-likelihood does start to increase again at low values, but does so slowly: it is very flat around the estimates and far from 0, which seems to correspond with the description of the Hauck-Donner effect given by Venables and Ripley in their book. In your case, however, the statistic is still sufficiently large for it to be identified as significant via the Wald test.

If we fit the model via brglm() we get essentially the same "result" as fitted by glm():

> summary(mod2)

Call:
brglm(formula = sex ~ twp + hwp + hcp + hnp, family = binomial, 
    data = dat)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.4262     0.2662   5.357 8.47e-08 ***
twp            5.3696     1.3323   4.030 5.57e-05 ***
hwp           -4.2813     2.3504  -1.821   0.0685 .  
hcp         -435.9212    55.0566  -7.918 2.42e-15 ***
hnp           -4.6295     3.2459  -1.426   0.1538    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 560.48  on 425  degrees of freedom
Residual deviance: 294.08  on 421  degrees of freedom
Penalized deviance: 302.8212 
  (41 observations deleted due to missingness)
AIC: 304.08

Ioannis points out that if there were separation, the brglm() results would differ markedly from the glm() ones, and they don't. As Ioannis mentioned in that thread, you can get the warning about the fitted probabilities without a separation problem; in my case it was because there were some very small fitted probabilities, which R was just warning about.
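In case it is useful, here is a rough, untested sketch that pulls these checks together. It assumes your data end up in a data frame called dat with the columns used in your formula, and that the brglm package is installed (MASS supplies the profile-likelihood confint() method for glm fits):

library(MASS)   ## confint()/profile() methods for glm fits
library(brglm)  ## bias-reduced (Firth-type) binomial GLMs

## complete cases on the model variables only, so that the nested-model
## comparison below is done on exactly the same rows
dat2 <- na.omit(dat[, c("sex", "twp", "hwp", "hcp", "hnp")])

## ordinary fit -- the one that warns about fitted probabilities
## numerically 0 or 1
m1 <- glm(sex ~ twp + hwp + hcp + hnp, family = binomial, data = dat2)

## same model with a much stricter tolerance and many more Fisher scoring
## iterations; under (quasi-)separation the estimates would keep growing,
## here they should barely change
m2 <- update(m1, control = glm.control(epsilon = 1e-16, maxit = 1000))

## bias-reduced fit; under separation this would differ markedly from m1
m3 <- brglm(sex ~ twp + hwp + hcp + hnp, family = binomial, data = dat2)

## coefficients side by side
cbind(glm = coef(m1), glm.strict = coef(m2), brglm = coef(m3))

## Hauck-Donner check for hcp: compare the Wald test that summary() reports
## with a likelihood-ratio test, and a Wald interval with a
## profile-likelihood interval
drop1(m1, scope = ~ hcp, test = "Chisq")   ## likelihood-ratio test
confint.default(m1, parm = "hcp")          ## Wald interval
confint(m1, parm = "hcp")                  ## profile-likelihood interval

If the likelihood-ratio test and the profile-likelihood interval broadly agree with the Wald results, the Hauck-Donner effect isn't distorting your conclusions much; if they disagree, trust the likelihood-based ones.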
HTH

G

> Here are the scripts I have used:
>
> > dati <- read.table("simone.txt", header = TRUE, sep = "\t", as.is = TRUE)
> > glm.sat <- glm(sex ~ twp + hwp + hcp + hnp, binomial, data = dati)
> Warning message:
> glm.fit: fitted probabilities numerically 0 or 1 occurred
> > pp <- profileModel(glm.sat, quantile = qchisq(0.95, 1), objective = "ordinaryDeviance", which = 4)
> Preliminary iteration . Done
> Profiling for parameter hcp ... Done
> > pp1 <- profileModel(glm.sat, quantile = qchisq(0.95, 1), objective = "ordinaryDeviance", which = 4, stepsize = 20)
> Preliminary iteration . Done
> Profiling for parameter hcp ... Done
> > pp2 <- profileModel(glm.sat, quantile = qchisq(0.95, 1), objective = "ordinaryDeviance", which = 4, stepsize = 100)
> Preliminary iteration . Done
> Profiling for parameter hcp ... Done
> > plot(pp)
> > mtext("Default stepsize", adj = 0, cex = 2, line = 1)
> > dev.new()
> > plot(pp1)
> > mtext("Stepsize=20", adj = 0, cex = 2, line = 1)
> > dev.new()
> > plot(pp2)
> > mtext("Stepsize=100", adj = 0, cex = 2, line = 1)
>
> And this is what the plots look like:
> http://r.789695.n4.nabble.com/file/n3904261/plot1.png
> http://r.789695.n4.nabble.com/file/n3904261/plot2.png
> http://r.789695.n4.nabble.com/file/n3904261/plot3.png
>
> I have tried to understand what is going on but I don't know how to
> interpret this. I have been trying to solve this for quite a long time
> but have not been able to. Here
> ( http://r.789695.n4.nabble.com/file/n3904261/simone.txt ) I attach a
> subset of the data I am working with, comprising the variables used in
> the glm model above, including the "funky" variable called "hcp".
> Thank you for taking the time to help me with this.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.