On Tue, 2008-12-16 at 13:31 +0100, vito muggeo wrote:
> dear Gavin,
> I do not know whether such a comment may still be useful...
Very much so, thank you.

>
> Why are you unsure about quasi-separation?
> I think that it is quite evident in the plot

Unsure in the sense that I had been unable to ascertain what
quasi-complete separation was ;-)

I'm still not convinced about the quasi-separation issue, though. The
coefficients from the glm() fit are large, but the standard errors don't
indicate anything much wrong. I tried brglm() in the package of the same
name, and it gave effectively the same coefficients and standard errors as
glm(), whereas I would have expected them to differ considerably if
(quasi-)separation were an issue. I'm not very familiar with the approach
behind brglm(), however.

I'll also take a look at the profiling you describe below once our
computing problems here get sorted.

Apologies if people have had problems downloading the file from my web
space; we are having all sorts of filestore problems here this week.

Thanks again Vito for your comments,

G

>
> plot(analogs ~ Dij, data = dat)
>
> Also it may be useful to see the plot of the monotone (profile) deviance
> (or the log-lik) for the coef of Dij:
>
> ## fix the Dij slope at each value in xval via an offset,
> ## refit only the intercept, and record the residual deviance
> xval <- seq(-20, 0, length.out = 50)
> ll <- numeric(length(xval))
> for (i in seq_along(xval)) {
>     mod <- glm(analogs ~ offset(xval[i] * Dij), data = dat, family = binomial)
>     ll[i] <- deviance(mod)
> }
>
> plot(xval, ll)
>
> Hope this helps you,
>
> vito
>
> Gavin Simpson wrote:
> > Dear List,
> >
> > Apologies for this off-topic post, but it is R-related in the sense that
> > I am trying to understand what R is telling me with the data to hand.
> >
> > ROC curves have recently been used to determine a dissimilarity
> > threshold for identifying whether two samples are from the same "type"
> > or not. Given the bashing that ROC curves get whenever anyone asks about
> > them on this list (and having implemented the ROC methodology in my
> > analogue package), I wanted to try directly modelling the probability
> > that two sites are analogues for one another for a given dissimilarity
> > using glm().
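To make Vito's profile-deviance suggestion concrete, here is a minimal sketch on two tiny made-up datasets (not the analogue data). The helper `profile_deviance()` is hypothetical, written only for this illustration: like Vito's loop above, it fixes the slope through an offset, refits the intercept, and records the residual deviance. Under quasi-complete separation the curve should keep falling as the slope grows, so there is no finite minimum; with overlapping classes it should bottom out at a finite slope. The toy data have a positive slope, whereas Vito's grid for `Dij` runs over negative values.

```r
## Hypothetical helper for illustration: profile deviance over a grid of
## fixed slopes. For 0/1 data the residual deviance is -2 * log-likelihood.
profile_deviance <- function(x, y, slopes) {
  vapply(slopes, function(b) {
    ## slope fixed at b via the offset; only the intercept is estimated
    fit <- glm(y ~ offset(b * x), family = binomial)
    deviance(fit)
  }, numeric(1))
}

slopes <- seq(0.1, 10, by = 0.1)

## Made-up data with quasi-complete separation: y = 0 below x = 3,
## y = 1 above it, and both outcomes occur at the boundary x = 3.
## The two tied points stay at p = 0.5, so the deviance falls towards
## 4 * log(2) without ever reaching it.
x_quasi <- c(1, 2, 3, 3, 4, 5)
y_quasi <- c(0, 0, 0, 1, 1, 1)

## Made-up overlapping data: x = 3 is a 1 but x = 4 is a 0.
x_over <- 1:6
y_over <- c(0, 0, 1, 0, 1, 1)

dev_quasi <- profile_deviance(x_quasi, y_quasi, slopes)
dev_over  <- profile_deviance(x_over, y_over, slopes)

op <- par(mfrow = c(1, 2))
plot(slopes, dev_quasi, type = "l", main = "Quasi-complete separation",
     xlab = "fixed slope", ylab = "deviance")
plot(slopes, dev_over, type = "l", main = "Overlapping classes",
     xlab = "fixed slope", ylab = "deviance")
par(op)
```

If Vito's loop on the real data gives a curve with a clear minimum inside the grid rather than one that is still falling at the edge, that would suggest the slope estimate is finite and (quasi-)separation is not the cause of the warning.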
> >
> > The data I have, then, are a logical vector ('analogs') indicating whether
> > the two sites come from the same vegetation type and a vector of the
> > dissimilarity between the two sites ('Dij'). These are in a csv file
> > currently in my university web space. Each 'row' in this file
> > corresponds to a single comparison between two sites.
> >
> > When I analyse these data using glm() I get the familiar "fitted
> > probabilities numerically 0 or 1 occurred" warning. The data do not look
> > linearly separable when plotted (code for which is below). I have read
> > Venables and Ripley's discussion of this in MASS4 and other sources that
> > discuss this warning and R (Faraway's Extending the Linear Model with R
> > and John Fox's new Applied Regression Analysis and Generalized Linear
> > Models, 2nd Ed), as well as some of the literature on Firth's bias
> > reduction method. But I am still somewhat unsure what (quasi-)separation
> > is and whether this is the reason for the warnings in this case.
> >
> > My question, then, is: is this a separation issue with my data, or is it
> > the quasi-separation that I have read a bit about whilst researching this
> > problem? Or is this something completely different?
> >
> > Code to reproduce my problem with the actual data is given below. I'd
> > appreciate any comments or thoughts on this.
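For a single predictor plus an intercept, the two kinds of separation can be told apart just by comparing the ranges of the predictor in the two outcome groups: complete separation means the ranges do not overlap at all, and quasi-complete separation means they meet only at a boundary value shared by both groups. The function `separation_check()` below is a hypothetical helper written for this sketch, not part of any package, and the range rule does not carry over to models with several predictors. The second half uses made-up data to show that glm() can give the "fitted probabilities numerically 0 or 1 occurred" warning even when the classes overlap: a finite but steep fitted curve can still push observations far out in the tails to probabilities that equal 0 or 1 at machine precision.

```r
## Hypothetical helper: classify data for a one-predictor logistic
## regression with an intercept. Not valid for several predictors.
separation_check <- function(x, y) {
  y <- as.logical(y)
  x1 <- x[y]    # predictor values where the outcome is TRUE / 1
  x0 <- x[!y]   # predictor values where the outcome is FALSE / 0
  if (length(x1) == 0 || length(x0) == 0) return("single class")
  ## ranges strictly disjoint: some threshold splits the classes perfectly
  if (max(x0) < min(x1) || max(x1) < min(x0)) return("complete separation")
  ## ranges touch only at a boundary value shared by both classes
  if (max(x0) <= min(x1) || max(x1) <= min(x0))
    return("quasi-complete separation")
  "overlap"
}

separation_check(1:6, c(0, 0, 0, 1, 1, 1))              # "complete separation"
separation_check(c(1, 2, 3, 3, 4, 5), c(0, 0, 0, 1, 1, 1))  # "quasi-complete separation"
separation_check(1:6, c(0, 0, 1, 0, 1, 1))              # "overlap"

## Made-up overlapping data over a wide range: the outcomes at x = -1 and
## x = 1 are swapped, so the maximum likelihood slope is finite, but the
## fitted curve is steep enough that the extreme x values get fitted
## probabilities of 0 or 1 to machine precision.
x_wide <- c(-50:-1, 1:50)
y_wide <- as.numeric(x_wide > 0)
y_wide[x_wide == -1] <- 1
y_wide[x_wide == 1] <- 0
separation_check(x_wide, y_wide)                        # "overlap"

## Record whether glm() emits the "numerically 0 or 1" warning.
warned <- FALSE
fit_wide <- withCallingHandlers(
  glm(y_wide ~ x_wide, family = binomial),
  warning = function(w) {
    if (grepl("numerically 0 or 1", conditionMessage(w))) warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
warned
coef(summary(fit_wide))
```

With the data from the snippet below loaded, `with(dat, separation_check(Dij, analogs))` would apply the same range check to the analogue data.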
> >
> > #### Begin code snippet ################################################
> >
> > ## note data file is ~93Kb in size
> > dat <- read.csv(url("http://www.homepages.ucl.ac.uk/~ucfagls/dat.csv"))
> > head(dat)
> > ## fit model --- produces warning
> > mod <- glm(analogs ~ Dij, data = dat, family = binomial)
> > ## plot the data
> > plot(analogs ~ Dij, data = dat)
> > fit.mod <- fitted(mod)
> > ord <- with(dat, order(Dij))
> > with(dat, lines(Dij[ord], fit.mod[ord], col = "red", lwd = 2))
> >
> > #### End code snippet ##################################################
> >
> > Thanks in advance
> >
> > Gavin
>
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.