Sorry if this is a duplicate. This is a re-post because the PDFs mentioned
below did not go through.

Hello,

I'm newish to R, and very new to glm. I've read a lot about my issue:
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

...including:

http://tolstoy.newcastle.edu.au/R/help/05/07/7759.html
http://r.789695.n4.nabble.com/glm-fit-quot-fitted-probabilities-numerically-0-or-1-occurred-quot-td849242.html
(Note that I never found "MASS4 pp.197-8". However, Ted's post was quite
helpful.)

This is a common question, sorry. Because it is a common issue, I am posting
everything I know about it and why I think I am not falling into the
same trap as the others (though I must be, for some reason I am not yet
aware of).

From the two links above I gather that my warning "glm.fit: fitted
probabilities numerically 0 or 1 occurred" arises from a "perfect fit"
situation (i.e. all the high values of x (the predictor variable) have
y=1 (response=1), or the other way around). I don't feel my
data has this issue. Please point out how it does!
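
To make concrete what I mean by "my data does not have this issue", here
is a minimal sketch of the kind of check I have in mind. It uses simulated
data (l_yx_sim and all parameter values are made up for illustration, not
my real l_yx) and simply asks whether the x ranges of the two response
classes overlap. As far as I understand it, complete separation on a
single predictor would show up as non-overlapping ranges.

```r
# Hypothetical stand-in for l_yx: simulated data, NOT the real data set.
set.seed(1)
n <- 1000
x <- rexp(n, rate = 5)
y <- rbinom(n, size = 1, prob = plogis(-3 + 2 * x))
l_yx_sim <- cbind(y = y, x = x)

# Split the predictor by response class and look at each class's range.
x_by_y <- split(l_yx_sim[, "x"], l_yx_sim[, "y"])
r0 <- range(x_by_y[["0"]])  # min and max of x where y == 0
r1 <- range(x_by_y[["1"]])  # min and max of x where y == 1
print(rbind(y0 = r0, y1 = r1))

# With one predictor, complete separation means these intervals do not
# overlap. TRUE here means the classes are mixed along x.
ranges_overlap <- r0[1] <= r1[2] && r1[1] <= r0[2]
print(ranges_overlap)
```

This only rules out the simplest form of separation. It says nothing about
whether individual fitted probabilities end up extremely close to 0 or 1.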

The list posting instructions state that I can attach PDFs, so I attached
plots of my data as it is right before the call to glm.

The attachments are plots of my data stored in variable l_yx (as can be
seen in the axis names):
My response (vertical axis) by row index (horizontal axis):
 plot(l_yx[,1],type='h')
My predictor variable (vertical axis) by row index (horizontal axis):
 plot(l_yx[,2],type='h')

Here is more info on my data (in case you can't see my PDF
attachments):
> unique(l_yx[,1])
[1] 0 1
> mean(l_yx[,2])
[1] 0.01123699
> max(l_yx[,2])
[1] 14.66518
> min(l_yx[,2])
[1] 0
> attributes(l_yx)
$dim
[1] 690303      2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "y" "x"


With the above data I do:
> l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Why am I getting this warning when I have data points with varying x values
for both y=1 and y=0?  In other words, I don't think I have the linear
separation issue discussed in one of the links I provided.
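
One thing that might help narrow this down is counting how many fitted
probabilities actually sit at the boundary. The sketch below does that on
simulated data (not my real l_yx) that mimics the shape of my predictor:
mostly small values plus a handful of very large ones. The threshold
eps = 10 * .Machine$double.eps is my reading of what glm.fit checks
before issuing the warning, so please treat it as an assumption. All
variable names and simulation parameters are made up for illustration.

```r
# Simulated data, NOT the real l_yx: many small x values, a few huge ones.
set.seed(2)
n <- 5000
x <- c(rexp(n - 5, rate = 2), runif(5, min = 20, max = 30))
y <- rbinom(n, size = 1, prob = plogis(-4 + 3 * x))

# Fit the model, collecting warning messages instead of printing them.
warn_msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial(link = "logit")),
  warning = function(w) {
    warn_msgs <<- c(warn_msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(warn_msgs)

# Count fitted probabilities that are numerically 0 or 1.
# Assumption: this tolerance matches the one used inside glm.fit.
eps <- 10 * .Machine$double.eps
p_hat <- fitted(fit)
at_boundary <- p_hat > 1 - eps | p_hat < eps
n_boundary <- sum(at_boundary)
print(n_boundary)

# Which x values produce the boundary probabilities?
print(summary(x[at_boundary]))
```

In this simulation the y=0 and y=1 cases overlap over most of the x range,
yet some fitted probabilities can still be numerically 1 because a few x
values are very far from the rest. With a mean x of 0.011 and a maximum of
14.67, my data may be in a similar situation.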

PS - Then I do this and I get an odds ratio of a crazy size:
> # coef pval is $coefficients[8], log odds is $coefficients[2]
> l_sm = summary(l_logit)
> # exponentiate the coefficients
> l_exp_coef = exp(l_logit$coefficients)[2]
>     l_exp_coef
       x
3161.781

So for a one-unit increase in the predictor variable, the odds are
multiplied by 3161.781, i.e. a 316078.1% increase
((3161.781 - 1) * 100 = 316078.1)? That can't be correct either.
How do I correct for this issue? (I tried multiplying the predictor
variable by a constant and the odds ratio goes down, but the warning above
still persists. Shouldn't the odds ratio be independent of the scale of
the predictor variable?)
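
To show what I mean by rescaling, here is a small sketch on simulated
data (again not my real l_yx; the names and the constant k are
hypothetical). It fits the same model with x and with k * x and compares
the slopes. My understanding of the algebra is that the slope for k * x
should be the original slope divided by k, so the per-unit odds ratio
becomes the original odds ratio raised to the power 1/k. If that is right,
the per-unit odds ratio is tied to the unit of x, and only the fitted
probabilities stay the same.

```r
# Simulated data, NOT the real l_yx.
set.seed(3)
n <- 2000
x <- rexp(n, rate = 2)
y <- rbinom(n, size = 1, prob = plogis(-2 + 1.5 * x))

k <- 10            # hypothetical rescaling constant
x_scaled <- k * x  # same predictor expressed in smaller units

fit_orig   <- glm(y ~ x,        family = binomial(link = "logit"))
fit_scaled <- glm(y ~ x_scaled, family = binomial(link = "logit"))

b_orig   <- unname(coef(fit_orig)["x"])
b_scaled <- unname(coef(fit_scaled)["x_scaled"])

# Odds ratios per one unit of x and per one unit of x_scaled.
or_orig   <- exp(b_orig)
or_scaled <- exp(b_scaled)
print(c(or_orig = or_orig, or_scaled = or_scaled))

# The slopes should differ by the factor k ...
print(all.equal(b_scaled * k, b_orig, tolerance = 1e-6))
# ... while the fitted probabilities should be unchanged.
print(all.equal(unname(fitted(fit_scaled)), unname(fitted(fit_orig)),
                tolerance = 1e-6))
```

So an odds ratio that shrinks after I multiply x by a constant may be
expected behaviour rather than a sign that something is broken. Since my
x has a mean of about 0.011, a full one-unit increase is a very large step
relative to typical values.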

Thank you for your help!

Ben
