Dear Paul,

I think that this thread has gotten unnecessarily complicated. The answer, as is easily demonstrated, is that a binary response for a binomial GLM in glm() may be a factor, a numeric variable, or a logical variable, with identical results; for example:

--------------- snip -------------

> set.seed(123)

> head(x <- rnorm(100))
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499

> head(y <- rbinom(100, 1, 1/(1 + exp(-x))))
[1] 0 1 1 1 1 0

> head(yf <- as.factor(y))
[1] 0 1 1 1 1 0
Levels: 0 1

> head(yl <- y == 1)
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE

> glm(y ~ x, family=binomial)

Call:  glm(formula = y ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9        AIC: 118.9

> glm(yf ~ x, family=binomial)

Call:  glm(formula = yf ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9        AIC: 118.9

> glm(yl ~ x, family=binomial)

Call:  glm(formula = yl ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9        AIC: 118.9

--------------- snip -------------
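The same comparison can be made programmatically rather than by reading the printed output. What follows is a minimal sketch that rebuilds the objects from the session above; the names `fits`, `coefs`, and `same_as_numeric` are introduced here only for illustration.

```r
set.seed(123)
x  <- rnorm(100)
y  <- rbinom(100, 1, 1/(1 + exp(-x)))  # numeric 0/1 response
yf <- as.factor(y)                     # factor with levels "0", "1"
yl <- y == 1                           # logical response

# Fit the same logistic regression with each response type
fits <- list(
  numeric = glm(y  ~ x, family = binomial),
  factor  = glm(yf ~ x, family = binomial),
  logical = glm(yl ~ x, family = binomial)
)

# One column of coefficients per response type
coefs <- sapply(fits, coef)
print(coefs)

# Does each alternative coding reproduce the numeric fit?
same_as_numeric <- c(
  factor  = isTRUE(all.equal(coef(fits$numeric), coef(fits$factor))),
  logical = isTRUE(all.equal(coef(fits$numeric), coef(fits$logical)))
)
print(same_as_numeric)
```

Comparing with all.equal() rather than identical() allows for floating-point noise, although here the three fits should agree to the last digit.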

The original poster claimed to have encountered an error with a 0/1 numeric response, but didn't show any data or even a command. I suspect that the response was a character variable, but of course can't really know that.
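If the response really were stored as character strings, one way to rule that out as the cause would be to check its class and convert it explicitly before fitting. This is only a sketch under that assumption: the "yes"/"no" coding is invented, since the original data were never shown, and `yc`, `yb`, and `to_binary()` are hypothetical names.

```r
set.seed(123)
x  <- rnorm(100)
y  <- rbinom(100, 1, 1/(1 + exp(-x)))
yc <- ifelse(y == 1, "yes", "no")   # hypothetical character-coded response

class(yc)                           # check what was actually imported

# Hypothetical helper: turn a two-valued character vector into a factor,
# naming the "failure" value explicitly so that it becomes the first level
to_binary <- function(v, failure) {
  stopifnot(is.character(v), length(unique(v)) == 2, failure %in% v)
  success <- setdiff(unique(v), failure)
  factor(v, levels = c(failure, success))
}

yb      <- to_binary(yc, failure = "no")
fit_chr <- glm(yb ~ x, family = binomial)   # converted character response
fit_num <- glm(y  ~ x, family = binomial)   # original numeric 0/1 response

# The converted response should reproduce the numeric fit
all.equal(coef(fit_chr), coef(fit_num))
```

Setting the levels explicitly, rather than relying on alphabetical order, makes it clear which outcome is being modelled as the "success".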

Best,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-08-01 2:25 p.m., Paul Bernal wrote:
Dear friend,

I am aware that I have a binomial dependent variable, which is covid status
(1 if covid positive, and 0 otherwise).

My question was simply whether R requires a binary response variable to be
converted to a factor, that's all.

Cheers,

Paul

On Sat., August 1, 2020 at 1:22 p.m., Bert Gunter <bgunter.4...@gmail.com>
wrote:

... yes, but so does lm() for a categorical **INdependent** variable with
more than 2 numerically labeled levels. n levels  = (n-1) df for a
categorical covariate, but 1 for a continuous one (unless more complex
models are explicitly specified of course). As I said, the OP seems
confused about whether he is referring to the response or covariates. Or
maybe he just made the same typo I did.
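The degrees-of-freedom point can be illustrated with a small simulated example. This is a sketch only: the data, the group means, and the names `g`, `z`, `fit_lin`, and `fit_cat` are all made up for the purpose.

```r
set.seed(1)
g <- rep(1:3, each = 20)               # three groups labelled 1, 2, 3
z <- rnorm(60, mean = c(0, 2, 1)[g])   # simulated response, arbitrary group means

fit_lin <- lm(z ~ g)           # g treated as continuous: a single slope
fit_cat <- lm(z ~ factor(g))   # g treated as categorical: two dummy variables

length(coef(fit_lin))          # intercept + 1 slope
length(coef(fit_cat))          # intercept + 2 contrasts
anova(fit_cat)                 # the factor(g) row carries n - 1 = 2 df
```

With only two levels the two codings give the same fit, which is why the distinction does not arise for a 0/1 dummy variable.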

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Aug 1, 2020 at 11:15 AM Patrick (Malone Quantitative) <
mal...@malonequantitative.com> wrote:

No, R does not. glm() does in order to do logistic regression.

On Sat, Aug 1, 2020 at 2:11 PM Paul Bernal <paulberna...@gmail.com>
wrote:

Hi Bert,

Thank you for the kind reply.

But what if I don't turn the variable into a factor? Let's say that in
Excel I just coded the variable as 1s and 0s, imported the dataset into R,
and fitted the logistic regression without turning any categorical
variable or dummy variable into a factor.

Does R require every dummy variable to be treated as a factor?

Best regards,

Paul

On Sat., August 1, 2020 at 12:59 p.m., Bert Gunter <
bgunter.4...@gmail.com> wrote:

x <- factor(0:1)
x <- factor(c("no", "yes"))

will produce identical results up to labeling.
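What matters for a factor response in a binomial glm() is which level comes first: the first level is treated as failure and every other level as success. A minimal sketch with simulated data follows; the yes/no coding and the object names are illustrative and not taken from the original poster's dataset.

```r
set.seed(123)
x <- rnorm(100)
y <- rbinom(100, 1, 1/(1 + exp(-x)))

yn    <- ifelse(y == 1, "yes", "no")
f01   <- factor(y)                            # levels "0", "1"
f_yn  <- factor(yn)                           # levels "no", "yes" (alphabetical)
f_rev <- factor(yn, levels = c("yes", "no"))  # "yes" deliberately placed first

b01  <- coef(glm(f01   ~ x, family = binomial))
byn  <- coef(glm(f_yn  ~ x, family = binomial))
brev <- coef(glm(f_rev ~ x, family = binomial))

all.equal(b01, byn)     # same fit: "no" plays the role of 0
all.equal(b01, -brev)   # reversed levels model P(no), so the signs flip
```

So the labels themselves are irrelevant, but the level order decides which outcome the coefficients describe.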


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Aug 1, 2020 at 10:40 AM Paul Bernal <paulberna...@gmail.com>
wrote:

Dear friends,

Hope you are doing great. I want to fit a logistic regression in R, where
the dependent variable is the covid status (I used 1 for covid positives,
and 0 for covid negatives), but when I ran the glm, R complains that I
should make the dependent variable a factor.

What would be more advisable, to keep the dependent variable with 1s and
0s, or code it as yes/no and then make it a factor?

Any guidance will be greatly appreciated,

Best regards,

Paul

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Patrick S. Malone, Ph.D., Malone Quantitative
NEW Service Models: http://malonequantitative.com

He/Him/His


