On 20/08/15 09:43, Abraham Mathew wrote:
Very simple question that I want to confirm.
Let's say that I have a response variable. What are the appropriate ways
that it can be coded for a logistic regression model?
1. It can be 0/1 and a factor
2. It can be 1/2 and a factor
3. It can be characters and a factor, where the second level (in
alphabetical order) takes on the 1 (bad/good becomes 0/1).
4. ?
5. ?
My question is: are 1, 2, and 3 all correct, and are there other coding
schemes that glm can take?
When in doubt, RTFM! :-)
From ?binomial:
    For the binomial and quasibinomial families the response can be
    specified in one of three ways:

    1. As a factor: ‘success’ is interpreted as the factor not having
       the first level (and hence usually of having the second level).

    2. As a numerical vector with values between 0 and 1, interpreted
       as the proportion of successful cases (with the total number of
       cases given by the weights).

    3. As a two-column integer matrix: the first column gives the
       number of successes and the second the number of failures.
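
To make the three forms concrete, here is a minimal sketch. The data
(dose, n_trials, n_success) and the "bad"/"good" labels are made up for
illustration; they are not from the original question. All three calls
should give essentially the same coefficient estimates.

```r
# Hypothetical grouped data: successes out of 20 trials at each dose.
dose      <- c(1, 2, 3, 4, 5)
n_trials  <- c(20, 20, 20, 20, 20)
n_success <- c(2, 6, 11, 15, 19)
n_failure <- n_trials - n_success

# Form 3: two-column matrix of (successes, failures).
fit_mat <- glm(cbind(n_success, n_failure) ~ dose, family = binomial)

# Form 2: proportion of successes, with the number of trials as weights.
fit_prop <- glm(n_success / n_trials ~ dose, family = binomial,
                weights = n_trials)

# Form 1: one row per case, response is a factor whose first level
# ("bad") is the failure and whose second level ("good") is the success.
outcome <- unlist(Map(function(s, f) rep(c("good", "bad"), c(s, f)),
                      n_success, n_failure))
dose_long <- rep(dose, n_trials)
y_fac <- factor(outcome, levels = c("bad", "good"))
fit_fac <- glm(y_fac ~ dose_long, family = binomial)

# Compare the estimates side by side.
print(rbind(matrix     = unname(coef(fit_mat)),
            proportion = unname(coef(fit_prop)),
            factor     = unname(coef(fit_fac))))
```

Note that the level order is set explicitly with levels =; left to the
default, factor() sorts the levels alphabetically, which is why
bad/good happens to come out as failure/success.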
That pretty well says it all. One thing to note: If the response is a
*numeric* vector of 0's and 1's it will produce the same result as it
would if it were converted to a factor. (This is because the default
weights are all 1.)
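
A quick way to check this, and to see what happens with the 1/2 coding
from the question, is the sketch below. The data are simulated and the
variable names are just for illustration. The point to notice is that
1/2 works as a *factor* but not as a *numeric* vector, since numeric
values must lie between 0 and 1.

```r
set.seed(42)
x   <- rnorm(50)
y01 <- rbinom(50, size = 1, prob = plogis(0.5 * x))  # simulated 0/1 response

# Numeric 0/1 and the same values as a factor give the same fit.
fit_num <- glm(y01 ~ x, family = binomial)
fit_fac <- glm(factor(y01) ~ x, family = binomial)
same_fit <- isTRUE(all.equal(coef(fit_num), coef(fit_fac)))
print(same_fit)

# 1/2 coding as a factor is fine: level "1" is failure, level "2" success.
y12 <- y01 + 1
fit_fac12 <- glm(factor(y12) ~ x, family = binomial)

# 1/2 coding as a plain numeric vector is rejected by the binomial family;
# capture the error message instead of stopping.
num12_result <- tryCatch(glm(y12 ~ x, family = binomial),
                         error = function(e) conditionMessage(e))
print(num12_result)
```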
HTH
cheers,
Rolf Turner
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276