Re: [R] predict.glm -> which class does it predict?

Peter Dalgaard Fri, 10 Jul 2009 09:39:14 -0700

Peter Schüffler wrote:

Hi,
I have a question about logistic regression in R.
Suppose I have a small list of proteins P1, P2, P3 that predict atwo-class target T, say cancer/noncancer. Lets further say I know that Ican build a simple logistic regression model in R
model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the dataset ofthe Proteins).
This works fine. T is a factored vector with levels cancer, noncancer.Proteins are numeric.
Now, I want to use predict.glm to predict a new data.
predict(model, newdata=testsamples, type="response") (testsamples isa small set of new samples).
The result is a vector of the probabilites for each sample intestsamples. But probabilty WHAT for? To belong to the first level in T?To belong to second level in T?
Is this fallowing expression
factor(predict(model, newdata=testsamples, type="response") >= 0.5)
TRUE, when the new sample is classified to Cancer or when it'sclassified to Noncancer? And why not the other way around?

It's the probability of the 2nd level of a factor response (termed"success" in the documentation, even when your modeling the probabilityof disease or death...), just like when interpreting the logisticregression itself.

I find it easiest to sort ut this kind of issue by experimentation insimplified situations. E.g.


> x <- sample(c("A","B"),10,replace=TRUE)
> x
 [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A"
> table(x)
x
A B
4 6

(notice that the relative frequency of B is 0.6)

> glm(x~1,binomial)
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
In addition: Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'x' converted to a factor

(OK, so it won't go without conversion to factor. This is a good thing.)

> glm(factor(x)~1,binomial)

Call:  glm(formula = factor(x) ~ 1, family = binomial)

Coefficients:
(Intercept)
     0.4055

Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
Null Deviance:      13.46
Residual Deviance: 13.46        AIC: 15.46

(The intercept is positive, corresponding to log odds for a probability> 0.5 ; i.e., must be that "B": 0.4055==log(6/4))


> predict(glm(factor(x)~1,binomial))

1 2 3 4 5 6 780.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.40546510.4054651

        9        10
0.4054651 0.4054651
> predict(glm(factor(x)~1,binomial),type="response")
  1   2   3   4   5   6   7   8   9  10
0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6

As for why it's not the other way around, well, if it had been, then youcould have asked the same question....



--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] predict.glm -> which class does it predict?

Reply via email to