On Jun 15, 2009, at 5:54 AM, Paul Christoph Schröder wrote:

Hi all!
Maybe someone could help me with the following. I know this isn't directly related to ecology, but I'm also using glm.

I have a list of 16 genes and 10 samples. The samples are of two types, 4 Ctrl and 6 Diseased. If,

labelInd<-as.factor(c(rep("0",4),rep("1",6)))
genes.glm<-glm(labelInd ~ ., family=binomial, data=mat)


with "mat" being the 10x16 matrix (without NAs), I get 17 values: first the intercept, then 9 numerical values, and "NA" for the last 7 genes. Does anybody know why this is happening, or how I can fit a model using all 16 genes?

I hope someone can help me with this!
Many thanks in advance,

Paul

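More than likely, the 7 genes for which you are getting NAs are linearly dependent on (collinear with) the other genes. With only 10 observations, the design matrix has rank at most 10, so at most 10 coefficients (the intercept plus 9 genes) can be estimated and the remaining 7 come back as NA. If you switched the order of the 7 genes for which you are getting NAs so that they come first in the formula, you would get NAs for others.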
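
As a rough illustration, here is a minimal sketch that reproduces the pattern on simulated data. The values in "mat" are random and only the dimensions (10 samples, 16 genes) match the original post; the column names g1 to g16 and the object names fit, fit_rev, n_na, na_first and na_rev are made up for this example.

```r
# Simulated stand-in for the poster's data: 10 samples x 16 genes.
# The values are random; only the dimensions match the original post.
set.seed(1)
mat <- as.data.frame(matrix(rnorm(10 * 16), nrow = 10,
                            dimnames = list(NULL, paste0("g", 1:16))))
labelInd <- as.factor(c(rep("0", 4), rep("1", 6)))

# glm() warns about fitted probabilities of 0 or 1 here (perfect separation);
# the warnings are suppressed because they are not the point of the example.
fit <- suppressWarnings(glm(labelInd ~ ., family = binomial, data = mat))

# 17 coefficients requested, but the design matrix has rank at most 10.
n_na <- sum(is.na(coef(fit)))
na_first <- names(coef(fit))[is.na(coef(fit))]

# Same data with the gene columns in reverse order.
fit_rev <- suppressWarnings(glm(labelInd ~ ., family = binomial,
                                data = mat[, 16:1]))
na_rev <- names(coef(fit_rev))[is.na(coef(fit_rev))]

print(n_na)
print(na_first)
print(na_rev)
```

Comparing na_first with na_rev shows that which genes are reported as NA depends on their position in the formula, not on anything special about those genes.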

If you use:

  summary(genes.glm)

you will likely see a warning message about singularities in the coefficient table header line. Something like:

  Coefficients: (7 not defined because of singularities)

I would use cor(mat) to take a look at the correlation matrix for your data so that you can review this in more detail.
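
Pairwise correlations may not reveal a dependence that involves several genes at once, so checking the rank of the design matrix can be a useful complement. The sketch below again uses simulated data in place of the real "mat"; the names X, rank_X, n_aliased and max_abs_cor are made up for this example.

```r
# Simulated stand-in for the poster's 10 x 16 matrix (random values).
set.seed(1)
mat <- matrix(rnorm(10 * 16), nrow = 10,
              dimnames = list(NULL, paste0("g", 1:16)))

# Design matrix with an intercept column, as in labelInd ~ .
X <- cbind(Intercept = 1, mat)

# The rank cannot exceed the number of rows (10 samples).
rank_X <- qr(X)$rank
n_aliased <- ncol(X) - rank_X   # coefficients that cannot be estimated

# Largest absolute pairwise correlation between two different genes.
cors <- cor(mat)
max_abs_cor <- max(abs(cors[upper.tri(cors)]))

print(rank_X)
print(n_aliased)
print(max_abs_cor)
```

If max_abs_cor is well below 1 while n_aliased is still positive, the NAs come from having more columns than observations rather than from any single pair of near-identical genes.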

BTW, with only 10 observations, you are significantly overfitting the model by using so many covariates. You typically need at least 10 to 20 'events' for each covariate degree of freedom in a logistic regression model. With only 6 diseased (events) you really don't even have enough data to support one covariate. The study, presuming an 'a priori' design, is way underpowered for what you are attempting to do.
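
The arithmetic behind that rule of thumb, using the counts from the original post (the variable names are made up for this example):

```r
n_events     <- 6    # diseased samples, from the original post
n_covariates <- 16   # genes entered as covariates

# Events available per covariate, versus the 10 to 20 usually suggested.
epv <- n_events / n_covariates

# Number of covariates supportable at 10 events each.
max_covariates <- floor(n_events / 10)

print(epv)
print(max_covariates)
```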

HTH,

Marc Schwartz
