Re: [R] NA as a result of using GLM

Marc Schwartz Mon, 15 Jun 2009 09:34:51 -0700

On Jun 15, 2009, at 10:13 AM, Paul Christoph Schröder wrote:

> Thanks Marc, as you said, this is a priori design. I'm trying to get  
> more events.
> But for now, I'll have to try to get rid of the collinear genes. Is  
> there any other method than using cor? Any method to state which  
> genes behave in the same  manner?
> Paul


Gene analysis is not in my area of expertise, so there may be other  
methods that make more sense here. I would defer to others with the  
appropriate expertise in the domain.  The BioConductor list would also  
be a good resource for you. More information at:

   http://www.bioconductor.org/

In terms of simply assessing correlation, there would be some  
graphical alternatives to reviewing a numeric correlation matrix.

One would be to use pairs() to create a graphical visualization of the  
correlation matrix. See ?pairs and:

   http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=137

Another would be to use splom(), the lattice graphics counterpart to
pairs():

   http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=50


Yet another would be to use correlation matrix ellipses with the  
plotcorr() function in the 'ellipse' CRAN package:

   http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=149
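
As a rough sketch of the three graphical options above, something
like the following could be tried. The data are simulated and the
gene names are made up for illustration; they are not from your
dataset. The plotcorr() call is guarded, since the 'ellipse' package
may not be installed.

```r
# Simulated stand-in for an expression matrix:
# 20 samples x 5 hypothetical genes (names are invented).
set.seed(1)
n <- 20
g1 <- rnorm(n)
expr <- data.frame(
  geneA = g1,
  geneB = g1 + rnorm(n, sd = 0.1),   # built to track geneA closely
  geneC = rnorm(n),
  geneD = rnorm(n),
  geneE = -g1 + rnorm(n, sd = 0.5)   # built to move opposite to geneA
)

# 1. Base graphics scatterplot matrix
pairs(expr)

# 2. lattice version (lattice is a recommended package shipped with R)
library(lattice)
print(splom(expr))

# 3. Correlation ellipses, only if 'ellipse' is available
if (requireNamespace("ellipse", quietly = TRUE)) {
  ellipse::plotcorr(cor(expr))
}
```

In all three displays the geneA/geneB panel should stand out as a
tight diagonal band (or a very narrow ellipse).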


Each of the above is going to get you to the same basic place as a
standard correlation matrix, which is to assess which genes seem to
move in the same direction as others. That is, which gene pairs have
a correlation of ~1.
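
If a numeric listing is preferred over a plot, the pairs with a
correlation near 1 can be pulled out of the matrix directly. This is
only a sketch on simulated data; the gene names and the 0.95 cutoff
are arbitrary assumptions on my part.

```r
# Simulated data; gene names are hypothetical.
set.seed(42)
n <- 30
base <- rnorm(n)
expr <- cbind(
  gene1 = base,
  gene2 = base + rnorm(n, sd = 0.05),  # built to track gene1 closely
  gene3 = rnorm(n),
  gene4 = rnorm(n)
)

cm <- cor(expr)

# Look only at the upper triangle so each pair is reported once
# and the diagonal (always 1) is skipped.
cutoff <- 0.95  # arbitrary threshold for "approximately 1"
idx <- which(upper.tri(cm) & cm > cutoff, arr.ind = TRUE)

high_pairs <- data.frame(
  gene_a = rownames(cm)[idx[, "row"]],
  gene_b = colnames(cm)[idx[, "col"]],
  r      = cm[idx]
)
print(high_pairs)
```

Replacing cm with abs(cm) in the which() call would also flag pairs
that move in opposite directions.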

The problem that you have now is that you have so few observations
that it is conceivable that the correlation observed at this point is
spurious and specific to this dataset. In other words, it may not be  
observed (or not to the same extent) if you had a much larger dataset.  
Thus, I would be cautious about trying to explain any behavior at this  
point.
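
To get a feel for how easily this can happen, here is a small
simulation sketch. It generates 16 variables (matching the 7 + 9
genes discussed) that are independent by construction, and reports
the largest absolute pairwise correlation. The sample sizes of 10 and
200 are assumptions for illustration, since I do not know your actual
number of observations.

```r
# Largest absolute pairwise correlation among p independent variables.
max_abs_cor <- function(n, p) {
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  cm <- cor(x)
  max(abs(cm[upper.tri(cm)]))
}

set.seed(123)
p <- 16  # 7 collinear + 9 remaining genes, as discussed in the thread

max_small <- max_abs_cor(n = 10,  p = p)  # assumed small sample size
max_large <- max_abs_cor(n = 200, p = p)  # assumed larger sample size

cat("max |r| with n = 10: ", round(max_small, 2), "\n")
cat("max |r| with n = 200:", round(max_large, 2), "\n")
```

With so few rows, some pair among the 120 possible will typically
look strongly correlated purely by chance, whereas the larger sample
shows nothing of the sort.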

Remember that with only 6 events, you cannot really create a viable
LR (logistic regression) model with even a single gene using
traditional approaches. So simply eliminating the 7 collinear genes,
in theory leaving you with 9 genes as covariates in the model, is not
going to be very helpful. Your model estimates are not going to be
stable.
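
The arithmetic behind that can be sketched as follows. The range of
10 to 20 events per candidate covariate is a commonly quoted rule of
thumb that I am assuming here for illustration, not a hard limit.

```r
events     <- 6          # number of events, from the thread
candidates <- 9          # genes left after dropping the 7 collinear ones
epv_rule   <- c(10, 20)  # assumed rule-of-thumb range of events per covariate

# Events available per candidate covariate
epv <- events / candidates

# Number of covariates the rule of thumb would support with 6 events
supported <- events / epv_rule

print(round(epv, 2))
print(supported)
```

Both values in 'supported' come out below 1, which is the sense in
which even a single-gene model is not really viable here.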

As part of your model building and experimental design strategy, I
would refer you to Frank Harrell's book "Regression Modeling
Strategies". More information here:

   http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

Certainly, when you get substantially more data, there would be other  
data reduction techniques that could come into play here and Frank's  
book covers them.

HTH,

Marc



