On Jun 15, 2009, at 10:13 AM, Paul Christoph Schröder wrote:

> Thanks Marc, as you said, this is a priori design. I'm trying to get
> more events.
> But for now, I'll have to try to get rid of the collinear genes. Is
> there any other method than using cor? Any method to state which
> genes behave in the same manner?
> Paul

Gene analysis is not my area of expertise, so there may be other methods that make more sense here. I would defer to others with the appropriate domain expertise. The BioConductor list would also be a good resource for you. More information at:

http://www.bioconductor.org/

In terms of simply assessing correlation, there are some graphical alternatives to reviewing a numeric correlation matrix. One would be to use pairs() to create a graphical visualization of the correlation matrix. See ?pairs and:

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=137

Another would be to use the analogous lattice graphics function splom():

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=50

Yet another would be to use correlation matrix ellipses with the plotcorr() function in the 'ellipse' CRAN package:

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=149

Each of the above is going to get you to the same basic place that a standard correlation matrix will, which is to assess which genes seem to move in the same direction as others. That is, which gene pairs have a correlation of ~1.

The problem that you have now is that you have so few observations that the correlation observed at this point may well be spurious and specific to this dataset. In other words, it may not be observed (or not to the same extent) if you had a much larger dataset. Thus, I would be cautious about trying to explain any behavior at this point.

Remember that with only 6 events, you cannot really create a viable LR model with even a single gene, using traditional approaches. So simply eliminating the 7 collinear genes, in theory leaving you with 9 genes as covariates in the model, is not going to be very helpful. Your model estimates are not going to be stable.

As part of your model building and experimental design strategy, I would refer you to Frank Harrell's book "Regression Modeling Strategies".
More information here:

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

Certainly, when you get substantially more data, there would be other data reduction techniques that could come into play here and Frank's book covers them.

HTH,

Marc

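As a rough illustration of the correlation screening discussed above, here is a minimal sketch in R. The data are simulated, and the object names (expr, cor_mat, high_pairs), the dimensions, and the 0.9 cutoff are assumptions made up for the example, not anything taken from the actual gene data:

```r
# Hypothetical data: 20 samples x 16 "genes" (simulated, for illustration only)
set.seed(1)
n_samples <- 20
n_genes   <- 16
expr <- matrix(rnorm(n_samples * n_genes), nrow = n_samples,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))

# Make gene2 nearly collinear with gene1 so there is something to find
expr[, "gene2"] <- expr[, "gene1"] + rnorm(n_samples, sd = 0.05)

# Pairwise Pearson correlations between the columns (genes)
cor_mat <- cor(expr)

# Keep each pair once (upper triangle) and flag |r| above an arbitrary cutoff
cutoff <- 0.9
idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  gene_a = rownames(cor_mat)[idx[, "row"]],
  gene_b = colnames(cor_mat)[idx[, "col"]],
  r      = cor_mat[idx]
)
print(high_pairs)

# Graphical view of the same information for a handful of genes
pairs(expr[, 1:4])
```

The cutoff is a judgment call. With very few observations, pairs can cross any such threshold by chance alone, so a list like this is a screening aid rather than evidence that two genes truly behave alike.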
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.