On Wed, 14 May 2008, Jorge Ivan Velez wrote:

Dear useRs:
I'm not sure if this is the correct place to ask, but I'll try. I've been
reading about how to perform Principal Component Analysis (PCA) on
microarrays (see [1]) and there's something that I don't get. Basically,
it relates to performing PCA on data sets whose number of variables is
greater than the number of samples. For example, in the paper mentioned
above, the number of variables (genes) and samples (tumors) is 8538 and 104,
respectively. My understanding is that, in PCA, the number of samples (n)
must be greater than the number of variables (p) and that its goal is to seek
k components, such that k < p and the variance in this new data set is
maximized. Am I wrong?

Yes, in detail. One of the properties of PCA is to seek projections (unit-length linear combinations of the variables) of maximal variance, each being uncorrelated with earlier ones. That is well-defined for n < p. But you will only get at most n PCs of non-zero variance (and at most n-1 unless you centre externally), and the rest are pretty arbitrary basis vectors for the space of constant combinations.
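
As a minimal sketch of the point about the number of PCs with non-zero variance, assuming simulated Gaussian data rather than real microarray data (the sizes n = 20 and p = 200, the seed, and the 1e-8 tolerance are arbitrary choices for illustration):

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 20; p <- 200                  # fewer samples (rows) than variables (columns)
X <- matrix(rnorm(n * p), n, p)    # simulated data, not from the paper in [1]

fit <- prcomp(X)                   # prcomp() centres the columns by default

n_returned <- length(fit$sdev)     # number of components returned
n_nonzero  <- sum(fit$sdev > 1e-8) # components with non-negligible variance

n_returned
n_nonzero
```

With default centring, the centred data matrix has rank at most n - 1, so n_nonzero should come out one less than n here; the remaining component has a standard deviation that is zero up to rounding error.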

Could somebody please tell me how it is possible to perform PCA when the number of variables is greater than the number of samples, and how to do it in R? I'm really confused. In R I've tried "prcomp" and "princomp" but they didn't work.

See any good book on multivariate analysis, or your statistical consultant. (See the posting guide as to why this is not the list on which to ask that question.)

That you can do this does not make it sensible, but it can be interpretable if there is a strong signal associated with a handful of genes -- but then so can other methods.

And BTW, prcomp() *does* work, e.g.

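X <- matrix(rnorm(20*200), 20)  # 20 samples (rows) by 200 variables (columns)
fit <- prcomp(X)                # PCA via SVD of the centred data
str(fit)                        # inspect sdev, rotation, center, scale, x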

so the problem is what you did (and you didn't manage to tell us what that was -- see the footer of the message). ?princomp does tell you to use prcomp() in this case.
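
A hedged sketch of the difference between the two functions, assuming the data are laid out with samples in rows and variables in columns (the matrix below is simulated, and the names res and fit are only illustrative): princomp() is expected to stop with an error when there are fewer units than variables, whereas prcomp() runs.

```r
set.seed(1)                               # arbitrary seed, only for reproducibility
X <- matrix(rnorm(20 * 200), 20)          # 20 units (rows), 200 variables (columns)

# Capture the error from princomp() instead of letting it halt the script
res <- tryCatch(princomp(X), error = function(e) e)
inherits(res, "error")                    # did princomp() refuse the input?
if (inherits(res, "error")) conditionMessage(res)

fit <- prcomp(X)                          # the SVD-based routine handles n < p
dim(fit$rotation)                         # variables by components
```

If the original data had genes in rows and tumors in columns, it would need transposing with t() first so that the samples are the rows; that layout is an assumption here, since the original call was not posted.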

I'm using Win XP SP2, Intel Core- 2 Duo 2.4 GHz and R 2.7.0 Patched.


Thanks in advance,


Jorge Ivan Velez



[1] Ringnér, M. What is principal components analysis? Nature Biotechnology
26, 303-304 (2008),
http://www.nature.com/nbt/journal/v26/n3/full/nbt0308-303.html

Hmm, that's not a free resource.





--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
