-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Charles C. Berry wrote: > On Mon, 15 Dec 2008, Nathan S. Watson-Haigh wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Nathan S. Watson-Haigh wrote: >>> I'm trying to calculate Pearson correlation coefficients for a large >>> matrix of size 18563 x 18563. The following function takes about XX >>> minutes to complete, and I'd like to do this calculation about 15 times >>> and so speed is some what of an issue. > > I think you are on the wrong track, Nathan. > > The matrix you are starting with is 18563 x 18563 and the result of > finding the correlations amongst the columns of that matrix is also 18563 > x 18563. It will require more than 5 Gigabytes of memory to store the > result and the original matrix.
Yes the memory usage is somewhat large - luckily I have the use of a cluster with lots of shared memory! However, I'm interested to learn how you came about the calculation to determine the memory requirements. > > Likely the time needed to do the calc is inflated because of caching > issues and if your machine has less than enough memory to store the > result and all the intermediate pieces by swapping as well. > > You can finesse these by breaking your problem into smaller pieces, say > computing the correlations between each pair of 19 blocks of columns > (columns 1:977, 977+1:977, ... 18*977+1:977 ), then assembling the > results. This is possibly, however why is something like this not implemented internally in the cor() function if it poorly scales due to the large memory requirements? > > --- > > BTW, R already has the necessary machinery to calculate the crossproduct > matrix (etc) needed to find the correlations. You can access the low level > linear algebra that R uses. You can marry R to an optimized BLAS if you > like. > > So pulling in some other code to do this will not save you anything. If > you ever do decide to import C[++] code there is excellent documentation > in the Writing R Extensions manual, which you should review before > attempting to import C++ code into R. Thanks, I have seen this and it seemed quite technical to use as a starting point for someone unfamiliar with both C++ and incorporating C++ code into R. Cheers, Nathan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAklHESYACgkQ9gTv6QYzVL68aQCgl0TsZL4CcnWFdlP073d7Vvui 5WAAoIcvGcunYzR+DM0Xv6R1TPmH4oA+ =5As1 -----END PGP SIGNATURE----- ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.