Hi, I have a 900,000,000 x 9,000 matrix and need to calculate the correlation between every pair of columns along the smaller dimension, producing a 9,000 x 9,000 correlation matrix. The matrix is too big to load into R, so it is stored as a binary file which I access through mmap and a few API functions (to get all values in one row, one column, or one particular value). I'm looking for advice on how to calculate the correlation matrix. Right now my approach is something similar to this (toy code):
  corr.matrix <- matrix(NA_real_, ncol = 9000, nrow = 9000)  # numeric storage, not the string 'numeric'
  for (i in 1:8999) {              # stop at 8999 so (i+1):9000 never runs backwards
    for (j in (i+1):9000) {
      # i1 = ... getting the index of item (i) in a second file
      # i2 = ... getting the index of item (j)
      g1 <- api$getCol(i1)
      g2 <- api$getCol(i2)
      corr.matrix[i, j] <- cor(g1, g2)
    }
  }

This will work, but it will take forever, since every column gets fetched from disk roughly 9,000 times. Any advice on how this could be done more efficiently? I'm running on a 2.6.18 Linux system, with R version 2.11.1. Thanks!
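P.S. One single-pass direction I've been toying with (untested, and assuming a hypothetical api$getRow(r) that returns row r as a numeric vector of length 9,000) is to accumulate column sums and cross-products over chunks of rows, then form the correlation matrix at the end. Each value would then be read from disk only once, and the pairwise work happens in one crossprod() call per chunk instead of ~40 million cor() calls:

  p <- 9000                       # number of columns
  n <- 9e8                        # number of rows
  S  <- numeric(p)                # running column sums
  CP <- matrix(0, p, p)           # running cross-product matrix X'X
  chunk <- 1000                   # rows per chunk; tune to available RAM
  for (start in seq(1, n, by = chunk)) {
    rows <- start:min(start + chunk - 1, n)
    X  <- t(sapply(rows, api$getRow))      # chunk x 9000 block of the data
    S  <- S + colSums(X)
    CP <- CP + crossprod(X)                # add this chunk's X'X (one BLAS call)
  }
  M <- S / n                                    # column means
  C <- (CP - n * tcrossprod(M)) / (n - 1)       # covariance: (X'X - n M M') / (n - 1)
  corr.matrix <- C / tcrossprod(sqrt(diag(C)))  # divide entry (i,j) by sd_i * sd_j

CP is only a 9,000 x 9,000 double matrix (about 650 MB), so it fits in memory even though the data don't. (I realise a one-pass sum over 9e8 rows can lose precision; a Welford-style update would be safer.) Would something along these lines be sensible, or is there a better-established approach?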