First, we strongly recommend 64-bit R. Otherwise, you may not be able to scale up as far as you would like.
Second, as I think you realize, with big objects you may have to do things in chunks. I generally recommend working a column at a time rather than in blocks of rows if possible (column access gives better performance, particularly if the filebacking is used because the matrices exceed RAM), and you may find that an alternative data organization can really pay off. Keep an open mind. (A sketch appears at the end of this message.)

Third, you really need to avoid this runif(1, ...) usage; it can't possibly be efficient. If a single call to runif() doesn't work, break the work into chunks, certainly, but going down to chunks of size 1 just can't make sense. (The sketch at the end of this message illustrates this point, too.)

Fourth, although you aren't there yet, once you get to the point of doing things in parallel with foreach and bigmemory, you *may* need to do the following, attaching the matrix inside your foreach loop, to make use of the shared memory properly:

    mdesc <- describe(m)
    foreach(...) %dopar% {
        require(bigmemory)
        m <- attach.big.matrix(mdesc)
        # .... now operate on m
    }

I say *may* because the backend doMC (not available on Windows) does not require this, but the other backends do; otherwise, the workers will not be able to properly address the shared-memory or filebacked big.matrix. (A fuller sketch appears at the end of this message.)

Some documentation on bigmemory.org may help, and feel free to email us directly.

Jay

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics, Yale University
http://www.stat.yale.edu/~jay
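For the second and third points, a minimal sketch of a chunked, column-at-a-time fill; the dimensions, file names, and the filebacked setup here are made up for illustration:

    library(bigmemory)

    ## A filebacked big.matrix, so the data can exceed available RAM.
    ## (Sizes and file names are hypothetical.)
    nr <- 1e6
    nc <- 100
    x <- filebacked.big.matrix(nr, nc, type = "double",
                               backingfile = "x.bin",
                               descriptorfile = "x.desc")

    ## One runif(nr) call per column, rather than nr * nc calls
    ## to runif(1), filling the matrix a column at a time.
    for (j in seq_len(nc)) {
      x[, j] <- runif(nr)
    }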
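For the fourth point, a minimal sketch using one such backend, doParallel, which (unlike doMC) needs the describe()/attach.big.matrix() step; the dimensions and the column means are stand-ins for real work:

    library(bigmemory)
    library(foreach)
    library(doParallel)

    cl <- makeCluster(2)
    registerDoParallel(cl)

    ## A small shared-memory big.matrix, filled column by column.
    m <- big.matrix(1000, 10, type = "double")
    for (j in 1:10) m[, j] <- rnorm(1000)
    mdesc <- describe(m)              # a descriptor the workers can use

    res <- foreach(j = 1:10, .combine = c) %dopar% {
      require(bigmemory)
      m <- attach.big.matrix(mdesc)   # re-attach the shared memory in each worker
      mean(m[, j])
    }

    stopCluster(cl)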