In C++, Armadillo can be given a pointer to existing memory to use as the backing store of its objects, so it can work on top of memory-mapped data. On the R side, the bigmemory package provides R-level access to, and initialization of, memory-mapped arrays; see https://www.r-bloggers.com/using-rcpparmadillo-with-bigmemory/. This doesn't provide interchange of the backing store across languages or platforms, but it would be an easy-ish solution.
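To make that concrete, here is a minimal sketch along the lines of the blog post above, assuming a double-typed big.matrix and bigmemory's C++ headers; the function name big_colsums is made up for illustration. It wraps the (possibly file-backed) big.matrix memory in an arma::mat without copying, so any Armadillo operation then runs directly on the memory-mapped storage.

// [[Rcpp::depends(RcppArmadillo, BH, bigmemory)]]
#include <RcppArmadillo.h>
#include <bigmemory/BigMatrix.h>

// Sketch: column sums of a double big.matrix, computed in place.
// [[Rcpp::export]]
Rcpp::NumericVector big_colsums(SEXP bigmat_address) {
    Rcpp::XPtr<BigMatrix> xp(bigmat_address);   // pass bigmat@address from R
    if (xp->matrix_type() != 8)
        Rcpp::stop("this sketch only handles double big.matrix objects");
    // copy_aux_mem = false: arma::mat becomes a view onto bigmemory's storage
    arma::mat M((double*) xp->matrix(), xp->nrow(), xp->ncol(), false);
    arma::rowvec s = arma::sum(M, 0);
    return Rcpp::NumericVector(s.begin(), s.end());
}

From R, after Rcpp::sourceCpp(), something like big_colsums(x@address) on a filebacked.big.matrix x would then operate directly on the memory-mapped file.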
On Mar 3, 2017, at 10:23 AM, bioc-devel-requ...@r-project.org wrote:

Some comments on Aaron's stuff. One possibility for doing things like this is if your code can be written in C++ to work on a subset of rows or columns; that can sometimes give the necessary speed-up. What I mean is this: say you can safely process 1000 cells (not matrix cells, but biological cells, aka columns) at a time in RAM. Then iterate in R:

  - get chunk i containing 1000 cells from the backend data storage
  - do something on this sub-matrix, where everything is in a normal matrix and you just use C++
  - write the results out to whatever backend you're using

With a million cells you iterate over 1000 chunks in R, and you never need to "touch" the full dataset, which can be stored on an arbitrary backend. This approach could potentially even be run with different chunks on different nodes. A sketch of the per-chunk C++ step is below.
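In this scheme the C++ side only ever sees an ordinary dense matrix. A minimal sketch, with process_chunk as a hypothetical name and per-cell column totals standing in for the real computation:

#include <Rcpp.h>
using namespace Rcpp;

// Hypothetical per-chunk worker: gets one chunk (genes in rows, ~1000
// cells in columns) already extracted into a normal in-memory matrix,
// and returns one result per cell.
// [[Rcpp::export]]
NumericVector process_chunk(NumericMatrix chunk) {
    int nr = chunk.nrow(), nc = chunk.ncol();
    NumericVector totals(nc);
    for (int j = 0; j < nc; ++j) {
        double s = 0.0;
        for (int i = 0; i < nr; ++i) s += chunk(i, j);   // plain dense access
        totals[j] = s;
    }
    return totals;
}

The R driver then just loops: pull columns 1:1000, 1001:2000, ... from whatever backend you use, call process_chunk() on each sub-matrix, and write the per-chunk results back out, so the full dataset is never loaded at once and chunks could even be handled on different nodes.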