In C++, Armadillo can be passed a pointer to the memory used as the backing
store for its objects, so it can work with memory-mapped data.  On the R
side, the bigmemory package provides R access to, and initialization of,
memory-mapped arrays.  See
https://www.r-bloggers.com/using-rcpparmadillo-with-bigmemory/.  This
doesn't make the backing store interchangeable across languages or
platforms, but it would be an easy-ish solution.
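
For concreteness, a minimal sketch along the lines of that blog post,
assuming a file-backed big.matrix of doubles is created with bigmemory on
the R side and its external pointer (x@address) is passed down; the
function name and the column-sum computation are purely illustrative:

// [[Rcpp::depends(RcppArmadillo, BH, bigmemory)]]
#include <RcppArmadillo.h>
#include <bigmemory/BigMatrix.h>

// Wrap the memory-mapped big.matrix in an arma::mat without copying the
// underlying buffer (copy_aux_mem = false), then compute something on it.
// [[Rcpp::export]]
double firstColSum(SEXP pBigMat) {
  Rcpp::XPtr<BigMatrix> xp(pBigMat);
  arma::mat M(reinterpret_cast<double *>(xp->matrix()),
              xp->nrow(), xp->ncol(),
              /* copy_aux_mem = */ false);
  return arma::accu(M.col(0));
}

Compiled with Rcpp::sourceCpp(), this would be called from R as
firstColSum(x@address), where x is the (possibly file-backed) big.matrix.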

On Mar 3, 2017, at 10:23 AM, bioc-devel-requ...@r-project.org wrote:

Some comments on Aaron's approach:

One possibility for doing things like this is when your C++ code can
operate on a subset of rows or columns at a time.  That can sometimes give
the necessary speed-up.  What I mean is this:

Say you can safely process 1000 cells (not matrix cells, but biological
cells, aka columns) at a time in RAM

iterate in R:
  get chunk i containing 1000 cells from the backend data storage
  do something on this sub-matrix, where everything is in a normal matrix
    and you just use C++
  write the results out to whatever backend you're using

Then, with a million cells you iterate over 1000 chunks in R, and you never
need to "touch" the full dataset, which can be stored on an arbitrary
backend.  This approach could even (potentially) be run with different
chunks on different nodes; a minimal sketch of such a per-chunk worker
follows.
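
To make the "you just use C++" step concrete, here is a hypothetical
per-chunk worker: it only ever sees an ordinary dense matrix holding one
chunk of (say) 1000 cells, never the full backend-stored dataset.  The
function name and the row-sum computation are just placeholders for
whatever the real analysis does.

#include <Rcpp.h>

// Process one chunk: an ordinary in-memory matrix of genes x cells.
// [[Rcpp::export]]
Rcpp::NumericVector chunkRowSums(Rcpp::NumericMatrix chunk) {
  Rcpp::NumericVector out(chunk.nrow());  // zero-initialized
  for (int j = 0; j < chunk.ncol(); ++j) {
    for (int i = 0; i < chunk.nrow(); ++i) {
      out[i] += chunk(i, j);
    }
  }
  return out;
}

The R loop then just pulls chunk i from the backend into a normal matrix,
calls the worker, and combines or writes out the per-chunk results.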

