Hi Francesco,

this is certainly achievable with the HDF5 support currently available in R/Bioconductor. For example, the rhdf5 package gives you access to this functionality (https://bioconductor.org/packages/release/bioc/html/rhdf5.html).
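As a rough sketch of what writing a matrix one column at a time and later reading back a subset could look like with rhdf5: the file name, the dataset name "mat", the dimensions, the chunk layout and the random data below are all assumptions made up for illustration, not something taken from your setup.

```r
library(rhdf5)

# Hypothetical file, dataset name and dimensions, chosen only for illustration
h5file <- tempfile(fileext = ".h5")
n_rows <- 1000L
n_cols <- 50L

h5createFile(h5file)

# Allocate the full matrix on disk; one chunk per column (an assumption),
# so that each column write touches exactly one chunk
h5createDataset(h5file, "mat", dims = c(n_rows, n_cols),
                storage.mode = "double", chunk = c(n_rows, 1L))

# Build the matrix one column at a time, writing each column straight to disk
for (j in seq_len(n_cols)) {
  column <- rnorm(n_rows)  # stand-in for the real computation
  h5write(matrix(column, ncol = 1), h5file, "mat", index = list(NULL, j))
}

# Later: load only a small subset of columns into memory and rank it
cols <- c(2L, 5L, 10L)
sub <- h5read(h5file, "mat", index = list(NULL, cols))
ranks <- apply(sub, 2, rank)  # rank the values within each selected column

h5closeAll()
```

Rows can be subset the same way via index = list(rows, NULL). If you mostly read rows afterwards, a chunk shape other than one-column-per-chunk is probably the better choice.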
rhdf5 is relatively 'low-level', in the sense that it stays very close to the HDF5 library it exposes to R (i.e. you get h5read and h5write functions). For what you are describing I typically use a small wrapper to make my life a bit easier. I have something like that on GitHub here: https://github.com/PaulPyl/h5array

Please note that this is not an official Bioconductor package, so it doesn't fulfill the strict standards of documentation etc. However, since it is just a small wrapper that gives you an array-like object which reads and writes its data from disk, it should be fairly straightforward to use.

Best,
Paul

On Thu, Dec 21, 2017 at 12:22, Francesco Napolitano wrote:

Hi,

I need to deal with very large matrices and I was thinking of using HDF5-based data models. However, from the documentation and examples that I have been looking at, I'm not quite sure how to do this.

My use case is as follows. I want to build a very large matrix one column at a time, and I need to write columns directly to disk since I would otherwise run out of memory. I need a format that, afterwards, will allow me to extract subsets of rows or columns and rank them. The subsets will be small enough to be loaded in memory. Can I achieve this with current HDF5 support in R?

Any help greatly appreciated.

Thank you,
Francesco

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel