That seems to solve my problem. I will try it this way; thank you very much.

Francesco
On Thu, Dec 21, 2017 at 1:16 PM, Martin Morgan <martin.mor...@roswellpark.org> wrote:
> On 12/21/2017 06:22 AM, Francesco Napolitano wrote:
>>
>> Hi,
>>
>> I need to deal with very large matrices and I was thinking of using
>> HDF5-based data models. However, from the documentation and examples
>> that I have been looking at, I'm not quite sure how to do this.
>>
>> My use case is as follows.
>> I want to build a very large matrix one column at a time, and I need
>> to write columns directly to disk since I would otherwise run out of
>> memory. I need a format that, afterwards, will allow me to extract
>> subsets of rows or columns and rank them. The subsets will be small
>> enough to be loaded in memory. Can I achieve this with current HDF5
>> support in R?
>
>
> This is basically straightforward in rhdf5. The idea is to create a dataset
> large enough to hold all of your data
>
>     library(rhdf5)
>     fl <- tempfile()
>     h5createFile(fl)
>
>     nrow <- 10000
>     ncol <- 100
>     h5createDataset(fl, "big", c(nrow, ncol), showWarnings = FALSE)
>
> and then to fill it in chunks, specifying the start row / column you'd like
> to write to and the 'count' of the number of data points you'd like to
> write in each direction
>
>     chunk_ncol <- ncol / 10
>     j <- 1 # which column to start writing?
>
>     while (j < ncol) {
>         m <- matrix(seq(1, length.out = nrow * chunk_ncol), nrow)
>         h5write(m, fl, "big", start = c(1, j), count = c(nrow, chunk_ncol))
>         j <- j + chunk_ncol
>     }
>
> You can then read arbitrary 'slabs'
>
>     h5read(fl, "big", start = c(1, 1), count = c(5, 5))
>     h5read(fl, "big", start = c(1, 9), count = c(5, 2))
>
> You probably don't want to write 1 column at a time, but as many columns as
> comfortably fit into memory. This minimizes the number of R function calls
> needed to write / read the data.
>
> The HDF5Array package provides an easy abstraction for reading (probably
> writing is possible too, but it might be easier to understand the building
> blocks first).
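The loop in Martin's reply assumes `ncol` is an exact multiple of the block width; with, say, 95 columns and blocks of 10, the last `h5write()` would try to write past the end of the dataset. Below is a minimal sketch of the same rhdf5 pattern that also handles a narrower final block. `write_in_column_blocks()` and its `make_block()` argument are hypothetical names for illustration, not rhdf5 functions, and choosing `chunk` so that one HDF5 chunk matches one block of columns is an assumption about a sensible on-disk layout, not something stated in the thread.

```r
library(rhdf5)

## Hypothetical helper, not part of rhdf5: create an n_rows x n_cols double
## dataset and fill it block by block. make_block(j, k) must return the
## n_rows x k matrix holding columns j, ..., j + k - 1.
write_in_column_blocks <- function(file, name, n_rows, n_cols, block_ncol,
                                   make_block) {
    h5createFile(file)
    ## Assumed layout: chunk along columns so each block write covers whole chunks
    h5createDataset(file, name, c(n_rows, n_cols),
                    storage.mode = "double",
                    chunk = c(n_rows, min(block_ncol, n_cols)))
    j <- 1
    while (j <= n_cols) {
        ## The last block is narrower when n_cols %% block_ncol != 0
        k <- min(block_ncol, n_cols - j + 1)
        m <- make_block(j, k)
        stopifnot(is.matrix(m), all(dim(m) == c(n_rows, k)))
        h5write(m, file, name, start = c(1, j), count = c(n_rows, k))
        j <- j + k
    }
    invisible(file)
}

## Usage: 95 columns written 10 at a time (the last block has 5 columns);
## column j holds the values (j - 1) * 1000 + 1, ..., j * 1000
fl <- tempfile(fileext = ".h5")
write_in_column_blocks(fl, "big", n_rows = 1000, n_cols = 95, block_ncol = 10,
    make_block = function(j, k)
        matrix(as.numeric(seq((j - 1) * 1000 + 1, length.out = 1000 * k)),
               nrow = 1000))

## Read back a small slab that falls inside the last (partial) block
h5read(fl, "big", start = c(1, 94), count = c(2, 2))
```

Only one block of `n_rows x block_ncol` values is held in memory at a time, so `block_ncol` is the knob Martin describes: as many columns as comfortably fit in memory.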
>
>> library(HDF5Array)
>> hdf <- HDF5Array(fl, "big")
>> hdf
> HDF5Matrix object of 10000 x 100 doubles:
>           [,1]  [,2]  [,3] ... [,99] [,100]
>     [1,]     1 10001 20001   . 80001  90001
>     [2,]     2 10002 20002   . 80002  90002
>     [3,]     3 10003 20003   . 80003  90003
>     [4,]     4 10004 20004   . 80004  90004
>     [5,]     5 10005 20005   . 80005  90005
>      ...     .     .     .   .     .      .
>  [9996,]  9996 19996 29996   . 89996  99996
>  [9997,]  9997 19997 29997   . 89997  99997
>  [9998,]  9998 19998 29998   . 89998  99998
>  [9999,]  9999 19999 29999   . 89999  99999
> [10000,] 10000 20000 30000   . 90000 100000
>> hdf[1:5, 1:5]
> DelayedMatrix object of 5 x 5 doubles:
>      [,1]  [,2]  [,3]  [,4]  [,5]
> [1,]    1 10001 20001 30001 40001
> [2,]    2 10002 20002 30002 40002
> [3,]    3 10003 20003 30003 40003
> [4,]    4 10004 20004 30004 40004
> [5,]    5 10005 20005 30005 40005
>> as.matrix(hdf[1:5, 1:5])
>      [,1]  [,2]  [,3]  [,4]  [,5]
> [1,]    1 10001 20001 30001 40001
> [2,]    2 10002 20002 30002 40002
> [3,]    3 10003 20003 30003 40003
> [4,]    4 10004 20004 30004 40004
> [5,]    5 10005 20005 30005 40005
>> rowSums(hdf[1:5, 1:5])
> [1] 100005 100010 100015 100020 100025
>
> Martin
>
>>
>> Any help greatly appreciated.
>>
>> Thank you,
>> Francesco
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
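For the original goal of pulling out small subsets of rows or columns and ranking them, here is a sketch in the same spirit as Martin's examples. It assumes "rank" means ranking the values within each extracted column (or row) with base R's `rank()`; the dataset name `"big"`, its random contents, its size, and the selected indices are illustrative only. Columns are read once through rhdf5's `index` argument (`NULL` meaning "all" along that dimension) and once through an `HDF5Array` subset, which should give the same values.

```r
library(rhdf5)
library(HDF5Array)

## Small self-contained dataset for illustration: 1000 x 20 random doubles,
## stored with one HDF5 chunk per column (an assumed layout)
set.seed(1)
fl <- tempfile(fileext = ".h5")
h5createFile(fl)
h5createDataset(fl, "big", c(1000, 20), storage.mode = "double",
                chunk = c(1000, 1))
h5write(matrix(rnorm(1000 * 20), nrow = 1000), fl, "big",
        start = c(1, 1), count = c(1000, 20))

## rhdf5: read only columns 3, 7 and 12 (NULL = all rows) ...
cols <- h5read(fl, "big", index = list(NULL, c(3, 7, 12)))
## ... and rank the values within each column (1 = smallest)
col_ranks <- apply(cols, 2, rank)

## HDF5Array: the same column selection through the DelayedArray interface;
## as.matrix() reads just the selected columns into memory
hdf <- HDF5Array(fl, "big")
cols2 <- as.matrix(hdf[, c(3, 7, 12)])

## A few rows, ranked within each row (result is 3 x 20)
row_ranks <- t(apply(as.matrix(hdf[c(1, 2, 3), ]), 1, rank))
```

With one chunk per column, a column selection only touches the chunks it needs, whereas any row selection has to read every chunk; if row access dominates in practice, a different `chunk` shape may be worth testing.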