Re: [Rcpp-devel] efficient ingestion of "sparse csv"

2021-05-26 Thread Serguei Sokol
On 26/05/2021 at 16:36, Vincent Carey wrote: On this theme, the following proved sufficient to ingest and convert sparse csv without column headers or row names: Nice of you to share your final solution, which could be shortened further to something like: #include "RcppArmadillo.h" using namespace Rcpp; //

Re: [Rcpp-devel] efficient ingestion of "sparse csv"

2021-05-26 Thread Vincent Carey
On this theme, the following proved sufficient to ingest and convert sparse csv without column headers or row names: #include "RcppArmadillo.h" using namespace Rcpp; // [[Rcpp::depends(RcppArmadillo)]] // [[Rcpp::export]] List parse_sparse_csv_impl(SEXP fname) { using namespace Rcpp; std::strin
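Vincent's code is truncated in this listing, so it is not reconstructed here. As a hedged illustration of one way to keep per-line parsing cheap, the sketch below walks a single record with std::strtod instead of building a std::string per field, and keeps only the nonzero entries. The name scan_nonzeros is hypothetical, and treating empty fields as zero is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical helper: scan ONE comma-separated record (one record per
// buffer) and return its nonzero entries as (0-based column, value) pairs.
// Walking the buffer with std::strtod avoids allocating a string per field.
// Assumption: empty or non-numeric fields count as zero.
std::vector<std::pair<int, double>> scan_nonzeros(const char* line) {
    assert(line != nullptr);
    std::vector<std::pair<int, double>> out;
    const char* p = line;
    int col = 0;
    while (*p != '\0' && *p != '\n' && *p != '\r') {
        char* end = nullptr;
        const double v = std::strtod(p, &end);  // end == p if nothing parsed
        if (end != p && v != 0.0) out.emplace_back(col, v);
        // Skip any leftover characters up to the next separator or record end.
        p = end;
        while (*p != ',' && *p != '\0' && *p != '\n' && *p != '\r') ++p;
        if (*p == ',') ++p;  // consume the separator
        ++col;
    }
    return out;
}
```

For example, scan_nonzeros("0,1.5,0,0,-2\n") yields the pairs (1, 1.5) and (4, -2).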

Re: [Rcpp-devel] efficient ingestion of "sparse csv" (Vincent Carey)

2021-05-11 Thread Peter Hickey
Hi Vince, Aaron Lun (CC-ed) has written scuttle::readSparseCounts() for this purpose. You may have already seen it since it's in BioC. It's written in R rather than C++, but I imagine it's pretty efficient since, well, it's written by Aaron (https://github.com/LTLA/scuttle/blob/master/R/readSparseC

Re: [Rcpp-devel] efficient ingestion of "sparse csv"

2021-05-10 Thread Vincent Carey
Thanks, Dirk; lots of useful information there. I wonder whether the sparse ingestion problem would best be solved with multiple passes: it seems one would want to learn the dimensions and the number of nonzero elements per row to allocate the index vectors, and then populate them and the data ve
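The two-pass idea can be sketched in plain C++ (no Rcpp) under stated assumptions: pass 1 counts the nonzeros in each row and the widest row, a running sum turns the counts into CSR (compressed sparse row) row pointers, the index and value vectors are allocated exactly once, and pass 2 fills them. For brevity the records are held in a std::vector<std::string>; with a real file each pass would re-read the stream. The names Csr, for_each_field, and build_csr_two_pass are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Minimal CSR container (illustrative names).
struct Csr {
    std::size_t nrow = 0, ncol = 0;
    std::vector<std::size_t> row_ptr;  // length nrow + 1
    std::vector<std::size_t> col_idx;  // length nnz
    std::vector<double> values;        // length nnz
};

// Call f(col, value) for every field of one comma-separated record.
// Empty fields parse as 0 (assumption).
template <typename F>
void for_each_field(const std::string& line, F f) {
    std::size_t col = 0, start = 0;
    while (true) {
        const std::size_t comma = line.find(',', start);
        const std::string field = line.substr(
            start, comma == std::string::npos ? std::string::npos : comma - start);
        f(col, std::strtod(field.c_str(), nullptr));
        ++col;
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
}

Csr build_csr_two_pass(const std::vector<std::string>& lines) {
    Csr m;
    m.nrow = lines.size();
    m.row_ptr.assign(m.nrow + 1, 0);
    // Pass 1: learn the dimensions and the nonzero count of each row.
    for (std::size_t r = 0; r < m.nrow; ++r) {
        std::size_t nz = 0, ncol = 0;
        for_each_field(lines[r], [&](std::size_t c, double v) {
            ncol = c + 1;
            if (v != 0.0) ++nz;
        });
        if (ncol > m.ncol) m.ncol = ncol;
        m.row_ptr[r + 1] = m.row_ptr[r] + nz;  // running prefix sum
    }
    // Allocate the index and value vectors once, at their exact size.
    const std::size_t nnz = m.row_ptr[m.nrow];
    m.col_idx.resize(nnz);
    m.values.resize(nnz);
    // Pass 2: populate them in row order.
    for (std::size_t r = 0; r < m.nrow; ++r) {
        std::size_t k = m.row_ptr[r];
        for_each_field(lines[r], [&](std::size_t c, double v) {
            if (v != 0.0) { m.col_idx[k] = c; m.values[k] = v; ++k; }
        });
        assert(k == m.row_ptr[r + 1]);  // pass 2 agrees with pass 1
    }
    return m;
}
```

The trade-off is reading the input twice in exchange for a single, exact allocation of the index and value vectors.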

Re: [Rcpp-devel] efficient ingestion of "sparse csv"

2021-05-10 Thread Dirk Eddelbuettel
Vincent, In the broad terms of the question, the best answer may be a simple "sure". More seriously, there have been many approaches. Consider for example the recent Rcpp Gallery post led by Zach (with some edits by me): https://gallery.rcpp.org/articles/sparse-matrix-class/ Its focus on not
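The gallery post's own class is not reproduced here. As a related sketch, assuming triplets collected row by row with no duplicate positions, the counting-sort conversion below produces the compressed sparse column layout used by the Matrix package's dgCMatrix (0-based row indices i, column pointers p of length ncol + 1, values x). The struct and function names are hypothetical; handing the vectors back to R through Rcpp is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSC layout mirroring dgCMatrix slots: i (0-based rows), p, x.
struct Csc {
    std::size_t nrow = 0, ncol = 0;
    std::vector<int> i;
    std::vector<int> p;
    std::vector<double> x;
};

// Convert triplets (row[k], col[k], val[k]) to CSC via a counting sort on the
// column index. Scattering is stable, so if triplets arrive in row order (as
// when a file is read one record at a time) rows are sorted within each column.
Csc coo_to_csc(std::size_t nrow, std::size_t ncol,
               const std::vector<int>& row, const std::vector<int>& col,
               const std::vector<double>& val) {
    assert(row.size() == col.size() && col.size() == val.size());
    Csc m;
    m.nrow = nrow;
    m.ncol = ncol;
    m.p.assign(ncol + 1, 0);
    for (int c : col) {
        assert(c >= 0 && static_cast<std::size_t>(c) < ncol);
        ++m.p[c + 1];  // count entries per column
    }
    for (std::size_t c = 0; c < ncol; ++c) m.p[c + 1] += m.p[c];  // prefix sum
    m.i.resize(val.size());
    m.x.resize(val.size());
    std::vector<int> next(m.p.begin(), m.p.end() - 1);  // next free slot per column
    for (std::size_t k = 0; k < val.size(); ++k) {
        const int dst = next[col[k]]++;
        m.i[dst] = row[k];
        m.x[dst] = val[k];
    }
    return m;
}
```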

[Rcpp-devel] efficient ingestion of "sparse csv"

2021-05-10 Thread Vincent Carey
This problem has been discussed in various places, but I don't see a clear solution. Certain applications are generating large comma-delimited files with mostly zero entries. The aim is to ingest efficiently, converting to a sparse representation one record at a time. Presumably a triplet format woul
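A minimal sketch of the record-at-a-time triplet approach, in standard C++ without Rcpp, assuming a headerless, rowname-less comma-delimited stream of well-formed numbers (std::stod throws on anything else). Only the current line is held as text; nonzero fields are appended to row, column, and value vectors, and the dimensions are learned along the way. The names Triplets and read_sparse_csv are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Triplet ("coordinate") storage grown one record at a time.
struct Triplets {
    std::vector<int> row, col;  // 0-based indices
    std::vector<double> val;    // nonzero values only
    int nrow = 0, ncol = 0;     // dimensions learned while reading
};

Triplets read_sparse_csv(std::istream& in) {
    Triplets t;
    std::string line, field;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // assumption: blank lines are skipped
        std::istringstream rec(line);
        int c = 0;
        while (std::getline(rec, field, ',')) {
            const double v = field.empty() ? 0.0 : std::stod(field);
            if (v != 0.0) {  // keep nonzero entries only
                t.row.push_back(t.nrow);
                t.col.push_back(c);
                t.val.push_back(v);
            }
            ++c;
        }
        if (c > t.ncol) t.ncol = c;
        ++t.nrow;
    }
    assert(t.row.size() == t.val.size());
    return t;
}
```

On the R side, such triplets could be passed to Matrix::sparseMatrix(), either after adding 1 to the indices or with index1 = FALSE.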