Hi, I need help integrating Arrow C++ into an open-source storage project. I have already built the C++ library and can call its API.
I have C++ code that reads data in chunks and then uses an erasure code to rebuild the original data. The rebuild is done chunk by chunk: at each iteration I can access a buffer of rebuilt data. My need is to pass this data as a stream into an Arrow processing step and then send the processed stream onward.

For example, suppose my original file is a CSV and I would like to filter it and save only the first column:

col1,col2,col3,col4
a1,b1,c1,d1
...
an,bn,cn,dn

The file is split into 6 chunks of equal size, so a chunk boundary can fall in the middle of a record:

chunk1:
a1,b1,c1,d1
...
ak,bk

chunk2:
ck,dk
...
am,bm,cm,dm

and so on. My question is: which StreamReader in Arrow is the right one to use here, and how does it deal with the incomplete records (lines) at the beginning and end of each chunk?

Here is a snippet of the code I use:

buffer_type_t res = fut.get0();
BOOST_LOG_TRIVIAL(trace) << "RawxBackendReader: Got result with buffer size: "
                         << res.size();

// Wrap the rebuilt chunk in an Arrow input stream.
std::shared_ptr<arrow::io::InputStream> input =
    std::make_shared<arrow::io::BufferReader>(
        reinterpret_cast<const uint8_t*>(res.get()), res.size());
BOOST_LOG_TRIVIAL(trace) << "input stream: " << input.get();

ArrowFilter arrow_filter(input);
arrow_filter.ToCsv();
result.push_back(std::move(res));

Thank you