Antoine,

That's a good question. I think there's a critical part that I haven't articulated well in the doc yet.

When converting from Arrow's columnar format to rows, you have three options:

 (1) Go through the record batch row by row.
 (2) Iterate through each column of the record batch, adding each column's
     value to the corresponding row.
 (3) Iterate through smaller sub-batches of the record batch, and do (2) on
     each sub-batch.

The converter would do (3). In the cases I've heard of, (3) seems to be the most performant, though I would welcome others' opinions on that. I imagine there are some "memory locality" benefits, though I am no expert on that.

This is most apparent when you look at the following two methods:

    template <typename T>
    class ToRowConverter {
     public:
      virtual ~ToRowConverter() = default;

      // Implemented by the subclass.
      virtual arrow::Result<std::vector<T>> Convert(
          std::shared_ptr<arrow::RecordBatch> batch) = 0;

      /// Derived from Convert(): converts `batch` in slices of at most
      /// `batch_size` rows.
      arrow::Result<std::vector<T>> RecordBatchToRows(
          std::shared_ptr<arrow::RecordBatch> batch, size_t batch_size);
    };

The idea here is that RecordBatchToRows() will convert in smaller slices dictated by batch_size. A record batch with 2 million rows might be converted 10,000 rows at a time.

I'm going to update the doc to make that clearer, but does what I described above seem sensible?

Best,

Will Jones

On Thu, Mar 24, 2022 at 9:47 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hello Will,
>
> So the added value would simply be the automatic definition of
> iterator-returning methods? Or am I missing something?
>
> Regards
>
> Antoine.
>
>
> Le 23/03/2022 à 19:36, Will Jones a écrit :
> > Hello Arrow devs,
> >
> > I recently created ARROW-16006 [1] ("Helpers for converting between rows
> > and Arrow objects"), and would appreciate feedback. It's meant for
> > conversion from arbitrary schemas, whereas the existing C++ examples
> > demonstrate fixed schemas (that is, known at compile-time).
> >
> > If you have implemented conversion between Arrow and a row-based data
> > structures in C++ (or tried to): Would these helpers work for your use
> > case?
> > There is an associated draft design doc linked in the issue [2],
> > which is open to comments.
> >
> > Thanks,
> >
> > Will Jones
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-16006
> > [2] https://docs.google.com/document/d/174tldmQLMCvOtjxGtFPeoLBefyE1x26_xntwfSzDXFA/edit?usp=sharing
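
P.S. To make option (3) a bit more concrete, here is a minimal sketch of the slicing idea. It deliberately does not use the Arrow API: ColumnBatch, Row, ConvertSlice() and BatchToRows() are hypothetical stand-ins made up for illustration, with two hard-coded columns in place of an arbitrary schema. It is not the proposed implementation, just the loop structure I have in mind.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a record batch: two equal-length columns.
struct ColumnBatch {
  std::vector<int64_t> ids;
  std::vector<double> values;
  std::size_t num_rows() const { return ids.size(); }
};

// Hypothetical row type with one field per column.
struct Row {
  int64_t id;
  double value;
};

// Option (2) applied to the slice [offset, offset + length): walk one column
// at a time and write its values into the rows appended to `out`.
void ConvertSlice(const ColumnBatch& batch, std::size_t offset,
                  std::size_t length, std::vector<Row>* out) {
  const std::size_t base = out->size();
  out->resize(base + length);
  for (std::size_t i = 0; i < length; ++i) {
    (*out)[base + i].id = batch.ids[offset + i];
  }
  for (std::size_t i = 0; i < length; ++i) {
    (*out)[base + i].value = batch.values[offset + i];
  }
}

// Option (3): cut the batch into slices of at most `batch_size` rows and
// convert each slice column by column.
std::vector<Row> BatchToRows(const ColumnBatch& batch, std::size_t batch_size) {
  assert(batch.ids.size() == batch.values.size());
  assert(batch_size > 0);
  std::vector<Row> rows;
  rows.reserve(batch.num_rows());
  for (std::size_t offset = 0; offset < batch.num_rows(); offset += batch_size) {
    const std::size_t length = std::min(batch_size, batch.num_rows() - offset);
    ConvertSlice(batch, offset, length, &rows);
  }
  return rows;
}
```

With a 2-million-row batch and a batch_size of 10,000, the outer loop would call ConvertSlice() 200 times, each time touching only 10,000 rows' worth of output before moving on to the next column.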