Hi All,

Over in the Apache Drill project, we developed some handy vector reader/writer abstractions. I wonder if they might be of interest to Apache Arrow.

Key contributions of the "RowSet" abstractions:

* Control row batch size: the aggregate memory taken by a set of vectors (and all their sub-vectors for structured types).
* Control the maximum per-vector size.
* Simple, highly optimized read/write interface that handles vector offset accounting, even for deeply nested types.
* Minimize vector internal fragmentation (wasted space).

More information is available in [1].

Arrow improved and simplified Drill's original vector and metadata abstractions. As a result, work would be required to port the RowSet code from Drill's version of these classes to the Arrow versions.

Does Arrow already have a similar solution? If not, would the above be useful for Arrow?

Thanks,
- Paul

Apache Drill PMC member
Co-author of the upcoming O'Reilly book "Learning Apache Drill"

[1] https://github.com/paul-rogers/drill/wiki/RowSet-Abstractions-for-Arrow
