Wes McKinney created ARROW-8250:
-----------------------------------
Summary: [C++] Add "random access" / slice read API to
RecordBatchFileReader
Key: ARROW-8250
URL: https://issues.apache.org/jira/browse/ARROW-8250
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 1.0.0

If you want to read a small section of a file, there is currently no easy way
to determine which record batches need "rehydrating".
I would propose the following:
* A way to cheaply read all the RecordBatch metadata without deserializing the
record batch data structures themselves, and to cache it so that this does not
have to be done multiple times
* Based on the metadata, you can then determine the range of batches that need
to be rehydrated, and slice them accordingly to produce the Table of interest
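
As a rough illustration of the second point, here is a minimal sketch of the
planning step, assuming the per-batch row counts have already been read from
the metadata. It uses only the standard library and no Arrow APIs; the names
BatchSlice and PlanSliceRead are hypothetical and are not part of the proposal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the rows to keep from one record batch.
struct BatchSlice {
  int batch_index;  // position of the batch in the file
  int64_t offset;   // first row to keep, relative to the start of the batch
  int64_t length;   // number of rows to keep from this batch
};

// Given the row count of every batch (assumed to come from cached metadata),
// return the batches overlapping rows [row_offset, row_offset + row_count)
// together with the slice to take from each one.
std::vector<BatchSlice> PlanSliceRead(const std::vector<int64_t>& batch_lengths,
                                      int64_t row_offset, int64_t row_count) {
  std::vector<BatchSlice> plan;
  if (row_offset < 0 || row_count <= 0) return plan;

  const int64_t row_end = row_offset + row_count;
  int64_t batch_start = 0;  // global index of the first row in the current batch
  for (std::size_t i = 0; i < batch_lengths.size(); ++i) {
    assert(batch_lengths[i] >= 0);
    if (batch_start >= row_end) break;  // past the requested range
    const int64_t batch_end = batch_start + batch_lengths[i];
    const int64_t lo = std::max(batch_start, row_offset);
    const int64_t hi = std::min(batch_end, row_end);
    if (lo < hi) {
      // Only batches with a non-empty overlap need to be rehydrated.
      plan.push_back({static_cast<int>(i), lo - batch_start, hi - lo});
    }
    batch_start = batch_end;
  }
  return plan;
}
```

For a file with batches of 100, 100 and 50 rows, a request for 70 rows starting
at row 150 would touch only the second and third batches. A real
implementation would then deserialize just those batches and slice each one.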
This functionality can also be lifted into the Feather read APIs.