Is it possible to read Parquet columns into an Arrow schema that has
variable-width types with 64-bit offsets (LargeBinary, LargeList, etc.)?

For my current use case, I prefer the large types because the data are big
enough to overflow 32-bit offsets, and it is easier to accept the memory
cost of 8 bytes per offset than it is to work with chunked arrays. (I need
to access the Arrow buffers from Java, and the Java library does not yet
provide a convenient abstraction for chunked arrays.)

I would like an option to use large types when reading Parquet files with
the Dataset API. My feature request could be satisfied more generally by
enabling users to specify type coercion/promotion when mapping Parquet
types to Arrow types.
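
For reference, here is a sketch of the workaround available today, using
the Python bindings (the file name and the promote() helper are
illustrative, not part of any existing API): read with the default 32-bit
types, promote the schema, and cast. It works, but it materializes the
chunked 32-bit representation first, which is exactly the intermediate
step I would like to avoid:

    import pyarrow as pa
    import pyarrow.parquet as pq

    def promote(t: pa.DataType) -> pa.DataType:
        # Map each 32-bit-offset type to its 64-bit counterpart,
        # recursing into list value types.
        if pa.types.is_binary(t):
            return pa.large_binary()
        if pa.types.is_string(t):
            return pa.large_string()
        if pa.types.is_list(t):
            return pa.large_list(promote(t.value_type))
        return t

    table = pq.read_table("data.parquet")  # hypothetical file
    large_schema = pa.schema(
        [f.with_type(promote(f.type)) for f in table.schema]
    )

    # Cast chunk by chunk, then merge the chunks into single arrays
    # backed by 64-bit offset buffers.
    large_table = table.cast(large_schema).combine_chunks()

An option in the Dataset API to apply this kind of promotion during the
read would avoid building the 32-bit chunked intermediate at all.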

Are other users interested in this feature? Is anyone opposed?

Steve Kim
