Alex Mendelson created ARROW-3020:
-------------------------------------
Summary: Addition of option to allow empty row groups in pyarrow
Key: ARROW-3020
URL: https://issues.apache.org/jira/browse/ARROW-3020
Project: Apache Arrow
Issue Type: New Feature
Components: C++, Python
Reporter: Alex Mendelson
While our use case is not common, I was able to find one related request from
roughly a year ago (https://issues.apache.org/jira/browse/PARQUET-1047).
Could this be added as a feature?
*Motivation*
We have an application where each row is associated with one of N contexts,
though a minority of contexts may have no associated rows. When we encounter
the nth context, we want to retrieve all of its associated rows. Row groups
would provide a natural way to index the data, since the nth context could
map directly to the nth row group.
Unfortunately, this is not currently possible, as pyarrow does not support
writing empty row groups. If one writes a pyarrow.Table containing zero rows
using pyarrow.parquet.ParquetWriter, the table is omitted from the final file,
so every later row group shifts down by one position and no longer lines up
with its context.
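
To make the distortion concrete, here is a minimal pure-Python sketch. It does
not call pyarrow at all; it only simulates the behaviour described above, where
zero-row tables are dropped on write. The function names and the sample row
counts are hypothetical and chosen for illustration. The second function shows
the kind of side index we would otherwise have to maintain ourselves as a
workaround.

```python
from typing import List, Optional


def written_row_groups(rows_per_context: List[int]) -> List[int]:
    """Simulate the writer: return, for each row group that ends up in the
    file, the context it came from. Zero-row contexts are dropped."""
    return [ctx for ctx, n_rows in enumerate(rows_per_context) if n_rows > 0]


def context_to_row_group(rows_per_context: List[int]) -> List[Optional[int]]:
    """Workaround sketch: an explicit side index giving the row-group
    position of each context, or None when the context has no rows."""
    mapping: List[Optional[int]] = []
    next_group = 0
    for n_rows in rows_per_context:
        if n_rows > 0:
            mapping.append(next_group)
            next_group += 1
        else:
            mapping.append(None)
    return mapping


# Hypothetical data: four contexts, of which context 1 has no rows.
rows_per_context = [3, 0, 2, 5]

groups = written_row_groups(rows_per_context)
index = context_to_row_group(rows_per_context)

# Row group 1 in the file now holds the rows of context 2, not context 1.
print(groups)
print(index)
```

With an option to write empty row groups, the side index would be unnecessary,
because the position of a row group in the file would equal its context number.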