Safyre Anderson created ARROW-1400:
--------------------------------------
Summary: Ability to create partitions when writing to Parquet
Key: ARROW-1400
URL: https://issues.apache.org/jira/browse/ARROW-1400
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Affects Versions: 0.6.0
Environment: Mac OS Sierra 10.12.6
Reporter: Safyre Anderson
Priority: Minor
I'm fairly new to pyarrow, so I apologize if this is already a feature, but I
couldn't find a solution in the documentation or an existing issue. Basically,
I'm trying to export pandas DataFrames to .parquet files with partitions. I can
see that pyarrow.parquet has a way of reading .parquet files with partitions,
but there's no indication that it can write them. For example, it would be nice
if pyarrow.parquet.write_table() took a list of columns to partition the table
by, similar to the "partitionBy" parameter of PySpark's
DataFrameWriter.parquet().
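In the meantime, a rough workaround is to group the DataFrame on the partition
columns in pandas and write each group into a Hive-style directory
(col=value/) with pyarrow. Below is a minimal sketch of that idea; the
write_partitioned helper and the part-0.parquet file name are made up for
illustration and are not part of any existing pyarrow API.

import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def write_partitioned(df, root_path, partition_cols):
    # Hypothetical helper: emulates Spark's partitionBy by writing one
    # Parquet file per unique combination of the partition columns.
    for keys, group in df.groupby(partition_cols):
        if not isinstance(keys, tuple):
            keys = (keys,)
        # Hive-style layout, e.g. root_path/year=2017/month=8/
        subdir = os.path.join(
            root_path,
            *["{0}={1}".format(col, val) for col, val in zip(partition_cols, keys)])
        os.makedirs(subdir, exist_ok=True)
        # Drop the partition columns from the file contents, as Spark does.
        table = pa.Table.from_pandas(group.drop(partition_cols, axis=1))
        pq.write_table(table, os.path.join(subdir, 'part-0.parquet'))

# Example:
# df = pd.DataFrame({'year': [2017, 2017], 'month': [7, 8], 'value': [1.0, 2.0]})
# write_partitioned(df, 'dataset_root', ['year', 'month'])

A built-in option on the write side would avoid this manual round trip and
keep the directory layout consistent with what pyarrow.parquet already
understands when reading partitioned datasets.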
Referenced links:
https://arrow.apache.org/docs/python/parquet.html
https://arrow.apache.org/docs/python/parquet.html?highlight=pyarrow%20parquet%20partition