Safyre Anderson created ARROW-1400:
--------------------------------------

             Summary: Ability to create partitions when writing to Parquet
                 Key: ARROW-1400
                 URL: https://issues.apache.org/jira/browse/ARROW-1400
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.6.0
         Environment: Mac OS Sierra 10.12.6
            Reporter: Safyre Anderson
            Priority: Minor
I'm fairly new to pyarrow, so I apologize if this is already a feature, but I couldn't find a solution in the documentation or an existing issue. Basically, I'm trying to export pandas DataFrames to .parquet files with partitions. I can see that pyarrow.parquet has a way of reading partitioned .parquet datasets, but there's no indication that it can write them. For example, it would be nice if pyarrow.parquet.write_table() took a list of columns to partition the table by, similar to the PySpark implementation: the "partitionBy" parameter of spark.write.parquet.

Referenced links:
https://arrow.apache.org/docs/python/parquet.html
https://arrow.apache.org/docs/python/parquet.html?highlight=pyarrow%20parquet%20partition
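In the meantime, a manual workaround seems possible with the existing APIs: group the DataFrame by the partition columns and write each group under a Hive-style key=value directory, which pyarrow.parquet's partitioned reader already understands. The sketch below uses only pandas, pyarrow.Table.from_pandas(), and pyarrow.parquet.write_table(); the names dataset_root, part-0.parquet, and the example columns are illustrative assumptions, not part of any existing pyarrow partitioned-write API.

{code:python}
import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({
    "year": [2016, 2016, 2017, 2017],
    "month": [11, 12, 1, 2],
    "value": [1.0, 2.0, 3.0, 4.0],
})

root = "dataset_root"      # illustrative output directory
partition_cols = ["year"]  # columns to partition by

# Write one Parquet file per distinct partition value, under
# <root>/<col>=<value>/ so the dataset can be read back with
# pq.ParquetDataset(root).
for keys, group in df.groupby(partition_cols):
    # groupby yields a scalar key for a single grouper in older pandas
    # and a 1-tuple in newer versions; normalize to a tuple.
    keys = keys if isinstance(keys, tuple) else (keys,)
    subdir = os.path.join(
        root,
        *("{0}={1}".format(col, val) for col, val in zip(partition_cols, keys))
    )
    os.makedirs(subdir, exist_ok=True)
    # Drop the partition columns from the file contents, as Spark does,
    # since their values are encoded in the directory names.
    table = pa.Table.from_pandas(group.drop(partition_cols, axis=1),
                                 preserve_index=False)
    pq.write_table(table, os.path.join(subdir, "part-0.parquet"))
{code}

A built-in parameter (e.g. a partitionBy-style list of column names on the write call) would avoid this per-group loop and keep the directory layout consistent with what the reader expects.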