Justin Tan created ARROW-2369:
---------------------------------

             Summary: Large (>~20 GB) files written to Parquet via PyArrow are corrupted
                 Key: ARROW-2369
                 URL: https://issues.apache.org/jira/browse/ARROW-2369
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
         Environment: Reproduced on Ubuntu + Mac OSX
            Reporter: Justin Tan
             Fix For: 0.9.0
         Attachments: Screen Shot 2018-03-30 at 11.54.01 pm.png
When writing large Parquet files (above ~20 GB) from Pandas via {{pq.write_table(my_df, 'table.parquet')}}, the write succeeds, but loading the resulting file fails with:

{{ArrowIOError: Invalid parquet file. Corrupt footer.}}

The same error occurs when the Parquet file is written chunkwise. When the Parquet files are small, say below ~10 GB (drawn randomly from the same dataset), everything proceeds as normal.

Details:
 * Arrow v0.9.0
 * Reproduced on Ubuntu and Mac OSX



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)