Ying Wang created ARROW-3208:
--------------------------------

             Summary: Segmentation fault when reading a Parquet partitioned dataset to a Parquet file
                 Key: ARROW-3208
                 URL: https://issues.apache.org/jira/browse/ARROW-3208
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
         Environment: Ubuntu 16.04 LTS; System76 Oryx Pro
            Reporter: Ying Wang

Steps to reproduce:

# Create a partitioned dataset with the following code:

```python
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({
    'one': [-1, 10, 2.5, 100, 1000, 1, 29.2],
    'two': [-1, 10, 2, 100, 1000, 1, 11],
    'three': [0, 0, 0, 0, 0, 0, 0]
})
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table,
                    root_path='/home/yingw787/misc/example_dataset',
                    partition_cols=['one', 'two'])
```

# Create a Parquet file from a PyArrow Table created from the partitioned Parquet dataset:

```python
import pyarrow.parquet as pq

table = pq.ParquetDataset('/path/to/dataset').read()
pq.write_table(table, '/path/to/example.parquet')
```

EXPECTED:
* Successful write

GOT:
* Segmentation fault

Issue reference on GitHub mirror: https://github.com/apache/arrow/issues/2511
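
For anyone trying to confirm the crash without losing their interactive session, the following is a minimal sketch (not part of the original report) that runs a snippet in a child interpreter and checks whether it died from SIGSEGV. The helper names `run_isolated` and `crashed_with_segfault` are hypothetical, the negative-return-code convention assumes a POSIX system such as the Ubuntu environment above, and the dataset path is the same placeholder used in step 2.

```python
import signal
import subprocess
import sys


def run_isolated(script, timeout=120.0):
    """Run `script` in a fresh interpreter and return (returncode, stderr)."""
    # -X faulthandler asks the child to dump a Python traceback to stderr
    # when it receives a fatal signal such as SIGSEGV.
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def crashed_with_segfault(returncode):
    """True if the child was killed by SIGSEGV (POSIX-only convention)."""
    # On POSIX, subprocess reports death by signal N as returncode -N.
    return returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Placeholder paths copied from step 2; substitute the real dataset root.
    repro = (
        "import pyarrow.parquet as pq\n"
        "table = pq.ParquetDataset('/path/to/dataset').read()\n"
        "pq.write_table(table, '/path/to/example.parquet')\n"
    )
    code, err = run_isolated(repro)
    print("segfault" if crashed_with_segfault(code) else "exit code %d" % code)
    print(err)
```

If the child does segfault, the stderr text should include the faulthandler traceback showing which Python-level call was active at the time, which may help narrow down whether the read or the write step is responsible.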