Jarno Seppanen created ARROW-1440:
-------------------------------------

             Summary: Segmentation fault after loading parquet file to pandas dataframe
                 Key: ARROW-1440
                 URL: https://issues.apache.org/jira/browse/ARROW-1440
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.6.0
         Environment: ubuntu 16.04.2
            Reporter: Jarno Seppanen
         Attachments: part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet
Reading the attached Parquet file into a pandas DataFrame succeeds, and len(df) and df.info() both work, but printing the DataFrame segfaults.

{noformat}
Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 11:58:13)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pyarrow
>>> import pyarrow.parquet as pq
>>> pyarrow.__version__
'0.6.0'
>>> df = pq.read_table('part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
... .to_pandas()
>>> len(df)
69
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69 entries, 0 to 68
Data columns (total 6 columns):
label               69 non-null int32
account_meta        69 non-null object
features_type       69 non-null int32
features_size       69 non-null int32
features_indices    1 non-null object
features_values     1 non-null object
dtypes: int32(3), object(3)
memory usage: 2.5+ KB
>>>
>>> print(df)
Segmentation fault (core dumped)
{noformat}

-- 
This message was sent by Atlassian JIRA
(v6.4.14#64029)
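When narrowing down a crash like this one, it can help to run the failing steps in a separate interpreter so the segfault does not take down the session, and to enable Python's faulthandler so the child prints the Python-level traceback of the call that faulted. The following is a minimal sketch under those assumptions (POSIX only); the helper name `crashes_with_segfault` is hypothetical and not part of pyarrow or pandas.

```python
import signal
import subprocess
import sys


def crashes_with_segfault(code: str, timeout: float = 60.0) -> bool:
    """Run `code` in a fresh Python interpreter and report whether the
    child process was killed by SIGSEGV (POSIX only).

    Hypothetical helper for isolating crashes; not a pyarrow API.
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback to
        # stderr if it receives a fatal signal such as SIGSEGV.
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    if proc.stderr:
        # Forward the child's stderr, including any faulthandler traceback.
        sys.stderr.write(proc.stderr.decode(errors="replace"))
    # On POSIX, a child terminated by signal N reports returncode == -N.
    return proc.returncode == -signal.SIGSEGV
```

To use it for this report, pass the session's steps as a single string (import pyarrow.parquet, call read_table(...).to_pandas() on the attached file, then print(df)). This assumes pyarrow 0.6.0 and the attachment are available to the child interpreter. Running each step in its own call isolates which one faults.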