[ https://issues.apache.org/jira/browse/ARROW-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-4814:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21331

> [Python] Exception when writing nested columns that are tuples to parquet
> -------------------------------------------------------------------------
>
>                 Key: ARROW-4814
>                 URL: https://issues.apache.org/jira/browse/ARROW-4814
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: 4.20.8-100.fc28.x86_64
>            Reporter: Suvayu Ali
>            Priority: Major
>              Labels: pandas
>         Attachments: df_to_parquet_fail.py, test.csv
>
> I get an exception when I try to write a {{pandas.DataFrame}} to a parquet
> file where one of the columns contains tuples. I use tuples here because
> they allow for easier querying in pandas (see ARROW-3806 for a more detailed
> description).
> {code}
> Traceback (most recent call last):
>   File "df_to_parquet_fail.py", line 5, in <module>
>     df.to_parquet("test.parquet")  # crashes
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2203, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 252, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 113, in write
>     table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
>   File "pyarrow/table.pxi", line 1141, in pyarrow.lib.Table.from_pandas
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 431, in dataframe_to_arrays
>     convert_types)]
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 430, in <listcomp>
>     for c, t in zip(columns_to_convert,
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in convert_column
>     raise e
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 420, in convert_column
>     return pa.array(col, type=ty, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 176, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ("Could not convert ('G',) with type tuple: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column ALTS with type object')
> {code}
> The issue may be replicated with the attached script and csv file.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)