Suvayu Ali created ARROW-4814:
---------------------------------
Summary: [Python] Exception when writing nested columns that are
tuples to parquet
Key: ARROW-4814
URL: https://issues.apache.org/jira/browse/ARROW-4814
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.12.1
Environment: 4.20.8-100.fc28.x86_64
Reporter: Suvayu Ali
Attachments: df_to_parquet_fail.py, test.csv
I get an exception when I try to write a {{pandas.DataFrame}} to a parquet file
where one of the columns contains tuples. I use tuples here because they
allow for easier querying in pandas (see ARROW-3806 for a more detailed
description).
{code}
Traceback (most recent call last):
  File "df_to_parquet_fail.py", line 5, in <module>
    df.to_parquet("test.parquet") # crashes
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2203, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 252, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 113, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 1141, in pyarrow.lib.Table.from_pandas
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 431, in dataframe_to_arrays
    convert_types)]
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 430, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in convert_column
    raise e
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 420, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 176, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert ('G',) with type tuple: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column ALTS with type object')
{code}
The issue may be reproduced with the attached script and CSV file.
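Until tuples are handled, one possible stopgap is to turn the tuples into lists before writing. The snippet below is only a minimal sketch: {{tuples_to_lists}} is a hypothetical helper, not part of pandas or pyarrow, and the sample values are made up (only {{('G',)}} is known from the traceback above). It also assumes that list values are accepted where tuple values are not, which has not been verified against 0.12.1.

```python
def tuples_to_lists(value):
    """Recursively replace tuples with lists; leave other values untouched."""
    if isinstance(value, (tuple, list)):
        return [tuples_to_lists(item) for item in value]
    return value


# Hypothetical sample values standing in for the ALTS column;
# the real contents of test.csv are not reproduced here.
alts = [("G",), ("A", "T"), None]

converted = [tuples_to_lists(v) for v in alts]
print(converted)  # [['G'], ['A', 'T'], None]
```

On a DataFrame the same helper could be applied with {{df["ALTS"] = df["ALTS"].map(tuples_to_lists)}} before calling {{df.to_parquet(...)}}. This gives up the tuple-based querying mentioned above, so it is a stopgap rather than a fix.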
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)