rob created ARROW-2814:
--------------------------

             Summary: Error inferring Arrow type for Python object array. Got Python object of type dict but can only handle these types: string, bool, float, int, date, time, decimal, list, array
                 Key: ARROW-2814
                 URL: https://issues.apache.org/jira/browse/ARROW-2814
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
            Reporter: rob
         Attachments: part-00000-8f03690f-736d-43a9-9287-6db9e228d59c.c000.gz.parquet

There is a problem when running pa.Table.from_pandas() on data read from a Parquet file that has a JSON string in it. I have attached the file that is the source of the problem to this ticket, and the code below shows the error.

# Reproducible code
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

pd.options.display.max_colwidth = 10000

pq_table = pq.read_table("part-00000-8f03690f-736d-43a9-9287-6db9e228d59c.c000.gz.parquet")
panda_table = pq_table.to_pandas()
original_count = len(panda_table)

# Fails
table_output = pa.Table.from_pandas(panda_table)

del panda_table['payload']

# Works
table_output = pa.Table.from_pandas(panda_table)

# payload is the faulty column. Print out data
pq_table = pq.read_table("part-00000-8f03690f-736d-43a9-9287-6db9e228d59c.c000.gz.parquet")
panda_table = pq_table.to_pandas()
original_count = len(panda_table)
table_output = pa.Table.from_pandas(panda_table[['payload']])
panda_table[['payload']]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
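
A possible way to narrow this down, offered as a rough sketch only: the error message mentions a Python object of type dict, so it may be that some values in the payload column come back as dicts rather than as strings. The helper names below (find_dict_rows, dicts_to_json) and the sample values are hypothetical and not taken from the attached file. The sketch uses only the standard library, so it does not depend on any particular pyarrow version.

```python
import json


def find_dict_rows(values):
    """Return the positions of the values that are Python dicts."""
    return [i for i, v in enumerate(values) if isinstance(v, dict)]


def dicts_to_json(values):
    """Serialize dict values to JSON strings; leave all other values unchanged."""
    return [
        json.dumps(v, sort_keys=True) if isinstance(v, dict) else v
        for v in values
    ]


# Made-up sample standing in for the 'payload' column (an assumption,
# not the real data from the attachment).
payload = ['{"a": 1}', {"b": 2}, None]

print(find_dict_rows(payload))   # positions holding dicts
print(dicts_to_json(payload))    # same values with dicts turned into strings
```

If this assumption holds for the real data, something like panda_table['payload'] = dicts_to_json(panda_table['payload']) before calling pa.Table.from_pandas() might serve as a workaround, though I have not verified that against 0.9.0.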