Robert Dailey created ARROW-1938:
------------------------------------

             Summary: Error writing to partitioned dataset
                 Key: ARROW-1938
                 URL: https://issues.apache.org/jira/browse/ARROW-1938
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
         Environment: Linux (Ubuntu 16.04)
            Reporter: Robert Dailey
         Attachments: pyarrow_dataset_error.png

After upgrading to pyarrow 0.8.0, I receive the following error when writing to 
a partitioned dataset:
* ArrowIOError: Column 3 had 187374 while previous column had 10000

The command was:

import pyarrow as pa
import pyarrow.parquet as pq
write_table_values = {'row_group_size': 10000}
pq.write_to_dataset(
    pa.Table.from_pandas(df, preserve_index=True),
    '/logs/parsed/test',
    partition_cols=['Product', 'year', 'month', 'day', 'hour'],
    **write_table_values)

The same command works in version 0.7.1.  I am still troubleshooting the 
problem but wanted to file a ticket in the meantime.
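
To make the numbers in the error message easier to read, here is a minimal 
pure-Python sketch of the invariant that seems to be violated: every column 
written into one row group should carry the same number of rows. This is not 
pyarrow code and not a diagnosis. The helper names row_group_sizes and 
check_row_group are hypothetical, and treating 187374 as the total row count 
of the data being written is an assumption the report does not confirm.

```python
def row_group_sizes(num_rows, row_group_size):
    """Split num_rows into consecutive row-group lengths of at most row_group_size."""
    if row_group_size <= 0:
        raise ValueError("row_group_size must be positive")
    full, rest = divmod(num_rows, row_group_size)
    return [row_group_size] * full + ([rest] if rest else [])


def check_row_group(column_lengths):
    """Hypothetical check: all columns in one row group must have equal length."""
    for i in range(1, len(column_lengths)):
        if column_lengths[i] != column_lengths[i - 1]:
            raise ValueError(
                "Column %d had %d while previous column had %d"
                % (i, column_lengths[i], column_lengths[i - 1])
            )


# Assumption: 187374 rows in total, written with row_group_size=10000.
sizes = row_group_sizes(187374, 10000)
print(len(sizes), sizes[-1])

# Three columns sliced to 10000 rows and a fourth left unsliced would trip
# the check with a message shaped like the one in this report.
try:
    check_row_group([10000, 10000, 10000, 187374])
except ValueError as exc:
    print(exc)
```

If that assumption holds, the message would suggest that one column was 
passed through whole while the others were cut to row_group_size, but that 
reading is unverified.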



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
