RE: Reading Parquet datetime column gives different answer in Spark vs PyArrow

2017-08-25 Thread Lucas Pickup
Quick follow-up. I'm trying to work around this myself in the meantime. The goal is to qualify the TimestampValue with a timezone (by creating a new column in the Arrow table based off the previous one). If this can be done before the values are converted to Python, it may fix the issue I was ha

Reading Parquet datetime column gives different answer in Spark vs PyArrow

2017-08-25 Thread Lucas Pickup
Hi all, I've been messing around with Spark and PyArrow Parquet reading. In my testing I've found that a Parquet file written by Spark that contains a datetime column yields different datetimes when read by Spark and when read by PyArrow. The attached script demonstrates this. Output: Spark Reading the parquet f

[jira] [Created] (ARROW-1411) [Python] Booleans in Float Columns cause

2017-08-25 Thread Nick White (JIRA)
Nick White created ARROW-1411: - Summary: [Python] Booleans in Float Columns cause Key: ARROW-1411 URL: https://issues.apache.org/jira/browse/ARROW-1411 Project: Apache Arrow Issue Type: Bug