Mark Waddle created ARROW-8967:
--
Summary: [Python] [Parquet] Table.to_pandas() fails to convert
valid TIMESTAMP_MILLIS to pandas timestamp
Key: ARROW-8967
URL: https://issues.apache.org/jira/browse
Thomas Buhrmann created ARROW-2706:
--
Summary: pandas Timestamp not supported in ListArray
Key: ARROW-2706
URL: https://issues.apache.org/jira/browse/ARROW-2706
Project: Apache Arrow
Issue
Thanks Wes, after looking at it more the issue is with Spark's internal
storage not being UTC. I discussed with Holden and it will probably be
best to shelve timestamp support for SPARK-13534 and add it as a follow-up
PR.
From what you've written I am not sure where the problem is. If you can
point us to some unit tests or some other code that is not working, we can
help with the "pandas" way of doing things. If changes are needed in
PySpark this would be good motivation.
On Tue, Apr 25, 2017 at 6:40 PM Bryan Cutl
Thanks Wes. I think I've managed to confuse myself pretty thoroughly over
this, and I'm not sure where the fix should be. Spark, by default, will store a
timestamp internally with python "time.mktime", which is in local time and
not UTC, I believe. If there is a tzinfo object, Spark will use
"calendar.tim
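The local-time-versus-UTC discrepancy Bryan describes can be sketched with the standard library alone. This is a hedged illustration, not Spark's actual code path: for a naive datetime, `time.mktime` interprets the fields as *local* wall-clock time, while `calendar.timegm` interprets the same fields as UTC, so the two produce epoch values that differ by the local UTC offset.

```python
# Sketch of the discrepancy: time.mktime treats a naive struct_time as
# local time; calendar.timegm treats it as UTC. The two epoch values
# differ by the machine's UTC offset on that date.
import calendar
import time
from datetime import datetime

dt = datetime(2017, 4, 25, 18, 40)        # naive wall-clock time

as_local = time.mktime(dt.timetuple())    # epoch seconds, fields read as local time
as_utc = calendar.timegm(dt.timetuple())  # epoch seconds, fields read as UTC

# Local UTC offset for that date (datetime.astimezone assumes local tz
# for naive datetimes).
offset = dt.astimezone().utcoffset().total_seconds()

print(as_utc - as_local == offset)
```

On a machine whose zone is UTC the two values coincide, which is why the bug only shows up off-UTC.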
hi Bryan,
You will want to create DataFrame objects having datetime64[ns] columns.
There are some examples in the pyarrow test suite:
https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_convert_pandas.py#L324
You can convert an array of datetime.datetime objects to datetime64[n
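A minimal sketch of the conversion Wes points at, assuming pandas is available: `pd.to_datetime` turns a list of `datetime.datetime` objects into a `datetime64[ns]` column, the representation the pyarrow pandas tests build DataFrames from. The column name `ts` here is illustrative.

```python
# Convert plain datetime.datetime objects into a datetime64[ns] column
# via pandas; this is the dtype pyarrow's pandas conversion expects.
from datetime import datetime

import pandas as pd

values = [datetime(2017, 4, 25, 18, 40), datetime(2017, 4, 26, 9, 15)]
df = pd.DataFrame({"ts": pd.to_datetime(values)})

print(df["ts"].dtype)  # datetime64[ns]
```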
I am writing a unit test comparing a pandas DataFrame produced by Arrow
with one constructed directly from the data. The timestamp values are
Python datetime objects with a timezone tzinfo object. When I compare the
results, the values are equal but the schema is not. Using Arrow, the type
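The values-equal-but-schema-different situation can be reproduced in pandas alone; this is a hedged sketch of the kind of mismatch described, not the reporter's actual test. A column built directly from tz-aware `datetime` objects keeps `object` dtype, while the converted form carries a tz-aware `datetime64[ns, UTC]` dtype; the individual instants still compare equal.

```python
# Same instants, different dtypes: an object column of tz-aware
# datetimes vs. a tz-aware datetime64[ns, UTC] column.
from datetime import datetime, timezone

import pandas as pd

aware = [datetime(2017, 4, 25, 18, 40, tzinfo=timezone.utc)]

direct = pd.DataFrame({"ts": pd.Series(aware, dtype=object)})  # dtype: object
converted = pd.DataFrame({"ts": pd.to_datetime(aware)})        # dtype: datetime64[ns, UTC]

print(direct["ts"].dtype)
print(converted["ts"].dtype)
# The underlying instants are equal even though the dtypes differ.
print(direct["ts"].iloc[0] == converted["ts"].iloc[0])  # True
```

A strict frame comparison (e.g. `pandas.testing.assert_frame_equal`) fails on the dtype even though every value matches, which is the schema mismatch the test observes.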