Re: Pandas timestamp

2017-04-26 Thread Bryan Cutler
Thanks Wes. After looking at it more, the issue is that Spark's internal storage is not UTC. I discussed it with Holden, and it will probably be best to shelve timestamp support for SPARK-13534 and add it as a follow-up PR.

Re: Pandas timestamp

2017-04-25 Thread Wes McKinney
From what you've written I am not sure where the problem is. If you can point us to some unit tests or some other code that is not working, we can help with the "pandas" way of doing things. If changes are needed in PySpark this would be good motivation. On Tue, Apr 25, 2017 at 6:40 PM Bryan Cutl

Re: Pandas timestamp

2017-04-25 Thread Bryan Cutler
Thanks Wes. I think I've managed to confuse myself pretty well over this, and I'm not sure where the fix should be. Spark, by default, will store a timestamp internally using Python's "time.mktime", which I believe is in local time and not UTC. If there is a tzinfo object, Spark will use "calendar.tim
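
A minimal sketch of the local-time versus UTC distinction described above, using only the Python standard library. It does not reproduce Spark's actual conversion code; the helper names `local_epoch` and `utc_epoch` are hypothetical, and `calendar.timegm` is used here simply as the standard library's UTC counterpart to `time.mktime`.

```python
import calendar
import time
from datetime import datetime


def local_epoch(dt):
    """Seconds since the epoch, reading the naive datetime's fields as LOCAL time."""
    return int(time.mktime(dt.timetuple()))


def utc_epoch(dt):
    """Seconds since the epoch, reading the naive datetime's fields as UTC."""
    return calendar.timegm(dt.timetuple())


naive = datetime(2017, 4, 25, 18, 40, 0)

# The two results differ by the machine's UTC offset at that moment,
# so the same naive value maps to different instants on different hosts.
offset_seconds = local_epoch(naive) - utc_epoch(naive)
print("local:", local_epoch(naive))
print("utc:  ", utc_epoch(naive))
print("difference in hours:", offset_seconds / 3600)
```

On a machine whose time zone is UTC the two numbers match; anywhere else they differ, which is why a consumer that assumes UTC will see shifted values.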

Re: Pandas timestamp

2017-04-25 Thread Wes McKinney
Hi Bryan, you will want to create DataFrame objects having datetime64[ns] columns. There are some examples in the pyarrow test suite: https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_convert_pandas.py#L324 You can convert an array of datetime.datetime objects to datetime64[n
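
One possible way to build a DataFrame with a datetime64[ns] column from datetime.datetime objects is sketched below. It is not necessarily the method the message goes on to describe (the preview is cut off); the column name `ts` and the sample values are made up for illustration, and the explicit `astype` is an assumption chosen so the nanosecond unit does not depend on pandas' inference defaults.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data: naive datetime.datetime objects.
values = [
    datetime(2017, 4, 25, 18, 40, 0),
    datetime(2017, 4, 26, 9, 15, 30),
]

# Build the column and cast explicitly to nanosecond resolution.
ts = pd.Series(values).astype("datetime64[ns]")
df = pd.DataFrame({"ts": ts})

print(df.dtypes)
```

The resulting column holds naive (time-zone-unaware) timestamps, so the question of whether the values mean local time or UTC still has to be settled by whoever produced them.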