Hi All,

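I've encountered an issue where PyArrow does not appear to propagate time 
zone metadata from Parquet files into the resulting Python objects. In the 
session below, the DateAware column is written as timestamp[ns, tz=UTC] but 
read back as timestamp[us] with no time zone, even though the pandas 
metadata on the table still records it.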

λ python
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC 
v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pytz
>>> import pandas
>>> from datetime import datetime
>>>
>>> d1 = datetime.strptime('2015-07-05 23:50:00', '%Y-%m-%d %H:%M:%S')
>>> d1
datetime.datetime(2015, 7, 5, 23, 50)
>>> aware = pytz.utc.localize(d1)
>>> aware
datetime.datetime(2015, 7, 5, 23, 50, tzinfo=<UTC>)
>>>
>>> df = pandas.DataFrame()
>>> df['DateNaive'] = [d1]
>>> df['DateAware'] = [aware]
>>> df
            DateNaive                 DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00+00:00
>>>
>>> table  = pa.Table.from_pandas(df)
>>> table
pyarrow.Table
DateNaive: timestamp[ns]
DateAware: timestamp[ns, tz=UTC]
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", 
"pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, 
{"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": 
"datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": 
["__index_level_0__"]}
>>>
>>> pq.write_table(table, "E:\\pyarrowDates.parquet")
>>>
>>> pyarrowTable = pq.read_table("E:\\pyarrowDates.parquet")
>>> pyarrowTable
pyarrow.Table
DateNaive: timestamp[us]
DateAware: timestamp[us]
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", 
"pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, 
{"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": 
"datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": 
["__index_level_0__"]}
>>>
>>> pyarrowDF = pyarrowTable.to_pandas()
>>> pyarrowDF
            DateNaive           DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00
>>>

This was on PyArrow 0.6.0.
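
One possible stopgap, since the pandas metadata above still records the time 
zone after the round trip, is to read the zone back out of that JSON. The 
following is only a minimal sketch: timezone_columns is a hypothetical helper 
(not part of PyArrow), the metadata string is copied from the session above, 
and I am assuming the same string can be obtained from the table's schema 
metadata in the PyArrow version in use.

```python
import json

# The "pandas" metadata string as shown on the table read back from Parquet
# (copied from the session above; in practice it would come from the table).
PANDAS_METADATA = (
    '{"pandas_version": "0.20.3", "columns": ['
    '{"name": "DateNaive", "pandas_type": "datetime", '
    '"numpy_type": "datetime64[ns]", "metadata": null}, '
    '{"name": "DateAware", "pandas_type": "datetimetz", '
    '"numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], '
    '"index_columns": ["__index_level_0__"]}'
)


def timezone_columns(pandas_metadata):
    """Map column name -> time zone name for columns recorded as tz-aware.

    Hypothetical helper for illustration; not a PyArrow API.
    """
    info = json.loads(pandas_metadata)
    result = {}
    for column in info["columns"]:
        # "metadata" is null (None) for naive columns, so fall back to {}.
        extra = column.get("metadata") or {}
        if "timezone" in extra:
            result[column["name"]] = extra["timezone"]
    return result


tz_by_column = timezone_columns(PANDAS_METADATA)
print(tz_by_column)  # {'DateAware': 'UTC'}
```

With that mapping, each listed column of pyarrowDF could presumably be made 
aware again with pandas, e.g. 
pyarrowDF[name].dt.tz_localize('UTC').dt.tz_convert(tz), on the assumption 
that the stored values are UTC instants.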

Cheers, Lucas Pickup
