Tim Swast created ARROW-5450:
--------------------------------

             Summary: [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long
                 Key: ARROW-5450
                 URL: https://issues.apache.org/jira/browse/ARROW-5450
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Tim Swast


When I round-trip a list of datetime objects through pyarrow and back, and some 
of the values fall outside the range representable at nanosecond precision (but 
within the range of microsecond precision), TimestampArray.to_pylist() raises 
OverflowError: Python int too large to convert to C long.
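
To make the range in question concrete, here is a small standard-library sketch. It is not part of the reproduction, and the helper names (since_epoch, fits_int64) are hypothetical. It assumes the overflow comes from converting the value to a signed 64-bit count of nanoseconds since the Unix epoch, and checks whether year 1 and year 9999 fit at microsecond and nanosecond resolution.

```python
import datetime

# Bounds of a signed 64-bit integer (the assumed storage type).
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

EPOCH = datetime.datetime(1970, 1, 1)


def since_epoch(dt, units_per_second):
    """Count units (10**6 for us, 10**9 for ns) between the epoch and naive dt."""
    delta = dt - EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
    return micros * units_per_second // 10**6


def fits_int64(value):
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


for dt in (
    datetime.datetime(1, 1, 1),
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
):
    us = since_epoch(dt, 10**6)
    ns = since_epoch(dt, 10**9)
    print(dt.isoformat(), "us fits:", fits_int64(us), "ns fits:", fits_int64(ns))
```

Under that assumption, both extreme values fit as microseconds but not as nanoseconds, which would be consistent with the failure happening only on the way back to Python objects.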

pyarrow version:
{noformat}
$ pip freeze | grep pyarrow
pyarrow==0.13.0{noformat}
 

Reproduction:
{code:python}
import datetime

import pandas
import pyarrow
import pytz


timestamp_rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
    None,
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc),
    datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
]
timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", tz="UTC"))
timestamp_roundtrip = timestamp_array.to_pylist()


# ---------------------------------------------------------------------------
# OverflowError Traceback (most recent call last)
# <ipython-input-25-4a798e917c20> in <module>
# ----> 1 timestamp_roundtrip = timestamp_array.to_pylist()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
#
# pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
#
# pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
#
# OverflowError: Python int too large to convert to C long
{code}
For good measure, I also tested timezone-naive timestamps and got the same 
error:
{code:python}
naive_rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0),
    None,
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
    datetime.datetime(1970, 1, 1, 0, 0, 0),
]
naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
naive_roundtrip = naive_array.to_pylist()

# ---------------------------------------------------------------------------
# OverflowError Traceback (most recent call last)
# <ipython-input-27-0c32e563d44a> in <module>
# ----> 1 naive_roundtrip = naive_array.to_pylist()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
#
# pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
#
# pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
#
# OverflowError: Python int too large to convert to C long
{code}
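
For comparison, a conversion that never leaves microsecond resolution handles these values with the standard library alone. The sketch below is only an illustration of that idea, not pyarrow's implementation or a proposed patch. The helper names (datetime_to_micros, micros_to_datetime) are hypothetical, and it assumes the stored values are available as plain Python integers counting microseconds since the Unix epoch.

```python
import datetime

EPOCH_NAIVE = datetime.datetime(1970, 1, 1)
EPOCH_UTC = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def datetime_to_micros(dt):
    """Convert a datetime (naive or aware) to integer microseconds since the epoch."""
    if dt is None:
        return None
    epoch = EPOCH_UTC if dt.tzinfo is not None else EPOCH_NAIVE
    delta = dt - epoch
    return (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds


def micros_to_datetime(value, utc=False):
    """Convert integer microseconds since the epoch back to a datetime.

    Uses timedelta arithmetic, so no nanosecond intermediate is involved.
    """
    if value is None:
        return None
    epoch = EPOCH_UTC if utc else EPOCH_NAIVE
    return epoch + datetime.timedelta(microseconds=value)


rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0),
    None,
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
    datetime.datetime(1970, 1, 1, 0, 0, 0),
]
roundtrip = [micros_to_datetime(datetime_to_micros(dt)) for dt in rows]
print(roundtrip == rows)
```

The same rows that trigger the OverflowError above survive this round trip, including the None entry.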



