Tim Swast created ARROW-5450:
--------------------------------

Summary: [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long
Key: ARROW-5450
URL: https://issues.apache.org/jira/browse/ARROW-5450
Project: Apache Arrow
Issue Type: Bug
Reporter: Tim Swast
When I try to round-trip a list of datetime objects far from the epoch through pyarrow and back, I get {{OverflowError: Python int too large to convert to C long}}. These values fall outside the range representable at nanosecond precision but are within the range of microsecond precision.

pyarrow version:
{noformat}
$ pip freeze | grep pyarrow
pyarrow==0.13.0
{noformat}

Reproduction:
{code:python}
import datetime

import pandas
import pyarrow
import pytz

timestamp_rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
    None,
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc),
    datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
]
timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", tz="UTC"))
timestamp_roundtrip = timestamp_array.to_pylist()

# ---------------------------------------------------------------------------
# OverflowError                             Traceback (most recent call last)
# <ipython-input-25-4a798e917c20> in <module>
# ----> 1 timestamp_roundtrip = timestamp_array.to_pylist()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
#
# pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
#
# pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
#
# OverflowError: Python int too large to convert to C long
{code}

For good measure, I also tested timezone-naive timestamps and got the same error:
{code:python}
naive_rows = [
    datetime.datetime(1, 1, 1, 0, 0, 0),
    None,
    datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
    datetime.datetime(1970, 1, 1, 0, 0, 0),
]
naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
naive_roundtrip = naive_array.to_pylist()

# ---------------------------------------------------------------------------
# OverflowError                             Traceback (most recent call last)
# <ipython-input-27-0c32e563d44a> in <module>
# ----> 1 naive_roundtrip = naive_array.to_pylist()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi in __iter__()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib.TimestampValue.as_py()
#
# ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi in pyarrow.lib._datetime_conversion_functions.lambda5()
#
# pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
#
# pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
#
# OverflowError: Python int too large to convert to C long
{code}
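Both tracebacks show TimestampValue.as_py() going through pandas.Timestamp. The sketch below uses only the standard library to check the range claim in the description. It counts how many int64 ticks each row needs at a given unit. It then converts microseconds since the epoch back to a datetime without any nanosecond intermediate. This is an illustration under stated assumptions, not pyarrow's implementation or a proposed patch. The helper names epoch_ticks, fits_int64 and micros_to_datetime are hypothetical, and naive datetimes are assumed to represent UTC wall-clock time.

{code:python}
import datetime

# Ticks per second for each timestamp unit ("s", "ms", "us", "ns").
TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def _epoch_for(dt):
    """Unix epoch with the same awareness as dt (naive values assumed to be UTC)."""
    naive_epoch = datetime.datetime(1970, 1, 1)
    if dt.tzinfo is None:
        return naive_epoch
    return naive_epoch.replace(tzinfo=datetime.timezone.utc)


def epoch_ticks(dt, unit):
    """Hypothetical helper: exact (floored) count of `unit` ticks since the epoch."""
    # timedelta // timedelta returns an exact Python int.
    micros = (dt - _epoch_for(dt)) // ONE_MICROSECOND
    return micros * TICKS_PER_SECOND[unit] // 10**6


def fits_int64(dt, unit):
    """Hypothetical helper: True if dt fits in an int64 timestamp of this unit."""
    return INT64_MIN <= epoch_ticks(dt, unit) <= INT64_MAX


def micros_to_datetime(micros, tz=datetime.timezone.utc):
    """Hypothetical helper: int64 microseconds since the epoch -> datetime.

    Uses only microsecond-resolution timedelta arithmetic, so it never
    builds a nanosecond-based value such as pandas.Timestamp.
    Pass tz=None to get a naive datetime back.
    """
    epoch = datetime.datetime(1970, 1, 1, tzinfo=tz)
    return epoch + micros * ONE_MICROSECOND


if __name__ == "__main__":
    utc = datetime.timezone.utc
    rows = [
        datetime.datetime(1, 1, 1, tzinfo=utc),
        datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=utc),
        datetime.datetime(1970, 1, 1, tzinfo=utc),
    ]
    for dt in rows:
        # Prints: us-fits, ns-fits, and the microsecond round trip.
        print(fits_int64(dt, "us"), fits_int64(dt, "ns"),
              micros_to_datetime(epoch_ticks(dt, "us")) == dt)
{code}

For the rows above, this prints {{True False True}} for years 1 and 9999 and {{True True True}} for 1970. Only the epoch row survives at nanosecond resolution. {{micros_to_datetime(INT64_MAX // 1000)}} gives 2262-04-11 23:47:16.854775+00:00, which is where the nanosecond range ends.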