Max Bolingbroke created ARROW-5125:
--------------------------------------

             Summary: [Python] Cannot roundtrip extreme dates through pyarrow
                 Key: ARROW-5125
                 URL: https://issues.apache.org/jira/browse/ARROW-5125
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.13.0
         Environment: Windows 10, Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05)
            Reporter: Max Bolingbroke


You can roundtrip many dates through a pyarrow array:

 
{noformat}
>>> import datetime
>>> import pyarrow as pa
>>> pa.array([datetime.date(1980, 1, 1)], type=pa.date32())[0]
datetime.date(1980, 1, 1)
{noformat}
 

But (on Windows at least) not extreme ones, such as dates in 1960 or 3200:

 
{noformat}
>>> pa.array([datetime.date(1960, 1, 1)], type=pa.date32())[0]
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "pyarrow\scalar.pxi", line 74, in pyarrow.lib.ArrayValue.__repr__
 File "pyarrow\scalar.pxi", line 226, in pyarrow.lib.Date32Value.as_py
OSError: [Errno 22] Invalid argument
>>> pa.array([datetime.date(3200, 1, 1)], type=pa.date32())[0]
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "pyarrow\scalar.pxi", line 74, in pyarrow.lib.ArrayValue.__repr__
 File "pyarrow\scalar.pxi", line 226, in pyarrow.lib.Date32Value.as_py
{noformat}
This is because datetime.utcfromtimestamp and datetime.timestamp fail on these 
dates, but it seems we should be able to avoid invoking these functions 
entirely when deserializing dates. Ideally we would be able to roundtrip these 
values as datetimes too, of course, but it is less clear that this will be 
easy. For some context, see [https://bugs.python.org/issue29097].
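
As a rough illustration of what avoiding the timestamp functions could look like, here is a minimal sketch that assumes a date32 value is a count of days since 1970-01-01. The helper names date32_to_py and py_to_date32 are hypothetical and are not pyarrow APIs; this is not the actual as_py implementation.

```python
import datetime

# Assumption: date32 stores whole days relative to the UNIX epoch.
_EPOCH = datetime.date(1970, 1, 1)


def date32_to_py(days):
    """Hypothetical helper: days since epoch -> datetime.date.

    Uses plain date arithmetic, so no platform timestamp
    function (e.g. utcfromtimestamp) is involved.
    """
    return _EPOCH + datetime.timedelta(days=days)


def py_to_date32(value):
    """Hypothetical helper: datetime.date -> days since epoch."""
    return (value - _EPOCH).days


# Roundtrip the dates from the report without touching timestamps.
for d in (datetime.date(1960, 1, 1), datetime.date(3200, 1, 1)):
    assert date32_to_py(py_to_date32(d)) == d
```

The same timedelta arithmetic on a naive datetime.datetime(1970, 1, 1) may be a starting point for the datetime case, though timezone handling would need separate thought.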

This may be related to ARROW-3176 and ARROW-4746.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
