If the condition for releasing is that the regression be fixed within
24 hours (less than 12 hours now?), I think we should simply revert the
original PR and work on a fix at a more leisurely pace for 1.1.0 (or even 1.0.1).

Unless it really causes havoc for Spark users, in which case a
workaround should be found.

Regards

Antoine.


On 20/07/2020 at 16:46, Krisztián Szűcs wrote:
> If I understand correctly, we used to store just the timestamp and the
> timezone if an explicit arrow type was passed during the python->arrow
> conversion, but the timestamp values themselves were not changed in any
> way. Micah's current patch changes the python->arrow conversion behavior
> to normalize all values to UTC timestamps.
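>
> To make the behavior difference concrete, a rough sketch of the conversion
> in question (illustrative only; which stored values are "correct" is what
> this thread is debating, and pytz is assumed to be available):
>
> ```
> import datetime
> import pytz
> import pyarrow as pa
>
> # A timezone-aware wall-clock value: midnight in Los Angeles (= 08:00 UTC).
> la = pytz.timezone('America/Los_Angeles')
> dt = la.localize(datetime.datetime(2020, 1, 1, 0, 0, 0))
>
> typ = pa.timestamp('us', tz='America/Los_Angeles')
> arr = pa.array([dt], type=typ)
>
> # Old behavior (as described above): the timezone is kept on the type but
> # the stored value is the wall clock, unchanged.
> # New behavior (Micah's patch): the stored value is normalized to UTC.
> print(arr.type)
> print(arr.view('int64'))  # the raw microsecond values actually stored
> ```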
> 
> While it's definitely an improvement over the previously ignored
> timezones, I'm not sure that it won't cause unexpected regressions in
> users' codebases.
> I'm still trying to understand the issue and its compatibility
> implications better, but my intuition tells me that we should revert
> instead and handle the datetime value conversions properly in an
> upcoming minor release.
> 
> Either way we should move this conversation to the pull request [1],
> because the code snippets pasted here are hardly readable.
> 
> [1]: https://github.com/apache/arrow/pull/7805
> 
> On Mon, Jul 20, 2020 at 9:40 AM Sutou Kouhei <k...@clear-code.com> wrote:
>>
>> Done: https://github.com/apache/arrow/pull/7805#issuecomment-660855376
>>
>> We can use ...-3.8-... rather than ...-3.7-... because we don't have a
>> ...-3.7-... task in
>> https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml.
>>
>> In <cak7z5t8hqcsd3meg42cuzkscpjr3zndsvrjmm8vied0gzto...@mail.gmail.com>
>>   "Re: [VOTE] Release Apache Arrow 1.0.0 - RC1" on Mon, 20 Jul 2020 00:14:00 -0700,
>>   Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>>> FYI, I'm not sure whether it is a permissions issue or I've done something
>>> wrong, but github-actions does not seem to be responding to "@github-actions
>>> crossbow submit test-conda-python-3.7-spark-master" when I enter it.  If
>>> someone could kick off the spark integration test I would be grateful.
>>>
>>> On Mon, Jul 20, 2020 at 12:09 AM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Bryan.  I cherry-picked your change onto my change [1] which now
>>>> honors timezone aware datetime objects on ingestion.  I've kicked off the
>>>> spark integration tests.
>>>>
>>>> If this change doesn't work, I think the correct course of action is to
>>>> provide an environment variable in Python to switch back to the old behavior
>>>> (ignoring timezones on conversion).  I think honoring timezone information
>>>> where possible is a strict improvement, but I agree we should give users an
>>>> option to avoid breakage when they upgrade to the latest version.  I need to
>>>> get some sleep, but I will have another PR posted tomorrow evening if the
>>>> current one doesn't unblock the release.
>>>>
>>>> [1] https://github.com/apache/arrow/pull/7805
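>>>>
>>>> A rough sketch of what such an opt-out could look like (purely
>>>> hypothetical: the variable name and the helper below are made up for
>>>> illustration and are not part of pyarrow):
>>>>
>>>> ```
>>>> import datetime
>>>> import os
>>>>
>>>> # Hypothetical escape hatch consulted during python->arrow conversion.
>>>> _IGNORE_TZ = os.environ.get("ARROW_IGNORE_TIMEZONE", "0") == "1"
>>>>
>>>> def datetime_to_us(dt):
>>>>     """Convert a datetime to a microsecond timestamp value (sketch)."""
>>>>     if _IGNORE_TZ or dt.tzinfo is None:
>>>>         # Old behavior: take the wall-clock value as-is.
>>>>         naive = dt.replace(tzinfo=None)
>>>>     else:
>>>>         # New behavior: normalize to UTC first.
>>>>         naive = dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)
>>>>     return (naive - datetime.datetime(1970, 1, 1)) // datetime.timedelta(microseconds=1)
>>>> ```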
>>>>
>>>> On Sun, Jul 19, 2020 at 10:50 PM Bryan Cutler <cutl...@gmail.com> wrote:
>>>>
>>>>> I'd rather not see ARROW-9223 reverted, if possible. I will put up my
>>>>> hacked patch to Spark for this so we can test against it if needed, and
>>>>> could share my branch if anyone else wants to test it locally.
>>>>>
>>>>> On Sun, Jul 19, 2020 at 7:35 PM Micah Kornfield <emkornfi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'll spend some time tonight on it, and if I can't get the round trip
>>>>>> working I'll handle reverting.
>>>>>>
>>>>>> On Sunday, July 19, 2020, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>
>>>>>>> On Sun, Jul 19, 2020 at 7:33 PM Neal Richardson
>>>>>>> <neal.p.richard...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> It sounds like you may have identified a pyarrow bug, which sounds not
>>>>>>>> good, though I don't know enough about the broader context to know
>>>>>>>> whether this is (1) worse than 0.17 or (2) release blocking. I defer to
>>>>>>>> y'all who know better.
>>>>>>>>
>>>>>>>> If there are quirks in how Spark handles timezone-naive timestamps,
>>>>>>>> shouldn't the fix/workaround go in pyspark, not pyarrow? For what it's
>>>>>>>> worth, I dealt with similar Spark timezone issues in R recently:
>>>>>>>> https://github.com/sparklyr/sparklyr/issues/2439
>>>>>>>> I handled it (in sparklyr, not the arrow R package) by always setting
>>>>>>>> a timezone when sending data to Spark. Not ideal, but it made the
>>>>>>>> numbers "right".
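>>>>>>>>
>>>>>>>> (The sparklyr fix is R, but the same idea in Python, as a sketch only
>>>>>>>> and not what pyspark actually does, is to pin an explicit timezone on
>>>>>>>> otherwise-naive timestamps before they cross the Arrow boundary:)
>>>>>>>>
>>>>>>>> ```
>>>>>>>> import pandas as pd
>>>>>>>>
>>>>>>>> s = pd.Series(pd.to_datetime(["2020-01-01 00:00:00"]))  # tz-naive
>>>>>>>> # Localize so Spark's localtime interpretation and Arrow's naive
>>>>>>>> # timestamps cannot silently drift apart.
>>>>>>>> s = s.dt.tz_localize("America/Los_Angeles")
>>>>>>>> ```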
>>>>>>>
>>>>>>> Since people are running this code in production we need to be careful
>>>>>>> about disrupting them. Unfortunately I'm at the limit of how much time
>>>>>>> I can spend on this, but releasing with ARROW-9223 as is (without
>>>>>>> being partially or fully reverted) makes me deeply uncomfortable. So I
>>>>>>> hope the matter can be resolved.
>>>>>>>
>>>>>>>> Neal
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Jul 19, 2020 at 5:13 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Honestly I think reverting is the best option. This change evidently
>>>>>>>>> needs more time to "season", and perhaps this is motivation to enhance
>>>>>>>>> test coverage in a number of places.
>>>>>>>>>
>>>>>>>>> On Sun, Jul 19, 2020 at 7:11 PM Wes McKinney <wesmck...@gmail.com
>>>>>>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> I am OK with any solution that doesn't delay the production of the
>>>>>>>>>> next RC by more than 24 hours.
>>>>>>>>>>
>>>>>>>>>> On Sun, Jul 19, 2020 at 7:08 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> If I read the example right, it looks like constructing from python
>>>>>>>>>>> types isn't keeping timezones in place?  I can try to make a patch
>>>>>>>>>>> that fixes that tonight, or would the preference be to revert my
>>>>>>>>>>> patch? (Note I think another pre-existing bug was fixed in my PR as
>>>>>>>>>>> well.)
>>>>>>>>>>>
>>>>>>>>>>> -Micah
>>>>>>>>>>>
>>>>>>>>>>> On Sunday, July 19, 2020, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I think I see the problem now:
>>>>>>>>>>>>
>>>>>>>>>>>> In [40]: parr
>>>>>>>>>>>> Out[40]:
>>>>>>>>>>>> 0           {'f0': 1969-12-31 16:00:00-08:00}
>>>>>>>>>>>> 1    {'f0': 1969-12-31 16:00:00.000001-08:00}
>>>>>>>>>>>> 2    {'f0': 1969-12-31 16:00:00.000002-08:00}
>>>>>>>>>>>> dtype: object
>>>>>>>>>>>>
>>>>>>>>>>>> In [41]: parr[0]['f0']
>>>>>>>>>>>> Out[41]: datetime.datetime(1969, 12, 31, 16, 0, tzinfo=<DstTzInfo
>>>>>>>>>>>> 'America/Los_Angeles' PST-1 day, 16:00:00 STD>)
>>>>>>>>>>>>
>>>>>>>>>>>> In [42]: pa.array(parr)
>>>>>>>>>>>> Out[42]:
>>>>>>>>>>>> <pyarrow.lib.StructArray object at 0x7f0893706a60>
>>>>>>>>>>>> -- is_valid: all not null
>>>>>>>>>>>> -- child 0 type: timestamp[us]
>>>>>>>>>>>>   [
>>>>>>>>>>>>     1969-12-31 16:00:00.000000,
>>>>>>>>>>>>     1969-12-31 16:00:00.000001,
>>>>>>>>>>>>     1969-12-31 16:00:00.000002
>>>>>>>>>>>>   ]
>>>>>>>>>>>>
>>>>>>>>>>>> In [43]: pa.array(parr).field(0).type
>>>>>>>>>>>> Out[43]: TimestampType(timestamp[us])
>>>>>>>>>>>>
>>>>>>>>>>>> On 0.17.1
>>>>>>>>>>>>
>>>>>>>>>>>> In [8]: arr = pa.array([0, 1, 2], type=pa.timestamp('us',
>>>>>>>>>>>> 'America/Los_Angeles'))
>>>>>>>>>>>>
>>>>>>>>>>>> In [9]: arr
>>>>>>>>>>>> Out[9]:
>>>>>>>>>>>> <pyarrow.lib.TimestampArray object at 0x7f9dede69d00>
>>>>>>>>>>>> [
>>>>>>>>>>>>   1970-01-01 00:00:00.000000,
>>>>>>>>>>>>   1970-01-01 00:00:00.000001,
>>>>>>>>>>>>   1970-01-01 00:00:00.000002
>>>>>>>>>>>> ]
>>>>>>>>>>>>
>>>>>>>>>>>> In [10]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])
>>>>>>>>>>>>
>>>>>>>>>>>> In [11]: struct_arr
>>>>>>>>>>>> Out[11]:
>>>>>>>>>>>> <pyarrow.lib.StructArray object at 0x7f9ded0016e0>
>>>>>>>>>>>> -- is_valid: all not null
>>>>>>>>>>>> -- child 0 type: timestamp[us, tz=America/Los_Angeles]
>>>>>>>>>>>>   [
>>>>>>>>>>>>     1970-01-01 00:00:00.000000,
>>>>>>>>>>>>     1970-01-01 00:00:00.000001,
>>>>>>>>>>>>     1970-01-01 00:00:00.000002
>>>>>>>>>>>>   ]
>>>>>>>>>>>>
>>>>>>>>>>>> In [12]: struct_arr.to_pandas()
>>>>>>>>>>>> Out[12]:
>>>>>>>>>>>> 0           {'f0': 1970-01-01 00:00:00}
>>>>>>>>>>>> 1    {'f0': 1970-01-01 00:00:00.000001}
>>>>>>>>>>>> 2    {'f0': 1970-01-01 00:00:00.000002}
>>>>>>>>>>>> dtype: object
>>>>>>>>>>>>
>>>>>>>>>>>> In [13]: pa.array(struct_arr.to_pandas())
>>>>>>>>>>>> Out[13]:
>>>>>>>>>>>> <pyarrow.lib.StructArray object at 0x7f9ded003210>
>>>>>>>>>>>> -- is_valid: all not null
>>>>>>>>>>>> -- child 0 type: timestamp[us]
>>>>>>>>>>>>   [
>>>>>>>>>>>>     1970-01-01 00:00:00.000000,
>>>>>>>>>>>>     1970-01-01 00:00:00.000001,
>>>>>>>>>>>>     1970-01-01 00:00:00.000002
>>>>>>>>>>>>   ]
>>>>>>>>>>>>
>>>>>>>>>>>> In [14]: pa.array(struct_arr.to_pandas()).type
>>>>>>>>>>>> Out[14]: StructType(struct<f0: timestamp[us]>)
>>>>>>>>>>>>
>>>>>>>>>>>> So while the time zone is getting stripped in both cases, the
>>>>>>>>>>>> failure to round trip is a problem. If we are going to attach the
>>>>>>>>>>>> time zone in to_pandas() then we need to respect it when going the
>>>>>>>>>>>> other way.
>>>>>>>>>>>>
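>>>>>>>>>>>> (A minimal statement of the round trip we'd want, as a sketch:)
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> import pyarrow as pa
>>>>>>>>>>>>
>>>>>>>>>>>> arr = pa.array([0, 1, 2], type=pa.timestamp('us', 'America/Los_Angeles'))
>>>>>>>>>>>> struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])
>>>>>>>>>>>>
>>>>>>>>>>>> # Going to pandas and back should preserve the child field's
>>>>>>>>>>>> # timezone as well as the stored instants; per the output above it
>>>>>>>>>>>> # currently comes back as plain timestamp[us].
>>>>>>>>>>>> roundtripped = pa.array(struct_arr.to_pandas())
>>>>>>>>>>>> assert roundtripped.type == struct_arr.type
>>>>>>>>>>>> ```
>>>>>>>>>>>>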
>>>>>>>>>>>> This looks like a regression to me, and so I'm inclined to revise
>>>>>>>>>>>> my vote on the release to -0/-1.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jul 19, 2020 at 6:46 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah I forgot that this is a "feature" of nanosecond timestamps
>>>>>>>>>>>>>
>>>>>>>>>>>>> In [21]: arr = pa.array([0, 1, 2], type=pa.timestamp('us',
>>>>>>>>>>>>> 'America/Los_Angeles'))
>>>>>>>>>>>>>
>>>>>>>>>>>>> In [22]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])
>>>>>>>>>>>>>
>>>>>>>>>>>>> In [23]: struct_arr.to_pandas()
>>>>>>>>>>>>> Out[23]:
>>>>>>>>>>>>> 0           {'f0': 1969-12-31 16:00:00-08:00}
>>>>>>>>>>>>> 1    {'f0': 1969-12-31 16:00:00.000001-08:00}
>>>>>>>>>>>>> 2    {'f0': 1969-12-31 16:00:00.000002-08:00}
>>>>>>>>>>>>> dtype: object
>>>>>>>>>>>>>
>>>>>>>>>>>>> So this is working as intended, such as it is
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Jul 19, 2020 at 6:40 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There seems to be other broken StructArray stuff
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In [14]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns',
>>>>>>>>>>>>>> 'America/Los_Angeles'))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In [15]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In [16]: struct_arr
>>>>>>>>>>>>>> Out[16]:
>>>>>>>>>>>>>> <pyarrow.lib.StructArray object at 0x7f089370f590>
>>>>>>>>>>>>>> -- is_valid: all not null
>>>>>>>>>>>>>> -- child 0 type: timestamp[ns, tz=America/Los_Angeles]
>>>>>>>>>>>>>>   [
>>>>>>>>>>>>>>     1970-01-01 00:00:00.000000000,
>>>>>>>>>>>>>>     1970-01-01 00:00:00.000000001,
>>>>>>>>>>>>>>     1970-01-01 00:00:00.000000002
>>>>>>>>>>>>>>   ]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In [17]: struct_arr.to_pandas()
>>>>>>>>>>>>>> Out[17]:
>>>>>>>>>>>>>> 0    {'f0': 0}
>>>>>>>>>>>>>> 1    {'f0': 1}
>>>>>>>>>>>>>> 2    {'f0': 2}
>>>>>>>>>>>>>> dtype: object
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All in all it appears that this part of the project needs some
>>>>>>>>>>>>>> TLC.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Jul 19, 2020 at 6:16 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, the problem is that time zones are really finicky when
>>>>>>>>>>>>>>> comparing Spark (which uses a localtime interpretation of
>>>>>>>>>>>>>>> timestamps without time zone) and Arrow (which has naive
>>>>>>>>>>>>>>> timestamps -- a concept similar to but different from the SQL
>>>>>>>>>>>>>>> concept TIMESTAMP WITHOUT TIME ZONE -- and tz-aware timestamps).
>>>>>>>>>>>>>>> So somewhere a time zone is being stripped or applied/localized,
>>>>>>>>>>>>>>> which may result in the data transferred to/from Spark being
>>>>>>>>>>>>>>> shifted by the time zone offset. I think it's important that we
>>>>>>>>>>>>>>> determine what the problem is -- if it's a problem that has to
>>>>>>>>>>>>>>> be fixed in Arrow (and it's not clear to me that it is), it's
>>>>>>>>>>>>>>> worth spending some time to understand what's going on, to avoid
>>>>>>>>>>>>>>> the possibility of a patch release on account of this.
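>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (To make the failure mode concrete, a sketch of how a
>>>>>>>>>>>>>>> strip-vs-localize mismatch shifts values by the UTC offset;
>>>>>>>>>>>>>>> plain pandas, nothing Spark-specific:)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> import pandas as pd
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Wall-clock midnight in Los Angeles, i.e. 08:00 UTC.
>>>>>>>>>>>>>>> aware = pd.Timestamp("2018-03-10 00:00:00", tz="America/Los_Angeles")
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Dropping the tz keeps the wall clock (00:00) ...
>>>>>>>>>>>>>>> stripped = aware.tz_localize(None)
>>>>>>>>>>>>>>> # ... while converting to UTC first and then dropping it shifts
>>>>>>>>>>>>>>> # the value (08:00). Mixing the two conventions is the shift.
>>>>>>>>>>>>>>> shifted = aware.tz_convert("UTC").tz_localize(None)
>>>>>>>>>>>>>>> print(stripped, shifted)
>>>>>>>>>>>>>>> ```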
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Jul 19, 2020 at 6:12 PM Neal Richardson
>>>>>>>>>>>>>>> <neal.p.richard...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If it’s a display problem, should it block the release?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jul 19, 2020, at 3:57 PM, Wes McKinney <
>>>>>>>>> wesmck...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I opened https://issues.apache.org/jira/browse/ARROW-9525
>>>>>>>>>>>>>>>>> about the display problem. My guess is that there are other
>>>>>>>>>>>>>>>>> problems lurking here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney <
>>>>>>>>>>>> wesmck...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> hi Bryan,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is a display bug
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns',
>>>>>>>>>>>>>>>>>> 'America/Los_Angeles'))
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In [7]: arr.view('int64')
>>>>>>>>>>>>>>>>>> Out[7]:
>>>>>>>>>>>>>>>>>> <pyarrow.lib.Int64Array object at 0x7fd1b8aaef30>
>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>  0,
>>>>>>>>>>>>>>>>>>  1,
>>>>>>>>>>>>>>>>>>  2
>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In [8]: arr
>>>>>>>>>>>>>>>>>> Out[8]:
>>>>>>>>>>>>>>>>>> <pyarrow.lib.TimestampArray object at 0x7fd1b8aae6e0>
>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>  1970-01-01 00:00:00.000000000,
>>>>>>>>>>>>>>>>>>  1970-01-01 00:00:00.000000001,
>>>>>>>>>>>>>>>>>>  1970-01-01 00:00:00.000000002
>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In [9]: arr.to_pandas()
>>>>>>>>>>>>>>>>>> Out[9]:
>>>>>>>>>>>>>>>>>> 0             1969-12-31 16:00:00-08:00
>>>>>>>>>>>>>>>>>> 1   1969-12-31 16:00:00.000000001-08:00
>>>>>>>>>>>>>>>>>> 2   1969-12-31 16:00:00.000000002-08:00
>>>>>>>>>>>>>>>>>> dtype: datetime64[ns, America/Los_Angeles]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> the repr of TimestampArray doesn't take into account the
>>>>>>>>>>>>>>>>>> timezone
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In [10]: arr[0]
>>>>>>>>>>>>>>>>>> Out[10]: <pyarrow.TimestampScalar: Timestamp('1969-12-31
>>>>>>>>>>>>>>>>>> 16:00:00-0800', tz='America/Los_Angeles')>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So if it's incorrect, the problem is happening somewhere
>>>>>>>>>>>>>>>>>> before or while the StructArray is being created. If I had to
>>>>>>>>>>>>>>>>>> guess, it's caused by the tzinfo of the datetime.datetime
>>>>>>>>>>>>>>>>>> values not being handled in the way that they were before.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Jul 19, 2020 at 5:19 PM Wes McKinney <
>>>>>>>>>>>> wesmck...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Well, this is not good and pretty disappointing given that
>>>>>>>>>>>>>>>>>>> we had nearly a month to sort through the implications of
>>>>>>>>>>>>>>>>>>> Micah’s patch. We should try to resolve this ASAP.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Jul 19, 2020 at 5:10 PM Bryan Cutler <
>>>>>>>>>>>> cutl...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +0 (non-binding)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I ran the verification script for binaries and then source,
>>>>>>>>>>>>>>>>>>>> as below, and both look good:
>>>>>>>>>>>>>>>>>>>> ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1
>>>>>>>>>>>>>>>>>>>> TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1
>>>>>>>>>>>>>>>>>>>> TEST_INTEGRATION_JAVA=1
>>>>>>>>>>>>>>>>>>>> dev/release/verify-release-candidate.sh source 1.0.0 1
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I tried to patch Spark locally to verify the recent change
>>>>>>>>>>>>>>>>>>>> in nested timestamps and was not able to get things working
>>>>>>>>>>>>>>>>>>>> quite right, but I'm not sure if the problem is in Spark,
>>>>>>>>>>>>>>>>>>>> Arrow or my patch - hence my vote of +0.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here is what I'm seeing
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>> (Input as datetime)
>>>>>>>>>>>>>>>>>>>> datetime.datetime(2018, 3, 10, 0, 0)
>>>>>>>>>>>>>>>>>>>> datetime.datetime(2018, 3, 15, 0, 0)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (Struct Array)
>>>>>>>>>>>>>>>>>>>> -- is_valid: all not null
>>>>>>>>>>>>>>>>>>>> -- child 0 type: timestamp[us, tz=America/Los_Angeles]
>>>>>>>>>>>>>>>>>>>>  [
>>>>>>>>>>>>>>>>>>>>    2018-03-10 00:00:00.000000,
>>>>>>>>>>>>>>>>>>>>    2018-03-10 00:00:00.000000
>>>>>>>>>>>>>>>>>>>>  ]
>>>>>>>>>>>>>>>>>>>> -- child 1 type: timestamp[us, tz=America/Los_Angeles]
>>>>>>>>>>>>>>>>>>>>  [
>>>>>>>>>>>>>>>>>>>>    2018-03-15 00:00:00.000000,
>>>>>>>>>>>>>>>>>>>>    2018-03-15 00:00:00.000000
>>>>>>>>>>>>>>>>>>>>  ]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (Flattened Arrays)
>>>>>>>>>>>>>>>>>>>> types [TimestampType(timestamp[us, tz=America/Los_Angeles]),
>>>>>>>>>>>>>>>>>>>> TimestampType(timestamp[us, tz=America/Los_Angeles])]
>>>>>>>>>>>>>>>>>>>> [<pyarrow.lib.TimestampArray object at 0x7ffbbd88f520>
>>>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>>>  2018-03-10 00:00:00.000000,
>>>>>>>>>>>>>>>>>>>>  2018-03-10 00:00:00.000000
>>>>>>>>>>>>>>>>>>>> ], <pyarrow.lib.TimestampArray object at 0x7ffba958be50>
>>>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>>>  2018-03-15 00:00:00.000000,
>>>>>>>>>>>>>>>>>>>>  2018-03-15 00:00:00.000000
>>>>>>>>>>>>>>>>>>>> ]]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (Pandas Conversion)
>>>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>>> 0   2018-03-09 16:00:00-08:00
>>>>>>>>>>>>>>>>>>>> 1   2018-03-09 16:00:00-08:00
>>>>>>>>>>>>>>>>>>>> dtype: datetime64[ns, America/Los_Angeles],
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 0   2018-03-14 17:00:00-07:00
>>>>>>>>>>>>>>>>>>>> 1   2018-03-14 17:00:00-07:00
>>>>>>>>>>>>>>>>>>>> dtype: datetime64[ns, America/Los_Angeles]]
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Based on the output of an existing correct timestamp udf,
>>>>>>>>>>>>>>>>>>>> it looks like the pyarrow Struct Array values are wrong and
>>>>>>>>>>>>>>>>>>>> that's carried through the flattened arrays, causing the
>>>>>>>>>>>>>>>>>>>> Pandas values to have a negative offset.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here is output from a working udf with a timestamp; the
>>>>>>>>>>>>>>>>>>>> pyarrow Array displays in UTC time, I believe.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>> (Timestamp Array)
>>>>>>>>>>>>>>>>>>>> type timestamp[us, tz=America/Los_Angeles]
>>>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>>>  [
>>>>>>>>>>>>>>>>>>>>    1969-01-01 09:01:01.000000
>>>>>>>>>>>>>>>>>>>>  ]
>>>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (Pandas Conversion)
>>>>>>>>>>>>>>>>>>>> 0   1969-01-01 01:01:01-08:00
>>>>>>>>>>>>>>>>>>>> Name: _0, dtype: datetime64[ns,
>>>>>> America/Los_Angeles]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (Timezone Localized)
>>>>>>>>>>>>>>>>>>>> 0   1969-01-01 01:01:01
>>>>>>>>>>>>>>>>>>>> Name: _0, dtype: datetime64[ns]
>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'll have to dig in further at another time and debug
>>>>>>>>>>>>>>>>>>>> where the values go wrong.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sat, Jul 18, 2020 at 9:51 PM Micah Kornfield <emkornfi...@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Ran wheel and binary tests on Ubuntu 19.04.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Jul 17, 2020 at 2:25 PM Neal Richardson
>>>>>>>>>>>>>>>>>>>>> <neal.p.richard...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In addition to the usual verification on
>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/arrow/pull/7787, I've successfully
>>>>>>>>>>>>>>>>>>>>>> staged the R binary artifacts on Windows
>>>>>>>>>>>>>>>>>>>>>> (https://github.com/r-windows/rtools-packages/pull/126), macOS
>>>>>>>>>>>>>>>>>>>>>> (https://github.com/autobrew/homebrew-core/pull/12), and Linux
>>>>>>>>>>>>>>>>>>>>>> (https://github.com/ursa-labs/arrow-r-nightly/actions/runs/172977277)
>>>>>>>>>>>>>>>>>>>>>> using the release candidate.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> And I agree with the judgment about skipping a JS release
>>>>>>>>>>>>>>>>>>>>>> artifact. Looks like there hasn't been a code change since
>>>>>>>>>>>>>>>>>>>>>> October, so there's no point.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Neal
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, Jul 17, 2020 at 10:37 AM Wes McKinney
>>>>>>>>>>>>>>>>>>>>>> <wesmck...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I see the JS failures as well. I think it is a failure
>>>>>>>>>>>>>>>>>>>>>>> localized to newer Node versions, since our JavaScript CI
>>>>>>>>>>>>>>>>>>>>>>> works fine. I don't think it should block the release
>>>>>>>>>>>>>>>>>>>>>>> given the lack of development activity in JavaScript [1]
>>>>>>>>>>>>>>>>>>>>>>> -- if any JS devs are concerned about publishing an
>>>>>>>>>>>>>>>>>>>>>>> artifact then we can skip pushing it to NPM.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> @Ryan it seems it may be something environment-related
>>>>>>>>>>>>>>>>>>>>>>> on your machine; I'm on Ubuntu 18.04 and have not seen
>>>>>>>>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>  * Python 3.8 wheel's tests are failed. 3.5, 3.6 and 3.7
>>>>>>>>>>>>>>>>>>>>>>>>    are passed. It seems that -larrow and -larrow_python for
>>>>>>>>>>>>>>>>>>>>>>>>    Cython are failed.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I suspect this is related to
>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/arrow/commit/120c21f4bf66d2901b3a353a1f67bac3c3355924#diff-0f69784b44040448d17d0e4e8a641fe8,
>>>>>>>>>>>>>>>>>>>>>>> but I don't think it's a blocking issue.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [1]: https://github.com/apache/arrow/commits/master/js
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jul 17, 2020 at 9:42 AM Ryan Murray <rym...@dremio.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I've tested Java and it looks good. However, the verify
>>>>>>>>>>>>>>>>>>>>>>>> script keeps bailing with protobuf-related errors:
>>>>>>>>>>>>>>>>>>>>>>>> 'cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc'
>>>>>>>>>>>>>>>>>>>>>>>> and friends can't find protobuf definitions. A bit odd, as
>>>>>>>>>>>>>>>>>>>>>>>> cmake can see protobuf headers and builds directly off
>>>>>>>>>>>>>>>>>>>>>>>> master work just fine. Has anyone else experienced this?
>>>>>>>>>>>>>>>>>>>>>>>> I am on Ubuntu 18.04.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jul 17, 2020 at 10:49 AM Antoine Pitrou
>>>>>>>>>>>>>>>>>>>>>>>> <anto...@python.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> +1 (binding).  I tested on Ubuntu 18.04.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> * Wheels verification went fine.
>>>>>>>>>>>>>>>>>>>>>>>>> * Source verification went fine with CUDA
>>>>>>> enabled
>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>> TEST_INTEGRATION_JS=0 TEST_JS=0.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I didn't test the binaries.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Antoine.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 17/07/2020 at 03:41, Krisztián Szűcs wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to propose the second release candidate
>>>>>>>>>>>>>>>>>>>>>>>>>> (RC1) of Apache Arrow version 1.0.0.
>>>>>>>>>>>>>>>>>>>>>>>>>> This is a major release consisting of 826 resolved JIRA
>>>>>>>>>>>>>>>>>>>>>>>>>> issues [1].
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The verification of the first release candidate (RC0)
>>>>>>>>>>>>>>>>>>>>>>>>>> failed [0], and the packaging scripts were unable to
>>>>>>>>>>>>>>>>>>>>>>>>>> produce two wheels. Compared to RC0, this release
>>>>>>>>>>>>>>>>>>>>>>>>>> candidate includes additional patches for the following
>>>>>>>>>>>>>>>>>>>>>>>>>> bugs: ARROW-9506, ARROW-9504, ARROW-9497, ARROW-9500,
>>>>>>>>>>>>>>>>>>>>>>>>>> ARROW-9499.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> This release candidate is based on commit:
>>>>>>>>>>>>>>>>>>>>>>>>>> bc0649541859095ee77d03a7b891ea8d6e2fd641 [2]
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The source release rc1 is hosted at [3].
>>>>>>>>>>>>>>>>>>>>>>>>>> The binary artifacts are hosted at [4][5][6][7].
>>>>>>>>>>>>>>>>>>>>>>>>>> The changelog is located at [8].
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Please download, verify checksums and signatures, run
>>>>>>>>>>>>>>>>>>>>>>>>>> the unit tests, and vote on the release. See [9] for
>>>>>>>>>>>>>>>>>>>>>>>>>> how to validate a release candidate.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The vote will be open for at least 72 hours.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [ ] +1 Release this as Apache Arrow 1.0.0
>>>>>>>>>>>>>>>>>>>>>>>>>> [ ] +0
>>>>>>>>>>>>>>>>>>>>>>>>>> [ ] -1 Do not release this as Apache Arrow 1.0.0 because...
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [0]: https://github.com/apache/arrow/pull/7778#issuecomment-659065370
>>>>>>>>>>>>>>>>>>>>>>>>>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%201.0.0
>>>>>>>>>>>>>>>>>>>>>>>>>> [2]: https://github.com/apache/arrow/tree/bc0649541859095ee77d03a7b891ea8d6e2fd641
>>>>>>>>>>>>>>>>>>>>>>>>>> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-1.0.0-rc1
>>>>>>>>>>>>>>>>>>>>>>>>>> [4]: https://bintray.com/apache/arrow/centos-rc/1.0.0-rc1
>>>>>>>>>>>>>>>>>>>>>>>>>> [5]: https://bintray.com/apache/arrow/debian-rc/1.0.0-rc1
>>>>>>>>>>>>>>>>>>>>>>>>>> [6]: https://bintray.com/apache/arrow/python-rc/1.0.0-rc1
>>>>>>>>>>>>>>>>>>>>>>>>>> [7]: https://bintray.com/apache/arrow/ubuntu-rc/1.0.0-rc1
>>>>>>>>>>>>>>>>>>>>>>>>>> [8]: https://github.com/apache/arrow/blob/bc0649541859095ee77d03a7b891ea8d6e2fd641/CHANGELOG.md
>>>>>>>>>>>>>>>>>>>>>>>>>> [9]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
