If the release condition is for the regression to be fixed in less than 24 hours (less than 12 hours now?), I think we should simply revert the original PR and work on a fix more leisurely for 1.1.0 (or even 1.0.1). Unless it really causes havoc for Spark users, in which case a workaround should be found.

Regards

Antoine.

On 20/07/2020 at 16:46, Krisztián Szűcs wrote:

If I understand correctly, we used to just store the timestamp and the timezone if an explicit arrow type was passed during the python->arrow conversion, but the timestamp values were not changed in any way. Micah's current patch changes the python->arrow conversion behavior to normalize all values to UTC timestamps.

While it's definitely an improvement over the previously ignored timezones, I'm not sure that it won't cause unexpected regressions in users' codebases. I'm still trying to better understand the issue and its compatibility implications, but my intuition tells me that we should apply the reversion instead and properly handle the datetime value conversions in an upcoming minor release.

Either way, we should move this conversation to the pull request [1], because the code snippets pasted here are hardly readable.

[1]: https://github.com/apache/arrow/pull/7805

On Mon, Jul 20, 2020 at 9:40 AM Sutou Kouhei <k...@clear-code.com> wrote:

Done: https://github.com/apache/arrow/pull/7805#issuecomment-660855376

We can use ...-3.8-..., not ...-3.7-..., because we don't have a ...-3.7-... task in https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml.

In <cak7z5t8hqcsd3meg42cuzkscpjr3zndsvrjmm8vied0gzto...@mail.gmail.com> "Re: [VOTE] Release Apache Arrow 1.0.0 - RC1" on Mon, 20 Jul 2020 00:14:00 -0700, Micah Kornfield <emkornfi...@gmail.com> wrote:

FYI, I'm not sure if it is a permissions issue or I've done something wrong, but github-actions does not seem to be responding to "@github-actions crossbow submit test-conda-python-3.7-spark-master" when I enter it. If someone could kick off the Spark integration test, I would be grateful.
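The "normalize all values to UTC timestamps" behavior Krisztián describes can be sketched with stdlib datetimes (the `to_epoch_micros` helper is a hypothetical illustration of the semantics, not pyarrow's actual code path):

```python
from datetime import datetime, timedelta, timezone

def to_epoch_micros(dt: datetime) -> int:
    # Normalize a tz-aware datetime to its UTC epoch value in microseconds,
    # the int64 storage convention for a timestamp column with a timezone.
    if dt.tzinfo is None:
        raise ValueError("naive datetime has no timezone to normalize from")
    return int(dt.timestamp() * 1_000_000)

# 1969-12-31 16:00 at UTC-8 is exactly the Unix epoch in UTC.
pst = timezone(timedelta(hours=-8))
print(to_epoch_micros(datetime(1969, 12, 31, 16, 0, tzinfo=pst)))  # 0
```

Under this convention the stored integers no longer depend on the session's local zone, which is why values previously ingested without normalization can appear shifted afterwards.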
On Mon, Jul 20, 2020 at 12:09 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

Thanks Bryan. I cherry-picked your change onto my change [1], which now honors timezone-aware datetime objects on ingestion. I've kicked off the Spark integration tests.

If this change doesn't work, I think the correct course of action is to provide an environment variable in Python to turn back to the old behavior (ignoring timezones on conversion). I think honoring timezone information where possible is a strict improvement, but I agree we should give users an option to not break if they wish to upgrade to the latest version. I need to get some sleep, but I will have another PR posted tomorrow evening if the current one doesn't unblock the release.

[1]: https://github.com/apache/arrow/pull/7805

On Sun, Jul 19, 2020 at 10:50 PM Bryan Cutler <cutl...@gmail.com> wrote:

I'd rather not see ARROW-9223 reverted, if possible. I will put up my hacked patch to Spark for this so we can test against it if needed, and could share my branch if anyone else wants to test it locally.

On Sun, Jul 19, 2020 at 7:35 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

I'll spend some time tonight on it, and if I can't get the round trip working I'll handle reverting.

On Sunday, July 19, 2020, Wes McKinney <wesmck...@gmail.com> wrote:

Since people are running this code in production, we need to be careful about disrupting them. Unfortunately, I'm at the limit of how much time I can spend on this, but releasing with ARROW-9223 as is (without being partially or fully reverted) makes me deeply uncomfortable. So I hope the matter can be resolved.

On Sun, Jul 19, 2020 at 7:33 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:

It sounds like you may have identified a pyarrow bug, which sounds not good, though I don't know enough about the broader context to know whether this is (1) worse than 0.17 or (2) release blocking. I defer to y'all who know better.

If there are quirks in how Spark handles timezone-naive timestamps, shouldn't the fix/workaround go in pyspark, not pyarrow? For what it's worth, I dealt with similar Spark timezone issues in R recently: https://github.com/sparklyr/sparklyr/issues/2439. I dealt with it (in sparklyr, not the arrow R package) by always setting a timezone when sending data to Spark. Not ideal, but it made the numbers "right".

Neal

On Sun, Jul 19, 2020 at 5:13 PM Wes McKinney <wesmck...@gmail.com> wrote:

Honestly, I think reverting is the best option. This change evidently needs more time to "season", and perhaps this is motivation to enhance test coverage in a number of places.

On Sun, Jul 19, 2020 at 7:11 PM Wes McKinney <wesmck...@gmail.com> wrote:

I am OK with any solution that doesn't delay the production of the next RC by more than 24 hours.

On Sun, Jul 19, 2020 at 7:08 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

If I read the example right, it looks like constructing from Python types isn't keeping timezone info in place? I can try to make a patch that fixes that tonight, or would the preference be to revert my patch? (Note: I think another pre-existing bug was fixed in my PR as well.)

-Micah

On Sunday, July 19, 2020, Wes McKinney <wesmck...@gmail.com> wrote:

I think I see the problem now:

In [40]: parr
Out[40]:
0       {'f0': 1969-12-31 16:00:00-08:00}
1    {'f0': 1969-12-31 16:00:00.000001-08:00}
2    {'f0': 1969-12-31 16:00:00.000002-08:00}
dtype: object

In [41]: parr[0]['f0']
Out[41]: datetime.datetime(1969, 12, 31, 16, 0, tzinfo=<DstTzInfo 'America/Los_Angeles' PST-1 day, 16:00:00 STD>)

In [42]: pa.array(parr)
Out[42]:
<pyarrow.lib.StructArray object at 0x7f0893706a60>
-- is_valid: all not null
-- child 0 type: timestamp[us]
  [
    1969-12-31 16:00:00.000000,
    1969-12-31 16:00:00.000001,
    1969-12-31 16:00:00.000002
  ]

In [43]: pa.array(parr).field(0).type
Out[43]: TimestampType(timestamp[us])

On 0.17.1:

In [8]: arr = pa.array([0, 1, 2], type=pa.timestamp('us', 'America/Los_Angeles'))

In [9]: arr
Out[9]:
<pyarrow.lib.TimestampArray object at 0x7f9dede69d00>
[
  1970-01-01 00:00:00.000000,
  1970-01-01 00:00:00.000001,
  1970-01-01 00:00:00.000002
]

In [10]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])

In [11]: struct_arr
Out[11]:
<pyarrow.lib.StructArray object at 0x7f9ded0016e0>
-- is_valid: all not null
-- child 0 type: timestamp[us, tz=America/Los_Angeles]
  [
    1970-01-01 00:00:00.000000,
    1970-01-01 00:00:00.000001,
    1970-01-01 00:00:00.000002
  ]

In [12]: struct_arr.to_pandas()
Out[12]:
0         {'f0': 1970-01-01 00:00:00}
1    {'f0': 1970-01-01 00:00:00.000001}
2    {'f0': 1970-01-01 00:00:00.000002}
dtype: object

In [13]: pa.array(struct_arr.to_pandas())
Out[13]:
<pyarrow.lib.StructArray object at 0x7f9ded003210>
-- is_valid: all not null
-- child 0 type: timestamp[us]
  [
    1970-01-01 00:00:00.000000,
    1970-01-01 00:00:00.000001,
    1970-01-01 00:00:00.000002
  ]

In [14]: pa.array(struct_arr.to_pandas()).type
Out[14]: StructType(struct<f0: timestamp[us]>)

So while the time zone is getting stripped in both cases, the failure to round trip is a problem. If we are going to attach the time zone in to_pandas(), then we need to respect it when going the other way.

This looks like a regression to me, and so I'm inclined to revise my vote on the release to -0/-1.

On Sun, Jul 19, 2020 at 6:46 PM Wes McKinney <wesmck...@gmail.com> wrote:

Ah, I forgot that this is a "feature" of nanosecond timestamps:

In [21]: arr = pa.array([0, 1, 2], type=pa.timestamp('us', 'America/Los_Angeles'))

In [22]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])

In [23]: struct_arr.to_pandas()
Out[23]:
0       {'f0': 1969-12-31 16:00:00-08:00}
1    {'f0': 1969-12-31 16:00:00.000001-08:00}
2    {'f0': 1969-12-31 16:00:00.000002-08:00}
dtype: object

So this is working as intended, such as it is.

On Sun, Jul 19, 2020 at 6:40 PM Wes McKinney <wesmck...@gmail.com> wrote:

There seems to be other broken StructArray stuff:

In [14]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))

In [15]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])

In [16]: struct_arr
Out[16]:
<pyarrow.lib.StructArray object at 0x7f089370f590>
-- is_valid: all not null
-- child 0 type: timestamp[ns, tz=America/Los_Angeles]
  [
    1970-01-01 00:00:00.000000000,
    1970-01-01 00:00:00.000000001,
    1970-01-01 00:00:00.000000002
  ]

In [17]: struct_arr.to_pandas()
Out[17]:
0    {'f0': 0}
1    {'f0': 1}
2    {'f0': 2}
dtype: object

All in all, it appears that this part of the project needs some TLC.

On Sun, Jul 19, 2020 at 6:16 PM Wes McKinney <wesmck...@gmail.com> wrote:

Well, the problem is that time zones are really finicky when comparing Spark (which uses a localtime interpretation of timestamps without time zone) and Arrow (which has naive timestamps -- a concept similar to, but different from, the SQL concept TIMESTAMP WITHOUT TIME ZONE -- and tz-aware timestamps). So somewhere a time zone is being stripped or applied/localized, which may result in the data transferred to/from Spark being shifted by the time zone offset. I think it's important that we determine what the problem is -- if it's a problem that has to be fixed in Arrow (and it's not clear to me that it is), it's worth spending some time to understand what's going on, to avoid the possibility of a patch release on account of this.

On Sun, Jul 19, 2020 at 6:12 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:

If it's a display problem, should it block the release?

Sent from my iPhone

On Jul 19, 2020, at 3:57 PM, Wes McKinney <wesmck...@gmail.com> wrote:

I opened https://issues.apache.org/jira/browse/ARROW-9525 about the display problem.
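The round-trip failure described above can be reproduced with stdlib datetimes alone (a fixed -08:00 offset stands in for America/Los_Angeles; this sketches the semantics, not pyarrow's implementation):

```python
from datetime import datetime, timedelta, timezone

la = timezone(timedelta(hours=-8))  # fixed offset standing in for America/Los_Angeles
instant = datetime(1970, 1, 1, tzinfo=timezone.utc).astimezone(la)  # 1969-12-31 16:00:00-08:00

# Correct round trip: normalize to UTC for storage, reattach UTC, localize back.
stored = instant.astimezone(timezone.utc).replace(tzinfo=None)
restored = stored.replace(tzinfo=timezone.utc).astimezone(la)
assert restored == instant

# Broken round trip: strip the zone from the localized wall clock, then
# reinterpret that naive value as UTC -- the instant shifts by the offset.
stripped = instant.replace(tzinfo=None)          # 1969-12-31 16:00:00, zone lost
shifted = stripped.replace(tzinfo=timezone.utc)
print(instant - shifted)  # 8:00:00
```

This is the point of Wes's complaint: once to_pandas() attaches the zone, ingestion has to take the localized wall clock back through UTC, or each round trip silently shifts the data by the offset.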
My guess is that there are other problems lurking here.

On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney <wesmck...@gmail.com> wrote:

Hi Bryan,

This is a display bug:

In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))

In [7]: arr.view('int64')
Out[7]:
<pyarrow.lib.Int64Array object at 0x7fd1b8aaef30>
[
  0,
  1,
  2
]

In [8]: arr
Out[8]:
<pyarrow.lib.TimestampArray object at 0x7fd1b8aae6e0>
[
  1970-01-01 00:00:00.000000000,
  1970-01-01 00:00:00.000000001,
  1970-01-01 00:00:00.000000002
]

In [9]: arr.to_pandas()
Out[9]:
0             1969-12-31 16:00:00-08:00
1   1969-12-31 16:00:00.000000001-08:00
2   1969-12-31 16:00:00.000000002-08:00
dtype: datetime64[ns, America/Los_Angeles]

The repr of TimestampArray doesn't take the timezone into account:

In [10]: arr[0]
Out[10]: <pyarrow.TimestampScalar: Timestamp('1969-12-31 16:00:00-0800', tz='America/Los_Angeles')>

So if it's incorrect, the problem is happening somewhere before or while the StructArray is being created. If I had to guess, it's caused by the tzinfo of the datetime.datetime values not being handled in the way that they were before.

On Sun, Jul 19, 2020 at 5:19 PM Wes McKinney <wesmck...@gmail.com> wrote:

Well, this is not good and pretty disappointing, given that we had nearly a month to sort through the implications of Micah's patch. We should try to resolve this ASAP.

On Sun, Jul 19, 2020 at 5:10 PM Bryan Cutler <cutl...@gmail.com> wrote:

+0 (non-binding)

I ran the verification script for binaries and then source, as below, and both look good:

ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1 dev/release/verify-release-candidate.sh source 1.0.0 1

I tried to patch Spark locally to verify the recent change in nested timestamps and was not able to get things working quite right, but I'm not sure if the problem is in Spark, Arrow, or my patch -- hence my vote of +0.
Here is what I'm seeing:

```
(Input as datetime)
datetime.datetime(2018, 3, 10, 0, 0)
datetime.datetime(2018, 3, 15, 0, 0)

(Struct Array)
-- is_valid: all not null
-- child 0 type: timestamp[us, tz=America/Los_Angeles]
  [
    2018-03-10 00:00:00.000000,
    2018-03-10 00:00:00.000000
  ]
-- child 1 type: timestamp[us, tz=America/Los_Angeles]
  [
    2018-03-15 00:00:00.000000,
    2018-03-15 00:00:00.000000
  ]

(Flattened Arrays)
types [TimestampType(timestamp[us, tz=America/Los_Angeles]),
TimestampType(timestamp[us, tz=America/Los_Angeles])]
[<pyarrow.lib.TimestampArray object at 0x7ffbbd88f520>
[
  2018-03-10 00:00:00.000000,
  2018-03-10 00:00:00.000000
], <pyarrow.lib.TimestampArray object at 0x7ffba958be50>
[
  2018-03-15 00:00:00.000000,
  2018-03-15 00:00:00.000000
]]

(Pandas Conversion)
[0   2018-03-09 16:00:00-08:00
1    2018-03-09 16:00:00-08:00
dtype: datetime64[ns, America/Los_Angeles],
0   2018-03-14 17:00:00-07:00
1    2018-03-14 17:00:00-07:00
dtype: datetime64[ns, America/Los_Angeles]]
```

Based on the output of an existing, correct timestamp udf, it looks like the pyarrow Struct Array values are wrong, and that's carried through the flattened arrays, causing the Pandas values to have a negative offset.

Here is output from a working udf with a timestamp; the pyarrow Array displays in UTC time, I believe:

```
(Timestamp Array)
type timestamp[us, tz=America/Los_Angeles]
[
  [
    1969-01-01 09:01:01.000000
  ]
]

(Pandas Conversion)
0   1969-01-01 01:01:01-08:00
Name: _0, dtype: datetime64[ns, America/Los_Angeles]

(Timezone Localized)
0   1969-01-01 01:01:01
Name: _0, dtype: datetime64[ns]
```

I'll have to dig in further at another time and debug where the values go wrong.
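Bryan's "(Pandas Conversion)" output is consistent with the input wall clock being stored as if it were a UTC instant and then localized for display. A stdlib sketch of that interpretation (a fixed -08:00 offset stands in for America/Los_Angeles, which is still on PST on 2018-03-10; this is an illustration, not the pyarrow code path):

```python
from datetime import datetime, timedelta, timezone

la = timezone(timedelta(hours=-8))  # PST; LA is at UTC-8 on 2018-03-10

# The naive input wall clock, stored as if it were a UTC instant
# (i.e. no normalization applied on ingestion)...
stored = datetime(2018, 3, 10, 0, 0, tzinfo=timezone.utc)

# ...then localized for display, producing the apparent -8 hour shift.
print(stored.astimezone(la))  # 2018-03-09 16:00:00-08:00
```

The 2018-03-14 values showing a -07:00 offset in Bryan's output would follow the same pattern, with the real America/Los_Angeles zone having switched to PDT by that date.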
On Sat, Jul 18, 2020 at 9:51 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

+1 (binding)

Ran wheel and binary tests on Ubuntu 19.04.

On Fri, Jul 17, 2020 at 2:25 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:

+1 (binding)

In addition to the usual verification on https://github.com/apache/arrow/pull/7787, I've successfully staged the R binary artifacts on Windows (https://github.com/r-windows/rtools-packages/pull/126), macOS (https://github.com/autobrew/homebrew-core/pull/12), and Linux (https://github.com/ursa-labs/arrow-r-nightly/actions/runs/172977277) using the release candidate.

And I agree with the judgment about skipping a JS release artifact. It looks like there hasn't been a code change since October, so there's no point.

Neal

On Fri, Jul 17, 2020 at 10:37 AM Wes McKinney <wesmck...@gmail.com> wrote:

I see the JS failures as well. I think it is a failure localized to newer Node versions, since our JavaScript CI works fine. I don't think it should block the release given the lack of development activity in JavaScript [1] -- if any JS devs are concerned about publishing an artifact, then we can skip pushing it to NPM.

@Ryan, it seems it may be something environment-related on your machine; I'm on Ubuntu 18.04 and have not seen this.

On:

> * Python 3.8 wheel's tests failed. 3.5, 3.6 and 3.7 passed. It seems
> that -larrow and -larrow_python for Cython failed.

I suspect this is related to https://github.com/apache/arrow/commit/120c21f4bf66d2901b3a353a1f67bac3c3355924#diff-0f69784b44040448d17d0e4e8a641fe8, but I don't think it's a blocking issue.

[1]: https://github.com/apache/arrow/commits/master/js

On Fri, Jul 17, 2020 at 9:42 AM Ryan Murray <rym...@dremio.com> wrote:

I've tested Java and it looks good. However, the verify script keeps bailing with protobuf-related errors: 'cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc' and friends can't find protobuf definitions.
A bit odd, as cmake can see the protobuf headers and builds directly off master work just fine. Has anyone else experienced this? I am on Ubuntu 18.04.

On Fri, Jul 17, 2020 at 10:49 AM Antoine Pitrou <anto...@python.org> wrote:

+1 (binding). I tested on Ubuntu 18.04.

* Wheels verification went fine.
* Source verification went fine with CUDA enabled and TEST_INTEGRATION_JS=0 TEST_JS=0.

I didn't test the binaries.

Regards

Antoine.

On 17/07/2020 at 03:41, Krisztián Szűcs wrote:

Hi,

I would like to propose the second release candidate (RC1) of Apache Arrow version 1.0.0. This is a major release consisting of 826 resolved JIRA issues [1].

The verification of the first release candidate (RC0) has failed [0], and the packaging scripts were unable to produce two wheels. Compared to RC0, this release candidate includes additional patches for the following bugs: ARROW-9506, ARROW-9504, ARROW-9497, ARROW-9500, ARROW-9499.

This release candidate is based on commit: bc0649541859095ee77d03a7b891ea8d6e2fd641 [2]

The source release rc1 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7].
The changelog is located at [8].

Please download, verify checksums and signatures, run the unit tests, and vote on the release. See [9] for how to validate a release candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 1.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow 1.0.0 because...

[0]: https://github.com/apache/arrow/pull/7778#issuecomment-659065370
[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%201.0.0
[2]: https://github.com/apache/arrow/tree/bc0649541859095ee77d03a7b891ea8d6e2fd641
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-1.0.0-rc1
[4]: https://bintray.com/apache/arrow/centos-rc/1.0.0-rc1
[5]: https://bintray.com/apache/arrow/debian-rc/1.0.0-rc1
[6]: https://bintray.com/apache/arrow/python-rc/1.0.0-rc1
[7]: https://bintray.com/apache/arrow/ubuntu-rc/1.0.0-rc1
[8]: https://github.com/apache/arrow/blob/bc0649541859095ee77d03a7b891ea8d6e2fd641/CHANGELOG.md
[9]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates