On Tue, Jul 21, 2020 at 1:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> Just to summarize my understanding:
> 1. We will live with the rollback of the CL.
> 2. A new RC is being cut with this rollback.
>
> I think this is OK. I'm not going to rush the proper fix or the flags in the current PR which tries to fix it.

+1

> But I would like to make another PR which disables `to_pandas(timestamp_as_object=True)`. Before I put in the effort to do this, I'd like to gauge whether people feel it is worth cutting a new RC over.

Personally I don't have a strong opinion on this; I'm fine with either cutting a new RC (tomorrow) or leaving it as is.
Perhaps we'll have to cut another RC anyway because of a weird conda-win packaging failure that occurred after a conda-forge update; it's still unclear whether we'll be able to solve it directly in the conda-forge feedstock (if it will be present there at all). Waiting for @Uwe Korn's response on it.

On Mon, Jul 20, 2020 at 2:56 PM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:

On Mon, Jul 20, 2020 at 11:00 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > If yes, then the `timestamp_as_object` keyword argument seems like a new feature, so strictly speaking it's not a regression compared to the previous release.
>
> Yes. I don't think we should be releasing new features that are known to be half-baked and, based on discussions elsewhere, will likely need a backward compatibility mode just in case users come to rely on the flawed implementation.

Ehh, I just read your response, and I have already cut RC2 including ARROW-5359 [1]. I'm afraid I won't be able to cut another RC today, so I'll finish this one.

[1]: https://github.com/apache/arrow/commit/11ee468dcd32196d49332b3b7001ca33d959eafd

> I think we should remove the flag, or cause it to error, for the 1.0 release at least, so we aren't digging ourselves further into a hole.

On Mon, Jul 20, 2020 at 12:41 PM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:

The conversations in the pull requests are pretty broad so I'm just guessing, but do you mean that `to_pandas(timestamp_as_object=True)` drops the timezone information? If yes, then the `timestamp_as_object` keyword argument seems like a new feature, so strictly speaking it's not a regression compared to the previous release.

I agree that we shouldn't leave known bugs in (I don't like it either), but I'm afraid proper timezone support will require more effort.
For example, we currently also strip timezone information when converting from datetime.time(..., tzinfo) objects, and timezone support is missing from the temporal casts.

On Mon, Jul 20, 2020 at 7:36 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

I just wanted to clarify: doing a full rollback of the patch means that https://issues.apache.org/jira/browse/ARROW-5359 would get released out of the gate with a bug in it.

On Mon, Jul 20, 2020 at 7:48 AM Antoine Pitrou <anto...@python.org> wrote:

If the release condition is for the regression to be fixed in less than 24 hours (less than 12 hours now?), I think we should simply revert the original PR and work on a fix more leisurely for 1.1.0 (or even 1.0.1). Unless it really causes havoc for Spark users, in which case a circumvention should be found.

Regards

Antoine.

On 20/07/2020 at 16:46, Krisztián Szűcs wrote:

If I understand correctly, we used to just store the timestamp and the timezone if an explicit arrow type was passed during the python->arrow conversion, but the timestamp values were not changed in any way. Micah's current patch changes the python->arrow conversion behavior to normalize all values to UTC timestamps.

While it's definitely an improvement over the previously ignored timezones, I'm not sure that it won't cause unexpected regressions in the users' codebases. I'm still trying to better understand the issue and its compatibility implications, but my intuition tells me that we should apply the reversion instead and properly handle the datetime value conversions in an upcoming minor release.
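For readers following the thread, the two python->arrow conversion behaviors being debated (dropping tzinfo vs. normalizing to UTC) can be sketched with the standard library alone; this is an illustrative sketch, not pyarrow's actual code, and the fixed offset merely stands in for a real zone:

```python
from datetime import datetime, timezone, timedelta

# Fixed-offset stand-in for America/Los_Angeles in winter (PST, UTC-8).
PST = timezone(timedelta(hours=-8))

aware = datetime(1970, 1, 1, 0, 0, 0, tzinfo=PST)

# Old behavior (pre-patch): tzinfo is simply dropped, keeping the wall-clock value.
naive_wall_clock = aware.replace(tzinfo=None)

# New behavior (Micah's patch, conceptually): normalize to UTC before storing.
utc_normalized = aware.astimezone(timezone.utc).replace(tzinfo=None)

print(naive_wall_clock)  # 1970-01-01 00:00:00
print(utc_normalized)    # 1970-01-01 08:00:00
```

The eight-hour difference between the two results is exactly the kind of silent shift that makes the change risky for existing user code.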
Either way, we should move this conversation to the pull request [1], because the code snippets pasted here are hardly readable.

[1]: https://github.com/apache/arrow/pull/7805

On Mon, Jul 20, 2020 at 9:40 AM Sutou Kouhei <k...@clear-code.com> wrote:

Done: https://github.com/apache/arrow/pull/7805#issuecomment-660855376

We can use ...-3.8-..., not ...-3.7-..., because we don't have a ...-3.7-... task in https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml.

In <cak7z5t8hqcsd3meg42cuzkscpjr3zndsvrjmm8vied0gzto...@mail.gmail.com> "Re: [VOTE] Release Apache Arrow 1.0.0 - RC1" on Mon, 20 Jul 2020 00:14:00 -0700, Micah Kornfield <emkornfi...@gmail.com> wrote:

FYI, I'm not sure if it is a permissions issue or I've done something wrong, but github-actions does not seem to be responding to "@github-actions crossbow submit test-conda-python-3.7-spark-master" when I enter it. If someone could kick off the Spark integration test, I would be grateful.

On Mon, Jul 20, 2020 at 12:09 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

Thanks Bryan. I cherry-picked your change onto my change [1], which now honors timezone-aware datetime objects on ingestion. I've kicked off the Spark integration tests.
If this change doesn't work, I think the correct course of action is to provide an environment variable in Python to turn back on the old behavior (ignoring timezones on conversion). I think honoring timezone information where possible is a strict improvement, but I agree we should give users an option to avoid breakage if they wish to upgrade to the latest version. I need to get some sleep, but I will have another PR posted tomorrow evening if the current one doesn't unblock the release.

[1] https://github.com/apache/arrow/pull/7805

On Sun, Jul 19, 2020 at 10:50 PM Bryan Cutler <cutl...@gmail.com> wrote:

I'd rather not see ARROW-9223 reverted, if possible. I will put up my hacked patch to Spark for this so we can test against it if needed, and could share my branch if anyone else wants to test it locally.
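The environment-variable escape hatch proposed above could look roughly like the following sketch. The variable name and the `convert` helper are hypothetical (no name was settled in this thread); this only illustrates gating the old tzinfo-dropping behavior behind a flag:

```python
import os
from datetime import datetime, timezone

# Hypothetical flag name, for illustration only.
IGNORE_TZ = os.environ.get("PYARROW_IGNORE_TIMEZONE", "0") == "1"


def convert(dt: datetime) -> datetime:
    """Illustrative conversion step.

    Legacy mode (flag set): drop tzinfo outright, keeping the wall-clock value.
    New mode: normalize aware datetimes to UTC before storing them as naive.
    """
    if IGNORE_TZ or dt.tzinfo is None:
        return dt.replace(tzinfo=None)
    return dt.astimezone(timezone.utc).replace(tzinfo=None)
```

With the flag unset, an aware input at UTC-8 comes out shifted to its UTC wall clock; with the flag set, it keeps its local wall clock, matching the pre-patch behavior.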
On Sun, Jul 19, 2020 at 7:35 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

I'll spend some time tonight on it, and if I can't get the round trip working I'll handle reverting.

On Sunday, July 19, 2020, Wes McKinney <wesmck...@gmail.com> wrote:

On Sun, Jul 19, 2020 at 7:33 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> It sounds like you may have identified a pyarrow bug, which sounds not good, though I don't know enough about the broader context to know whether this is (1) worse than 0.17 or (2) release blocking. I defer to y'all who know better.
>
> If there are quirks in how Spark handles timezone-naive timestamps, shouldn't the fix/workaround go in pyspark, not pyarrow? For what it's worth, I dealt with similar Spark timezone issues in R recently: https://github.com/sparklyr/sparklyr/issues/2439 I handled it (in sparklyr, not the arrow R package) by always setting a timezone when sending data to Spark. Not ideal, but it made the numbers "right".

Since people are running this code in production, we need to be careful.
Unfortunately I'm at the limit of how much time I can spend on this, but releasing with ARROW-9223 as is (without it being partially or fully reverted) makes me deeply uncomfortable. So I hope the matter can be resolved.

> Neal

On Sun, Jul 19, 2020 at 5:13 PM Wes McKinney <wesmck...@gmail.com> wrote:

Honestly, I think reverting is the best option. This change evidently needs more time to "season", and perhaps this is motivation to enhance test coverage in a number of places.

On Sun, Jul 19, 2020 at 7:11 PM Wes McKinney <wesmck...@gmail.com> wrote:

I am OK with any solution that doesn't delay the production of the next RC by more than 24 hours.

On Sun, Jul 19, 2020 at 7:08 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

If I read the example right, it looks like constructing from python types isn't keeping timezones in place?
I can try to make a patch that fixes that tonight, or would the preference be to revert my patch? (Note that I think another pre-existing bug was fixed in my PR as well.)

-Micah

On Sunday, July 19, 2020, Wes McKinney <wesmck...@gmail.com> wrote:

I think I see the problem now:

In [40]: parr
Out[40]:
0          {'f0': 1969-12-31 16:00:00-08:00}
1   {'f0': 1969-12-31 16:00:00.000001-08:00}
2   {'f0': 1969-12-31 16:00:00.000002-08:00}
dtype: object

In [41]: parr[0]['f0']
Out[41]: datetime.datetime(1969, 12, 31, 16, 0, tzinfo=<DstTzInfo 'America/Los_Angeles' PST-1 day, 16:00:00 STD>)

In [42]: pa.array(parr)
Out[42]:
<pyarrow.lib.StructArray object at 0x7f0893706a60>
-- is_valid: all not null
-- child 0 type: timestamp[us]
  [
    1969-12-31 16:00:00.000000,
    1969-12-31 16:00:00.000001,
    1969-12-31 16:00:00.000002
  ]

In [43]: pa.array(parr).field(0).type
Out[43]: TimestampType(timestamp[us])

On 0.17.1:

In [8]: arr = pa.array([0, 1, 2], type=pa.timestamp('us', 'America/Los_Angeles'))

In [9]: arr
Out[9]:
<pyarrow.lib.TimestampArray object at 0x7f9dede69d00>
[
  1970-01-01 00:00:00.000000,
  1970-01-01 00:00:00.000001,
  1970-01-01 00:00:00.000002
]

In [10]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])

In [11]: struct_arr
Out[11]:
<pyarrow.lib.StructArray object at 0x7f9ded0016e0>
-- is_valid: all not null
-- child 0 type: timestamp[us, tz=America/Los_Angeles]
  [
    1970-01-01 00:00:00.000000,
    1970-01-01 00:00:00.000001,
    1970-01-01 00:00:00.000002
  ]

In [12]: struct_arr.to_pandas()
Out[12]:
0          {'f0': 1970-01-01 00:00:00}
1   {'f0': 1970-01-01 00:00:00.000001}
2   {'f0': 1970-01-01 00:00:00.000002}
dtype: object

In [13]: pa.array(struct_arr.to_pandas())
Out[13]:
<pyarrow.lib.StructArray object at 0x7f9ded003210>
-- is_valid: all not null
-- child 0 type: timestamp[us]
  [
    1970-01-01 00:00:00.000000,
    1970-01-01 00:00:00.000001,
    1970-01-01 00:00:00.000002
  ]

In [14]: pa.array(struct_arr.to_pandas()).type
Out[14]: StructType(struct<f0: timestamp[us]>)

So while the time zone is getting stripped in both cases, the failure to round trip is a problem. If we are going to attach the time zone in to_pandas(), then we need to respect it when going the other way.

This looks like a regression to me, and so I'm inclined to revise my vote on the release to -0/-1.

On Sun, Jul 19, 2020 at 6:46 PM Wes McKinney <wesmck...@gmail.com> wrote:

Ah, I forgot that this is a "feature" of nanosecond timestamps:

In [21]: arr = pa.array([0, 1, 2], type=pa.timestamp('us', 'America/Los_Angeles'))

In [22]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])

In [23]: struct_arr.to_pandas()
Out[23]:
0          {'f0': 1969-12-31 16:00:00-08:00}
1   {'f0': 1969-12-31 16:00:00.000001-08:00}
2   {'f0': 1969-12-31 16:00:00.000002-08:00}
dtype: object

So this is working as intended, such as it is.

On Sun, Jul 19, 2020 at 6:40 PM Wes McKinney <wesmck...@gmail.com> wrote:

There seems to be other broken StructArray stuff:

In [14]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))

In [15]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])

In [16]: struct_arr
Out[16]:
<pyarrow.lib.StructArray object at 0x7f089370f590>
-- is_valid: all not null
-- child 0 type: timestamp[ns, tz=America/Los_Angeles]
  [
    1970-01-01 00:00:00.000000000,
    1970-01-01 00:00:00.000000001,
    1970-01-01 00:00:00.000000002
  ]

In [17]: struct_arr.to_pandas()
Out[17]:
0   {'f0': 0}
1   {'f0': 1}
2   {'f0': 2}
dtype: object

All in all, it appears that this part of the project needs some TLC.

On Sun, Jul 19, 2020 at 6:16 PM Wes McKinney <wesmck...@gmail.com> wrote:

Well, the problem is that time zones are really finicky comparing Spark (which uses a localtime interpretation of timestamps without time zone) and Arrow (which has naive timestamps -- a concept similar to but different from the SQL concept TIMESTAMP WITHOUT TIME ZONE -- and tz-aware timestamps).
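The naive-timestamp vs. localtime-interpretation distinction described above can be sketched with the standard library alone (no pyarrow; the fixed offset stands in for America/Los_Angeles, and the epoch value mirrors the transcripts in this thread):

```python
from datetime import datetime, timezone, timedelta

# Fixed-offset stand-in for America/Los_Angeles in winter (PST, UTC-8).
LOCAL = timezone(timedelta(hours=-8))

epoch_micros = 2  # the integer actually stored in the Arrow buffer

# Arrow-style naive timestamp: just "microseconds since the epoch", no zone attached.
as_naive = datetime(1970, 1, 1) + timedelta(microseconds=epoch_micros)

# Localtime interpretation: the same instant rendered in local time, which is
# where the 1969-12-31 16:00:00-08:00 values in the transcripts come from.
as_local = (datetime(1970, 1, 1, tzinfo=timezone.utc)
            + timedelta(microseconds=epoch_micros)).astimezone(LOCAL)

print(as_naive)  # 1970-01-01 00:00:00.000002
print(as_local)  # 1969-12-31 16:00:00.000002-08:00
```

The stored integer never changes; only the interpretation differs, which is why the same array can display as 1970-01-01 in one place and 1969-12-31 16:00 in another.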
So somewhere there is a time zone being stripped or applied/localized, which may result in the data transferred to/from Spark being shifted by the time zone offset. I think it's important that we determine what the problem is -- if it's a problem that has to be fixed in Arrow (and it's not clear to me that it is), it's worth spending some time to understand what's going on to avoid the possibility of a patch release on account of this.

On Sun, Jul 19, 2020 at 6:12 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:

If it's a display problem, should it block the release?

Sent from my iPhone

On Jul 19, 2020, at 3:57 PM, Wes McKinney <wesmck...@gmail.com> wrote:

I opened https://issues.apache.org/jira/browse/ARROW-9525 about the display problem.
My guess is that there are other problems lurking here.

On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney <wesmck...@gmail.com> wrote:

hi Bryan,

This is a display bug:

In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))

In [7]: arr.view('int64')
Out[7]:
<pyarrow.lib.Int64Array object at 0x7fd1b8aaef30>
[
  0,
  1,
  2
]

In [8]: arr
Out[8]:
<pyarrow.lib.TimestampArray object at 0x7fd1b8aae6e0>
[
  1970-01-01 00:00:00.000000000,
  1970-01-01 00:00:00.000000001,
  1970-01-01 00:00:00.000000002
]

In [9]: arr.to_pandas()
Out[9]:
0             1969-12-31 16:00:00-08:00
1   1969-12-31 16:00:00.000000001-08:00
2   1969-12-31 16:00:00.000000002-08:00
dtype: datetime64[ns, America/Los_Angeles]

The repr of TimestampArray doesn't take the timezone into account:

In [10]: arr[0]
Out[10]: <pyarrow.TimestampScalar: Timestamp('1969-12-31 16:00:00-0800', tz='America/Los_Angeles')>

So if it's incorrect, the problem is happening somewhere before or while the StructArray is being created. If I had to guess, it's caused by the tzinfo of the datetime.datetime values not being handled in the way that they were before.

On Sun, Jul 19, 2020 at 5:19 PM Wes McKinney <wesmck...@gmail.com> wrote:

Well, this is not good and pretty disappointing given that we had nearly a month to sort through the implications of Micah's patch.
We should try to resolve this ASAP.

On Sun, Jul 19, 2020 at 5:10 PM Bryan Cutler <cutl...@gmail.com> wrote:

+0 (non-binding)

I ran the verification script for binaries and then source, as below, and both look good:

ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1 TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1 dev/release/verify-release-candidate.sh source 1.0.0 1

I tried to patch Spark locally to verify the recent change in nested timestamps and was not able to get things working quite right, but I'm not sure if the problem is in Spark, Arrow, or my patch -- hence my vote of +0.
>> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> Here is what I'm seeing >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> ``` >> >> >> >>>>>>>>>>>>>>>>>>>> (Input as datetime) >> >> >> >>>>>>>>>>>>>>>>>>>> datetime.datetime(2018, 3, 10, 0, 0) >> >> >> >>>>>>>>>>>>>>>>>>>> datetime.datetime(2018, 3, 15, 0, 0) >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> (Struct Array) >> >> >> >>>>>>>>>>>>>>>>>>>> -- is_valid: all not null >> >> >> >>>>>>>>>>>>>>>>>>>> -- child 0 type: timestamp[us, >> >> >> >>>>>>> tz=America/Los_Angeles] >> >> >> >>>>>>>>>>>>>>>>>>>> [ >> >> >> >>>>>>>>>>>>>>>>>>>> 2018-03-10 00:00:00.000000, >> >> >> >>>>>>>>>>>>>>>>>>>> 2018-03-10 00:00:00.000000 >> >> >> >>>>>>>>>>>>>>>>>>>> ] >> >> >> >>>>>>>>>>>>>>>>>>>> -- child 1 type: timestamp[us, >> >> >> >>>>>>> tz=America/Los_Angeles] >> >> >> >>>>>>>>>>>>>>>>>>>> [ >> >> >> >>>>>>>>>>>>>>>>>>>> 2018-03-15 00:00:00.000000, >> >> >> >>>>>>>>>>>>>>>>>>>> 2018-03-15 00:00:00.000000 >> >> >> >>>>>>>>>>>>>>>>>>>> ] >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> (Flattened Arrays) >> >> >> >>>>>>>>>>>>>>>>>>>> types [TimestampType(timestamp[us, >> >> >> >>>>>>>>> tz=America/Los_Angeles]), >> >> >> >>>>>>>>>>>>>>>>>>>> TimestampType(timestamp[us, >> >> >> >>>>>>> tz=America/Los_Angeles])] >> >> >> >>>>>>>>>>>>>>>>>>>> [<pyarrow.lib.TimestampArray object at >> >> >> >>>>>>> 0x7ffbbd88f520> >> >> >> >>>>>>>>>>>>>>>>>>>> [ >> >> >> >>>>>>>>>>>>>>>>>>>> 2018-03-10 00:00:00.000000, >> >> >> >>>>>>>>>>>>>>>>>>>> 2018-03-10 00:00:00.000000 >> >> >> >>>>>>>>>>>>>>>>>>>> ], <pyarrow.lib.TimestampArray object at >> >> >> >>>>>>> 0x7ffba958be50> >> >> >> >>>>>>>>>>>>>>>>>>>> [ >> >> >> >>>>>>>>>>>>>>>>>>>> 2018-03-15 00:00:00.000000, >> >> >> >>>>>>>>>>>>>>>>>>>> 2018-03-15 00:00:00.000000 >> >> >> >>>>>>>>>>>>>>>>>>>> ]] >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> (Pandas Conversion) >> >> >> >>>>>>>>>>>>>>>>>>>> [ >> >> >> >>>>>>>>>>>>>>>>>>>> 0 
2018-03-09 16:00:00-08:00 >> >> >> >>>>>>>>>>>>>>>>>>>> 1 2018-03-09 16:00:00-08:00 >> >> >> >>>>>>>>>>>>>>>>>>>> dtype: datetime64[ns, America/Los_Angeles], >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> 0 2018-03-14 17:00:00-07:00 >> >> >> >>>>>>>>>>>>>>>>>>>> 1 2018-03-14 17:00:00-07:00 >> >> >> >>>>>>>>>>>>>>>>>>>> dtype: datetime64[ns, America/Los_Angeles]] >> >> >> >>>>>>>>>>>>>>>>>>>> ``` >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> Based on output of existing a correct timestamp >> >> >> >>>>>>> udf, it >> >> >> >>>>>>>>> looks >> >> >> >>>>>>>>>>>> like the >> >> >> >>>>>>>>>>>>>>>>>>>> pyarrow Struct Array values are wrong and >> >> >> >>>>> that's >> >> >> >>>>>>> carried >> >> >> >>>>>>>>>>>> through the >> >> >> >>>>>>>>>>>>>>>>>>>> flattened arrays, causing the Pandas values to >> >> >> >>>>>> have >> >> >> >>>>>>> a >> >> >> >>>>>>>>>>>> negative offset. >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> Here is output from a working udf with >> >> >> >>>>> timestamp, >> >> >> >>>>>>> the >> >> >> >>>>>>>>> pyarrow >> >> >> >>>>>>>>>>>> Array >> >> >> >>>>>>>>>>>>>>>>>>>> displays in UTC time, I believe. 
>> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> ``` >> >> >> >>>>>>>>>>>>>>>>>>>> (Timestamp Array) >> >> >> >>>>>>>>>>>>>>>>>>>> type timestamp[us, tz=America/Los_Angeles] >> >> >> >>>>>>>>>>>>>>>>>>>> [ >> >> >> >>>>>>>>>>>>>>>>>>>> [ >> >> >> >>>>>>>>>>>>>>>>>>>> 1969-01-01 09:01:01.000000 >> >> >> >>>>>>>>>>>>>>>>>>>> ] >> >> >> >>>>>>>>>>>>>>>>>>>> ] >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> (Pandas Conversion) >> >> >> >>>>>>>>>>>>>>>>>>>> 0 1969-01-01 01:01:01-08:00 >> >> >> >>>>>>>>>>>>>>>>>>>> Name: _0, dtype: datetime64[ns, >> >> >> >>>>>> America/Los_Angeles] >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> (Timezone Localized) >> >> >> >>>>>>>>>>>>>>>>>>>> 0 1969-01-01 01:01:01 >> >> >> >>>>>>>>>>>>>>>>>>>> Name: _0, dtype: datetime64[ns] >> >> >> >>>>>>>>>>>>>>>>>>>> ``` >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> I'll have to dig in further at another time and >> >> >> >>>>>>> debug >> >> >> >>>>>>>>> where >> >> >> >>>>>>>>>>>> the values go >> >> >> >>>>>>>>>>>>>>>>>>>> wrong. 
>> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>> On Sat, Jul 18, 2020 at 9:51 PM Micah >> >> >> >>>>> Kornfield < >> >> >> >>>>>>>>>>>> emkornfi...@gmail.com> >> >> >> >>>>>>>>>>>>>>>>>>>> wrote: >> >> >> >>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> +1 (binding) >> >> >> >>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> Ran wheel and binary tests on ubuntu 19.04 >> >> >> >>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> On Fri, Jul 17, 2020 at 2:25 PM Neal >> >> >> >>>>> Richardson < >> >> >> >>>>>>>>>>>>>>>>>>>>> neal.p.richard...@gmail.com> >> >> >> >>>>>>>>>>>>>>>>>>>>> wrote: >> >> >> >>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> +1 (binding) >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> In addition to the usual verification on >> >> >> >>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/arrow/pull/7787, >> >> >> >>>>> I've >> >> >> >>>>>>>>>>>> successfully staged the >> >> >> >>>>>>>>>>>>>>>>>>>>> R >> >> >> >>>>>>>>>>>>>>>>>>>>>> binary artifacts on Windows ( >> >> >> >>>>>>>>>>>>>>>>>>>>>> https://github.com/r-windows/ >> >> >> >>>>>>> rtools-packages/pull/126 >> >> >> >>>>>>>>> ), >> >> >> >>>>>>>>>>>> macOS ( >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>> https://github.com/autobrew/homebrew-core/pull/12 >> >> >> >>>>>>> ), >> >> >> >>>>>>>>> and >> >> >> >>>>>>>>>>>> Linux ( >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>> https://github.com/ursa-labs/arrow-r-nightly/actions/runs/ >> >> >> >>>>>>>>>>>> 172977277) >> >> >> >>>>>>>>>>>>>>>>>>>>> using >> >> >> >>>>>>>>>>>>>>>>>>>>>> the release candidate. >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> And I agree with the judgment about skipping >> >> >> >>>>> a >> >> >> >>>>>> JS >> >> >> >>>>>>>>> release >> >> >> >>>>>>>>>>>> artifact. Looks >> >> >> >>>>>>>>>>>>>>>>>>>>>> like there hasn't been a code change since >> >> >> >>>>>>> October so >> >> >> >>>>>>>>>>>> there's no point. 
>> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> Neal >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> On Fri, Jul 17, 2020 at 10:37 AM Wes >> >> >> >>>>> McKinney < >> >> >> >>>>>>>>>>>> wesmck...@gmail.com> >> >> >> >>>>>>>>>>>>>>>>>>>>> wrote: >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> I see the JS failures as well. I think it >> >> >> >>>>> is a >> >> >> >>>>>>>>> failure >> >> >> >>>>>>>>>>>> localized to >> >> >> >>>>>>>>>>>>>>>>>>>>>>> newer Node versions since our JavaScript CI >> >> >> >>>>>> works >> >> >> >>>>>>>>> fine. I >> >> >> >>>>>>>>>>>> don't think >> >> >> >>>>>>>>>>>>>>>>>>>>>>> it should block the release given the lack >> >> >> >>>>> of >> >> >> >>>>>>>>> development >> >> >> >>>>>>>>>>>> activity in >> >> >> >>>>>>>>>>>>>>>>>>>>>>> JavaScript [1] -- if any JS devs are >> >> >> >>>>> concerned >> >> >> >>>>>>> about >> >> >> >>>>>>>>>>>> publishing an >> >> >> >>>>>>>>>>>>>>>>>>>>>>> artifact then we can skip pushing it to NPM >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> @Ryan it seems it may be something >> >> >> >>>>> environment >> >> >> >>>>>>>>> related on >> >> >> >>>>>>>>>>>> your >> >> >> >>>>>>>>>>>>>>>>>>>>>>> machine, I'm on Ubuntu 18.04 and have not >> >> >> >>>>> seen >> >> >> >>>>>>> this. >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> On >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> * Python 3.8 wheel's tests are failed. >> >> >> >>>>> 3.5, >> >> >> >>>>>> 3.6 >> >> >> >>>>>>>>> and 3.7 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> are passed. It seems that -larrow and >> >> >> >>>>>>>>> -larrow_python >> >> >> >>>>>>>>>>>> for >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> Cython are failed. 
>> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> I suspect this is related to >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> https://github.com/apache/arrow/commit/ >> >> >> >>>>>>>>>>>> 120c21f4bf66d2901b3a353a1f67bac3c3355924#diff- >> >> >> >>>>>>>>>>>> 0f69784b44040448d17d0e4e8a641fe8 >> >> >> >>>>>>>>>>>>>>>>>>>>>>> , >> >> >> >>>>>>>>>>>>>>>>>>>>>>> but I don't think it's a blocking issue >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> [1]: >> >> >> >>>>>>>>> https://github.com/apache/arrow/commits/master/js >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> On Fri, Jul 17, 2020 at 9:42 AM Ryan Murray >> >> >> >>>>> < >> >> >> >>>>>>>>>>>> rym...@dremio.com> wrote: >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> I've tested Java and it looks good. However >> >> >> >>>>>> the >> >> >> >>>>>>>>> verify >> >> >> >>>>>>>>>>>> script keeps >> >> >> >>>>>>>>>>>>>>>>>>>>> on >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> bailing with protobuf related errors: >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>> 'cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_ >> >> >> >>>>>>>>>>>> proto.pb.cc' >> >> >> >>>>>>>>>>>>>>>>>>>>> and >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> friends cant find protobuf definitions. A >> >> >> >>>>> bit >> >> >> >>>>>>> odd as >> >> >> >>>>>>>>>>>> cmake can see >> >> >> >>>>>>>>>>>>>>>>>>>>>>> protobuf >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> headers and builds directly off master work >> >> >> >>>>>> just >> >> >> >>>>>>>>> fine. >> >> >> >>>>>>>>>>>> Has anyone >> >> >> >>>>>>>>>>>>>>>>>>>>> else >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> experienced this? 
I am on ubutnu 18.04 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jul 17, 2020 at 10:49 AM Antoine >> >> >> >>>>>> Pitrou >> >> >> >>>>>>> < >> >> >> >>>>>>>>>>>> anto...@python.org> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> wrote: >> >> >> >>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> +1 (binding). I tested on Ubuntu 18.04. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> * Wheels verification went fine. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> * Source verification went fine with CUDA >> >> >> >>>>>>> enabled >> >> >> >>>>>>>>> and >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> TEST_INTEGRATION_JS=0 TEST_JS=0. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> I didn't test the binaries. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> Regards >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> Antoine. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> Le 17/07/2020 à 03:41, Krisztián Szűcs a >> >> >> >>>>>> écrit >> >> >> >>>>>>> : >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi, >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> I would like to propose the second >> >> >> >>>>> release >> >> >> >>>>>>>>> candidate >> >> >> >>>>>>>>>>>> (RC1) of >> >> >> >>>>>>>>>>>>>>>>>>>>>> Apache >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> Arrow version 1.0.0. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> This is a major release consisting of 826 >> >> >> >>>>>>>>> resolved JIRA >> >> >> >>>>>>>>>>>>>>>>>>>>> issues[1]. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> The verification of the first release >> >> >> >>>>>>> candidate >> >> >> >>>>>>>>> (RC0) >> >> >> >>>>>>>>>>>> has failed >> >> >> >>>>>>>>>>>>>>>>>>>>>>> [0], and >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> the packaging scripts were unable to >> >> >> >>>>> produce >> >> >> >>>>>>> two >> >> >> >>>>>>>>>>>> wheels. 
Compared >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> to RC0 this release candidate includes >> >> >> >>>>>>> additional >> >> >> >>>>>>>>>>>> patches for the >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> following bugs: ARROW-9506, ARROW-9504, >> >> >> >>>>>>>>> ARROW-9497, >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> ARROW-9500, ARROW-9499. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> This release candidate is based on >> >> >> >>>>> commit: >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> bc0649541859095ee77d03a7b891ea8d6e2fd641 >> >> >> >>>>> [2] >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> The source release rc1 is hosted at [3]. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> The binary artifacts are hosted at >> >> >> >>>>>>> [4][5][6][7]. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> The changelog is located at [8]. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> Please download, verify checksums and >> >> >> >>>>>>> signatures, >> >> >> >>>>>>>>> run >> >> >> >>>>>>>>>>>> the unit >> >> >> >>>>>>>>>>>>>>>>>>>>>> tests, >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> and vote on the release. See [9] for how >> >> >> >>>>> to >> >> >> >>>>>>>>> validate a >> >> >> >>>>>>>>>>>> release >> >> >> >>>>>>>>>>>>>>>>>>>>>>> candidate. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> The vote will be open for at least 72 >> >> >> >>>>> hours. >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [ ] +1 Release this as Apache Arrow 1.0.0 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [ ] +0 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [ ] -1 Do not release this as Apache >> >> >> >>>>> Arrow >> >> >> >>>>>>> 1.0.0 >> >> >> >>>>>>>>>>>> because... 
>> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [0]: >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>> https://github.com/apache/arrow/pull/7778#issuecomment- >> >> >> >>>>>>>>>>>> 659065370 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [1]: >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/ >> >> >> >>>>>>> jira/issues/?jql=project%20% >> >> >> >>>>>>>>>>>> 3D%20ARROW%20AND%20status%20in%20%28Resolved%2C% >> >> >> >>>>>>> 20Closed%29%20AND% >> >> >> >>>>>>>>>>>> 20fixVersion%20%3D%201.0.0 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [2]: >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> https://github.com/apache/arrow/tree/ >> >> >> >>>>>>>>>>>> bc0649541859095ee77d03a7b891ea8d6e2fd641 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [3]: >> >> >> >>>>>>>>>>>>>>>>>>>>>>> https://dist.apache.org/repos/ >> >> >> >>>>>>>>>>>> dist/dev/arrow/apache-arrow-1.0.0-rc1 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [4]: https://bintray.com/apache/ >> >> >> >>>>>>>>>>>> arrow/centos-rc/1.0.0-rc1 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [5]: https://bintray.com/apache/ >> >> >> >>>>>>>>>>>> arrow/debian-rc/1.0.0-rc1 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [6]: https://bintray.com/apache/ >> >> >> >>>>>>>>>>>> arrow/python-rc/1.0.0-rc1 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [7]: https://bintray.com/apache/ >> >> >> >>>>>>>>>>>> arrow/ubuntu-rc/1.0.0-rc1 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [8]: >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> https://github.com/apache/arrow/blob/ >> >> >> >>>>>>>>>>>> bc0649541859095ee77d03a7b891ea8d6e2fd641/CHANGELOG.md >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> [9]: >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> 
https://cwiki.apache.org/ >> >> >> >>>>>>> confluence/display/ARROW/How+ >> >> >> >>>>>>>>>>>> to+Verify+Release+Candidates >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>>>>>>>>>>> >> >> >> >>>>>>>>>>>> >> >> >> >>>>>>>>> >> >> >> >>>>>>> >> >> >> >>>>>> >> >> >> >>>>> >> >> >> >>>>
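
For readers unfamiliar with the "verify checksums and signatures" step requested in the RC announcement, here is a minimal sketch of the checksum part. The file names below are placeholders for illustration only, not the actual RC artifact names; the real `.sha512` files are published alongside the artifacts, and the full procedure is in the wiki page linked as [9] above.

```shell
# Illustrative only: create a stand-in "artifact" and its .sha512 file.
# For a real RC you would download both from dist.apache.org instead.
printf 'placeholder payload' > apache-arrow-1.0.0.tar.gz
sha512sum apache-arrow-1.0.0.tar.gz > apache-arrow-1.0.0.tar.gz.sha512

# -c re-hashes the file and compares it against the recorded digest;
# it prints "<filename>: OK" on success and exits non-zero on mismatch.
sha512sum -c apache-arrow-1.0.0.tar.gz.sha512

# The signature check would look like this (needs the release manager's
# public key imported into your GPG keyring first):
# gpg --verify apache-arrow-1.0.0.tar.gz.asc apache-arrow-1.0.0.tar.gz
```

The project's verify-release-candidate script automates these checks along with the build and unit tests, but running the checksum and signature steps by hand is a quick sanity check before investing in a full source build.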