Is there, or should there be, some checking of digests to make sure that we are really testing against the same thing in /tmp/test-spark that we are distributing from the archive?
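A digest check like the one suggested here could compare the SHA-512 of the cached tarball against the digest published alongside the release on the archive. Below is a minimal, hypothetical sketch in Java; the class and method names (`DigestCheck`, `sha512`, `matchesPublishedDigest`) are illustrative only, not part of the actual suite.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: verify that a tarball cached in /tmp/test-spark
// matches the SHA-512 digest published with the release. Names here are
// illustrative, not taken from HiveExternalCatalogVersionsSuite.
public class DigestCheck {

    // Compute the hex SHA-512 of a file, streaming so large tarballs
    // are not loaded into memory at once.
    public static String sha512(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compare against the published digest, ignoring case and stray
    // whitespace as found in typical .sha512 files.
    public static boolean matchesPublishedDigest(Path tarball, String expected)
            throws IOException, NoSuchAlgorithmException {
        return sha512(tarball).equalsIgnoreCase(expected.trim());
    }
}
```

In the test this could run once after download and before unpacking, so a truncated or stale file in /tmp/test-spark fails fast rather than producing confusing downstream test errors.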
On Thu, Jul 19, 2018 at 11:15 AM Sean Owen <sro...@apache.org> wrote:

> Ideally, that list is updated with each release, yes. Non-current releases
> will now always download from archive.apache.org though. But we run into
> rate-limiting problems if that gets pinged too much. So yes, good to keep
> the list only to current branches.
>
> It looks like the download is cached in /tmp/test-spark, for what it's
> worth.
>
> On Thu, Jul 19, 2018 at 11:06 AM Felix Cheung <felixcheun...@hotmail.com>
> wrote:
>
>> +1, this has been problematic.
>>
>> Also, does this list need to be updated every time we make a new release?
>>
>> Plus, can we cache them on Jenkins? Maybe we can avoid downloading the
>> same thing from the Apache archive every test run.
>>
>> ------------------------------
>> *From:* Marco Gaido <marcogaid...@gmail.com>
>> *Sent:* Monday, July 16, 2018 11:12 PM
>> *To:* Hyukjin Kwon
>> *Cc:* Sean Owen; dev
>> *Subject:* Re: Cleaning Spark releases from mirrors, and the flakiness
>> of HiveExternalCatalogVersionsSuite
>>
>> +1 too
>>
>> On Tue, 17 Jul 2018, 05:38 Hyukjin Kwon, <gurwls...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Tue, Jul 17, 2018 at 7:34 AM, Sean Owen <sro...@apache.org> wrote:
>>>
>>>> The fix is committed to branches back through 2.2.x, where this test
>>>> was added.
>>>>
>>>> There is still some issue; I'm seeing that archive.apache.org is
>>>> rate-limiting downloads and frequently returning 503 errors.
>>>>
>>>> We can help, I guess, by avoiding testing against non-current releases.
>>>> Right now we should be testing against 2.3.1, 2.2.2, and 2.1.3, right?
>>>> And 2.0.x is now effectively EOL, right?
>>>>
>>>> I can make that quick change too, if everyone's amenable, in order to
>>>> prevent more failures in this test from master.
>>>>
>>>> On Sun, Jul 15, 2018 at 3:51 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Yesterday I cleaned out old Spark releases from the mirror system --
>>>>> we're supposed to keep only the latest release from each active branch
>>>>> out on mirrors. (All releases are available from the Apache archive
>>>>> site.)
>>>>>
>>>>> Having done so, I quickly realized that
>>>>> HiveExternalCatalogVersionsSuite relies on the versions it downloads
>>>>> being available from mirrors. It has been flaky, as mirrors are
>>>>> sometimes unreliable. I think it will now not work for any versions
>>>>> except 2.3.1, 2.2.2, and 2.1.3.
>>>>>
>>>>> Because we need to clean those releases out of the mirrors soon
>>>>> anyway, and because mirrors are sometimes flaky, I propose adding
>>>>> logic to the test to fall back on downloading from the Apache archive
>>>>> site.
>>>>>
>>>>> ... and I'll do that right away to unblock
>>>>> HiveExternalCatalogVersionsSuite runs. I think it needs to be
>>>>> backported to other branches, as they will still be testing against
>>>>> potentially non-current Spark releases.
>>>>>
>>>>> Sean
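The caching and fallback behavior discussed in this thread could look roughly like the following sketch in Java: reuse a tarball already cached (e.g. in /tmp/test-spark), otherwise try each mirror in turn, and fall back to archive.apache.org, which keeps every release, when the mirrors no longer carry a non-current version. The class and method names, and the exact tarball filename (the `-bin-hadoop2.7.tgz` suffix), are assumptions for illustration, not the suite's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch of cache -> mirrors -> archive fallback.
public class ReleaseDownloader {
    static final String ARCHIVE = "https://archive.apache.org/dist/spark/";

    // Relative path of a binary release under a mirror or the archive.
    // The hadoop2.7 suffix is an assumption for this sketch.
    static String releasePath(String version) {
        return "spark-" + version + "/spark-" + version + "-bin-hadoop2.7.tgz";
    }

    public static Path obtain(String version, Path cacheDir, List<String> mirrors)
            throws IOException {
        Path dest = cacheDir.resolve("spark-" + version + "-bin-hadoop2.7.tgz");
        if (Files.isRegularFile(dest)) {
            return dest; // already cached, e.g. from an earlier test run
        }
        for (String base : mirrors) {
            try {
                fetch(base + releasePath(version), dest);
                return dest;
            } catch (IOException e) {
                // This mirror may have dropped a non-current release;
                // try the next one.
            }
        }
        // Last resort: the Apache archive, which retains all releases.
        fetch(ARCHIVE + releasePath(version), dest);
        return dest;
    }

    static void fetch(String url, Path dest) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Keeping the archive strictly as a last resort matters here because of the rate-limiting (503s) mentioned earlier in the thread: the cache absorbs repeat runs on one machine, and mirrors absorb the rest.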