+1 (non-binding)

On 3 June 2018 at 09:23, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> +1
>
> Bests,
> Dongjoon.

On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee <denny.g....@gmail.com> wrote:
> +1

On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> I'll give that a try, but I'll still have to figure out what to do if none
> of the release builds work with hadoop-aws, since Flintrock deploys Spark
> release builds to set up a cluster. Building Spark is slow, so we only do
> it if the user specifically requests a Spark version by git hash. (This is
> basically how spark-ec2 did things, too.)

On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> If you're building your own Spark, definitely try the hadoop-cloud
> profile. Then you don't even need to pull anything at runtime; everything
> is already packaged with Spark.

On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
> either (even building with -Phadoop-2.7). I guess I’ve been relying on an
> unsupported pattern and will need to figure something else out going
> forward in order to use s3a://.

On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> I have personally never tried to include hadoop-aws that way. But at the
> very least, I'd try to use the same version of Hadoop as the Spark build
> (2.7.3 IIRC). I don't really expect a different version to work, and if
> it did in the past, it definitely was not by design.
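The version-matching rule above (use the same Hadoop version that the Spark build bundles) can be sketched as a quick shell check against an unpacked release. This is only a sketch, not project guidance: the /opt/spark default for SPARK_HOME is an assumption, and the 2.7.3 fallback is just the version Marcelo recalls for the -Phadoop-2.7 build.

```shell
# Sketch: read the bundled Hadoop version out of a Spark distribution and use
# the same version for hadoop-aws, so --packages pulls compatible jars.
# SPARK_HOME defaulting to /opt/spark is an assumption; point it at your install.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
hadoop_ver=$(ls "$SPARK_HOME"/jars/hadoop-common-*.jar 2>/dev/null \
  | sed -E 's/.*hadoop-common-([0-9][0-9.]*)\.jar/\1/')
# Fall back to 2.7.3 (the version cited in this thread) if no jar was found.
aws_pkg="org.apache.hadoop:hadoop-aws:${hadoop_ver:-2.7.3}"
echo "pyspark --packages $aws_pkg"
```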
On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
> building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
> so it appears something has changed since then.
>
> I wasn’t familiar with -Phadoop-cloud, but I can try that.
>
> My goal here is simply to confirm that this release of Spark works with
> hadoop-aws like past releases did, particularly for Flintrock users who
> use Spark with S3A.
>
> We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
> with every Spark release. If the -hadoop2.7 release build won’t work with
> hadoop-aws anymore, are there plans to provide a new build type that will?
>
> Apologies if the question is poorly formed. I’m batting a bit outside my
> league here. Again, my goal is simply to confirm that I and my users still
> have a way to use s3a://. In the past, that way was simply to call pyspark
> --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar.
> If that will no longer work, I’m trying to confirm that the change of
> behavior is intentional or acceptable (as a review for the Spark project)
> and figure out what I need to change (as due diligence for Flintrock’s
> users).
>
> Nick

On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> Using the hadoop-aws package is probably going to be a little more
> complicated than that. The best bet is to use a custom build of Spark
> that includes it (use -Phadoop-cloud). Otherwise you're probably looking
> at some nasty dependency issues, especially if you end up mixing
> different versions of Hadoop.

On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
> using Flintrock. However, trying to load the hadoop-aws package gave me
> some errors.
>
> $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>
> <snipped>
>
> :: problems summary ::
> :::: WARNINGS
>   [NOT FOUND] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>   ==== local-m2-cache: tried
>     file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>   [NOT FOUND] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>   ==== local-m2-cache: tried
>     file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>   [NOT FOUND] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>   ==== local-m2-cache: tried
>     file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>   [NOT FOUND] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>   ==== local-m2-cache: tried
>     file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>
> I’d guess I’m probably using the wrong version of hadoop-aws, but I called
> make-distribution.sh with -Phadoop-2.8, so I’m not sure what else to try.
>
> Any quick pointers?
>
> Nick

On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> Starting with my own +1 (binding).

On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.3.1.
>
> Given that I expect at least a few people to be busy with Spark Summit
> next week, I'm taking the liberty of setting an extended voting period.
> The vote will be open until Friday, June 8th, at 19:00 UTC (that's
> 12:00 PDT).
>
> It passes with a majority of +1 votes, which must include at least 3 +1
> votes from the PMC.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
> https://github.com/apache/spark/tree/v2.3.1-rc4
>
> The release files, including signatures, digests, etc., can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1272/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload, running it on this release candidate, and then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install the
> current RC, and see if anything important breaks. In Java/Scala, you can
> add the staging repository to your project's resolvers and test with the
> RC (make sure to clean up the artifact cache before/after so you don't
> end up building with an out-of-date RC going forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 2.3.1?
> ===========================================
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else, please retarget to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from the previous release.
> That being said, if there is something which is a regression that has
> not been correctly targeted, please ping me or a committer to help
> target the issue.
>
> --
> Marcelo
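For readers following the -Phadoop-cloud suggestion in this thread, here is a minimal build sketch. The $SPARK_SRC checkout location and the HOME/spark default are assumptions; the profile names are the ones mentioned above. A distribution built this way bundles hadoop-aws and its matching AWS SDK jars, so no --packages flag is needed for s3a://.

```shell
# Sketch: build a Spark distribution with the hadoop-cloud profile so the
# cloud connector jars ship inside the distribution itself.
# $SPARK_SRC is a hypothetical path to a local apache/spark checkout.
SPARK_SRC="${SPARK_SRC:-$HOME/spark}"
if [ -x "$SPARK_SRC/dev/make-distribution.sh" ]; then
  (cd "$SPARK_SRC" && ./dev/make-distribution.sh --name hadoop-cloud --tgz \
    -Phadoop-2.7 -Phadoop-cloud)
else
  echo "No Spark checkout at $SPARK_SRC; clone apache/spark first."
fi
```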
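The PySpark testing steps from the vote e-mail's FAQ can be sketched as below. The exact tarball name under v2.3.1-rc4-bin is an assumption; verify it against the directory listing, and note that RC artifacts are removed from dist.apache.org once the vote closes (hence the explicit opt-in before downloading).

```shell
# Sketch: install the RC's PySpark into a throwaway virtualenv so it cannot
# pollute an existing environment.
# The tarball filename below is assumed, not confirmed; check the listing.
RC_URL="https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz"
VENV="$(mktemp -d)/spark-rc"
python3 -m venv "$VENV"
# Only fetch when explicitly requested, since staged RC artifacts are
# deleted after the vote ends.
if [ "${FETCH_RC:-no}" = yes ]; then
  "$VENV/bin/pip" install "$RC_URL"
  "$VENV/bin/python" -c 'import pyspark; print(pyspark.__version__)'
fi
```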