+1

On Sun, Jun 3, 2018 at 6:54 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
> If you're building your own Spark, definitely try the hadoop-cloud
> profile. Then you don't even need to pull anything at runtime,
> everything is already packaged with Spark.
>
> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
> > either (even building with -Phadoop-2.7). I guess I’ve been relying on an
> > unsupported pattern and will need to figure something else out going
> > forward in order to use s3a://.
> >
> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> >>
> >> I have personally never tried to include hadoop-aws that way. But at
> >> the very least, I'd try to use the same version of Hadoop as the Spark
> >> build (2.7.3 IIRC). I don't really expect a different version to work,
> >> and if it did in the past it definitely was not by design.
> >>
> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
> >> <nicholas.cham...@gmail.com> wrote:
> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0
> >> > release, so it appears something has changed since then.
> >> >
> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
> >> >
> >> > My goal here is simply to confirm that this release of Spark works
> >> > with hadoop-aws like past releases did, particularly for Flintrock
> >> > users who use Spark with S3A.
> >> >
> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop
> >> > builds with every Spark release. If the -hadoop2.7 release build
> >> > won’t work with hadoop-aws anymore, are there plans to provide a new
> >> > build type that will?
> >> >
> >> > Apologies if the question is poorly formed. I’m batting a bit outside
> >> > my league here. Again, my goal is simply to confirm that I/my users
> >> > still have a way to use s3a://.
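[Editor's note: the two approaches discussed above might look like the sketch below in practice. This is an illustration assembled from the thread, not a verified recipe; the `make-distribution.sh` flags besides the profiles, the Hadoop version, and the cache paths are assumptions.]

```shell
#!/bin/sh
# Option 1 (build time): bundle the cloud connectors into the distribution.
# From a Spark source checkout, the hadoop-cloud profile pulls in hadoop-aws
# and its transitive dependencies, so nothing needs fetching at runtime:
#
#   ./dev/make-distribution.sh --name my-spark --tgz -Phadoop-2.7 -Phadoop-cloud

# Option 2 (run time): fetch hadoop-aws with --packages, matching the Hadoop
# version the Spark build was compiled against (2.7.3 for the -Phadoop-2.7
# build, per Marcelo's recollection in the thread):
HADOOP_VERSION="2.7.3"
HADOOP_AWS_PACKAGE="org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}"
#
#   pyspark --packages "$HADOOP_AWS_PACKAGE"

# If resolution fails with [NOT FOUND] against local-m2-cache (as in the
# errors quoted deeper in this thread), Ivy may have cached a POM whose jar
# is missing locally; clearing the stale entries before retrying may help:
#
#   rm -rf ~/.ivy2/cache/com.sun.jersey ~/.m2/repository/com/sun/jersey

echo "$HADOOP_AWS_PACKAGE"
```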
> >> > In the past, that way was simply to call pyspark
> >> > --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very
> >> > similar. If that will no longer work, I’m trying to confirm that the
> >> > change of behavior is intentional or acceptable (as a review for the
> >> > Spark project) and figure out what I need to change (as due diligence
> >> > for Flintrock’s users).
> >> >
> >> > Nick
> >> >
> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <van...@cloudera.com>
> >> > wrote:
> >> >>
> >> >> Using the hadoop-aws package is probably going to be a little more
> >> >> complicated than that. The best bet is to use a custom build of Spark
> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
> >> >> looking at some nasty dependency issues, especially if you end up
> >> >> mixing different versions of Hadoop.
> >> >>
> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
> >> >> <nicholas.cham...@gmail.com> wrote:
> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1
> >> >> > RC4 using Flintrock. However, trying to load the hadoop-aws package
> >> >> > gave me some errors.
> >> >> >
> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
> >> >> >
> >> >> > <snipped>
> >> >> >
> >> >> > :: problems summary ::
> >> >> > :::: WARNINGS
> >> >> >     [NOT FOUND ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
> >> >> >     ==== local-m2-cache: tried
> >> >> >       file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
> >> >> >     [NOT FOUND ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
> >> >> >     ==== local-m2-cache: tried
> >> >> >       file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
> >> >> >     [NOT FOUND ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
> >> >> >     ==== local-m2-cache: tried
> >> >> >       file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
> >> >> >     [NOT FOUND ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
> >> >> >     ==== local-m2-cache: tried
> >> >> >       file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
> >> >> >
> >> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
> >> >> > called make-distribution.sh with -Phadoop-2.8 so I’m not sure what
> >> >> > else to try.
> >> >> >
> >> >> > Any quick pointers?
> >> >> >
> >> >> > Nick
> >> >> >
> >> >> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <van...@cloudera.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Starting with my own +1 (binding).
> >> >> >>
> >> >> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <van...@cloudera.com>
> >> >> >> wrote:
> >> >> >> > Please vote on releasing the following candidate as Apache Spark
> >> >> >> > version 2.3.1.
> >> >> >> >
> >> >> >> > Given that I expect at least a few people to be busy with Spark
> >> >> >> > Summit next week, I'm taking the liberty of setting an extended
> >> >> >> > voting period. The vote will be open until Friday, June 8th, at
> >> >> >> > 19:00 UTC (that's 12:00 PDT).
> >> >> >> >
> >> >> >> > It passes with a majority of +1 votes, which must include at
> >> >> >> > least 3 +1 votes from the PMC.
> >> >> >> >
> >> >> >> > [ ] +1 Release this package as Apache Spark 2.3.1
> >> >> >> > [ ] -1 Do not release this package because ...
> >> >> >> >
> >> >> >> > To learn more about Apache Spark, please see
> >> >> >> > http://spark.apache.org/
> >> >> >> >
> >> >> >> > The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
> >> >> >> > https://github.com/apache/spark/tree/v2.3.1-rc4
> >> >> >> >
> >> >> >> > The release files, including signatures, digests, etc. can be
> >> >> >> > found at:
> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
> >> >> >> >
> >> >> >> > Signatures used for Spark RCs can be found in this file:
> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >> >> >
> >> >> >> > The staging repository for this release can be found at:
> >> >> >> > https://repository.apache.org/content/repositories/orgapachespark-1272/
> >> >> >> >
> >> >> >> > The documentation corresponding to this release can be found at:
> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
> >> >> >> >
> >> >> >> > The list of bug fixes going into 2.3.1 can be found at the
> >> >> >> > following URL:
> >> >> >> > https://issues.apache.org/jira/projects/SPARK/versions/12342432
> >> >> >> >
> >> >> >> > FAQ
> >> >> >> >
> >> >> >> > =========================
> >> >> >> > How can I help test this release?
> >> >> >> > =========================
> >> >> >> >
> >> >> >> > If you are a Spark user, you can help us test this release by
> >> >> >> > taking an existing Spark workload and running it on this release
> >> >> >> > candidate, then reporting any regressions.
> >> >> >> >
> >> >> >> > If you're working in PySpark, you can set up a virtual env,
> >> >> >> > install the current RC, and see if anything important breaks. In
> >> >> >> > Java/Scala, you can add the staging repository to your project's
> >> >> >> > resolvers and test with the RC (make sure to clean up the
> >> >> >> > artifact cache before/after so you don't end up building with an
> >> >> >> > out-of-date RC going forward).
> >> >> >> >
> >> >> >> > ===========================================
> >> >> >> > What should happen to JIRA tickets still targeting 2.3.1?
> >> >> >> > ===========================================
> >> >> >> >
> >> >> >> > The current list of open tickets targeted at 2.3.1 can be found
> >> >> >> > at:
> >> >> >> > https://s.apache.org/Q3Uo
> >> >> >> >
> >> >> >> > Committers should look at those and triage. Extremely important
> >> >> >> > bug fixes, documentation, and API tweaks that impact
> >> >> >> > compatibility should be worked on immediately. Everything else
> >> >> >> > please retarget to an appropriate release.
> >> >> >> >
> >> >> >> > ==================
> >> >> >> > But my bug isn't fixed?
> >> >> >> > ==================
> >> >> >> >
> >> >> >> > In order to make timely releases, we will typically not hold the
> >> >> >> > release unless the bug in question is a regression from the
> >> >> >> > previous release. That being said, if there is something which is
> >> >> >> > a regression that has not been correctly targeted please ping me
> >> >> >> > or a committer to help target the issue.
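[Editor's note: the PySpark testing recipe from the FAQ above can be sketched as follows. This is a hedged illustration, not part of the original thread; the pyspark tarball name under the RC bin directory and the venv/resolver names are assumptions.]

```shell
#!/bin/sh
# Sketch: test the RC in a clean virtual env. Only the URLs come from the
# vote email; the artifact name "pyspark-2.3.1.tar.gz" is an assumption.
RC_BIN="https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin"
PYSPARK_TARBALL="$RC_BIN/pyspark-2.3.1.tar.gz"

#   python -m venv spark-rc-test
#   . spark-rc-test/bin/activate
#   pip install "$PYSPARK_TARBALL"
#   python -c 'import pyspark; print(pyspark.__version__)'

# For Java/Scala, point the build at the staging repository (sbt shown;
# the resolver name is arbitrary) and wipe the artifact cache before/after
# so a stale RC doesn't linger:
#
#   resolvers += "spark-rc" at
#     "https://repository.apache.org/content/repositories/orgapachespark-1272/"
#
#   rm -rf ~/.ivy2/cache/org.apache.spark

echo "$PYSPARK_TARBALL"
```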
> >> >> >> >
> >> >> >> > --
> >> >> >> > Marcelo
> >> >> >>
> >> >> >> --
> >> >> >> Marcelo
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >> >>
> >> >> --
> >> >> Marcelo
> >>
> >> --
> >> Marcelo
>
> --
> Marcelo
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org