+1 (non-binding)

On 3 June 2018 at 09:23, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> +1
>
> Bests,
> Dongjoon.

On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee <denny.g....@gmail.com> wrote:
> +1

On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> I'll give that a try, but I'll still have to figure out what to do if none
> of the release builds work with hadoop-aws, since Flintrock deploys Spark
> release builds to set up a cluster. Building Spark is slow, so we only do
> it if the user specifically requests a Spark version by git hash. (This is
> basically how spark-ec2 did things, too.)

On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> If you're building your own Spark, definitely try the hadoop-cloud
> profile. Then you don't even need to pull anything at runtime; everything
> is already packaged with Spark.

On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
> either (even building with -Phadoop-2.7). I guess I’ve been relying on an
> unsupported pattern and will need to figure something else out going
> forward in order to use s3a://.

On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> I have personally never tried to include hadoop-aws that way. But at the
> very least, I'd try to use the same version of Hadoop as the Spark build
> (2.7.3 IIRC). I don't really expect a different version to work, and if
> it did in the past, it definitely was not by design.
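The version-matching rule above (use the same Hadoop version that the Spark build bundles) can be sketched as a quick shell check against an unpacked release. This is only a sketch, not project guidance: the /opt/spark default for SPARK_HOME is an assumption, and the 2.7.3 fallback is just the version Marcelo recalls for the -Phadoop-2.7 build.

```shell
# Sketch: read the bundled Hadoop version out of a Spark distribution and use
# the same version for hadoop-aws, so --packages pulls compatible jars.
# SPARK_HOME defaulting to /opt/spark is an assumption; point it at your install.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
hadoop_ver=$(ls "$SPARK_HOME"/jars/hadoop-common-*.jar 2>/dev/null \
  | sed -E 's/.*hadoop-common-([0-9][0-9.]*)\.jar/\1/')
# Fall back to 2.7.3 (the version cited in this thread) if no jar was found.
aws_pkg="org.apache.hadoop:hadoop-aws:${hadoop_ver:-2.7.3}"
echo "pyspark --packages $aws_pkg"
```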
On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
> building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
> so it appears something has changed since then.
>
> I wasn’t familiar with -Phadoop-cloud, but I can try that.
>
> My goal here is simply to confirm that this release of Spark works with
> hadoop-aws like past releases did, particularly for Flintrock users who
> use Spark with S3A.
>
> We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
> with every Spark release. If the -hadoop2.7 release build won’t work with
> hadoop-aws anymore, are there plans to provide a new build type that will?
>
> Apologies if the question is poorly formed. I’m batting a bit outside my
> league here. Again, my goal is simply to confirm that I and my users still
> have a way to use s3a://. In the past, that way was simply to call pyspark
> --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very similar.
> If that will no longer work, I’m trying to confirm that the change of
> behavior is intentional or acceptable (as a review for the Spark project)
> and figure out what I need to change (as due diligence for Flintrock’s
> users).
>
> Nick

On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> Using the hadoop-aws package is probably going to be a little more
> complicated than that. The best bet is to use a custom build of Spark
> that includes it (use -Phadoop-cloud). Otherwise you're probably looking
> at some nasty dependency issues, especially if you end up mixing
> different versions of Hadoop.

On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
> using Flintrock. However, trying to load the hadoop-aws package gave me
> some errors.
>
> $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>
> <snipped>
>
> :: problems summary ::
> :::: WARNINGS
>   [NOT FOUND] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>   ==== local-m2-cache: tried
>     file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>   [NOT FOUND] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>   ==== local-m2-cache: tried
>     file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>   [NOT FOUND] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>   ==== local-m2-cache: tried
>     file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>   [NOT FOUND] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>   ==== local-m2-cache: tried
>     file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>
> I’d guess I’m probably using the wrong version of hadoop-aws, but I called
> make-distribution.sh with -Phadoop-2.8, so I’m not sure what else to try.
>
> Any quick pointers?
>
> Nick

On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <van...@cloudera.com> wrote:
> Starting with my own +1 (binding).

On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.3.1.
>
> Given that I expect at least a few people to be busy with Spark Summit
> next week, I'm taking the liberty of setting an extended voting period.
> The vote will be open until Friday, June 8th, at 19:00 UTC (that's
> 12:00 PDT).
>
> It passes with a majority of +1 votes, which must include at least 3 +1
> votes from the PMC.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
> https://github.com/apache/spark/tree/v2.3.1-rc4
>
> The release files, including signatures, digests, etc., can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1272/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload, running it on this release candidate, and then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install the
> current RC, and see if anything important breaks. In Java/Scala, you can
> add the staging repository to your project's resolvers and test with the
> RC (make sure to clean up the artifact cache before/after so you don't
> end up building with an out-of-date RC going forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 2.3.1?
> ===========================================
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else, please retarget to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from the previous release.
> That being said, if there is something which is a regression that has
> not been correctly targeted, please ping me or a committer to help
> target the issue.
>
> --
> Marcelo
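For readers following the -Phadoop-cloud suggestion in this thread, here is a minimal build sketch. The $SPARK_SRC checkout location and the HOME/spark default are assumptions; the profile names are the ones mentioned above. A distribution built this way bundles hadoop-aws and its matching AWS SDK jars, so no --packages flag is needed for s3a://.

```shell
# Sketch: build a Spark distribution with the hadoop-cloud profile so the
# cloud connector jars ship inside the distribution itself.
# $SPARK_SRC is a hypothetical path to a local apache/spark checkout.
SPARK_SRC="${SPARK_SRC:-$HOME/spark}"
if [ -x "$SPARK_SRC/dev/make-distribution.sh" ]; then
  (cd "$SPARK_SRC" && ./dev/make-distribution.sh --name hadoop-cloud --tgz \
    -Phadoop-2.7 -Phadoop-cloud)
else
  echo "No Spark checkout at $SPARK_SRC; clone apache/spark first."
fi
```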
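The PySpark testing steps from the vote e-mail's FAQ can be sketched as below. The exact tarball name under v2.3.1-rc4-bin is an assumption; verify it against the directory listing, and note that RC artifacts are removed from dist.apache.org once the vote closes (hence the explicit opt-in before downloading).

```shell
# Sketch: install the RC's PySpark into a throwaway virtualenv so it cannot
# pollute an existing environment.
# The tarball filename below is assumed, not confirmed; check the listing.
RC_URL="https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/pyspark-2.3.1.tar.gz"
VENV="$(mktemp -d)/spark-rc"
python3 -m venv "$VENV"
# Only fetch when explicitly requested, since staged RC artifacts are
# deleted after the vote ends.
if [ "${FETCH_RC:-no}" = yes ]; then
  "$VENV/bin/pip" install "$RC_URL"
  "$VENV/bin/python" -c 'import pyspark; print(pyspark.__version__)'
fi
```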