+1

On Sun, Jun 3, 2018 at 6:12 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
> +1
>
> On Sun, Jun 3, 2018 at 9:25 PM, Ricardo Almeida <ricardo.alme...@actnowib.com> wrote:
>> +1 (non-binding)
>>
>> On 3 June 2018 at 09:23, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>> +1
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee <denny.g....@gmail.com> wrote:
>>>> +1
>>>>
>>>> On Sat, Jun 2, 2018 at 4:53 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>> I'll give that a try, but I'll still have to figure out what to do if
>>>>> none of the release builds work with hadoop-aws, since Flintrock deploys
>>>>> Spark release builds to set up a cluster. Building Spark is slow, so we
>>>>> only do it if the user specifically requests a Spark version by git hash.
>>>>> (This is basically how spark-ec2 did things, too.)
>>>>>
>>>>> On Sat, Jun 2, 2018 at 6:54 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>> If you're building your own Spark, definitely try the hadoop-cloud
>>>>>> profile. Then you don't even need to pull anything at runtime;
>>>>>> everything is already packaged with Spark.
>>>>>>
>>>>>> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>>>> pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn't work for me
>>>>>>> either (even when building with -Phadoop-2.7). I guess I've been relying
>>>>>>> on an unsupported pattern and will need to figure something else out
>>>>>>> going forward in order to use s3a://.
>>>>>>>
>>>>>>> On Fri, Jun 1, 2018 at 9:09 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>>>> I have personally never tried to include hadoop-aws that way. But at
>>>>>>>> the very least, I'd try to use the same version of Hadoop as the Spark
>>>>>>>> build (2.7.3, IIRC). I don't really expect a different version to work,
>>>>>>>> and if it did in the past, it definitely was not by design.
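Marcelo's version-matching advice above can be sketched as a small helper that derives the `--packages` coordinate for hadoop-aws from the Hadoop version a Spark build bundles. This is a minimal illustration, not anything Spark itself does; the function name and the idea of reading the version off the bundled `hadoop-common` jar name are assumptions for the sketch.

```python
# Sketch only: derive a matching hadoop-aws Maven coordinate from the name of
# the hadoop-common jar shipped in a Spark distribution's jars/ directory.
# The helper is hypothetical; Spark has no such API.
import re

def hadoop_aws_coordinate(hadoop_common_jar):
    """Given a bundled jar name like 'hadoop-common-2.7.3.jar', return the
    hadoop-aws coordinate pinned to the same Hadoop version."""
    m = re.match(r"hadoop-common-(\d+\.\d+\.\d+)\.jar$", hadoop_common_jar)
    if not m:
        raise ValueError("unexpected jar name: %s" % hadoop_common_jar)
    return "org.apache.hadoop:hadoop-aws:" + m.group(1)

# A -Phadoop-2.7 build bundles Hadoop 2.7.3 (per the message above), so:
print(hadoop_aws_coordinate("hadoop-common-2.7.3.jar"))
# org.apache.hadoop:hadoop-aws:2.7.3
```

The point being illustrated is simply that the hadoop-aws version should be pinned to the bundled Hadoop version rather than chosen independently.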
>>>>>>>>> Building with -Phadoop-2.7 didn't help, and if I remember correctly,
>>>>>>>>> building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>>>>>>>>> so it appears something has changed since then.
>>>>>>>>>
>>>>>>>>> I wasn't familiar with -Phadoop-cloud, but I can try that.
>>>>>>>>>
>>>>>>>>> My goal here is simply to confirm that this release of Spark works with
>>>>>>>>> hadoop-aws like past releases did, particularly for Flintrock users who
>>>>>>>>> use Spark with S3A.
>>>>>>>>>
>>>>>>>>> We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>>>>>>>>> with every Spark release. If the -hadoop2.7 release build won't work
>>>>>>>>> with hadoop-aws anymore, are there plans to provide a new build type
>>>>>>>>> that will?
>>>>>>>>>
>>>>>>>>> Apologies if the question is poorly formed. I'm batting a bit outside my
>>>>>>>>> league here. Again, my goal is simply to confirm that I/my users still
>>>>>>>>> have a way to use s3a://. In the past, that way was simply to call
>>>>>>>>> pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very
>>>>>>>>> similar. If that will no longer work, I'm trying to confirm that the
>>>>>>>>> change of behavior is intentional or acceptable (as a review for the
>>>>>>>>> Spark project) and figure out what I need to change (as due diligence
>>>>>>>>> for Flintrock's users).
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> On Fri, Jun 1, 2018 at 8:21 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>>>>>> Using the hadoop-aws package is probably going to be a little more
>>>>>>>>>> complicated than that. The best bet is to use a custom build of Spark
>>>>>>>>>> that includes it (use -Phadoop-cloud). Otherwise you're probably
>>>>>>>>>> looking at some nasty dependency issues, especially if you end up
>>>>>>>>>> mixing different versions of Hadoop.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>>>>>>>> I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>>>>>>>>>>> using Flintrock. However, trying to load the hadoop-aws package gave
>>>>>>>>>>> me some errors.
>>>>>>>>>>>
>>>>>>>>>>> $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>>>>>>>>>>>
>>>>>>>>>>> <snipped>
>>>>>>>>>>>
>>>>>>>>>>> :: problems summary ::
>>>>>>>>>>> :::: WARNINGS
>>>>>>>>>>>   [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>>>>>>>>>>>   ==== local-m2-cache: tried
>>>>>>>>>>>     file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>>>>>>>>>>>   [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>>>>>>>>>>>   ==== local-m2-cache: tried
>>>>>>>>>>>     file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>>>>>>>>>>>   [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>>>>>>>>>>>   ==== local-m2-cache: tried
>>>>>>>>>>>     file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>>>>>>>>>>>   [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>>>>>>>>>>>   ==== local-m2-cache: tried
>>>>>>>>>>>     file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>>>>>>>>>>>
>>>>>>>>>>> I'd guess I'm probably using the wrong version of hadoop-aws, but I
>>>>>>>>>>> called make-distribution.sh with -Phadoop-2.8, so I'm not sure what
>>>>>>>>>>> else to try.
>>>>>>>>>>>
>>>>>>>>>>> Any quick pointers?
>>>>>>>>>>>
>>>>>>>>>>> Nick
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 1, 2018 at 6:29 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>>>>>>>> Starting with my own +1 (binding).
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>>>>>> version 2.3.1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Given that I expect at least a few people to be busy with Spark
>>>>>>>>>>>>> Summit next week, I'm taking the liberty of setting an extended
>>>>>>>>>>>>> voting period. The vote will be open until Friday, June 8th, at
>>>>>>>>>>>>> 19:00 UTC (that's 12:00 PDT).
>>>>>>>>>>>>>
>>>>>>>>>>>>> It passes with a majority of +1 votes, which must include at least
>>>>>>>>>>>>> 3 +1 votes from the PMC.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.3.1
>>>>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>>>>>>>
>>>>>>>>>>>>> The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
>>>>>>>>>>>>> https://github.com/apache/spark/tree/v2.3.1-rc4
>>>>>>>>>>>>>
>>>>>>>>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>>>>>>
>>>>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1272/
>>>>>>>>>>>>>
>>>>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>>>>>>>>>>>>>
>>>>>>>>>>>>> The list of bug fixes going into 2.3.1 can be found at the following URL:
>>>>>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>>>>>>>>>>>>>
>>>>>>>>>>>>> FAQ
>>>>>>>>>>>>>
>>>>>>>>>>>>> =========================
>>>>>>>>>>>>> How can I help test this release?
>>>>>>>>>>>>> =========================
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>>>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>>>>>>>>> then reporting any regressions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>>>>>>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>>>>>>>>>>> you can add the staging repository to your project's resolvers and
>>>>>>>>>>>>> test with the RC (make sure to clean up the artifact cache
>>>>>>>>>>>>> before/after so you don't end up building with an out-of-date RC
>>>>>>>>>>>>> going forward).
>>>>>>>>>>>>>
>>>>>>>>>>>>> ===========================================
>>>>>>>>>>>>> What should happen to JIRA tickets still targeting 2.3.1?
>>>>>>>>>>>>> ===========================================
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current list of open tickets targeted at 2.3.1 can be found at:
>>>>>>>>>>>>> https://s.apache.org/Q3Uo
>>>>>>>>>>>>>
>>>>>>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>>>>>>>> be worked on immediately. Everything else, please retarget to an
>>>>>>>>>>>>> appropriate release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ==================
>>>>>>>>>>>>> But my bug isn't fixed?
>>>>>>>>>>>>> ==================
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>>>>>>>> release. That being said, if there is something which is a regression
>>>>>>>>>>>>> that has not been correctly targeted, please ping me or a committer
>>>>>>>>>>>>> to help target the issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Marcelo
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Marcelo
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

--
John
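As a footnote to the thread: the `[NOT FOUND]` warnings in Nicholas's log name artifacts whose jars were missing from the local m2 cache even though resolution was attempted there. A small sketch for pulling the missing coordinates out of Ivy output of that shape (the parsing pattern is an assumption based only on the log lines quoted above, not on Ivy's full output grammar):

```python
# Sketch: extract (org, name, version) triples from Ivy "[NOT FOUND]" warning
# lines like those in the thread above. The sample log is copied from the
# thread; the regex is an assumption fitted to that format.
import re

ivy_log = """
[NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
[NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
[NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
[NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
"""

def missing_artifacts(log):
    """Return (org, name, version) for each [NOT FOUND] line in Ivy output."""
    pattern = re.compile(r"\[NOT FOUND\s*\]\s+([\w.\-]+)#([\w.\-]+);([\w.\-]+)!")
    return pattern.findall(log)

for org, name, version in missing_artifacts(ivy_log):
    print("%s:%s:%s" % (org, name, version))
```

Listing the coordinates this way makes it easy to check whether the local cache genuinely lacks those jars or the wrong repository was consulted first.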