Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Krishna Sankar
+1 (non-binding, of course) (Hope I made it in time. ~T-20 !) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 25:52 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib (iPython 4.0, FYI, notebook install is separate “conda install ipython” and then “conda install ju

Re: State of the Build

2015-11-06 Thread Michael Armbrust
It's not included; it is downloaded on demand. That said, I think the fact that we can download the jar is a huge feature of SBT: no installation needed, you can build the project as long as you have a JVM. On Fri, Nov 6, 2015 at 4:49 PM, Jakob Odersky wrote: > > Can you clarify which sbt jar (by path) ?

Re: State of the Build

2015-11-06 Thread Jakob Odersky
> Can you clarify which sbt jar (by path) ? Any of them. Sbt is a build tool, and I don't understand why it is included in a source repository. It would be like including make in a project. On 6 November 2015 at 16:43, Ted Yu wrote: > bq. include an sbt jar in the source repo > > Can you clarify

Re: State of the Build

2015-11-06 Thread Ted Yu
bq. include an sbt jar in the source repo Can you clarify which sbt jar (by path) ? I tried 'git log' on the following files but didn't see commit history: ./build/sbt-launch-0.13.7.jar ./build/zinc-0.3.5.3/lib/sbt-interface.jar ./sbt/sbt-launch-0.13.2.jar ./sbt/sbt-launch-0.13.5.jar On Fri, No

Re: State of the Build

2015-11-06 Thread Jakob Odersky
[Reposting to the list again, I really should double-check that reply-to-all button] in the meantime, as a light Friday-afternoon patch I was thinking about splitting the ~600loc-single-build sbt file into something more manageable like the Akka build (without changing any dependencies or setting

Re: State of the Build

2015-11-06 Thread Koert Kuipers
oh ok i think i got it... i hope! since the app runs with the spark assembly jar on its classpath, the exact version as resolved by spark's build process is actually on the app's classpath. sorry, didn't mean to pollute this thread with my own dependency resolution confusion. On Fri, Nov 6, 2015 a

Re: State of the Build

2015-11-06 Thread Patrick Wendell
I think there are a few minor differences in the dependency graph that arise from this. For a given user, the probability it affects them is low - it needs to conflict with a library a user application is using. However the probability it affects *some users* is very high and we do see small change

Re: State of the Build

2015-11-06 Thread Marcelo Vanzin
On Fri, Nov 6, 2015 at 3:04 PM, Koert Kuipers wrote: > if i understand it correctly it would cause compatibility breaks for > applications on top of spark, because those applications use the exact same > current resolution logic (so basically they are maven apps), and the change > would make them

Re: State of the Build

2015-11-06 Thread Koert Kuipers
that's interesting... if i understand it correctly it would cause compatibility breaks for applications on top of spark, because those applications use the exact same current resolution logic (so basically they are maven apps), and the change would make them inconsistent? by that logic all existin

Re: State of the Build

2015-11-06 Thread Patrick Wendell
I think we'd have to standardize on Maven-style resolution, or I'd at least like to see that path explored first. The issue is if we switch the standard now, it could cause compatibility breaks for applications on top of Spark. On Fri, Nov 6, 2015 at 2:28 PM, Jakob Odersky wrote: > Reposting to

Re: State of the Build

2015-11-06 Thread Jakob Odersky
Reposting to the list... Thanks for all the feedback everyone, I get a clearer picture of the reasoning and implications now. Koert, according to your post in this thread http://apache-spark-developers-list.1001551.n3.nabble.com/Master-build-fails-tt14895.html#a15023, it is apparently very easy t

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread shane knapp
ok, i think i've kicked jenkins enough that it's now working again w/o spamming tracebacks. sorry for the interruption... i should have realized that touching the house of cards (aka ghprb plugin) would cause it to fall down no matter what i did. :) shane ps - did i mention the hardware for ou

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread shane knapp
gonna have to kick jenkins again, folks. sorry! On Fri, Nov 6, 2015 at 1:11 PM, shane knapp wrote: > i (stupidly) updated the ghprb plugin as the version we're using is > really, really old. this re-wrote the config and broke stuff. > > so, i just downgraded the plugin back to the last known wo

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread shane knapp
i (stupidly) updated the ghprb plugin as the version we're using is really, really old. this re-wrote the config and broke stuff. so, i just downgraded the plugin back to the last known working version, and noticed that some of the fields in the xml are missing. thankfully i have a backup config

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread Josh Rosen
Are you sure that the credentials are missing? Also: did you enable GitHub commit status updating by accident / configuration loss? That might explain the errors here, since our keys don't have permissions to use that API. On Fri, Nov 6, 2015 at 12:54 PM, shane knapp wrote: > alright, i'm downgr

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread shane knapp
alright, i'm downgrading our ghprb plugin back to the last known working version. this will require a jenkins restart, which i will do immediately. sorry about this! :( On Fri, Nov 6, 2015 at 12:35 PM, shane knapp wrote: > a pox on the github pull request builder... the update wiped out the >

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread shane knapp
a pox on the github pull request builder... the update wiped out the github auth creds. :\ On Fri, Nov 6, 2015 at 12:30 PM, shane knapp wrote: > looking in to this now. > > On Fri, Nov 6, 2015 at 12:28 PM, Michael Armbrust > wrote: >> I'm noticing several problems with Jenkins since the upgrad

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread shane knapp
looking in to this now. On Fri, Nov 6, 2015 at 12:28 PM, Michael Armbrust wrote: > I'm noticing several problems with Jenkins since the upgrade. > > PR comments say: "Build started sha1 is merged." instead of actually > printing the hash > > Also: > https://amplab.cs.berkeley.edu/jenkins/job/Spar

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread Michael Armbrust
I'm noticing several problems with Jenkins since the upgrade. PR comments say: "Build started sha1 is merged." instead of actually printing the hash Also: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45246/console GitHub pull request #9527 of commit 0e0959efada849a56430d303

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Tom Graves
It's either --num-workers or --num-executors when using the spark-class interface directly. If you use spark-submit with --num-executors, it ends up setting spark.executor.instances, which works around the issue. Tom On Friday, November 6, 2015 2:14 PM, Marcelo Vanzin wrote: The way
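Tom's workaround is to go through spark-submit, so that the executor count ends up in spark.executor.instances. Below is a minimal sketch that builds such a command line explicitly with `--conf spark.executor.instances=N` rather than relying on a deprecated flag. The helper name `spark_submit_argv`, the jar `app.jar`, and the class `com.example.MyApp` are hypothetical. `yarn-cluster` is the Spark 1.x YARN master URL.

```python
import shlex


def spark_submit_argv(app_jar, main_class, num_executors, master):
    """Hypothetical helper: build a spark-submit argv that sets the executor
    count via spark.executor.instances instead of the old --num-workers flag."""
    return [
        "spark-submit",
        "--master", master,
        "--class", main_class,
        # Equivalent in effect to spark-submit's --num-executors option.
        "--conf", f"spark.executor.instances={num_executors}",
        app_jar,
    ]


argv = spark_submit_argv("app.jar", "com.example.MyApp", 4, "yarn-cluster")
print(shlex.join(argv))
```

Keeping the value in a `--conf` entry makes the intent visible in logs and avoids the legacy spark-class code path discussed in SPARK-11555.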

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Marcelo Vanzin
The way I read Tom's report, it just affects a long-deprecated command line option (--num-workers). I wouldn't block the release for it. On Fri, Nov 6, 2015 at 12:10 PM, Sean Owen wrote: > Hm, if I read that right, looks like --num-executors doesn't work at > all on YARN unless dynamic allocation

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Sean Owen
Hm, if I read that right, looks like --num-executors doesn't work at all on YARN unless dynamic allocation is on? the fix is easy, but sounds like it could be a Blocker. On Fri, Nov 6, 2015 at 2:51 PM, Tom Graves wrote: > While running our regression tests I found > https://issues.apache.org/jir

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Michael Armbrust
+1 On Fri, Nov 6, 2015 at 9:27 AM, Chester Chen wrote: > +1 > Test against CDH5.4.2 with hadoop 2.6.0 version using yesterday's code, > build locally. > > Regression running in Yarn Cluster mode against few internal ML ( logistic > regression, linear regression, random forest and statistic summa

Re: Master build fails ?

2015-11-06 Thread Steve Loughran
> On 6 Nov 2015, at 17:35, Marcelo Vanzin wrote: > > On Fri, Nov 6, 2015 at 2:21 AM, Steve Loughran wrote: >> Maven's closest-first policy has a different flaw, namely that it's not >> always obvious why a guava 14.0 that is two hops of transitiveness should >> take priority over a 16.0 versio

Re: Master build fails ?

2015-11-06 Thread Marcelo Vanzin
On Fri, Nov 6, 2015 at 2:21 AM, Steve Loughran wrote: > Maven's closest-first policy has a different flaw, namely that it's not always > obvious why a guava 14.0 that is two hops of transitiveness should take > priority over a 16.0 version three hops away. Especially when that 14.0 > version sho
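The disagreement in this thread is between Maven's "nearest definition wins" mediation and Ivy's default "latest revision wins" conflict manager, which sbt used. Below is a minimal Python sketch of both policies on a toy graph that mirrors Steve's example: guava 14.0 sits two hops away and guava 16.0 sits three hops away. The artifact names `lib-a`, `lib-b`, and `lib-c` are hypothetical. The sketch also simplifies both tools. It ignores eviction of subtrees under losing nodes, version ranges, and exclusions.

```python
from collections import deque

# Toy dependency tree (hypothetical artifacts, not Spark's real graph):
#   app -> lib-a -> guava 14.0           (two hops)
#   app -> lib-b -> lib-c -> guava 16.0  (three hops)
TREE = {"name": "app", "version": "1.0", "deps": [
    {"name": "lib-a", "version": "1.0", "deps": [
        {"name": "guava", "version": "14.0", "deps": []}]},
    {"name": "lib-b", "version": "1.0", "deps": [
        {"name": "lib-c", "version": "1.0", "deps": [
            {"name": "guava", "version": "16.0", "deps": []}]}]},
]}


def walk(root):
    """Breadth-first walk yielding (name, version) level by level,
    in declaration order within each level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node["name"], node["version"]
        queue.extend(node["deps"])


def version_key(version):
    return tuple(int(part) for part in version.split("."))


def nearest_wins(root):
    # Maven-style: the shallowest occurrence wins; ties go to the first declared.
    chosen = {}
    for name, version in walk(root):
        chosen.setdefault(name, version)  # BFS sees shallower nodes first
    return chosen


def latest_wins(root):
    # Ivy-style "latest revision": the highest version wins regardless of depth.
    chosen = {}
    for name, version in walk(root):
        if name not in chosen or version_key(version) > version_key(chosen[name]):
            chosen[name] = version
    return chosen


print(nearest_wins(TREE)["guava"])  # 14.0
print(latest_wins(TREE)["guava"])   # 16.0
```

The same graph resolves to different guava versions under the two policies. That is why switching the sbt build to match Maven, or the reverse, changes what ends up on the classpath.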

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Chester Chen
+1 Tested against CDH5.4.2 with Hadoop 2.6.0, using yesterday's code built locally. Ran regressions in YARN cluster mode against a few internal ML jobs (logistic regression, linear regression, random forest and statistic summary) as well as MLlib KMeans. All seems to work fine. Chester On Tue, N

Re: Looking for the method executors uses to write to HDFS

2015-11-06 Thread Reynold Xin
Are you looking for this? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala#L69 On Wed, Nov 4, 2015 at 5:11 AM, Tóth Zoltán wrote: > Hi, > > I'd like to write a parquet file from the driver. I could use

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread shane knapp
and we're back! On Fri, Nov 6, 2015 at 7:39 AM, shane knapp wrote: > this is happening now. > > On Thu, Nov 5, 2015 at 11:08 AM, shane knapp wrote: >> well, i forgot to put this on my calendar and didn't get around to >> getting it done this morning. :) >> >> anyways, i'll be shooting for tomor

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-06 Thread shane knapp
this is happening now. On Thu, Nov 5, 2015 at 11:08 AM, shane knapp wrote: > well, i forgot to put this on my calendar and didn't get around to > getting it done this morning. :) > > anyways, i'll be shooting for tomorrow (friday) morning instead. > > shane > > On Mon, Nov 2, 2015 at 9:55 AM, sh

Re: Master build fails ?

2015-11-06 Thread Ted Yu
Since Maven is the preferred build vehicle, an Ivy-style dependency policy would produce surprising results compared to today's behavior. I would suggest staying with the current dependency policy. My two cents. On Fri, Nov 6, 2015 at 6:25 AM, Koert Kuipers wrote: > if there is no strong preferen

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Tom Graves
While running our regression tests I found https://issues.apache.org/jira/browse/SPARK-11555. It is a break in backwards compatibility, but it's using the old spark-class and --num-workers interface, which I hope no one is still using. I'm a +0 as it doesn't seem super critical but I hate to br

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-06 Thread Luc Bourlier
+1 (non binding) Tested the integration with Mesos in the different configurations. Luc On Thu, Nov 5, 2015 at 21:02, Nicholas Chammas wrote: > -0 > > The spark-ec2 version is still set to 1.5.1 > . > > Nick > > On Wed, No

Re: Master build fails ?

2015-11-06 Thread Koert Kuipers
if there is no strong preference for one dependencies policy over another, but consistency between the 2 systems is desired, then i believe maven can be made to behave like ivy pretty easily with a setting in the pom On Fri, Nov 6, 2015 at 5:21 AM, Steve Loughran wrote: > > > On 5 Nov 2015, at 2

Re: Ready to talk about Spark 2.0?

2015-11-06 Thread Jean-Baptiste Onofré
Hi Sean, Happy to see this discussion. I'm working on a PoC to run Camel on Spark Streaming. The purpose is to have an ingestion and integration platform running directly on Spark Streaming. Basically, we would be able to use a Camel Spark DSL like: from("jms:queue:foo").choice().when(predica

Ready to talk about Spark 2.0?

2015-11-06 Thread Sean Owen
Since branch-1.6 is cut, I was going to make version 1.7.0 in JIRA. However I've had a few side conversations recently about Spark 2.0, and I know I and others have a number of ideas about it already. I'll go ahead and make 1.7.0, but thought I'd ask, how much other interest is there in starting t

GraphX EdgePartition format

2015-11-06 Thread Daniel Margo
I was looking through the GraphX source and noticed that the topology of an EdgePartition is a triplet of source, destination, and data columns -- essentially a COO sparse matrix -- sorted by source, and equipped with an index from each (global) vertex ID to the start of its (local) source cluster.
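Below is a minimal Python model of the layout Daniel describes. It keeps parallel source, destination, and data columns (COO), sorts them by source, and adds an index from each global vertex ID to the start of that vertex's contiguous source cluster. This is a simplified illustration, not GraphX's Scala `EdgePartition`. The real class also remaps global IDs to local ones and uses specialized hash maps. The name `CooEdgePartition` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CooEdgePartition:
    """Toy COO edge partition: parallel src/dst/data columns sorted by src,
    plus an index from each source vertex ID to the start of its cluster."""
    src: list
    dst: list
    data: list
    index: dict = field(default_factory=dict)

    @classmethod
    def build(cls, edges):
        # Sort (src, dst, attr) triples by source so each source's edges are contiguous.
        edges = sorted(edges, key=lambda e: e[0])
        src = [e[0] for e in edges]
        dst = [e[1] for e in edges]
        data = [e[2] for e in edges]
        index = {}
        for pos, vid in enumerate(src):
            index.setdefault(vid, pos)  # first position of each source cluster
        return cls(src, dst, data, index)

    def out_edges(self, vid):
        # Jump to the cluster start, then scan until the source ID changes.
        start = self.index.get(vid)
        if start is None:
            return []
        result, pos = [], start
        while pos < len(self.src) and self.src[pos] == vid:
            result.append((self.dst[pos], self.data[pos]))
            pos += 1
        return result


part = CooEdgePartition.build([(3, 1, "c"), (1, 2, "a"), (1, 3, "b"), (2, 3, "x")])
print(part.out_edges(1))  # [(2, 'a'), (3, 'b')]
```

Sorting by source turns a per-vertex out-edge lookup into one index probe followed by a sequential scan. The cost is that lookups by destination still require a full scan.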

Re: pyspark with pypy not work for spark-1.5.1

2015-11-06 Thread Chang Ya-Hsuan
Hi, I ran ./python/run-tests to test the following modules of spark-1.5.1: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming'] against the following pypy versions: pypy-2.2.1 pypy-2.3 pypy-2.3.1 pypy-2.4.0 pypy-2.5.0 pypy-2.5.1 pypy-2.6.0 pypy-2.6.1 pypy-4.0.0 exc

Re: Master build fails ?

2015-11-06 Thread Steve Loughran
> On 5 Nov 2015, at 20:07, Marcelo Vanzin wrote: > > Man that command is slow. Anyway, it seems guava 16 is being brought > transitively by curator 2.6.0 which should have been overridden by the > explicit dependency on curator 2.4.0, but apparently, as Steve > mentioned, sbt/ivy decided to brea