Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-26 Thread Josh Suereth
> sbt with the existing Hadoop ecosystem.
>
> Particularly the difficulties in using sbt + Maven together (something
> which tends to block more than just Spark from adopting sbt).
>
> I'm more than happy to listen and see what we can do on the sbt side
> to make this as seamless as possible for all parties.
>
> Thanks!
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Necessity-of-Maven-and-SBT-Build-in-Spark-tp2315p5682.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> --
> Evan Chan
> Staff Engineer
> e...@ooyala.com |

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-14 Thread Matei Zaharia
> sbt with the existing Hadoop ecosystem.
>
> Particularly the difficulties in using sbt + Maven together (something
> which tends to block more than just Spark from adopting sbt).
>
> I'm more than happy to listen and see what

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-14 Thread Patrick Wendell
> I'm more than happy to listen and see what we can do on the sbt side to
> make this as seamless as possible for all parties.
>
> Thanks!

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-14 Thread Evan Chan
> Thanks!

--
Evan Chan
Staff Engineer
e...@ooyala.com |

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-14 Thread jsuereth
Particularly the difficulties in using sbt + Maven together (something
which tends to block more than just Spark from adopting sbt).

I'm more than happy to listen and see what we can do on the sbt side to
make this as seamless as possible for all parties.

Thanks!

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-12 Thread Konstantin Boudnik
I think Kevin's point is somewhat different: there's no question that Sbt
can be integrated into the Maven ecosystem - mostly the repositories and
artifact management, of course. However, Sbt is a niche build tool and is
unlikely to be widely supported by engineering teams or IT organizations.
Sbt isn

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-11 Thread Koert Kuipers
Asm is such a mess. And their suggested solution, that everyone should
shade it, sounds pretty awful to me (it's not uncommon to have shaded asm
15 times in a single project). But I guess you are right that shading is
the only way to deal with it at this point...

On Mar 11, 2014 5:35 PM, "Kevin Markey" w

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-11 Thread Koert Kuipers
we have a maven corporate repository in-house and of course we also use
maven central. sbt can handle retrieving from and publishing to maven
repositories just fine. we have maven, ant/ivy and sbt projects depending
on each other's artifacts. not sure i see the issue there.

On Tue, Mar 11, 2014 at
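To illustrate the point above: publishing sbt-built artifacts to an in-house Maven repository is a few lines of build configuration. This is a sketch; the repository name, URL, and credentials path are hypothetical.

```scala
// build.sbt -- sketch; repository URL and credentials location are made up
publishMavenStyle := true
publishTo := Some("corp-releases" at "https://repo.example.com/maven-releases")
credentials += Credentials(Path.userHome / ".ivy2" / ".credentials")

// consuming side: resolve artifacts from the same in-house repository
resolvers += "corp-releases" at "https://repo.example.com/maven-releases"
```

With `publishMavenStyle := true`, sbt emits a POM alongside the jar, so Maven and Ivy builds can depend on the published artifact transparently.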

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-11 Thread Kevin Markey
Pardon my late entry into the fray here, but we've just struggled through
some library conflicts that could have been avoided and whose story sheds
some light on this question. We have been integrating Spark with a number
of other components. We discovered several conflicts, most easily elimina

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-06 Thread Konstantin Boudnik
On Wed, Feb 26, 2014 at 09:22AM, Sean Owen wrote:
> Side point -- "provided" scope is not the same as an exclude.
> "provided" means: this artifact is used directly by this code (compile
> time), but it is not necessary to package it, since it will be
> available from a runtime container. Exclusion

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-06 Thread Konstantin Boudnik
With all due respect Patrick, this approach is asking for trouble.
Proactively ;)

Cos

On Tue, Feb 25, 2014 at 04:09PM, Patrick Wendell wrote:
> What I mean is this. AFAIK the shader plug-in is primarily designed
> for creating uber jars which contain spark and all dependencies. But
> since Spar

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-06 Thread Konstantin Boudnik
On Tue, Feb 25, 2014 at 03:20PM, Evan Chan wrote:
> The correct way to exclude dependencies in SBT is actually to declare
> a dependency as "provided". I'm not familiar with Maven or its

Yes, I believe this would be equivalent to the maven exclusion of an
artifact's transitive deps.

Cos

> depe

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-01 Thread Matei Zaharia
We would like to cross-build Spark for Scala 2.11 and 2.10 eventually
(they're a lot closer than 2.10 and 2.9). In Maven this might mean creating
two POMs or a special variable for the version or something.

Matei

On Mar 1, 2014, at 12:15 PM, Koert Kuipers wrote:
> does maven support cross bui
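The "special variable" option mentioned above could look roughly like this in the POM. The property name is an assumption; this is the general pattern Maven projects use for Scala cross-builds.

```xml
<properties>
  <scala.binary.version>2.10</scala.binary.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${project.version}</version>
  </dependency>
</dependencies>
```

Building for the other Scala line then means overriding the property, e.g. `mvn -Dscala.binary.version=2.11 package`, since user properties passed with -D take precedence over POM properties.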

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-01 Thread Koert Kuipers
does maven support cross building for different scala versions? we do this
in-house all the time with sbt. i know spark does not cross build at this
point, but is it guaranteed to stay that way?

On Sat, Mar 1, 2014 at 12:02 PM, Koert Kuipers wrote:
> i am still unsure what is wrong with sbt ass
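For comparison, the sbt cross-building referred to here is a one-line setting (the listed versions are illustrative of the era):

```scala
// build.sbt -- prefixing a task with "+" runs it once per listed Scala version
crossScalaVersions := Seq("2.10.4", "2.11.1")
```

Running `sbt +publish` then compiles and publishes `_2.10` and `_2.11` suffixed artifacts in one pass.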

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-03-01 Thread Koert Kuipers
i am still unsure what is wrong with sbt assembly. i would like a
real-world example of where it does not work, that i can run. this is what
i know:
1) sbt assembly works fine for version conflicts for an artifact. no
exclusion rules are needed.
2) if artifacts have the same classes inside yet a

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-28 Thread Mridul Muralidharan
On Sat, Mar 1, 2014 at 2:05 AM, Patrick Wendell wrote:
> Hey,
>
> Thanks everyone for chiming in on this. I wanted to summarize these
> issues a bit, particularly wrt the constituents involved - does this
> seem accurate?
>
> = Spark Users =
> In general those linking against Spark should be totall

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-28 Thread Mark Hamstra
A couple of comments:
1) Whether the Spark POM is produced by SBT or Maven shouldn't matter for
those who just need to link against published artifacts, but right now SBT
and Maven do not produce equivalent POMs for Spark -- I think
2) Incremental builds using Maven are trivially more difficult

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-28 Thread Patrick Wendell
Hey,

Thanks everyone for chiming in on this. I wanted to summarize these issues
a bit, particularly wrt the constituents involved - does this seem accurate?

= Spark Users =
In general those linking against Spark should be totally unaffected by the
build choice. Spark will continue to publish well-

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Mridul Muralidharan
On Feb 26, 2014 11:12 PM, "Patrick Wendell" wrote:
>
> @mridul - As far as I know both Maven and Sbt use fairly similar
> processes for building the assembly/uber jar. We actually used to
> package spark with sbt and there were no specific issues we
> encountered and AFAIK sbt respects versioning

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Nathan Kronenfeld
On Wed, Feb 26, 2014 at 2:11 PM, Sean Owen wrote:
> I also favor Maven. I don't think the logic is "because it's common".
> As Sandy says, it's because of the things that brings: more plugins,
> easier to consume by more developers, etc. These are, however, just
> some reasons 'for', and have to be

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Koert Kuipers
yes. the Build.scala file behaves like a configuration file mostly, but
because it is scala you can use the full power of a real language when
needed. also i found writing sbt plugins doable (but not easy).

On Feb 26, 2014 2:12 PM, "Sean Owen" wrote:
> I also favor Maven. I don't think the logic

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Evan Chan
Can't Maven POMs include other ones? So what if we remove the artifact
specs from the main pom, have them generated by sbt make-pom, and include
the generated file in the main pom.xml? I guess I'm just trying to figure
out how much this would help (it seems at least it would remove the issue
of m
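For context on the question above: Maven supports this through POM inheritance rather than textual inclusion. A child module names a parent POM whose properties, dependencyManagement, and plugin configuration it inherits. The coordinates below are illustrative:

```xml
<!-- child module pom.xml -- inherits settings from the parent POM -->
<parent>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-parent</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <relativePath>../pom.xml</relativePath>
</parent>
```

This is inheritance, not inclusion: the child can override inherited values, but there is no mechanism for splicing arbitrary XML fragments from another file into a POM.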

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Mark Hamstra
Yes, but the POM generated in that fashion is only sufficient for linking with Spark, not for building Spark or serving as a basis from which to build a customized Spark with Maven. So, starting from SparkBuild.scala and generating a POM with make-pom, those who wish to build a customized Spark wi

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Evan Chan
Mark,

No, I haven't tried this myself yet :-p

Also I would expect that sbt-pom-reader does not do assemblies at all,
because that is an SBT plugin, so we would still need code to include
sbt-assembly. There is also the tricky question of how to include the
assembly stuff into sbt-pom-reader

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Sean Owen
I also favor Maven. I don't think the logic is "because it's common". As
Sandy says, it's because of the things that brings: more plugins, easier to
consume by more developers, etc. These are, however, just some reasons
'for', and have to be considered against the other pros and cons. The
choice of

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Koert Kuipers
i don't buy the argument that we should use it because it's the most
common. if all we would do is use what is most common then we should switch
to java, svn and maven.

On Wed, Feb 26, 2014 at 1:38 PM, Mark Grover wrote:
> Hi Patrick,
> And, to pile on what Sandy said. In my opinion, it's definit

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Mark Hamstra
Evan, Have you actually tried to build Spark using its POM file and sbt-pom-reader? I just made a first, naive attempt, and I'm still sorting through just what this did and didn't produce. It looks like the basic jar files are at least very close to correct, and may be just fine, but that buildi

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Mark Grover
Hi Patrick,

And, to pile on what Sandy said. In my opinion, it's definitely more than
just a matter of convenience. My comment below applies both to distribution
builders and to people who have their own internal "distributions" (a few
examples of which we have already seen on this thread

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Sandy Ryza
@patrick - It seems like my point about being able to inherit the root pom was addressed and there's a way to handle this. The larger point I meant to make is that Maven is by far the most common build tool in projects that are likely to share contributors with Spark. I personally know 10 people

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Patrick Wendell
@mridul - As far as I know both Maven and Sbt use fairly similar processes for building the assembly/uber jar. We actually used to package spark with sbt and there were no specific issues we encountered and AFAIK sbt respects versioning of transitive dependencies correctly. Do you have a specific b

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Evan Chan
I'd like to propose the following way to move forward, based on the
comments I've seen:
1. Aggressively clean up the giant dependency graph. One ticket I might
work on if I have time is SPARK-681, which might remove the giant fastutil
dependency (~15MB by itself).
2. Take an intermediate step

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Koert Kuipers
We maintain an in-house Spark build using sbt. We have no problem using sbt
assembly. We did add a few exclude statements for transitive dependencies.
The main enemies of assemblies are jars that include stuff they shouldn't
(kryo comes to mind, I think they include logback?), and new versions of
jars that

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Sean Owen
Side point -- "provided" scope is not the same as an exclude. "provided"
means: this artifact is used directly by this code (compile time), but it
is not necessary to package it, since it will be available from a runtime
container. Exclusions make an artifact, that would otherwise be available,
una
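In POM terms, the two mechanisms contrasted above look like this (the artifacts and versions are illustrative, not a claim about Spark's actual dependency tree):

```xml
<!-- "provided": compile against it, but expect the runtime container to supply it -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
  <scope>provided</scope>
</dependency>

<!-- exclusion: make a transitive artifact unavailable, at compile time too -->
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.15</version>
  <exclusions>
    <exclusion>
      <groupId>javax.jms</groupId>
      <artifactId>jms</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The first keeps the artifact on the compile classpath but out of the package; the second removes it from the dependency graph entirely.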

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Mridul Muralidharan
The problem is, the complete spark dependency graph is fairly large, and
there are a lot of conflicting versions in there. In particular, when we
bump versions of dependencies, managing this becomes messy at best. Now, I
have not looked in detail at how maven manages this - it might just be
accident

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Chester Chen
@Sandy Yes, in sbt with a multiple-project setup, you can easily set a
variable in the Build.scala and reference the version number from all
dependent projects. Regarding the mix of java and scala projects, in my
workplace we have both java and scala code. Sbt can be used to build both
with
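A minimal sketch of the shared-variable setup described above, in an sbt 0.13-era Build.scala. The project names and version are hypothetical:

```scala
// project/Build.scala -- sketch; project names and version are made up
import sbt._
import Keys._

object ExampleBuild extends Build {
  // one place to bump the version for every subproject
  val sharedSettings = Seq(
    version := "0.9.1-SNAPSHOT",
    scalaVersion := "2.10.4"
  )

  lazy val core = Project("core", file("core"))
    .settings(sharedSettings: _*)

  lazy val examples = Project("examples", file("examples"))
    .settings(sharedSettings: _*)
    .dependsOn(core)
}
```

Because the build definition is ordinary Scala, `sharedSettings` is just a value; any subproject that includes it picks up the same version automatically.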

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Qiuzhuang Lian
We use the jarjar Ant plugin task to assemble into one fat jar.

Qiuzhuang

On Wed, Feb 26, 2014 at 11:26 AM, Evan Chan wrote:
> Actually you can control exactly how sbt assembly merges or resolves
> conflicts. I believe the default settings however lead to order which
> cannot be controlled.
>
> I

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan chan
Actually you can control exactly how sbt assembly merges or resolves
conflicts. I believe the default settings however lead to an ordering which
cannot be controlled.

I do wish for a smarter fat jar plugin.

-Evan
To be free is not merely to cast off one's chains, but to live in a way
that respec
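The merge control mentioned above is done with a pattern match over file paths in the assembly. A sketch; the exact setting key differs across sbt-assembly versions (older releases use `mergeStrategy in assembly`, newer ones `assembly / assemblyMergeStrategy`):

```scala
// build.sbt -- sketch of sbt-assembly conflict resolution
assembly / assemblyMergeStrategy := {
  case "log4j.properties"       => MergeStrategy.discard  // drop a file outright
  case PathList("META-INF", _*) => MergeStrategy.discard  // drop duplicate manifests
  case _                        => MergeStrategy.first    // on conflict, keep the first copy
}
```

`MergeStrategy.first` is where the hard-to-control ordering comes in: "first" means first on the classpath, which depends on dependency resolution order.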

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Mridul Muralidharan
On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell wrote:
> Evan - this is a good thing to bring up. Wrt the shader plug-in -
> right now we don't actually use it for bytecode shading - we simply
> use it for creating the uber jar with excludes (which sbt supports
> just fine via assembly).

Not re

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
Sandy, I believe the sbt-pom-reader plugin might work very well for this exact use case. Otherwise, the SBT build file is just Scala code, so it can easily read the pom XML directly if needed and parse stuff out. On Tue, Feb 25, 2014 at 4:36 PM, Sandy Ryza wrote: > To perhaps restate what some
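Reading the POM directly from the build definition, as suggested above, could be as simple as this sketch (assuming a pom.xml sits at the project root and scala-xml is available, as it was in the standard library of that era):

```scala
// build.sbt -- sketch: pull the version straight out of pom.xml
version := {
  val pom = scala.xml.XML.loadFile("pom.xml")
  (pom \ "version").text
}
```

This only shares a single value; keeping dependency lists in sync is the harder problem that sbt-pom-reader tries to solve wholesale.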

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Sandy Ryza
To perhaps restate what some have said, Maven is by far the most common build tool for the Hadoop / JVM data ecosystem. While Maven is less pretty than SBT, expertise in it is abundant. SBT requires contributors to projects in the ecosystem to learn yet another tool. If we think of Spark as a pr

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
Hi Patrick,

If you include shaded dependencies inside of the main Spark jar, such that
it would have combined classes from all dependencies, wouldn't you end up
with a sub-assembly jar? It would be dangerous in that, since it is a
single unit, it would break normal packaging assumptions that the j

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Patrick Wendell
What I mean is this. AFAIK the shader plug-in is primarily designed for
creating uber jars which contain spark and all dependencies. But since
Spark is something people depend on in Maven, what I actually want is to
create the normal old Spark jar [1], but then include shaded versions of
some of ou
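For reference, bytecode shading with the maven-shade-plugin is configured through relocations; each relocation rewrites a package prefix in the classes it repackages. A sketch; the relocated package name here is hypothetical:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- rewrite asm classes into a private namespace to avoid conflicts -->
          <relocation>
            <pattern>org.objectweb.asm</pattern>
            <shadedPattern>org.spark.shaded.asm</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Combined with `artifactSet` includes, this can shade just a few troublesome dependencies into an otherwise normal jar, rather than producing a full uber jar.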

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
Patrick -- not sure I understand your request. Do you mean:
- somehow creating a shaded jar (e.g. with the maven shader plugin)
- then including it in the spark jar (which would then be an assembly)?

On Tue, Feb 25, 2014 at 4:01 PM, Patrick Wendell wrote:
> Evan - this is a good thing to bring up. Wrt t

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Patrick Wendell
Evan - this is a good thing to bring up. Wrt the shader plug-in - right now
we don't actually use it for bytecode shading - we simply use it for
creating the uber jar with excludes (which sbt supports just fine via
assembly). I was wondering actually, do you know if it's possible to add
shaded a

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread yao
Hi Patrick,

> (b) You have downloaded Spark and forked its maven build to change
> around the dependencies.

We go with this approach. We've cloned the Spark repo and currently
maintain our own branch. The idea is to fix Spark issues found in our
production system first and contribute back to commu

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Patrick Wendell
Hey Yao,

Would you mind explaining exactly how your company extends the Spark maven
build? For instance:
(a) You are depending on Spark in your build and your build is using Maven.
(b) You have downloaded Spark and forked its maven build to change around
the dependencies.
(c) You are writing pom

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
The problem is that plugins are not equivalent. There is AFAIK no equivalent to the maven shader plugin for SBT. There is an SBT plugin which can apparently read POM XML files (sbt-pom-reader). However, it can't possibly handle plugins, which is still problematic. On Tue, Feb 25, 2014 at 3:31 P

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread yao
I would prefer to keep both of them; it would be better even if that means
pom.xml will be generated using sbt. Some companies, like my current one,
have their own build infrastructure built on top of maven. It is not easy
to support sbt for these potential spark clients. But I do agree to only
keep on

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Sravya Tirukkovalur
I am no sbt guru, but I could exclude transitive dependencies this way:

  libraryDependencies += "log4j" % "log4j" % "1.2.15" exclude("javax.jms", "jms")

Thanks!

On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan wrote:
> The correct way to exclude dependencies in SBT is actually to declare
> a depe

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
The correct way to exclude dependencies in SBT is actually to declare a dependency as "provided". I'm not familiar with Maven or its dependencySet, but provided will mark the entire dependency tree as excluded. It is also possible to exclude jar by jar, but this is pretty error prone and messy.
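The "provided" approach described above is a one-liner in a build.sbt (the Hadoop artifact and version are illustrative):

```scala
// build.sbt -- compile against hadoop-client, but leave it out of the assembly
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"
```

Note this mirrors Maven's provided scope: the jar and its transitive dependencies stay on the compile classpath but are omitted from the packaged assembly, which is not the same as excluding them outright.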

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Koert Kuipers
yes in sbt assembly you can exclude jars (although i never had a need for
this) and files in jars. for example i frequently remove log4j.properties,
because for whatever reason hadoop decided to include it, making it very
difficult to use our own logging config.

On Tue, Feb 25, 2014 at 4:24 PM,

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Konstantin Boudnik
On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
> Kos - thanks for chiming in. Could you be more specific about what is
> available in maven and not in sbt for these issues? I took a look at
> the bigtop code relating to Spark. As far as I could tell [1] was the
> main point of integration