Hi Patrick,

If you include shaded dependencies inside the main Spark jar, such that it combines classes from all of its dependencies, wouldn't you end up with a sub-assembly jar? That would be dangerous in that, since it is a single unit, it would break the normal packaging assumption that a jar contains only its own classes, with maven/sbt/ivy/etc. used to resolve the remaining deps... but maybe I don't know what you mean.
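(For concreteness: combining relocated classes into a jar is what the maven-shade-plugin's relocation feature does. A minimal sketch is below -- the plugin version, the relocated dependency, and the target package here are illustrative, not Spark's actual configuration.)

```xml
<!-- Sketch only: version, dependency, and package names are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Copy com.google.protobuf.* classes into the jar under a
                 private package, rewriting all bytecode references to them,
                 so they cannot clash with another protobuf on the classpath. -->
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.spark.shaded.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```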
The shader plugin in Maven is apparently used to:

1) build uber jars -- this is the part that sbt-assembly also does
2) "shade" existing jars, i.e. rename the classes and rewrite the bytecode depending on them so that they don't conflict with other jars containing the same classes -- this is something sbt-assembly doesn't do, which you point out is currently done manually.

On Tue, Feb 25, 2014 at 4:09 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> What I mean is this. AFAIK the shader plug-in is primarily designed
> for creating uber jars which contain Spark and all dependencies. But
> since Spark is something people depend on in Maven, what I actually
> want is to create the normal old Spark jar [1], but then include
> shaded versions of some of our dependencies inside of it. Not sure if
> that's even possible.
>
> The way we do shading now is we manually publish shaded versions of
> some dependencies to Maven Central as their own artifacts.
>
> [1] http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-core_2.10/0.9.0-incubating/spark-core_2.10-0.9.0-incubating.jar
>
> On Tue, Feb 25, 2014 at 4:04 PM, Evan Chan <e...@ooyala.com> wrote:
>> Patrick -- not sure I understand your request. Do you mean:
>> - somehow creating a shaded jar (e.g. with the Maven shader plugin)
>> - then including it in the Spark jar (which would then be an assembly)?
>>
>> On Tue, Feb 25, 2014 at 4:01 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> Evan - this is a good thing to bring up. Wrt the shader plug-in:
>>> right now we don't actually use it for bytecode shading -- we simply
>>> use it for creating the uber jar with excludes (which sbt supports
>>> just fine via assembly).
>>>
>>> I was wondering, actually: do you know if it's possible to add shaded
>>> artifacts to the *Spark jar* using this plug-in (i.e. not an uber
>>> jar)? That's something I could see being really handy in the future.
>>>
>>> - Patrick
>>>
>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
>>>> The problem is that the plugins are not equivalent. There is AFAIK
>>>> no equivalent to the Maven shader plugin for SBT.
>>>>
>>>> There is an SBT plugin which can apparently read POM XML files
>>>> (sbt-pom-reader). However, it can't possibly handle plugins, which
>>>> is still problematic.
>>>>
>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
>>>>> I would prefer to keep both of them; it would be better even if
>>>>> that means pom.xml will be generated using sbt. Some companies,
>>>>> like my current one, have their own build infrastructure built on
>>>>> top of Maven. It is not easy to support sbt for these potential
>>>>> Spark clients. But I do agree to keep only one if there is a
>>>>> promising way to generate a correct configuration from the other.
>>>>>
>>>>> -Shengzhe
>>>>>
>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
>>>>>> The correct way to exclude dependencies in SBT is actually to
>>>>>> declare a dependency as "provided". I'm not familiar with Maven or
>>>>>> its dependencySet, but "provided" will mark the entire dependency
>>>>>> tree as excluded. It is also possible to exclude jar by jar, but
>>>>>> this is pretty error-prone and messy.
>>>>>>
>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>> > Yes, in sbt-assembly you can exclude jars (although I never had
>>>>>> > a need for this) and files within jars.
>>>>>> >
>>>>>> > For example, I frequently remove log4j.properties, because for
>>>>>> > whatever reason Hadoop decided to include it, making it very
>>>>>> > difficult to use our own logging config.
>>>>>> >
>>>>>> > On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>>> >
>>>>>> >> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
>>>>>> >> > Kos - thanks for chiming in. Could you be more specific about
>>>>>> >> > what is available in Maven and not in sbt for these issues? I
>>>>>> >> > took a look at the Bigtop code relating to Spark. As far as I
>>>>>> >> > could tell, [1] was the main point of integration with the
>>>>>> >> > build system (maybe there are other integration points)?
>>>>>> >> >
>>>>>> >> > > - In order to integrate Spark well into the existing Hadoop
>>>>>> >> > > stack it was necessary to have a way to avoid transitive
>>>>>> >> > > dependency duplications and possible conflicts.
>>>>>> >> > >
>>>>>> >> > > E.g. Maven assembly allows us to avoid adding _all_ Hadoop
>>>>>> >> > > libs and later merely declare the Spark package's
>>>>>> >> > > dependency on standard Bigtop Hadoop packages. And yes -
>>>>>> >> > > Bigtop packaging means the naming and layout would be
>>>>>> >> > > standard across all commercial Hadoop distributions that
>>>>>> >> > > are worth mentioning: ASF Bigtop convenience binary
>>>>>> >> > > packages, and Cloudera or Hortonworks packages. Hence, the
>>>>>> >> > > downstream user doesn't need to spend any effort to make
>>>>>> >> > > sure that Spark "clicks in" properly.
>>>>>> >> >
>>>>>> >> > The sbt build also allows you to plug in a Hadoop version,
>>>>>> >> > similar to the Maven build.
>>>>>> >>
>>>>>> >> I am actually talking about the ability to exclude a set of
>>>>>> >> dependencies from an assembly, similarly to what's happening in
>>>>>> >> the dependencySet sections of
>>>>>> >> assembly/src/main/assembly/assembly.xml
>>>>>> >> If there is comparable functionality in sbt, that would help
>>>>>> >> quite a bit, apparently.
>>>>>> >>
>>>>>> >> Cos
>>>>>> >>
>>>>>> >> > > - Maven provides a relatively easy way to deal with the
>>>>>> >> > > jar-hell problem, although the original Maven build was
>>>>>> >> > > just Shader'ing everything into a huge lump of class files,
>>>>>> >> > > oftentimes ending up with classes slamming on top of each
>>>>>> >> > > other from different transitive dependencies.
>>>>>> >> >
>>>>>> >> > AFAIK we are only using the shade plug-in to deal with
>>>>>> >> > conflict resolution in the assembly jar. These are dealt with
>>>>>> >> > in sbt via the sbt-assembly plug-in in an identical way. Is
>>>>>> >> > there a difference?
>>>>>> >>
>>>>>> >> I am bringing up the Shader because it is an awful hack, which
>>>>>> >> can't be used in a real controlled deployment.
>>>>>> >>
>>>>>> >> Cos
>>>>>> >>
>>>>>> >> > [1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master

--
--
Evan Chan
Staff Engineer
e...@ooyala.com |
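(For readers following the sbt side of the thread: marking a dependency "provided" and discarding stray log4j.properties files in sbt-assembly, as described above, might look like the following build.sbt sketch. The coordinates, versions, and sbt-assembly syntax shown are illustrative -- 2014-era sbt-assembly releases used slightly different key names -- and are not taken from Spark's actual build.)

```scala
// build.sbt sketch -- coordinates and versions are illustrative.

// "provided": the dependency (and its transitive tree) is on the
// compile classpath but excluded from the assembly jar.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

// sbt-assembly merge strategy: discard any log4j.properties pulled in
// transitively (e.g. by Hadoop), so our own logging config is used;
// fall back to the default strategy for everything else.
assemblyMergeStrategy in assembly := {
  case "log4j.properties" => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```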