Actually, you can control exactly how sbt-assembly merges or resolves conflicts. I believe the default settings, however, lead to an ordering that cannot be controlled.
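[Editor's note: as a concrete sketch of the per-path conflict control mentioned above — assuming sbt 0.13 with the sbt-assembly 0.x plugin of that era; the plugin version and path patterns below are illustrative, not taken from the thread:]

```scala
// build.sbt — sketch of per-path conflict resolution with sbt-assembly
// (sbt 0.13 / sbt-assembly 0.x era syntax; requires something like
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2") in project/plugins.sbt).
import sbtassembly.Plugin._
import AssemblyKeys._

assemblySettings

// Extend the default strategy instead of replacing it wholesale.
mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    case "reference.conf"                    => MergeStrategy.concat  // combine Typesafe configs
    case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard // drop duplicate manifests
    case PathList("javax", "servlet", _*)    => MergeStrategy.last    // pick one copy deterministically
    case x                                   => old(x)                // fall back to the defaults
  }
}
```

Note that `MergeStrategy.first`/`last` choose by classpath order, not by artifact version — which is exactly the limitation Mridul raises later in this thread.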
I do wish for a smarter fat jar plugin.

-Evan

To be free is not merely to cast off one's chains, but to live in a way that respects & enhances the freedom of others. (#NelsonMandela)

> On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>
>> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>> Evan - this is a good thing to bring up. Wrt the shader plug-in -
>> right now we don't actually use it for bytecode shading - we simply
>> use it for creating the uber jar with excludes (which sbt supports
>> just fine via assembly).
>
> Not really - as I mentioned initially in this thread, sbt's assembly
> does not take dependencies into account properly: it can overwrite
> newer classes with older versions.
> From an assembly point of view, sbt is not very good. We have yet to
> try it after the 2.10 shift (and probably won't, given the mess it
> created last time).
>
> Regards,
> Mridul
>
>> I was wondering, actually: do you know if it's possible to add shaded
>> artifacts to the *spark jar* using this plug-in (i.e. not an uber
>> jar)? That's something I could see being really handy in the future.
>>
>> - Patrick
>>
>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
>>> The problem is that the plugins are not equivalent. There is AFAIK no
>>> equivalent to the Maven shade plugin for SBT.
>>> There is an SBT plugin which can apparently read POM XML files
>>> (sbt-pom-reader). However, it can't possibly handle plugins, which
>>> is still problematic.
>>>
>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
>>>> I would prefer to keep both of them; it would be better even if that
>>>> means pom.xml will be generated using sbt. Some companies, like my
>>>> current one, have their own build infrastructure built on top of
>>>> Maven. It is not easy to support sbt for these potential Spark
>>>> clients.
>>>> But I do agree to keep only one if there is a promising way to
>>>> generate a correct configuration from the other.
>>>>
>>>> -Shengzhe
>>>>
>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
>>>>> The correct way to exclude dependencies in SBT is actually to declare
>>>>> a dependency as "provided". I'm not familiar with Maven or its
>>>>> dependencySet, but "provided" will mark the entire dependency tree as
>>>>> excluded. It is also possible to exclude jar by jar, but this is
>>>>> pretty error-prone and messy.
>>>>>
>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>> Yes, in sbt assembly you can exclude jars (although I never had a
>>>>>> need for this) and files within jars.
>>>>>>
>>>>>> For example, I frequently remove log4j.properties, because for
>>>>>> whatever reason Hadoop decided to include it, making it very
>>>>>> difficult to use our own logging config.
>>>>>>
>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>>>>> On Fri, Feb 21, 2014 at 11:11 AM, Patrick Wendell wrote:
>>>>>>>> Kos - thanks for chiming in. Could you be more specific about what is
>>>>>>>> available in Maven and not in sbt for these issues? I took a look at
>>>>>>>> the Bigtop code relating to Spark. As far as I could tell, [1] was the
>>>>>>>> main point of integration with the build system (maybe there are other
>>>>>>>> integration points)?
>>>>>>>>
>>>>>>>>> - In order to integrate Spark well into the existing Hadoop stack,
>>>>>>>>> it was necessary to have a way to avoid transitive dependency
>>>>>>>>> duplications and possible conflicts.
>>>>>>>>>
>>>>>>>>> E.g. Maven assembly allows us to avoid adding _all_ Hadoop libs
>>>>>>>>> and later merely declare a Spark package dependency on standard
>>>>>>>>> Bigtop Hadoop packages.
>>>>>>>>> And yes - Bigtop packaging means the naming and layout would be
>>>>>>>>> standard across all commercial Hadoop distributions that are worth
>>>>>>>>> mentioning: ASF Bigtop convenience binary packages, and Cloudera or
>>>>>>>>> Hortonworks packages. Hence, the downstream user doesn't need to
>>>>>>>>> spend any effort to make sure that Spark "clicks in" properly.
>>>>>>>>
>>>>>>>> The sbt build also allows you to plug in a Hadoop version, similar to
>>>>>>>> the Maven build.
>>>>>>>
>>>>>>> I am actually talking about the ability to exclude a set of
>>>>>>> dependencies from an assembly, similarly to what's happening in the
>>>>>>> dependencySet sections of assembly/src/main/assembly/assembly.xml.
>>>>>>> If there is comparable functionality in sbt, that would help quite a
>>>>>>> bit, apparently.
>>>>>>>
>>>>>>> Cos
>>>>>>>
>>>>>>>>> - Maven provides a relatively easy way to deal with the jar-hell
>>>>>>>>> problem, although the original Maven build was just shading
>>>>>>>>> everything into a huge lump of class files, oftentimes ending up
>>>>>>>>> with classes slamming on top of each other from different
>>>>>>>>> transitive dependencies.
>>>>>>>>
>>>>>>>> AFAIK we are only using the shade plug-in to deal with conflict
>>>>>>>> resolution in the assembly jar. These are dealt with in sbt via the
>>>>>>>> sbt-assembly plug-in in an identical way. Is there a difference?
>>>>>>>
>>>>>>> I am bringing up the shader because it is an awful hack which can't
>>>>>>> be used in a real, controlled deployment.
>>>>>>>
>>>>>>> Cos
>>>>>>>
>>>>>>>> [1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>>>>>
>>>>> --
>>>>> Evan Chan
>>>>> Staff Engineer
>>>>> e...@ooyala.com |
>>>
>>> --
>>> Evan Chan
>>> Staff Engineer
>>> e...@ooyala.com |
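[Editor's note: the sbt-side exclusions discussed in this thread — Evan's "provided" scope and Koert's removal of log4j.properties — can be sketched as below. This is a minimal sketch assuming sbt 0.13 with the sbt-assembly 0.x plugin; the hadoop-client coordinates and version are illustrative, not taken from the thread:]

```scala
// build.sbt — sketch of dependency and file exclusion for a fat jar
// (sbt 0.13 / sbt-assembly 0.x era syntax; coordinates are illustrative).
import sbtassembly.Plugin._
import AssemblyKeys._

assemblySettings

// "provided" scope (Evan's suggestion): compile against the whole
// Hadoop dependency tree but leave it out of the assembly entirely —
// roughly what Maven's dependencySet excludes achieve for Bigtop.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

// File-level exclusion (Koert's example): discard Hadoop's bundled
// log4j.properties from the fat jar so our own logging config wins.
mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    case "log4j.properties" => MergeStrategy.discard
    case x                  => old(x)
  }
}
```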