The problem is that the complete Spark dependency graph is fairly large, and there are a lot of conflicting versions in it. This shows up in particular when we bump versions of dependencies, which makes managing this messy at best.
Now, I have not looked in detail at how Maven manages this - it might just be accidental that we get a decent out-of-the-box assembled shaded jar (since we don't do anything special to configure it).
With the current state of sbt in Spark, it definitely is not a good solution: if we can enhance it (or perhaps it already has been?), while keeping the version/dependency graph manageable, I don't have any objections to using sbt or Maven!
Too many excluded versions, pinned versions, etc. would just make things unmanageable in the future.

Regards,
Mridul

On Wed, Feb 26, 2014 at 8:56 AM, Evan Chan <e...@ooyala.com> wrote:
> Actually you can control exactly how sbt assembly merges or resolves
> conflicts. I believe the default settings however lead to order which cannot
> be controlled.
>
> I do wish for a smarter fat jar plugin.
>
> -Evan
> To be free is not merely to cast off one's chains, but to live in a way that
> respects & enhances the freedom of others. (#NelsonMandela)
>
>> On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>
>>> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> Evan - this is a good thing to bring up. Wrt the shader plug-in -
>>> right now we don't actually use it for bytecode shading - we simply
>>> use it for creating the uber jar with excludes (which sbt supports
>>> just fine via assembly).
>>
>> Not really - as I mentioned initially in this thread, sbt's assembly
>> does not take dependencies into account properly: it can overwrite
>> newer classes with older versions.
>> From an assembly point of view, sbt is not very good: we are yet to
>> try it after the 2.10 shift though (and probably won't, given the mess it
>> created last time).
>>
>> Regards,
>> Mridul
>>
>>>
>>> I was wondering actually, do you know if it's possible to add shaded
>>> artifacts to the *spark jar* using this plug-in (e.g. not an uber
>>> jar)?
>>> That's something I could see being really handy in the future.
>>>
>>> - Patrick
>>>
>>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
>>>> The problem is that plugins are not equivalent. There is AFAIK no
>>>> equivalent to the maven shader plugin for SBT.
>>>> There is an SBT plugin which can apparently read POM XML files
>>>> (sbt-pom-reader). However, it can't possibly handle plugins, which
>>>> is still problematic.
>>>>
>>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
>>>>> I would prefer to keep both of them; it would be better even if that means
>>>>> pom.xml will be generated using sbt. Some companies, like my current one,
>>>>> have their own build infrastructure built on top of maven. It is not easy
>>>>> to support sbt for these potential spark clients. But I do agree to only
>>>>> keep one if there is a promising way to generate correct configuration
>>>>> from the other.
>>>>>
>>>>> -Shengzhe
>>>>>
>>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
>>>>>>
>>>>>> The correct way to exclude dependencies in SBT is actually to declare
>>>>>> a dependency as "provided". I'm not familiar with Maven or its
>>>>>> dependencySet, but provided will mark the entire dependency tree as
>>>>>> excluded. It is also possible to exclude jar by jar, but this is
>>>>>> pretty error prone and messy.
>>>>>>
>>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>> yes in sbt assembly you can exclude jars (although i never had a need
>>>>>>> for this) and files in jars.
>>>>>>>
>>>>>>> for example i frequently remove log4j.properties, because for whatever
>>>>>>> reason hadoop decided to include it making it very difficult to use our
>>>>>>> own logging config.
>>>>>>>
>>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
>>>>>>>>> Kos - thanks for chiming in. Could you be more specific about what is
>>>>>>>>> available in maven and not in sbt for these issues? I took a look at
>>>>>>>>> the bigtop code relating to Spark. As far as I could tell [1] was the
>>>>>>>>> main point of integration with the build system (maybe there are other
>>>>>>>>> integration points)?
>>>>>>>>>
>>>>>>>>>> - in order to integrate Spark well into existing Hadoop stack it was
>>>>>>>>>> necessary to have a way to avoid transitive dependencies duplications and
>>>>>>>>>> possible conflicts.
>>>>>>>>>>
>>>>>>>>>> E.g. Maven assembly allows us to avoid adding _all_ Hadoop libs and later
>>>>>>>>>> merely declare Spark package dependency on standard Bigtop Hadoop
>>>>>>>>>> packages. And yes - Bigtop packaging means the naming and layout would be
>>>>>>>>>> standard across all commercial Hadoop distributions that are worth
>>>>>>>>>> mentioning: ASF Bigtop convenience binary packages, and Cloudera or
>>>>>>>>>> Hortonworks packages. Hence, the downstream user doesn't need to spend any
>>>>>>>>>> effort to make sure that Spark "clicks-in" properly.
>>>>>>>>>
>>>>>>>>> The sbt build also allows you to plug in a Hadoop version similar to
>>>>>>>>> the maven build.
>>>>>>>>
>>>>>>>> I am actually talking about an ability to exclude a set of dependencies
>>>>>>>> from an assembly, similarly to what's happening in dependencySet sections of
>>>>>>>> assembly/src/main/assembly/assembly.xml
>>>>>>>> If there is a comparable functionality in Sbt, that would help quite a bit,
>>>>>>>> apparently.
>>>>>>>>
>>>>>>>> Cos
>>>>>>>>
>>>>>>>>>> - Maven provides a relatively easy way to deal with the jar-hell problem,
>>>>>>>>>> although the original maven build was just Shader'ing everything into a
>>>>>>>>>> huge lump of class files. Oftentimes ending up with classes slamming on
>>>>>>>>>> top of each other from different transitive dependencies.
>>>>>>>>>
>>>>>>>>> AFAIK we are only using the shade plug-in to deal with conflict
>>>>>>>>> resolution in the assembly jar. These are dealt with in sbt via the
>>>>>>>>> sbt assembly plug-in in an identical way. Is there a difference?
>>>>>>>>
>>>>>>>> I am bringing up the Shader, because it is an awful hack, which can't be
>>>>>>>> used in a real controlled deployment.
>>>>>>>>
>>>>>>>> Cos
>>>>>>>>
>>>>>>>>> [1] https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Evan Chan
>>>>>> Staff Engineer
>>>>>> e...@ooyala.com |
>>>>
>>>> --
>>>> --
>>>> Evan Chan
>>>> Staff Engineer
>>>> e...@ooyala.com |
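
For reference, the sbt-side controls mentioned in this thread (declaring a dependency as "provided", dropping log4j.properties from the fat jar, and overriding how duplicate files are merged) could look roughly like the sketch below. It is only an illustration, not Spark's actual build: it assumes a recent sbt-assembly release (slash syntax, sbt 1.x) is already enabled in project/plugins.sbt, and the hadoop-client coordinates and version are placeholders chosen for the example.

```scala
// build.sbt - illustrative sketch only, not taken from the Spark build.
// Assumes the sbt-assembly plugin is enabled in project/plugins.sbt.

// Placeholder dependency: "provided" keeps it on the compile classpath but
// off the runtime classpath that sbt-assembly packages, so it (and anything
// reachable only through it) is left out of the fat jar.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

// Decide what happens when the same path appears in more than one jar.
assembly / assemblyMergeStrategy := {
  // Drop any bundled log4j.properties so the application's own logging
  // configuration is the only one on the classpath.
  case PathList(ps @ _*) if ps.last == "log4j.properties" =>
    MergeStrategy.discard
  // Fall back to the plugin's default strategy for every other path.
  case path =>
    val defaultStrategy = (assembly / assemblyMergeStrategy).value
    defaultStrategy(path)
}
```

Note that a merge strategy only decides between files that share a path inside the assembled jar. Which version of a library is on the classpath in the first place is settled earlier, during dependency resolution, so the version-conflict concern raised above still has to be handled there.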