Actually, you can control exactly how sbt-assembly merges or resolves
conflicts. I believe the default settings, however, lead to an ordering
that cannot be controlled.
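
For example, with sbt-assembly wired into build.sbt, a custom merge
looks roughly like this (a sketch against sbt 0.13 / sbt-assembly
0.11-era keys, which vary across plugin versions; assumes the plugin's
AssemblyKeys are imported):

  mergeStrategy in assembly := {
    case "reference.conf"              => MergeStrategy.concat  // merge, don't pick one
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case "application.conf"            => MergeStrategy.last    // explicitly keep the later copy
    case x =>
      val old = (mergeStrategy in assembly).value               // default behavior otherwise
      old(x)
  }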

I do wish for a smarter fat jar plugin.  

-Evan
To be free is not merely to cast off one's chains, but to live in a way that 
respects & enhances the freedom of others. (#NelsonMandela)

> On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
> 
>> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>> Evan - this is a good thing to bring up. Wrt the Shade plug-in -
>> right now we don't actually use it for bytecode shading - we simply
>> use it for creating the uber jar with excludes (which sbt supports
>> just fine via assembly).
> 
> 
> Not really - as I mentioned earlier in this thread, sbt's assembly
> does not take dependencies into account properly: it can overwrite
> newer classes with older versions.
> From an assembly point of view, sbt is not very good: we have yet to
> try it after the 2.10 shift (and probably won't, given the mess it
> created last time).
> 
> Regards,
> Mridul
> 
> 
> 
> 
> 
>> 
>> I was wondering actually, do you know if it's possible to add shaded
>> artifacts to the *spark jar* using this plug-in (e.g. not an uber
>> jar)? That's something I could see being really handy in the future.
>> 
>> - Patrick
>> 
>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
>>> The problem is that the plugins are not equivalent.  There is AFAIK no
>>> equivalent to the Maven Shade plugin for SBT.
>>> There is an SBT plugin which can apparently read POM XML files
>>> (sbt-pom-reader).   However, it can't possibly handle plugins, which
>>> is still problematic.
>>> 
>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
>>>> I would prefer to keep both of them; it would be better even if that
>>>> means pom.xml will be generated using sbt. Some companies, like my
>>>> current one, have their own build infrastructure built on top of
>>>> Maven. It is not easy to support sbt for these potential Spark
>>>> clients. But I do agree to keep only one if there is a promising way
>>>> to generate a correct configuration from the other.
>>>> 
>>>> -Shengzhe
>>>> 
>>>> 
>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
>>>>> 
>>>>> The correct way to exclude dependencies in SBT is actually to declare
>>>>> a dependency as "provided".   I'm not familiar with Maven or its
>>>>> dependencySet, but "provided" will mark the entire dependency tree as
>>>>> excluded.   It is also possible to exclude things jar by jar, but
>>>>> this is pretty error-prone and messy.
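>>>>> 
>>>>> Roughly, in build.sbt (a sketch - the coordinates and the jar name
>>>>> below are made up):
>>>>> 
>>>>>   // "provided": the dependency and its whole transitive tree stay
>>>>>   // out of the assembly
>>>>>   libraryDependencies +=
>>>>>     "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided"
>>>>> 
>>>>>   // jar-by-jar exclusion, the error-prone alternative:
>>>>>   excludedJars in assembly := {
>>>>>     val cp = (fullClasspath in assembly).value
>>>>>     cp filter { _.data.getName == "servlet-api-2.5.jar" }
>>>>>   }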
>>>>> 
>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>> yes, in sbt assembly you can exclude jars (although I never had a need for
>>>>>> this) and files in jars.
>>>>>> 
>>>>>> for example I frequently remove log4j.properties, because for
>>>>>> whatever reason hadoop decided to include it, making it very
>>>>>> difficult to use our own logging config.
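>>>>>> 
>>>>>> in sbt-assembly terms that's just a merge strategy, roughly (a sketch):
>>>>>> 
>>>>>>   mergeStrategy in assembly := {
>>>>>>     case "log4j.properties" => MergeStrategy.discard // drop hadoop's copy
>>>>>>     case x =>
>>>>>>       val old = (mergeStrategy in assembly).value    // defaults otherwise
>>>>>>       old(x)
>>>>>>   }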
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <c...@apache.org>
>>>>>>> wrote:
>>>>>> 
>>>>>>>> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
>>>>>>>> Kos - thanks for chiming in. Could you be more specific about what is
>>>>>>>> available in maven and not in sbt for these issues? I took a look at
>>>>>>>> the bigtop code relating to Spark. As far as I could tell [1] was the
>>>>>>>> main point of integration with the build system (maybe there are other
>>>>>>>> integration points)?
>>>>>>>> 
>>>>>>>>>  - in order to integrate Spark well into the existing Hadoop
>>>>>>>>>    stack it was necessary to have a way to avoid duplication of
>>>>>>>>>    transitive dependencies and possible conflicts.
>>>>>>>>> 
>>>>>>>>>    E.g. Maven assembly allows us to avoid adding _all_ Hadoop libs
>>>>>>>>>    and later merely declare a Spark package dependency on standard
>>>>>>>>>    Bigtop Hadoop packages. And yes - Bigtop packaging means the
>>>>>>>>>    naming and layout would be standard across all commercial
>>>>>>>>>    Hadoop distributions that are worth mentioning: ASF Bigtop
>>>>>>>>>    convenience binary packages, and Cloudera or Hortonworks
>>>>>>>>>    packages. Hence, the downstream user doesn't need to spend any
>>>>>>>>>    effort to make sure that Spark "clicks-in" properly.
>>>>>>>> 
>>>>>>>> The sbt build also allows you to plug in a Hadoop version,
>>>>>>>> similarly to the maven build.
>>>>>>> 
>>>>>>> I am actually talking about the ability to exclude a set of
>>>>>>> dependencies from an assembly, similarly to what's happening in the
>>>>>>> dependencySet sections of
>>>>>>>    assembly/src/main/assembly/assembly.xml
>>>>>>> If there is comparable functionality in Sbt, that would help quite
>>>>>>> a bit, apparently.
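>>>>>>> 
>>>>>>> Maybe sbt's per-dependency exclusion rules would be the analogue -
>>>>>>> a rough sketch, with made-up coordinates:
>>>>>>> 
>>>>>>>   libraryDependencies +=
>>>>>>>     ("org.apache.spark" %% "spark-core" % "0.9.0-incubating")
>>>>>>>       .excludeAll(ExclusionRule(organization = "org.apache.hadoop"))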
>>>>>>> 
>>>>>>> Cos
>>>>>>> 
>>>>>>>>>  - Maven provides a relatively easy way to deal with the jar-hell
>>>>>>>>>    problem, although the original Maven build was just shading
>>>>>>>>>    everything into a huge lump of class files, oftentimes ending
>>>>>>>>>    up with classes from different transitive dependencies slamming
>>>>>>>>>    on top of each other.
>>>>>>>> 
>>>>>>>> AFAIK we are only using the shade plug-in to deal with conflict
>>>>>>>> resolution in the assembly jar. These are dealt with in sbt via the
>>>>>>>> sbt assembly plug-in in an identical way. Is there a difference?
>>>>>>> 
>>>>>>> I am bringing up the Shade plugin because it is an awful hack which
>>>>>>> can't be used in a real, controlled deployment.
>>>>>>> 
>>>>>>> Cos
>>>>>>> 
>>>>>>>> [1]
>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> --
>>>>> Evan Chan
>>>>> Staff Engineer
>>>>> e...@ooyala.com  |
>>> 
>>> 
>>> 
>>> --
>>> --
>>> Evan Chan
>>> Staff Engineer
>>> e...@ooyala.com  |
