Hi Patrick,

If you include shaded dependencies inside the main Spark jar, such
that it combines classes from all of its dependencies, wouldn't you
end up with a sub-assembly jar?  That would be dangerous: since it is
a single unit, it would break the normal packaging assumption that a
jar contains only its own classes, with maven/sbt/ivy/etc. left to
resolve the remaining deps.... but maybe I don't know what you mean.

The shade plugin in Maven is apparently used to
1) build uber jars  - this is the part that sbt-assembly also does
2) "shade" existing jars, i.e. rename their classes and rewrite the
bytecode that depends on them so they don't conflict with other jars
containing the same classes  -- this is something sbt-assembly doesn't
do, and which, as you point out, is currently done manually.
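For concreteness, a relocation in the maven-shade-plugin looks roughly like the fragment below. This is only a sketch: the protobuf package names are illustrative placeholders, not necessarily what Spark relocates.

```xml
<!-- pom.xml fragment: build an uber jar AND relocate a dependency's
     packages so its classes cannot clash with another copy of the same
     library on the classpath. Pattern names here are placeholders. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.spark.shaded.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The relocation step is what rewrites the bytecode of classes referencing `com.google.protobuf.*` so they point at the renamed copies.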



On Tue, Feb 25, 2014 at 4:09 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> What I mean is this. AFAIK the shader plug-in is primarily designed
> for creating uber jars which contain spark and all dependencies. But
> since Spark is something people depend on in Maven, what I actually
> want is to create the normal old Spark jar [1], but then include
> shaded versions of some of our dependencies inside of it. Not sure if
> that's even possible.
>
> The way we do shading now is we manually publish shaded versions of
> some dependencies to maven central as their own artifacts.
>
> http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-core_2.10/0.9.0-incubating/spark-core_2.10-0.9.0-incubating.jar
>
> On Tue, Feb 25, 2014 at 4:04 PM, Evan Chan <e...@ooyala.com> wrote:
>> Patrick -- not sure I understand your request, do you mean
>> - somehow creating a shaded jar (eg with maven shader plugin)
>> - then including it in the spark jar (which would then be an assembly)?
>>
>> On Tue, Feb 25, 2014 at 4:01 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> Evan - this is a good thing to bring up. Wrt the shader plug-in -
>>> right now we don't actually use it for bytecode shading - we simply
>>> use it for creating the uber jar with excludes (which sbt supports
>>> just fine via assembly).
>>>
>>> I was wondering actually, do you know if it's possible to add shaded
>>> artifacts to the *spark jar* using this plug-in (e.g. not an uber
>>> jar)? That's something I could see being really handy in the future.
>>>
>>> - Patrick
>>>
>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <e...@ooyala.com> wrote:
>>>> The problem is that plugins are not equivalent.  There is AFAIK no
>>>> equivalent to the maven shader plugin for SBT.
>>>> There is an SBT plugin which can apparently read POM XML files
>>>> (sbt-pom-reader).   However, it can't possibly handle plugins, which
>>>> is still problematic.
>>>>
>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
>>>>> I would prefer to keep both of them; it would be better even if that
>>>>> means pom.xml gets generated from sbt. Some companies, like my current
>>>>> one, have their own build infrastructure built on top of maven. It is
>>>>> not easy to support sbt for these potential spark clients. But I do
>>>>> agree to keep only one if there is a promising way to generate a
>>>>> correct configuration from the other.
>>>>>
>>>>> -Shengzhe
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
>>>>>
>>>>>> The correct way to exclude dependencies in SBT is actually to declare
>>>>>> a dependency as "provided".   I'm not familiar with Maven or its
>>>>>> dependencySet, but "provided" will mark the entire dependency tree as
>>>>>> excluded.   It is also possible to exclude jar by jar, but this is
>>>>>> pretty error-prone and messy.
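A sketch of what the "provided" approach looks like in a build.sbt; the coordinates and version numbers below are only examples:

```scala
// "provided": hadoop-client and its whole transitive tree are on the
// compile classpath but left out of the assembly jar -- the deployment
// environment supplies Hadoop instead.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"

// The jar-by-jar alternative -- workable but, as noted, error-prone:
libraryDependencies += ("org.apache.spark" %% "spark-core" % "0.9.0-incubating")
  .exclude("org.slf4j", "slf4j-log4j12")
```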
>>>>>>
>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>> > yes in sbt assembly you can exclude jars (although i never had a
>>>>>> > need for this) and files in jars.
>>>>>> >
>>>>>> > for example i frequently remove log4j.properties, because for
>>>>>> > whatever reason hadoop decided to include it, making it very
>>>>>> > difficult to use our own logging config.
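For reference, dropping a file like log4j.properties from an sbt assembly is typically done with a merge strategy. A rough sketch follows; the key names differ across sbt-assembly versions (older releases used `mergeStrategy in assembly`):

```scala
// build.sbt fragment (recent sbt-assembly slash syntax)
assembly / assemblyMergeStrategy := {
  case "log4j.properties" => MergeStrategy.discard // drop it entirely
  case x =>
    // defer to the plugin's default behaviour for everything else
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
```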
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <c...@apache.org>
>>>>>> wrote:
>>>>>> >
>>>>>> >> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
>>>>>> >> > Kos - thanks for chiming in. Could you be more specific about
>>>>>> >> > what is available in maven and not in sbt for these issues? I
>>>>>> >> > took a look at the bigtop code relating to Spark. As far as I
>>>>>> >> > could tell, [1] was the main point of integration with the build
>>>>>> >> > system (maybe there are other integration points)?
>>>>>> >> >
>>>>>> >> > >   - in order to integrate Spark well into the existing Hadoop
>>>>>> >> > >     stack it was necessary to have a way to avoid transitive
>>>>>> >> > >     dependency duplications and possible conflicts.
>>>>>> >> > >
>>>>>> >> > >     E.g. Maven assembly allows us to avoid adding _all_ Hadoop
>>>>>> >> > >     libs and later merely declare a Spark package dependency on
>>>>>> >> > >     standard Bigtop Hadoop packages. And yes - Bigtop packaging
>>>>>> >> > >     means the naming and layout would be standard across all
>>>>>> >> > >     commercial Hadoop distributions that are worth mentioning:
>>>>>> >> > >     ASF Bigtop convenience binary packages, and Cloudera or
>>>>>> >> > >     Hortonworks packages. Hence, the downstream user doesn't
>>>>>> >> > >     need to spend any effort to make sure that Spark "clicks
>>>>>> >> > >     in" properly.
>>>>>> >> >
>>>>>> >> > The sbt build also allows you to plug in a Hadoop version,
>>>>>> >> > similar to the maven build.
>>>>>> >>
>>>>>> >> I am actually talking about the ability to exclude a set of
>>>>>> >> dependencies from an assembly, similar to what's happening in the
>>>>>> >> dependencySet sections of
>>>>>> >>     assembly/src/main/assembly/assembly.xml
>>>>>> >> If there is comparable functionality in sbt, that would help quite
>>>>>> >> a bit, apparently.
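For the sbt readers, the dependencySet exclusion Cos describes looks roughly like the descriptor fragment below. The exclude pattern is illustrative, not a copy of Spark's actual assembly.xml:

```xml
<!-- fragment in the spirit of assembly/src/main/assembly/assembly.xml -->
<dependencySets>
  <dependencySet>
    <outputDirectory>lib</outputDirectory>
    <useTransitiveDependencies>true</useTransitiveDependencies>
    <excludes>
      <!-- leave Hadoop out of the assembly; the Bigtop Hadoop packages
           provide it at deploy time -->
      <exclude>org.apache.hadoop:*</exclude>
    </excludes>
  </dependencySet>
</dependencySets>
```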
>>>>>> >>
>>>>>> >> Cos
>>>>>> >>
>>>>>> >> > >   - Maven provides a relatively easy way to deal with the
>>>>>> >> > >     jar-hell problem, although the original maven build was
>>>>>> >> > >     just Shader'ing everything into a huge lump of class files,
>>>>>> >> > >     oftentimes ending up with classes slamming on top of each
>>>>>> >> > >     other from different transitive dependencies.
>>>>>> >> >
>>>>>> >> > AFAIK we are only using the shade plug-in to deal with conflict
>>>>>> >> > resolution in the assembly jar. These are dealt with in sbt via
>>>>>> >> > the sbt-assembly plug-in in an identical way. Is there a
>>>>>> >> > difference?
>>>>>> >>
>>>>>> >> I am bringing up the Shader because it is an awful hack which
>>>>>> >> can't be used in a real controlled deployment.
>>>>>> >>
>>>>>> >> Cos
>>>>>> >>
>>>>>> >> > [1]
>>>>>> >>
>>>>>> https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>>>>>> >>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Evan Chan
>>>>>> Staff Engineer
>>>>>> e...@ooyala.com  |
>>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Evan Chan
>>>> Staff Engineer
>>>> e...@ooyala.com  |
>>
>>
>>
>> --
>> --
>> Evan Chan
>> Staff Engineer
>> e...@ooyala.com  |



-- 
Evan Chan
Staff Engineer
e...@ooyala.com  |
