Hi Patrick,

> (b) You have downloaded Spark and forked its maven build to change around
> the dependencies.


We use this approach. We've cloned the Spark repo and currently maintain
our own branch. The idea is to fix Spark issues found in our production
system first and contribute them back to the community later (if they are
accepted). We use Maven + Jenkins here, and our deployment engineers
customize their configuration. We might lose some build features (spell
check, coding-style check) if we make an exception for Spark by using sbt;
even with Maven, we need a special configuration just for Spark. Java is
still widely used here and only a few teams have started experimenting
with Scala. I would say this change will affect people like us who
maintain their own Spark branch. It's okay to go with sbt since it's the
standard build tool for Scala, but I think we still want the ability to
use Maven as an alternative.

Thanks
-Shengzhe


On Tue, Feb 25, 2014 at 3:40 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Hey Yao,
>
> Would you mind explaining exactly how your company extends the Spark
> maven build? For instance:
>
> (a) You are depending on Spark in your build and your build is using Maven.
> (b) You have downloaded Spark and forked its maven build to change
> around the dependencies.
> (c) You are writing pom files that extend the Spark pom.
>
> If it's just (a) - then whether Spark itself uses sbt/maven will make
> no difference. We'd publish identical poms.
>
> - Patrick
>
> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaosheng...@gmail.com> wrote:
> > I would prefer to keep both of them; it would be better even if that
> > means the pom.xml is generated by sbt. Some companies, like my current
> > one, have their own build infrastructure built on top of Maven. It is
> > not easy to support sbt for these potential Spark clients. But I do
> > agree to keep only one if there is a reliable way to generate a
> > correct configuration from the other.
> >
> > -Shengzhe
> >
> >
> > On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <e...@ooyala.com> wrote:
> >
> >> The correct way to exclude dependencies in sbt is actually to declare
> >> a dependency as "provided". I'm not familiar with Maven or its
> >> dependencySet, but "provided" will mark the entire dependency tree as
> >> excluded. It is also possible to exclude jars one by one, but this is
> >> pretty error-prone and messy.
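> >>
> >> For example, something along these lines in build.sbt (the artifact
> >> and version here are just illustrative):
> >>
> >>     // Compile against the Hadoop client, but keep it and its whole
> >>     // dependency tree out of the assembly jar; the surrounding
> >>     // distribution provides it at runtime.
> >>     libraryDependencies +=
> >>       "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"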
> >>
> >> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <ko...@tresata.com>
> wrote:
> >> > Yes, in sbt-assembly you can exclude jars (although I have never
> >> > had a need for this) as well as individual files within jars.
> >> >
> >> > For example, I frequently remove log4j.properties, because for
> >> > whatever reason Hadoop decided to include it, making it very
> >> > difficult to use our own logging config.
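> >> >
> >> > A rough sketch of how that looks with the sbt-assembly plugin (key
> >> > names are from the 0.10.x-era plugin and may differ in other
> >> > versions):
> >> >
> >> >     import sbtassembly.Plugin._
> >> >     import AssemblyKeys._
> >> >
> >> >     // For files that show up in more than one jar: drop the
> >> >     // log4j.properties that hadoop bundles so our own logging
> >> >     // config wins, and just take the first copy of anything else.
> >> >     mergeStrategy in assembly := {
> >> >       case "log4j.properties" => MergeStrategy.discard
> >> >       case _                  => MergeStrategy.first
> >> >     }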
> >> >
> >> >
> >> >
> >> > On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <c...@apache.org>
> >> wrote:
> >> >
> >> >> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
> >> >> > Kos - thanks for chiming in. Could you be more specific about
> >> >> > what is available in maven and not in sbt for these issues? I
> >> >> > took a look at the bigtop code relating to Spark. As far as I
> >> >> > could tell, [1] was the main point of integration with the build
> >> >> > system (maybe there are other integration points)?
> >> >> >
> >> >> > >   - in order to integrate Spark well into the existing Hadoop
> >> >> > >     stack, it was necessary to have a way to avoid duplicated
> >> >> > >     transitive dependencies and possible conflicts.
> >> >> > >
> >> >> > >     E.g. the Maven assembly allows us to avoid adding _all_ the
> >> >> > >     Hadoop libs and later merely declare the Spark package's
> >> >> > >     dependency on the standard Bigtop Hadoop packages. And yes -
> >> >> > >     Bigtop packaging means the naming and layout would be
> >> >> > >     standard across all commercial Hadoop distributions that
> >> >> > >     are worth mentioning: ASF Bigtop convenience binary
> >> >> > >     packages, and Cloudera or Hortonworks packages. Hence, the
> >> >> > >     downstream user doesn't need to spend any effort to make
> >> >> > >     sure that Spark "clicks in" properly.
> >> >> >
> >> >> > The sbt build also allows you to plug in a Hadoop version,
> >> >> > similar to the maven build.
> >> >>
> >> >> I am actually talking about the ability to exclude a set of
> >> >> dependencies from an assembly, similarly to what's happening in the
> >> >> dependencySet sections of
> >> >>     assembly/src/main/assembly/assembly.xml
> >> >> If there is comparable functionality in sbt, that would help quite
> >> >> a bit, apparently.
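> >> >>
> >> >> Something like the following, if sbt-assembly's excludedJars is
> >> >> the comparable mechanism - a hypothetical sketch with 0.10.x-era
> >> >> keys, which I haven't verified:
> >> >>
> >> >>     import sbtassembly.Plugin._
> >> >>     import AssemblyKeys._
> >> >>
> >> >>     // Filter whole jars out of the assembly by name, much like a
> >> >>     // dependencySet excludes list; here, every hadoop-* jar.
> >> >>     excludedJars in assembly <<= (fullClasspath in assembly) map {
> >> >>       cp => cp filter { _.data.getName.startsWith("hadoop-") }
> >> >>     }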
> >> >>
> >> >> Cos
> >> >>
> >> >> > >   - Maven provides a relatively easy way to deal with the
> >> >> > >     jar-hell problem, although the original maven build was
> >> >> > >     just Shader'ing everything into a huge lump of class files,
> >> >> > >     oftentimes ending up with classes slamming on top of each
> >> >> > >     other from different transitive dependencies.
> >> >> >
> >> >> > AFAIK we are only using the shade plug-in to deal with conflict
> >> >> > resolution in the assembly jar. These are dealt with in sbt via
> >> >> > the sbt-assembly plug-in in an identical way. Is there a
> >> >> > difference?
> >> >>
> >> >> I am bringing up the Shader because it is an awful hack which
> >> >> can't be used in a real, controlled deployment.
> >> >>
> >> >> Cos
> >> >>
> >> >> [1]
> >> >> https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
> >> >>
> >>
> >>
> >>
> >> --
> >> Evan Chan
> >> Staff Engineer
> >> e...@ooyala.com
> >>
>
