> leading to a real classloader hell if you tried to add a newer version
> of a jar that spark already used.
Spark at least makes this easier on the Guava side [1] by shading the
package names internally so that there's no possibility of a conflict.
Elasticsearch & Storm do this too for many of their dependencies, and I
think it's a great practice for libraries that are used only internally,
specifically when those internal libraries are not exposed at all to the
outside. If you are only using said libraries internally, that strategy
may work for you as well, Koert. I'm going to ask about this on the
Hadoop list as well, to see whether there was a decision against it for
reasons I haven't thought of.

> Another suggestion is to build Spark by yourself.

I'm having trouble seeing what you mean here, Marcelo. Guava is already
shaded to a different package for the 1.2.0 release, so it shouldn't be
causing conflicts.

[1] https://issues.apache.org/jira/browse/SPARK-2848

On Wed, Feb 4, 2015 at 2:35 PM, Koert Kuipers <ko...@tresata.com> wrote:
> the whole spark.files.userClassPathFirst option never really worked for
> me in standalone mode, since jars were added dynamically, which means
> they had different classloaders, leading to a real classloader hell if
> you tried to add a newer version of a jar that spark already used. see:
> https://issues.apache.org/jira/browse/SPARK-1863
>
> do i understand it correctly that on yarn the custom jars are truly
> placed before the yarn and spark jars on the classpath? meaning at
> container construction time, on the same classloader? that would be
> great news for me. it would open up the possibility of using newer
> versions of many libraries.
>
>
> On Wed, Feb 4, 2015 at 1:12 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> Hi Corey,
>>
>> When you run on Yarn, Yarn's libraries are placed in the classpath,
>> and they have precedence over your app's. So, with Spark 1.2, you'll
>> get Guava 11 in your classpath (with Spark 1.1 and earlier you'd get
>> Guava 14 from Spark, so still a problem for you).
>>
>> Right now, the option Markus mentioned
>> (spark.yarn.user.classpath.first) can be a workaround for you, since
>> it will place your app's jars before Yarn's on the classpath.
>>
>>
>> On Tue, Feb 3, 2015 at 8:20 PM, Corey Nolet <cjno...@gmail.com> wrote:
>> > I'm having a really bad dependency conflict right now with Guava
>> > versions between my Spark application in Yarn and (I believe)
>> > Hadoop's version.
>> >
>> > The problem is, my driver has the version of Guava which my
>> > application is expecting (15.0), while it appears the Spark
>> > executors that are working on my RDDs have a much older version
>> > (I assume it's the old version on the Hadoop classpath).
>> >
>> > Is there a property like "mapreduce.job.user.classpath.first" that
>> > I can set to make sure my own classpath is established first on
>> > the executors?
>>
>>
>>
>> --
>> Marcelo
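For anyone who wants to try the shading approach discussed above in their
own application build, here is a minimal sketch of what a relocation rule
can look like. It assumes sbt with the sbt-assembly plugin (Maven's
shade plugin offers the same idea); the "myapp.shaded.guava" target
package is an arbitrary example, not anything Spark itself prescribes:

```scala
// build.sbt -- illustrative sketch: relocate Guava classes inside the
// application assembly jar so they cannot collide with whatever Guava
// version Spark, Hadoop, or YARN already put on the classpath.
// Assumes the sbt-assembly plugin is enabled; "myapp.shaded.guava" is
// a hypothetical package prefix chosen for this example.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "myapp.shaded.guava.@1").inAll
)
```

If shading is not an option, the YARN workaround Marcelo mentions is set
like any other Spark configuration property, e.g.
`spark-submit --conf spark.yarn.user.classpath.first=true ...`.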