Hi Robert,

I’m not sure about sbt; we’re currently using Maven to build. We do create a single jar though, via the Maven shade plugin. Our project has three components, and we routinely distribute the jar for our project’s CLI out across a cluster. If you’re interested, here are our project’s master pom and the pom for our CLI.

There are a few dependencies that we exclude from hadoop-client:
• asm/asm
• org.jboss.netty/netty
• org.codehaus.jackson/*
• org.sonatype.sisu.inject/*

We’ve built and run this successfully across both Hadoop 1.0.4 and 2.2.0-2.2.5. (A rough sketch of the relevant pom fragments is at the bottom of this message.)

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Jun 29, 2014, at 4:20 PM, Robert James <srobertja...@gmail.com> wrote:

> On 6/29/14, FRANK AUSTIN NOTHAFT <fnoth...@berkeley.edu> wrote:
>> Robert,
>>
>> You can build a Spark application using Maven for Hadoop 2 by adding a
>> dependency on the Hadoop 2.* hadoop-client package. If you define any
>> Hadoop Input/Output formats, you may also need to depend on the
>> hadoop-mapreduce package.
>
> Thank you, Frank. Is it possible to do sbt-assembly after that? I get
> conflicts, because Spark's Maven dependency requires Hadoop 1. I've
> tried excluding that via sbt, but I still get conflicts within Hadoop 2,
> with different components requiring different versions of other jars.
>
> Is it possible to make a jar assembly using your approach? How? If not,
> how do you distribute the jars to the workers?
>
>> On Sun, Jun 29, 2014 at 12:20 PM, Robert James <srobertja...@gmail.com>
>> wrote:
>>
>>> Although Spark's home page offers binaries for Spark 1.0.0 with
>>> Hadoop 2, the Maven repository only seems to have one version, which
>>> uses Hadoop 1.
>>>
>>> Is it possible to use a Maven artifact with Hadoop 2? What is the id?
>>>
>>> If not, how can I use the prebuilt binaries to use Hadoop 2? Do I just
>>> copy the lib/ dir into my classpath?
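PS: to make the exclusions above concrete, here is a rough sketch of the pom fragments involved. This is not our exact pom: the Hadoop and plugin versions and the main class below are placeholders, and wildcard exclusions (artifactId *) need Maven 3.2.1 or newer; with an older Maven you'd list each excluded artifact explicitly.

<!-- hadoop-client dependency with the exclusions listed above -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version> <!-- placeholder; use your Hadoop version -->
  <exclusions>
    <exclusion>
      <groupId>asm</groupId>
      <artifactId>asm</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.jboss.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.codehaus.jackson</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.sonatype.sisu.inject</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- shade plugin bound to the package phase to build the single jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version> <!-- placeholder version -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- hypothetical entry point; replace with your CLI's main class -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.cli.Main</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

With something like this in place, mvn package writes a single shaded jar under target/, and that's the file we copy out to the cluster nodes.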