Hi Sean,

Great catch! Yes, I was including Spark as a dependency, and it was making its way into my uber jar. Following the advice I just found on Stack Overflow [1], I marked Spark as a provided dependency, and that appears to have fixed my Hadoop client issue. Thanks for your help!!! Perhaps the maintainers might consider setting this scope in the Quickstart guide's pom.xml ( http://spark.apache.org/docs/latest/quick-start.html ).
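
For anyone who hits this later, here's roughly what the change looks like -- a minimal sketch only: the spark-core_2.10 / 1.1.0 and hadoop-client / 2.3.0-cdh5.0.0 coordinates match my setup (summarized below) and may differ for yours; full snippets are in the gist linked below:

    <!-- Spark is supplied by the cluster at runtime; 'provided' keeps it
         (and its Hadoop 1.0.4 transitive deps) out of the uber jar. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.1.0</version>
      <scope>provided</scope>
    </dependency>

    <!-- Per Sean's note below: only needed if the app uses Hadoop APIs
         directly; match the cluster's version and mark it 'provided' too.
         (The CDH artifact requires Cloudera's Maven repository.) -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.3.0-cdh5.0.0</version>
      <scope>provided</scope>
    </dependency>
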
In summary, here's what worked:
 * Hadoop 2.3 cdh5: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0.tar.gz
 * Spark 1.1 for Hadoop 2.3: http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-hadoop2.3.tgz

pom.xml snippets: https://gist.github.com/ypwais/ff188611d4806aa05ed9

[1] http://stackoverflow.com/questions/24747037/how-to-define-a-dependency-scope-in-maven-to-include-a-library-in-compile-run

Thanks everybody!!
-Paul

On Tue, Sep 16, 2014 at 3:55 AM, Sean Owen <so...@cloudera.com> wrote:
> From the caller / application perspective, you don't care what version
> of Hadoop Spark is running on in the cluster. The Spark API you
> compile against is the same. When you spark-submit the app, at
> runtime, Spark is using the Hadoop libraries from the cluster, which
> are the right version.
>
> So when you build your app, you mark Spark as a 'provided' dependency.
> Therefore, in general, no, you do not build Spark for yourself if you
> are a Spark app creator.
>
> (Of course, your app would care if it were also using Hadoop libraries
> directly. In that case, you will want to depend on hadoop-client, and
> the right version for your cluster, but still mark it as provided.)
>
> The version Spark is built against only matters when you are deploying
> Spark's artifacts on the cluster to set it up.
>
> Your error suggests there is still a version mismatch. Either you
> deployed a build that was not compatible, or maybe you are packaging
> a version of Spark with your app which is incompatible and
> interfering.
>
> For example, the artifacts you get via Maven depend on Hadoop 1.0.4. I
> suspect that's what you're doing -- packaging Spark (+ Hadoop 1.0.4)
> with your app, when it shouldn't be packaged.
>
> Spark works out of the box with just about any modern combo of HDFS
> and YARN.
>
> On Tue, Sep 16, 2014 at 2:28 AM, Paul Wais <pw...@yelp.com> wrote:
>> Dear List,
>>
>> I'm having trouble getting Spark 1.1 to use the Hadoop 2 API for
>> reading SequenceFiles. In particular, I'm seeing:
>>
>>   Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
>>   Server IPC version 7 cannot communicate with client version 4
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1070)
>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>>     at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source)
>>     ...
>>
>> when invoking JavaSparkContext#newAPIHadoopFile() with args
>> validSequenceFileURI, SequenceFileInputFormat.class, Text.class,
>> BytesWritable.class, new Job().getConfiguration() -- pretty close to
>> the unit test here:
>> https://github.com/apache/spark/blob/f0f1ba09b195f23f0c89af6fa040c9e01dfa8951/core/src/test/java/org/apache/spark/JavaAPISuite.java#L916
>>
>> This error indicates to me that Spark is using an old Hadoop client to
>> do reads. Oddly, I'm able to do /writes/ OK, i.e. I'm able to write
>> via JavaPairRDD#saveAsNewAPIHadoopFile() to my HDFS cluster.
>>
>> Do I need to explicitly build Spark for modern Hadoop?? I previously
>> had an HDFS cluster running Hadoop 2.3.0, and I was getting a similar
>> error (server is using version 9, client is using version 4).
>>
>> I'm using Spark 1.1 cdh4 as well as Hadoop cdh4 from the links posted
>> on Spark's site:
>>  * http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-cdh4.tgz
>>  * http://d3kbcqa49mib13.cloudfront.net/hadoop-2.0.0-cdh4.2.0.tar.gz
>>
>> What distro of Hadoop is used at Databricks? Are there distros of
>> Spark 1.1 and Hadoop that should work together out-of-the-box?
>> (Previously I had Spark 1.0.0 and Hadoop 2.3 working fine..)
>>
>> Thanks for any help anybody can give me here!
>> -Paul
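
P.S. For anyone searching the archives: with Spark marked 'provided' and the jar launched via spark-submit (so the cluster supplies the Spark and Hadoop libraries at runtime, per Sean's explanation above), the read described in the quoted thread works. A minimal sketch of the call -- the class/app names, host, and path are placeholders from my setup, not real values:

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SequenceFileReadDemo {
      public static void main(String[] args) throws Exception {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("SequenceFileReadDemo"));

        // Placeholder URI -- point this at a real SequenceFile on your cluster.
        JavaPairRDD<Text, BytesWritable> rdd = sc.newAPIHadoopFile(
            "hdfs://namenode:8020/path/to/data.seq",
            SequenceFileInputFormat.class,  // new-API (mapreduce) input format
            Text.class,                     // key class
            BytesWritable.class,            // value class
            new Job().getConfiguration());  // Hadoop Configuration for the read

        System.out.println("record count: " + rdd.count());
        sc.stop();
      }
    }

Built into the uber jar (with Spark provided) and run with something like: spark-submit --class SequenceFileReadDemo my-app.jar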