Hi Sean,

Great catch! Yes, I was including Spark as a dependency, and it was making its way into my uber jar. Following the advice I just found on Stack Overflow [1], I marked Spark as a provided dependency, and that appears to have fixed my Hadoop client issue. Thanks for your help!!! Perhaps the maintainers might consider setting this scope in the Quickstart guide's pom.xml ( http://spark.apache.org/docs/latest/quick-start.html ).
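
For anyone who hits this later, here's roughly what the change looks like -- a minimal sketch only: the spark-core_2.10 / 1.1.0 and hadoop-client / 2.3.0-cdh5.0.0 coordinates match my setup (summarized below) and may differ for yours; full snippets are in the gist linked below:

    <!-- Spark is supplied by the cluster at runtime; 'provided' keeps it
         (and its Hadoop 1.0.4 transitive deps) out of the uber jar. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.1.0</version>
      <scope>provided</scope>
    </dependency>

    <!-- Per Sean's note below: only needed if the app uses Hadoop APIs
         directly; match the cluster's version and mark it 'provided' too.
         (The CDH artifact requires Cloudera's Maven repository.) -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.3.0-cdh5.0.0</version>
      <scope>provided</scope>
    </dependency>
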
In summary, here's what worked:
 * Hadoop 2.3 cdh5: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0.tar.gz
 * Spark 1.1 for Hadoop 2.3: http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-hadoop2.3.tgz

pom.xml snippets: https://gist.github.com/ypwais/ff188611d4806aa05ed9

[1] http://stackoverflow.com/questions/24747037/how-to-define-a-dependency-scope-in-maven-to-include-a-library-in-compile-run

Thanks everybody!!
-Paul

On Tue, Sep 16, 2014 at 3:55 AM, Sean Owen <so...@cloudera.com> wrote:
> From the caller / application perspective, you don't care what version
> of Hadoop Spark is running on in the cluster. The Spark API you
> compile against is the same. When you spark-submit the app, at
> runtime, Spark is using the Hadoop libraries from the cluster, which
> are the right version.
>
> So when you build your app, you mark Spark as a 'provided' dependency.
> Therefore, in general, no, you do not build Spark for yourself if you
> are a Spark app creator.
>
> (Of course, your app would care if it were also using Hadoop libraries
> directly. In that case, you will want to depend on hadoop-client, and
> the right version for your cluster, but still mark it as provided.)
>
> The version Spark is built against only matters when you are deploying
> Spark's artifacts on the cluster to set it up.
>
> Your error suggests there is still a version mismatch. Either you
> deployed a build that was not compatible, or maybe you are packaging
> a version of Spark with your app which is incompatible and
> interfering.
>
> For example, the artifacts you get via Maven depend on Hadoop 1.0.4. I
> suspect that's what you're doing -- packaging Spark (+ Hadoop 1.0.4)
> with your app, when it shouldn't be packaged.
>
> Spark works out of the box with just about any modern combo of HDFS
> and YARN.
>
> On Tue, Sep 16, 2014 at 2:28 AM, Paul Wais <pw...@yelp.com> wrote:
>> Dear List,
>>
>> I'm having trouble getting Spark 1.1 to use the Hadoop 2 API for
>> reading SequenceFiles. In particular, I'm seeing:
>>
>>   Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
>>   Server IPC version 7 cannot communicate with client version 4
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1070)
>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>>     at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source)
>>     ...
>>
>> when invoking JavaSparkContext#newAPIHadoopFile() with args
>> validSequenceFileURI, SequenceFileInputFormat.class, Text.class,
>> BytesWritable.class, new Job().getConfiguration() -- pretty close to
>> the unit test here:
>> https://github.com/apache/spark/blob/f0f1ba09b195f23f0c89af6fa040c9e01dfa8951/core/src/test/java/org/apache/spark/JavaAPISuite.java#L916
>>
>> This error indicates to me that Spark is using an old Hadoop client to
>> do reads. Oddly, I'm able to do /writes/ OK, i.e. I'm able to write
>> via JavaPairRDD#saveAsNewAPIHadoopFile() to my HDFS cluster.
>>
>> Do I need to explicitly build Spark for modern Hadoop?? I previously
>> had an HDFS cluster running Hadoop 2.3.0, and I was getting a similar
>> error (server is using version 9, client is using version 4).
>>
>> I'm using Spark 1.1 cdh4 as well as Hadoop cdh4 from the links posted
>> on Spark's site:
>>  * http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-cdh4.tgz
>>  * http://d3kbcqa49mib13.cloudfront.net/hadoop-2.0.0-cdh4.2.0.tar.gz
>>
>> What distro of Hadoop is used at Databricks? Are there distros of
>> Spark 1.1 and Hadoop that should work together out-of-the-box?
>> (Previously I had Spark 1.0.0 and Hadoop 2.3 working fine..)
>>
>> Thanks for any help anybody can give me here!
>> -Paul
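
P.S. For anyone searching the archives: with Spark marked 'provided' and the jar launched via spark-submit (so the cluster supplies the Spark and Hadoop libraries at runtime, per Sean's explanation above), the read described in the quoted thread works. A minimal sketch of the call -- the class/app names, host, and path are placeholders from my setup, not real values:

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SequenceFileReadDemo {
      public static void main(String[] args) throws Exception {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("SequenceFileReadDemo"));

        // Placeholder URI -- point this at a real SequenceFile on your cluster.
        JavaPairRDD<Text, BytesWritable> rdd = sc.newAPIHadoopFile(
            "hdfs://namenode:8020/path/to/data.seq",
            SequenceFileInputFormat.class,  // new-API (mapreduce) input format
            Text.class,                     // key class
            BytesWritable.class,            // value class
            new Job().getConfiguration());  // Hadoop Configuration for the read

        System.out.println("record count: " + rdd.count());
        sc.stop();
      }
    }

Built into the uber jar (with Spark provided) and run with something like: spark-submit --class SequenceFileReadDemo my-app.jar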