(For the benefit of other users) The workaround appears to be to build Spark against the exact Hadoop version of the cluster, and to build the app with Spark as a provided dependency and without hadoop-client as a direct dependency of the app. With that, HDFS access works just fine.
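To spell that out (a sketch only, reusing the versions from the thread below; adjust to your own cluster, and note the exact build invocation is the one from the third-party-distributions page linked below, something along the lines of "mvn -Dhadoop.version=0.20.2-cdh3u5 -DskipTests clean package"): build Spark against the cluster's Hadoop, then declare only the provided Spark dependency in the app's pom and drop hadoop-client entirely, e.g.:

<!-- Sketch of the app's dependency section under this workaround.
     Spark is marked provided (supplied by the custom-built cluster
     install), and hadoop-client is intentionally NOT declared, so the
     HDFS client classes come from the Spark assembly that was built
     against 0.20.2-cdh3u5. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.0.1</version>
    <scope>provided</scope>
  </dependency>
  <!-- no org.apache.hadoop:hadoop-client entry here -->
</dependencies>

This appears to avoid the root problem in the quoted thread: the prebuilt spark-1.0.1-bin-hadoop1 assembly carries the stock Hadoop 1.0.4 client (ClientProtocol version 61), while the cdh3u5 namenode expects version 63, hence the mismatch in the log below.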
On Fri, Jul 25, 2014 at 11:50 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
> That's right, I'm looking to depend on spark in general and change only
> the hadoop client deps. The spark master and slaves use the
> spark-1.0.1-bin-hadoop1 binaries from the downloads page. The relevant
> snippet from the app's maven pom is as follows:
>
>   <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.10</artifactId>
>     <version>1.0.1</version>
>     <scope>provided</scope>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client</artifactId>
>     <version>0.20.2-cdh3u5</version>
>     <type>jar</type>
>   </dependency>
> </dependencies>
>
> <repositories>
>   <repository>
>     <id>Cloudera repository</id>
>     <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
>   </repository>
>   <repository>
>     <id>Akka repository</id>
>     <url>http://repo.akka.io/releases</url>
>   </repository>
> </repositories>
>
> Thanks,
> Bharath
>
> On Fri, Jul 25, 2014 at 10:29 PM, Sean Owen <so...@cloudera.com> wrote:
>> If you link against the pre-built binary, that's for Hadoop 1.0.4. Can
>> you show your deps to clarify what you are depending on? Building
>> custom Spark and depending on it is a different thing from depending
>> on plain Spark and changing its deps. I think you want the latter.
>>
>> On Fri, Jul 25, 2014 at 5:46 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
>> > Thanks for responding. I used the pre-built spark binaries meant for
>> > hadoop1, cdh3u5. I do not intend to build spark against a specific
>> > distribution. Irrespective of whether I build my app with the explicit cdh
>> > hadoop client dependency, I get the same error message. I also verified
>> > that my app's uber jar had pulled in the cdh hadoop client dependencies.
>> >
>> > On 25-Jul-2014 9:26 pm, "Sean Owen" <so...@cloudera.com> wrote:
>> >>
>> >> This indicates your app is not actually using the version of the HDFS
>> >> client you think. You built Spark from source with the right deps it
>> >> seems, but are you sure you linked to your build in your app?
>> >>
>> >> On Fri, Jul 25, 2014 at 4:32 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
>> >> > Any suggestions to work around this issue? The pre-built spark binaries
>> >> > don't appear to work against cdh as documented, unless there's a build
>> >> > issue, which seems unlikely.
>> >> >
>> >> > On 25-Jul-2014 3:42 pm, "Bharath Ravi Kumar" <reachb...@gmail.com> wrote:
>> >> >>
>> >> >> I'm encountering a hadoop client protocol mismatch trying to read from
>> >> >> HDFS (cdh3u5) using the pre-built spark from the downloads page (linked
>> >> >> under "For Hadoop 1 (HDP1, CDH3)"). I've also followed the instructions at
>> >> >> http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
>> >> >> (i.e. building the app against hadoop-client 0.20.2-cdh3u5), but continue to
>> >> >> see the following error regardless of whether I link the app with the cdh
>> >> >> client:
>> >> >>
>> >> >> 14/07/25 09:53:43 INFO client.AppClient$ClientActor: Executor updated:
>> >> >> app-20140725095343-0016/1 is now RUNNING
>> >> >> 14/07/25 09:53:43 WARN util.NativeCodeLoader: Unable to load native-hadoop
>> >> >> library for your platform... using builtin-java classes where applicable
>> >> >> 14/07/25 09:53:43 WARN snappy.LoadSnappy: Snappy native library not loaded
>> >> >> Exception in thread "main" org.apache.hadoop.ipc.RPC$VersionMismatch:
>> >> >> Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch.
>> >> >> (client = 61, server = 63)
>> >> >>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
>> >> >>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
>> >> >>
>> >> >> While I can build spark against the exact hadoop distro version, I'd
>> >> >> rather work with the standard prebuilt binaries, making additional changes
>> >> >> while building the app if necessary. Any workarounds/recommendations?
>> >> >>
>> >> >> Thanks,
>> >> >> Bharath