I think your observation is correct. For example, http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.3.1 shows that it depends on hadoop-client <http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client> from Hadoop 2.2.
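If you need the Hadoop 1 flavor from a pom, one partial workaround (a minimal sketch only; the published spark-core bytecode was still compiled against the Hadoop 2 client, so this may not resolve every NoClassDefFound) is to exclude the transitive hadoop-client and pin the version you want yourself:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.1</version>
      <exclusions>
        <!-- drop the Hadoop 2.2.0 client that spark-core pulls in -->
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- pin the client you actually run against; 1.0.4 matches the
         locally compiled build mentioned in the mail below -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>1.0.4</version>
    </dependency>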
Cheers

On Tue, May 19, 2015 at 6:17 PM, Edward Sargisson <esa...@pobox.com> wrote:

> Hi,
> I'd like to confirm an observation I've just made: specifically, that Spark
> is only available in repo1.maven.org for one Hadoop variant.
>
> The Spark source can be compiled against a number of different Hadoops
> using profiles. Yay.
> However, the Spark jars in repo1.maven.org appear to be compiled against
> one specific Hadoop, and no other differentiation is made. (I can see a
> difference: hadoop-client is 2.2.0 in repo1.maven.org and 1.0.4 in the
> version I compiled locally.)
>
> The implication here is that if you have a pom file asking for
> spark-core_2.10 version 1.3.1, then Maven will only give you a Hadoop 2
> version. Maven assumes that non-snapshot artifacts never change, so trying
> to load a Hadoop 1 version will end in tears.
>
> This then means that if you compile code against spark-core, there will
> probably be NoClassDefFound classpath issues unless the Hadoop 2 version
> is exactly the one you want.
>
> Have I gotten this correct?
>
> It happens that our little app is using a Spark context directly from a
> Jetty webapp, and the classpath differences were/are causing some
> confusion. We are currently installing a Hadoop 1 Spark master and worker.
>
> Thanks a lot!
> Edward
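Since the source does build against other Hadoops via profiles, the more reliable route is to build Spark locally and install it into your own repository. A sketch, assuming Spark 1.3.1 sources and the Hadoop 1.0.4 target from the mail above (check the building-spark docs for the exact flags for your version):

    # compile against Hadoop 1.0.4 and install into the local Maven repo,
    # so poms resolve this build instead of the Hadoop 2 artifact on Central
    mvn -Dhadoop.version=1.0.4 -DskipTests clean install

Just be careful not to deploy that artifact anywhere a Hadoop 2 consumer could resolve it, since the coordinates are identical to the Central ones.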