Hi, I'd like to confirm an observation I've just made: specifically, that Spark is only available from repo1.maven.org compiled against one Hadoop variant.
The Spark source can be compiled against a number of different Hadoop versions using Maven profiles. Yay. However, the Spark jars in repo1.maven.org appear to be compiled against one specific Hadoop version, with no differentiation in the artifact coordinates. (I can see the difference: hadoop-client resolves to 2.2.0 with the repo1.maven.org artifact, but to 1.0.4 with the version I compiled locally.)

The implication is that if your pom file asks for spark-core_2.10 version 1.3.1, Maven will only ever give you a Hadoop 2 build. Since Maven assumes that non-snapshot artifacts never change, trying to obtain a Hadoop 1 build under the same coordinates will end in tears. That in turn means code compiled against spark-core will probably hit NoClassDefFoundError classpath issues at runtime unless the published Hadoop 2 build is exactly the one you want.

Have I got this right?

For context, our little app uses a SparkContext directly from a Jetty webapp, and the classpath differences were/are causing some confusion. We are currently installing a Hadoop 1 Spark master and worker.

Thanks a lot!
Edward
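P.S. To make the dependency concrete, here is a minimal pom fragment illustrating the situation (versions taken from the observation above; the comment describes what we see resolved, not anything selectable):

```xml
<!-- Illustrative fragment only. Resolving this from repo1.maven.org
     pulls in hadoop-client 2.2.0 transitively; there is no separate
     classifier or coordinate for a Hadoop 1 build of the same version. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.1</version>
</dependency>
```

Running `mvn dependency:tree` against a pom with this dependency is how I compared the resolved hadoop-client version between the central artifact and my local build.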