Hi. I want to run Spark, specifically the "Pre-built with user-provided Hadoop" package from the downloads page, but I can't find any documentation on how to connect the two components (Spark and my existing Hadoop installation).
I've had some success setting SPARK_CLASSPATH to my Hadoop distribution's lib/ directory, which contains jar files such as hadoop-core, hadoop-common, etc. (a rough sketch of what I mean is in the P.S. below). However, the assembly jar in the Spark packages pre-built for specific Hadoop distributions seems to include a number of native libraries (I'm specifically missing the libsnappy.so files) that are not included by default in distributions such as Cloudera Hadoop.

Has anyone here actually tried to run Spark without Hadoop included in the assembly jar, and/or does anyone have more resources where I can read about the proper way of connecting them?

As an aside, the spark-assembly jar in the Spark package pre-built for user-provided Hadoop distributions is named spark-assembly-1.4.0-hadoop2.2.0.jar, which doesn't make sense - it should be called spark-assembly-1.4.0-without-hadoop.jar :)

-- Herman
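P.S. For concreteness, this is roughly what I mean by pointing SPARK_CLASSPATH at the Hadoop lib/ directory; the /opt/hadoop path below is just a placeholder for my actual installation, not something taken from any documentation:

    # conf/spark-env.sh -- sketch of my current workaround;
    # /opt/hadoop stands in for the real Hadoop install directory
    export SPARK_CLASSPATH="/opt/hadoop/lib/*"

That gets the Hadoop jars onto the driver/executor classpath, but of course it does nothing for native libraries such as libsnappy.so, which is where I'm stuck.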