Hi. I want to run Spark, specifically the "Pre-built with user-provided Hadoop" package from the downloads page, but I can't find any documentation on how to connect the two components (Spark and my existing Hadoop installation).
I've had some success setting SPARK_CLASSPATH to my Hadoop distribution's lib/ directory, which contains jar files such as hadoop-core, hadoop-common, etc. (a rough sketch of what I mean is in the P.S. below). However, the assembly jar in the Spark packages pre-built for specific Hadoop distributions seems to include a number of native libraries (I'm specifically missing the libsnappy.so files) that are not included by default in distributions such as Cloudera Hadoop.

Has anyone here actually tried to run Spark without Hadoop included in the assembly jar, and/or does anyone have more resources where I can read about the proper way of connecting them?

As an aside, the spark-assembly jar in the Spark package pre-built for user-provided Hadoop distributions is named spark-assembly-1.4.0-hadoop2.2.0.jar, which doesn't make sense - it should be called spark-assembly-1.4.0-without-hadoop.jar :)

-- Herman
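P.S. For concreteness, this is roughly what I mean by pointing SPARK_CLASSPATH at the Hadoop lib/ directory; the /opt/hadoop path below is just a placeholder for my actual installation, not something taken from any documentation:

    # conf/spark-env.sh -- sketch of my current workaround;
    # /opt/hadoop stands in for the real Hadoop install directory
    export SPARK_CLASSPATH="/opt/hadoop/lib/*"

That gets the Hadoop jars onto the driver/executor classpath, but of course it does nothing for native libraries such as libsnappy.so, which is where I'm stuck.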