Hi all, I tried a couple of ways, but couldn't get it to work.
The following seems to be what the online documentation (http://spark.apache.org/docs/latest/running-on-yarn.html) suggests:

SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client

The spark-shell help output, on the other hand, suggests "--master yarn --deploy-mode cluster". Either way, I see the following messages:

14/06/01 00:33:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/01 00:33:21 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/06/01 00:33:22 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

My guess is that spark-shell is trying to talk to the ResourceManager to set up the Spark master/worker nodes, but I am not sure where 0.0.0.0:8032 came from.

I am running CDH5 with two ResourceManagers in HA mode. Their IPs/ports should be in /opt/hadoop/conf/yarn-site.xml. I tried both HADOOP_CONF_DIR and YARN_CONF_DIR, but that information isn't picked up.

Any ideas?

Thanks.
-Simon
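For reference: 0.0.0.0:8032 matches the stock YARN defaults in yarn-default.xml (yarn.resourcemanager.hostname defaults to 0.0.0.0, and yarn.resourcemanager.address to ${yarn.resourcemanager.hostname}:8032), which suggests the client is not reading the yarn-site.xml at all. The rough Python sketch below can be used to check what a given conf directory would resolve to. It is a simplified approximation, not Hadoop's actual resolution code: it does not expand ${...} variables, and in HA mode it only looks at yarn.resourcemanager.ha.rm-ids plus the per-id yarn.resourcemanager.address.<id> / yarn.resourcemanager.hostname.<id> properties. The function names (parse_site_xml, rm_addresses, check_conf_dir) are hypothetical, and checking HADOOP_CONF_DIR before YARN_CONF_DIR is an arbitrary choice here.

```python
import os
import xml.etree.ElementTree as ET

DEFAULT_RM_PORT = 8032       # port in the default yarn.resourcemanager.address
DEFAULT_RM_HOST = "0.0.0.0"  # default yarn.resourcemanager.hostname


def parse_site_xml(xml_text):
    """Return the <property> name/value pairs of a Hadoop *-site.xml as a dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def rm_addresses(conf):
    """Simplified guess at the ResourceManager address(es) a YARN client would use.

    Assumption: no ${...} expansion and none of Hadoop's other fallbacks.
    """
    def addr_for(suffix=""):
        # An explicit address wins; otherwise build one from the hostname.
        addr = conf.get("yarn.resourcemanager.address" + suffix)
        if addr:
            return addr
        host = conf.get("yarn.resourcemanager.hostname" + suffix, DEFAULT_RM_HOST)
        return f"{host}:{DEFAULT_RM_PORT}"

    if conf.get("yarn.resourcemanager.ha.enabled", "false").lower() == "true":
        rm_ids = conf.get("yarn.resourcemanager.ha.rm-ids", "")
        ids = [i.strip() for i in rm_ids.split(",") if i.strip()]
        # One address per HA ResourceManager id, e.g. ".rm1", ".rm2".
        return [addr_for("." + rm_id) for rm_id in ids]
    return [addr_for()]


def check_conf_dir(conf_dir):
    """Print the address(es) implied by conf_dir/yarn-site.xml, if it exists."""
    path = os.path.join(conf_dir, "yarn-site.xml")
    if not os.path.isfile(path):
        print(f"{path} not found; YARN defaults give {DEFAULT_RM_HOST}:{DEFAULT_RM_PORT}")
        return
    with open(path) as f:
        print(rm_addresses(parse_site_xml(f.read())))


if __name__ == "__main__":
    check_conf_dir(os.environ.get("HADOOP_CONF_DIR")
                   or os.environ.get("YARN_CONF_DIR")
                   or "/opt/hadoop/conf")
```

If this prints the two HA ResourceManagers for /opt/hadoop/conf but spark-shell still logs 0.0.0.0:8032, the file itself is probably fine and the problem is more likely that the variable isn't reaching the process that starts the YARN client.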