specifying worker nodes when using the repl?

Eric Friedman Mon, 19 May 2014 08:10:41 -0700

Hi

I am working with a Cloudera 5 cluster with 192 nodes and can't work out how to get the Spark REPL to use more than 2 nodes in an interactive session.


So, this works, but it is non-interactive (using yarn-client as MASTER):

/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/bin/spark-class \
  org.apache.spark.deploy.yarn.Client \
  --jar /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/examples/lib/spark-examples_2.10-0.9.0-cdh5.0.0.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --args 10 \
  --num-workers 100

There does not appear to be an (obvious?) way to get more than 2 nodes involved from the REPL.

I am running the REPL like this:

#!/bin/sh
# Load the cluster's Spark environment settings.
. /etc/spark/conf.cloudera.spark/spark-env.sh
# Location of the Spark assembly jar on HDFS.
export SPARK_JAR=hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
export SPARK_WORKER_MEMORY=512m
export MASTER=yarn-client
exec "$SPARK_HOME/bin/spark-shell"

Now if I comment out the line with `export SPARK_JAR=…` and run this again, I get an error like this:

14/05/19 08:03:41 ERROR Client: Error: You must set SPARK_JAR environment variable!
Usage: org.apache.spark.deploy.yarn.Client [options] 
Options:
  --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
  --class CLASS_NAME         Name of your application's main class (required)
  --args ARGS                Arguments to be passed to your application's main class.
                             Mutliple invocations are possible, each will be passed in order.
  --num-workers NUM          Number of workers to start (Default: 2)
  […]

But none of those options are exposed at the `spark-shell` level.
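The yarn.Client options above may have environment-variable counterparts. As we understand Spark 0.9's yarn-client mode (its YARN documentation lists `SPARK_WORKER_INSTANCES`, "Number of workers to start (Default: 2)", along with `SPARK_WORKER_CORES` and `SPARK_WORKER_MEMORY`), the yarn-client backend reads these variables and passes them on as `--num-workers`, `--worker-cores` and `--worker-memory`. This is an assumption and has not been checked against the CDH 5.0.0 parcel. The sketch below only illustrates that assumed mapping; `yarn_client_worker_opts` is a hypothetical helper, not part of Spark.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: prints the yarn.Client-style
# worker options that yarn-client mode is ASSUMED to derive from these
# environment variables. Unset or empty variables are skipped, leaving the
# yarn.Client defaults in place (e.g. --num-workers defaults to 2).
yarn_client_worker_opts() {
  opts=""
  if [ -n "${SPARK_WORKER_INSTANCES:-}" ]; then
    opts="$opts --num-workers $SPARK_WORKER_INSTANCES"
  fi
  if [ -n "${SPARK_WORKER_MEMORY:-}" ]; then
    opts="$opts --worker-memory $SPARK_WORKER_MEMORY"
  fi
  if [ -n "${SPARK_WORKER_CORES:-}" ]; then
    opts="$opts --worker-cores $SPARK_WORKER_CORES"
  fi
  # Drop the leading space left by the first append.
  printf '%s\n' "${opts# }"
}
```

With `SPARK_WORKER_INSTANCES=100` and `SPARK_WORKER_MEMORY=512m` set (and `SPARK_WORKER_CORES` unset), it prints `--num-workers 100 --worker-memory 512m`. Under the same assumption, adding `export SPARK_WORKER_INSTANCES=100` to the launcher script before the `exec` line would be the REPL counterpart of `--num-workers 100`.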

Thanks in advance for your guidance.

Eric
