Hi team didata,

This doesn't directly answer your question, but with Spark 1.1, instead of
using the driver Java options, it's better to pass your Spark properties
using the "--conf" option.

E.g.:

pyspark --master yarn-client --conf spark.shuffle.spill=true \
  --conf spark.yarn.executor.memoryOverhead=512

Additionally, driver memory and executor memory have dedicated options:

pyspark --master yarn-client --conf spark.shuffle.spill=true \
  --conf spark.yarn.executor.memoryOverhead=512 \
  --driver-memory 3G --executor-memory 5G
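
If you want these applied on every invocation, another option is to put them
in conf/spark-defaults.conf. A sketch (the values are just examples; note
that spark.yarn.executor.memoryOverhead is a plain number of megabytes in
1.1, not a "512M"-style string):

    spark.shuffle.spill                   true
    spark.yarn.executor.memoryOverhead    512

And in a standalone PySpark script (as opposed to the shell, where the
SparkContext is created for you), the same properties can be set
programmatically. A minimal sketch with placeholder values:

    from pyspark import SparkConf, SparkContext

    # carry the same properties that the --conf flags above would set
    conf = (SparkConf()
            .setMaster("yarn-client")
            .set("spark.shuffle.spill", "true")
            .set("spark.yarn.executor.memoryOverhead", "512"))

    sc = SparkContext(conf=conf)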

-Sandy


On Tue, Sep 16, 2014 at 6:22 PM, Dimension Data, LLC. <
subscripti...@didata.us> wrote:

>
> Hello friends:
>
> Yesterday I compiled Spark 1.1.0 against CDH5's Hadoop/YARN distribution.
> Everything went fine, and everything seems
> to work, but for the following.
>
> Below are two invocations of the 'pyspark' script: one without enclosing
> quotes around the options passed to '--driver-java-options', and one with
> them. I added the following one-liner to the 'pyspark' script to
> show my problem...
>
> ADDED: echo "xxx${PYSPARK_SUBMIT_ARGS}xxx" # Added after the line that
> exports this variable.
>
> =========================================================
>
> FIRST:
> [ without enclosing quotes ]:
>
>     user@linux$ pyspark --master yarn-client --driver-java-options
> -Dspark.executor.memory=1G -Dspark.ui.port=8468 -Dspark.driver.memory=512M
> -Dspark.yarn.executor.memoryOverhead=512M -Dspark.executor.instances=3
> -Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar
> xxx --master yarn-client --driver-java-options
> -Dspark.executor.memory=1Gxxx   <--- the echo statement shows the options were truncated.
>
> While this succeeds in getting to a pyspark shell prompt (sc), the context
> isn't set up properly because, as seen in the echo output above and in the
> launch command below, only the first option took effect. (Note that
> spark.executor.memory looks correct, but that's only because my spark
> defaults happen to coincide with it.)
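>
> (A minimal, hypothetical shell snippet to illustrate the word-splitting;
> this is ordinary shell behavior, nothing specific to pyspark:
>
>     set -- --driver-java-options -Dspark.executor.memory=1G -Dspark.ui.port=8468
>     echo "$2"    # -Dspark.executor.memory=1G  <- the only token bound to the option
>     echo "$3"    # -Dspark.ui.port=8468        <- left over as a separate argument
>
> Without quotes, each -D token is a separate word, so only the first one can
> become the value of --driver-java-options.)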
>
> 14/09/16 17:35:32 INFO yarn.Client:   command: $JAVA_HOME/bin/java -server
> -Xmx512m -Djava.io.tmpdir=$PWD/tmp
> '-Dspark.tachyonStore.folderName=spark-e225c04d-5333-4ca6-9a78-1c3392438d89'
> '-Dspark.serializer.objectStreamReset=100' '-Dspark.executor.memory=1G'
> '-Dspark.rdd.compress=True' '-Dspark.yarn.secondary.jars='
> '-Dspark.submit.pyFiles='
> '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer'
> '-Dspark.driver.host=dstorm' '-Dspark.driver.appUIHistoryAddress='
> '-Dspark.app.name=PySparkShell' '-Dspark.driver.appUIAddress=dstorm:4040'
> '-Dspark.driver.extraJavaOptions=-Dspark.executor.memory=1G'
> '-Dspark.fileserver.uri=http://192.168.0.16:60305'
> '-Dspark.driver.port=44616' '-Dspark.master=yarn-client'
> org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar
> null  --arg  'dstorm:44616' --executor-memory 1024 --executor-cores 1
> --num-executors 2 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
>
> (Note: I also happened to notice that 'spark.driver.memory' is missing.)
>
> ===========================================
>
> NEXT:
>
> [ So let's try with enclosing quotes ]
>     user@linux$ pyspark --master yarn-client --driver-java-options
> '-Dspark.executor.memory=1G -Dspark.ui.port=8468 -Dspark.driver.memory=512M
> -Dspark.yarn.executor.memoryOverhead=512M -Dspark.executor.instances=3
> -Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar'
> xxx --master yarn-client --driver-java-options "-Dspark.executor.memory=1G
> -Dspark.ui.port=8468 -Dspark.driver.memory=512M
> -Dspark.yarn.executor.memoryOverhead=512M -Dspark.executor.instances=3
> -Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar"xxx
>
> While this time all of the options are present (shown in the echo output
> above and in the command executed below), the pyspark invocation fails,
> indicating that the application ended before I got to a shell prompt.
> See the snippet below.
>
> 14/09/16 17:44:12 INFO yarn.Client:   command: $JAVA_HOME/bin/java -server
> -Xmx512m -Djava.io.tmpdir=$PWD/tmp
> '-Dspark.tachyonStore.folderName=spark-3b62ece7-a22a-4d0a-b773-1f5601e5eada'
> '-Dspark.executor.memory=1G' '-Dspark.driver.memory=512M'
> '-Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar'
> '-Dspark.serializer.objectStreamReset=100' '-Dspark.executor.instances=3'
> '-Dspark.rdd.compress=True' '-Dspark.yarn.secondary.jars='
> '-Dspark.submit.pyFiles=' '-Dspark.ui.port=8468'
> '-Dspark.driver.host=dstorm'
> '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer'
> '-Dspark.driver.appUIHistoryAddress=' '-Dspark.app.name=PySparkShell'
> '-Dspark.driver.appUIAddress=dstorm:8468'
> '-Dspark.yarn.executor.memoryOverhead=512M'
> '-Dspark.driver.extraJavaOptions=-Dspark.executor.memory=1G
> -Dspark.ui.port=8468 -Dspark.driver.memory=512M
> -Dspark.yarn.executor.memoryOverhead=512M -Dspark.executor.instances=3
> -Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar'
> '-Dspark.fileserver.uri=http://192.168.0.16:54171'
> '-Dspark.master=yarn-client' '-Dspark.driver.port=58542'
> org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar
> null  --arg  'dstorm:58542' --executor-memory 1024 --executor-cores 1
> --num-executors  3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
>
>
> [ ... SNIP ... ]
> 14/09/16 17:44:12 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>      appMasterRpcPort: -1
>      appStartTime: 1410903852044
>      yarnAppState: ACCEPTED
>
> 14/09/16 17:44:13 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>      appMasterRpcPort: -1
>      appStartTime: 1410903852044
>      yarnAppState: ACCEPTED
>
> 14/09/16 17:44:14 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>      appMasterRpcPort: -1
>      appStartTime: 1410903852044
>      yarnAppState: ACCEPTED
>
> 14/09/16 17:44:15 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>      appMasterRpcPort: 0
>      appStartTime: 1410903852044
>      yarnAppState: RUNNING
>
> 14/09/16 17:44:19 ERROR cluster.YarnClientSchedulerBackend: Yarn
> application already ended: FAILED
>
>
> Am I doing something wrong?
>
> Thank you in advance!
> Team didata
