Hi Team didata,

This doesn't directly answer your question, but as of Spark 1.1, instead of using the driver Java options, it's better to pass your Spark properties with the "--conf" option. E.g.:

    pyspark --master yarn-client --conf spark.shuffle.spill=true --conf spark.yarn.executor.memoryOverhead=512M

Additionally, driver memory and executor memory have dedicated options:

    pyspark --master yarn-client --conf spark.shuffle.spill=true --conf spark.yarn.executor.memoryOverhead=512M --driver-memory 3G --executor-memory 5G
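The same properties can also be set in code, before the SparkContext is created. A minimal sketch (the property values here are illustrative placeholders, not a recommendation for your cluster):

    # Set Spark properties via SparkConf instead of JVM -D options.
    # Values below are illustrative placeholders.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("yarn-client")
            .set("spark.shuffle.spill", "true")
            .set("spark.yarn.executor.memoryOverhead", "512")  # plain MB in Spark 1.x
            .set("spark.executor.memory", "1g"))

    sc = SparkContext(conf=conf)  # properties take effect when the context starts

Note that spark.driver.memory is the exception: in yarn-client mode the driver JVM is already running by the time the conf is read, so use the --driver-memory flag (or spark-defaults.conf) for that one.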
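As for the truncation you're seeing in your first invocation below: that's ordinary shell word-splitting, not anything Spark-specific. Without quotes, --driver-java-options receives only the first -D token, and the shell treats the rest as separate arguments. A small sketch with Python's shlex (which follows POSIX splitting rules) reproduces it; the command strings are just illustrations:

    # Why the unquoted invocation truncates: the shell splits on whitespace,
    # so --driver-java-options receives only the first -D token.
    import shlex

    unquoted = ("pyspark --master yarn-client --driver-java-options "
                "-Dspark.executor.memory=1G -Dspark.ui.port=8468")
    print(shlex.split(unquoted))
    # [..., '--driver-java-options', '-Dspark.executor.memory=1G',
    #  '-Dspark.ui.port=8468']
    # '-Dspark.ui.port=8468' arrives as its own argv entry, not as part of
    # the option's value.

    quoted = ("pyspark --master yarn-client --driver-java-options "
              "'-Dspark.executor.memory=1G -Dspark.ui.port=8468'")
    print(shlex.split(quoted))
    # [..., '--driver-java-options',
    #  '-Dspark.executor.memory=1G -Dspark.ui.port=8468']
    # With quotes, the whole -D string arrives as a single argument.

(More on the quoted variant after your message below.)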
-Sandy

On Tue, Sep 16, 2014 at 6:22 PM, Dimension Data, LLC. <subscripti...@didata.us> wrote:

> Hello friends:
>
> Yesterday I compiled Spark 1.1.0 against CDH5's Hadoop/YARN distribution.
> Everything went fine, and everything seems to work, except for the following.
>
> Below are two invocations of the 'pyspark' script, one with enclosing
> quotes around the options passed to '--driver-java-options', and one
> without them. I added the following one-liner to the 'pyspark' script to
> show my problem...
>
> ADDED: echo "xxx${PYSPARK_SUBMIT_ARGS}xxx"   # Added after the line that exports this variable.
>
> =========================================================
>
> FIRST [ without enclosing quotes ]:
>
> user@linux$ pyspark --master yarn-client --driver-java-options
>     -Dspark.executor.memory=1G -Dspark.ui.port=8468 -Dspark.driver.memory=512M
>     -Dspark.yarn.executor.memoryOverhead=512M -Dspark.executor.instances=3
>     -Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar
>
> xxx --master yarn-client --driver-java-options -Dspark.executor.memory=1Gxxx   <--- echo statement shows the option truncation.
>
> While this succeeds in getting to a pyspark shell prompt (sc), the context
> isn't set up properly because, as the echo output above and the launch
> command below show, only the first option took effect. (Note that
> spark.executor.memory looks correct, but that's only because my spark
> defaults happen to coincide with it.)
>
> 14/09/16 17:35:32 INFO yarn.Client: command: $JAVA_HOME/bin/java -server -Xmx512m -Djava.io.tmpdir=$PWD/tmp
>     '-Dspark.tachyonStore.folderName=spark-e225c04d-5333-4ca6-9a78-1c3392438d89'
>     '-Dspark.serializer.objectStreamReset=100' '-Dspark.executor.memory=1G'
>     '-Dspark.rdd.compress=True' '-Dspark.yarn.secondary.jars=' '-Dspark.submit.pyFiles='
>     '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer'
>     '-Dspark.driver.host=dstorm' '-Dspark.driver.appUIHistoryAddress='
>     '-Dspark.app.name=PySparkShell' '-Dspark.driver.appUIAddress=dstorm:4040'
>     '-Dspark.driver.extraJavaOptions=-Dspark.executor.memory=1G'
>     '-Dspark.fileserver.uri=http://192.168.0.16:60305' '-Dspark.driver.port=44616'
>     '-Dspark.master=yarn-client'
>     org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar null
>     --arg 'dstorm:44616' --executor-memory 1024 --executor-cores 1 --num-executors 2
>     1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
>
> (Note: I happened to notice that 'spark.driver.memory' is missing as well.)
> ===========================================
>
> NEXT [ so let's try with enclosing quotes ]:
>
> user@linux$ pyspark --master yarn-client --driver-java-options
>     '-Dspark.executor.memory=1G -Dspark.ui.port=8468 -Dspark.driver.memory=512M
>     -Dspark.yarn.executor.memoryOverhead=512M -Dspark.executor.instances=3
>     -Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar'
>
> xxx --master yarn-client --driver-java-options "-Dspark.executor.memory=1G
>     -Dspark.ui.port=8468 -Dspark.driver.memory=512M
>     -Dspark.yarn.executor.memoryOverhead=512M -Dspark.executor.instances=3
>     -Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar"xxx
>
> While this does carry all the options through (shown in the echo output
> above and in the command executed below), the pyspark invocation fails,
> indicating that the application ended before I got to a shell prompt.
> See the snippet below.
>
> 14/09/16 17:44:12 INFO yarn.Client: command: $JAVA_HOME/bin/java -server -Xmx512m -Djava.io.tmpdir=$PWD/tmp
>     '-Dspark.tachyonStore.folderName=spark-3b62ece7-a22a-4d0a-b773-1f5601e5eada'
>     '-Dspark.executor.memory=1G' '-Dspark.driver.memory=512M'
>     '-Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar'
>     '-Dspark.serializer.objectStreamReset=100' '-Dspark.executor.instances=3'
>     '-Dspark.rdd.compress=True' '-Dspark.yarn.secondary.jars=' '-Dspark.submit.pyFiles='
>     '-Dspark.ui.port=8468' '-Dspark.driver.host=dstorm'
>     '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer'
>     '-Dspark.driver.appUIHistoryAddress=' '-Dspark.app.name=PySparkShell'
>     '-Dspark.driver.appUIAddress=dstorm:8468'
>     '-Dspark.yarn.executor.memoryOverhead=512M'
>     '-Dspark.driver.extraJavaOptions=-Dspark.executor.memory=1G -Dspark.ui.port=8468
>     -Dspark.driver.memory=512M -Dspark.yarn.executor.memoryOverhead=512M
>     -Dspark.executor.instances=3
>     -Dspark.yarn.jar=hdfs://namenode:8020/user/spark/share/lib/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar'
>     '-Dspark.fileserver.uri=http://192.168.0.16:54171'
>     '-Dspark.master=yarn-client' '-Dspark.driver.port=58542'
>     org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar null
>     --arg 'dstorm:58542' --executor-memory 1024 --executor-cores 1 --num-executors 3
>     1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
>
> [ ... SNIP ... ]
>
> 14/09/16 17:44:12 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>     appMasterRpcPort: -1
>     appStartTime: 1410903852044
>     yarnAppState: ACCEPTED
>
> 14/09/16 17:44:13 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>     appMasterRpcPort: -1
>     appStartTime: 1410903852044
>     yarnAppState: ACCEPTED
>
> 14/09/16 17:44:14 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>     appMasterRpcPort: -1
>     appStartTime: 1410903852044
>     yarnAppState: ACCEPTED
>
> 14/09/16 17:44:15 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>     appMasterRpcPort: 0
>     appStartTime: 1410903852044
>     yarnAppState: RUNNING
>
> 14/09/16 17:44:19 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
>
> Am I doing something wrong?
>
> Thank you in advance!
> Team didata
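P.S. One thing that stands out in the failing launch command above: the entire quoted string was folded into spark.driver.extraJavaOptions. Properties like spark.driver.memory have to be known before the driver JVM starts, so passing them there is unreliable and may be related to the failure. For reference, a sketch of the same invocation expressed with --conf and the dedicated memory flags (the values are copied from your command; the subprocess wrapper is only for illustration, you would normally type this at the shell):

    # Sketch: the same settings expressed with --conf and the dedicated
    # memory flags instead of --driver-java-options.
    import subprocess

    spark_yarn_jar = ("hdfs://namenode:8020/user/spark/share/lib/"
                      "spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.2.jar")

    cmd = [
        "pyspark", "--master", "yarn-client",
        "--conf", "spark.ui.port=8468",
        "--conf", "spark.executor.instances=3",
        "--conf", "spark.yarn.executor.memoryOverhead=512",  # MB in Spark 1.x
        "--conf", "spark.yarn.jar=" + spark_yarn_jar,
        "--driver-memory", "512M",    # replaces -Dspark.driver.memory=512M
        "--executor-memory", "1G",    # replaces -Dspark.executor.memory=1G
    ]
    subprocess.call(cmd)  # each key=value is its own argv entry: no quoting pitfalls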