Yes, we are on Spark 0.9.0, so that explains the first piece, thanks! Also, yes, I meant SPARK_WORKER_MEMORY. Thanks for the hierarchy. Similarly, is there some best practice for setting SPARK_WORKER_INSTANCES and spark.default.parallelism?
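
To make the question concrete, here is roughly what I have in mind for conf/spark-env.sh on 0.9 in yarn-client mode. The variable names are the ones I understand the 0.9 docs to use, and the values are just placeholders for our cluster, not a recommendation:

    # conf/spark-env.sh -- Spark 0.9.x, yarn-client mode; illustrative values only
    export SPARK_WORKER_INSTANCES=4   # executors (YARN containers) to request
    export SPARK_WORKER_MEMORY=8g     # memory per executor
    export SPARK_WORKER_CORES=2       # cores per executor
    # in 0.9, spark.* properties are passed through SPARK_JAVA_OPTS:
    export SPARK_JAVA_OPTS="-Dspark.default.parallelism=32"

I have seen a rule of thumb of roughly 2-3 tasks per CPU core for spark.default.parallelism, but I am not sure how that should interact with the instance/core settings above.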
Thanks,
Arun

On Tue, May 20, 2014 at 3:04 PM, Andrew Or <and...@databricks.com> wrote:

> I'm assuming you're running Spark 0.9.x, because in the latest version of
> Spark you shouldn't have to add the HADOOP_CONF_DIR to the java class path
> manually. I tested this out on my own YARN cluster and was able to confirm
> that.
>
> In Spark 1.0, SPARK_MEM is deprecated and should not be used. Instead, you
> should set the per-executor memory through spark.executor.memory, which has
> the same effect but takes higher priority. By YARN_WORKER_MEM, do you mean
> SPARK_EXECUTOR_MEMORY? It also does the same thing. In Spark 1.0, the
> priority hierarchy is as follows:
>
> spark.executor.memory (set through spark-defaults.conf) >
> SPARK_EXECUTOR_MEMORY > SPARK_MEM (deprecated)
>
> In Spark 0.9, the hierarchy is very similar:
>
> spark.executor.memory (set through SPARK_JAVA_OPTS in spark-env) >
> SPARK_MEM
>
> For more information:
>
> http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/configuration.html
> http://spark.apache.org/docs/0.9.1/configuration.html
>
>
> 2014-05-20 11:30 GMT-07:00 Arun Ahuja <aahuj...@gmail.com>:
>
>> I was actually able to get this to work. I was NOT setting the classpath
>> properly originally.
>>
>> Simply running
>>
>>   java -cp /etc/hadoop/conf/:<yarn, hadoop jars> com.domain.JobClass
>>
>> and setting yarn-client as the Spark master worked for me. Originally I
>> had not put the configuration on the classpath. Also, I now use
>> $SPARK_HOME/bin/compute-classpath.sh to get all of the relevant jars.
>> The job properly connects to the AM at the correct port.
>>
>> Is there any intuition on how Spark executors map to YARN workers, or how
>> the different memory settings interplay, SPARK_MEM vs. YARN_WORKER_MEM?
>>
>> Thanks,
>> Arun
>>
>>
>> On Tue, May 20, 2014 at 2:25 PM, Andrew Or <and...@databricks.com> wrote:
>>
>>> Hi Gaurav and Arun,
>>>
>>> Your settings seem reasonable; as long as YARN_CONF_DIR or
>>> HADOOP_CONF_DIR is properly set, the application should be able to find
>>> the correct RM port. Have you tried running the examples in yarn-client
>>> mode, and your custom application in yarn-standalone (now yarn-cluster)
>>> mode?
>>>
>>>
>>> 2014-05-20 5:17 GMT-07:00 gaurav.dasgupta <gaurav.d...@gmail.com>:
>>>
>>>> A few more details I would like to provide (sorry, I should have
>>>> provided these with the previous post):
>>>>
>>>> *- Spark Version = 0.9.1 (using pre-built spark-0.9.1-bin-hadoop2)
>>>> - Hadoop Version = 2.4.0 (Hortonworks)
>>>> - I am trying to execute a Spark Streaming program*
>>>>
>>>> Because I am using Hortonworks Hadoop (HDP), YARN is configured with
>>>> different port numbers than Apache's default configurations. For
>>>> example, *resourcemanager.address* is <IP>:8050 in HDP, whereas it
>>>> defaults to <IP>:8032.
>>>>
>>>> When I run the Spark examples using bin/run-example, I can see in the
>>>> console logs that it is connecting to the right port configured by HDP,
>>>> i.e., 8050. Please refer to the console log below:
>>>>
>>>> [root@host spark-0.9.1-bin-hadoop2]# SPARK_YARN_MODE=true
>>>> SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar
>>>> SPARK_YARN_APP_JAR=examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar
>>>> bin/run-example org.apache.spark.examples.HdfsTest yarn-client
>>>> /user/root/test
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/local/spark-0.9.1-bin-hadoop2/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/local/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>> explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> 14/05/20 06:55:29 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>> 14/05/20 06:55:29 INFO Remoting: Starting remoting
>>>> 14/05/20 06:55:29 INFO Remoting: Remoting started; listening on addresses
>>>> :[akka.tcp://spark@<IP>:60988]
>>>> 14/05/20 06:55:29 INFO Remoting: Remoting now listens on addresses:
>>>> [akka.tcp://spark@<IP>:60988]
>>>> 14/05/20 06:55:29 INFO spark.SparkEnv: Registering BlockManagerMaster
>>>> 14/05/20 06:55:29 INFO storage.DiskBlockManager: Created local directory
>>>> at /tmp/spark-local-20140520065529-924f
>>>> 14/05/20 06:55:29 INFO storage.MemoryStore: MemoryStore started with
>>>> capacity 4.2 GB.
>>>> 14/05/20 06:55:29 INFO network.ConnectionManager: Bound socket to port
>>>> 35359 with id = ConnectionManagerId(<IP>,35359)
>>>> 14/05/20 06:55:29 INFO storage.BlockManagerMaster: Trying to register
>>>> BlockManager
>>>> 14/05/20 06:55:29 INFO storage.BlockManagerMasterActor$BlockManagerInfo:
>>>> Registering block manager <IP>:35359 with 4.2 GB RAM
>>>> 14/05/20 06:55:29 INFO storage.BlockManagerMaster: Registered
>>>> BlockManager
>>>> 14/05/20 06:55:29 INFO spark.HttpServer: Starting HTTP Server
>>>> 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT
>>>> 14/05/20 06:55:29 INFO server.AbstractConnector: Started
>>>> SocketConnector@0.0.0.0:59418
>>>> 14/05/20 06:55:29 INFO broadcast.HttpBroadcast: Broadcast server started
>>>> at http://<IP>:59418
>>>> 14/05/20 06:55:29 INFO spark.SparkEnv: Registering MapOutputTracker
>>>> 14/05/20 06:55:29 INFO spark.HttpFileServer: HTTP File server directory
>>>> is /tmp/spark-fc34fdc8-d940-420b-b184-fc7a8a65501a
>>>> 14/05/20 06:55:29 INFO spark.HttpServer: Starting HTTP Server
>>>> 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT
>>>> 14/05/20 06:55:29 INFO server.AbstractConnector: Started
>>>> SocketConnector@0.0.0.0:53425
>>>> 14/05/20 06:55:29 INFO server.Server: jetty-7.x.y-SNAPSHOT
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/storage/rdd,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/storage,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/stages/stage,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/stages/pool,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/stages,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/environment,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/executors,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/metrics/json,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/static,null}
>>>> 14/05/20 06:55:29 INFO handler.ContextHandler: started
>>>> o.e.j.s.h.ContextHandler{/,null}
>>>> 14/05/20 06:55:29 INFO server.AbstractConnector: Started
>>>> SelectChannelConnector@0.0.0.0:4040
>>>> 14/05/20 06:55:29 INFO ui.SparkUI: Started Spark Web UI at
>>>> http://<IP>:4040
>>>> 14/05/20 06:55:29 WARN util.NativeCodeLoader: Unable to load
>>>> native-hadoop library for your platform... using builtin-java classes
>>>> where applicable
>>>> 14/05/20 06:55:29 INFO spark.SparkContext: Added JAR
>>>> /usr/local/spark-0.9.1-bin-hadoop2/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar
>>>> at http://<IP>:53425/jars/spark-examples_2.10-assembly-0.9.1.jar with
>>>> timestamp 1400586929921
>>>> 14/05/20 06:55:30 INFO client.RMProxy: Connecting to ResourceManager at
>>>> <IP>:8050
>>>> 14/05/20 06:55:30 INFO yarn.Client: Got Cluster metric info from
>>>> ApplicationsManager (ASM), number of NodeManagers: 9
>>>> 14/05/20 06:55:30 INFO yarn.Client: Queue info ... queueName: default,
>>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>>>
>>>> But when I run my own custom Spark Streaming code, it tries to connect
>>>> to port 8032 instead and hence is unable to connect. Refer to the log
>>>> below:
>>>>
>>>> [root@host spark-0.9.1-bin-hadoop2]# SPARK_YARN_MODE=true
>>>> SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar
>>>> SPARK_YARN_APP_JAR=/home/gaurav/SparkStreamExample.jar java -cp
>>>> /home/gaurav/SparkStreamExample.jar:assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar
>>>> SparkStreamExample yarn-client <IP> 9999
>>>> log4j:WARN No appenders could be found for logger
>>>> (akka.event.slf4j.Slf4jLogger).
>>>> log4j:WARN Please initialize the log4j system properly.
>>>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
>>>> more info.
>>>> 14/05/20 07:04:38 INFO SparkEnv: Using Spark's default log4j profile:
>>>> org/apache/spark/log4j-defaults.properties
>>>> 14/05/20 07:04:38 INFO SparkEnv: Registering BlockManagerMaster
>>>> 14/05/20 07:04:38 INFO DiskBlockManager: Created local directory at
>>>> /tmp/spark-local-20140520070438-5eae
>>>> 14/05/20 07:04:38 INFO MemoryStore: MemoryStore started with capacity
>>>> 4.2 GB.
>>>> 14/05/20 07:04:38 INFO ConnectionManager: Bound socket to port 49869
>>>> with id = ConnectionManagerId(<IP>,49869)
>>>> 14/05/20 07:04:38 INFO BlockManagerMaster: Trying to register
>>>> BlockManager
>>>> 14/05/20 07:04:38 INFO BlockManagerMasterActor$BlockManagerInfo:
>>>> Registering block manager <IP>:49869 with 4.2 GB RAM
>>>> 14/05/20 07:04:38 INFO BlockManagerMaster: Registered BlockManager
>>>> 14/05/20 07:04:38 INFO HttpServer: Starting HTTP Server
>>>> 14/05/20 07:04:38 INFO HttpBroadcast: Broadcast server started at
>>>> http://<IP>:36946
>>>> 14/05/20 07:04:38 INFO SparkEnv: Registering MapOutputTracker
>>>> 14/05/20 07:04:38 INFO HttpFileServer: HTTP File server directory is
>>>> /tmp/spark-414ba274-adc0-4a0e-b1a4-9c1f048cbf37
>>>> 14/05/20 07:04:38 INFO HttpServer: Starting HTTP Server
>>>> 14/05/20 07:04:38 INFO SparkUI: Started Spark Web UI at
>>>> http://<IP>:4040
>>>> 14/05/20 07:04:38 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform...
>>>> using builtin-java classes where applicable
>>>> 14/05/20 07:04:38 INFO SparkContext: Added JAR
>>>> /home/gaurav/SparkStreamExample.jar at
>>>> http://<IP>:40053/jars/SparkStreamExample.jar with timestamp
>>>> 1400587478500
>>>> 14/05/20 07:04:38 INFO RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/05/20 07:04:39 INFO Client: Retrying connect to server:
>>>> 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>> 14/05/20 07:04:40 INFO Client: Retrying connect to server:
>>>> 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>> 14/05/20 07:04:41 INFO Client: Retrying connect to server:
>>>> 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>> 14/05/20 07:04:42 INFO Client: Retrying connect to server:
>>>> 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>
>>>> Do I need to specify the YARN ports configured by HDP to Spark somehow?
>>>> How can the example jobs detect the correct YARN ports?
>>>>
>>>> Thanks in advance.
>>>>
>>>> -- Gaurav
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Yarn-configuration-file-doesn-t-work-when-run-with-yarn-client-mode-tp1418p6097.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>>
>>
>
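
P.S. Gaurav, on the ports question quoted above: I believe the only difference is the classpath. bin/run-example puts HADOOP_CONF_DIR on it, so the YARN client reads HDP's yarn-site.xml and picks up <IP>:8050, while the hand-rolled java command does not, so it falls back to the stock default 0.0.0.0:8032. A minimal sketch of the launch that worked for me, assuming the HDP client configs live in /etc/hadoop/conf (adjust the path for your cluster, and keep your SPARK_YARN_MODE/SPARK_JAR/SPARK_YARN_APP_JAR settings as before):

    # /etc/hadoop/conf/yarn-site.xml (managed by HDP) already carries the
    # non-default RM address:
    #   <property>
    #     <name>yarn.resourcemanager.address</name>
    #     <value><IP>:8050</value>
    #   </property>
    # Prepending that directory to the JVM classpath lets the YARN client find it:
    java -cp /etc/hadoop/conf:/home/gaurav/SparkStreamExample.jar:assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar \
      SparkStreamExample yarn-client <IP> 9999

No code change should be needed; the YARN client takes the RM address from whatever yarn-site.xml it finds on the classpath (exporting HADOOP_CONF_DIR=/etc/hadoop/conf has the same effect when launching through the Spark scripts).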