I think I’ve found the reason (maybe only a partial one, but a major one).
The time is lost between the following two lines (this is a newly captured log, but it is essentially the same spot that Zoltán Zvara pointed out):

15/05/08 11:36:32 INFO BlockManagerMaster: Registered BlockManager
15/05/08 11:36:38 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@cluster04:55237/user/Executor#-149550753] with ID 1

When I read the logs on the cluster side, I found the following lines (the exact timestamps differ from the lines above, but that is just the clock difference between the machines):

15/05/08 11:36:23 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 5000
15/05/08 11:36:28 INFO impl.AMRMClientImpl: Received new token for : cluster04:45454

It seems that Spark deliberately sleeps for 5 seconds. I read the Spark source code, and in org.apache.spark.deploy.yarn.ApplicationMaster.scala, launchReporterThread() has the code for that. It loops, calling allocator.allocateResources() and then Thread.sleep(). The sleep duration comes from the configuration variable spark.yarn.scheduler.heartbeat.interval-ms (the default value is 5000, i.e. 5 seconds). According to the comment, “we want to be reasonably responsive without causing too many requests to RM”. So, unless YARN fulfills the allocation request immediately, it seems that up to 5 seconds will be wasted.

When I changed the configuration variable to 1000, it only waited for 1 second. Here are the log lines after the change:

15/05/08 11:47:21 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 1000
15/05/08 11:47:22 INFO impl.AMRMClientImpl: Received new token for : cluster04:45454

That is 4 seconds saved. So, when one does not want to wait 5 seconds, one can lower spark.yarn.scheduler.heartbeat.interval-ms. I hope that the additional overhead it incurs would be negligible.
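For reference, the structure I saw is roughly the following. This is only my own simplified Scala sketch, not the actual Spark code: allocateResources() and the configuration key are the only things taken from the real source, while the Allocator trait, isFinished, and the Map-based conf are placeholders I made up for illustration.

    object ReporterLoopSketch {

      // Stand-in for Spark's internal YARN allocator; the real one talks to the ResourceManager.
      trait Allocator {
        def allocateResources(): Unit
        def isFinished: Boolean
      }

      // Simplified version of what launchReporterThread() appears to do.
      def launchReporterThread(conf: Map[String, String], allocator: Allocator): Thread = {
        // Default is 5000 ms, so every idle heartbeat cycle can add up to ~5 seconds of waiting.
        val intervalMs =
          conf.getOrElse("spark.yarn.scheduler.heartbeat.interval-ms", "5000").toLong

        val reporter = new Thread(new Runnable {
          override def run(): Unit = {
            while (!allocator.isFinished) {
              allocator.allocateResources() // ask YARN for (more) executor containers
              Thread.sleep(intervalMs)      // then wait for the heartbeat interval
            }
          }
        })
        reporter.setDaemon(true)
        reporter.start()
        reporter
      }
    }

If the first allocateResources() call does not get the containers, the next attempt only happens after the full sleep, which would explain the roughly 5-second gap in the log above.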
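If you want to try the change, the property can be set like any other Spark configuration value. Below is a minimal sketch of setting it from the driver code, assuming yarn-client mode on Spark 1.3.x; the object and application name are placeholders. It can equally be passed on the command line as spark-submit --conf spark.yarn.scheduler.heartbeat.interval-ms=1000, or put into spark-defaults.conf.

    import org.apache.spark.{SparkConf, SparkContext}

    object Main {
      def main(args: Array[String]): Unit = {
        // Lower the ApplicationMaster heartbeat interval from the default 5000 ms to 1000 ms.
        // The trade-off is more frequent requests to the YARN ResourceManager.
        val conf = new SparkConf()
          .setAppName("MyApp")      // placeholder application name
          .setMaster("yarn-client") // often supplied via spark-submit --master instead
          .set("spark.yarn.scheduler.heartbeat.interval-ms", "1000")

        val sc = new SparkContext(conf)

        // ... application logic ...

        sc.stop()
      }
    }

You can check that the new value took effect by looking for the "Started progress reporter thread - sleep time : 1000" line in the ApplicationMaster log, as shown above.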
From: Zoltán Zvara [mailto:zoltan.zv...@gmail.com]
Sent: Thursday, May 07, 2015 10:05 PM
To: Taeyun Kim; user@spark.apache.org
Subject: Re: YARN mode startup takes too long (10+ secs)

Without considering everything, just a few hints: you are running on YARN. From 09:18:34 to 09:18:37 your application is in the ACCEPTED state. There is noticeable overhead introduced by communicating with YARN's ResourceManager and NodeManager, and the YARN scheduler needs time to make a decision. I guess that somewhere from 09:18:38 to 09:18:43 your application JAR gets copied to another container requested by the Spark ApplicationMaster deployed in YARN's container 0. Deploying an executor usually needs further resource negotiations with the ResourceManager. Also, as I said, your JAR and the executor's code have to be copied to the container's local directory, and execution is blocked until that is complete.

On Thu, May 7, 2015 at 3:09 AM Taeyun Kim <taeyun....@innowireless.com> wrote:

Hi,

I'm running a Spark application in YARN-client or YARN-cluster mode, but it seems to take too long to start up: it takes 10+ seconds to initialize the Spark context. Is this normal, or can it be optimized?

The environment is as follows:
- Hadoop: Hortonworks HDP 2.2 (Hadoop 2.6)
- Spark: 1.3.1
- Client: Windows 7, but similar result on CentOS 6.6

The following is the startup part of the application log. (Some private information was edited.) 'Main: Initializing context' on the first line and 'MainProcessor: Deleting previous output files' on the last line are logged by the application; the lines in between are from Spark itself. The application logic is executed after this log is displayed.

---
15/05/07 09:18:31 INFO Main: Initializing context
15/05/07 09:18:31 INFO SparkContext: Running Spark version 1.3.1
15/05/07 09:18:31 INFO SecurityManager: Changing view acls to: myuser,myapp
15/05/07 09:18:31 INFO SecurityManager: Changing modify acls to: myuser,myapp
15/05/07 09:18:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser, myapp); users with modify permissions: Set(myuser, myapp)
15/05/07 09:18:31 INFO Slf4jLogger: Slf4jLogger started
15/05/07 09:18:31 INFO Remoting: Starting remoting
15/05/07 09:18:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@mymachine:54449]
15/05/07 09:18:31 INFO Utils: Successfully started service 'sparkDriver' on port 54449.
15/05/07 09:18:31 INFO SparkEnv: Registering MapOutputTracker
15/05/07 09:18:32 INFO SparkEnv: Registering BlockManagerMaster
15/05/07 09:18:32 INFO DiskBlockManager: Created local directory at C:\Users\myuser\AppData\Local\Temp\spark-2d3db9d6-ea78-438e-956f-be9c1dcf3a9d\blockmgr-e9ade223-a4b8-4d9f-b038-efd66adf9772
15/05/07 09:18:32 INFO MemoryStore: MemoryStore started with capacity 1956.7 MB
15/05/07 09:18:32 INFO HttpFileServer: HTTP File server directory is C:\Users\myuser\AppData\Local\Temp\spark-ff40d73b-e8ab-433e-88c4-35da27fb6278\httpd-def9220f-ac3a-4dd2-9ac1-2c593b94b2d9
15/05/07 09:18:32 INFO HttpServer: Starting HTTP Server
15/05/07 09:18:32 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/07 09:18:32 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54450
15/05/07 09:18:32 INFO Utils: Successfully started service 'HTTP file server' on port 54450.
15/05/07 09:18:32 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/07 09:18:32 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/07 09:18:32 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/07 09:18:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/07 09:18:32 INFO SparkUI: Started SparkUI at http://mymachine:4040
15/05/07 09:18:32 INFO SparkContext: Added JAR file:/D:/Projects/MyApp/MyApp.jar at http://10.111.111.199:54450/jars/MyApp.jar with timestamp 1430957912240
15/05/07 09:18:32 INFO RMProxy: Connecting to ResourceManager at cluster01/10.111.111.11:8050
15/05/07 09:18:32 INFO Client: Requesting a new application from cluster with 3 NodeManagers
15/05/07 09:18:32 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (23040 MB per container)
15/05/07 09:18:32 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/05/07 09:18:32 INFO Client: Setting up container launch context for our AM
15/05/07 09:18:32 INFO Client: Preparing resources for our AM container
15/05/07 09:18:32 INFO Client: Source and destination file systems are the same. Not copying hdfs://cluster01/apps/spark/spark-assembly-1.3.1-hadoop2.6.0.jar
15/05/07 09:18:32 INFO Client: Setting up the launch environment for our AM container
15/05/07 09:18:33 INFO SecurityManager: Changing view acls to: myuser,myapp
15/05/07 09:18:33 INFO SecurityManager: Changing modify acls to: myuser,myapp
15/05/07 09:18:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser, myapp); users with modify permissions: Set(myuser, myapp)
15/05/07 09:18:33 INFO Client: Submitting application 2 to ResourceManager
15/05/07 09:18:33 INFO YarnClientImpl: Submitted application application_1430956687773_0002
15/05/07 09:18:34 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:34 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1430957906540
     final status: UNDEFINED
     tracking URL: http://cluster01:8088/proxy/application_1430956687773_0002/
     user: myapp
15/05/07 09:18:35 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:36 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:37 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:37 INFO YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@cluster02:39698/user/YarnAM#-1579648782]
15/05/07 09:18:37 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cluster01, PROXY_URI_BASES -> http://cluster01:8088/proxy/application_1430956687773_0002), /proxy/application_1430956687773_0002
15/05/07 09:18:37 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/05/07 09:18:38 INFO Client: Application report for application_1430956687773_0002 (state: RUNNING)
15/05/07 09:18:38 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: cluster02
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1430957906540
     final status: UNDEFINED
     tracking URL: http://cluster01:8088/proxy/application_1430956687773_0002/
     user: myapp
15/05/07 09:18:38 INFO YarnClientSchedulerBackend: Application application_1430956687773_0002 has started running.
15/05/07 09:18:38 INFO NettyBlockTransferService: Server created on 54491
15/05/07 09:18:38 INFO BlockManagerMaster: Trying to register BlockManager
15/05/07 09:18:38 INFO BlockManagerMasterActor: Registering block manager mymachine:54491 with 1956.7 MB RAM, BlockManagerId(<driver>, mymachine, 54491)
15/05/07 09:18:38 INFO BlockManagerMaster: Registered BlockManager
15/05/07 09:18:43 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@cluster02:44996/user/Executor#-786778979] with ID 1
15/05/07 09:18:43 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/05/07 09:18:43 INFO MainProcessor: Deleting previous output files

Thanks.