Hi,
I'm running a Spark application in YARN-client or YARN-cluster mode, but it seems to take too long to start up: initializing the Spark context takes 10+ seconds. Is this normal, or can it be optimized?

The environment is as follows:
- Hadoop: Hortonworks HDP 2.2 (Hadoop 2.6)
- Spark: 1.3.1
- Client: Windows 7, but similar result on CentOS 6.6

The following is the startup part of the application log (some private information was edited). 'Main: Initializing context' on the first line and 'MainProcessor: Deleting previous output files' on the last line are logged by the application; everything in between is from Spark itself. The application logic is executed after this log is displayed.

---
15/05/07 09:18:31 INFO Main: Initializing context
15/05/07 09:18:31 INFO SparkContext: Running Spark version 1.3.1
15/05/07 09:18:31 INFO SecurityManager: Changing view acls to: myuser,myapp
15/05/07 09:18:31 INFO SecurityManager: Changing modify acls to: myuser,myapp
15/05/07 09:18:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser, myapp); users with modify permissions: Set(myuser, myapp)
15/05/07 09:18:31 INFO Slf4jLogger: Slf4jLogger started
15/05/07 09:18:31 INFO Remoting: Starting remoting
15/05/07 09:18:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@mymachine:54449]
15/05/07 09:18:31 INFO Utils: Successfully started service 'sparkDriver' on port 54449.
15/05/07 09:18:31 INFO SparkEnv: Registering MapOutputTracker
15/05/07 09:18:32 INFO SparkEnv: Registering BlockManagerMaster
15/05/07 09:18:32 INFO DiskBlockManager: Created local directory at C:\Users\myuser\AppData\Local\Temp\spark-2d3db9d6-ea78-438e-956f-be9c1dcf3a9d\blockmgr-e9ade223-a4b8-4d9f-b038-efd66adf9772
15/05/07 09:18:32 INFO MemoryStore: MemoryStore started with capacity 1956.7 MB
15/05/07 09:18:32 INFO HttpFileServer: HTTP File server directory is C:\Users\myuser\AppData\Local\Temp\spark-ff40d73b-e8ab-433e-88c4-35da27fb6278\httpd-def9220f-ac3a-4dd2-9ac1-2c593b94b2d9
15/05/07 09:18:32 INFO HttpServer: Starting HTTP Server
15/05/07 09:18:32 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/07 09:18:32 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54450
15/05/07 09:18:32 INFO Utils: Successfully started service 'HTTP file server' on port 54450.
15/05/07 09:18:32 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/07 09:18:32 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/07 09:18:32 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/07 09:18:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/07 09:18:32 INFO SparkUI: Started SparkUI at http://mymachine:4040
15/05/07 09:18:32 INFO SparkContext: Added JAR file:/D:/Projects/MyApp/MyApp.jar at http://10.111.111.199:54450/jars/MyApp.jar with timestamp 1430957912240
15/05/07 09:18:32 INFO RMProxy: Connecting to ResourceManager at cluster01/10.111.111.11:8050
15/05/07 09:18:32 INFO Client: Requesting a new application from cluster with 3 NodeManagers
15/05/07 09:18:32 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (23040 MB per container)
15/05/07 09:18:32 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/05/07 09:18:32 INFO Client: Setting up container launch context for our AM
15/05/07 09:18:32 INFO Client: Preparing resources for our AM container
15/05/07 09:18:32 INFO Client: Source and destination file systems are the same. Not copying hdfs://cluster01/apps/spark/spark-assembly-1.3.1-hadoop2.6.0.jar
15/05/07 09:18:32 INFO Client: Setting up the launch environment for our AM container
15/05/07 09:18:33 INFO SecurityManager: Changing view acls to: myuser,myapp
15/05/07 09:18:33 INFO SecurityManager: Changing modify acls to: myuser,myapp
15/05/07 09:18:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser, myapp); users with modify permissions: Set(myuser, myapp)
15/05/07 09:18:33 INFO Client: Submitting application 2 to ResourceManager
15/05/07 09:18:33 INFO YarnClientImpl: Submitted application application_1430956687773_0002
15/05/07 09:18:34 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:34 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1430957906540
     final status: UNDEFINED
     tracking URL: http://cluster01:8088/proxy/application_1430956687773_0002/
     user: myapp
15/05/07 09:18:35 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:36 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:37 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:37 INFO YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@cluster02:39698/user/YarnAM#-1579648782]
15/05/07 09:18:37 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cluster01, PROXY_URI_BASES -> http://cluster01:8088/proxy/application_1430956687773_0002), /proxy/application_1430956687773_0002
15/05/07 09:18:37 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/05/07 09:18:38 INFO Client: Application report for application_1430956687773_0002 (state: RUNNING)
15/05/07 09:18:38 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: cluster02
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1430957906540
     final status: UNDEFINED
     tracking URL: http://cluster01:8088/proxy/application_1430956687773_0002/
     user: myapp
15/05/07 09:18:38 INFO YarnClientSchedulerBackend: Application application_1430956687773_0002 has started running.
15/05/07 09:18:38 INFO NettyBlockTransferService: Server created on 54491
15/05/07 09:18:38 INFO BlockManagerMaster: Trying to register BlockManager
15/05/07 09:18:38 INFO BlockManagerMasterActor: Registering block manager mymachine:54491 with 1956.7 MB RAM, BlockManagerId(<driver>, mymachine, 54491)
15/05/07 09:18:38 INFO BlockManagerMaster: Registered BlockManager
15/05/07 09:18:43 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@cluster02:44996/user/Executor#-786778979] with ID 1
15/05/07 09:18:43 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/05/07 09:18:43 INFO MainProcessor: Deleting previous output files

Thanks.
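
To see where the startup time goes, the gaps between consecutive log timestamps can be computed with a small script. The following is only a minimal Python sketch, not part of the application: the function names (parse_timestamp, gaps) are hypothetical, the sample lines are copied from the log above, and it assumes every log entry starts with a yy/MM/dd HH:mm:ss timestamp as shown there.

```python
from datetime import datetime

# A few entries copied from the log above; in practice, read the whole log file.
LOG_LINES = [
    "15/05/07 09:18:31 INFO Main: Initializing context",
    "15/05/07 09:18:33 INFO YarnClientImpl: Submitted application application_1430956687773_0002",
    "15/05/07 09:18:38 INFO YarnClientSchedulerBackend: Application application_1430956687773_0002 has started running.",
    "15/05/07 09:18:43 INFO MainProcessor: Deleting previous output files",
]


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none.

    Assumption: timestamps use the 17-character 'yy/MM/dd HH:mm:ss' prefix
    seen in the log above. Continuation lines (e.g. 'client token: N/A')
    do not start with one and are skipped by the caller.
    """
    try:
        return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")
    except ValueError:
        return None


def gaps(lines):
    """Return (seconds, previous_line, next_line) for consecutive timestamped lines."""
    stamped = [(parse_timestamp(line), line) for line in lines]
    stamped = [(ts, line) for ts, line in stamped if ts is not None]
    return [
        ((t2 - t1).total_seconds(), l1, l2)
        for (t1, l1), (t2, l2) in zip(stamped, stamped[1:])
    ]


if __name__ == "__main__":
    # Print the largest gaps first, with the entry that followed each wait.
    for seconds, _before, after in sorted(gaps(LOG_LINES), key=lambda g: -g[0]):
        print(f"{seconds:5.1f}s before: {after}")
```

For the four sampled entries, the gaps are 2 seconds up to the submission, 5 seconds between submission and the application running, and another 5 seconds until the application's own log line appears.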