I think so.

In fact, the flow is: allocator.allocateResources() -> sleep ->
allocator.allocateResources() -> sleep …

But I guess that on the first allocateResources() call the allocation is not
fulfilled, so the sleep occurs.

 

From: Zoltán Zvara [mailto:zoltan.zv...@gmail.com] 
Sent: Friday, May 08, 2015 4:25 PM
To: Taeyun Kim; user@spark.apache.org
Subject: Re: YARN mode startup takes too long (10+ secs)

 

So does this sleep occur before resources are allocated for the first few
executors to start the job?

 

On Fri, May 8, 2015 at 6:23 AM Taeyun Kim <taeyun....@innowireless.com> wrote:

I think I’ve found the (maybe partial, but major) reason.

 

It’s between the following lines (it’s newly captured, but essentially the
same place that Zoltán Zvara picked):

 

15/05/08 11:36:32 INFO BlockManagerMaster: Registered BlockManager

15/05/08 11:36:38 INFO YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkExecutor@cluster04:55237/user/Executor#-149550753] with 
ID 1

 

When I read the logs on the cluster side, the following lines were found (the
exact times differ from the lines above, but that is just the clock difference
between the machines):

 

15/05/08 11:36:23 INFO yarn.ApplicationMaster: Started progress reporter thread 
- sleep time : 5000

15/05/08 11:36:28 INFO impl.AMRMClientImpl: Received new token for : 
cluster04:45454

 

It seems that Spark deliberately sleeps for 5 secs.

I’ve read the Spark source code, and in
org.apache.spark.deploy.yarn.ApplicationMaster.scala, launchReporterThread()
has the code for that.

It loops calling allocator.allocateResources() and Thread.sleep(). 

For the sleep duration, it reads the configuration variable
spark.yarn.scheduler.heartbeat.interval-ms (the default value is 5000, which is
5 secs).

According to the comment, “we want to be reasonably responsive without causing
too many requests to RM”.

So, unless YARN immediately fulfills the allocation request, it seems that 5
secs will be wasted.
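
Here is a simplified sketch of that loop, paraphrased from what I read (the real
code in ApplicationMaster.scala also handles allocation failures and caps the
interval, so this is not the exact source; sparkConf, allocator and finished are
fields of the ApplicationMaster):

// Simplified paraphrase of launchReporterThread() in
// org.apache.spark.deploy.yarn.ApplicationMaster (Spark 1.3.1); not the exact source.
val heartbeatInterval: Long =
  sparkConf.getLong("spark.yarn.scheduler.heartbeat.interval-ms", 5000)

val reporterThread = new Thread {
  override def run(): Unit = {
    while (!finished) {
      // Ask the ResourceManager for (more) containers; this also serves as the AM heartbeat.
      allocator.allocateResources()
      // "We want to be reasonably responsive without causing too many requests to RM."
      Thread.sleep(heartbeatInterval)
    }
  }
}
reporterThread.setDaemon(true)
reporterThread.start()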

 

When I modified the configuration variable to 1000, it only waited for 1 sec.

 

Here are the log lines after the change:

 

15/05/08 11:47:21 INFO yarn.ApplicationMaster: Started progress reporter thread 
- sleep time : 1000

15/05/08 11:47:22 INFO impl.AMRMClientImpl: Received new token for : 
cluster04:45454

 

4 secs saved.

So, when one does not want to wait 5 secs, one can change
spark.yarn.scheduler.heartbeat.interval-ms.
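
For example, it can be set when constructing the SparkContext (or passed with
--conf to spark-submit); the value 1000 below is just an illustration:

// Example only: lower the AM heartbeat/allocation interval from the default 5000 ms to 1000 ms.
// Equivalent to: spark-submit --conf spark.yarn.scheduler.heartbeat.interval-ms=1000 ...
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyApp")  // illustrative app name
  .set("spark.yarn.scheduler.heartbeat.interval-ms", "1000")
val sc = new SparkContext(conf)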

I hope that the additional overhead it incurs will be negligible.

 

 

From: Zoltán Zvara [mailto:zoltan.zv...@gmail.com] 
Sent: Thursday, May 07, 2015 10:05 PM
To: Taeyun Kim; user@spark.apache.org
Subject: Re: YARN mode startup takes too long (10+ secs)

 

Without considering everything, just a few hints:

You are running on YARN. From 09:18:34 to 09:18:37 your application is in state
ACCEPTED. There is a noticeable overhead introduced by communication with
YARN's ResourceManager and NodeManager, and the YARN scheduler also needs time
to make a decision. I guess that somewhere from 09:18:38 to 09:18:43 your
application JAR gets copied to another container requested by the Spark
ApplicationMaster, which is deployed in YARN's container 0. Deploying an
executor usually requires further resource negotiation with the
ResourceManager. Also, as I said, your JAR and the executor's code need to be
copied to the container's local directory; execution is blocked until that is
complete.

 

On Thu, May 7, 2015 at 3:09 AM Taeyun Kim <taeyun....@innowireless.com> wrote:

Hi,

 

I’m running a Spark application in YARN-client or YARN-cluster mode.

But it seems to take too long to start up.

It takes 10+ seconds to initialize the Spark context.

Is this normal? Or can it be optimized?

 

The environment is as follows:

- Hadoop: Hortonworks HDP 2.2 (Hadoop 2.6)

- Spark: 1.3.1

- Client: Windows 7, but similar result on CentOS 6.6

 

The following is the startup part of the application log. (Some private
information has been edited out.)

‘Main: Initializing context’ on the first line and ‘MainProcessor: Deleting
previous output files’ on the last line are logged by the application; the
others in between are from Spark itself. The application logic is executed
after this log is displayed.

 

---

 

15/05/07 09:18:31 INFO Main: Initializing context

15/05/07 09:18:31 INFO SparkContext: Running Spark version 1.3.1

15/05/07 09:18:31 INFO SecurityManager: Changing view acls to: myuser,myapp

15/05/07 09:18:31 INFO SecurityManager: Changing modify acls to: myuser,myapp

15/05/07 09:18:31 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(myuser, myapp); 
users with modify permissions: Set(myuser, myapp)

15/05/07 09:18:31 INFO Slf4jLogger: Slf4jLogger started

15/05/07 09:18:31 INFO Remoting: Starting remoting

15/05/07 09:18:31 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sparkDriver@mymachine:54449]

15/05/07 09:18:31 INFO Utils: Successfully started service 'sparkDriver' on 
port 54449.

15/05/07 09:18:31 INFO SparkEnv: Registering MapOutputTracker

15/05/07 09:18:32 INFO SparkEnv: Registering BlockManagerMaster

15/05/07 09:18:32 INFO DiskBlockManager: Created local directory at 
C:\Users\myuser\AppData\Local\Temp\spark-2d3db9d6-ea78-438e-956f-be9c1dcf3a9d\blockmgr-e9ade223-a4b8-4d9f-b038-efd66adf9772

15/05/07 09:18:32 INFO MemoryStore: MemoryStore started with capacity 1956.7 MB

15/05/07 09:18:32 INFO HttpFileServer: HTTP File server directory is 
C:\Users\myuser\AppData\Local\Temp\spark-ff40d73b-e8ab-433e-88c4-35da27fb6278\httpd-def9220f-ac3a-4dd2-9ac1-2c593b94b2d9

15/05/07 09:18:32 INFO HttpServer: Starting HTTP Server

15/05/07 09:18:32 INFO Server: jetty-8.y.z-SNAPSHOT

15/05/07 09:18:32 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54450

15/05/07 09:18:32 INFO Utils: Successfully started service 'HTTP file server' 
on port 54450.

15/05/07 09:18:32 INFO SparkEnv: Registering OutputCommitCoordinator

15/05/07 09:18:32 INFO Server: jetty-8.y.z-SNAPSHOT

15/05/07 09:18:32 INFO AbstractConnector: Started 
SelectChannelConnector@0.0.0.0:4040

15/05/07 09:18:32 INFO Utils: Successfully started service 'SparkUI' on port 
4040.

15/05/07 09:18:32 INFO SparkUI: Started SparkUI at http://mymachine:4040

15/05/07 09:18:32 INFO SparkContext: Added JAR 
file:/D:/Projects/MyApp/MyApp.jar at http://10.111.111.199:54450/jars/MyApp.jar 
with timestamp 1430957912240

15/05/07 09:18:32 INFO RMProxy: Connecting to ResourceManager at 
cluster01/10.111.111.11:8050

15/05/07 09:18:32 INFO Client: Requesting a new application from cluster with 3 
NodeManagers

15/05/07 09:18:32 INFO Client: Verifying our application has not requested more 
than the maximum memory capability of the cluster (23040 MB per container)

15/05/07 09:18:32 INFO Client: Will allocate AM container, with 896 MB memory 
including 384 MB overhead

15/05/07 09:18:32 INFO Client: Setting up container launch context for our AM

15/05/07 09:18:32 INFO Client: Preparing resources for our AM container

15/05/07 09:18:32 INFO Client: Source and destination file systems are the 
same. Not copying 
hdfs://cluster01/apps/spark/spark-assembly-1.3.1-hadoop2.6.0.jar

15/05/07 09:18:32 INFO Client: Setting up the launch environment for our AM 
container

15/05/07 09:18:33 INFO SecurityManager: Changing view acls to: myuser,myapp

15/05/07 09:18:33 INFO SecurityManager: Changing modify acls to: myuser,myapp

15/05/07 09:18:33 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(myuser, myapp); 
users with modify permissions: Set(myuser, myapp)

15/05/07 09:18:33 INFO Client: Submitting application 2 to ResourceManager

15/05/07 09:18:33 INFO YarnClientImpl: Submitted application 
application_1430956687773_0002

15/05/07 09:18:34 INFO Client: Application report for 
application_1430956687773_0002 (state: ACCEPTED)

15/05/07 09:18:34 INFO Client: 

            client token: N/A

           diagnostics: N/A

           ApplicationMaster host: N/A

           ApplicationMaster RPC port: -1

           queue: default

           start time: 1430957906540

           final status: UNDEFINED

           tracking URL: 
http://cluster01:8088/proxy/application_1430956687773_0002/

           user: myapp

15/05/07 09:18:35 INFO Client: Application report for 
application_1430956687773_0002 (state: ACCEPTED)

15/05/07 09:18:36 INFO Client: Application report for 
application_1430956687773_0002 (state: ACCEPTED)

15/05/07 09:18:37 INFO Client: Application report for 
application_1430956687773_0002 (state: ACCEPTED)

15/05/07 09:18:37 INFO YarnClientSchedulerBackend: ApplicationMaster registered 
as Actor[akka.tcp://sparkYarnAM@cluster02:39698/user/YarnAM#-1579648782]

15/05/07 09:18:37 INFO YarnClientSchedulerBackend: Add WebUI Filter. 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> 
cluster01, PROXY_URI_BASES -> 
http://cluster01:8088/proxy/application_1430956687773_0002), 
/proxy/application_1430956687773_0002

15/05/07 09:18:37 INFO JettyUtils: Adding filter: 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter

15/05/07 09:18:38 INFO Client: Application report for 
application_1430956687773_0002 (state: RUNNING)

15/05/07 09:18:38 INFO Client: 

            client token: N/A

           diagnostics: N/A

           ApplicationMaster host: cluster02

           ApplicationMaster RPC port: 0

           queue: default

           start time: 1430957906540

           final status: UNDEFINED

           tracking URL: 
http://cluster01:8088/proxy/application_1430956687773_0002/

           user: myapp

15/05/07 09:18:38 INFO YarnClientSchedulerBackend: Application 
application_1430956687773_0002 has started running.

15/05/07 09:18:38 INFO NettyBlockTransferService: Server created on 54491

15/05/07 09:18:38 INFO BlockManagerMaster: Trying to register BlockManager

15/05/07 09:18:38 INFO BlockManagerMasterActor: Registering block manager 
mymachine:54491 with 1956.7 MB RAM, BlockManagerId(<driver>, mymachine, 54491)

15/05/07 09:18:38 INFO BlockManagerMaster: Registered BlockManager

15/05/07 09:18:43 INFO YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkExecutor@cluster02:44996/user/Executor#-786778979] with 
ID 1

15/05/07 09:18:43 INFO YarnClientSchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.8

15/05/07 09:18:43 INFO MainProcessor: Deleting previous output files

 

Thanks.
