[jira] [Commented] (FLINK-23194) Cache and reuse the ContainerLaunchContext and accelarate the progress of createTaskExecutorLaunchContext on yarn

zlzhang0122 (Jira) Tue, 06 Jul 2021 20:16:05 -0700


    [ https://issues.apache.org/jira/browse/FLINK-23194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376184#comment-17376184 ]


zlzhang0122 commented on FLINK-23194:
-------------------------------------

[~trohrmann] [~pnowojski] 
[Matthias|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mapohl] 
any suggestions would be appreciated.

> Cache and reuse the ContainerLaunchContext and accelarate the progress of 
> createTaskExecutorLaunchContext on yarn
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-23194
>                 URL: https://issues.apache.org/jira/browse/FLINK-23194
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.13.1, 1.12.4
>            Reporter: zlzhang0122
>            Priority: Major
>             Fix For: 1.14.0
>
>
> When starting TaskExecutors in containers on YARN, a ContainerLaunchContext 
> is created n times (where n is the number of TaskManagers).
> When I examined this creation process, I found that most of each context is 
> identical across TaskManagers and does not depend on the particular 
> TaskManager; only the launchCommand differs. We can therefore create the 
> ContainerLaunchContext once and reuse it. Only the launchCommand needs to be 
> created separately for each TaskManager.
> So I propose that we cache and reuse the ContainerLaunchContext object to 
> accelerate this creation process.
> I think this has the following benefits:
>  # It speeds up the creation of the ContainerLaunchContext and therefore the 
> start of the TaskExecutors, especially when there is a large number of 
> TaskManagers.
>  # It reduces the load on HDFS, etc.
>  # It reduces exposure to sudden failures of HDFS, YARN, etc.
> We have implemented this in our production environment. So far it has caused 
> no problems and has been beneficial. Please let me know if there is anything 
> that I haven't considered.
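
The caching idea described in the issue can be sketched roughly as follows. This is only an illustration, not the patch running in production and not Flink's or YARN's actual code: LaunchContext is a hypothetical stand-in for YARN's ContainerLaunchContext, and LaunchContextCache, buildSharedParts, createFor and the command string are names invented for the example. The sketch assumes that the shared parts (local resources, environment) are identical for every TaskManager and that only the launch command differs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LaunchContextCache {

    /** Hypothetical stand-in for YARN's ContainerLaunchContext; not the real API. */
    static final class LaunchContext {
        final Map<String, String> localResources;
        final Map<String, String> environment;
        final List<String> commands;

        LaunchContext(Map<String, String> localResources,
                      Map<String, String> environment,
                      List<String> commands) {
            this.localResources = localResources;
            this.environment = environment;
            this.commands = commands;
        }
    }

    private Map<String, String> cachedResources; // built once, then shared
    private Map<String, String> cachedEnvironment; // built once, then shared
    private int sharedBuilds = 0; // counts how often the expensive part ran

    /** Stand-in for the expensive, TaskManager-independent work (e.g. file system lookups). */
    private void buildSharedParts() {
        Map<String, String> resources = new HashMap<>();
        resources.put("flink.jar", "hdfs:///placeholder/flink.jar"); // placeholder value
        Map<String, String> env = new HashMap<>();
        env.put("EXAMPLE_ENV", "placeholder"); // placeholder value
        // Read-only views, so that sharing them between contexts is safe.
        cachedResources = Collections.unmodifiableMap(resources);
        cachedEnvironment = Collections.unmodifiableMap(env);
        sharedBuilds++;
    }

    /** Reuses the cached shared parts; only the launch command is built per TaskManager. */
    synchronized LaunchContext createFor(String taskManagerId) {
        if (cachedResources == null) {
            buildSharedParts();
        }
        String command = "start-taskmanager --id " + taskManagerId; // hypothetical command
        return new LaunchContext(
                cachedResources, cachedEnvironment, Collections.singletonList(command));
    }

    int getSharedBuilds() {
        return sharedBuilds;
    }

    public static void main(String[] args) {
        LaunchContextCache cache = new LaunchContextCache();
        for (int i = 1; i <= 3; i++) {
            System.out.println(cache.createFor("tm-" + i).commands);
        }
        System.out.println("shared parts built " + cache.getSharedBuilds() + " time(s)");
    }
}
```

One design point to keep in mind: the real ContainerLaunchContext is a mutable record (it exposes setters such as setCommands), so an actual implementation would probably need to hand out a fresh context object per TaskManager, as above, instead of mutating a single shared instance.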



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
