[jira] [Commented] (FLINK-23194) Cache and reuse the ContainerLaunchContext and accelarate the progress of createTaskExecutorLaunchContext on yarn

zlzhang0122 (Jira) Wed, 14 Jul 2021 00:18:08 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-23194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380393#comment-17380393
 ]


zlzhang0122 commented on FLINK-23194:
-------------------------------------

[~trohrmann] [~fly_in_gis] Sorry for the late reply. [~trohrmann] Since
startTaskExecutorInContainer is an asynchronous call, the speed-up in creating
the ContainerLaunchContext is very limited, but caching definitely reduces the
number of RPC calls to the HDFS NameNode. [~fly_in_gis] You are right: when we
call registerLocalResource for the keytab, yarnConf and krb5Conf, we access
HDFS and call getFileStatus, so caching the ContainerLaunchContext will reduce
the pressure on the HDFS NameNode. And if TaskManagers have different resource
specs or JVM parameters in the future, we can simply not cache those
parameters and expose an interface to set them, so I don't think it matters.
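
To illustrate the idea, here is a minimal, self-contained sketch. It is not
the actual Flink or Hadoop code: LaunchContext stands in for YARN's
ContainerLaunchContext, registerLocalResource only simulates the
getFileStatus call by incrementing a counter, and CachingFactory and the
file names are hypothetical names chosen for this example. It shows the
shared part being built once and only the launch command being set per
TaskManager.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LaunchContextCacheSketch {

    /** Stand-in for YARN's ContainerLaunchContext (assumption, not the real class). */
    static final class LaunchContext {
        final Map<String, String> localResources;
        final List<String> commands;

        LaunchContext(Map<String, String> localResources, List<String> commands) {
            this.localResources = localResources;
            this.commands = commands;
        }
    }

    /** Hypothetical factory that caches the parts shared by all TaskManagers. */
    static final class CachingFactory {
        private Map<String, String> cachedResources;
        private int fileStatusCalls;

        /** Simulates one getFileStatus round trip to the NameNode. */
        private String registerLocalResource(String path) {
            fileStatusCalls++;
            return "status-of:" + path;
        }

        /** Builds a context; only the launch command differs per TaskManager. */
        synchronized LaunchContext create(String launchCommand) {
            if (cachedResources == null) {
                // First call only: look up the shared files once.
                Map<String, String> resources = new LinkedHashMap<>();
                for (String path : new String[] {"keytab", "yarn-site.xml", "krb5.conf"}) {
                    resources.put(path, registerLocalResource(path));
                }
                cachedResources = Collections.unmodifiableMap(resources);
            }
            // Per-TaskManager part: a fresh command list for every context.
            List<String> commands = new ArrayList<>();
            commands.add(launchCommand);
            return new LaunchContext(cachedResources, commands);
        }

        synchronized int fileStatusCalls() {
            return fileStatusCalls;
        }
    }

    public static void main(String[] args) {
        CachingFactory factory = new CachingFactory();
        for (int i = 0; i < 100; i++) {
            factory.create("start-taskmanager --id " + i);
        }
        System.out.println("simulated getFileStatus calls: " + factory.fileStatusCalls());
    }
}
```

In this sketch the number of simulated lookups depends on the number of
shared files, not on the number of TaskManagers. Parameters that may differ
per TaskManager later (resource specs, JVM options) would be passed to
create() in the same way as the launch command, instead of being cached.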

> Cache and reuse the ContainerLaunchContext and accelarate the progress of 
> createTaskExecutorLaunchContext on yarn
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-23194
>                 URL: https://issues.apache.org/jira/browse/FLINK-23194
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.13.1, 1.12.4
>            Reporter: zlzhang0122
>            Priority: Major
>             Fix For: 1.14.0
>
>
> When starting the TaskExecutor in a container on YARN, a
> ContainerLaunchContext is created n times (n being the number of TaskManagers).
> When I examined this creation process, I found that most of the content is
> common and has nothing to do with the particular TaskManager, except for the
> launchCommand. We can create the ContainerLaunchContext once and reuse it. Only
> the launchCommand needs to be created separately for each TaskManager.
> So I propose that we cache and reuse the ContainerLaunchContext object to
> accelerate this creation process.
> I think this has the following benefits:
>  # it can accelerate the creation of the ContainerLaunchContext and also the
> start of the TaskExecutor, especially when there is a large number of
> TaskManagers.
>  # it can decrease the pressure on HDFS, etc.
>  # it can also avoid being affected by a sudden failure of HDFS or YARN, etc.
> We have implemented this in our production environment. So far there have been
> no problems and the benefit has been good. Please let me know if there is any
> point that I haven't considered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
