[ https://issues.apache.org/jira/browse/FLINK-23194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377001#comment-17377001 ]
Yang Wang commented on FLINK-23194:
-----------------------------------

AFAIK, apart from the keytab and the Kerberos file, we do not access HDFS while creating the {{ContainerLaunchContext}}, right? We already encode the Yarn local resources to a string in the {{YarnClusterDescriptor}} and decode it when creating the {{ContainerLaunchContext}}.

Moreover, TaskManagers might have different resource specs or JVM parameters in the future, in which case caching the {{ContainerLaunchContext}} would no longer make sense.

> Cache and reuse the ContainerLaunchContext and accelarate the progress of createTaskExecutorLaunchContext on yarn
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-23194
>                 URL: https://issues.apache.org/jira/browse/FLINK-23194
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.13.1, 1.12.4
>            Reporter: zlzhang0122
>            Priority: Major
>             Fix For: 1.14.0
>
>
> When starting the TaskExecutor in a container on YARN, the
> ContainerLaunchContext is created n times (where n is the number of
> TaskManagers). When I examined this creation process, I found that most of
> the work is common to all of them and has nothing to do with the particular
> TaskManager, except for the launchCommand. We can create the
> ContainerLaunchContext once and reuse it; only the launchCommand needs to be
> created separately for each TaskManager.
> So I propose caching and reusing the ContainerLaunchContext object to
> accelerate this creation process.
> I think this has the following benefits:
> # it accelerates the creation of the ContainerLaunchContext and therefore
> the start of the TaskExecutor, especially when there are many TaskManagers.
> # it reduces the pressure on HDFS, etc.
> # it also avoids being affected by a sudden failure of HDFS or YARN, etc.
> We have implemented this in our production environment. So far there have
> been no problems and the benefit has been good. Please let me know if
> there is any point that I haven't considered.
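
For readers following the proposal, here is a minimal Java sketch of the idea as described in the issue: build the TaskManager-independent parts once, then create one context per TaskManager that differs only in its launch command. This is not the Flink implementation. {{LaunchContext}} and {{CachingFactory}} are hypothetical stand-ins (the former loosely modelled on YARN's {{ContainerLaunchContext}}), and the resource path is a placeholder. The sketch also assumes all TaskManagers share the same resources, which is exactly the assumption questioned in the comment above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Illustrative only: none of these types are Flink or YARN classes. */
public class LaunchContextCacheSketch {

    /** Hypothetical, simplified stand-in for YARN's ContainerLaunchContext. */
    static final class LaunchContext {
        final Map<String, String> localResources; // shared across TaskManagers
        final List<String> commands;              // specific to one TaskManager

        LaunchContext(Map<String, String> localResources, String command) {
            this.localResources = localResources;
            this.commands = Collections.singletonList(command);
        }
    }

    /** Loads the shared parts lazily on first use and reuses them afterwards. */
    static final class CachingFactory {
        private final Supplier<Map<String, String>> sharedResourceLoader;
        private Map<String, String> cachedResources;

        CachingFactory(Supplier<Map<String, String>> sharedResourceLoader) {
            this.sharedResourceLoader = sharedResourceLoader;
        }

        synchronized LaunchContext create(String launchCommand) {
            if (cachedResources == null) {
                // The potentially expensive step runs only once.
                cachedResources = Collections.unmodifiableMap(
                        new HashMap<>(sharedResourceLoader.get()));
            }
            // A new context per TaskManager; only the command differs.
            return new LaunchContext(cachedResources, launchCommand);
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        CachingFactory factory = new CachingFactory(() -> {
            loads.incrementAndGet();
            // Placeholder entry; a real loader would resolve the shipped files.
            return Collections.singletonMap("flink.jar", "hdfs:///hypothetical/flink.jar");
        });

        LaunchContext first = factory.create("launch-command-for-tm-1");
        LaunchContext second = factory.create("launch-command-for-tm-2");

        System.out.println("shared loads: " + loads.get());
        System.out.println("first command: " + first.commands.get(0));
        System.out.println("second command: " + second.commands.get(0));
    }
}
```

The sketch creates a fresh context object per TaskManager and shares only an unmodifiable map, rather than handing the same mutable object to every container. If TaskManagers later get different resource specs or JVM parameters, a single cached entry like this would no longer be sufficient.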