[ https://issues.apache.org/jira/browse/FLINK-21008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Wang updated FLINK-21008:
------------------------------
    Summary: Residual HA related Kubernetes ConfigMaps and ZooKeeper nodes when cluster entrypoint received SIGTERM in shutdown  (was: ClusterEntrypoint#shutDownAsync may not be fully executed)

> Residual HA related Kubernetes ConfigMaps and ZooKeeper nodes when cluster entrypoint received SIGTERM in shutdown
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21008
>                 URL: https://issues.apache.org/jira/browse/FLINK-21008
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.3, 1.12.1
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Critical
>             Fix For: 1.13.0
>
>
> Recently, in our internal use of the native K8s integration with K8s HA
> enabled, we found that the leader-related ConfigMaps can be left behind in
> some corner cases.
> After some investigation, I think this is possibly caused by an improper
> shutdown sequence.
> In {{ClusterEntrypoint#shutDownAsync}}, we first call
> {{closeClusterComponent}}, which also deregisters the Flink application from
> the cluster manager (e.g. YARN, K8s). Then we call {{stopClusterServices}}
> and {{cleanupDirectories}}. If the cluster manager handles the
> deregistration very quickly, the JobManager process receives signal 15
> (SIGTERM) before or while {{stopClusterServices}} and {{cleanupDirectories}}
> are executing. The JVM process then exits immediately, so these two methods
> may not be executed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
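
The ordering problem described in the issue can be illustrated with a small, self-contained Java sketch. This is a toy model, not Flink code: the class `ShutdownOrderSketch`, the method `simulate`, and the `jvmExited` flag are hypothetical names introduced only for illustration, and the SIGTERM is simulated with a flag rather than a real signal. Only the three step names come from the issue text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of the shutdown ordering described in the issue; not Flink code. */
public class ShutdownOrderSketch {

    /**
     * Chains the three shutdown steps in the order given in the issue and
     * returns the names of the steps that actually ran.
     *
     * @param sigtermAfterDeregister if true, pretend the cluster manager
     *        kills the process as soon as deregistration has happened
     */
    public static List<String> simulate(boolean sigtermAfterDeregister) {
        List<String> executed = new ArrayList<>();
        // Assumption: stands in for the JVM exiting on signal 15.
        AtomicBoolean jvmExited = new AtomicBoolean(false);

        CompletableFuture<Void> shutdown = CompletableFuture
            .runAsync(() -> {
                // Step 1 also deregisters the application from the cluster manager.
                executed.add("closeClusterComponent");
                if (sigtermAfterDeregister) {
                    jvmExited.set(true);
                }
            })
            .thenRun(() -> step("stopClusterServices", executed, jvmExited))
            .thenRun(() -> step("cleanupDirectories", executed, jvmExited));

        shutdown.join();
        return executed;
    }

    /** Records the step only if the simulated JVM is still alive. */
    private static void step(String name, List<String> executed, AtomicBoolean jvmExited) {
        if (!jvmExited.get()) {
            executed.add(name);
        }
    }

    public static void main(String[] args) {
        // Normal shutdown: all three steps run.
        System.out.println(simulate(false));
        // Fast deregistration followed by SIGTERM: the cleanup steps are skipped.
        System.out.println(simulate(true));
    }
}
```

In the second run only `closeClusterComponent` is recorded, which mirrors how the HA-related ConfigMaps or ZooKeeper nodes could be left behind when the later cleanup steps never get a chance to run.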