Yang Wang created FLINK-21008:
---------------------------------

             Summary: ClusterEntrypoint#shutDownAsync may not be fully executed
                 Key: FLINK-21008
                 URL: https://issues.apache.org/jira/browse/FLINK-21008
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
            Reporter: Yang Wang


Recently, in our internal use of the native K8s integration with K8s HA 
enabled, we found that the leader-related ConfigMaps could be left behind in 
some corner cases.

After some investigation, I think this is probably caused by an incorrect 
shutdown sequence.

In {{ClusterEntrypoint#shutDownAsync}}, we first call 
{{closeClusterComponent}}, which also deregisters the Flink application from 
the cluster manager (e.g. Yarn, K8s). Then we call {{stopClusterServices}} and 
{{cleanupDirectories}}. If the cluster manager handles the deregistration very 
quickly, the JobManager process may receive SIGTERM (signal 15) before or 
while it is executing {{stopClusterServices}} and {{cleanupDirectories}}. The 
JVM process then exits immediately, so these two methods may not be executed, 
or may not run to completion.
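
To make the ordering problem concrete, here is a minimal, self-contained model 
of the sequence described above. It is only a sketch and not Flink code: the 
class {{ShutdownRaceSketch}}, the method {{shutDown}}, and the 
{{ProcessKilled}} exception are hypothetical names, and the exception merely 
stands in for the JVM exiting on SIGTERM. The three step names are taken from 
the description; their bodies are reduced to recording that the step ran.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownRaceSketch {

    /** Hypothetical marker: models the JVM exiting when SIGTERM (signal 15) arrives. */
    static final class ProcessKilled extends RuntimeException {}

    /**
     * Runs the three shutdown steps in the order described in this issue and
     * returns the names of the steps that completed.
     *
     * @param killedAfterDeregistration assumption for illustration: the cluster
     *     manager terminates the process right after the deregistration finishes
     */
    static List<String> shutDown(boolean killedAfterDeregistration) {
        List<String> completed = new ArrayList<>();
        try {
            // Step 1: close the cluster component, which includes deregistering
            // the application from the cluster manager (Yarn, K8s).
            completed.add("closeClusterComponent");

            if (killedAfterDeregistration) {
                // The cluster manager reacts to the deregistration quickly and
                // sends SIGTERM; in a real JVM nothing below would run.
                throw new ProcessKilled();
            }

            // Step 2: stop the cluster services.
            completed.add("stopClusterServices");

            // Step 3: clean up the directories.
            completed.add("cleanupDirectories");
        } catch (ProcessKilled e) {
            // Model only: the process is gone, so the remaining steps are skipped.
        }
        return completed;
    }

    public static void main(String[] args) {
        // Slow cluster manager: all three steps complete.
        System.out.println(shutDown(false));
        // Fast cluster manager: only the first step completes.
        System.out.println(shutDown(true));
    }
}
```

In this model the second call returns only {{closeClusterComponent}}, which 
corresponds to the case where {{stopClusterServices}} and 
{{cleanupDirectories}} never run.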



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
