[ https://issues.apache.org/jira/browse/FLINK-35145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932348#comment-17932348 ]
Nishita Pattanayak commented on FLINK-35145:
--------------------------------------------

Is anyone working on this? If not, I would like to give it a go.

We have also seen a related problem during cluster cleanup when the
FlinkSessionJob is already in a terminal state (FAILED). The operator first
tries to cancel the FlinkSessionJobs, and cleanup is then blocked: the
operator reports that the FlinkSessionJob is already in a terminal state,
but the FlinkDeployment still has the FlinkSessionJob tied to it. Cluster
cleanup cannot proceed until the FlinkSessionJob custom resource is
completely deleted.

> Add timeout for cluster termination
> -----------------------------------
>
>                 Key: FLINK-35145
>                 URL: https://issues.apache.org/jira/browse/FLINK-35145
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.20.0
>            Reporter: Zhanghao Chen
>            Priority: Major
>             Fix For: 2.0.0
>
>
> Currently, cluster termination may be blocked forever because it has no
> timeout. For example, for an Application cluster with ZK HA enabled, when
> the ZK cluster is down, the cluster will reach termination status, but the
> termination process will be blocked when trying to clean up HA data on ZK,
> because the ZK client will retry connecting to ZK forever. A similar
> phenomenon can be observed when an HDFS outage occurs.
> I propose adding a timeout for the cluster termination process in the
> ClusterEntryPoint#shutDownAsync method.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
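
For illustration only, the timeout proposed in the issue description could
look roughly like the sketch below. This is not Flink code and not the
eventual fix: the class ShutdownTimeoutSketch, the method
shutDownWithTimeout, and the Outcome enum are hypothetical names, and the
never-completing future merely stands in for an HA cleanup that keeps
retrying against an unreachable ZooKeeper. The only library call relied on is
CompletableFuture#orTimeout, which is available from Java 9.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bounds an asynchronous cleanup step with a timeout.
final class ShutdownTimeoutSketch {

    /** How the bounded shutdown attempt ended. */
    public enum Outcome { COMPLETED, TIMED_OUT }

    /**
     * Waits for {@code cleanup}, but gives up after the given timeout.
     * The underlying cleanup future is NOT cancelled; the caller merely
     * stops waiting for it so that termination can proceed.
     */
    public static CompletableFuture<Outcome> shutDownWithTimeout(
            CompletableFuture<Void> cleanup, long timeout, TimeUnit unit) {
        return cleanup
                .thenApply(ignored -> Outcome.COMPLETED)
                .orTimeout(timeout, unit) // Java 9+
                .exceptionally(error -> {
                    // Failures propagated from an earlier stage arrive wrapped.
                    Throwable cause =
                            (error instanceof CompletionException && error.getCause() != null)
                                    ? error.getCause()
                                    : error;
                    if (cause instanceof TimeoutException) {
                        return Outcome.TIMED_OUT;
                    }
                    // Any other cleanup failure is passed on unchanged.
                    throw new CompletionException(cause);
                });
    }

    public static void main(String[] args) {
        // Stand-in for a cleanup that never finishes (e.g. endless retries).
        CompletableFuture<Void> stuckCleanup = new CompletableFuture<>();
        Outcome outcome =
                shutDownWithTimeout(stuckCleanup, 200, TimeUnit.MILLISECONDS).join();
        System.out.println("Shutdown outcome: " + outcome);
    }
}
```

One open design question with this approach is what to do with HA data that
was not cleaned up when the timeout fires; the sketch only reports the
timeout and leaves that decision to the caller.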