[ https://issues.apache.org/jira/browse/FLINK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434081#comment-17434081 ]
Aitozi commented on FLINK-24624:
--------------------------------

[~wangyang0918] Besides that, I created this issue also to discuss whether we have to guarantee that the k8s resources are cleaned up when deploying a session or application mode cluster fails. As far as I know (I am running some tests of the Kubernetes deployment), residual k8s resources are left behind in situations such as:

1. deployClusterInternal succeeds, but getClusterClient on the {{ClusterClientProvider}} fails, which is the case shown in this issue.
2. deploySessionCluster succeeds, but the Deployment fails to spawn a ready pod, e.g. because of a resource shortage, a scheduling problem, or interception by a Kubernetes webhook.

We can simply wrap the deploySessionCluster method body in a try-catch to solve case 1, which has been done in my PR (see the first sketch below). But I still have a concern about case 2: I think there should be a deadline for spawning the cluster, and the related resources should be destroyed once that deadline expires (see the second sketch below).

> Add clean up phase when kubernetes session start failed
> -------------------------------------------------------
>
>                 Key: FLINK-24624
>                 URL: https://issues.apache.org/jira/browse/FLINK-24624
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.0
>            Reporter: Aitozi
>            Priority: Major
>              Labels: pull-request-available
>
> Several k8s resources are created when the Kubernetes session is deployed, but they are left behind when the deployment fails. This can cause the next deployment to fail or leak resources, so I think we should add a clean-up phase for failed startups.
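To make case 1 concrete, here is a minimal sketch of the try-catch shape, assuming a client that exposes a stopAndCleanupCluster(clusterId) method. SessionDeployer, KubeClient, and createSessionResources are hypothetical stand-ins loosely modeled on Flink's KubernetesClusterDescriptor and FlinkKubeClient, not the real signatures:

{code:java}
// Hypothetical stand-ins loosely modeled on Flink's
// KubernetesClusterDescriptor / FlinkKubeClient.
public class SessionDeployer {

    interface KubeClient {
        // Creates the Deployment/ConfigMaps/Services for the session.
        void createSessionResources(String clusterId) throws Exception;
        // Deletes the session Deployment; owner references make the
        // dependent resources cascade.
        void stopAndCleanupCluster(String clusterId);
    }

    interface ClusterClient {}

    private final KubeClient kubeClient;

    SessionDeployer(KubeClient kubeClient) {
        this.kubeClient = kubeClient;
    }

    public ClusterClient deploySessionCluster(String clusterId) throws Exception {
        try {
            // May succeed and still leave us to fail in the next step.
            kubeClient.createSessionResources(clusterId);
            // Stand-in for ClusterClientProvider#getClusterClient(),
            // which is where case 1 blows up.
            return createClusterClient(clusterId);
        } catch (Exception e) {
            // Without this, the resources created above are leaked.
            kubeClient.stopAndCleanupCluster(clusterId);
            throw e;
        }
    }

    private ClusterClient createClusterClient(String clusterId) {
        return new ClusterClient() {};
    }
}
{code}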
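For case 2, here is a minimal sketch of the deadline idea, written against the fabric8 Kubernetes client that Flink's k8s integration builds on. The ReadinessDeadline class, the use of the cluster id as the Deployment name, and the 5-minute deadline are illustrative assumptions, and the exact exception raised on timeout varies across fabric8 versions:

{code:java}
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;
import java.util.concurrent.TimeUnit;

public final class ReadinessDeadline {

    // Waits for the session Deployment to become ready within the
    // deadline; on timeout, tears the resources down instead of
    // leaving them behind.
    public static void awaitReadyOrCleanUp(
            KubernetesClient client, String namespace, String clusterId)
            throws Exception {
        try {
            client.apps()
                    .deployments()
                    .inNamespace(namespace)
                    .withName(clusterId)
                    .waitUntilReady(5, TimeUnit.MINUTES);
        } catch (KubernetesClientException e) {
            // Unschedulable pods, webhook rejection, quota exhaustion:
            // the Deployment will never become ready, so clean it up.
            client.apps()
                    .deployments()
                    .inNamespace(namespace)
                    .withName(clusterId)
                    .delete();
            throw new IllegalStateException(
                    "Session cluster " + clusterId + " did not become ready in time", e);
        }
    }
}
{code}

In practice the deadline would need to be configurable, since a busy cluster can legitimately take longer to schedule the pods.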