[ https://issues.apache.org/jira/browse/FLINK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433448#comment-17433448 ]
Aitozi commented on FLINK-24624:
--------------------------------

After looking into the failure, I found that it is caused by a lack of permission:

{{
2021-10-24 23:10:30,385 ERROR org.apache.flink.kubernetes.cli.KubernetesSessionCli [] - Error while running the Flink session.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: [https://xxxx/api/v1/nodes]. Message: Forbidden! User xxx doesn't have permission. nodes is forbidden: User "xxx" cannot list resource "nodes" in API group "" at the cluster scope: noopinion by orca and marlin and k8s rbac.
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:610) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:521) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:143) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:555) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:90) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getLoadBalancerRestEndpoint(Fabric8FlinkKubeClient.java:463) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndPointFromService(Fabric8FlinkKubeClient.java:438) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient.getRestEndpoint(Fabric8FlinkKubeClient.java:191) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$1(KubernetesClusterDescriptor.java:98) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:164) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:114) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:198) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
    at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:198) [flink-dist_2.12-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
}}

> Add clean up phase when kubernetes session start failed
> -------------------------------------------------------
>
>                 Key: FLINK-24624
>                 URL: https://issues.apache.org/jira/browse/FLINK-24624
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.0
>            Reporter: Aitozi
>            Priority: Major
>
> Several k8s resources are created when deploying a Kubernetes session, but
> these resources are left behind when the deployment fails. This can cause the
> next deployment to fail or lead to a resource leak, so I think we should add
> a cleanup phase that runs when startup fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)