Canbin Zheng created FLINK-15817:
------------------------------------

             Summary: Kubernetes Resource leak while deployment exception happens
                 Key: FLINK-15817
                 URL: https://issues.apache.org/jira/browse/FLINK-15817
             Project: Flink
          Issue Type: Sub-task
          Components: Deployment / Kubernetes
    Affects Versions: 1.10.0
            Reporter: Canbin Zheng
             Fix For: 1.11.0, 1.10.1
When we deploy a new session cluster on a Kubernetes cluster, four Kubernetes resources are usually created, in the following order: internal Service -> rest Service -> ConfigMap -> JobManager Deployment.

Once the internal Service has been created, any exception that fails the cluster deployment causes a Kubernetes resource leak. For example:
# If creating the rest Service fails because of the service name constraint ([FLINK-15816|https://issues.apache.org/jira/browse/FLINK-15816]), the internal Service is not cleaned up when the deployment terminates.
# If creating the JobManager Deployment fails (one case is when _jobmanager.heap.size_ is too small, such as 512M, which is less than the default value of 'containerized.heap-cutoff-min'), the internal Service, the rest Service, and the ConfigMap all leak.

This ticket proposes cleaning up the residual Services and ConfigMap when the cluster deployment terminates abnormally on the client side.
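The clean-up proposed above can be sketched as a generic rollback: create the resources in order, remember each one that was created successfully, and on failure delete those in reverse order before rethrowing the original exception. This is a minimal, self-contained Java sketch, not Flink's actual fix; it assumes a hypothetical `ResourceClient` interface standing in for real Kubernetes API calls, and `deployWithCleanup` and `simulate` are illustrative names only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DeploymentRollbackSketch {

    // Hypothetical stand-in for Kubernetes API calls; not a Flink or fabric8 interface.
    interface ResourceClient {
        void create(String resource) throws Exception;

        void delete(String resource);
    }

    // Creates resources in order. If any creation fails, deletes the resources
    // created so far in reverse order, then rethrows the original failure.
    static void deployWithCleanup(ResourceClient client, List<String> resources) throws Exception {
        Deque<String> created = new ArrayDeque<>();
        try {
            for (String resource : resources) {
                client.create(resource);
                created.push(resource); // most recently created ends up on top
            }
        } catch (Exception deployFailure) {
            while (!created.isEmpty()) {
                String resource = created.pop();
                try {
                    client.delete(resource);
                } catch (RuntimeException cleanupFailure) {
                    // Keep cleaning up; record the clean-up error without hiding the original one.
                    deployFailure.addSuppressed(cleanupFailure);
                }
            }
            throw deployFailure;
        }
    }

    // Simulates a deployment in which creating failingResource throws
    // (null means nothing fails) and returns the resources that were deleted.
    static List<String> simulate(List<String> resources, String failingResource) {
        List<String> deleted = new ArrayList<>();
        ResourceClient client = new ResourceClient() {
            @Override
            public void create(String resource) throws Exception {
                if (resource.equals(failingResource)) {
                    throw new Exception("failed to create " + resource);
                }
            }

            @Override
            public void delete(String resource) {
                deleted.add(resource);
            }
        };
        try {
            deployWithCleanup(client, resources);
        } catch (Exception expected) {
            // deployWithCleanup rethrows the original failure after clean-up; the simulation ignores it.
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<String> order = List.of("internal Service", "rest Service", "ConfigMap", "JobManager Deployment");
        System.out.println("cleaned up: " + simulate(order, "JobManager Deployment"));
    }
}
```

Running `main` prints `cleaned up: [ConfigMap, rest Service, internal Service]`, i.e. exactly the three resources that leak in the second example. Deleting in reverse creation order and attaching clean-up errors as suppressed exceptions keeps the original deployment failure visible to the caller.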