[ https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann closed FLINK-21942.
---------------------------------
    Fix Version/s: 1.12.3
       Resolution: Fixed

Fixed via

1.13.0:
8aa510b705bdcfe5b8ff69bc0e294a56b437f53e
6b40ff1f384c5a2253c8393c3612d3384ae6bfc5
2eb5d1ce886824fb9eb61847ab56ffba4223a2bf

1.12.3:
3409e7f7e52d1dcb70ce238177bcd837f9bb15d3
8c475b3f0e40be34325a7b37a5b4dbbca738b55d
c25dc3f83e07adf4f0788d09201b03bfc8e92801

> KubernetesLeaderRetrievalDriver not closed after termination, which leads to a connection leak
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-21942
>                 URL: https://issues.apache.org/jira/browse/FLINK-21942
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.2, 1.13.0
>            Reporter: Yi Tang
>            Assignee: Yang Wang
>            Priority: Major
>              Labels: k8s-ha, pull-request-available
>             Fix For: 1.13.0, 1.12.3
>
>         Attachments: image-2021-03-24-18-08-30-196.png, image-2021-03-24-18-08-42-116.png, jstack.l
>
>
> It looks like the KubernetesLeaderRetrievalDriver is not closed even after the KubernetesLeaderElectionDriver is closed and the job reaches a globally terminal state. This leaves many ConfigMap watches active, each holding a connection to K8s.
> When the number of connections exceeds the maximum of concurrent requests, new ConfigMap watches cannot be started, which eventually causes all newly submitted jobs to time out.
> [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695; could you confirm this issue?
> However, when many jobs run in the same session cluster, their ConfigMap watches need to stay active. Maybe we should merge all the ConfigMap watches into one?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
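The leak mechanism described in the issue can be sketched with a minimal, self-contained model. The names below (`RetrievalDriver`, `ConfigMapWatch`, `simulate`) are illustrative stand-ins, not Flink's actual classes: the point is only that a driver which opens a watch on construction leaks one connection per job unless something closes it when the job reaches a globally terminal state.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the leak; names do not match Flink's real API.
public class Sketch {
    // Counts simulated open watch connections to the K8s API server.
    static final AtomicInteger openWatches = new AtomicInteger();

    // Stands in for the ConfigMap watch a retrieval driver holds open.
    static class ConfigMapWatch implements AutoCloseable {
        ConfigMapWatch() { openWatches.incrementAndGet(); }
        @Override public void close() { openWatches.decrementAndGet(); }
    }

    // Stands in for KubernetesLeaderRetrievalDriver: it opens a watch
    // on construction and only releases it in close().
    static class RetrievalDriver implements AutoCloseable {
        private final ConfigMapWatch watch = new ConfigMapWatch();
        @Override public void close() { watch.close(); }
    }

    // Runs `jobs` simulated jobs and returns how many watches are still
    // open afterwards. Without closing the driver on termination, every
    // finished job leaks one watch connection.
    static int simulate(int jobs, boolean closeDriverOnTermination) {
        openWatches.set(0);
        for (int i = 0; i < jobs; i++) {
            RetrievalDriver driver = new RetrievalDriver();
            // ... job runs and reaches a globally terminal state ...
            if (closeDriverOnTermination) {
                driver.close(); // the fix: release the watch here
            }
        }
        return openWatches.get();
    }

    public static void main(String[] args) {
        System.out.println(simulate(5, false)); // prints 5: leaked watches
        System.out.println(simulate(5, true));  // prints 0: watches released
    }
}
```

In this model, once `openWatches` reaches the API server's concurrent-request limit, no new watch can be established, which mirrors the reported symptom of new job submissions timing out.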