[ https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-21942.
---------------------------------
    Fix Version/s: 1.12.3
       Resolution: Fixed

Fixed via

1.13.0:
8aa510b705bdcfe5b8ff69bc0e294a56b437f53e
6b40ff1f384c5a2253c8393c3612d3384ae6bfc5
2eb5d1ce886824fb9eb61847ab56ffba4223a2bf

1.12.3:
3409e7f7e52d1dcb70ce238177bcd837f9bb15d3
8c475b3f0e40be34325a7b37a5b4dbbca738b55d
c25dc3f83e07adf4f0788d09201b03bfc8e92801

> KubernetesLeaderRetrievalDriver not closed after terminated which lead to 
> connection leak
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-21942
>                 URL: https://issues.apache.org/jira/browse/FLINK-21942
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.2, 1.13.0
>            Reporter: Yi Tang
>            Assignee: Yang Wang
>            Priority: Major
>              Labels: k8s-ha, pull-request-available
>             Fix For: 1.13.0, 1.12.3
>
>         Attachments: image-2021-03-24-18-08-30-196.png, 
> image-2021-03-24-18-08-42-116.png, jstack.l
>
>
> It looks like the KubernetesLeaderRetrievalDriver is not closed even after the 
> KubernetesLeaderElectionDriver is closed and the job reaches a globally terminal state.
> As a result, many ConfigMap watches stay active, each holding a connection to 
> the K8s API server.
> Once the number of connections exceeds the maximum number of concurrent requests, new 
> ConfigMap watches cannot be started, and eventually all newly submitted jobs time out.
> [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695; could you 
> confirm this issue?
> However, when many jobs are running in the same session cluster, their ConfigMap 
> watches do need to stay active. Maybe we should merge all ConfigMap watches 
> into one?
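
For illustration, below is a minimal, hypothetical sketch of the pattern this issue
is about (the class and method names are made up for illustration and are not the
actual Flink implementation): a retrieval driver that opens a ConfigMap watch holds
one connection to the K8s API server and must release it when closed, otherwise
every terminated job leaves a live watch behind.

    import java.util.concurrent.atomic.AtomicBoolean;

    // Hypothetical stand-in for a leader retrieval driver that watches a ConfigMap.
    final class ConfigMapWatchingDriver implements AutoCloseable {

        // Stand-in for the watch handle returned by the K8s client; closing it
        // releases the underlying connection to the API server.
        interface Watch extends AutoCloseable {
            @Override
            void close();
        }

        private final Watch watch;
        private final AtomicBoolean running = new AtomicBoolean(true);

        ConfigMapWatchingDriver(Watch watch) {
            this.watch = watch; // one open connection per active watch
        }

        @Override
        public void close() {
            // Without this call the watch (and its connection) stays open even
            // after the owning service stops -- the leak described in this issue.
            if (running.compareAndSet(true, false)) {
                watch.close();
            }
        }
    }

In such a setup, whoever creates the driver must call close() once the job reaches a
terminal state; otherwise the number of open watches grows with every finished job
until the client's limit on concurrent requests is hit and new watches can no longer
be started.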



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
