[ https://issues.apache.org/jira/browse/FLINK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904215#comment-17904215 ]
Matthias Pohl commented on FLINK-36451:
---------------------------------------

Fix is merged to {{master}}. I'm preparing the backports for 1.19 and 1.20 now.

> Kubernetes Application JobManager Potential Deadlock and TaskManager Pod Residuals
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-36451
>                 URL: https://issues.apache.org/jira/browse/FLINK-36451
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.19.1
>         Environment: * Flink version: 1.19.1
>                      * Deployment mode: Flink Kubernetes Application Mode
>                      * JVM version: OpenJDK 17
>            Reporter: xiechenling
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 1.png, 2.png, jobmanager.log, jstack.txt
>
>
> In Kubernetes Application Mode, when there is significant etcd latency or
> instability, the Flink JobManager may enter a deadlock situation.
> Additionally, TaskManager pods are not cleaned up properly, resulting in
> stale resources that prevent the Flink job from recovering correctly. This
> issue occurs during frequent service restarts or network instability.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)