[ https://issues.apache.org/jira/browse/FLINK-36451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Pohl updated FLINK-36451:
----------------------------------
    Fix Version/s: 2.0.0
                   1.19.2
                   1.20.1

> Kubernetes Application JobManager Potential Deadlock and TaskManager Pod Residuals
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-36451
>                 URL: https://issues.apache.org/jira/browse/FLINK-36451
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.19.1
>         Environment: * Flink version: 1.19.1
>                      * Deployment mode: Flink Kubernetes Application Mode
>                      * JVM version: OpenJDK 17
>            Reporter: xiechenling
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0, 1.19.2, 1.20.1
>
>         Attachments: 1.png, 2.png, jobmanager.log, jstack.txt
>
>
> In Kubernetes Application Mode, when there is significant etcd latency or
> instability, the Flink JobManager may enter a deadlock situation.
> Additionally, TaskManager pods are not cleaned up properly, resulting in
> stale resources that prevent the Flink job from recovering correctly. This
> issue occurs during frequent service restarts or network instability.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)