xiechenling created FLINK-36451:
-----------------------------------

             Summary: Kubernetes Application JobManager Potential Deadlock and TaskManager Pod Residuals
                 Key: FLINK-36451
                 URL: https://issues.apache.org/jira/browse/FLINK-36451
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.19.1
         Environment: * Flink version: 1.19.1
                      * Deployment mode: Flink Kubernetes Application Mode
                      * JVM version: OpenJDK 17
            Reporter: xiechenling
         Attachments: 1.png, 2.png, jobmanager.log, jstack.txt

In Kubernetes Application Mode, when there is significant etcd latency or instability, the Flink JobManager may enter a deadlock situation. Additionally, TaskManager pods are not cleaned up properly, resulting in stale resources that prevent the Flink job from recovering correctly.

This issue occurs during frequent service restarts or network instability.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)