xiechenling created FLINK-36451:
-----------------------------------

             Summary: Kubernetes Application JobManager Potential Deadlock and 
TaskManager Pod Residuals
                 Key: FLINK-36451
                 URL: https://issues.apache.org/jira/browse/FLINK-36451
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.19.1
         Environment: * Flink version: 1.19.1
 * Deployment mode: Flink Kubernetes Application Mode
 * JVM version: OpenJDK 17

            Reporter: xiechenling
         Attachments: 1.png, 2.png, jobmanager.log, jstack.txt

In Kubernetes Application Mode, when etcd latency is high or etcd is
unstable, the Flink JobManager may deadlock. In addition, TaskManager pods
are not cleaned up, leaving stale resources that prevent the Flink job from
recovering correctly. The issue occurs during frequent service restarts or
network instability.
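
For anyone trying to confirm a similar hang, the following is a minimal
sketch of a JVM-level deadlock check using the standard
java.lang.management API. It is not taken from the attached jstack.txt or
from Flink itself. The class name DeadlockCheck and the method
findDeadlocks are hypothetical, and the code would have to run inside the
JobManager JVM to say anything about it. It only detects cycles on
monitors and ownable synchronizers, so a hang caused by a thread blocked
on a slow Kubernetes API call would not show up here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: reports threads that the JVM
// considers deadlocked on monitors or ownable synchronizers.
final class DeadlockCheck {

    /** Returns one description per deadlocked thread; empty if none are found. */
    static List<String> findDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        List<String> result = new ArrayList<>();

        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] ids = bean.findDeadlockedThreads();
        if (ids == null) {
            return result;
        }

        // Request locked monitors and synchronizers so the owner is reported.
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // thread terminated between the two calls
            }
            result.add(info.getThreadName()
                    + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deadlocks = findDeadlocks();
        if (deadlocks.isEmpty()) {
            System.out.println("No JVM-level deadlock detected");
        } else {
            deadlocks.forEach(System.out::println);
        }
    }
}
```

An empty result here, combined with a JobManager that makes no progress,
would suggest the hang is a blocked wait rather than a lock cycle. A
thread dump taken with jstack reports the same class of lock cycles from
outside the process.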



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
