Hello Team, I would like to bring attention to a potential bug regarding Kubernetes HA in Flink 1.15.
Our entrypoint script uses the shell trap command to run cleanup tasks based on the exit code of the JobManager. With Kubernetes HA enabled, we have observed that the JVM sometimes exits with code 0 even though the job has failed. The JobManager log shows 'Terminating cluster entrypoint process StandaloneApplicationClusterEntryPoint with exit code 1445.' but the trap handler received exit code 0.

Whenever this happens, we also see the following log line: 'RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.' The same message is mentioned in this ticket: <https://issues.apache.org/jira/browse/FLINK-26772>

As a workaround, we switched to ZooKeeper HA, and the issue no longer occurred.

Let me know what you think.

Best regards,
Wei
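
P.S. For context, here is a minimal sketch of the kind of trap-based entrypoint described above. It is not our actual script: the function names cleanup and run_with_trap are hypothetical, and `sh -c 'exit 3'` is only a stand-in for the JobManager JVM. It shows how an EXIT trap reads the exit code that the cleanup logic then branches on.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; all names here are hypothetical.

cleanup() {
    # On entry to an EXIT trap, $? holds the status the shell is exiting with.
    local code=$?
    if [ "$code" -eq 0 ]; then
        echo "cleanup: exit code 0, treating run as successful"
    else
        echo "cleanup: exit code $code, treating run as failed"
    fi
}

run_with_trap() {
    # Run the wrapped command in a subshell so the EXIT trap fires
    # as soon as that command has finished.
    (
        trap cleanup EXIT
        "$@"
        exit $?   # propagate the wrapped command's status to the trap
    )
}

# Stand-in for the JobManager process; a real entrypoint would start the JVM here.
status=0
run_with_trap sh -c 'exit 3' || status=$?
echo "entrypoint: wrapped process returned $status"
```

In the failing case we reported, the handler takes the "successful" branch because the status it sees is 0, even though the log reports a non-zero exit code.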