Hello Team,

I would like to bring attention to a potential bug regarding Kubernetes HA
in Flink 1.15.

In our setup, the entrypoint script uses the shell's trap command to run
cleanup tasks based on the exit code of the JobManager. With Kubernetes HA
enabled, however, the JVM sometimes exits with code 0 even though the job
has failed. The JobManager log shows 'Terminating cluster entrypoint
process StandaloneApplicationClusterEntryPoint with exit code 1445.', but
the trap handler received an exit code of 0.
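
For illustration, here is a minimal sketch of the kind of exit-code trap
described above. It is not our actual entrypoint: run_jobmanager and
on_exit are hypothetical names, and the stand-in process simply exits
with code 3 in place of a failing JobManager.

```shell
#!/bin/sh
# Hypothetical stand-in for the real JobManager launch command (assumption):
# a child process that exits with a non-zero status.
run_jobmanager() {
    sh -c 'exit 3'
}

# EXIT handler: on entry, $? still holds the status the shell is exiting with.
on_exit() {
    exit_code=$?
    if [ "$exit_code" -eq 0 ]; then
        echo "exit code 0: skipping failure cleanup"
    else
        echo "exit code ${exit_code}: running failure cleanup"
    fi
}

# Run in a subshell so the EXIT trap fires when that subshell ends,
# and capture what the handler printed.
output=$(
    trap on_exit EXIT
    run_jobmanager
    exit $?
)
echo "$output"
```

One caveat with this approach: a shell only sees the low 8 bits of a
process exit status, so a JVM exit code of 1445 would be reported to the
trap as 165, not 1445. In the failing case we see 0 instead.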

Whenever this issue occurs, we also consistently see the following log
line: 'RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.', which is
also mentioned in this ticket:
<https://issues.apache.org/jira/browse/FLINK-26772>

As a workaround, we switched to ZooKeeper HA, and the issue no longer
occurred.

Let me know what you think.
Best regards,
Wei
