[ https://issues.apache.org/jira/browse/FLINK-30444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648994#comment-17648994 ]
Gyula Fora commented on FLINK-30444:
------------------------------------

This is also not consistent with some other startup errors, such as a missing application jar. That causes a jobmanager restart loop, but does not put the job into a terminal FAILED state. That behaviour is more desirable, as it doesn't lead to empty application clusters on Kubernetes.

> State recovery error not handled correctly and always causes JM failure
> -----------------------------------------------------------------------
>
>                 Key: FLINK-30444
>                 URL: https://issues.apache.org/jira/browse/FLINK-30444
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.16.0, 1.14.6, 1.15.3
>            Reporter: Gyula Fora
>            Priority: Critical
>
> When you submit a job in Application mode and you try to restore from an
> incompatible savepoint, there is a very unexpected behaviour.
> Even with the following config:
> {noformat}
> execution.shutdown-on-application-finish: false
> execution.submit-failed-job-on-application-error: true{noformat}
> The job goes into a FAILED state, and the jobmanager fails. In a Kubernetes
> environment (when using the native Kubernetes integration) this means that
> the JobManager is restarted automatically.
> This will mean that if you have the job result store enabled, after the JM
> comes back you will end up with an empty application cluster.
> I think the correct behaviour would be, depending on the above-mentioned config:
> 1. If there is a job recovery error and you have
> (execution.submit-failed-job-on-application-error) configured, then the job
> should show up as failed, and the JM should not exit (if
> execution.shutdown-on-application-finish is false)
> 2. If (execution.shutdown-on-application-finish is true) then the jobmanager
> should exit cleanly like on normal job terminal state and thus stop the
> deployment in Kubernetes, preventing a JM restart cycle
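
For illustration only, the two proposed cases from the issue description can be written down as a small decision table. This is a sketch of the requested behaviour, not Flink code: the class `RecoveryErrorPolicy`, the enum `Outcome`, and the method `onRecoveryError` are hypothetical names invented here, and the two boolean parameters simply stand in for the config options `execution.shutdown-on-application-finish` and `execution.submit-failed-job-on-application-error`. The issue does not say what should happen when both options are false, so the sketch marks that case as unspecified rather than guessing.

```java
// Hypothetical sketch of the behaviour proposed in FLINK-30444.
// None of these names exist in Flink; they only restate the two cases in the issue.
public class RecoveryErrorPolicy {

    /** Outcomes named for this sketch only. */
    enum Outcome {
        /** Case 1: job is shown as FAILED and the JobManager stays up. */
        JOB_FAILED_JM_KEEPS_RUNNING,
        /** Case 2: JobManager exits cleanly, as on a normal terminal job state. */
        JM_EXITS_CLEANLY,
        /** The issue text does not describe this combination. */
        NOT_SPECIFIED_IN_ISSUE
    }

    /**
     * @param shutdownOnApplicationFinish stands in for
     *        execution.shutdown-on-application-finish
     * @param submitFailedJobOnApplicationError stands in for
     *        execution.submit-failed-job-on-application-error
     */
    static Outcome onRecoveryError(boolean shutdownOnApplicationFinish,
                                   boolean submitFailedJobOnApplicationError) {
        if (shutdownOnApplicationFinish) {
            // Case 2 in the issue: clean exit stops the Kubernetes deployment
            // and so avoids a JobManager restart cycle.
            return Outcome.JM_EXITS_CLEANLY;
        }
        if (submitFailedJobOnApplicationError) {
            // Case 1 in the issue: report the failed job, do not exit.
            return Outcome.JOB_FAILED_JM_KEEPS_RUNNING;
        }
        // Assumption of this sketch: leave the remaining combination open.
        return Outcome.NOT_SPECIFIED_IN_ISSUE;
    }

    public static void main(String[] args) {
        // The configuration quoted in the issue: shutdown=false, submit-failed=true.
        System.out.println(onRecoveryError(false, true));
        System.out.println(onRecoveryError(true, true));
    }
}
```

With the configuration quoted in the issue (`false`, `true`), the sketch selects the first case: the job is reported as failed and the JobManager keeps running.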