Hi, Puneet:
Like Terry said, if you find that your job failed unexpectedly, you could check
the restart-strategy setting in your flink-conf.yaml. If the restart
strategy is set to disabled or none, the job will transition to FAILED
once it encounters a failover. The job would also fail itself
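For reference, here is a minimal sketch of a fixed-delay restart strategy in flink-conf.yaml. It assumes Flink 1.14, which matches the docs linked in this thread. The attempt count and delay are illustrative values only, not recommendations. A restart strategy set in the job code through the execution environment overrides the cluster-wide value in flink-conf.yaml.

```yaml
# Restart a failed job up to 3 times, waiting 10 s between attempts.
# If the job still fails after the 3rd attempt, it transitions to FAILED.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s

# Setting this to none (off and disable are also accepted) makes the
# job fail on its first failover, as described above.
# restart-strategy: none
```

If no restart strategy is configured at all, the default depends on checkpointing. With checkpointing disabled, the job is not restarted. With checkpointing enabled, Flink uses fixed-delay with Integer.MAX_VALUE restart attempts.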
Hi Terry Wang,
To add to the context above: whenever a TaskManager goes down, the jobs go
into the FAILED state and do not restart, even though there are enough free
slots available on other TaskManagers for them to restart on.
Regards,
Puneet
> On 04-Mar-2022, at 4:54 PM, Terry Wang wrote:
Hi, Puneet~
AFAIK, it is expected behavior that jobs on a crashed TaskManager
restart. HA means there is no single point of failure, but a Flink job still
needs to go through failover to ensure state and data consistency. You may refer to
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops