Job manager sometimes doesn't restore job from checkpoint post TaskManager failure

Kevin Lam Thu, 19 Aug 2021 07:06:38 -0700

Hi all,

I've noticed that sometimes, when task managers go down, the job does not
appear to be restored from a checkpoint but is instead restarted from a fresh
state (when I go to the job's checkpoint tab in the UI, I don't see the
restore, and the numbers in the job overview all get reset). Under what
circumstances does this happen?


I've been trying to debug this, and for our use case we really need the job
to restore from a checkpoint every time.

We're running Apache Flink 1.13 on Kubernetes in a high-availability
setup.

Thanks in advance!
