Re: Job manager sometimes doesn't restore job from checkpoint post TaskManager failure

2021-08-23 Thread Kevin Lam
Hi, I was able to work out what was causing this. We were using the `fixed-delay` restart strategy with the maximum number of restarts set to 10. Switching to `exponential-delay` resolved the issue of the job restarting from a fresh state. On Thu, Aug 19, 2021 at 2:04 PM Chesnay Schepler wrote: > How do you dep
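
For readers landing on this thread, the change described above might look roughly like the following in `flink-conf.yaml`. This is only a sketch: the message states the strategy names and the limit of 10 restarts, while every delay and backoff value below is an illustrative assumption rather than the poster's actual setting.

```yaml
# Before (as described in the message): give up after 10 restart attempts.
# The delay value is an assumption; the thread does not state it.
# restart-strategy: fixed-delay
# restart-strategy.fixed-delay.attempts: 10
# restart-strategy.fixed-delay.delay: 10 s

# After: keep retrying with a growing delay between attempts.
# All values below are illustrative assumptions, not taken from the thread.
restart-strategy: exponential-delay
restart-strategy.exponential-delay.initial-backoff: 1 s
restart-strategy.exponential-delay.max-backoff: 1 min
restart-strategy.exponential-delay.backoff-multiplier: 2.0
restart-strategy.exponential-delay.reset-backoff-threshold: 1 h
restart-strategy.exponential-delay.jitter-factor: 0.1
```

With `fixed-delay`, the job fails permanently once the configured attempts are used up, whereas `exponential-delay` has no attempt limit and only lengthens the wait between restarts up to `max-backoff`.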

Re: Job manager sometimes doesn't restore job from checkpoint post TaskManager failure

2021-08-19 Thread Chesnay Schepler
How do you deploy Flink on Kubernetes? Do you use the standalone or native

Job manager sometimes doesn't restore job from checkpoint post TaskManager failure

2021-08-19 Thread Kevin Lam
Hi all, I've noticed that sometimes when task managers go down, the job does not appear to be restored from a checkpoint but is instead restarted from a fresh state (when I go to the job's checkpoint tab in the UI, I don't see the restore, and the numbers in the job overview all get reset). Under what c