Hi all, I've noticed that sometimes when task managers go down, the job does not appear to be restored from a checkpoint but is instead restarted from a fresh state (when I go to the job's checkpoint tab in the UI, I don't see a restore, and the numbers in the job overview all get reset). Under what circumstances does this happen?
I've been trying to debug this, and for our use case we really want the job to restore from a checkpoint every time. We're running Apache Flink 1.13 on Kubernetes in a high-availability setup. Thanks in advance!