Hi team, I ran into a weird issue when a job tries to recover from a JM failure. The last successful checkpoint before the JM crashed was 41205:
``` {"log":"2022-05-10 14:55:40,663 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Completed checkpoint 41205 for job 00000000000000000000000000000000 (9453840 bytes in 1922 ms).\n","stream":"stdout","time":"2022-05-10T14:55:40.663286893Z"} ``` However JM tries to recover the job with an old checkpoint 41051 which doesn't exist that leads to unrecoverable state ``` "2022-05-10 14:59:38,949 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to retrieve checkpoint 41051.\n" ``` Full log attached -- Regards, Tao
The full log (jm.log) is attached.

--
Regards,
Tao