Hi,
I think you need to specify the directory of a specific checkpoint, rather than
the root checkpoint directory, to restore the state. The directory name
should look like chk-${id}.
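
In case it helps, here is a rough sketch of how one could locate that directory. It assumes the usual retained-checkpoint layout of <checkpoint root>/<job id>/chk-<n>/ with a _metadata file inside each completed checkpoint; please verify that against your own setup. The function name latest_checkpoint is just something I made up for illustration, it is not part of Flink.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names such as "chk-42" and captures the numeric id.
_CHK_NAME = re.compile(r"chk-(\d+)")


def latest_checkpoint(job_dir: Path) -> Optional[Path]:
    """Return the chk-<n> directory with the highest n under job_dir.

    Hypothetical helper. Assumption: a completed checkpoint directory
    contains a file named "_metadata"; directories without it are skipped.
    Returns None if no such directory exists.
    """
    best: Optional[Path] = None
    best_id = -1
    for child in job_dir.iterdir():
        match = _CHK_NAME.fullmatch(child.name)
        if match is None or not child.is_dir():
            continue
        if not (child / "_metadata").exists():
            continue
        # Compare ids numerically so that chk-10 ranks above chk-9.
        chk_id = int(match.group(1))
        if chk_id > best_id:
            best, best_id = child, chk_id
    return best
```

The resulting path (the chk-<n> directory itself, not its parent) is what you would then pass when re-submitting, e.g. with the -s option of "flink run".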
The job ID changes when you re-submit the job, so the JobManager is not able to
recognize the retained checkpo
Hi Folks,
I'm trying to restart my program with state restored from a checkpoint after
a program failure (the restart strategy was tried but exhausted), but the
restored state is not being picked up. What am I doing wrong here?
*Summary*
I'm using a very simple app on 1 node just to learn checkpointing.
A