IIUC, when you suspend a job it remains in memory but with no CPU time
allocated. If you reboot the node, the job state is lost (unless it uses
checkpointing). When you restarted the jobs, they actually began a new
run (Slurm doesn't know whether they use checkpointing or not). You've been
lucky that…

Hi,
I am using an old Slurm version (20.11.8), and we had to reboot our cluster
today for maintenance. I suspended all the jobs on it with the command
"scontrol suspend list_job_ids", and all the jobs were paused (suspended).
However, when I tried to resume them after the reboot, scontrol resume did