
We use PreemptMode and PriorityTier within Slurm to suspend low-priority jobs 
when more urgent work needs to be done. This generally works well, but on 
occasion resumed jobs fail to restart: Slurm sets the job status to running, 
but the application itself doesn't recover from being suspended.
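
For context, a setup of this kind looks roughly like the sketch below. The 
partition names, node names and tier values are placeholders for illustration 
and are not copied from our slurm.conf:

```
# Illustrative slurm.conf fragment (names and values are assumptions)
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

# Jobs in the higher-tier partition can suspend jobs in the lower-tier one
PartitionName=low  Nodes=node[01-10] PriorityTier=1  PreemptMode=SUSPEND
PartitionName=high Nodes=node[01-10] PriorityTier=10 PreemptMode=OFF
```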

Technically everything is working as expected, but I wondered if there is any 
best practice to pass on to users about how to cope with this state. This is 
obviously not strictly a Slurm question, but I wondered whether others have 
experience with this and any advice on how best to limit the impact.


slurm-users mailing list -- slurm-users@lists.schedmd.com
