Hi NĂ­colas!

In Slurm lingo this is "job requeueing".  The JobRequeue
slurm.conf parameter controls whether, by default, Slurm tries to
start those jobs again (requeue) or simply lets them exit.

The slurm.conf doc puts it nicely:

This option controls the default ability for batch jobs to be
requeued. Jobs may be requeued explicitly by a system
administrator, after node failure, or upon preemption by a
higher priority job. If JobRequeue is set to a value of 1, then
batch jobs may be requeued unless explicitly disabled by the
user. If JobRequeue is set to a value of 0, then batch jobs will
not be requeued unless explicitly enabled by the user. Use the
sbatch --no-requeue or --requeue option to change the default
behavior for individual jobs. The default value is 1.
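So for your case, one option is to change the cluster-wide default in
slurm.conf (followed by an "scontrol reconfigure" so slurmctld picks
up the change):

    # slurm.conf: do not requeue batch jobs by default after a
    # node failure/reboot or preemption
    JobRequeue=0

or, per the doc above, disable requeueing for individual jobs at
submission time ("job.sh" here is just a placeholder for your own
batch script):

    # either on the command line ...
    sbatch --no-requeue job.sh

    # ... or as a directive inside the batch script itself
    #SBATCH --no-requeue

If I remember right, you can also flip the flag on an already-queued
job with "scontrol update JobId=<jobid> Requeue=0".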

--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia


On 8/18/22, 1:57 PM, "slurm-users" <slurm-users-boun...@lists.schedmd.com> 
wrote:
Hi!

This week my machines rebooted, and the jobs that were running 
restarted, so I've lost the progress they had made. Can I prevent jobs 
from being restarted like that? For example, if my machines reboot, the 
jobs would be cancelled instead.


Thank you.
NĂ­colas
