Hi Nícolas! In Slurm lingo this is "job requeueing". The JobRequeue parameter in slurm.conf controls whether Slurm tries to start those jobs again (requeue) or lets them exit.
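For example, a minimal sketch (the script name and job name below are hypothetical): to make jobs exit instead of restarting after a reboot or node failure, you can change the cluster-wide default in slurm.conf:

    # slurm.conf: batch jobs are NOT requeued after node failure
    # unless the user explicitly asks for it with --requeue
    JobRequeue=0

or disable requeueing per job, in the batch script:

    #!/bin/bash
    #SBATCH --job-name=myjob      # hypothetical job name
    #SBATCH --no-requeue          # let this job end instead of restarting it
    srun ./my_program

or on the command line at submission time:

    sbatch --no-requeue job.sh

With requeueing disabled, a job interrupted by a node failure simply ends rather than being started again from the beginning.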
The slurm.conf doc puts it nicely:

"This option controls the default ability for batch jobs to be requeued. Jobs may be requeued explicitly by a system administrator, after node failure, or upon preemption by a higher priority job. If JobRequeue is set to a value of 1, then batch jobs may be requeued unless explicitly disabled by the user. If JobRequeue is set to a value of 0, then batch jobs will not be requeued unless explicitly enabled by the user. Use the sbatch --no-requeue or --requeue option to change the default behavior for individual jobs. The default value is 1."

--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia

On 8/18/22, 1:57 PM, "slurm-users" <slurm-users-boun...@lists.schedmd.com> wrote:

> Hi!
>
> This week my machines rebooted, the jobs that were running restarted, and I lost the progress they had made. Can I prevent jobs from restarting like this? For example, if my machines reboot, the jobs would be cancelled instead.
>
> Thank you.
> Nícolas