Re: [slurm-users] Aborting a job from inside the prolog

2023-06-20 Thread Gerhard Strangar
Alexander Grund wrote:
> Although it may be better to not drain it, I'm a bit nervous with "exit 0" as it is very important that the job does not start/continue, i.e. the user code (sbatch script/srun) is never executed in that case. So I want to be sure that an `scancel` on the job in its

Re: [slurm-users] Aborting a job from inside the prolog

2023-06-20 Thread Alexander Grund
On 19.06.23 at 17:32, Gerhard Strangar wrote:
> Try to exit with 0, because it's not your prolog that failed.
That seemingly works. I do see value in exiting with 1 to drain the node, so that we can investigate what exactly failed and why. Although it may be better to not drain it, I'm a bit nervous wit

Re: [slurm-users] Aborting a job from inside the prolog

2023-06-19 Thread Gerhard Strangar
Alexander Grund wrote:
> Our first approach with `scancel $SLURM_JOB_ID; exit 1` doesn't seem to work as the (sbatch) job still gets re-queued.
Try to exit with 0, because it's not your prolog that failed.
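
The suggestion above (cancel the job, then exit with 0) could be sketched roughly as follows. This is only an illustration, not a tested production prolog. `check_job`, the `JOB_CHECK_OK` marker and the `SCANCEL_CMD` override are hypothetical names introduced here, and returning 1 when `scancel` itself fails is an assumption of this sketch, not something proposed in the thread. The thread reports only that exiting with 0 "seemingly works", so whether user code can still start in some race is not settled by this.

```shell
#!/bin/bash
# Sketch of a Slurm prolog that cancels the job when a site check fails.
# Assumptions (not from the thread): check_job and JOB_CHECK_OK are
# hypothetical placeholders; SCANCEL_CMD exists only so the sketch can be
# dry-run without a Slurm installation.

SCANCEL_CMD="${SCANCEL_CMD:-scancel}"

# Hypothetical check: replace with the site's real validation.
# Here a job passes only if the made-up marker JOB_CHECK_OK is set to 1.
check_job() {
    [ "${JOB_CHECK_OK:-0}" = "1" ]
}

# Prints nothing itself; returns the exit status the prolog should use.
prolog_main() {
    if check_job; then
        return 0
    fi
    # The check failed: cancel the job. If the cancel command itself fails,
    # return 1 (an assumption of this sketch) so the prolog reports failure.
    "$SCANCEL_CMD" "$SLURM_JOB_ID" || return 1
    # The job was rejected on purpose and the prolog itself did not fail,
    # so report success, as suggested in the thread.
    return 0
}

# In a real prolog the last lines would be:
#   prolog_main
#   exit $?
```

For a dry run outside Slurm, setting `SCANCEL_CMD=echo` and `SLURM_JOB_ID=123` before calling `prolog_main` prints the job ID instead of cancelling anything.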

[slurm-users] Aborting a job from inside the prolog

2023-06-14 Thread Alexander Grund
Hi, we are doing some checks on the user's job inside the prolog script, and upon failure of those checks the job should be canceled. Our first approach with `scancel $SLURM_JOB_ID; exit 1` doesn't seem to work as the (sbatch) job still gets re-queued. Is this possible at all (i.e. prevent