Re: [slurm-users] stopping job array after N failed jobs in row

2023-08-01 Thread Loris Bennett
Daniel Letai writes:

> Not sure about automatically canceling a job array, except perhaps by submitting 2 consecutive arrays - first of size 20, and the other with the rest of the elements and a dependency of afterok. That said, a single job in a job array in Slurm documentation is ref

Re: [slurm-users] stopping job array after N failed jobs in row

2023-08-01 Thread Daniel Letai
Not sure about automatically canceling a job array, except perhaps by submitting 2 consecutive arrays - first of size 20, and the other with the rest of the elements and a dependency of afterok. That said, a single job in a job array in Slurm documentation is refe
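
The two-array idea above could be sketched roughly as follows. This is only an illustration, not a tested recipe: the helper names `probe_command` and `rest_command`, the script name `job.sh`, and the total of 1000 elements are hypothetical; only the probe size of 20 and the afterok dependency come from the message. The helpers just build the sbatch argument lists and do not submit anything.

```python
def probe_command(script, probe_size=20):
    """Build the sbatch call for the first, small array (indices 0..probe_size-1).

    --parsable makes sbatch print only the job ID, which the second
    submission needs for its dependency.
    """
    return ["sbatch", "--parsable", f"--array=0-{probe_size - 1}", script]


def rest_command(script, total, probe_job_id, probe_size=20):
    """Build the sbatch call for the remaining elements.

    The afterok dependency lets this array start only if the probe
    array finished successfully. Returns None when nothing is left.
    """
    if total <= probe_size:
        return None
    return [
        "sbatch",
        f"--dependency=afterok:{probe_job_id}",
        f"--array={probe_size}-{total - 1}",
        script,
    ]


if __name__ == "__main__":
    # Hypothetical values: job.sh and 1000 elements are placeholders.
    print(" ".join(probe_command("job.sh")))
    print(" ".join(rest_command("job.sh", 1000, "<probe_job_id>")))
```

In practice one would run the first command, read the job ID it prints, and pass that ID to `rest_command`. Note that if the probe array fails, the dependent array is not started, but it may remain pending until it is cancelled.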

[slurm-users] Dynamic nodes - startup and job output

2023-08-01 Thread Graham Pearce
I have a small test cluster where I'm investigating the use of dynamic nodes before we bring up a production cluster. I have two questions: 1. I have used slurmd -Z --conf... to successfully bring up a dynamic node, which works fine alongside the static nodes, apart from one problem. The jo

[slurm-users] stopping job array after N failed jobs in row

2023-08-01 Thread Josef Dvoracek
My users have discovered the beauty of job arrays, and they tend to use them every now and then. Sometimes the human factor steps in, something is wrong in the job array specification, and the cluster "works" on one failed array job after another. Is there any way to automatically stop/scancel/? job array