In addition to Sean’s recommendation, your user might want to use job arrays
[1]. They put less stress on the scheduler, and throughput should be equivalent
to that of independent jobs.
[1] https://slurm.schedmd.com/job_array.html
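A minimal sketch in Python of how a large batch could be submitted as arrays. It assumes the work items can be numbered 0..N-1 and that the (hypothetical) job script job.sh processes item OFFSET + SLURM_ARRAY_TASK_ID. Slurm's MaxArraySize defaults to 1001 (highest task index 1000), so unless it has been raised, 15,000 tasks have to be split across several arrays. The optional %N suffix on --array caps how many tasks of that array run at once. The helper only builds sbatch command lines; it submits nothing.

```python
import shlex


def array_submissions(n_tasks, script, max_array_size=1001, throttle=None):
    """Build sbatch argv lists that cover n_tasks work items with job arrays.

    Each array uses task indices 0..size-1 (the highest allowed index is
    MaxArraySize - 1), and passes its starting item number to the job
    script as OFFSET via --export. `script` is a placeholder path.
    """
    chunk = max_array_size  # indices 0..max_array_size-1 fit in one array
    cmds = []
    for offset in range(0, n_tasks, chunk):
        size = min(chunk, n_tasks - offset)
        spec = f"0-{size - 1}"
        if throttle is not None:
            spec += f"%{throttle}"  # limit concurrently running tasks of this array
        cmds.append(["sbatch", f"--array={spec}", f"--export=ALL,OFFSET={offset}", script])
    return cmds


if __name__ == "__main__":
    for cmd in array_submissions(15000, "job.sh", throttle=50):
        print(shlex.join(cmd))
```

Inside job.sh the item number would then be $((OFFSET + SLURM_ARRAY_TASK_ID)). Note that the %N throttle applies per array, so 15 arrays throttled at %50 could still run up to 750 tasks at once.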
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology S
You can also limit the number of jobs per user that the backfill scheduler
considers. All SchedulerParameters are worth a read if you haven't yet.
From the slurm.conf man page:
bf_max_job_user=#
The maximum number of jobs per user to attempt
starting with the backf
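SchedulerParameters takes a single comma-separated list of options, so bf_max_job_user goes into that existing list alongside whatever is already set there. A small Python sketch of the edit, assuming you are handling one SchedulerParameters= line from slurm.conf as a string; the function name is hypothetical and the value 50 is only an example, not a recommendation.

```python
def set_scheduler_param(line, key, value):
    """Return a SchedulerParameters= line with key=value set.

    Other options are kept in order; an existing key is replaced in place,
    otherwise the new option is appended to the comma-separated list.
    """
    prefix = "SchedulerParameters="
    if not line.startswith(prefix):
        raise ValueError("expected a SchedulerParameters= line")
    body = line[len(prefix):].strip()
    opts = [o.strip() for o in body.split(",") if o.strip()]
    new_opt = f"{key}={value}"
    for i, opt in enumerate(opts):
        if opt.split("=", 1)[0] == key:  # options may be bare flags or key=value
            opts[i] = new_opt
            break
    else:
        opts.append(new_opt)
    return prefix + ",".join(opts)


if __name__ == "__main__":
    print(set_scheduler_param("SchedulerParameters=bf_continue,bf_window=2880",
                              "bf_max_job_user", 50))
```

After editing slurm.conf, slurmctld still has to pick up the change, typically via scontrol reconfigure.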
Hi Mike,
I think you want to set MaxSubmitJobs on the user's account association. The
parameter is described in the sacctmgr documentation as the maximum number of
jobs a user can have in the running or pending state.
https://slurm.schedmd.com/sacctmgr.html
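For reference, a sketch of the corresponding sacctmgr command, built in Python so it can be scripted; "alice", the limit of 2000 and the optional account are placeholders. The -i flag tells sacctmgr to commit without prompting. Association limits like this are only enforced when AccountingStorageEnforce in slurm.conf includes limits. MaxJobs, by contrast, counts only running jobs.

```python
import shlex


def max_submit_cmd(user, limit, account=None):
    """Build a sacctmgr argv that caps a user's running + pending jobs.

    If account is given, only that user/account association is changed.
    """
    cmd = ["sacctmgr", "-i", "modify", "user", "where", f"name={user}"]
    if account is not None:
        cmd.append(f"account={account}")
    cmd += ["set", f"MaxSubmitJobs={limit}"]
    return cmd


if __name__ == "__main__":
    # Prints the command; run it with subprocess.run(cmd, check=True) on a host with sacctmgr.
    print(shlex.join(max_submit_cmd("alice", 2000)))
```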
Thanks,
-Sean
On Wed, Mar 18, 2020
Howdy,
We are running Slurm 18.08. We have a user who has, twice, submitted over
15,000 jobs to the cluster (the queue normally has a couple thousand jobs at
any given time).
This leaves Slurm unresponsive to user requests and job submissions. I
suspect the scheduler is getting bogged
Hi Gestio,
Yes, that is something we have done several times.
The coordinators are able to cancel other users' jobs in the account.
We have instructed the coordinators not to change anything in the
accounting database (the things described in the manual); it is
primarily used to cancel
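A sketch of that workflow, assuming an account named physics, a coordinator bob and a user alice (all placeholders): an administrator grants coordinator status with sacctmgr, and the coordinator can then cancel jobs in that account with scancel. The optional --state=PENDING restricts the cancel to jobs that have not started yet. The Python below only builds the command lines.

```python
import shlex


def add_coordinator_cmd(account, names):
    """sacctmgr argv that makes the given users coordinators of account."""
    return ["sacctmgr", "-i", "add", "coordinator",
            f"account={account}", f"names={','.join(names)}"]


def cancel_user_jobs_cmd(account, user, pending_only=False):
    """scancel argv for one user's jobs in one account."""
    cmd = ["scancel", f"--account={account}", f"--user={user}"]
    if pending_only:
        cmd.append("--state=PENDING")  # leave running jobs alone
    return cmd


if __name__ == "__main__":
    print(shlex.join(add_coordinator_cmd("physics", ["bob"])))
    print(shlex.join(cancel_user_jobs_cmd("physics", "alice", pending_only=True)))
```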