In addition to Sean’s recommendation, your user might want to use job arrays 
[1]. That’s less stress on the scheduler, and throughput should be equivalent 
to independent jobs.

[1] https://slurm.schedmd.com/job_array.html
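A minimal array-job sketch (the script name, input naming, and limits below are hypothetical; the `%500` throttle caps how many tasks run at once, and note that one array may need splitting if it exceeds MaxArraySize in slurm.conf):

```shell
#!/bin/bash
#SBATCH --job-name=myarray         # hypothetical job name
#SBATCH --array=1-15000%500        # 15,000 tasks, at most 500 active at a time
#SBATCH --time=01:00:00
#SBATCH --output=slurm-%A_%a.out   # %A = array job ID, %a = array task ID

# Each task selects its own input using the array index
./process_input "input_${SLURM_ARRAY_TASK_ID}.dat"   # placeholder command
```

The whole array is one record in the scheduler until tasks launch, which is where the reduced load comes from.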

--
Mike Renfro, PhD  / HPC Systems Administrator, Information Technology Services
931 372-3601      / Tennessee Tech University

On Mar 18, 2020, at 12:10 PM, Hanby, Mike <mha...@uab.edu> wrote:



Howdy,

We are running Slurm 18.08. We have a user who has, twice, submitted over 15,000 jobs to the cluster (the queue normally holds a couple thousand jobs at any given time).

This results in Slurm being unresponsive to user requests / job submits. I 
suspect the scheduler is getting bogged down doing backfill processing.

Is there any way to limit the maximum number of jobs a single user can have in 
the queue at any given time?
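(For reference, Slurm's accounting limits can enforce such a cap; a hedged sketch, assuming slurmdbd/accounting is in use, with placeholder user and QOS names:)

```shell
# Cap one user's total submitted (queued + running) jobs via an
# association limit; "someuser" is a placeholder
sacctmgr modify user name=someuser set MaxSubmitJobs=5000

# Or cap every user under a QOS at once; "normal" is a placeholder
sacctmgr modify qos name=normal set MaxSubmitJobsPerUser=5000
```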

----------------
Mike Hanby
mhanby @ uab.edu
Systems Analyst III - Enterprise
IT Research Computing Services
The University of Alabama at Birmingham
