Resolved, thanks to Adam Hough on the sighpcsyspros Slack. Before, I had set MaxSubmitJobsPerUser=8, when what I really wanted was MaxJobsPerUser=8.
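In other words, the fix was just moving the limit from the submit-time knob to the running-job knob on the gpu QOS. A minimal sketch of the change (assuming the QOS is named gpu as below, and that setting a value of -1 clears the previously set limit):

$ sacctmgr modify qos gpu set MaxSubmitJobsPerUser=-1   # clear the old submit-time cap (-1 = unset)
$ sacctmgr modify qos gpu set MaxJobsPerUser=8          # limit concurrently running jobs per user instead

With that in place, the ninth job pends instead of being rejected, as shown below. For reference, the difference between the two settings: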
- MaxJobsPerUser= The maximum number of jobs a user can have running at a given time.
- MaxSubmitJobsPerUser= The maximum number of jobs a user can have running and pending at a given time.

Now:

$ sacctmgr list qos normal,gpu format=name,priority,gracetime,preemptmode,usagefactor,grptresrunmin,MaxSubmitJobsPerUser,maxjobsperuser,flags
      Name   Priority  GraceTime PreemptMode UsageFactor GrpTRESRunMin MaxSubmitPU MaxJobsPU                Flags
---------- ---------- ---------- ----------- ----------- ------------- ----------- --------- --------------------
       gpu          0   00:00:00     cluster    1.000000                                   8
    normal          0   00:00:00     cluster    1.000000

$ for n in $(seq 9); do sbatch --time=00:10:00 --partition=gpu omp_hw.sh; done
Submitted batch job 150670
Submitted batch job 150671
Submitted batch job 150672
Submitted batch job 150673
Submitted batch job 150674
Submitted batch job 150675
Submitted batch job 150676
Submitted batch job 150677
Submitted batch job 150678

[renfro@login hw]$ squeue -u $USER -p gpu
  JOBID PARTI      NAME     USER ST   TIME CPUS NODES MIN_MEMORY NODELIST(REASON)   GRES
 150678   gpu omp_hw.sh   renfro PD   0:00    1     1      4000M (QOSMaxJobsPerUs   (null)
 150670   gpu omp_hw.sh   renfro  R   0:06    1     1      4000M gpunode001         (null)
 150671   gpu omp_hw.sh   renfro  R   0:06    1     1      4000M gpunode001         (null)
 150672   gpu omp_hw.sh   renfro  R   0:06    1     1      4000M gpunode001         (null)
 150673   gpu omp_hw.sh   renfro  R   0:06    1     1      4000M gpunode001         (null)
 150674   gpu omp_hw.sh   renfro  R   0:06    1     1      4000M gpunode001         (null)
 150675   gpu omp_hw.sh   renfro  R   0:06    1     1      4000M gpunode001         (null)
 150676   gpu omp_hw.sh   renfro  R   0:06    1     1      4000M gpunode001         (null)
 150677   gpu omp_hw.sh   renfro  R   0:06    1     1      4000M gpunode001         (null)

$ scancel -u $USER -p gpu

> On Jan 25, 2019, at 10:35 AM, Renfro, Michael <ren...@tntech.edu> wrote:
>
> Hey, folks. Running 17.02.10 with Bright Cluster Manager 8.0.
>
> I wanted to limit queue-stuffing on my GPU nodes, similar to what
> AssocGrpCPURunMinutesLimit does. The current goal is to restrict a user to
> having 8 active or queued jobs in the production GPU partition, and block
> (not reject) other jobs to allow other users fair access to the queue. I'm
> good with a time limit instead of a job number limit, too.
>
> I'd assumed a partition QOS was the way to go, as the sacctmgr man page
> reads in part:
>
>     Flags  Used by the slurmctld to override or enforce certain characteristics.
>            Valid options are
>
>            DenyOnLimit
>                If set, jobs using this QOS will be rejected at submission
>                time if they do not conform to the QOS 'Max' limits. Group
>                limits will also be treated like 'Max' limits as well and
>                will be denied if they go over. By default jobs that go over
>                these limits will pend until they conform. This currently
>                only applies to QOS and Association limits.
>
> So avoid setting the DenyOnLimit flag, and extra jobs will pend until they
> conform, right?
> My QOS settings for 8 active or pending GPU jobs per user are as follows:
>
> $ sacctmgr list qos normal,gpu format=name,priority,gracetime,preemptmode,usagefactor,grptresrunmin,MaxSubmitJobsPerUser,flags
>       Name   Priority  GraceTime PreemptMode UsageFactor GrpTRESRunMin MaxSubmitPU                Flags
> ---------- ---------- ---------- ----------- ----------- ------------- ----------- --------------------
>     normal          0   00:00:00     cluster    1.000000
>        gpu          0   00:00:00     cluster    1.000000                         8
>
> Partition settings, where the gpu QOS is applied to jobs in the gpu partition:
>
> $ egrep 'PartitionName=(batch|gpu) ' /etc/slurm/slurm.conf
> PartitionName=batch Default=YES MinNodes=1 MaxNodes=40 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040]
> PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=4000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]
>
> Original submission specifying CPUs, time, GRES, QOS, and partition, which
> accepts jobs 1-8, and rejects job 9 even though I haven't set the
> DenyOnLimit flag:
>
> $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --gres=gpu --qos=gpu --partition=gpu omp_hw.sh; done
> Submitted batch job 150548
> Submitted batch job 150549
> Submitted batch job 150550
> Submitted batch job 150551
> Submitted batch job 150552
> Submitted batch job 150553
> Submitted batch job 150554
> Submitted batch job 150555
> sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
> $ scancel -u $USER -p gpu
>
> Minimized down to just the specification for CPUs, time, and partition, same
> results, since the gpu QOS is automatically applied to jobs in the gpu
> partition:
>
> $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=gpu omp_hw.sh; done
> Submitted batch job 150556
> Submitted batch job 150557
> Submitted batch job 150558
> Submitted batch job 150559
> Submitted batch job 150560
> Submitted batch job 150561
> Submitted batch job 150562
> Submitted batch job 150563
> sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
> $ scancel -u $USER -p gpu
>
> Running in the batch partition with the normal QOS, all 9 jobs are accepted:
>
> $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=batch omp_hw.sh; done
> Submitted batch job 150564
> Submitted batch job 150565
> Submitted batch job 150566
> Submitted batch job 150567
> Submitted batch job 150568
> Submitted batch job 150569
> Submitted batch job 150570
> Submitted batch job 150571
> Submitted batch job 150572
> $ scancel -u $USER -p batch
>
> Running in the batch partition with the gpu QOS explicitly specified, accepts
> jobs 1-8, and rejects job 9:
>
> $ for n in $(seq 9); do sbatch --nodes=1 --cpus-per-task=1 --time=00:10:00 --partition=batch --qos=gpu omp_hw.sh; done
> Submitted batch job 150573
> Submitted batch job 150574
> Submitted batch job 150575
> Submitted batch job 150576
> Submitted batch job 150577
> Submitted batch job 150578
> Submitted batch job 150579
> Submitted batch job 150580
> sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
> $ scancel -u $USER -p batch
>
> So the behavior appears to be triggered by the gpu QOS. What might I have
> missed?
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
> 931 372-3601 / Tennessee Tech University
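For the record, if rejecting the ninth job at submit time had actually been the goal, the DenyOnLimit passage quoted above suggests adding that flag to the same QOS alongside MaxJobsPerUser; roughly (a sketch only, same gpu QOS name assumed):

$ sacctmgr modify qos gpu set Flags=DenyOnLimit   # reject, rather than pend, jobs over the QOS 'Max' limits

In my case the opposite behavior was wanted, so the flag stays unset and MaxJobsPerUser does the blocking.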