I would like to impose a time limit stricter than the partition limit on a certain subset of users. I should be able to do this with a QOS, but I can't get it to work. What am I missing?
At https://slurm.schedmd.com/resource_limits.html it says, "Slurm's hierarchical limits are enforced in the following order ...:

  1. Partition QOS limit
  2. Job QOS limit
  3. User association
  4. Account association(s), ascending the hierarchy
  5. Root/Cluster association
  6. Partition limit
  7. None

Note: If limits are defined at multiple points in this hierarchy, the point in this list where the limit is first defined will be used."

And there's a little more later about the Partition limit being an upper bound on everything. This says to me that if:

* there is a large time limit on a partition,
* there is a smaller time limit on the job QOS, and
* the partition has no associated QOS,

then the MaxWall on the job QOS should take effect. But that's not what I observe.

I've created a QOS 'nonpaying' with MaxWall=1-0:0:0, and set MaxTime=7-0:0:0 on partition 'general' (my setup commands are reproduced in the P.S. below). I set the association on user1 so that their jobs get QOS 'nonpaying', then submit a job with --time=7-0:0:0, and it runs:

$ scontrol show partition general | egrep 'QoS|MaxTime'
   AllocNodes=ALL Default=YES QoS=N/A
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED

$ sacctmgr show qos nonpaying format=name,flags,maxwall
      Name                Flags     MaxWall
---------- -------------------- -----------
 nonpaying                       1-00:00:00

$ scontrol show job 33 | egrep 'QOS|JobState|TimeLimit'
   Priority=4294901728 Nice=0 Account=acad1 QOS=nonpaying
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=00:00:40 TimeLimit=7-00:00:00 TimeMin=N/A

$ scontrol show config | grep AccountingStorageEnforce
AccountingStorageEnforce = associations,limits,qos

Help!?

--
Ross Dickson, Computational Research Consultant
ACENET -- Compute Canada -- Dalhousie University
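
P.S. In case the setup itself matters, this is roughly how I configured things. I'm reconstructing it from memory, so the exact invocations may have differed slightly, and 'job.sh' below just stands in for the real batch script:

# create the QOS and cap its wall time at one day
$ sacctmgr add qos nonpaying
$ sacctmgr modify qos nonpaying set MaxWall=1-00:00:00

# have user1's jobs pick up that QOS by default via their association
$ sacctmgr modify user where name=user1 set QOS=nonpaying DefaultQOS=nonpaying

# partition time limit on 'general' (this may also have been set in slurm.conf
# rather than at run time)
$ scontrol update PartitionName=general MaxTime=7-00:00:00

# the test job I expected to be rejected or held to the 1-day MaxWall
$ sbatch --time=7-0:0:0 job.sh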