Good Morning Smart People,

We have three "big memory" nodes. We'd like to limit the number of jobs that 
run per node in the two partitions that share these nodes. Jobs in these two 
partitions are limited to a single node max. We'd like at most 8 jobs from 
each partition to run per node, so at most 16 jobs total would share a given 
node. 

Currently, we have 
  SelectType=select/cons_res
  SelectTypeParameters=CR_CPU
in our slurm.conf

The nodes are defined as:
NodeName=n[144-146] NodeAddr=10.50.50.[144-146] CPUs=56 Sockets=2 
CoresPerSocket=14 ThreadsPerCore=2 RealMemory=1500000 State=UNKNOWN

The two partitions are defined as:
PartitionName=analysis Nodes=n[144-146] MaxTime=4-0:0 MaxNodes=1 State=UP 
AllowGroups=all Priority=100 OverSubscribe=FORCE:4 Hidden=NO Default=NO
PartitionName=bio Nodes=n[144-146] MaxTime=14-0:0 MaxNodes=1 State=UP 
AllowGroups=all Priority=100 OverSubscribe=FORCE:4 Hidden=NO Default=NO

We discovered the hard way that this means users can run 4 jobs on each of 
the 56 CPUs/threads of each node. Oops! Not what we intended.
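Spelling out the worst-case arithmetic (just the values from our slurm.conf 
above, multiplied out):

```python
# Worst-case concurrency implied by CR_CPU + OverSubscribe=FORCE:4,
# using the values from the node/partition definitions above.
cpus_per_node = 56        # CPUs=56 (2 sockets x 14 cores x 2 threads)
force_count = 4           # OverSubscribe=FORCE:4 -> 4 jobs per CPU
partitions = 2            # analysis and bio overlap on these nodes

per_partition = cpus_per_node * force_count
print(per_partition)              # 224 jobs per node, per partition
print(per_partition * partitions) # 448 jobs per node across both
```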

All our other compute nodes are defined as exclusive; we don't allow multiple 
jobs to run on them.

Any recommendations on how to implement the 8-jobs-per-partition-per-node 
limit we'd like? Should we switch our SelectTypeParameters to CR_Socket or 
CR_Socket_Memory, for example?
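Our (untested) reading of the slurm.conf man page is that CR_Socket would make 
the consumable resource the socket rather than the CPU, so the existing 
FORCE:4 count would cap sharing at 4 jobs per socket: 2 sockets x 4 = 8 jobs 
per node per partition, which is the limit we're after. I.e., something like:

```
  # Hypothetical change -- untested; assumes the FORCE count caps jobs
  # per consumable resource (now the socket) rather than per CPU.
  SelectType=select/cons_res
  SelectTypeParameters=CR_Socket
  # Partition lines unchanged; OverSubscribe=FORCE:4 would then mean up to
  # 4 jobs per socket, i.e. 2 sockets x 4 = 8 jobs per node per partition.
```

Does that match how others understand the FORCE count to interact with the 
CR_* resource unit?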

-- 
Regards,
-liam

-There are uncountably more irrational fears than rational ones. -P. Dolan
Liam Forbes  lofor...@alaska.edu  ph: 907-450-8618 fax: 907-450-8601
UAF Research Computing Systems Senior HPC Engineer              CISSP
