We have three "big memory" nodes. We'd like to limit the number of jobs that run per node in two partitions that share these nodes. Jobs in these two partitions are limited to a single node. We'd like no more than 8 jobs from each partition to run on a node, so at most 16 jobs should share a given node.
Currently, we have the following in our slurm.conf:

    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU

The nodes are defined as:

    NodeName=n[144-146] NodeAddr=10.50.50.[144-146] CPUs=56 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=1500000 State=UNKNOWN

The two partitions are defined as:

    PartitionName=analysis Nodes=n[144-146] MaxTime=4-0:0 MaxNodes=1 State=UP AllowGroups=all Priority=100 OverSubscribe=FORCE:4 Hidden=NO Default=NO
    PartitionName=bio Nodes=n[144-146] MaxTime=14-0:0 MaxNodes=1 State=UP AllowGroups=all Priority=100 OverSubscribe=FORCE:4 Hidden=NO Default=NO

We discovered the hard way that this means users can run 4 jobs on each of the 56 CPUs (hardware threads) on each node. Oops! Not what we intended. All our other compute nodes are defined as exclusive, and we don't allow multiple jobs to run on them.

Any recommendations on how to implement the 8-jobs-per-partition-per-node limit we'd like? Should we switch our SelectTypeParameters to CR_Socket or CR_Socket_Memory, for example?

--
Regards,
-liam

-There are uncountably more irrational fears than rational ones. -P. Dolan
Liam Forbes  lofor...@alaska.edu  ph: 907-450-8618  fax: 907-450-8601
UAF Research Computing Systems Senior HPC Engineer  CISSP
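For a rough sanity check of the numbers above, here is a minimal sketch that computes an upper bound on concurrent jobs per node. It assumes that OverSubscribe=FORCE:N lets each allocatable unit be shared by up to N jobs, that the unit is whatever SelectTypeParameters selects (CPU, core, or socket), that each job occupies exactly one unit, and that the two partitions do not count against each other's limit. These assumptions match the behavior observed above (4 jobs per CPU under CR_CPU), but they are not taken from the Slurm documentation. The names `NodeGeometry` and `max_jobs_per_node` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGeometry:
    """Hypothetical model of one node, taken from the NodeName line above."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    def units(self, select_param: str) -> int:
        """Allocatable units per node for a given SelectTypeParameters value (assumed model)."""
        cores = self.sockets * self.cores_per_socket
        table = {
            "CR_CPU": cores * self.threads_per_core,  # every hardware thread counts as a CPU
            "CR_Core": cores,
            "CR_Socket": self.sockets,
        }
        return table[select_param]


def max_jobs_per_node(node: NodeGeometry, select_param: str,
                      force_n: int, partitions: int = 1) -> int:
    # ASSUMPTION: each unit can be shared by up to force_n jobs per partition,
    # and each job uses exactly one unit (the smallest possible job).
    return node.units(select_param) * force_n * partitions


N144 = NodeGeometry(sockets=2, cores_per_socket=14, threads_per_core=2)

if __name__ == "__main__":
    for param in ("CR_CPU", "CR_Core", "CR_Socket"):
        one = max_jobs_per_node(N144, param, 4)
        both = max_jobs_per_node(N144, param, 4, partitions=2)
        print(f"{param}: {one} per partition, {both} across both partitions")
```

Under this model, CR_CPU with FORCE:4 allows up to 56 × 4 = 224 jobs per partition per node. CR_Socket with FORCE:4 works out to 2 × 4 = 8 per partition and 16 across both, which is the target described above. Whether Slurm actually behaves this way with CR_Socket should be confirmed on a test node.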