Not trying to argue unnecessarily, but what you describe is not a universal rule, regardless of QOS.
Our GPU nodes are members of 3 GPU-related partitions, 2 more resource-limited non-GPU partitions, and one of two larger-memory partitions. It's set up this way to minimize idle resources (we didn't buy enough GPUs in those nodes to keep all the CPUs busy, and our other nodes have limited numbers of DIMM slots for larger-memory jobs).

First terminal, resulting in a job running in the ‘any-interactive’ partition on gpunode002 (we have a job submit plugin that automatically routes jobs to ‘interactive’, ‘gpu-interactive’, or ‘any-interactive’ depending on the resources requested; a rough sketch of that routing logic follows at the end of this message):

=====
[renfro@login rosetta-job]$ type hpcshell
hpcshell is a function
hpcshell ()
{
    srun --partition=interactive $@ --pty bash -i
}
[renfro@login rosetta-job]$ hpcshell
[renfro@gpunode002(job 751070) rosetta-job]$
=====

Second terminal, simultaneous to the first, resulting in a job running in the ‘gpu-interactive’ partition on the same gpunode002:

=====
[renfro@login ~]$ hpcshell --gres=gpu
[renfro@gpunode002(job 751071) ~]$ squeue -t R -u $USER
 JOBID PARTI NAME  USER   ST TIME S:C: NODES MIN_MEMORY NODELIST(REASON) SUBMIT_TIME         START_TIME          END_TIME            TRES_PER_NODE
751071 gpu-i bash  renfro R  0:08 *:*: 1     2000M      gpunode002       2020-06-16T08:27:50 2020-06-16T08:27:50 2020-06-16T10:27:50 gpu
751070 any-i bash  renfro R  0:18 *:*: 1     2000M      gpunode002       2020-06-16T08:27:40 2020-06-16T08:27:40 2020-06-16T10:27:41 N/A
[renfro@gpunode002(job 751071) ~]$
=====

Selected configuration details (excluding things like resource ranges and defaults):

NodeName=gpunode[001-003] CoresPerSocket=14 RealMemory=382000 Sockets=2 ThreadsPerCore=1 Weight=10011 Gres=gpu:2
NodeName=gpunode004 CoresPerSocket=14 RealMemory=894000 Sockets=2 ThreadsPerCore=1 Weight=10021 Gres=gpu:2
PartitionName=gpu Default=NO MaxCPUsPerNode=16 ExclusiveUser=NO State=UP Nodes=gpunode[001-004]
PartitionName=gpu-debug Default=NO MaxCPUsPerNode=16 ExclusiveUser=NO State=UP Nodes=gpunode[001-004]
PartitionName=gpu-interactive Default=NO MaxCPUsPerNode=16 ExclusiveUser=NO State=UP Nodes=gpunode[001-004]
PartitionName=any-interactive Default=NO MaxCPUsPerNode=12 ExclusiveUser=NO State=UP Nodes=node[001-040],gpunode[001-004]
PartitionName=any-debug Default=NO MaxCPUsPerNode=12 ExclusiveUser=NO State=UP Nodes=node[001-040],gpunode[001-004]
PartitionName=bigmem Default=NO MaxCPUsPerNode=12 ExclusiveUser=NO State=UP Nodes=gpunode[001-003]
PartitionName=hugemem Default=NO MaxCPUsPerNode=12 ExclusiveUser=NO State=UP Nodes=gpunode004

> On Jun 16, 2020, at 8:14 AM, Diego Zuccato <diego.zucc...@unibo.it> wrote:
>
> On 16/06/20 09:39, Loris Bennett wrote:
>
>>> Maybe it's already known and obvious, but... Remember that a node can be
>>> allocated to only one partition.
>> Maybe I am misunderstanding you, but I think that this is not the case.
>> A node can be in multiple partitions.
> *Assigned* to multiple partitions: OK.
> But once slurm schedules a job in "partGPU" on that node, the whole node
> is unavailable for jobs in "partCPU", even if the GPU job is using only
> 1% of the resources.
>
>> We have nodes belonging to
>> individual research groups which are in both a separate partition just
>> for the group and in a 'scavenger' partition for everyone (but with
>> lower priority and maximum run-time).
> More or less our current config. Quite inefficient, at least for us: too
> many unusable resources due to small jobs.
>
>>> So, if you have the mixed nodes in both partitions and there's a GPU
>>> job running, a non-gpu job will find that node marked as busy because
>>> it's allocated to another partition. That's why we're drastically
>>> reducing the number of partitions we have and will avoid shared nodes.
>> Again, I don't think this is the explanation. If a job is running on a
>> GPU node, but not using all the CPUs, then a CPU-only job should be able
>> to start on that node, unless some form of exclusivity has been set up,
>> such as ExclusiveUser=YES for the partition.
> Nope. The whole node gets allocated to one partition at a time. So if
> the GPU job and the CPU one are in different partitions, it's expected
> that only one starts. The behaviour you're looking for is the one of
> QoS: define a single partition w/ multiple QoS and both jobs will run
> concurrently.
>
> If you think about it, that's the meaning of "partition" :)
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
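For reference, here's the routing sketch I mentioned above. This is not our actual job submit plugin (that runs inside slurmctld); it's just a hypothetical shell wrapper, with made-up option matching, showing the kind of decision it makes for interactive jobs:

=====
#!/bin/bash
# Hypothetical sketch only, not our real job_submit plugin: route an
# interactive job to the GPU partition when a GPU is requested; otherwise
# send it to the general pool (the real plugin picks between 'interactive'
# and 'any-interactive' based on the other resources requested).
partition=any-interactive
for arg in "$@"; do
    case "$arg" in
        --gres=gpu*|--gpus=*)
            # GPU requested, so use the GPU partition on gpunode[001-004].
            partition=gpu-interactive
            ;;
    esac
done
exec srun --partition="$partition" "$@" --pty bash -i
=====

The real routing happens server-side, which is why the plain ‘hpcshell’ job above ended up in ‘any-interactive’ even though the function asks for ‘interactive’. The point is just that nothing in this setup prevents jobs from different partitions from sharing gpunode002, as the two terminals show.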