On 16/06/20 09:39, Loris Bennett wrote:
>> Maybe it's already known and obvious, but... Remember that a node can
>> be allocated to only one partition.
> Maybe I am misunderstanding you, but I think that this is not the case.
> A node can be in multiple partitions.
*Assigned* to multiple partitions: OK. But once Slurm schedules a job in
"partGPU" on that node, the whole node is unavailable for jobs in
"partCPU", even if the GPU job is using only 1% of the resources.
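For reference, this is the kind of overlap I mean (a minimal slurm.conf
sketch with made-up node and partition names, not our actual config):

  # the same node appears in both partition definitions
  NodeName=node01 CPUs=32 Gres=gpu:2 State=UNKNOWN
  PartitionName=partGPU Nodes=node01 State=UP
  PartitionName=partCPU Nodes=node01 State=UP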
> We have nodes belonging to
> individual research groups which are in both a separate partition just
> for the group and in a 'scavenger' partition for everyone (but with
> lower priority and maximum run-time).
More or less our current config. Quite inefficient, at least for us: too
many unusable resources due to small jobs.
>> So, if you have the mixed nodes in both
>> partitions and there's a GPU job running, a non-GPU job will find that
>> node marked as busy because it's allocated to another partition.
>> That's why we're drastically reducing the number of partitions we have
>> and will avoid shared nodes.
> Again, I don't think this is the explanation. If a job is running on a
> GPU node, but not using all the CPUs, then a CPU-only job should be
> able to start on that node, unless some form of exclusivity has been
> set up, such as ExclusiveUser=YES for the partition.
Nope. The whole node gets allocated to one partition at a time. So if
the GPU job and the CPU one are in different partitions, it's expected
that only one starts.
The behaviour you're looking for is what QoS provides: define a single
partition with multiple QoS and both jobs will run concurrently (rough
sketch after my signature). If you think about it, that's the meaning
of "partition" :)

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
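P.S. A minimal sketch of the single-partition-plus-QoS setup I mean.
Names (node01, QoS "gpu"/"cpu") and limits are just placeholders, and
AccountingStorageEnforce must include "qos" for the limits to apply:

  # slurm.conf: one partition covering the mixed nodes
  AccountingStorageEnforce=qos
  NodeName=node01 CPUs=32 Gres=gpu:2 State=UNKNOWN
  PartitionName=main Nodes=node01 Default=YES State=UP

  # QoS defined in the accounting database
  sacctmgr add qos gpu
  sacctmgr modify qos gpu set Priority=20
  sacctmgr add qos cpu
  sacctmgr modify qos cpu set Priority=10 MaxTRESPerUser=cpu=16

  # jobs in different QoS can then share the same node
  sbatch --qos=gpu --gres=gpu:1 gpu_job.sh
  sbatch --qos=cpu cpu_job.sh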