On 16/06/20 09:39, Loris Bennett wrote:
>> Maybe it's already known and obvious, but... Remember that a node can
>> be allocated to only one partition.
> Maybe I am misunderstanding you, but I think that this is not the case.
> A node can be in multiple partitions.
*Assigned* to multiple partitions: OK. But once Slurm schedules a job in
"partGPU" on that node, the whole node is unavailable for jobs in
"partCPU", even if the GPU job is using only 1% of the resources.
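For reference, this is the kind of overlap I mean (a minimal slurm.conf
sketch with made-up node and partition names, not our actual config):

  # the same node appears in both partition definitions
  NodeName=node01 CPUs=32 Gres=gpu:2 State=UNKNOWN
  PartitionName=partGPU Nodes=node01 State=UP
  PartitionName=partCPU Nodes=node01 State=UP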
> We have nodes belonging to
> individual research groups which are in both a separate partition just
> for the group and in a 'scavenger' partition for everyone (but with
> lower priority and maximum run-time).
More or less our current config. Quite inefficient, at least for us: too
many unusable resources due to small jobs.
>> So, if you have the mixed nodes in both
>> partitions and there's a GPU job running, a non-GPU job will find that
>> node marked as busy because it's allocated to another partition.
>> That's why we're drastically reducing the number of partitions we have
>> and will avoid shared nodes.
> Again, I don't think this is the explanation. If a job is running on a
> GPU node, but not using all the CPUs, then a CPU-only job should be
> able to start on that node, unless some form of exclusivity has been
> set up, such as ExclusiveUser=YES for the partition.
Nope. The whole node gets allocated to one partition at a time. So if
the GPU job and the CPU one are in different partitions, it's expected
that only one starts.
The behaviour you're looking for is what QoS provides: define a single
partition with multiple QoS and both jobs will run concurrently (rough
sketch after my signature). If you think about it, that's the meaning
of "partition" :)

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
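P.S. A minimal sketch of the single-partition-plus-QoS setup I mean.
Names (node01, QoS "gpu"/"cpu") and limits are just placeholders, and
AccountingStorageEnforce must include "qos" for the limits to apply:

  # slurm.conf: one partition covering the mixed nodes
  AccountingStorageEnforce=qos
  NodeName=node01 CPUs=32 Gres=gpu:2 State=UNKNOWN
  PartitionName=main Nodes=node01 Default=YES State=UP

  # QoS defined in the accounting database
  sacctmgr add qos gpu
  sacctmgr modify qos gpu set Priority=20
  sacctmgr add qos cpu
  sacctmgr modify qos cpu set Priority=10 MaxTRESPerUser=cpu=16

  # jobs in different QoS can then share the same node
  sbatch --qos=gpu --gres=gpu:1 gpu_job.sh
  sbatch --qos=cpu cpu_job.sh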