Diego Zuccato <diego.zucc...@unibo.it> writes:

> On 16/06/20 09:39, Loris Bennett wrote:
>
>>> Maybe it's already known and obvious, but... Remember that a node can
>>> be allocated to only one partition.
>> Maybe I am misunderstanding you, but I think that this is not the case.
>> A node can be in multiple partitions.
>
> *Assigned* to multiple partitions: OK.
> But once Slurm schedules a job in "partGPU" on that node, the whole node
> is unavailable for jobs in "partCPU", even if the GPU job is using only
> 1% of the resources.
Thanks for pointing this out - I hadn't been aware of this.  Is there
anywhere in the documentation where this is explicitly stated?

>> We have nodes belonging to individual research groups which are in
>> both a separate partition just for the group and in a 'scavenger'
>> partition for everyone (but with lower priority and maximum run-time).
>
> More or less our current config. Quite inefficient, at least for us:
> too many unusable resources due to small jobs.

Our scavenger partition tends to be used mostly by a small number of
users, each with a huge number of small, short jobs.  Thus, they tend
to fill nodes and not block resources for that long, but I probably
need to look at this a bit more carefully.

>>> So, if you have the mixed nodes in both partitions and there's a GPU
>>> job running, a non-GPU job will find that node marked as busy because
>>> it's allocated to another partition. That's why we're drastically
>>> reducing the number of partitions we have and will avoid shared
>>> nodes.
>> Again, I don't think this is the explanation.  If a job is running on
>> a GPU node, but not using all the CPUs, then a CPU-only job should be
>> able to start on that node, unless some form of exclusivity has been
>> set up, such as ExclusiveUser=YES for the partition.
> Nope. The whole node gets allocated to one partition at a time. So if
> the GPU job and the CPU one are in different partitions, it's expected
> that only one starts. The behaviour you're looking for is that of QoS:
> define a single partition w/ multiple QoS and both jobs will run
> concurrently.
>
> If you think about it, that's the meaning of "partition" :)

Like I said, this is new to me, but personally I don't think that,
linguistically speaking, it is obvious.  If the actual membership of a
node in a partition changes over time and just depends on which jobs
happen to be running on it at a given moment, then, to my mind, that's
not much like the physical concept of partitioning a room or a city.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
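
P.S.  For the archives, here is roughly the kind of setup we have been
discussing, as an untested sketch - the node and partition names, CPU
and GPU counts, memory, and time limits are all just placeholders:

    # slurm.conf fragment: one node *assigned* to two partitions
    GresTypes=gpu
    NodeName=node01 CPUs=32 RealMemory=256000 Gres=gpu:2 State=UNKNOWN
    PartitionName=partGPU Nodes=node01 MaxTime=3-00:00:00 State=UP
    PartitionName=partCPU Nodes=node01 MaxTime=3-00:00:00 State=UP

If I have understood Diego correctly, with this layout a job running in
partGPU on node01 would make the whole node unavailable to partCPU jobs
for as long as it runs, however few of the node's resources it actually
uses.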
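
The single-partition alternative Diego suggests would then, as I
understand it, look something like the following - again an untested
sketch, with the QoS names, priorities, and limits made up:

    # slurm.conf fragment: one partition, jobs differentiated by QoS
    PartitionName=compute Nodes=node01 AllowQos=cpu,gpu State=UP

    # The QoS themselves are created and tuned in the accounting
    # database, e.g.:
    #   sacctmgr add qos gpu
    #   sacctmgr modify qos gpu set Priority=10
    #   sacctmgr add qos cpu
    #   sacctmgr modify qos cpu set Priority=5 MaxWall=1-00:00:00
    #
    # Jobs then select a QoS rather than a partition:
    #   sbatch --partition=compute --qos=gpu --gres=gpu:1 job.sh
    #   sbatch --partition=compute --qos=cpu job.sh

Since both jobs are now in the same partition, the GPU job and the
CPU-only job could share node01, with the per-QoS limits doing the work
the separate partitions were doing before.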