We've done this, though, with job_submit.lua, mostly for OS updates. We
add a feature to every node and then proceed, telling users that
requesting a feature gets you on the "new" nodes.
I can send you the snippet if you're using the job_submit.lua script.
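Roughly, the idea looks like this (a sketch only, with made-up feature
names "oldos" and "newos", not the exact snippet we run):

    -- job_submit.lua: keep jobs on the "old" nodes unless the user
    -- explicitly asks for the new-OS feature via --constraint.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.features == nil or job_desc.features == '' then
            -- No constraint given: pin the job to the current nodes.
            job_desc.features = 'oldos'
        elseif not string.find(job_desc.features, 'newos', 1, true) then
            -- A constraint was given but it doesn't mention the new
            -- nodes: AND the old-OS feature onto it.
            job_desc.features = job_desc.features .. '&oldos'
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end

Once the upgraded nodes become the majority, you just swap which
feature gets appended by default.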
Bill
On 6/14/24 2:18 PM, David Magda via slurm-users wrote:
Hello,
What I’m looking for is a way for a node to continue to be in the same
partition, and have the same QoS(es), but only be chosen if a particular
capability is being asked for. This is because we are rolling something (OS
upgrade) out slowly to a small batch of nodes at first, and then more and more
over time, and do not want to interrupt users’ workflows: we want them to
default to the ‘current’ nodes and only land on the ‘special’ ones if requested.
(At a certain point the ‘special’ ones will become the majority and we’d swap
the behaviour.)
Slurm has the well-known Features option that can be set on a node(s):
A comma-delimited list of arbitrary strings indicative of some characteristic
associated with the node. There is no value or count associated with a feature
at this time, a node either has a feature or it does not. A desired feature may
contain a numeric component indicating, for example, processor speed but this
numeric component will be considered to be part of the feature string. Features
are intended to be used to filter nodes eligible to run jobs via the
--constraint argument. By default a node has no features. Also see Gres for
being able to have more control such as types and count. Using features is
faster than scheduling against GRES but is limited to Boolean operations.
https://slurm.schedmd.com/slurm.conf.html#OPT_Features
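For example (hypothetical node names and feature strings, just to
illustrate the option above), nodes can be tagged in slurm.conf and a
job can then ask for them explicitly:

    NodeName=node[001-090] Features=el8
    NodeName=node[091-100] Features=el9

    $ sbatch --constraint=el9 job.sh   # hard requirement: el9 nodes only
    $ sbatch --prefer=el9 job.sh       # soft preference for el9 nodes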
So if there are (a bunch of) partitions, and nodes within those partitions, a
job can be submitted to a partition and run on any available node, or even be
requested to run on a particular node (--nodelist). With the above (and
--constraint / --prefer), a particular sub-set of node(s) can be requested. But
(AIUI) that sub-set is also generally available to everyone, even if a
particular feature is not requested.
Is there a way to tell Slurm to not schedule a job on a node UNLESS a flag or
option is set? Or is it necessary to set up new partition(s) or QoS(es)? I see
that AllowAccounts (and AllowGroups) apply only to Partitions, and not
(AFAICT) on a per-node basis.
We’re currently on 22.05.x, but upgrading is fine.
Regards,
David