We've done this with job_submit.lua, mostly for OS updates. We add a feature to everything and then proceed, telling users that adding the feature to their job gets them onto the "new" nodes.

I can send you the snippet if you're using the job_submit.lua script.
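
Something along these lines (a minimal sketch of the idea, not the exact snippet; the feature names are placeholders, so adjust for your site):

    -- job_submit.lua sketch: if the job did not request any feature,
    -- pin it to the "current" OS nodes; only jobs that explicitly
    -- request the new feature land on the upgraded nodes.
    -- "el8" (current) and "el9" (new) are illustrative feature names.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.features == nil or job_desc.features == "" then
            job_desc.features = "el8"   -- default to current-OS nodes
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end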

Bill

On 6/14/24 2:18 PM, David Magda via slurm-users wrote:
Hello,

What I’m looking for is a way for a node to continue to be in the same 
partition, and have the same QoS(es), but only be chosen if a particular 
capability is being asked for. This is because we are rolling something (OS 
upgrade) out slowly to a small batch of nodes at first, and then more and more 
over time, and do not want to interrupt users’ workflows: we want them to 
default to the ‘current’ nodes and only land on the ‘special’ ones if requested. 
(At a certain point the ‘special’ ones will become the majority and we’d swap 
the behaviour.)

Slurm has the well-known Features item that can be put on a node (or nodes):

A comma-delimited list of arbitrary strings indicative of some characteristic 
associated with the node. There is no value or count associated with a feature 
at this time, a node either has a feature or it does not. A desired feature may 
contain a numeric component indicating, for example, processor speed but this 
numeric component will be considered to be part of the feature string. Features 
are intended to be used to filter nodes eligible to run jobs via the 
--constraint argument. By default a node has no features. Also see Gres for 
being able to have more control such as types and count. Using features is 
faster than scheduling against GRES but is limited to Boolean operations.


        https://slurm.schedmd.com/slurm.conf.html#OPT_Features
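
For example (node range and feature name purely illustrative), a feature is just 
a string attached to the node definition in slurm.conf:

        NodeName=node[01-04] Features=el9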

So if there are (a bunch of) partitions, and nodes within those partitions, a 
job can be submitted to a partition and run on any available node, or even be 
requested to run on a particular node (--nodelist). With the above (and 
--constraint / --prefer), a particular sub-set of nodes can be requested. But 
(AIUI) that sub-set is also generally available to everyone, even if a 
particular feature is not requested.
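
For example, with an illustrative feature name:

        sbatch --constraint=el9 job.sh    # hard requirement: only nodes with el9
        sbatch --prefer=el9 job.sh        # soft preference: el9 if available, else any node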

Is there a way to tell Slurm to not schedule a job on a node UNLESS a flag or 
option is set? Or is it necessary to set up new partition(s) or QoS(es)? I see 
that AllowAccounts (and AllowGroups) are applicable only to Partitions, and not 
(AFAICT) on a per-node basis.

We’re currently on 22.05.x, but upgrading is fine.

Regards,
David

