Re: [slurm-users] Questions about adding new nodes to Slurm

Tina Friedrich Tue, 04 May 2021 05:27:40 -0700

Hello,

A lot of people have already given very good answers on how to tackle this.

Still, I thought it worth pointing this out: you said 'you need to basically shut down slurm, update the slurm.conf file, then restart'. That makes it sound like a major operation with lots of prep required.


It's not like that at all. Updating slurm.conf is not a major operation.

There's absolutely no reason to shut things down first and then change the file. You can edit the file or ship out a new version (however you like) and then restart the daemons.

The daemons do not all have to be restarted simultaneously. It is of no consequence if they're running with out-of-sync config files for a bit, really. (There's a flag you can set if you want to suppress the warning: the 'NO_CONF_HASH' debug flag, I think.)

Restarting the daemons (slurmctld, slurmd, ...) is safe. It does not require cluster downtime or anything.

I control slurm.conf using configuration management; the config management process restarts the appropriate daemon (slurmctld, slurmd, slurmdbd) if the file changed. This certainly never happens at the same time; there's splay in that. It doesn't even necessarily happen on the controller first, or anything like that.
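
To make the "restart only if the file changed, with some splay" idea concrete, here is a minimal Python sketch. It is not the actual config management setup described above, just an illustration under assumptions: the function names, the state-file location, the 300-second default splay, and the use of `systemctl restart` with unit names such as `slurmd` or `slurmctld` are all hypothetical choices.

```python
import hashlib
import random
import subprocess
import time
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def config_changed(conf_path, state_path):
    """Return True if conf_path differs from the digest recorded in state_path."""
    state = Path(state_path)
    previous = state.read_text().strip() if state.exists() else None
    return file_digest(conf_path) != previous


def restart_if_changed(conf_path, state_path, unit, max_splay=300,
                       run=subprocess.run, sleep=time.sleep):
    """Restart `unit` after a random delay, but only if the config changed.

    `unit` is assumed to be a systemd unit name (e.g. "slurmd" on compute
    nodes, "slurmctld" on the controller). `run` and `sleep` are injectable
    so the logic can be exercised without touching a real daemon.
    """
    if not config_changed(conf_path, state_path):
        return False
    # Splay: each node waits a different random time, so daemons across the
    # cluster never all restart at the same moment.
    sleep(random.uniform(0, max_splay))
    run(["systemctl", "restart", unit], check=True)
    # Record the digest only after a successful restart, so a failed restart
    # is retried on the next run.
    Path(state_path).write_text(file_digest(conf_path))
    return True
```

Run periodically on each node (for example from whatever agent already ships the file), a call such as `restart_if_changed("/etc/slurm/slurm.conf", "/var/lib/slurm-conf.sha256", "slurmd")` would restart slurmd at most once per config change; both paths here are illustrative and depend on the installation.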

What I'm trying to get across: I have a feeling this 'updating the cluster wide config file' and 'file must be the same on all nodes' is a lot less of a procedure (and a lot less strict) than you currently imagine it to be :)


Tina

On 27/04/2021 19:35, David Henkemeyer wrote:

Hello,
I'm new to Slurm (coming from PBS), and so I will likely have a few questions over the next several weeks, as I work to transition my infrastructure from PBS to Slurm.
My first question has to do with *_adding nodes to Slurm_*. According to the FAQ (and other articles I've read), you need to basically shut down slurm, update the slurm.conf file /*on all nodes in the cluster*/, then restart slurm.
- Why do all nodes need to know about all other nodes? From what I have read, Slurm does a checksum comparison of the slurm.conf file across all nodes. Is this the only reason all nodes need to know about all other nodes?
- Can I create a symlink that points <sysconfdir>/slurm.conf to a slurm.conf file on an NFS mount point, which is mounted on all the nodes? This way, I would only need to update a single file, then restart Slurm across the entire cluster.
- Any additional help/resources for adding/removing nodes to Slurm would be much appreciated. Perhaps there is a "toolkit" out there to automate some of these operations (which is what I already have for PBS, and will create for Slurm, if something doesn't already exist).
Thank you all,

David


--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
