Hi Tina,

Thank you for sharing. This matches what I observed when I checked whether Slurm could do what I am up to: managing dynamic (spot) AWS EC2 instances.
Having replaced MySQL with Redis, I now wonder what it would take to make Slurm node addition and removal dynamic. I have been reading the source code for many months, trying to decide whether it can be done. I am running configless, with 3 controllers and 2 slurmdbd instances on a robust Redis Sentinel based backend.

Steven

On Thu., May 5, 2022, 08:57 Tina Friedrich, <tina.friedr...@it.ox.ac.uk> wrote:

> Hi List,
>
> out of curiosity - I would assume that if running configless, one
> doesn't manually need to restart slurmd on the nodes if the config changes?
>
> Hi Steven,
>
> I have no idea if you want to do it every couple of minutes and what the
> implications are of that (although I've certainly managed to restart them
> every 5 minutes by accident with no real problems caused), but -
> generally, restarting the daemons (slurmctld, slurmd) is a non-issue, as
> it's a safe operation. There's no risk to running jobs or anything. I
> have the config management restart them if any files change. It also
> doesn't seem to matter if the restarts of the controller & the node
> daemons are splayed a bit (i.e. don't happen at the same time), or what
> order they happen in.
>
> Tina
>
> On 05/05/2022 13:17, Steven Varga wrote:
> > Thank you for the quick reply! I know I am pushing my luck here: is it
> > possible to modify slurm: src/common/[read_conf.c, node_conf.c]
> > src/slurmctld/[read_config.c, ...] such that the state can be maintained
> > dynamically? -- or cheaper to write a job manager with fewer features but
> > supporting dynamic nodes from the ground up?
> > best wishes: steve
> >
> > On Thu, May 5, 2022 at 12:29 AM Christopher Samuel <ch...@csamuel.org> wrote:
> >
> > > On 5/4/22 7:26 pm, Steven Varga wrote:
> > >
> > > > I am wondering what is the best way to update node changes, such as
> > > > addition and removal of nodes to SLURM. The excerpts below suggest a
> > > > full restart, can someone confirm this?
> > >
> > > You are correct, you need to restart slurmctld and slurmd daemons at
> > > present. See https://slurm.schedmd.com/faq.html#add_nodes
> > >
> > > All the best,
> > > Chris
> > > --
> > > Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
>
> --
> Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
>
> Research Computing and Support Services
> IT Services, University of Oxford
> http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
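
For anyone reading this thread later: the approach Tina describes (have config management restart the daemons whenever a config file changes) could be sketched roughly as below. This is only an illustration, not what anyone in the thread actually runs. The helper names (`digest`, `changed_files`, `restart_if_changed`) are made up for the example, and it assumes the daemons run as systemd units named `slurmctld` and `slurmd`, which may not match your site. On a real cluster the controller and the compute nodes would each restart only their own daemon.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Sequence


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths: Iterable[Path], seen: Dict[Path, str]) -> List[Path]:
    """Return files whose digest differs from the one recorded in `seen`.

    `seen` is updated in place. A file that was never recorded counts as
    changed, so the very first call reports every file.
    """
    changed = []
    for path in paths:
        current = digest(path)
        if seen.get(path) != current:
            seen[path] = current
            changed.append(path)
    return changed


def systemctl_restart(unit: str) -> None:
    """Restart one unit. Assumption: the daemon is managed by systemd."""
    subprocess.run(["systemctl", "restart", unit], check=True)


def restart_if_changed(
    paths: Iterable[Path],
    seen: Dict[Path, str],
    units: Sequence[str] = ("slurmctld", "slurmd"),  # assumed unit names
    restart: Callable[[str], None] = systemctl_restart,
) -> bool:
    """Restart every unit in `units` if any watched file changed.

    Returns True when a restart was triggered. Per the thread, the order
    and exact timing of the restarts do not seem to matter.
    """
    if not changed_files(paths, seen):
        return False
    for unit in units:
        restart(unit)
    return True
```

A caller would keep `seen` around between runs (for example in a long-lived process or a small state file) and call `restart_if_changed([Path("/etc/slurm/slurm.conf")], seen)` on a timer; the path here is only an example location.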