Hi Tina,
Thank you for sharing. This matches my observations when I checked whether
Slurm could do what I am up to: managing AWS EC2 dynamic (spot) instances.
Having replaced MySQL with Redis, I now wonder what it would take to make
Slurm node addition and removal dynamic. I've been looking at the source
code for many months now, trying to decide whether it can be done.
I am using configless mode, 3 controllers, and 2 slurmdbs with a robust
Redis Sentinel based backend.
Steven
On Thu., May 5, 2022, 08:57 Tina Friedrich, <tina.friedr...@it.ox.ac.uk> wrote:
Hi List,
out of curiosity - I would assume that if running configless, one
doesn't manually need to restart slurmd on the nodes if the config
changes?
Hi Steven,
I have no idea if you want to do it every couple of minutes, or what the
implications of that would be (although I've certainly managed to restart
them every 5 minutes by accident with no real problems caused), but
generally, restarting the daemons (slurmctld, slurmd) is a non-issue, as
it's a safe operation. There's no risk to running jobs or anything. I
have the config management restart them if any files change. It also
doesn't seem to matter if the restarts of the controller and the node
daemons are splayed a bit (i.e. don't happen at the same time), or what
order they happen in.
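
For illustration only, here is a minimal Python sketch of the "restart if any files change" idea. It is not the actual config management setup described above. The stamp file, the function names, and the use of systemctl with units named slurmctld and slurmd are all assumptions. In practice the controller and the compute nodes would each restart only their own daemon.

```python
import hashlib
from pathlib import Path
from typing import List, Optional, Sequence


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restart_commands(
    conf: Path,
    stamp: Path,
    daemons: Sequence[str] = ("slurmctld", "slurmd"),  # assumed unit names
) -> List[List[str]]:
    """Return the restart commands to run if conf changed since the last check.

    The stamp file (hypothetical) stores the digest seen on the previous run.
    A missing stamp file counts as a change, so the first run asks for a restart.
    """
    current = file_digest(conf)
    previous: Optional[str] = stamp.read_text().strip() if stamp.exists() else None
    if current == previous:
        return []  # config unchanged, nothing to restart
    stamp.write_text(current)  # remember the digest for the next run
    return [["systemctl", "restart", name] for name in daemons]
```

Each returned command could then be passed to subprocess.run(cmd, check=True). The function only builds the command list, so the change detection can be tried out without touching any daemon.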
Tina
On 05/05/2022 13:17, Steven Varga wrote:
> Thank you for the quick reply! I know I am pushing my luck here: is it
> possible to modify slurm: src/common/[read_conf.c, node_conf.c]
> src/slurmctld/[read_config.c, ...] such that the state can be maintained
> dynamically? Or is it cheaper to write a job manager with fewer features
> but supporting dynamic nodes from the ground up?
> best wishes: steve
>
> On Thu, May 5, 2022 at 12:29 AM Christopher Samuel <ch...@csamuel.org> wrote:
>
> On 5/4/22 7:26 pm, Steven Varga wrote:
>
> > I am wondering what is the best way to update node changes, such as
> > addition and removal of nodes to SLURM. The excerpts below suggest a
> > full restart, can someone confirm this?
>
> You are correct, you need to restart slurmctld and slurmd daemons at
> present. See https://slurm.schedmd.com/faq.html#add_nodes
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/
> : Berkeley, CA, USA
>
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk
http://www.it.ox.ac.uk