Hi Steven,

I think truly dynamic adding and removing of nodes is something that's on the 
roadmap for slurm 23.02?

Ward

On 5/05/2022 15:28, Steven Varga wrote:
Hi Tina,
Thank you for sharing. This matches my observations when I checked if slurm 
could do what I am up to: manage AWS EC2 dynamic (spot) instances.

After replacing MySQL with Redis, I now wonder what it would take to make slurm 
node addition and removal dynamic. I've been looking at the source code for many 
months now, trying to decide if it can be done.

I am using configless mode, 3 controllers, and 2 slurmdbs, with a robust 
Redis Sentinel based backend.

Steven


On Thu., May 5, 2022, 08:57 Tina Friedrich, <tina.friedr...@it.ox.ac.uk> wrote:

    Hi List,

    out of curiosity - I would assume that if running configless, one
    doesn't manually need to restart slurmd on the nodes if the config changes?

    Hi Steven,

    I don't know whether you'd want to do it every couple of minutes, or what
    the implications of that would be (although I've certainly managed to
    restart them every 5 minutes by accident with no real problems caused), but -
    generally, restarting the daemons (slurmctld, slurmd) is a non-issue, as
    it's a safe operation. There's no risk to running jobs or anything. I
    have the config management restart them if any files change. It also
    doesn't seem to matter if the restarts of the controller & the node
    daemons are splayed a bit (i.e. don't happen at the same time), or what
    order they happen in.
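
The restart-on-change approach described above can be sketched roughly as
follows. This is only an illustration, not the actual config management setup
mentioned in this message: the function names (file_digest, sync) are
hypothetical, and it assumes the daemons run as systemd units named slurmd
and slurmctld.

```python
import hashlib
import subprocess
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def restart_daemon(unit, runner=subprocess.run):
    """Restart a systemd unit (assumed unit name, e.g. 'slurmd').

    'runner' is injectable so the command can be dry-run or logged
    instead of executed.
    """
    return runner(["systemctl", "restart", unit], check=True)


def sync(path, last_digest, unit, runner=subprocess.run):
    """Restart 'unit' only if the file at 'path' changed since 'last_digest'.

    Returns the current digest so the caller can store it for the next run.
    """
    current = file_digest(path)
    if current != last_digest:
        restart_daemon(unit, runner)
    return current
```

A caller would keep the returned digest between runs (for example in a small
state file) and call sync() once per daemon; per the message above, the order
and exact timing of the controller and node restarts should not matter.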

    Tina

    On 05/05/2022 13:17, Steven Varga wrote:
     > Thank you for the quick reply! I know I am pushing my luck here: is it
     > possible to modify slurm: src/common/[read_conf.c, node_conf.c]
     > src/slurmctld/[read_config.c, ...] such that the state can be maintained
     > dynamically? Or would it be cheaper to write a job manager with fewer
     > features that supports dynamic nodes from the ground up?
     > best wishes: steve
     >
     > On Thu, May 5, 2022 at 12:29 AM Christopher Samuel <ch...@csamuel.org> wrote:
     >
     >     On 5/4/22 7:26 pm, Steven Varga wrote:
     >
     >      > I am wondering what is the best way to update node changes, such as
     >      > addition and removal of nodes to SLURM. The excerpts below suggest a
     >      > full restart, can someone confirm this?
     >
     >     You are correct, you need to restart slurmctld and slurmd daemons at
     >     present.  See https://slurm.schedmd.com/faq.html#add_nodes
     >
     >     All the best,
     >     Chris
     >     --
     >     Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
     >

-- Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

    Research Computing and Support Services
    IT Services, University of Oxford
    http://www.arc.ox.ac.uk http://www.it.ox.ac.uk

