Re: [slurm-users] Defining an empty partition

2023-05-11 Thread Steve Brasier
resources, i.e. the quote position has just got typoed. Steve http://stackhpc.com/ Please note I work Tuesday to Friday. On Fri, 18 Dec 2020 at 13:17, Steve Brasier wrote: > Thank you Tina, I hadn't realised that would show as "n/a" not "down" in > that case (w
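
For reference, a minimal sketch of what the thread converges on, assuming the manpage's "Nodes= " is a quoting typo for an empty quoted value (the partition name here is illustrative):

    # slurm.conf - a partition that exists but currently has no resources
    PartitionName=burst Nodes="" State=UP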

Re: [slurm-users] Running slurmd without enabling jobs on a node

2021-01-07 Thread Steve Brasier
't know the best way, but if you do not put a login node's name > into a partition, sinfo will not show this node and no job will > run on this node just because it has a running slurmd. > > Ahmet M. > > > On 6.01.2021 19:45, Steve Brasier wrote:
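
A minimal sketch of the arrangement Ahmet describes, with illustrative node names: the login node is defined so its slurmd is recognised, but it is left out of every partition, so nothing can be scheduled to it:

    # slurm.conf (names illustrative)
    NodeName=login-0                          # defined, but in no partition
    NodeName=compute-[0-1] CPUs=8
    PartitionName=main Nodes=compute-[0-1] Default=YES State=UP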

[slurm-users] Running slurmd without enabling jobs on a node

2021-01-06 Thread Steve Brasier
Hi all, For a cluster in configless mode there appear to be two ways of having a login-only node (i.e. not running slurmctld) - either 1) setting DNS records or 2) "... consider running slurmd on the machine so it can manage the configuration files, but not allowing it to run jobs." [A] What's th
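
A sketch of option 2, assuming a controller host named testohpc-control (--conf-server is the documented configless mechanism; the hostname and port are illustrative). The login node's slurmd fetches and caches the config from slurmctld, and as long as the node is in no partition it never receives jobs:

    # on the login node, point slurmd at the controller for its config
    slurmd --conf-server testohpc-control:6817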

Re: [slurm-users] Defining an empty partition

2020-12-18 Thread Steve Brasier
artition > with no nodes. So you could just put a dummy nodename in the slurm.conf > file? Tina. On Fri, 18 Dec 2020 at 11:13, Steve Brasier wrote: > Having tried just not even defining any partitions, you hit this > <https://github.com/SchedMD/slurm/blob/master/src/common/
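
A sketch of Tina's workaround, with illustrative names; no slurmd ever runs on the dummy node, it merely keeps the partition definition populated (State=FUTURE is my assumption here, to stop slurmctld treating the placeholder as down):

    # slurm.conf (names illustrative)
    NodeName=dummy-0 State=FUTURE   # placeholder; no slurmd will run here
    PartitionName=burst Nodes=dummy-0 State=UP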

Re: [slurm-users] Defining an empty partition

2020-12-18 Thread Steve Brasier
s part of a staged deployment? http://stackhpc.com/ Please note I work Tuesday to Friday. On Fri, 18 Dec 2020 at 10:56, Steve Brasier wrote: > Hi all, > > According to the relevant manpage > <https://slurm.schedmd.com/archive/slurm-20.02.5/slurm.conf.html> it's > possib

[slurm-users] Defining an empty partition

2020-12-18 Thread Steve Brasier
Hi all, According to the relevant manpage it's possible to define an empty partition using "Nodes= ". However, this doesn't seem to work (slurm 20.02.5):

    [centos@testohpc-login-0 ~]$ grep -n Partition /etc/slurm/slurm.conf
    72:Prior
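
For context, the definition under test would look something like this, following the manpage wording (partition name illustrative):

    # slurm.conf - the manpage's suggested empty partition
    PartitionName=empty Nodes= State=UP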

[slurm-users] slurm elastic compute / power saving

2020-01-07 Thread Steve Brasier
Hi all, I've got elastic compute working with Slurm but on "suspend" I get something like the following in the slurmctld log:

    power down request repeating for node compute-2
    power down request repeating for node compute-3
    error: Nodes compute-[2-3] not responding

The docs say that the SuspendScr
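
For reference, the power-saving knobs involved look like this in slurm.conf; the script paths are assumptions, and the timers are the documented parameters in this area:

    # slurm.conf - power saving / elastic compute (paths illustrative)
    SuspendProgram=/opt/slurm/suspend.sh   # invoked to power a node down
    ResumeProgram=/opt/slurm/resume.sh     # invoked to power a node back up
    SuspendTime=300        # seconds idle before a node is suspended
    SuspendTimeout=60      # seconds a node may take to power down
    ResumeTimeout=300      # seconds a node may take to power back up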

[slurm-users] How to do a clean restart of slurmctld under systemd?

2019-12-27 Thread Steve Brasier
I want to restart slurmctld without maintaining state (playing around with some options). The Slurm troubleshooting guide says to use:

    /etc/init.d/slurm stop
    /etc/init.d/slurm startclean

However, the control node is using systemd, so while I can stop and start it with service slurmctld stop / sta
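
A hedged systemd equivalent, assuming a stock unit file: stop the service, then do a one-off start with slurmctld's -c flag, which discards the saved state from the last checkpoint:

    systemctl stop slurmctld
    # one-off foreground start with cleared state (-c); stop it with Ctrl-C,
    # then 'systemctl start slurmctld' to resume normal supervised operation
    slurmctld -D -c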

Re: [slurm-users] sched

2019-12-13 Thread Steve Brasier
> the nodes but deletes the instances. > So then when you need a new node, it creates one, then provisions the > config, then updates the slurm cluster config... > > That's how I understand it, but I haven't tried running it myself. > > Regards, > Alex > > On Thu, D

[slurm-users] sched

2019-12-12 Thread Steve Brasier
Hi, I'm hoping someone can shed some light on the SchedMD-provided example here https://github.com/SchedMD/slurm-gcp for an autoscaling cluster on Google Cloud Platform (GCP). I understand that Slurm autoscaling uses the power saving interface to create/remove nodes and the example suspend.py and r
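
A sketch of how the power-saving interface would be wired to the repo's scripts; the install paths are assumptions, not taken from slurm-gcp itself:

    # slurm.conf - autoscaling via the power-saving interface (paths assumed)
    SuspendProgram=/apps/slurm/scripts/suspend.py   # tears down cloud instances
    ResumeProgram=/apps/slurm/scripts/resume.py     # creates cloud instances
    SuspendTime=300
    ResumeTimeout=600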