On 14-07-2017 23:26, Robbert Eggermont wrote:
> We're adding some nodes to our cluster (17.02.5). In preparation, we've
> defined the nodes in our slurm.conf with "State=FUTURE" (as described in
> the man page). But it doesn't work like this, because when we start the
> slurmd on the nodes, the nodes immediately show up as idle.
>
> When we manually run "scontrol update NodeName=XXX State=FUTURE" the
> node becomes invisible, as expected for State=FUTURE. However, after a
> restart of the node (or the slurmd), the node is in state idle again,
> and jobs get scheduled on the node...
>
> So, how do we make the nodes go into State=FUTURE automatically?
> Or do we simply remove the node definitions until the nodes are ready?

You may want to consider this as well:
After adding nodes to slurm.conf, the scontrol man-page says that
slurmctld must be restarted. It turns out that all slurmd daemons on
compute nodes must be restarted as well, see
https://bugs.schedmd.com/show_bug.cgi?id=3973. Hopefully this will get
fixed in 17.11.
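
As a rough sketch of that restart order (not an official procedure): the
helper below assumes systemd units named slurmctld and slurmd, ssh access
to the compute nodes, and that the updated slurm.conf has already been
copied everywhere. The function names and the DRY_RUN switch are made up
for illustration; by default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper, not part of Slurm.
# Assumptions: systemd units "slurmctld" and "slurmd", ssh access to nodes,
# and an identical, already-updated slurm.conf on every host.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_after_node_add() {
    # Restart the controller first so it reads the new node definitions.
    run systemctl restart slurmctld
    # Then restart slurmd on every compute node passed as an argument,
    # existing nodes as well as the newly added ones.
    for node in "$@"; do
        run ssh "$node" systemctl restart slurmd
    done
}
```

Calling "restart_after_node_add node01 node02" (node names are
placeholders) prints the three commands; set DRY_RUN=0 to actually
run them on the controller host.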
/Ole