Thanks for confirming, Brian. That was my understanding as well. Do you have it working that way on a machine you have access to? If so, I'd be interested in seeing the config file, because that's not the behavior I see in my tests. In my tests, Slurm will not power down those "X nodes", but it will not power them up either *unless* a job is targeted at them. I may have something misconfigured, and I'd love to fix that.
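
For reference, here is a minimal sketch of the kind of slurm.conf power-save fragment I am talking about. The script paths, timeouts and node names are placeholders for illustration, not my actual settings; only the nid[10-20]:4 form comes from the documentation example quoted below.

```
# Hypothetical power-save fragment for slurm.conf (placeholder values)
SuspendProgram=/usr/local/sbin/node_suspend.sh   # placeholder path
ResumeProgram=/usr/local/sbin/node_resume.sh     # placeholder path
SuspendTime=600        # seconds a node must be idle before it may be powered down
SuspendTimeout=120     # seconds allowed for a node to finish powering down
ResumeTimeout=600      # seconds allowed for a node to come back up
# Prevent 4 usable (idle) nodes in nid[10-20] from being powered down
SuspendExcNodes=nid[10-20]:4
```

With a fragment along these lines, what I observe is that the last 4 idle nodes are never suspended, but once they are allocated to jobs, no powered-down node is resumed to take their place.
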
Thanks!

On Wed, Nov 22, 2023 at 5:46 PM Brian Andrus <toomuc...@gmail.com> wrote:

> As I understand it, that setting means "Always have at least X nodes up",
> which includes running jobs. So it stops any wait time for the first X jobs
> being submitted, but any jobs after that will need to wait for the power_up
> sequence.
>
> Brian Andrus
>
> On 11/22/2023 6:58 AM, Davide DelVento wrote:
>
> > I've started playing with powersave and have a question about
> > SuspendExcNodes. The documentation at
> > https://slurm.schedmd.com/power_save.html says
> >
> > For example nid[10-20]:4 will prevent 4 usable nodes (i.e IDLE and not
> > DOWN, DRAINING or already powered down) in the set nid[10-20] from being
> > powered down.
> >
> > I initially interpreted that as "Slurm will try to keep 4 nodes idle on as
> > much as possible", which would have reduced the wait time for new jobs
> > targeting those nodes. Instead, it appears to mean "Slurm will not shut off
> > the last 4 nodes which are idle in that partition, however it will not turn
> > on nodes which it shut off earlier unless jobs are scheduled on them".
> >
> > Most notably, if the 4 idle nodes are allocated to other jobs (and so
> > they are not idle anymore), Slurm does not turn on any nodes which have been
> > shut off earlier, so it's possible (and depending on workloads perhaps even
> > common) to have no idle nodes on regardless of the SuspendExcNodes setting.
> >
> > Is that how it works, or do I have anything else in my settings which is
> > causing this unexpected-to-me behavior? I think I can live with it, but
> > IMHO it would have been better if Slurm attempted to turn on nodes
> > preemptively, trying to match the requested SuspendExcNodes, rather than
> > waiting for job submissions.
> >
> > Thanks and Happy Thanksgiving to people in the USA