Sorry for the late reply.

For my site, I used the optional ":" separator to ensure at least 4 nodes were up, e.g. nid[10-20]:4. This means at least 4 nodes. They do not have to be the same 4 at any given time, so if a node that used to be idle is down but 4 others are up, that one will not be brought back up. I don't see this setting having much to do with bringing nodes up at all, except when you first start slurmctld and the settings are not met. Once there are jobs running on any of the listed nodes, those nodes count toward the number. That is my experience with the small numbers I used. YMMV.
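
To make the counting concrete, here is a toy Python model of the behavior described above. It is only a sketch, not Slurm's actual logic: the function names (parse_exc_spec, nodes_safe_to_suspend), the state labels, and the simplified single-range parser are all assumptions made for illustration.

```python
import re


def parse_exc_spec(spec):
    """Split a simple spec such as 'nid[10-20]:4' into (node_names, count).

    Only the single-range form 'prefix[lo-hi]' with an optional ':count'
    is handled here; real Slurm hostlist expressions are richer.
    """
    m = re.fullmatch(r"([A-Za-z_]+)\[(\d+)-(\d+)\](?::(\d+))?", spec)
    if m is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    prefix, lo, hi, count = m.groups()
    names = [f"{prefix}{i}" for i in range(int(lo), int(hi) + 1)]
    return names, (int(count) if count is not None else None)


def nodes_safe_to_suspend(spec, node_state):
    """Return the idle nodes that may be powered down under this toy model.

    node_state maps a node name to 'idle', 'allocated', 'powered_down'
    or 'down' (hypothetical labels, not Slurm's own state names).
    """
    names, count = parse_exc_spec(spec)
    if count is None:
        # No ':count' given: every listed node is excluded from suspension.
        return []
    idle = [n for n in names if node_state.get(n) == "idle"]
    # Assumption taken from the observation above: nodes running jobs
    # count toward the number, not only idle ones.
    up = [n for n in names if node_state.get(n) in ("idle", "allocated")]
    surplus = max(0, len(up) - count)
    # Note that nothing here ever asks for a node to be powered *up*.
    return idle[:surplus]


# Four busy nodes already satisfy ':4', so both idle nodes may be suspended.
state = {n: "powered_down" for n in parse_exc_spec("nid[10-20]")[0]}
state.update(nid10="allocated", nid11="allocated", nid12="allocated",
             nid13="allocated", nid14="idle", nid15="idle")
print(nodes_safe_to_suspend("nid[10-20]:4", state))  # ['nid14', 'nid15']
```

In this model, once all the up nodes are busy there are no idle nodes left, and nothing brings one back, which matches what was reported in this thread.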

I have also listed nodes explicitly, without the separator, and that does work. I do that when I want to look at an idle node with no job on it; it stops Slurm from shutting the node down while I am looking at it.

I do agree that being able to say "keep at least X nodes up and idle" would be nice, but that is not how I see this documented or working.

Brian Andrus

On 11/23/2023 5:12 AM, Davide DelVento wrote:
Thanks for confirming, Brian. That was my understanding as well. Do you have it working that way on a machine you have access to? If so, I'd be interested to see the config file, because that's not the behavior I am seeing in my tests. In fact, in my tests Slurm will not bring down those "X nodes", but it will not bring them up either, *unless* a job targets them. I may have something misconfigured, and I'd love to fix that.

Thanks!

On Wed, Nov 22, 2023 at 5:46 PM Brian Andrus <toomuc...@gmail.com> wrote:

    As I understand it, that setting means "Always have at least X
    nodes up", which includes running jobs. So it stops any wait time
    for the first X jobs being submitted, but any jobs after that will
    need to wait for the power_up sequence.

    Brian Andrus

    On 11/22/2023 6:58 AM, Davide DelVento wrote:
    I've started playing with powersave and have a question about
    SuspendExcNodes. The documentation at
    https://slurm.schedmd.com/power_save.html says

    For example |nid[10-20]:4| will prevent 4 usable nodes (i.e IDLE
    and not DOWN, DRAINING or already powered down) in the set
    |nid[10-20]| from being powered down.

    I initially interpreted that as "Slurm will try to keep 4 idle
    nodes powered on as much as possible", which would have reduced
    the wait time for new jobs targeting those nodes. Instead, it
    appears to mean "Slurm will not shut off the last 4 idle nodes in
    that set, but it will not turn on nodes it shut off earlier
    unless jobs are scheduled on them".

    Most notably, if the 4 idle nodes are allocated to other jobs
    (and so are no longer idle), Slurm does not turn on any nodes
    that were shut off earlier, so it's possible (and, depending on
    the workload, perhaps even common) to have no idle nodes powered
    on regardless of the SuspendExcNodes setting.

    Is that how it works, or is something else in my settings
    causing this unexpected-to-me behavior? I think I can live with
    it, but IMHO it would have been better if Slurm turned nodes on
    preemptively to match the requested SuspendExcNodes, rather than
    waiting for job submissions.

    Thanks and Happy Thanksgiving to people in the USA
