Re: [slurm-users] Slurm powersave

Ole Holm Nielsen Thu, 05 Oct 2023 02:24:17 -0700

Hi Davide,

On 10/4/23 23:03, Davide DelVento wrote:

I'm experimenting with slurm powersave and I have several questions. I'm following the guidance from https://slurm.schedmd.com/power_save.html and the great presentation from our own https://slurm.schedmd.com/SLUG23/DTU-SLUG23.pdf


I presented that talk at SLUG'23 :-)

I am running slurm 23.02.3

1) I'm not sure I fully understand ReconfigFlags=KeepPowerSaveSettings
The documentation says that if set, an "scontrol reconfig" command will preserve the current state of SuspendExcNodes, SuspendExcParts and SuspendExcStates. Why would one *NOT* want to preserve that? What would happen if one does not (or does) have this setting? For now I'm using it, assuming that it means "if I run scontrol reconfig, don't shut off nodes that are up because I said in slurm.conf, with those three options, that they should be up" --- but I am not clear if that is really what it says.

As I understand it, without this ReconfigFlags setting, any of these settings that you updated using scontrol will be lost when slurmctld is reconfigured, and the settings from slurm.conf will be used instead.

2) the PDF above says that the problem with nodes in down and drained state is solved in 23.02 but that does not appear to be the case. Before running my experiment, I had


$ sinfo -R
REASON               USER      TIMESTAMP           NODELIST
Not responding       root      2023-09-13T13:14:50 node31
ECC memory errors    root      2023-08-26T07:21:04 node27

and after it became

$ sinfo -R
REASON               USER      TIMESTAMP           NODELIST
Not responding       root      2023-09-13T13:14:50 node31
none                 Unknown   Unknown             node27


Please use "sinfo -lR" so that we can see the node STATE.

And that despite having excluded drain'ed nodes as below:

--- a/slurm/slurm.conf
+++ b/slurm/slurm.conf
@@ -140,12 +140,15 @@ SlurmdLogFile=/var/log/slurm/slurmd.log
  #
  #
  # POWER SAVE SUPPORT FOR IDLE NODES (optional)
+SuspendProgram=/opt/slurm/poweroff
+ResumeProgram=/opt/slurm/poweron
+SuspendTimeout=120
+ResumeTimeout=240
  #ResumeRate=
+SuspendExcNodes=node[13-32]:2
+SuspendExcStates=down,drain,fail,maint,not_responding,reserved
+BatchStartTimeout=60
+ReconfigFlags=KeepPowerSaveSettings # not sure if needed: preserve current status when running "scontrol reconfig"
-PartitionName=compute512 Default=False Nodes=node[13-32] State=UP DefMemPerCPU=9196
+PartitionName=compute512 Default=False Nodes=node[13-32] State=UP DefMemPerCPU=9196 SuspendTime=600

so probably that's not solved? Anyway, that's a nuisance, not a deal breaker


With my 23.02.5 the SuspendExcStates is working as documented :-)

3) The whole thing does not appear to be working as I intended. My understanding was that the "exclude node" setting above meant that slurm should never attempt to shut off more than all idle nodes in that partition minus 2. Instead it shut all of them off, and then tried to turn them back on:
$ sinfo | grep 512
compute512     up   infinite      1 alloc# node15
compute512     up   infinite      2  idle# node[14,32]
compute512     up   infinite      3  down~ node[16-17,31]
compute512     up   infinite      1 drain~ node27
compute512     up   infinite     12  idle~ node[18-26,28-30]
compute512     up   infinite      1  alloc node13
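
For readers less familiar with the suffixes in the sinfo output above, here is a small helper that decodes them. It is only a sketch: the function name describe_state is made up for this example, and the meanings are the ones listed under NODE STATE CODES in the sinfo man page, restricted to the suffixes that matter for power saving.

```shell
#!/bin/bash
# Decode the one-character suffix that sinfo appends to a node state.
# describe_state is a hypothetical helper, not part of Slurm.
describe_state() {
    local state=$1
    case "$state" in
        *'~') echo "${state%?}: powered off (power saving)" ;;
        *'#') echo "${state%?}: being powered up or configured" ;;
        *'%') echo "${state%?}: being powered down" ;;
        *'!') echo "${state%?}: pending power down" ;;
        *'*') echo "${state%?}: not responding" ;;
        *)    echo "${state}: no power-related suffix" ;;
    esac
}

# The states that appear in the sinfo output above.
for s in 'alloc#' 'idle#' 'down~' 'drain~' 'idle~' 'alloc'; do
    describe_state "$s"
done
```

Read this way, the 12 "idle~" nodes are the ones slurmctld considers powered off, and the "#" nodes are the ones it is trying to resume.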

I agree that 2 nodes from node[13-32] shouldn't be suspended, according to SuspendExcNodes in the slurm.conf manual. I haven't tested this feature.
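
To make the expected arithmetic concrete, here is a rough sketch that splits an entry of the form "<hostlist>:<count>" and reports the largest number of nodes that could be suspended if every node in the range were idle and usable. The function max_suspendable is hypothetical, it handles only a single numeric range, and it is not how slurmctld parses the option.

```shell
#!/bin/bash
# Illustration only: compute "nodes in range minus count to keep up" for a
# SuspendExcNodes entry such as node[13-32]:2. Assumes exactly one numeric
# range in brackets and a ":<count>" suffix.
max_suspendable() {
    local spec=$1
    local hostlist=${spec%:*}      # e.g. node[13-32]
    local keep=${spec##*:}         # e.g. 2
    local range=${hostlist#*\[}    # e.g. 13-32]
    range=${range%\]}              # e.g. 13-32
    local first=${range%-*}
    local last=${range#*-}
    # 10# forces base 10 so that zero-padded numbers are not read as octal.
    local total=$(( 10#$last - 10#$first + 1 ))
    echo $(( total - keep ))
}

max_suspendable 'node[13-32]:2'   # prints 18
```

With 20 nodes in node[13-32] and a count of 2, at most 18 nodes should ever be powered down at the same time, which is not what the sinfo output above shows.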

But again this is a minor nuisance which I can live with (especially if it happens only when I "flip the switch"), and I'm mentioning it only in case it's a symptom of something else I'm doing wrong. I did try both the SuspendExcNodes=node[13-32]:2 syntax, as it seems more reasonable to me (compared to the rest of the file, e.g. partition definitions), and the SuspendExcNodes=node[13\-32]:2 syntax as suggested in the slurm powersave documentation. The behavior was exactly identical.

4) Most importantly, from the output above you may have noticed two nodes (actually three by the time I ran the command below) that slurm deemed down
$ sinfo -R
REASON               USER      TIMESTAMP           NODELIST
Not responding       root      2023-09-13T13:14:50 node31
reboot timed out     slurm     2023-10-04T14:51:28 node14
reboot timed out     slurm     2023-10-04T14:52:28 node15
reboot timed out     slurm     2023-10-04T14:49:58 node32
none                 Unknown   Unknown             node27
This can't be the case, the nodes are fine, and cannot have timed out while "rebooting", because for now my poweroff and poweron scripts are identical and literally a simple one-liner bash script doing almost nothing, and the log file is populated correctly as I would expect
echo "Pretending to $0 the following node(s): $1"  >> $log_file 2>&1
So I can confirm slurm invoked the script, but then waited for something (what? starting slurmd?) which failed to occur and marked the node down. When I removed the suspend time from the partition to end the experiment, the other nodes went "magically" into production, without slurm calling my poweron script. Of course the nodes were never powered off, but slurm thought they were, so why did it not have the problem it had with the nodes which it instead intentionally tried to power on?

IMHO, "pretending" to power down nodes defies the logic of the Slurm power_save plugin. Slurmctld expects suspended nodes to *really* power down (slurmd is stopped). When slurmctld resumes a suspended node, it expects slurmd to start up when the node is powered on. There is a ResumeTimeout parameter which I've set to about 15-30 minutes in case of delays due to BIOS updates and the like - the default of 60 seconds is WAY too small!
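
For illustration, a script that really powers nodes down and up could be shaped roughly like the sketch below. Everything site-specific in it is an assumption: the "<node>-bmc" BMC hostnames, the IPMI_USER and IPMI_PASSWORD_FILE variables with their defaults, and the convention of installing the same file under names containing "poweron" and "poweroff". It is a generic outline and not a copy of any published script. DRY_RUN defaults to 1 so that the commands are only printed.

```shell
#!/bin/bash
# Sketch of one script usable as both SuspendProgram and ResumeProgram.
# Hypothetical assumptions: BMCs are named "<node>-bmc", credentials come
# from IPMI_USER / IPMI_PASSWORD_FILE, and the action is chosen from the
# name the script is invoked under. Set DRY_RUN=0 to really run ipmitool.

# Build the ipmitool command line for one node and one action (on|off|soft).
ipmi_command() {
    local node=$1 action=$2
    echo "ipmitool -I lanplus -H ${node}-bmc -U ${IPMI_USER:-admin}" \
         "-f ${IPMI_PASSWORD_FILE:-/etc/slurm/ipmi.pass} chassis power ${action}"
}

# Apply the action to every node name given after the first argument.
power_nodes() {
    local action=$1 node cmd
    shift
    for node in "$@"; do
        cmd=$(ipmi_command "$node" "$action")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "DRY RUN: $cmd"
        else
            $cmd
        fi
    done
}

# slurmctld passes a hostlist expression (e.g. node[14-15,32]) as $1;
# "scontrol show hostnames" expands it to one hostname per line.
if [ $# -gt 0 ]; then
    case "$0" in
        *poweron*)  action=on ;;
        *poweroff*) action=soft ;;
        *) echo "unknown invocation name: $0" >&2; exit 1 ;;
    esac
    power_nodes "$action" $(scontrol show hostnames "$1")
fi
```

After a real power-on, slurmd has to start on the node and register with slurmctld before ResumeTimeout expires, otherwise the node is marked down.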

Have you tried to experiment with the IPMI based power down/up method explained in the above presentation? I'd appreciate independent testing of my scripts in https://github.com/OleHolmNielsen/Slurm_tools/tree/master/power_save :-)


Best regards,
Ole
