In case it's useful to others: I've been able to get this working by having the "no action" script stop the slurmd daemon and start it *with the -b option*.
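For anyone who wants a starting point, here is a rough sketch of the idea, not my exact script. It assumes several things that are not stated above: that the suspend side stops slurmd and the resume side starts it with -b, that slurmd runs as a systemd unit named slurmd on the compute nodes, and that the user running the script on the slurmctld host has passwordless ssh to the nodes with enough privilege to do this. The function names are made up for illustration. Slurm passes the affected nodes to SuspendProgram and ResumeProgram as a hostlist expression in the first argument, which `scontrol show hostnames` can expand.

```python
import subprocess


def expand_hostlist(hostlist):
    """Expand a Slurm hostlist expression such as 'node[01-03]' into hostnames."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", hostlist],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def node_commands(action, nodes):
    """Return one ssh command (as an argv list) per node for the given action."""
    if action == "suspend":
        # Assumption: slurmd is managed by systemd on the compute nodes.
        remote = ["systemctl", "stop", "slurmd"]
    elif action == "resume":
        # -b makes slurmd report the node as rebooted when it starts.
        remote = ["slurmd", "-b"]
    else:
        raise ValueError(f"unknown action: {action!r}")
    return [["ssh", node, *remote] for node in nodes]


def run(action, hostlist):
    """Run the suspend or resume command on every node in the hostlist."""
    for cmd in node_commands(action, expand_hostlist(hostlist)):
        subprocess.run(cmd, check=True)
```

A small wrapper that calls `run("suspend", sys.argv[1])` could then be set as SuspendProgram, and one that calls `run("resume", sys.argv[1])` as ResumeProgram. Keeping the command construction in `node_commands` separate from the execution makes it easy to print the commands for a dry run before pointing slurm.conf at the scripts.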

On Fri, Oct 6, 2023 at 4:28 AM Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

> Hi Davide,
>
> On 10/5/23 15:28, Davide DelVento wrote:
> > IMHO, "pretending" to power down nodes defies the logic of the Slurm
> > power_save plugin.
> >
> > And it is sure useless ;)
> > But I was using the suggestion from
> > https://slurm.schedmd.com/power_save.html which says
> >
> > You can also configure Slurm with programs that perform no action as
> > *SuspendProgram* and *ResumeProgram* to assess the potential impact of
> > power saving mode before enabling it.
>
> I had not noticed the above sentence in the power_save manual before! So
> I decided to test a "no action" power saving script, similar to what you
> have done, applying it to a test partition. I conclude that "no action"
> power saving DOES NOT WORK, at least in Slurm 23.02.5. So I opened a bug
> report https://bugs.schedmd.com/show_bug.cgi?id=17848 to find out if the
> documentation is obsolete, or if there may be a bug. Please follow that
> bug to find out the answer from SchedMD.
>
> What I *believe* (but not with 100% certainty) really happens with power
> saving in the current Slurm versions is what I wrote yesterday:
>
> > Slurmctld expects suspended nodes to *really* power
> > down (slurmd is stopped). When slurmctld resumes a suspended node, it
> > expects slurmd to start up when the node is powered on. There is a
> > ResumeTimeout parameter which I've set to about 15-30 minutes in case of
> > delays due to BIOS updates and the like - the default of 60 seconds is
> > WAY too small!
>
> I hope this helps,
> Ole