Re: [slurm-users] powersave: excluding nodes

2023-12-11 Thread Davide DelVento
Forgot to mention: this is with Slurm 23.02.6 (apologies for the double message). On Mon, Dec 11, 2023 at 9:49 AM Davide DelVento wrote: > Following the example from https://slurm.schedmd.com/power_save.html > regarding SuspendExcNodes > > I configured my slurm.conf with > > SuspendExcNodes=node[

Re: [slurm-users] Slurm powersave

2023-12-11 Thread Davide DelVento
In case it's useful to others: I've been able to get this working by having the "no action" script stop the slurmd daemon and start it *with the -b option*. On Fri, Oct 6, 2023 at 4:28 AM Ole Holm Nielsen wrote: > Hi Davide, > > On 10/5/23 15:28, Davide DelVento wrote: > > IMHO, "pretending"

[slurm-users] powersave: excluding nodes

2023-12-11 Thread Davide DelVento
Following the example from https://slurm.schedmd.com/power_save.html regarding SuspendExcNodes, I configured my slurm.conf with

SuspendExcNodes=node[01-12]:2,node[13-32]:2,node[33-34]:1,nodegpu[01-02]:1
SuspendExcStates=down,drain,fail,maint,not_responding,reserved
#SuspendExcParts=

(the nodes in
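For readers unfamiliar with the syntax: as I understand the power_save documentation, a `:N` suffix on a SuspendExcNodes entry asks Slurm to keep N nodes of that set from being powered down, rather than excluding the whole set. The sketch below only illustrates how the value quoted above breaks down into (hostlist, count) pairs. The helper `parse_suspend_exc_nodes` is hypothetical and is not part of Slurm, and it does not expand the hostlist ranges.

```python
import re


def parse_suspend_exc_nodes(spec):
    """Split a SuspendExcNodes value into (hostlist, count) pairs.

    Hypothetical helper for illustration only; not a Slurm API.
    count is None when an entry has no ':N' suffix.
    """
    entries = []
    # Split on commas that are not inside square brackets, so a
    # hostlist such as node[01,03] stays in one piece.
    for item in re.split(r",(?![^\[]*\])", spec):
        hostlist, sep, count = item.partition(":")
        entries.append((hostlist, int(count) if sep else None))
    return entries


spec = "node[01-12]:2,node[13-32]:2,node[33-34]:1,nodegpu[01-02]:1"
for hostlist, count in parse_suspend_exc_nodes(spec):
    print(hostlist, count)
```

Read this way, the configuration above would keep two nodes from each of node[01-12] and node[13-32], and one node from each of node[33-34] and nodegpu[01-02], out of power saving.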

Re: [slurm-users] Disabling SWAP space will it effect SLURM working

2023-12-11 Thread Paul Edmon
We've been running without swap for years with no issues. You may want to set MemSpecLimit in your config to reserve memory for the OS, so that you don't OOM the system with user jobs: https://slurm.schedmd.com/slurm.conf.html#OPT_MemSpecLimit -Paul Edmon- On 12/11/2023 11:19 AM, Davi

Re: [slurm-users] Troubleshooting job stuck in Pending state

2023-12-11 Thread Davide DelVento
By getting "stuck", do you mean the job stays PENDING forever, or that it eventually runs? I've seen the latter (and I agree with you that I wish Slurm would log things like "I looked at this job and I am not starting it yet because") but not the former. On Fri, Dec 8, 2023 at 9:00 AM Pacey, Mike wro

Re: [slurm-users] Disabling SWAP space will it effect SLURM working

2023-12-11 Thread Davide DelVento
A little late here, but yes, everything Hans said is correct, and if you are worried about Slurm (or other critical system software) getting killed by the OOM killer, you can work around it by properly configuring cgroups. On Wed, Dec 6, 2023 at 2:06 AM Hans van Schoot wrote: > Hi Joseph, > > This might depend