[slurm-users] GPU-node not waking up after power-save

2022-10-12 Thread Loris Bennett
Hi, We use Slurm's power saving mechanism to switch off idle nodes. However, we don't currently use it for our GPU nodes. This is because in the past these nodes failed to wake up again when jobs were submitted to the GPU partition. Before we look at the issue due to the current energy situation

Re: [slurm-users] Trying to troubleshoot slurmctld start failure

2022-10-12 Thread Kevin Buckley
On 2022/10/13 03:42, Sopena Ballesteros Manuel wrote: Dear Slurm user community, I am new to Slurm and trying to start slurmd and slurmctld on the same machine. I started with slurmctld, which is having issues. slurmctld: ext_sensors/none: init: ExtSensors NONE plugin loaded slurmctld: debug:

[slurm-users] Trying to troubleshoot slurmctld start failure

2022-10-12 Thread Sopena Ballesteros Manuel
Dear Slurm user community, I am new to Slurm and trying to start slurmd and slurmctld on the same machine. I started with slurmctld, which is having issues. $ slurmctld -D -f /etc/slurm/slurm.conf -vvv slurmctld: debug: slurmctld log levels: stderr=debug2 logfile=debug2 syslog=quiet slurmctld:

Re: [slurm-users] Check consistency

2022-10-12 Thread Davide DelVento
Thanks. I don't see anything wrong in that log. On Fri, Oct 7, 2022 at 7:32 AM Paul Edmon wrote: The slurmctld log will print out if hosts are out of sync with the slurmctld slurm.conf. That said, it doesn't report on cgroup consistency changes like that. It's possible that dialing up

Re: [slurm-users] Can sinfo/scontrol be called from job_submit.lua?

2022-10-12 Thread Ole Holm Nielsen
Hi Rob, On 10/12/22 15:40, Groner, Rob wrote: Otherwise, I would think that gathering information to make a decision while in job_submit.lua would be a normal expectation. Is there really no way to know how many nodes are up or what features are on the system while I'm processing in the jo

Re: [slurm-users] Can sinfo/scontrol be called from job_submit.lua?

2022-10-12 Thread Groner, Rob
Well, there are numerous ways to do it, but I was trying to do as much as possible from within the Slurm infrastructure. Basically, I want to react when someone submits a job requesting specific features that aren't actively available yet, and some of the actions I need to take will involve