Hi,
We use Slurm's power saving mechanism to switch off idle nodes. However,
we don't currently use it for our GPU nodes, because in the past these
nodes failed to wake up again when jobs were submitted to the GPU
partition. Before we look at the issue, due to the current energy
situation
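For anyone following along, the mechanism in question is driven by the
power-saving parameters in slurm.conf. A minimal sketch is below; the
script paths, node names and timeouts are placeholders, and excluding
the GPU nodes via SuspendExcNodes is just one way of keeping them out
of power saving:

# slurm.conf power saving (sketch, placeholder values)
SuspendProgram=/usr/local/sbin/node_suspend.sh
ResumeProgram=/usr/local/sbin/node_resume.sh
# seconds a node must sit idle before it is powered down
SuspendTime=600
# time allowed for a node to power down / come back up
SuspendTimeout=60
ResumeTimeout=600
# keep the GPU nodes out of power saving for now
SuspendExcNodes=gpu[01-04]

SuspendProgram and ResumeProgram point at site-specific scripts that do
the actual power off/on (IPMI, cloud API, and so on).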
On 2022/10/13 03:42, Sopena Ballesteros Manuel wrote:
Dear Slurm user community,
I am new to Slurm and am trying to start slurmd and slurmctld on the same
machine. I started with slurmctld, which is having issues.
slurmctld: ext_sensors/none: init: ExtSensors NONE plugin loaded
slurmctld: debug:
Dear Slurm user community,
I am new to Slurm and am trying to start slurmd and slurmctld on the same
machine. I started with slurmctld, which is having issues.
$ slurmctld -D -f /etc/slurm/slurm.conf -vvv
slurmctld: debug: slurmctld log levels: stderr=debug2 logfile=debug2 syslog=quiet
slurmctld:
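For reference, a minimal single-host slurm.conf for this kind of test
(slurmctld and slurmd on the same machine) looks roughly like the sketch
below; the hostname, CPU count and directories are placeholders and
should match what slurmd -C reports on the node:

ClusterName=test
SlurmctldHost=myhost
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
NodeName=myhost CPUs=4 RealMemory=4000 State=UNKNOWN
PartitionName=debug Nodes=myhost Default=YES MaxTime=INFINITE State=UP

With something like that in place, slurmd can be run in the foreground
as well (slurmd -D -vvv) to watch both sides of the registration.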
Thanks. I don't see anything wrong from that log.
On Fri, Oct 7, 2022 at 7:32 AM Paul Edmon wrote:
>
> The slurmctld log will print out if hosts are out of sync with the
> slurmctld's slurm.conf. That said, it doesn't report on cgroup consistency
> changes like that. It's possible that dialing up
Hi Rob,
On 10/12/22 15:40, Groner, Rob wrote:
Otherwise, I would think that gathering information to make a decision
while in the job_submit.lua would be a normal expectation. Is there
really no way to know how many nodes are up or what features are on the
system while I'm processing in the job_submit.lua?
Well, there are numerous ways to do it, but I was trying to do it as much as
possible from within the slurm infrastructure.
Basically, I want to react when someone submits a job requesting specific
features that aren't actively available yet, and some of the actions I need to
take will involve
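To make that concrete, a rough job_submit.lua sketch of the check I have
in mind is below. How the list of currently active features is obtained
is the part I'm unsure about; here it is simply read from a file
maintained outside Slurm (a placeholder), since node/feature state is not
handed to the plugin directly, and the reaction to a missing feature is
left as a comment.

-- job_submit.lua sketch: notice jobs that request a feature that is not
-- active yet. The feature list file is a placeholder; populate it however
-- suits the site (cron job, provisioning hook, ...).
local ACTIVE_FEATURES_FILE = "/etc/slurm/active_features"

local function read_active_features()
    local active = {}
    local f = io.open(ACTIVE_FEATURES_FILE, "r")
    if f then
        for line in f:lines() do
            active[line] = true
        end
        f:close()
    end
    return active
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.features == nil then
        return slurm.SUCCESS
    end
    local active = read_active_features()
    -- job_desc.features is the raw constraint string (e.g. "a100&nvme"),
    -- so split it on the usual constraint operators for a rough check.
    for feat in string.gmatch(job_desc.features, "[^&|,()!*%s]+") do
        if not active[feat] then
            slurm.log_user(string.format(
                "feature %s is not available yet", feat))
            -- placeholder: hold the job, reroute it, start provisioning, ...
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end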