Thanks for the advice. I checked munge's log on the system that was most
recently affected and found a few hundred of these:
2022-08-16 23:30:56 +0300 Info: Unauthorized credential for client
UID=0 GID=0
Not sure if relevant. NTP on the system is synced. I'll keep an eye on
munge in the future.
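In case it helps anyone else who hits this: my understanding is that
"Unauthorized credential" usually points at a munge.key mismatch between
the node and the controller, though I haven't confirmed that's what
happened here. A quick way to check (default paths assumed, <controller>
is a placeholder):

# cksum /etc/munge/munge.key            # run on both hosts; checksums must match
# munge -n | ssh <controller> unmunge   # encode locally, decode remotely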
When I see odd behaviour like this, I've found it is sometimes related to
either NTP issues (the time is off) or munge errors:
* Is NTP running and is the time accurate?
* Look for munge errors:
  * /var/log/munge/munged.log
  * sudo systemctl status munge
If it's a munge error, usually restarting munged fixes it; a quick check
sequence is sketched below.
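For what it's worth, the quick sequence I run looks roughly like this
(assuming chrony for timekeeping and the default munge log path; adjust
for your setup):

# timedatectl                   # "System clock synchronized: yes" is the goal
# chronyc tracking              # how far off the clock actually is
# grep -i error /var/log/munge/munged.log
# systemctl restart munge       # then restart slurmd on the node if needed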
I re-installed SLURM 22.05.3 and then restarted slurmd, and now it's working:
# dnf reinstall slurm slurm-slurmd slurm-devel slurm-pam_slurm
# systemctl restart slurmd
The dnf.log shows that the versions were the same, so there was no version
mismatch involved:
2022-08-16T23:29:02+0300 DEBUG Reinstall
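If anyone wants to double-check the same thing without digging through
dnf.log, comparing the daemon versions and installed packages directly
should be enough (assuming an RPM-based install like mine):

# slurmd -V            # version of the node daemon
# slurmctld -V         # on the controller
# rpm -qa 'slurm*'     # installed Slurm packages and their versions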
>> Hello,
>>
>> I have a problem with Slurm and I cannot find the solution.
>> When I launch an srun to allocate an interactive node:
>> srun --partition=default --nodes=1 --time=01:00:00 --pty bash -i
>>
>> the srun message doesn't appear, but if I have a job launched on the node and
>> try a
Dear list,
I've been using cgroupsv2 with SLURM 22.05 on CentOS Stream 8 successfully
for a few months now. Recently a few of my nodes have started having
problems starting slurmd. The log shows:
[2022-08-16T20:52:58.439] slurmd version 22.05.3 started
[2022-08-16T20:52:58.439] error: Controller
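For context, the first things I'm checking on an affected node are below;
nothing conclusive yet, so treat this as where I'm looking rather than a
diagnosis:

# scontrol ping        # can this node reach the slurmctld host(s)?
# munge -n | unmunge   # does munge still decode credentials locally?
# slurmd -C            # hardware slurmd detects, to compare with slurm.conf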
Maybe this was a noob question; I've just solved my problem.
I'll share my thoughts: I returned to my original settings
and reran the Ansible playbook, reconfiguring SlurmdSpoolDir.
* https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir_1
Maybe it is writable by root, because root can
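For completeness, verifying this boils down to comparing what slurm.conf
points at with what is actually on disk, e.g. (the path below is only the
common default; use whatever the first command reports):

# scontrol show config | grep SlurmdSpoolDir
# ls -ld /var/spool/slurmd     # ownership and mode of that directory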
Hello!
Is there a way to turn off slurm MPI hooks?
A job submitted via sbatch executes Intel MPI and the thread affinity
settings are incorrect.
However, running MPI manually over SSH works and all bindings are correct.
We are looking to run our MPI jobs via Slurm sbatch and get the same
behaviour.
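To make the question more concrete, the direction I've been experimenting
with is handing Intel MPI an explicit host list and its own SSH launcher
inside the batch script, so Slurm's task launch and affinity plugins stay
out of the way; the node/task counts and ./my_mpi_app below are
placeholders, and I haven't confirmed this is the right knob, hence the
question:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
# Assumption: let Intel MPI's hydra launcher place and pin the ranks
# itself, as it does when we start the job over plain SSH.
scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts.txt
export I_MPI_HYDRA_BOOTSTRAP=ssh
mpirun -np "$SLURM_NTASKS" -f hosts.txt ./my_mpi_app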
Dear Slurm Users,
recently I started a new instance of my cluster with Slurm 22.05.2
(built from source). Everything seems to be configured properly and
working fine except "sbatch". The error is quite self-explanatory, and
I thought it would be easy to fix the directory permissions.
slur
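While I keep digging, my current guess is that it is one of the
directories the daemons write to, since I built from source and nothing
was pre-created for me; what I'm comparing (paths are whatever scontrol
reports, not necessarily these defaults):

# scontrol show config | egrep 'StateSaveLocation|SlurmdSpoolDir'
# ls -ld /var/spool/slurmctld /var/spool/slurmd   # writable by SlurmUser / root respectively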
Dear list,
Twice this month I've had jobs stuck in completing state (CG). When I go to
the compute node and check slurmd.log I see a message about "incompatible
plugin version", for example:
[2022-08-16T03:36:25.823] [748139.batch] done with job
[2022-08-16T12:54:21.404] [748139.extern] plugin_lo
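In case it helps to frame the question: my working theory is a
version/plugin skew on that node (not confirmed), so the first thing I
compare is the daemon versions; and is restarting slurmd on the node the
reasonable way to clear the stuck CG job, or is there something better?

# slurmd -V                  # on the affected node
# scontrol version           # on the controller
# systemctl restart slurmd   # what I'd fall back to on the node holding the CG job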