I don't know how many times I've read the docs; I keep thinking I understand
them, but something is really wrong with prioritisation on our cluster, and we're
struggling to understand why.
The setup:
1. We have a group who submit two types of work: production jobs and
research jobs.
2. We
At this point, I’d probably crank up the logging some and see what it’s saying
in slurmctld.log.
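
For what it's worth, here is a minimal sketch of what cranking up the logging might look like. `scontrol setdebug` changes the slurmctld log level at runtime, without a restart. The wrapper names `valid_level` and `set_ctld_debug` are made up for illustration, and this assumes `scontrol` is on the PATH and you are running as a Slurm admin.

```shell
# Sketch only: raise (or restore) the slurmctld log level at runtime.
# valid_level and set_ctld_debug are hypothetical helper names.

# Return 0 if the argument is one of the level names scontrol setdebug accepts.
valid_level() {
    case "$1" in
        quiet|fatal|error|info|verbose|debug|debug2|debug3|debug4|debug5) return 0 ;;
        *) return 1 ;;
    esac
}

# Validate the level, check that scontrol exists, then apply the level.
set_ctld_debug() {
    ctld_level="$1"
    if ! valid_level "$ctld_level"; then
        echo "unknown log level: $ctld_level" >&2
        return 2
    fi
    if ! command -v scontrol >/dev/null 2>&1; then
        echo "scontrol not found on PATH" >&2
        return 127
    fi
    scontrol setdebug "$ctld_level"
}
```

Something like `set_ctld_debug debug2`, reproduce the problem, then `set_ctld_debug info` to drop it back down (assuming `info` was your level to begin with). `scontrol show config | grep -i SlurmctldLogFile` should tell you where the log actually lives.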
--
#BlackLivesMatter
|| \\UTGERS, |---*O*---
||_// the State | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technolo
On 11/27/24 11:38 am, Kent L. Hanson via slurm-users wrote:
I have restarted the slurmctld and slurmd services several times. I
hashed the slurm.conf files. They are the same. I ran “sinfo -a” as root
with the same result.
Are your nodes in the `FUTURE` state perhaps? What does this show?
si
Hey Ryan,
I have restarted the slurmctld and slurmd services several times. I hashed the
slurm.conf files. They are the same. I ran "sinfo -a" as root with the same
result.
Thanks,
Kent
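
For anyone following along, the "hashed the slurm.conf files" check can be sketched roughly like this. It assumes `sha256sum` (coreutils) is available; `conf_digest` and `same_conf` are hypothetical helper names, and the file paths in the usage note are assumptions, not taken from this thread.

```shell
# Sketch only: compare two copies of slurm.conf by SHA-256 digest.
# conf_digest and same_conf are hypothetical helper names.

# Print the SHA-256 digest of one file (digest only, no file name).
conf_digest() {
    sha256sum "$1" | awk '{print $1}'
}

# Return 0 if both files have identical content, non-zero otherwise.
same_conf() {
    [ "$(conf_digest "$1")" = "$(conf_digest "$2")" ]
}
```

For example, after copying a compute node's config to the head node as `/tmp/slurm.conf.node` (path is an assumption), `same_conf /etc/slurm/slurm.conf /tmp/slurm.conf.node && echo identical` would confirm the two match. Matching files only rule out config drift; the daemons still need to have re-read the file.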
From: Ryan Novosielski
Sent: Wednesday, November 27, 2024 9:31 AM
To: Kent L. Hanson
Cc: slurm-user
If you’re sure you’ve restarted everything after the config change, are you
also sure that you don’t have that stuff hidden from your current user? You can
try -a to rule that out. Or run as root.
Hello Ole,
I have no firewall on the compute nodes and I have the internal interfaces on
kadmin2, opa and eth, in the trusted zone of the firewall. It should allow
everything through. I'm using RHEL 9.4. I built the rpm packages from source
using the admin guide https://slurm.schedmd.com/quickst
Hi Kent,
This problem could perhaps be due to your firewall setup. What is your
OS, and did you install Slurm by RPM packages or what?
Does sinfo work on your SlurmctldHost=kadmin2? Is the "headnode" a
different host? Try stopping the firewalld service.
You can see some advice on firewal
I am doing a new install of Slurm 24.05.3. I have all the packages built and
installed on the headnode and compute node with the same munge.key, slurm.conf, and
gres.conf files. I was able to run the munge and unmunge commands to test munge
successfully. Time is synced with chronyd. I can't seem to find a