Have you restarted munge on all hosts?

On 6/25/19 4:38 PM, Valerio Bellizzomi wrote:
On Tue, 2019-06-25 at 16:32 +0200, Valerio Bellizzomi wrote:
On Tue, 2019-06-25 at 08:48 -0400, Eli V wrote:
My first guess would be that the host is not listed as one of the two
controllers in the slurm.conf. Also, keep in mind that munge, and thus
slurm, is very sensitive to a lack of clock synchronization between
nodes. FYI, I run a hand-built slurm 18.08.07 on Debian 8 & 9 without
issues. Haven't tried 10 yet.
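
A rough way to check the clock synchronization point is to compare epoch
seconds between two hosts. This is only a sketch: the helper names are made
up for illustration, and the 300-second limit is an assumption based on
munge's default credential lifetime, not a value taken from this thread.

```shell
# Hypothetical helpers, not part of Slurm or munge.

clock_skew() {
    # $1: local epoch seconds, $2: remote epoch seconds
    # Prints the absolute difference in seconds.
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

skew_ok() {
    # $1: skew in seconds, $2: allowed limit (assumed default: 300)
    [ "$1" -le "${2:-300}" ]
}
```

It could be used as `clock_skew "$(date +%s)" "$(ssh node01 date +%s)"`,
where node01 stands in for a real compute node. The ssh round trip adds
some delay of its own, so small values are not meaningful.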
I have discovered that Slurm is also sensitive to computer names.
The controller was listed, but with a dot and a domain name. I removed
the dot and the domain name, and that resolved the problem.
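
The name mismatch described above can be spotted with a small script. This
is a sketch under assumptions: it expects the controllers to be declared
with SlurmctldHost, ControlMachine or BackupController, spelled exactly
like that and one per line, and the function names are invented for
illustration.

```shell
# Hypothetical helpers, not part of Slurm.

controller_names() {
    # $1: path to slurm.conf
    # Prints each configured controller name, one per line.
    sed -n -E 's/^[[:space:]]*(SlurmctldHost|ControlMachine|BackupController)[[:space:]]*=[[:space:]]*([^[:space:](#]+).*/\2/p' "$1"
}

short_name() {
    # Drop everything from the first dot: master02.example.org -> master02
    printf '%s\n' "${1%%.*}"
}

check_controller() {
    # $1: path to slurm.conf
    # $2: name this host reports for itself (default: hostname -s)
    conf=$1
    me=${2:-$(hostname -s)}
    controller_names "$conf" | while read -r name; do
        if [ "$name" = "$me" ]; then
            echo "ok: $name"
        elif [ "$(short_name "$name")" = "$me" ]; then
            echo "mismatch: $name (carries a domain; short name is $me)"
        else
            echo "other: $name"
        fi
    done
}
```

Running `check_controller /etc/slurm-llnl/slurm.conf` on the controller
should print a "mismatch" line for an entry that differs from the host's
own name only by a trailing domain.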

Now I have another problem: the slurmd on the compute node refuses to
connect to the controller, with this error: Protocol authentication error

The exact error on the controller is "Invalid credentials". I have
copied the munge.key to both hosts, but the error persists.
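
One way to confirm that the key really is identical on both hosts is to
compare checksums instead of trusting the copy. The following is a sketch
only: the function names are invented, and /etc/munge/munge.key is the
usual key location, assumed here rather than taken from this thread.

```shell
# Hypothetical helpers, not part of munge.

key_digest() {
    # $1: path to a munge key file
    # Prints only the SHA-256 digest, so the key itself is never shown.
    sha256sum "$1" | cut -d ' ' -f 1
}

keys_match() {
    # $1, $2: digests collected on two different hosts
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

Run `key_digest /etc/munge/munge.key` as root on each host and compare the
two values. If they match, `munge -n | unmunge` tests a credential locally
and `munge -n | ssh node01 unmunge` tests one across hosts (node01 is a
placeholder). munged reads the key at startup, so it needs a restart after
the key file is replaced.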

On Tue, Jun 25, 2019 at 1:50 AM Valerio Bellizzomi <vale...@selnet.org> wrote:
I have installed slurmctld on Debian Testing and am trying to start the
daemon by hand:

# /usr/sbin/slurmctld -D -v -f /etc/slurm-llnl/slurm.conf
slurmctld: error: High latency for 1000 calls to gettimeofday(): 2072
slurmctld: pidfile not locked, assuming no running daemon
slurmctld: slurmctld version 18.08.5-2 started on cluster selroc
slurmctld: Munge cryptographic signature plugin loaded
slurmctld: error: This host (master02/master02) not a valid controller


Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
