I believe the error you need to pay attention to here is this line:
 
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
 
 
It looks like your compute node's clock is a full day ahead of your controller 
node's: the credentials in your log are ENCODED on Fri Dec 02 but DECODED on 
Thu Dec 01. The clocks need to be in sync for munge to work.
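
To put a number on the mismatch, here is a minimal Python sketch that subtracts the DECODED timestamp from the ENCODED one and compares the result with the credential TTL. The two timestamps and the TTL of 300 are copied from the log and the unmunge output quoted below; the function name and the simple "within TTL" check are my own illustration, not how munge itself decides, so treat it as a rough sanity check only.

```python
from datetime import datetime

# Timestamps copied from the slurmctld log quoted in this thread.
ENCODED = "Fri Dec 02 16:16:55 2022"  # stamped by the compute node
DECODED = "Thu Dec 01 16:17:20 2022"  # stamped by the controller
TTL_SECONDS = 300                     # TTL value shown in the unmunge output

# Format used by the _print_cred log lines (English day/month names).
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def clock_skew_seconds(encoded: str, decoded: str) -> int:
    """Return encode time minus decode time, in seconds.

    A positive value means the credential looks like it was created in
    the future, i.e. the encoding host's clock is ahead of the decoder's.
    (Hypothetical helper for illustration; not part of munge or Slurm.)
    """
    enc = datetime.strptime(encoded, LOG_TIME_FORMAT)
    dec = datetime.strptime(decoded, LOG_TIME_FORMAT)
    return int((enc - dec).total_seconds())


skew = clock_skew_seconds(ENCODED, DECODED)
print(f"skew: {skew} s (about {skew / 3600:.1f} h), TTL: {TTL_SECONDS} s")
# Assumption: a skew larger than the TTL is far too large to be tolerated.
print("within TTL" if abs(skew) <= TTL_SECONDS else "outside TTL: fix the clocks")
```

With the values above the skew is just under 24 hours, far beyond the 300 second TTL, which fits the "Rewound credential" message. Note that running munge -n | unmunge on a single node cannot show this, because encoding and decoding both use that node's own clock.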
 
Mike Robbert
Cyberinfrastructure Specialist, Cyberinfrastructure and Advanced Research 
Computing
Information and Technology Solutions (ITS)
303-273-3786 | mrobb...@mines.edu  

 
 
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Nousheen 
<nousheenparv...@gmail.com>
Date: Thursday, December 1, 2022 at 06:19
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [External] [slurm-users] ERROR: slurmctld: auth/munge: _print_cred: 
DECODED

Hello Everyone,

 

I am using Slurm version 21.08.5 on CentOS 7.

 

slurmd starts successfully on all compute nodes, but when I start slurmctld on 
the server node it reports the following errors:

 

(base) [nousheen@nousheen ~]$ systemctl status slurmctld.service -l
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-12-01 12:00:42 PKT; 4h 16min ago
 Main PID: 1631 (slurmctld)
    Tasks: 10
   Memory: 4.0M
   CGroup: /system.slice/slurmctld.service
           ├─1631 /usr/sbin/slurmctld -D -s
           └─1818 slurmctld: slurmscriptd

Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:19 2022
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:55 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:20 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:56 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:21 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks

 

When I run the following command on a compute node, I get this output:

 

 [gpu101@101 ~]$ munge -n | unmunge

STATUS:           Success (0)
ENCODE_HOST:      ??? (0.0.0.101)
ENCODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
DECODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha1 (3)
ZIP:              none (0)
UID:              gpu101 (1000)
GID:              gpu101 (1000)
LENGTH:           0
 

Is this error caused by ENCODE_HOST showing question marks and munge not 
picking up the IP address correctly? How can I correct this? All the nodes stop 
responding when I run a job, even though I have synced the clocks across the 
cluster.

 

I am new to Slurm. Kindly guide me in this matter.

 

 



Best Regards,

Nousheen Parvaiz
Ph.D. Scholar 
 







