Hello everyone,

I am using Slurm version 21.08.5 on CentOS 7.
slurmd starts successfully on all compute nodes, but when I start slurmctld on the server node it reports the following errors:

(base) [nousheen@nousheen ~]$ systemctl status slurmctld.service -l
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-12-01 12:00:42 PKT; 4h 16min ago
 Main PID: 1631 (slurmctld)
    Tasks: 10
   Memory: 4.0M
   CGroup: /system.slice/slurmctld.service
           ├─1631 /usr/sbin/slurmctld -D -s
           └─1818 slurmctld: slurmscriptd

Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:19 2022
Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:55 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:20 2022
Dec 01 16:17:20 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Munge decode failed: Rewound credential
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:56 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:21 2022
Dec 01 16:17:21 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks

When I run the following command on a compute node, I get this output:

[gpu101@101 ~]$ munge -n | unmunge
STATUS:           Success (0)
ENCODE_HOST:      ??? (0.0.0.101)
ENCODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
DECODE_TIME:      2022-12-02 16:33:38 +0500 (1669980818)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha1 (3)
ZIP:              none (0)
UID:              gpu101 (1000)
GID:              gpu101 (1000)
LENGTH:           0

Is this error caused by ENCODE_HOST showing question marks instead of a host name, and by munge not picking up the IP address correctly? How can I correct this? All the nodes become non-responsive whenever I run a job, even though I have all the clocks synced across the cluster.

I am new to Slurm. Kindly guide me in this matter.

Best regards,
Nousheen Parvaiz
Ph.D. Scholar
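
Note: as a cross-check on the clock question, the ENCODED and DECODED timestamps that slurmctld prints can be compared directly. The sketch below is only an illustration, not part of Slurm or MUNGE: it parses one ENCODED/DECODED pair copied from the log above and reports the difference in seconds. The helper names `cred_time` and `skew_seconds` are hypothetical, and it assumes both timestamps are printed in the same time zone.

```python
from datetime import datetime

# One ENCODED/DECODED pair copied verbatim from the slurmctld log in this post.
LOG_LINES = [
    "slurmctld: auth/munge: _print_cred: ENCODED: Fri Dec 02 16:16:55 2022",
    "slurmctld: auth/munge: _print_cred: DECODED: Thu Dec 01 16:17:20 2022",
]

def cred_time(line, label):
    """Return the timestamp that follows 'ENCODED:' or 'DECODED:' in a log line."""
    stamp = line.split(label + ":", 1)[1].strip()
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")

def skew_seconds(encoded_line, decoded_line):
    """Seconds by which the encode time is ahead of the decode time."""
    enc = cred_time(encoded_line, "ENCODED")
    dec = cred_time(decoded_line, "DECODED")
    return (enc - dec).total_seconds()

skew = skew_seconds(LOG_LINES[0], LOG_LINES[1])
print(f"encode time is {skew:.0f} s ahead of decode time")
```

For this pair the encode time is almost a full day ahead of the decode time, far more than the TTL of 300 seconds shown in the unmunge output, so it may be worth comparing the output of `date` on the controller and on a compute node before looking further at ENCODE_HOST.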