Hi,

Thank you, *Benson Muite, Michael Smith and Marcus Wagner*, for your support in configuring Slurm.
I have finally been able to set up Slurm on the master and compute nodes. Following the instructions you gave, I checked and corrected the NTP, hostname file and firewall settings.

[root@smaster ~]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug     up     infinite      1   idle snode
hpc*      up     infinite      1   idle smaster
[root@smaster ~]#

Regards,
Zain

On Tue, Feb 2, 2021 at 6:35 PM Zainul Abiddin <zainul1...@gmail.com> wrote:
> Hi All,
>
> I have configured slurmdbd, but when I try to query accounting data with *sacct* I get the error below.
>
> [root@smaster ~]# sacct
> sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
> sacct: error: Sending PersistInit msg: Connection refused
> sacct: error: Problem talking to the database: Connection refused
> [root@smaster ~]#
>
> My slurmdbd configuration:
> [root@smaster ~]# cat /etc/slurm/slurmdbd.conf
> AuthType=auth/munge
> DbdAddr=localhost
> DbdHost=localhost
> SlurmUser=slurm
> DebugLevel=4
> LogFile=/var/log/slurm/slurmdbd.log
> PidFile=/var/run/slurmdbd.pid
> StorageType=accounting_storage/mysql
> StorageHost=localhost
> StoragePass=password
> StorageUser=slurm
> StorageLoc=slurm_acct_db
>
> [root@smaster ~]# chown slurm: /etc/slurm/slurmdbd.conf
> [root@smaster ~]# chmod 600 /etc/slurm/slurmdbd.conf
> [root@smaster ~]# mkdir /var/log/slurm
> [root@smaster ~]# touch /var/log/slurm/slurmdbd.log
> [root@smaster ~]# chown slurm: /var/log/slurm/slurmdbd.log
> [root@smaster ~]# scontrol show config | grep AccountingStorageHost
> AccountingStorageHost = localhost
>
> Note:
> I edited /etc/slurm/slurm.conf and changed the following line:
> # LOGGING AND ACCOUNTING
> AccountingStorageType=accounting_storage/slurmdbd
> Then I restarted all the services.
>
> [root@smaster ~]# for i in munge slurmd slurmctld slurmdbd; do service $i status; done
> Redirecting to /bin/systemctl status munge.service
> ● munge.service - MUNGE authentication service
>    Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2021-02-02 13:21:10 IST; 3h 36min ago
>      Docs: man:munged(8)
>  Main PID: 20613 (munged)
>    CGroup: /system.slice/munge.service
>            └─20613 /usr/sbin/munged
>
> Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Stopped MUNGE authentication service.
> Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Starting MUNGE authentication service...
> Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Started MUNGE authentication service.
> Redirecting to /bin/systemctl status slurmd.service
> ● slurmd.service - Slurm node daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2021-02-02 13:21:10 IST; 3h 36min ago
>  Main PID: 20637 (slurmd)
>    CGroup: /system.slice/slurmd.service
>            └─20637 /usr/sbin/slurmd -D
>
> Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Started Slurm node daemon.
> Feb 02 15:30:47 smaster.calligotech.com slurmd[20637]: slurmd: Launching batch job 7 for UID 0
> Feb 02 15:31:46 smaster.calligotech.com slurmd[20637]: slurmd: Launching batch job 8 for UID 0
> Feb 02 15:33:43 smaster.calligotech.com slurmd[20637]: slurmd: Launching batch job 9 for UID 0
>
> Redirecting to /bin/systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2021-02-02 13:21:11 IST; 3h 36min ago
>  Main PID: 20660 (slurmctld)
>    CGroup: /system.slice/slurmctld.service
>            └─20660 /usr/sbin/slurmctld -D
>
> Feb 02 13:21:11 smaster.calligotech.com systemd[1]: Started Slurm controller daemon.
> Redirecting to /bin/systemctl status slurmdbd.service
> ● slurmdbd.service - Slurm DBD accounting daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2021-02-02 16:29:11 IST; 28min ago
>  Main PID: 24146 (slurmdbd)
>    CGroup: /system.slice/slurmdbd.service
>            └─24146 /usr/sbin/slurmdbd -D
>
> Feb 02 16:29:11 smaster.calligotech.com systemd[1]: Started Slurm DBD accounting daemon.
> [root@smaster ~]# srun --ntasks=2 --label /bin/hostname
> srun: job 22 queued and waiting for resources
> srun: job 22 has been allocated resources
> 1: smaster.calligotech.com
> 0: smaster.calligotech.com
> [root@smaster ~]#
>
> However, when I run the following command:
>
> [root@smaster ~]# sacct
> sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
> sacct: error: Sending PersistInit msg: Connection refused
> sacct: error: Problem talking to the database: Connection refused
> [root@smaster ~]#
>
> I tried the following troubleshooting steps:
>
> [root@smaster ~]# telnet localhost 6819
> Trying ::1...
> telnet: connect to address ::1: Connection refused
> Trying 127.0.0.1...
> telnet: connect to address 127.0.0.1: Connection refused
> [root@smaster ~]#
>
> [root@smaster ~]# mysql -p -u slurm slurm_acct_db
> Enter password:
> Welcome to the MariaDB monitor. Commands end with ; or \g.
> Your MariaDB connection id is 9
> Server version: 10.1.48-MariaDB MariaDB Server
>
> Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
>
> Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
>
> MariaDB [slurm_acct_db]> show tables;
> Empty set (0.00 sec)
>
> MariaDB [slurm_acct_db]>
>
> Then I added DbdPort and restarted the services:
> [root@smaster ~]# cat /etc/slurm/slurmdbd.conf
> AuthType=auth/munge
> DbdAddr=localhost
> DbdHost=localhost
> *DbdPort=6819*
> SlurmUser=slurm
> DebugLevel=4
> LogFile=/var/log/slurm/slurmdbd.log
> PidFile=/var/run/slurmdbd.pid
> StorageType=accounting_storage/mysql
> StorageHost=localhost
> StoragePass=password
> StorageUser=slurm
> StorageLoc=slurm_acct_db
> [root@smaster ~]#
>
> [root@smaster ~]# for i in munge slurmd slurmctld slurmdbd; do service $i status; done
> Redirecting to /bin/systemctl status munge.service
> ● munge.service - MUNGE authentication service
>    Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2021-02-02 13:21:10 IST; 3h 55min ago
>      Docs: man:munged(8)
>  Main PID: 20613 (munged)
>    CGroup: /system.slice/munge.service
>            └─20613 /usr/sbin/munged
>
> Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Stopped MUNGE authentication service.
> Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Starting MUNGE authentication service...
> Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Started MUNGE authentication service.
> Redirecting to /bin/systemctl status slurmd.service
> ● slurmd.service - Slurm node daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2021-02-02 13:21:10 IST; 3h 55min ago
>  Main PID: 20637 (slurmd)
>    CGroup: /system.slice/slurmd.service
>            └─20637 /usr/sbin/slurmd -D
>
> Feb 02 15:30:47 smaster.calligotech.com slurmd[20637]: slurmd: Launching batch job 7 for UID 0
> Feb 02 15:31:46 smaster.calligotech.com slurmd[20637]: slurmd: Launching batch job 8 for UID 0
> Feb 02 15:33:43 smaster.calligotech.com slurmd[20637]: slurmd: Launching batch job 9 for UID 0
> Feb 02 15:38:45 smaster.calligotech.com slurmd[20637]: slurmd: Launching batch job 12 for UID 0
>
> Redirecting to /bin/systemctl status slurmctld.service
> ● slurmctld.service - Slurm controller daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2021-02-02 13:21:11 IST; 3h 55min ago
>  Main PID: 20660 (slurmctld)
>    CGroup: /system.slice/slurmctld.service
>            └─20660 /usr/sbin/slurmctld -D
>
> Feb 02 13:21:11 smaster.calligotech.com systemd[1]: Started Slurm controller daemon.
> Redirecting to /bin/systemctl status slurmdbd.service
> ● slurmdbd.service - Slurm DBD accounting daemon
>    Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; vendor preset: disabled)
>    Active: active (running) since Tue 2021-02-02 16:29:11 IST; 47min ago
>  Main PID: 24146 (slurmdbd)
>    CGroup: /system.slice/slurmdbd.service
>            └─24146 /usr/sbin/slurmdbd -D
>
> Feb 02 16:29:11 smaster.calligotech.com systemd[1]: Started Slurm DBD accounting daemon.
> [root@smaster ~]# ps -ef |grep slurm
> root     20637     1  0 13:21 ?        00:00:00 /usr/sbin/slurmd -D
> slurm    20660     1  0 13:21 ?        00:00:08 /usr/sbin/slurmctld -D
> root     24146     1  0 16:29 ?        00:00:00 /usr/sbin/slurmdbd -D
> root     25395 18378  0 17:17 pts/2    00:00:00 grep --color=auto slurm
> [root@smaster ~]# sacct
> sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
> sacct: error: Sending PersistInit msg: Connection refused
> sacct: error: Problem talking to the database: Connection refused
> [root@smaster ~]#
>
> [root@smaster ~]# tail /var/log/slurm/slurmdbd.log
> [2021-02-02T17:16:01.913] error: mysql_real_connect failed: 2005 Unknown MySQL server host 'smater' (-2)
> [2021-02-02T17:16:01.913] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
> [2021-02-02T17:16:06.963] error: mysql_real_connect failed: 2005 Unknown MySQL server host 'smater' (-2)
> [2021-02-02T17:16:06.963] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
> [2021-02-02T17:16:12.083] error: mysql_real_connect failed: 2005 Unknown MySQL server host 'smater' (-2)
> [2021-02-02T17:16:12.083] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
> [2021-02-02T17:16:17.140] error: mysql_real_connect failed: 2005 Unknown MySQL server host 'smater' (-2)
> [2021-02-02T17:16:17.141] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
> [2021-02-02T17:16:22.804] error: mysql_real_connect failed: 2005 Unknown MySQL server host 'smater' (-2)
> [2021-02-02T17:16:22.804] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
> [root@smaster ~]#
>
> The problem remains the same. Please help me resolve this issue.
>
> Regards,
> Zain
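The quoted slurmdbd.log points at a likely cause. slurmdbd was trying to reach a MySQL host named 'smater' (apparently a misspelling of smaster), even though the pasted slurmdbd.conf says StorageHost=localhost. The slurmdbd start time (16:29:11) is also identical in both status listings, which suggests the daemon was not actually restarted after the DbdPort edit and was possibly still running with an older configuration. While it keeps retrying the database ("Trying again in 5 seconds"), slurmdbd appears not to accept connections on port 6819, which would be consistent with the "Connection refused" from both sacct and telnet. A minimal bash sketch of those two checks follows. It assumes a glibc system with getent and the iproute2 ss tool installed; the helper names conf_value and check_slurmdbd are hypothetical, and falling back to localhost when StorageHost is unset is an assumption (6819 is the default DbdPort).

```shell
#!/usr/bin/env bash
# Sketch of two slurmdbd sanity checks (not an official Slurm tool).
# Assumptions: glibc `getent` and iproute2 `ss` are installed, and the
# config path defaults to /etc/slurm/slurmdbd.conf as used in this thread.

# conf_value KEY FILE  (hypothetical helper)
# Print the value of KEY from a Slurm-style Key=Value file.
# Keys match case-insensitively, comment lines and trailing comments are
# ignored, and the last occurrence wins. Prints nothing if KEY is absent.
conf_value() {
  local key=$1 file=$2
  awk -F'=' -v k="$key" '
    /^[ \t]*#/ { next }                          # skip comment lines
    { name = $1; gsub(/[ \t]/, "", name) }       # key with whitespace removed
    index($0, "=") > 0 && tolower(name) == tolower(k) {
      val = substr($0, index($0, "=") + 1)       # text after the first =
      sub(/[ \t]*#.*/, "", val)                  # drop a trailing comment
      gsub(/^[ \t]+|[ \t]+$/, "", val)           # trim surrounding whitespace
      found = val
    }
    END { if (found != "") print found }
  ' "$file"
}

# check_slurmdbd [CONF]  (hypothetical helper)
# Report whether StorageHost resolves and whether anything listens on DbdPort.
check_slurmdbd() {
  local conf=${1:-/etc/slurm/slurmdbd.conf}
  local host port
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 1
  fi

  host=$(conf_value StorageHost "$conf")
  host=${host:-localhost}   # assumed fallback when StorageHost is unset
  port=$(conf_value DbdPort "$conf")
  port=${port:-6819}        # 6819 is the default DbdPort

  if getent hosts "$host" >/dev/null; then
    echo "OK:   StorageHost '$host' resolves"
  else
    echo "FAIL: StorageHost '$host' does not resolve (check spelling, /etc/hosts, DNS)"
  fi

  if ss -ltn "sport = :$port" | grep -q LISTEN; then
    echo "OK:   a socket is listening on port $port"
  else
    echo "FAIL: nothing is listening on port $port (is slurmdbd still retrying the database?)"
  fi
}
```

Usage, assuming the two functions are saved in a file named check-slurmdbd.sh (a hypothetical name): run "source check-slurmdbd.sh" and then "check_slurmdbd /etc/slurm/slurmdbd.conf". The script reads the file on disk, not the running daemon's settings, so after editing slurmdbd.conf restart the daemon with "systemctl restart slurmdbd" and confirm that the "Active: ... since" time shown by "systemctl status slurmdbd" has changed.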