Many thanks, William. That may have been the issue. I changed the hostname to the FQDN and set "StorageHost=localhost", and now slurmctld at least seems to try connecting to the database.
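A minimal sketch for double-checking the hostname change described above, along the lines of William's /etc/hosts suggestion quoted below. It assumes a POSIX shell with getent and awk available; getent looks names up through the sources listed in /etc/nsswitch.conf (normally /etc/hosts, then DNS). The function name check_resolves is hypothetical, and mannose and localhost are simply the names used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the first address each name resolves to via NSS
# (getent consults /etc/hosts, DNS, etc. in the order set in /etc/nsswitch.conf).
# Returns non-zero as soon as one name fails to resolve.
check_resolves() {
    for name in "$@"; do
        addr=$(getent hosts "$name" | awk '{ print $1; exit }')
        if [ -z "$addr" ]; then
            printf '%s: does not resolve\n' "$name" >&2
            return 1
        fi
        printf '%s -> %s\n' "$name" "$addr"
    done
}

# Example usage on the node (names taken from this thread):
#   hostname -f                         # should print the FQDN used in the config
#   check_resolves localhost mannose "$(hostname -f)"
```

If a name used in slurm.conf or slurmdbd.conf does not resolve, or resolves to an unexpected address, an /etc/hosts entry is usually the simplest fix on a single-node setup.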
[root@mannose sushil]# cat /var/log/slurm/slurmctld.log
[2022-12-01T15:26:50.942] Job accounting information stored, but details not gathered
[2022-12-01T15:26:50.943] slurmctld version 20.11.9 started on cluster mannose
[2022-12-01T15:26:52.949] error: If munged is up, restart with --num-threads=10
[2022-12-01T15:26:52.949] error: Munge encode failed: Failed to access "Abcd_123": No such file or directory
[2022-12-01T15:26:52.950] error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_PERSIST_INIT has authentication error: Invalid authentication credential
[2022-12-01T15:26:52.950] error: slurm_persist_conn_open: failed to send persistent connection init message to localhost:6819
[2022-12-01T15:26:52.950] error: Sending PersistInit msg: Protocol authentication error
[2022-12-01T15:26:52.950] accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
[2022-12-01T15:26:54.954] error: If munged is up, restart with --num-threads=10
[2022-12-01T15:26:54.954] error: Munge encode failed: Failed to access "Abcd_123": No such file or directory
[2022-12-01T15:26:54.954] error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_PERSIST_INIT has authentication error: Invalid authentication credential
[2022-12-01T15:26:54.954] error: slurm_persist_conn_open: failed to send persistent connection init message to localhost:6819
[2022-12-01T15:26:54.954] error: Sending PersistInit msg: Protocol authentication error
[2022-12-01T15:26:54.955] error: Association database appears down, reading from state file.
[2022-12-01T15:26:54.955] error: Unable to get any information from the state file
[2022-12-01T15:26:54.955] fatal: slurmdbd and/or database must be up at slurmctld start time

"Abcd_123" is the password. This password works to access the database:

[root@mannose sushil]# mysql -p -u slurm
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 581
Server version: 5.5.68-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show grants;
+--------------------------------------------------------------------------------------------------------------+
| Grants for slurm@localhost                                                                                   |
+--------------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'slurm'@'localhost' IDENTIFIED BY PASSWORD '*0E54A04D59B6C9F7B7B6269BE7F30AD3E3409895' |
| GRANT ALL PRIVILEGES ON `slurm_acct_db`.* TO 'slurm'@'localhost' WITH GRANT OPTION                           |
+--------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

MariaDB [(none)]>

Any pointers on how to fix this?

Best,
Sushil

On Wed, Nov 30, 2022 at 5:36 PM William Brown <will...@signalbox.org.uk> wrote:

> If this is a single-host machine, I suggest checking the /etc/hosts file to
> make sure that ‘mannose’ is listed as you expect. It is generally advised
> to use FQDNs for host names; the fact that the message “connection to
> host:mannose:6819: Connection refused” used a short name may mean that you
> have a short name in a configuration file. Equally, the incoming
> connection may be coming not from the IP of ‘mannose’ but from localhost
> (127.0.0.1 if you are using only IPv4).
>
> You also have a cluster name that looks like an FQDN; you may want to
> change that to something else. The cluster name is, I think, an abstract
> name, whereas host names must be for real nodes that are resolvable.
>
> You may also find information in /var/log/messages or /var/log/secure, if
> applicable to your Linux distro.
>
> I use Slurm with firewalld and it usually works fine.
>
> William
>
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> *On Behalf Of* Sushil Mishra
> *Sent:* 30 November 2022 22:44
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* [slurm-users] slurm_persist_conn_open_without_init: failed to open persistent connection to host
>
> Hi all,
>
> I installed Slurm and enabled accounting on a single-node machine, i.e. the
> same server is both the master and the compute node. I mainly followed this
> page for instructions:
>
> https://southgreenplatform.github.io/trainings/hpc/slurminstallation/
>
> After enabling accounting, I am having problems starting slurmctld.service.
>
> [root@mannose sushil]# cat /var/log/slurm/slurmctld.log
> [2022-11-30T16:32:15.194] Job accounting information stored, but details not gathered
> [2022-11-30T16:32:15.195] slurmctld version 20.11.9 started on cluster mannose.olemiss.edu
> [2022-11-30T16:32:15.201] error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:mannose:6819: Connection refused
> [2022-11-30T16:32:15.201] error: Sending PersistInit msg: Connection refused
> [2022-11-30T16:32:15.201] accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
> [2022-11-30T16:32:15.203] error: Sending PersistInit msg: Connection refused
> [2022-11-30T16:32:15.203] error: Association database appears down, reading from state file.
> [2022-11-30T16:32:15.203] error: Unable to get any information from the state file
> [2022-11-30T16:32:15.203] fatal: slurmdbd and/or database must be up at slurmctld start time
>
> It is not clear to me why port 6819 is being used, since I have
> SlurmctldPort=6817 and SlurmdPort=6818 set in slurm.conf.
> Anyway, I opened all three ports (6817, 6818 and 6819) using firewall-cmd,
> e.g. 'firewall-cmd --permanent --zone=public --add-port=6819/tcp'.
>
> MariaDB [(none)]> show grants
>     -> ;
> +--------------------------------------------------------------------------------------------------------------+
> | Grants for slurm@localhost                                                                                   |
> +--------------------------------------------------------------------------------------------------------------+
> | GRANT USAGE ON *.* TO 'slurm'@'localhost' IDENTIFIED BY PASSWORD '*0E54A04D59B6C9F7B7B6269BE7F30AD3E3409895' |
> | GRANT ALL PRIVILEGES ON `slurm_acct_db`.* TO 'slurm'@'localhost' WITH GRANT OPTION                           |
> +--------------------------------------------------------------------------------------------------------------+
> 2 rows in set (0.00 sec)
>
> MariaDB [(none)]> quit
>
> Can someone help me figure out what might be going wrong?
>
> Best,
>
> SK
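One possible reading of the 2022-12-01 log at the top of this thread (an assumption, not something confirmed in the thread): MUNGE reports that it failed to access "Abcd_123", i.e. it treated the database password as a file path. The slurm.conf man page documents AccountingStoragePass, when accounting goes through slurmdbd, as the socket of an alternate MUNGE daemon rather than a database password; the MariaDB password is read by slurmdbd from StoragePass in slurmdbd.conf. The sketch below reports where a given setting appears in the config files, so you can see whether the password ended up in slurm.conf. The helper name find_setting and the /etc/slurm paths are assumptions; adjust them to your installation.

```shell
#!/bin/sh
# Hypothetical helper: print every non-comment line in the given files whose
# key matches $1. Slurm parameter names are case-insensitive, so the
# comparison is case-insensitive too. Only simple "Key=value" lines are
# handled; lines with several Key=value pairs (e.g. NodeName=...) are
# matched on their first key only.
find_setting() {
    key=$1
    shift
    awk -v key="$key" '
        /^[ \t]*#/ { next }     # skip comment lines
        /^[ \t]*$/ { next }     # skip blank lines
        {
            split($0, kv, "=")
            k = kv[1]
            gsub(/[ \t]/, "", k)  # tolerate spaces around the key
            if (tolower(k) == tolower(key)) print FILENAME ": " $0
        }
    ' "$@"
}

# Example usage (paths assume a default /etc/slurm layout):
#   find_setting AccountingStoragePass /etc/slurm/slurm.conf
#   find_setting StoragePass /etc/slurm/slurmdbd.conf
```

If AccountingStoragePass=Abcd_123 turns up in slurm.conf, one thing to try is removing that line, keeping the password only as StoragePass in slurmdbd.conf, making sure slurmdbd is running, and then restarting slurmctld. Running `munge -n | unmunge` is a quick way to confirm that MUNGE itself works.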