Hi,

Slurm 17.02.3 was installed on my cluster some time ago, but I recently decided to use SlurmDBD for accounting.

After installing several packages (slurm-devel, slurm-munge, slurm-perlapi, slurm-plugins, slurm-slurmdbd and slurm-sql) and MariaDB on CentOS 7, I created an SQL database:

    mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost'
    -> identified by 'some_pass' with grant option;
    mysql> create database slurm_acct_db;

and configured the slurmdbd.conf file:

    AuthType=auth/munge
    DbdAddr=localhost
    DbdHost=localhost
    SlurmUser=slurm
    DebugLevel=4
    LogFile=/var/log/slurm/slurmdbd.log
    PidFile=/var/run/slurmdbd.pid
    StorageType=accounting_storage/mysql
    StorageHost=localhost
    StoragePass=some_pass
    StorageUser=slurm
    StorageLoc=slurm_acct_db

Then I stopped the `slurmctld` daemon on the head node of my cluster and tried to start `slurmdbd`, but I got the following:

    $ systemctl start slurmdbd
    Job for slurmdbd.service failed because the control process exited with error code. See "systemctl status slurmdbd.service" and "journalctl -xe" for details.
    $ systemctl status slurmdbd.service
    ● slurmdbd.service - Slurm DBD accounting daemon
       Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: disabled)
       Active: failed (Result: exit-code) since lun 2017-11-20 10:39:26 CET; 53s ago
      Process: 27592 ExecStart=/usr/sbin/slurmdbd $SLURMDBD_OPTIONS (code=exited, status=1/FAILURE)

    nov 20 10:39:26 login_node systemd[1]: Starting Slurm DBD accounting daemon...
    nov 20 10:39:26 login_node systemd[1]: slurmdbd.service: control process exited, code=exited status=1
    nov 20 10:39:26 login_node systemd[1]: Failed to start Slurm DBD accounting daemon.
    nov 20 10:39:26 login_node systemd[1]: Unit slurmdbd.service entered failed state.
    nov 20 10:39:26 login_node systemd[1]: slurmdbd.service failed.
    $ journalctl -xe
    nov 20 10:39:26 login_node polkitd[1078]: Registered Authentication Agent for unix-process:27586:119889015 (system bus name :1.871 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /or
    nov 20 10:39:26 login_node systemd[1]: Starting Slurm DBD accounting daemon...
    -- Subject: Unit slurmdbd.service has begun start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit slurmdbd.service has begun starting up.
    nov 20 10:39:26 login_node systemd[1]: slurmdbd.service: control process exited, code=exited status=1
    nov 20 10:39:26 login_node systemd[1]: Failed to start Slurm DBD accounting daemon.
    -- Subject: Unit slurmdbd.service has failed
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit slurmdbd.service has failed.
    --
    -- The result is failed.
    nov 20 10:39:26 login_node systemd[1]: Unit slurmdbd.service entered failed state.
    nov 20 10:39:26 login_node systemd[1]: slurmdbd.service failed.
    nov 20 10:39:26 login_node polkitd[1078]: Unregistered Authentication Agent for unix-process:27586:119889015 (system bus name :1.871, object path /org/freedesktop/PolicyKit1/AuthenticationAgent,
    nov 20 10:40:06 login_node gmetad[1519]: data_thread() for [HPCSIE] failed to contact node 192.168.2.10
    nov 20 10:40:06 login_node gmetad[1519]: data_thread() got no answer from any [HPCSIE] datasource
    nov 20 10:40:13 login_node dhcpd[2320]: DHCPREQUEST for 192.168.2.19 from XX:XX:XX:XX:XX:XX via enp6s0f1
    nov 20 10:40:13 login_node dhcpd[2320]: DHCPACK on 192.168.2.19 to XX:XX:XX:XX:XX:XX via enp6s0f1
    nov 20 10:40:39 login_node dhcpd[2320]: DHCPREQUEST for 192.168.2.13 from XX:XX:XX:XX:XX:XX via enp6s0f1
    nov 20 10:40:39 login_node dhcpd[2320]: DHCPACK on 192.168.2.13 to XX:XX:XX:XX:XX:XX via enp6s0f1

I've just found out that the file `/var/run/slurmdbd.pid` does not even exist.

I'd appreciate any hint on this issue.

Thanks
