10.4.22
On Sat, Dec 4, 2021 at 1:35 AM Brian Andrus <toomuc...@gmail.com> wrote:

> Which version of Mariadb are you using?
>
> Brian Andrus
>
> On 12/3/2021 4:20 PM, Giuseppe G. A. Celano wrote:
>
> After installation of libmariadb-dev, I have reinstalled the entire slurm with ./configure + options, make, and make install. Still, accounting_storage_mysql.so is missing.
>
> On Sat, Dec 4, 2021 at 12:24 AM Sean Crosby <scro...@unimelb.edu.au> wrote:
>
>> Did you run
>>
>> ./configure (with any other options you normally use)
>> make
>> make install
>>
>> on your DBD server after you installed the mariadb-devel package?
>>
>> ------------------------------
>> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Giuseppe G. A. Celano <giuseppegacel...@gmail.com>
>> *Sent:* Saturday, 4 December 2021 10:07
>> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
>> *Subject:* [EXT] Re: [slurm-users] slurmdbd does not work
>>
>> * External email: Please exercise caution *
>> ------------------------------
>> The problem is the lack of /usr/lib/slurm/accounting_storage_mysql.so
>>
>> I have installed many mariadb-related packages, but that file is not created by slurm after installation: is there a point in the documentation where the installation procedure for the database is made explicit?
>>
>> On Fri, Dec 3, 2021 at 5:15 PM Brian Andrus <toomuc...@gmail.com> wrote:
>>
>> You will need to also reinstall/restart slurmdbd with the updated binary.
>>
>> Look in the slurmdbd logs to see what is happening there. I suspect it had errors updating/creating the database and tables. If you have no data in it yet, you can just DROP the database and restart slurmdbd.
>>
>> Brian Andrus
>>
>> On 12/3/2021 6:42 AM, Giuseppe G. A. Celano wrote:
>>
>> Thanks for the answer, Brian.
>> I now added --with-mysql_config=/etc/mysql/my.cnf, but the problem is still there and now also slurmctld does not work, with the error:
>>
>> [2021-12-03T15:36:41.018] accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
>> [2021-12-03T15:36:41.019] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
>> [2021-12-03T15:36:41.019] error: _slurm_persist_recv_msg: only read 150 of 2613 bytes
>> [2021-12-03T15:36:41.019] error: Sending PersistInit msg: No error
>> [2021-12-03T15:36:41.020] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
>> [2021-12-03T15:36:41.020] error: _slurm_persist_recv_msg: only read 150 of 2613 bytes
>> [2021-12-03T15:36:41.020] error: Sending PersistInit msg: No error
>> [2021-12-03T15:36:41.020] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
>> [2021-12-03T15:36:41.020] error: _slurm_persist_recv_msg: only read 150 of 2613 bytes
>> [2021-12-03T15:36:41.020] error: Sending PersistInit msg: No error
>> [2021-12-03T15:36:41.020] error: DBD_GET_TRES failure: No error
>> [2021-12-03T15:36:41.021] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
>> [2021-12-03T15:36:41.021] error: _slurm_persist_recv_msg: only read 0 of 2613 bytes
>> [2021-12-03T15:36:41.021] error: Sending PersistInit msg: No error
>> [2021-12-03T15:36:41.021] error: DBD_GET_QOS failure: No error
>> [2021-12-03T15:36:41.021] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
>> [2021-12-03T15:36:41.021] error: _slurm_persist_recv_msg: only read 150 of 2613 bytes
>> [2021-12-03T15:36:41.021] error: Sending PersistInit msg: No error
>> [2021-12-03T15:36:41.021] error: DBD_GET_USERS failure: No error
>> [2021-12-03T15:36:41.022] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
>> [2021-12-03T15:36:41.022] error: _slurm_persist_recv_msg: only read 0 of 2613 bytes
>> [2021-12-03T15:36:41.022] error: Sending PersistInit msg: No error
>> [2021-12-03T15:36:41.022] error: DBD_GET_ASSOCS failure: No error
>> [2021-12-03T15:36:41.022] error: _conn_readable: persistent connection for fd 9 experienced error[104]: Connection reset by peer
>> [2021-12-03T15:36:41.022] error: _slurm_persist_recv_msg: only read 0 of 2613 bytes
>> [2021-12-03T15:36:41.022] error: Sending PersistInit msg: No error
>> [2021-12-03T15:36:41.022] error: DBD_GET_RES failure: No error
>> [2021-12-03T15:36:41.022] fatal: You are running with a database but for some reason we have no TRES from it. This should only happen if the database is down and you don't have any state files.
>>
>> On Thu, Dec 2, 2021 at 10:36 PM Brian Andrus <toomuc...@gmail.com> wrote:
>>
>> Your slurm needs built with the support. If you have mysql-devel installed it should pick it up, otherwise you can specify the location with --with-mysql when you configure/build slurm
>>
>> Brian Andrus
>>
>> On 12/2/2021 12:40 PM, Giuseppe G. A. Celano wrote:
>>
>> Hi everyone,
>>
>> I am having trouble getting *slurmdbd* to work.
>> This is the error I get:
>>
>> error: Couldn't find the specified plugin name for accounting_storage/mysql looking at all files
>> error: cannot find accounting_storage plugin for accounting_storage/mysql
>> error: cannot create accounting_storage context for accounting_storage/mysql
>> fatal: Unable to initialize accounting_storage/mysql accounting storage plugin
>>
>> I have installed *mysql* (*apt install mysql*) on Ubuntu 20.04.03 and followed the instructions on the slurm website <https://slurm.schedmd.com/accounting.html>; *mysql* is running (*port 3306*) and these are the relevant parts in my *.conf* files:
>>
>> *slurm.conf*
>>
>> # LOGGING AND ACCOUNTING
>> AccountingStorageHost=localhost
>> AccountingStoragePort=3306
>> AccountingStorageType=accounting_storage/slurmdbd
>> AccountingStorageUser=slurm
>> JobCompType=jobcomp/none
>> JobAcctGatherFrequency=30
>> JobAcctGatherType=jobacct_gather/linux
>> SlurmctldDebug=info
>> SlurmctldLogFile=/var/log/slurmctld.log
>> SlurmdDebug=info
>> SlurmdLogFile=/var/log/slurmd.log
>>
>> *slurmdbd.conf*
>>
>> AuthType=auth/munge
>> DbdAddr=localhost
>> DbdHost=localhost
>> DbdPort=3306
>> LogFile=/var/log/slurmdbd.log
>> PidFile=/var/run/slurmdbd.pid
>> PluginDir=/usr/lib/slurm
>> SlurmUser=slurm
>> StoragePass=password
>> StorageType=accounting_storage/mysql
>> StorageUser=slurm
>> StorageLoc=slurm_acct_db
>>
>> I changed the port to 3306 because otherwise *slurmdbd* could not communicate with *mysql*.
>> If I run *sacct*, for example, I get:
>>
>> sacct: error: _slurm_persist_recv_msg: read of fd 3 failed: No error
>> sacct: error: _slurm_persist_recv_msg: only read 126 of 2616 bytes
>> sacct: error: slurm_persist_conn_open: No response to persist_init
>> sacct: error: Sending PersistInit msg: No error
>>        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
>> ------------ ---------- ---------- ---------- ---------- ---------- --------
>> sacct: error: _slurm_persist_recv_msg: read of fd 3 failed: No error
>> sacct: error: _slurm_persist_recv_msg: only read 126 of 2616 bytes
>> sacct: error: Sending PersistInit msg: No error
>> sacct: error: DBD_GET_JOBS_COND failure: Unspecified error
>>
>> Does anyone have a suggestion to solve this problem? Thank you very much.
>>
>> Best,
>> Giuseppe
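
A note on the configuration quoted above, offered as a sketch rather than a confirmed diagnosis. According to the slurmdbd.conf documentation, DbdPort is the port slurmdbd itself listens on (default 6819), AccountingStoragePort in slurm.conf must match it, and the MySQL/MariaDB port belongs in StoragePort. In the quoted files both DbdPort and AccountingStoragePort are set to 3306, so the port wiring may be worth checking alongside the missing plugin. The Python below is a minimal check under those assumptions; the function names (parse_conf, check_ports, plugin_present) are hypothetical helpers, not part of Slurm, and the parser only understands simple Key=Value lines.

```python
import os

MYSQL_DEFAULT_PORT = 3306     # default MySQL/MariaDB port
SLURMDBD_DEFAULT_PORT = 6819  # default slurmdbd listening port


def parse_conf(text):
    """Parse simple Key=Value lines; keys are lowercased.

    Assumption: '#' always starts a comment, so a value containing
    '#' (for example in a password) would be truncated by this sketch.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip().lower()] = value.strip()
    return settings


def check_ports(slurm_conf, slurmdbd_conf):
    """Return a list of warnings about how the ports are wired."""
    ctld = parse_conf(slurm_conf)
    dbd = parse_conf(slurmdbd_conf)
    acct_port = int(ctld.get("accountingstorageport", SLURMDBD_DEFAULT_PORT))
    dbd_port = int(dbd.get("dbdport", SLURMDBD_DEFAULT_PORT))
    storage_port = int(dbd.get("storageport", MYSQL_DEFAULT_PORT))

    warnings = []
    if acct_port != dbd_port:
        warnings.append(
            f"AccountingStoragePort ({acct_port}) differs from DbdPort ({dbd_port})"
        )
    if dbd_port == storage_port:
        warnings.append(
            f"DbdPort ({dbd_port}) equals the database port; slurmdbd and "
            "the database server cannot both listen on it"
        )
    return warnings


def plugin_present(plugin_dir, name="accounting_storage_mysql.so"):
    """Report whether the MySQL accounting plugin file exists in plugin_dir."""
    return os.path.isfile(os.path.join(plugin_dir, name))
```

Passing the contents of slurm.conf and slurmdbd.conf to check_ports returns any warnings as strings, and plugin_present("/usr/lib/slurm") reports whether the plugin file from the thread exists in the configured PluginDir. Neither check replaces reading the slurmdbd log.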