Manually running it with sudo slurmdbd -D /path/to/conf is very quick on my fresh install.
Starting slurmdbd through systemctl takes 3 minutes and then fails.

Is there an alternative to systemctl for starting slurmdbd in the background?
But most importantly, I would like to know why it takes so long through systemctl. Maybe I can increase the timeout limit?

On Thu, May 30, 2024 at 11:54 PM Ryan Novosielski <novos...@rutgers.edu> wrote:

> It may take longer to start than systemd allows for. How long does it take
> to start from the command line? It’s common to need to run it manually for
> upgrades to complete.
>
> --
> #BlackLivesMatter
> ____
> || \\UTGERS,     |---------------------------*O*---------------------------
> ||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark
>      `'
>
> On May 30, 2024, at 20:24, Radhouane Aniba via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
> Ok I made some progress here.
>
> I removed and purged slurmdbd, mysql, mariadb etc. and started from
> scratch.
> I added the recommended mysqld requirements
>
> Started slurmdbd manually: sudo slurmdbd -D /path/to/conf and everything
> worked well
>
> When I tried to start the service with sudo systemctl start slurmdbd.service
> it didn't work
>
> sudo systemctl status slurmdbd.service
> ● slurmdbd.service - Slurm DBD accounting daemon
>      Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: enabled)
>      Active: failed (Result: timeout) since Fri 2024-05-31 00:21:30 UTC; 2min 5s ago
>     Process: 6258 ExecStart=/usr/sbin/slurmdbd -D /etc/slurm-llnl/slurmdbd.conf (code=exited, status=0/SUCCESS)
>
> May 31 00:20:00 hannibal-hn systemd[1]: Starting Slurm DBD accounting daemon...
> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: start operation timed out. Terminating.
> May 31 00:21:30 hannibal-hn systemd[1]: slurmdbd.service: Failed with result 'timeout'.
> May 31 00:21:30 hannibal-hn systemd[1]: Failed to start Slurm DBD accounting daemon.
>
> Even though it is the same command?!
>
> Any idea?
>
>
> On Thu, May 30, 2024 at 5:02 PM Radhouane Aniba <arad...@gmail.com> wrote:
>
>> Thank you Ahmet and Brian,
>>
>> Ahmet, which conf in particular is slurmdbd reading from? I parsed all
>> the cnf files for mysql and I cannot find the data it is displaying here
>>
>> slurmdbd: debug2: Attempting to connect to localhost:3306
>> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
>> slurmdbd: debug2: innodb_log_file_size: 50331648
>> slurmdbd: debug2: innodb_lock_wait_timeout: 50
>> slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout
>>
>>
>> sudo tree /etc/mysql/*
>> /etc/mysql/conf.d
>> ├── mysql.cnf
>> └── mysqldump.cnf
>> /etc/mysql/debian.cnf
>> /etc/mysql/debian-start
>> /etc/mysql/FROZEN
>> /etc/mysql/mariadb.cnf
>> /etc/mysql/mariadb.conf.d
>> ├── 50-client.cnf
>> ├── 50-mysql-clients.cnf
>> ├── 50-mysqld_safe.cnf
>> └── 50-server.cnf
>> /etc/mysql/my.cnf
>> /etc/mysql/my.cnf.fallback
>> /etc/mysql/mysql.cnf
>> /etc/mysql/mysql.conf.d
>> ├── mysql.cnf
>> └── mysqld.cnf
>>
>> On Thu, May 30, 2024 at 12:21 PM Brian Andrus via slurm-users <
>> slurm-users@lists.schedmd.com> wrote:
>>
>>> That SIGTERM message means something is telling slurmdbd to quit.
>>>
>>> Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told
>>> to shut down. If you are running in the foreground, a ^C does that. If you
>>> run a kill or killall on it, you will get that same message.
>>>
>>> Brian Andrus
>>> On 5/30/2024 6:53 AM, Radhouane Aniba via slurm-users wrote:
>>>
>>> Yes, I can connect to my database using mysql --user=slurm
>>> --password=slurmdbpass slurm_acct_db and, having checked the firewall
>>> question, there is no firewall blocking mysql
>>>
>>> Also here is the output of slurmdbd -D -vvv (note I can only run this
>>> with sudo)
>>>
>>> sudo slurmdbd -D -vvv
>>> slurmdbd: debug: Log file re-opened
>>> slurmdbd: debug: Munge authentication plugin loaded
>>> slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
>>> slurmdbd: debug2: Attempting to connect to localhost:3306
>>> slurmdbd: debug2: innodb_buffer_pool_size: 134217728
>>> slurmdbd: debug2: innodb_log_file_size: 50331648
>>> slurmdbd: debug2: innodb_lock_wait_timeout: 50
>>> slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout
>>> slurmdbd: Accounting storage MYSQL plugin loaded
>>> slurmdbd: debug2: ArchiveDir = /tmp
>>> slurmdbd: debug2: ArchiveScript = (null)
>>> slurmdbd: debug2: AuthAltTypes = (null)
>>> slurmdbd: debug2: AuthInfo = (null)
>>> slurmdbd: debug2: AuthType = auth/munge
>>> slurmdbd: debug2: CommitDelay = 0
>>> slurmdbd: debug2: DbdAddr = localhost
>>> slurmdbd: debug2: DbdBackupHost = (null)
>>> slurmdbd: debug2: DbdHost = hannibal-hn
>>> slurmdbd: debug2: DbdPort = 7032
>>> slurmdbd: debug2: DebugFlags = (null)
>>> slurmdbd: debug2: DebugLevel = 6
>>> slurmdbd: debug2: DebugLevelSyslog = 10
>>> slurmdbd: debug2: DefaultQOS = (null)
>>> slurmdbd: debug2: LogFile = /var/log/slurmdbd.log
>>> slurmdbd: debug2: MessageTimeout = 100
>>> slurmdbd: debug2: Parameters = (null)
>>> slurmdbd: debug2: PidFile = /run/slurmdbd.pid
>>> slurmdbd: debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
>>> slurmdbd: debug2: PrivateData = none
>>> slurmdbd: debug2: PurgeEventAfter = 1 months*
>>> slurmdbd: debug2: PurgeJobAfter = 12 months*
>>> slurmdbd: debug2: PurgeResvAfter = 1 months*
>>> slurmdbd: debug2: PurgeStepAfter = 1 months
>>> slurmdbd: debug2: PurgeSuspendAfter = 1 months
>>> slurmdbd: debug2: PurgeTXNAfter = 12 months
>>> slurmdbd: debug2: PurgeUsageAfter = 24 months
>>> slurmdbd: debug2: SlurmUser = root(0)
>>> slurmdbd: debug2: StorageBackupHost = (null)
>>> slurmdbd: debug2: StorageHost = localhost
>>> slurmdbd: debug2: StorageLoc = slurm_acct_db
>>> slurmdbd: debug2: StoragePort = 3306
>>> slurmdbd: debug2: StorageType = accounting_storage/mysql
>>> slurmdbd: debug2: StorageUser = slurm
>>> slurmdbd: debug2: TCPTimeout = 2
>>> slurmdbd: debug2: TrackWCKey = 0
>>> slurmdbd: debug2: TrackSlurmctldDown= 0
>>> slurmdbd: debug2: acct_storage_p_get_connection: request new connection 1
>>> slurmdbd: debug2: Attempting to connect to localhost:3306
>>> slurmdbd: slurmdbd version 19.05.5 started
>>> slurmdbd: debug2: running rollup at Thu May 30 13:50:08 2024
>>> slurmdbd: debug2: Everything rolled up
>>>
>>>
>>> It goes like this for some time and then it crashes with this message
>>>
>>> slurmdbd: Terminate signal (SIGINT or SIGTERM) received
>>> slurmdbd: debug: rpc_mgr shutting down
>>>
>>>
>>> On Thu, May 30, 2024 at 8:18 AM mercan <ahmet.mer...@uhem.itu.edu.tr>
>>> wrote:
>>>
>>>> Did you try to connect to the database using the mysql command?
>>>>
>>>> mysql --user=slurm --password=slurmdbpass slurm_acct_db
>>>>
>>>> C. Ahmet Mercan
>>>>
>>>> On 30.05.2024 14:48, Radhouane Aniba via slurm-users wrote:
>>>>
>>>> Thank you Ahmet,
>>>> I don't have a firewall active.
>>>> And because slurmdbd cannot connect to the database, I am not able to
>>>> get it activated through systemctl. I will share the output of
>>>> slurmdbd -D -vvv shortly, but overall it always says it is trying to
>>>> connect to the db, then retries a couple of times and crashes
>>>>
>>>> R.
>>>>
>>>> On Thu, May 30, 2024 at 2:51 AM mercan <ahmet.mer...@uhem.itu.edu.tr>
>>>> wrote:
>>>>
>>>>> Hi;
>>>>>
>>>>> Did you check whether you can connect to the db with your conf
>>>>> parameters from head-node:
>>>>>
>>>>> mysql --user=slurm --password=slurmdbpass slurm_acct_db
>>>>>
>>>>> Also, check and stop firewall and selinux, if they are running.
>>>>>
>>>>> Last, you can stop slurmdbd, then run it in a terminal with:
>>>>>
>>>>> slurmdbd -D -vvv
>>>>>
>>>>> Regards;
>>>>>
>>>>> C. Ahmet Mercan
>>>>>
>>>>> On 30.05.2024 00:05, Radhouane Aniba via slurm-users wrote:
>>>>>
>>>>> Hi everyone
>>>>> I am trying to get slurmdbd to run on my local home server but I am
>>>>> really struggling.
>>>>> Note: I am a novice Slurm user
>>>>> My slurmdbd always times out even though all the details in the conf
>>>>> file are correct
>>>>>
>>>>> My log looks like this
>>>>>
>>>>> [2024-05-29T20:51:30.088] Accounting storage MYSQL plugin loaded
>>>>> [2024-05-29T20:51:30.088] debug2: ArchiveDir = /tmp
>>>>> [2024-05-29T20:51:30.088] debug2: ArchiveScript = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: AuthAltTypes = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: AuthInfo = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: AuthType = auth/munge
>>>>> [2024-05-29T20:51:30.088] debug2: CommitDelay = 0
>>>>> [2024-05-29T20:51:30.088] debug2: DbdAddr = localhost
>>>>> [2024-05-29T20:51:30.088] debug2: DbdBackupHost = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: DbdHost = head-node
>>>>> [2024-05-29T20:51:30.088] debug2: DbdPort = 7032
>>>>> [2024-05-29T20:51:30.088] debug2: DebugFlags = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: DebugLevel = 6
>>>>> [2024-05-29T20:51:30.088] debug2: DebugLevelSyslog = 10
>>>>> [2024-05-29T20:51:30.088] debug2: DefaultQOS = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: LogFile = /var/log/slurmdbd.log
>>>>> [2024-05-29T20:51:30.088] debug2: MessageTimeout = 100
>>>>> [2024-05-29T20:51:30.088] debug2: Parameters = (null)
>>>>> [2024-05-29T20:51:30.088] debug2: PidFile = /run/slurmdbd.pid
>>>>> [2024-05-29T20:51:30.088] debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
>>>>> [2024-05-29T20:51:30.088] debug2: PrivateData = none
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeEventAfter = 1 months*
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeJobAfter = 12 months*
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeResvAfter = 1 months*
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeStepAfter = 1 months
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeSuspendAfter = 1 months
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeTXNAfter = 12 months
>>>>> [2024-05-29T20:51:30.088] debug2: PurgeUsageAfter = 24 months
>>>>> [2024-05-29T20:51:30.088] debug2: SlurmUser = root(0)
>>>>> [2024-05-29T20:51:30.089] debug2: StorageBackupHost = (null)
>>>>> [2024-05-29T20:51:30.089] debug2: StorageHost = localhost
>>>>> [2024-05-29T20:51:30.089] debug2: StorageLoc = slurm_acct_db
>>>>> [2024-05-29T20:51:30.089] debug2: StoragePort = 3306
>>>>> [2024-05-29T20:51:30.089] debug2: StorageType = accounting_storage/mysql
>>>>> [2024-05-29T20:51:30.089] debug2: StorageUser = slurm
>>>>> [2024-05-29T20:51:30.089] debug2: TCPTimeout = 2
>>>>> [2024-05-29T20:51:30.089] debug2: TrackWCKey = 0
>>>>> [2024-05-29T20:51:30.089] debug2: TrackSlurmctldDown= 0
>>>>> [2024-05-29T20:51:30.089] debug2: acct_storage_p_get_connection: request new connection 1
>>>>> [2024-05-29T20:51:30.089] debug2: Attempting to connect to localhost:3306
>>>>> [2024-05-29T20:51:30.090] slurmdbd version 19.05.5 started
>>>>> [2024-05-29T20:51:30.090] debug2: running rollup at Wed May 29 20:51:30 2024
>>>>> [2024-05-29T20:51:30.091] debug2: Everything rolled up
>>>>> [2024-05-29T20:51:49.673] Terminate signal (SIGINT or SIGTERM) received
>>>>> [2024-05-29T20:51:49.673] debug: rpc_mgr shutting down
>>>>>
>>>>>
>>>>> My config file looks like this
>>>>>
>>>>> ArchiveEvents=yes
>>>>> ArchiveJobs=yes
>>>>> ArchiveResvs=yes
>>>>> ArchiveSteps=no
>>>>> ArchiveSuspend=no
>>>>> ArchiveTXN=no
>>>>> ArchiveUsage=no
>>>>> PurgeEventAfter=1month
>>>>> PurgeJobAfter=12month
>>>>> PurgeResvAfter=1month
>>>>> PurgeStepAfter=1month
>>>>> PurgeSuspendAfter=1month
>>>>> PurgeTXNAfter=12month
>>>>> PurgeUsageAfter=24month
>>>>> # Authentication info
>>>>> AuthType=auth/munge
>>>>> # slurmDBD info
>>>>> DbdAddr=localhost
>>>>> DbdHost=head-node
>>>>> DbdPort=7032
>>>>> SlurmUser=root
>>>>> MessageTimeout=100
>>>>> DebugLevel=5
>>>>> #DefaultQOS=normal,standby
>>>>> LogFile=/var/log/slurmdbd.log
>>>>> PidFile=/run/slurmdbd.pid
>>>>> #PrivateData=accounts,users,usage,jobs
>>>>> #TrackWCKey=yes
>>>>> #
>>>>> # Database info
>>>>> StorageType=accounting_storage/mysql
>>>>> StorageHost=localhost
>>>>> StoragePort=3306
>>>>> StoragePass=slurmdbpass
>>>>> StorageUser=slurm
>>>>> StorageLoc=slurm_acct_db
>>>>>
>>>>> I used standard names and passwords to get started and I will change
>>>>> them later
>>>>>
>>>>> but every time I try to start slurmdbd.service it crashes and I get
>>>>> the log that I shared with you
>>>>>
>>>>> I use these versions
>>>>>
>>>>> slurmdbd -V
>>>>> slurm-wlm 19.05.5
>>>>> mysql Ver 15.1 Distrib 10.3.39-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
>>>>>
>>>>> Everything else is working properly except I cannot get slurmdbd to
>>>>> work, and at this point I have exhausted everything I could think of
>>>>> trying :) looking for some expert insights :)
>>>>>
>>>>>
>>>>> Any idea what I am doing wrong here? Also, I didn't compile any Slurm
>>>>> package. I used the binaries from the apt repos
>>>>>
>>>>> Any help will be appreciated
>>>>>
>>>>> Cheers
>>>>>
>>>>> Rad
>>>>>
>>>
>>> --
>>> *Rad Aniba, PhD*
>>>
>>> --
>>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>>
>>
>> --
>> *Rad Aniba, PhD*
>>
>
> --
> *Rad Aniba, PhD*
>

--
*Rad Aniba, PhD*
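
On the timeout question at the top of this message: the quoted systemctl status output shows the start attempt being terminated 90 seconds after it began (00:20:00 to 00:21:30), which matches systemd's usual default start timeout. The ExecStart line in that output also passes -D, which keeps slurmdbd in the foreground. If the unit is declared as Type=forking (an assumption, since the unit file itself is not shown in this thread), systemd would wait for a fork that never happens and then send SIGTERM, which would also fit the "Terminate signal (SIGINT or SIGTERM) received" lines in the earlier logs. A minimal sketch of a drop-in override under that assumption follows; the file name override.conf and the 300-second value are illustrative choices, not taken from the thread.

```ini
# /etc/systemd/system/slurmdbd.service.d/override.conf
# (hypothetical drop-in; `sudo systemctl edit slurmdbd.service` creates one)
[Service]
# Assumption: the unit currently uses Type=forking while ExecStart runs
# slurmdbd with -D (foreground). Type=simple makes systemd treat the
# service as started as soon as the process has been launched.
Type=simple
# Optional: raise the start timeout above the 90 s default.
TimeoutStartSec=300
```

After saving it, sudo systemctl daemon-reload followed by sudo systemctl restart slurmdbd.service should apply the change, and systemctl cat slurmdbd.service shows the effective unit including the drop-in. The other direction would be to keep Type=forking and drop -D from ExecStart so that the daemon backgrounds itself.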
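
On the earlier question in this thread about which conf file the innodb_* values come from: the three numbers in the quoted debug output (134217728, 50331648 and 50) correspond to 128 MiB, 48 MiB and 50 seconds, which are the built-in MariaDB 10.3 defaults for those variables. That would explain why they do not appear in any cnf file: slurmdbd reports what the running server returns, not what a file contains. A sketch of a server-side override, assuming a new file in the /etc/mysql/mariadb.conf.d directory from the tree listing above; the file name 99-slurm.cnf is hypothetical and the values are illustrative, to be checked against the Slurm accounting documentation for the version in use.

```ini
# /etc/mysql/mariadb.conf.d/99-slurm.cnf  (hypothetical file name)
[mysqld]
# Illustrative values; size the buffer pool to the memory actually available.
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
```

The database server needs a restart to pick this up, and SHOW VARIABLES LIKE 'innodb%'; from the mysql client confirms what the server is actually using.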