Re: [slurm-users] [EXTERNAL] SlurmDBD losing connection to the backend MariaDB

2022-10-31 Thread Greg Wickham
Hi Richard, Slurmctld caches the updates until slurmdbd comes back online. You can see how many records are pending for the database by using the “sdiag” command and looking for “DBD Agent queue size”. If this number grows significantly it means that slurmdbd isn’t available. -Greg On 01/1
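Greg's monitoring tip can be automated. Below is a minimal sketch of one way to do it. It assumes `sdiag` is on the PATH and prints a line of the form `DBD Agent queue size: N`. The helper names and the growth threshold are hypothetical illustrations, not part of Slurm.

```python
import re
import subprocess
from typing import List, Optional

# Matches the sdiag line reporting records queued in slurmctld for slurmdbd,
# e.g. "DBD Agent queue size: 0" (format assumed from sdiag output).
_DBD_QUEUE_RE = re.compile(r"^\s*DBD Agent queue size:\s*(\d+)", re.MULTILINE)


def parse_dbd_queue_size(sdiag_output: str) -> Optional[int]:
    """Return the DBD Agent queue size found in sdiag text, or None if absent."""
    match = _DBD_QUEUE_RE.search(sdiag_output)
    return int(match.group(1)) if match else None


def read_dbd_queue_size() -> Optional[int]:
    """Run sdiag (assumed to be on PATH) and parse the DBD Agent queue size."""
    result = subprocess.run(["sdiag"], capture_output=True, text=True, check=True)
    return parse_dbd_queue_size(result.stdout)


def queue_is_growing(samples: List[int], threshold: int = 1000) -> bool:
    """Hypothetical heuristic: flag a backlog if the queue grew by at least
    `threshold` records between the first and last sample."""
    if len(samples) < 2:
        return False
    return samples[-1] - samples[0] >= threshold
```

In practice you would call `read_dbd_queue_size()` periodically (for example from cron) and alert when `queue_is_growing` reports a sustained increase. A steadily climbing queue suggests slurmdbd is unreachable. The right threshold depends on how busy the cluster is.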

Re: [slurm-users] SlurmDBD losing connection to the backend MariaDB

2022-10-31 Thread Brian Andrus
It caches up to a point. As I understand it, that is about an hour (depending on size and how busy the cluster is, as well as available memory, etc.). Brian Andrus On 10/31/2022 9:20 PM, Richard Chang wrote: Hi, Just for my info, I would like to know what happens when SlurmDBD loses connect

[slurm-users] SlurmDBD losing connection to the backend MariaDB

2022-10-31 Thread Richard Chang
Hi, Just for my info, I would like to know what happens when SlurmDBD loses its connection to the backend database, for example MariaDB. Does it cache the accounting info and keep it until the DB comes back up, or does it panic and shut down? Thank you, RC.

Re: [slurm-users] Prolog and job_submit

2022-10-31 Thread Christopher Samuel
On 10/31/22 5:46 am, Davide DelVento wrote: Thanks for helping me find workarounds. No worries! My only other thought is that you might be able to use node features & job constraints to communicate this without the user realising. I am not sure I understand this approach. I was just tryi

Re: [slurm-users] Prolog and job_submit

2022-10-31 Thread Davide DelVento
Thanks for helping me find workarounds. > My only other thought is that you might be able to use node features & > job constraints to communicate this without the user realising. I am not sure I understand this approach. > For instance you could declare the nodes where the software is installed

Re: [slurm-users] Switch setting in slurm.conf breaks slurmctld if the switch type is not there in slurmctld node

2022-10-31 Thread Ole Holm Nielsen
On 10/31/22 10:13, Richard Chang wrote: This is 21.08 As I have written to you previously, switch/hpe_slingshot is only supported from Slurm 22.05! /Ole On 10/31/2022 11:05 AM, Chris Samuel wrote: On 27/10/22 11:30 pm, Richard Chang wrote: Yes, the system is an HPE Cray EX, and I am tryin
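Ole's point, that switch/hpe_slingshot needs Slurm 22.05 or later, can be turned into a quick pre-flight check before adding the switch type to slurm.conf. The sketch below is a minimal example that relies only on that version floor from the thread. It assumes the version text looks like `slurm 21.08.8` (as printed by, e.g., `sinfo --version`). The function names are hypothetical.

```python
import re
from typing import Tuple

# Per the thread: switch/hpe_slingshot is only supported from Slurm 22.05 on.
SLINGSHOT_MIN_VERSION: Tuple[int, int] = (22, 5)


def parse_slurm_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from text such as 'slurm 21.08.8-2' or '21.08'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Slurm version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def supports_hpe_slingshot(version_text: str) -> bool:
    """True if the reported Slurm version is at least 22.05."""
    # Tuple comparison orders by major first, then minor.
    return parse_slurm_version(version_text) >= SLINGSHOT_MIN_VERSION
```

For Richard's 21.08 installation, `supports_hpe_slingshot("21.08")` returns `False`, which matches Ole's reply.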

Re: [slurm-users] Switch setting in slurm.conf breaks slurmctld if the switch type is not there in slurmctld node

2022-10-31 Thread Richard Chang
This is 21.08 Thank you, RC On 10/31/2022 11:05 AM, Chris Samuel wrote: On 27/10/22 11:30 pm, Richard Chang wrote: Yes, the system is an HPE Cray EX, and I am trying to use switch/hpe_slingshot. Which version of Slurm are you using Richard? All the best, Chris