Hi Richard,
Slurmctld caches the updates until slurmdbd comes back online.
You can see how many records are pending for the database by using the “sdiag”
command and looking for “DBD Agent queue size”.
If this number grows significantly it means that slurmdbd isn’t available.
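
If you want to watch that number from a script, here is a minimal sketch in Python. It assumes the sdiag output contains a line of the form "DBD Agent queue size: N"; the function names are made up for illustration, so check the pattern against the output of your own Slurm version.

```python
import re
import subprocess


def parse_dbd_queue_size(sdiag_text):
    """Return the DBD Agent queue size found in sdiag output, or None if absent."""
    # Assumed line format: "DBD Agent queue size: 42"
    match = re.search(r"DBD Agent queue size:\s*(\d+)", sdiag_text)
    return int(match.group(1)) if match else None


def current_dbd_queue_size():
    """Run sdiag (needs a working Slurm installation) and parse its output."""
    result = subprocess.run(["sdiag"], capture_output=True, text=True, check=True)
    return parse_dbd_queue_size(result.stdout)
```

Calling current_dbd_queue_size() from cron and alerting when the value keeps climbing between runs would be one way to notice that slurmdbd is unreachable.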
-Greg
On 01/1
It caches up to a point. As I understand it, that is about an hour
(depending on size and how busy the cluster is, as well as available
memory, etc).
Brian Andrus
On 10/31/2022 9:20 PM, Richard Chang wrote:
Hi,
Just for my info, I would like to know what happens when SlurmDBD
loses connect
Hi,
Just for my info, I would like to know what happens when SlurmDBD loses
its connection to the backend database, for example MariaDB.
Does it cache the accounting info and keep it until the DB comes back
up, or does it panic and shut down?
Thank you,
RC.
On 10/31/22 5:46 am, Davide DelVento wrote:
> Thanks for helping me find workarounds.
No worries!
>> My only other thought is that you might be able to use node features &
>> job constraints to communicate this without the user realising.
> I am not sure I understand this approach.
I was just tryi
Thanks for helping me find workarounds.
> My only other thought is that you might be able to use node features &
> job constraints to communicate this without the user realising.
I am not sure I understand this approach.
> For instance you could declare the nodes where the software is installed
On 10/31/22 10:13, Richard Chang wrote:
> This is 21.08
As I have written to you previously, switch/hpe_slingshot is only
supported from Slurm 22.05!
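
For anyone scripting this check, a rough sketch in Python follows. It only encodes the 22.05 cutoff stated above; the helper names are hypothetical, and it assumes plain dotted numeric version strings such as "21.08" or "22.05.3".

```python
def slurm_version_tuple(version):
    """Turn a dotted version string such as "21.08" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_hpe_slingshot(version):
    """True if the version is at least 22.05, the cutoff mentioned in this thread."""
    return slurm_version_tuple(version) >= (22, 5)
```

With the version reported in this thread, supports_hpe_slingshot("21.08") evaluates to False.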
/Ole
On 10/31/2022 11:05 AM, Chris Samuel wrote:
On 27/10/22 11:30 pm, Richard Chang wrote:
Yes, the system is a HPE Cray EX, and I am tryin
This is 21.08
Thank you,
RC
On 10/31/2022 11:05 AM, Chris Samuel wrote:
On 27/10/22 11:30 pm, Richard Chang wrote:
Yes, the system is an HPE Cray EX, and I am trying to use
switch/hpe_slingshot.
Which version of Slurm are you using Richard?
All the best,
Chris