Hi Richard,

While putting together a reply I looked through the manual pages. Slurm does 
appear to support some form of high availability(*), but it doesn’t seem 
simple.

With multiple slurmctld instances, only one can be active at any time because 
they share state information. It’s not clear how they know about each other, 
so this may require STONITH(*).

With slurmdbd, there’s “AccountingStorageHost” and 
“AccountingStorageBackupHost”, but again it’s not quite clear how these 
interact.

In slurmdbd.conf there is “StorageBackupHost” with the description:

. . . . It is up to the backup solution to enforce the coherency of the
accounting information between the two hosts. With clustered
database solutions (active/passive HA), you would not need to use
this feature. Default is none.
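
To see which of these failover-related keys a given configuration actually 
sets, a small script can help. The sketch below is only an illustration: the 
sample fragment, the host names (ctl1, ctl2, db1, db2), the path 
/shared/slurm/state and the helper name ha_settings are all made up. The key 
names are the ones mentioned above, plus SlurmctldHost and StateSaveLocation 
from slurm.conf. It assumes one Key=Value pair per line, which is how these 
keys are normally written but is not true of every line in slurm.conf.

```python
from collections import defaultdict

# Keys relevant to controller / accounting failover (compared case-insensitively).
HA_KEYS = {
    "slurmctldhost",                # slurm.conf, may appear more than once
    "statesavelocation",            # slurm.conf
    "accountingstoragehost",        # slurm.conf
    "accountingstoragebackuphost",  # slurm.conf
    "storagehost",                  # slurmdbd.conf
    "storagebackuphost",            # slurmdbd.conf
}


def ha_settings(conf_text):
    """Return {lowercased key: [values]} for the failover-related keys."""
    found = defaultdict(list)
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().lower()
        if key in HA_KEYS:
            found[key].append(value.strip())
    return dict(found)


# Hypothetical slurm.conf fragment, for illustration only.
SAMPLE = """
# controllers: the first entry is the primary, later ones are backups
SlurmctldHost=ctl1
SlurmctldHost=ctl2
StateSaveLocation=/shared/slurm/state
AccountingStorageHost=db1
AccountingStorageBackupHost=db2
"""

if __name__ == "__main__":
    for key, values in ha_settings(SAMPLE).items():
        print(key, values)
```

Running it against the real slurm.conf and slurmdbd.conf (read from wherever 
they live on your system) at least shows at a glance which backup hosts are 
configured.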

At our site we run only a simple setup: one VM with slurmctld and another VM 
with both slurmdbd and mariadbd.

Perhaps others who have dabbled with redundancy can reply.

   -greg

(* I say this trusting that the best way to get a response on the Internet is 
to say something wrong and then wait for the avalanche of corrections).

On 01/11/2022, 12:08, "slurm-users" <slurm-users-boun...@lists.schedmd.com> 
wrote:


Hello Greg,

I have a two-node setup. node1 is the primary slurmctld and backup slurmdbd; 
node2 is the primary slurmdbd, backup slurmctld, and MySQL database host.

My concern is what happens if node2 goes down and the backup slurmdbd takes 
over.

I have read that slurmctld can cache data, but what about slurmdbd? Not sure.

I intentionally put slurmdbd and MariaDB on the second node because I didn't 
want to overload the primary slurmctld.

I hope that gives you all a clear picture of my setup.

Thanks,

RC


On 11/1/2022 10:40 AM, Greg Wickham wrote:
Hi Richard,

Slurmctld caches the updates until slurmdbd comes back online.

You can see how many records are pending for the database by using the “sdiag” 
command and looking for “DBD Agent queue size”.

If this number grows significantly it means that slurmdbd isn’t available.
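
If you want to watch that number from a script rather than by eye, here is a 
minimal sketch. It assumes sdiag is on the PATH and prints a line of the form 
“DBD Agent queue size: N”; the helper names and the sample text are made up 
for illustration, and the sample is not real output from any cluster.

```python
import re
import subprocess


def dbd_agent_queue_size(sdiag_text):
    """Extract the 'DBD Agent queue size' value from sdiag output, or None."""
    match = re.search(r"DBD Agent queue size:\s*(\d+)", sdiag_text)
    return int(match.group(1)) if match else None


def current_queue_size():
    """Run sdiag and return the pending DBD record count (needs a Slurm host)."""
    result = subprocess.run(
        ["sdiag"], capture_output=True, text=True, check=True
    )
    return dbd_agent_queue_size(result.stdout)


# Illustrative text only, shaped like the relevant part of sdiag output.
SAMPLE = """Server thread count: 3
Agent queue size:    0
DBD Agent queue size: 42
"""

if __name__ == "__main__":
    print(dbd_agent_queue_size(SAMPLE))
```

Polling current_queue_size() every few minutes and alerting when the value 
keeps climbing would be one way to notice that slurmdbd has gone away. What 
counts as “significant” growth depends on the site, so no threshold is 
suggested here.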

   -Greg

On 01/11/2022, 07:23, "slurm-users" <slurm-users-boun...@lists.schedmd.com> 
wrote:

Hi,

Just for my own information, I would like to know what happens when slurmdbd
loses its connection to the backend database, for example MariaDB.

Does it cache the accounting info and keep it until the DB comes back up, or
does it panic and shut down?

Thank you,

RC.

