But there is even a third backup pair if we count slurm.conf:
AccountingStorageHost and AccountingStorageBackupHost. I think this is what
slurmctld refers to when it finds the primary slurmdbd unavailable.

Will slurmctld also read slurmdbd.conf? It seems to be the dedicated
configuration file for slurmdbd, so I think DbdBackupHost should not
influence slurmctld's behaviour. Otherwise AccountingStorageBackupHost and
DbdBackupHost would be completely redundant.

 

张天阳

Network Information Center, Department of Computing Services

 

From: Daniel Letai <d...@letai.org.il>
Sent: 21 February 2025 14:04
To: taleinterve...@sjtu.edu.cn
Cc: slurm-users@lists.schedmd.com
Subject: Re: Re: [slurm-users] Re: how to set slurmdbd.conf if using two
slurmdb node with HA database?

 

There are two backup-host settings.

 

DbdBackupHost is used if the slurmdbd service is unavailable (times out). In
that case, slurmctld will try to connect to the slurmdbd on the other node.

StorageBackupHost, on the other hand, is what you describe: a timeout when
connecting to one DB replica will make slurmdbd switch to the other replica.
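Both settings follow the same "try the primary, fall back to the backup on
timeout" pattern. The minimal Python sketch below models that pattern as
described in this thread; it is not Slurm's own code. `pick_host`,
`tcp_probe` and the injectable `probe` argument are hypothetical names. The
example uses port 3306 (MySQL's default) for the StorageHost /
StorageBackupHost case; for the slurmdbd case the default DbdPort is 6819.

```python
import socket
from typing import Callable, Optional

Probe = Callable[[str, int, float], bool]


def tcp_probe(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name lookup failure or timeout
        return False


def pick_host(primary: str, backup: Optional[str], port: int,
              timeout: float = 5.0, probe: Probe = tcp_probe) -> Optional[str]:
    """Try the primary first; fall back to the backup only if the primary fails.

    Returns the first host that answers, or None if neither does.
    """
    for host in (primary, backup):
        if host is not None and probe(host, port, timeout):
            return host
    return None


if __name__ == "__main__":
    # Simulated outage: nodeC's MySQL is down, nodeD's is up.
    reachable = {("nodeD", 3306)}

    def fake_probe(host: str, port: int, timeout: float) -> bool:
        return (host, port) in reachable

    print(pick_host("nodeC", "nodeD", 3306, probe=fake_probe))  # nodeD
```

Injecting the probe keeps the failover order testable without real hosts.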

 

On 21/02/2025 4:45, taleinterve...@sjtu.edu.cn wrote:

Thank you for your insightful suggestions. Placing both slurmdbd and slurmctld
on the same node is indeed a new structure that we hadn't considered before,
and it seems to give a much clearer deployment logic.

 

Regarding DbdBackupHost, I would like to confirm my understanding of how it
works. Does it mean that the DbdBackupHost option is only referenced when the
slurmdbd service detects that its local database (specified by StorageHost)
is unavailable? And I guess in that case the first slurmdbd service would act
as a proxy, forwarding requests to the DbdBackupHost and returning the data
from there to slurmctld?

 

 

From: Daniel Letai <d...@letai.org.il>
Sent: 20 February 2025 21:56
To: taleinterve...@sjtu.edu.cn
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb
node with HA database?

 

It's functionally the same, with one difference: the configuration file is
identical on every node, which allows simple node deployment and automation.

 

Regarding the backup host, that depends on your setup. If you can ensure the
slurmdbd service will stop when the local DB replica is unhealthy, you
shouldn't need a backup host. Conversely, if there is no health check to
ensure replica readiness, configure the backup host. This will require a
different conf file on each node, unless you set up a more robust HA
clustering scheme.
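One way to get "slurmdbd stops if the local replica is unhealthy" is a small
watchdog run periodically (from cron or a systemd timer, for example). The
Python sketch below is a hypothetical illustration, not a Slurm feature: it
assumes the MySQL client tools are installed and that slurmdbd runs as a
systemd unit named `slurmdbd`. `mysqladmin ping` only proves the server
answers; a real readiness check for a multi-master replica would also have to
confirm the replica is in sync (on a Galera cluster, for instance, via the
`wsrep_ready` status variable).

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run `cmd` and return its exit status; 127 if the program is missing."""
    try:
        return subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    except FileNotFoundError:
        return 127


def replica_alive(runner: Runner = run) -> bool:
    # `mysqladmin ping` exits 0 when the local server answers (liveness only).
    return runner(["mysqladmin", "ping"]) == 0


def enforce(runner: Runner = run) -> str:
    """Stop the local slurmdbd when the local replica does not answer."""
    if replica_alive(runner):
        return "ok"
    # Assumption: slurmdbd is managed by a systemd unit called "slurmdbd".
    runner(["systemctl", "stop", "slurmdbd"])
    return "stopped slurmdbd"
```

The command runner is a parameter so the decision logic can be exercised
with a fake runner instead of touching real services.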

 

The other option is to separate the dbd from the DB. Put the dbd on the ctld
nodes (A, B) and let nodes C, D be only DB master replicas (no dbd).

 

In slurm.conf on nodes A and B you will then have:

AccountingStorageHost = localhost

(without AccountingStorageBackupHost)

 

And in slurmdbd.conf you will have:

DbdHost = localhost

(without DbdBackupHost)

StorageHost = nodeC

StorageBackupHost = nodeD

 

This would mean identical slurm.conf and slurmdbd.conf on both nodes A and B,
and no Slurm conf files or processes on nodes C and D.
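To make the "identical files on A and B" point concrete, this Python sketch
renders only the settings listed above for each controller node and checks
that the results match. `configs_for`, `render` and the dictionaries are
hypothetical helpers; real slurm.conf and slurmdbd.conf files need many more
settings.

```python
# Only the settings discussed above; everything else in the real files is omitted.
SLURM_CONF = {"AccountingStorageHost": "localhost"}  # no AccountingStorageBackupHost
SLURMDBD_CONF = {
    "DbdHost": "localhost",        # no DbdBackupHost
    "StorageHost": "nodeC",        # first MySQL replica
    "StorageBackupHost": "nodeD",  # second MySQL replica
}


def render(settings: dict[str, str]) -> str:
    """Render settings as key=value lines."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def configs_for(node: str) -> dict[str, str]:
    """Config fragments for a controller node.

    `node` is deliberately unused: nothing in this layout is node-specific.
    """
    return {"slurm.conf": render(SLURM_CONF), "slurmdbd.conf": render(SLURMDBD_CONF)}


if __name__ == "__main__":
    assert configs_for("nodeA") == configs_for("nodeB")
    print(configs_for("nodeA")["slurmdbd.conf"], end="")
```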

 

This setup assumes that the entire stack (ctld+dbd) is either working or not,
which is usually true, since a node is either functioning or not. If the ctld
is working but the dbd is not, you will lose the connection to the DB. If the
ctld is not working, the other ctld will take charge and use its local dbd, so
that scenario is covered.
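The failure reasoning above fits in a tiny model. This Python sketch is a
hypothetical simplification (it ignores timeouts and the time a controller
takes to fail over): each of nodes A and B runs slurmctld and slurmdbd, the
backup controller takes charge only when the primary controller is down, and
the active controller uses only its own slurmdbd because
AccountingStorageHost=localhost has no backup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    ctld_up: bool  # slurmctld running
    dbd_up: bool   # slurmdbd running (and able to reach a DB replica)


def accounting_path(primary: Node, backup: Node) -> Optional[str]:
    """Describe which slurmctld/slurmdbd pair handles accounting.

    Returns None when accounting data cannot reach the database.
    """
    if primary.ctld_up:
        active = primary
    elif backup.ctld_up:
        active = backup  # the other ctld takes charge
    else:
        return None      # no controller at all
    # AccountingStorageHost=localhost with no backup: only the local dbd is used.
    if not active.dbd_up:
        return None      # ctld working but dbd not: DB connection is lost
    return f"slurmctld@{active.name} -> slurmdbd@{active.name}"


if __name__ == "__main__":
    a, b = Node("nodeA", True, True), Node("nodeB", True, True)
    print(accounting_path(a, b))  # slurmctld@nodeA -> slurmdbd@nodeA
```

The one uncovered case is the third one in the paragraph: a live ctld whose
local dbd has failed.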

Adding AccountingStorageBackupHost pointing to the other node is of course
possible, but it means a different slurm.conf on each node, which Slurm will
complain about.
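The backup-host variant breaks file identity because "the other node" is a
different value on each node. A hypothetical Python sketch (the helper names
are illustrative) that builds the slurm.conf accounting lines for A and B,
with and without the backup host, and reports which lines differ:

```python
def accounting_lines(peer: str, with_backup: bool) -> list[str]:
    """slurm.conf accounting lines for a node whose partner node is `peer`."""
    lines = ["AccountingStorageHost=localhost"]
    if with_backup:
        # The backup must name the *other* node, so the value is node-specific.
        lines.append(f"AccountingStorageBackupHost={peer}")
    return lines


def differing(a: list[str], b: list[str]) -> set[str]:
    """Lines present in one file but not the other."""
    return set(a) ^ set(b)


if __name__ == "__main__":
    for with_backup in (False, True):
        node_a = accounting_lines(peer="nodeB", with_backup=with_backup)
        node_b = accounting_lines(peer="nodeA", with_backup=with_backup)
        print(with_backup, sorted(differing(node_a, node_b)))
```

Without the backup host the difference is empty; with it, the two
AccountingStorageBackupHost lines are the only mismatch.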

 

It will mean that, most of the time, you will not be load-balancing across
the multi-master DB replicas. Whether that matters is for you to decide.

 

On 20/02/2025 3:57, taleinterve...@sjtu.edu.cn wrote:

Do you mean the second configuration scheme?

I think configuring `dbdhost=localhost` is the same as configuring
`DbdAddr=nodeC` and `DbdAddr=nodeD` on the two nodes respectively.

The key point is whether we should set the DbdBackupHost option, and how it
works.

 

 

From: Daniel Letai <d...@letai.org.il>
Sent: 19 February 2025 18:21
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node
with HA database?

 

I'm not sure it will work (I didn't test it), but could you just set
`dbdhost=localhost` to solve this?

 

 

On 18/02/2025 11:59, hermes via slurm-users wrote:

The deployment scenario is as follows:

 

nodeA                               nodeB
(slurmctld)                         (backup slurmctld)
    | \-------------------------------/ |
    | /                               \ |
nodeC                               nodeD
(slurmdbd)                          (backup slurmdbd)
(mysql)  <--multi-master replica--> (mysql)

 

Since the database uses multi-master replication, each slurmdbd should talk
only to the mysql on its own node.

 

In this case, how should we set slurmdbd.conf? The file contains the options
“DbdAddr”, “DbdHost” and “DbdBackupHost”.

Should they be consistent between nodeA-2 and nodeB-2? For example:

DbdAddr = nodeC        | DbdAddr = nodeC
DbdHost = nodeC        | DbdHost = nodeC
DbdBackupHost = nodeD  | DbdBackupHost = nodeD
StorageHost = nodeC    | StorageHost = nodeD

 

Or maybe just use a different conf file on each node and not use
“DbdBackupHost”, like:

DbdAddr = nodeC        | DbdAddr = nodeD
DbdHost = nodeC        | DbdHost = nodeD
StorageHost = nodeC    | StorageHost = nodeD
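One way to compare the two candidate layouts is to list which settings would
differ between the two files. The Python sketch below transcribes the tables
above, assuming the left column is nodeC's slurmdbd.conf and the right column
is nodeD's (the question does not label them); `differing_keys` is a
hypothetical helper.

```python
Conf = dict[str, str]

# (left column, right column) from the question; assumed nodeC's and nodeD's files.
SCHEME_1: tuple[Conf, Conf] = (
    {"DbdAddr": "nodeC", "DbdHost": "nodeC", "DbdBackupHost": "nodeD", "StorageHost": "nodeC"},
    {"DbdAddr": "nodeC", "DbdHost": "nodeC", "DbdBackupHost": "nodeD", "StorageHost": "nodeD"},
)
SCHEME_2: tuple[Conf, Conf] = (
    {"DbdAddr": "nodeC", "DbdHost": "nodeC", "StorageHost": "nodeC"},
    {"DbdAddr": "nodeD", "DbdHost": "nodeD", "StorageHost": "nodeD"},
)


def differing_keys(left: Conf, right: Conf) -> list[str]:
    """Settings whose values differ (or that exist in only one file)."""
    return sorted(k for k in left.keys() | right.keys() if left.get(k) != right.get(k))


if __name__ == "__main__":
    print(differing_keys(*SCHEME_1))  # ['StorageHost']
    print(differing_keys(*SCHEME_2))  # ['DbdAddr', 'DbdHost', 'StorageHost']
```

In both schemes StorageHost differs, so neither one gives identical files on
the two nodes.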

 

I’m quite confused about the usage of DbdAddr and DbdHost. What is the
difference between them, and why does only DbdHost have a backup option?

 

Another confusing point is how DbdBackupHost works. I guess it is slurmctld
that is responsible for selecting an available slurmdbd. Since slurm.conf
already contains “AccountingStorageHost” and “AccountingStorageBackupHost”,
why do we need to set a backup dbd again on the slurmdbd side?

 







 

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
