Re: [slurm-users] SlurmDBD losing connection to the backend MariaDB

2022-11-01 Thread Brian Andrus
RC, In that scenario, the backup slurmdbd would take over, but then its database would not necessarily be in sync with the 'main' database (hence the warnings/info about it in the documentation). For my setup, I have 2 slurmdbd hosts, but they both connect to the same, separate, MariaDB serv

Re: [slurm-users] SlurmDBD losing connection to the backend MariaDB

2022-11-01 Thread Richard Chang
Does it mean it is best to use a single slurmdbd host in my case? My primary slurmctld is the backup slurmdbd host, and my worry is: if the primary slurmdbd host (which is also the MariaDB server) goes down, will the backup slurmdbd be able to cache data and wait until MariaDB catches up?

Re: [slurm-users] SlurmDBD losing connection to the backend MariaDB

2022-11-01 Thread Brian Andrus
Ole, Fair enough, it is actually slurmctld that does the caching. Technical typo on my part there. I was just trying to let the user know that there is a limited window in which the database has to come back to ensure no information is lost during an outage. Brian Andrus On 11/1/2022 1:43 AM, Ole Holm Nielsen wrote: Hi Br
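
For a rough sense of how large that slurmctld-side window is: as far as I recall from the slurm.conf man page, the queue of messages held for slurmdbd is capped by MaxDBDMsgs, whose default is the larger of 10000 and MaxJobCount * 2 + node count * 4. The sketch below only reproduces that arithmetic; the function name is made up for illustration, and the formula should be checked against the man page for your Slurm version.

```python
def default_max_dbd_msgs(max_job_count: int, node_count: int) -> int:
    """Approximate default cap on messages slurmctld queues while slurmdbd is unreachable.

    Assumption: default = max(10000, MaxJobCount * 2 + node_count * 4),
    as described for MaxDBDMsgs in the slurm.conf man page. This is a
    hypothetical helper, not part of Slurm.
    """
    floor = 10000  # documented minimum for MaxDBDMsgs
    return max(floor, max_job_count * 2 + node_count * 4)


if __name__ == "__main__":
    # Example: MaxJobCount left at 10000 on a 100-node cluster.
    print(default_max_dbd_msgs(10000, 100))
```

How long that many queued messages lasts in wall-clock time depends on how busy the cluster is, so it is a message budget rather than a fixed duration.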

Re: [slurm-users] [EXTERNAL] SlurmDBD losing connection to the backend MariaDB

2022-11-01 Thread Greg Wickham
Hi Richard, While trying to respond I was looking into the manual pages, and while it does appear that Slurm can support some kind of high availability (*), it doesn’t seem simple. With multiple slurmctld daemons, only one can be active at any time, as they share state information. It’s not clear how they

Re: [slurm-users] [EXTERNAL] SlurmDBD losing connection to the backend MariaDB

2022-11-01 Thread Richard Chang
Hello Greg, I have a two-node setup. node1 is primary slurmctld + backup slurmdbd, and node2 is primary slurmdbd + backup slurmctld and the MySQL database host. My concern is: if node2 goes down and the backup slurmdbd takes over, what will happen? I have read that slurmctld can ca

Re: [slurm-users] SlurmDBD losing connection to the backend MariaDB

2022-11-01 Thread Ole Holm Nielsen
Hi Brian, On 11/1/22 05:28, Brian Andrus wrote: It caches up to a point. As I understand it, that is about an hour (depending on size and how busy the cluster is, as well as available memory, etc). Have you found any documentation of slurmdbd caching? It's well-known that slurmctld caches i