Hi Diego.

In our setup, the database is critical.  We have some wrapper scripts that 
consult the database for information, and we also set environment variables on 
login, based on user/partition associations.  If the database is down, none of 
those things work.

I doubt there is appetite in the organization to change the way our setup 
works, but if we can improve database reliability, that would be a good 
solution.  Mostly I am interested in protecting against hardware failure, and 
that’s why I’m interested in a cluster solution such as XtraDB.
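
For illustration, one way to make that "single MySQL server" a cluster without changing slurmdbd itself is to point slurmdbd at one stable name that fronts all the XtraDB nodes, such as a floating virtual IP or a proxy like HAProxy. This is a minimal slurmdbd.conf sketch, not a tested setup; the hostnames (dbd1, dbd2, slurmdb-vip.example.org) are hypothetical placeholders:

```ini
# slurmdbd.conf (fragment); all hostnames below are hypothetical
DbdHost=dbd1
DbdBackupHost=dbd2
StorageType=accounting_storage/mysql
# Assumption: this name is a virtual IP or proxy in front of the
# XtraDB/Galera nodes, so slurmdbd still sees a single MySQL server.
StorageHost=slurmdb-vip.example.org
StoragePort=3306
StorageUser=slurm
StorageLoc=slurm_acct_db
```

In this arrangement the VIP or proxy, not slurmdbd, is responsible for steering connections to a healthy node when one fails.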

Thanks.

Daniel

> On Jan 23, 2024, at 03:23, Diego Zuccato <diego.zucc...@unibo.it> wrote:
> 
> IIUC the database is not "critical": if it goes down, you lose access to some 
> statistics. But job data gets cached anyway and the db will be updated when 
> it comes back online.
> 
> Diego
> 
> On 22/01/2024 18:23, Daniel L'Hommedieu wrote:
>> Community:
>> What do you do to ensure database reliability in your SLURM environment?  We 
>> can have multiple controllers and multiple slurmdbds, but my understanding 
>> is that slurmdbd can be configured with a single MySQL server, so what do 
>> you do?  Do you have that “single MySQL server” be a cluster, such as 
>> Percona XtraDB?  Do you use MySQL replication, then manually switch 
>> slurmdbd to a replication slave if the master goes down?  Do you do 
>> something else?
>> Thanks.
>> Daniel
> 
> -- 
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
> 
