Hi Diego. In our setup, the database is critical. We have some wrapper scripts that consult the database for information, and we also set environment variables on login, based on user/partition associations. If the database is down, none of those things work.
I doubt there is appetite in the organization to change the way our setup works, but if we can improve database reliability, that would be a good solution. Mostly I am interested in protecting against hardware failure, which is why I’m interested in a cluster solution such as XtraDB.

Thanks.
Daniel

> On Jan 23, 2024, at 03:23, Diego Zuccato <diego.zucc...@unibo.it> wrote:
>
> IIUC the database is not "critical": if it goes down, you lose access to some
> statistics. But job data gets cached anyway and the db will be updated when
> it comes back online.
>
> Diego
>
> On 22/01/2024 18:23, Daniel L'Hommedieu wrote:
>> Community:
>> What do you do to ensure database reliability in your SLURM environment? We
>> can have multiple controllers and multiple slurmdbds, but my understanding
>> is that slurmdbd can be configured with a single MySQL server, so what do
>> you do? Do you have that “single MySQL server” be a cluster, such as
>> Percona XtraDB? Do you use MySQL replication, then manually switch
>> slurmdbd to a replication slave if the master goes down? Do you do
>> something else?
>> Thanks.
>> Daniel
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786