On 04-03-2019 16:30, Loris Bennett wrote:
On 3/4/19 2:26 PM, Loris Bennett wrote:
Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:
We're one of the many Slurm sites which run the slurmdbd database daemon on the
same server as the slurmctld daemon. This works without problems at our site
given our modest load. However, SchedMD recommends running the daemons on
separate servers.
Contemplating how to upgrade our cluster from Slurm 17.11 to 18.08, I've come to
appreciate the advantage of running the daemons on separate servers: One can
upgrade slurmdbd to 18.08 while keeping slurmctld at 17.11 (for a while at
least). This enables us to upgrade to 18.08 in the recommended order without
any interruption to our running jobs and without any cluster downtime.
Can't one do this even with only one server? We have always run both
slurmctld and slurmdbd on one machine and have performed all the updates
without any downtime.
For a minor upgrade from 17.11.x to 17.11.y there is no issue because the MySQL
database layout is unchanged.
A major upgrade such as 17.11 to 18.08 is potentially riskier; see for example
this list thread "Extreme long db upgrade 16.05.6 -> 17.11.3":
https://lists.schedmd.com/pipermail/slurm-users/2018-February/000612.html
I recommend studying the instructions in
https://slurm.schedmd.com/quickstart_admin.html#upgrade.
That is indeed the protocol we follow.
See also the slides on "Upgrading" in
https://slurm.schedmd.com/SLUG18/field_notes2.pdf from the SLUG meeting 2018
(https://slurm.schedmd.com/publications.html).
Updating the database layout during a Slurm major upgrade can in special
situations lead to problems, so it's safer to do the upgrade separately for
slurmdbd and slurmctld. This is why I've decided to move my slurmdbd and
database to a separate server now. The slurmctld which governs the entire
cluster is thereby unaffected as I "play" with the database upgrade, and I can
upgrade Slurm without any cluster downtime.
I don't understand how the separation of the two services onto two
machines in the production environment makes such a difference. No
matter where the slurmdbd is running, the slurmctld will attempt to
contact it and cache data if the slurmdbd is unreachable. Or is the
point more that, with a second machine you can do an offline conversion
of the database, i.e. it is good to have a test and a production
environment?
This is a nice discussion! My reasoning is:
If slurmdbd and slurmctld both run on the same machine, you MUST upgrade
the RPMs simultaneously, for example, 17.11.13 to 18.08.5. When
slurmdbd runs on a separate machine, you can upgrade that one without
affecting slurmctld.
Mind you, SchedMD's recommended order for an incremental upgrade is:
1. slurmdbd
2. slurmctld
3. slurmd (on nodes)
4. Slurm commands (on login hosts)
There is a risk involved in lumping steps 1+2 together into one step,
especially if the database upgrade somehow has a problem or takes a very
long time. If you're forced to roll back and downgrade slurmdbd to
the old version, problems may arise from having to downgrade
slurmctld at the same time.
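
As a rough illustration of the upgrade order listed above, here is a minimal
Python sketch. The helper names (major, order_violations) and the dictionary
layout are hypothetical and not part of Slurm. The check itself is a
simplification: it only flags a component that runs a newer major release
than a component which should have been upgraded before it.

```python
# Hypothetical helper, not part of Slurm: checks that a set of component
# versions respects the order slurmdbd -> slurmctld -> slurmd -> commands.

UPGRADE_ORDER = ["slurmdbd", "slurmctld", "slurmd", "commands"]


def major(version):
    """Return the major release of a version string, e.g. '18.08.5' -> (18, 8)."""
    fields = [int(part) for part in version.split(".")]
    return tuple(fields[:2])


def order_violations(versions):
    """List (earlier, later) pairs where a component later in the upgrade
    order runs a newer major release than one earlier in the order."""
    problems = []
    for i, earlier in enumerate(UPGRADE_ORDER):
        for later in UPGRADE_ORDER[i + 1:]:
            if major(versions[later]) > major(versions[earlier]):
                problems.append((earlier, later))
    return problems


# Step 1 done on a separate server: slurmdbd upgraded, everything else untouched.
after_step_1 = {
    "slurmdbd": "18.08.5",
    "slurmctld": "17.11.13",
    "slurmd": "17.11.13",
    "commands": "17.11.13",
}
print(order_violations(after_step_1))
```

With slurmdbd on its own server, the intermediate state after step 1 passes
this check. On a single server that intermediate state does not exist,
because both daemons change version together.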
A crucial part of slurmctld is the StateSaveLocation
(/var/spool/slurmctld) directory which is being updated all the time due
to cluster activity. You don't want to compromise the operation of
slurmctld while upgrading slurmdbd.
I certainly recommend testing and timing the database and slurmdbd
upgrade on a non-production node before the real upgrade.
On the other hand, the Quick Start Admin Guide
(https://slurm.schedmd.com/quickstart_admin.html) does mention "head
node, compute nodes, and slurmdbd node". I had always assumed a
separate slurmdbd node was mainly useful for performance reasons at
sites with a huge throughput of jobs, but maybe I am missing something.
For me, the safety of upgrading is most important. You're right that
high-throughput sites will want to separate the dbd and ctld services for
performance reasons.
/Ole