I use RPMs for our installs here. I usually pause all the jobs prior
to the upgrade, then I follow the guide here:
https://slurm.schedmd.com/quickstart_admin.html
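To pause the jobs I do something along these lines (my own rough
illustration, not a one-size-fits-all recipe): hold anything still
pending and suspend what's running, e.g.

squeue -h -t PD -o %i | xargs -r scontrol hold
squeue -h -t R -o %i | xargs -r scontrol suspend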
I haven't done the upgrade to 18.08 yet, though, so I haven't had to
contend with the automatic restart that seems to be the case with the
new rpm spec script (we went to 17.11 prior to the rpm spec reorg).
Frankly, I wish they didn't do the automatic restart, as I like to
manage that myself.
As Chris said, though, you definitely want to do the slurmdbd upgrade
from the command line. I've had it happen that, when just restarting the
service, it times out and the database only gets partially updated. In
that case I had to restore from the mysqldump I had made and try again.
I also highly recommend doing mysqldumps prior to major version updates.
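Something like this, assuming the default accounting database name of
slurm_acct_db (substitute your own DB name and credentials):

mysqldump --single-transaction slurm_acct_db > slurm_acct_db_$(date +%F).sql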
-Paul Edmon-
On 09/25/2018 09:54 AM, Baker D.J. wrote:
Thank you for your comments. I could potentially force the upgrade of
the slurm and slurm-slurmdbd rpms using something like:
rpm -Uvh --noscripts --nodeps --force slurm-18.08.0-1.el7.x86_64.rpm \
    slurm-slurmdbd-18.08.0-1.el7.x86_64.rpm
That will certainly work; however, the slurmctld (or, in the case of my
test node, the slurmd) will be killed. The logic is that at v17.02 the
slurm rpm provides slurmctld and slurmd, so upgrading that rpm will
destroy/kill the existing slurmctld or slurmd processes. That is...
# rpm -q --whatprovides /usr/sbin/slurmctld
slurm-17.02.8-1.el7.x86_64
So if I force the upgrade of that rpm then I delete and kill
/usr/sbin/slurmctld. In the new rpm structure slurmctld is now
provided by its own rpm.
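After a successful upgrade I'd expect that same query to point at the
new daemon package instead, i.e. something like (the exact version
string here is just my guess):

# rpm -q --whatprovides /usr/sbin/slurmctld
slurm-slurmctld-18.08.0-1.el7.x86_64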
I would have thought that someone would have crossed this bridge
before, but maybe most admins don't use RPMs...
Best regards,
David
------------------------------------------------------------------------
*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf
of Chris Samuel <ch...@csamuel.org>
*Sent:* 25 September 2018 13:00
*To:* slurm-users@lists.schedmd.com
*Subject:* Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 -->
18.08
On Tuesday, 25 September 2018 9:41:10 PM AEST Baker D. J. wrote:
> I guess that the only solution is to upgrade all the slurm at once. That
> means that the slurmctld will be killed (unless it has been stopped first).
We don't use RPMs from Slurm [1], but the rpm command does have a
--noscripts option to (allegedly, I've never used it) suppress the
execution of pre/post install scripts.
A big warning: do not use systemctl to start the new slurmdbd for the
first time when upgrading!
Stop the older one first (and then take a database dump), then run the
new slurmdbd process with the "-Dvvv" options (inside screen, just in
case) so that you can watch its progress and systemd won't decide it's
been taking too long to start and try to kill it part way through the
database upgrades.
Once that's completed successfully you can ^C it and start it up via
systemctl once more.
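In other words, something along these lines (the dump command and
database name are illustrative, adjust for your site):

systemctl stop slurmdbd
mysqldump --single-transaction slurm_acct_db > slurm_acct_db_backup.sql
(install the new version, then run the daemon in the foreground)
slurmdbd -Dvvv
(watch the schema conversion finish, then ^C and hand back to systemd)
systemctl start slurmdbd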
Hope that's useful!
All the best,
Chris
[1] - I've always installed into ${shared_local_area}/slurm/${version}
and had a symlink called "latest" that points at the currently blessed
version of Slurm. Then I stop slurmdbd, upgrade that as above, then I
can do slurmctld (with partitions marked down, just in case). Once those
are done I can restart the slurmd's around the cluster.
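The switch itself is then just a symlink flip, something like (the
18.08.0 directory name is only an example):

./configure --prefix=${shared_local_area}/slurm/18.08.0 && make && make install
ln -sfn ${shared_local_area}/slurm/18.08.0 ${shared_local_area}/slurm/latest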
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC