If you follow the upgrade guide on the Slurm website you shouldn't have
many problems. We've made it standard practice here to set all
partitions to DOWN and suspend all running jobs when we do upgrades.
This has led to far greater stability, and we haven't lost any jobs in
an upgrade. The only weirdness we have seen is when jobs exit while the
DB upgrade is in progress. That can sometimes leave residual jobs in
the DB that weren't properly closed out. This is why we suspend all the
jobs: it ensures that no job exits before the DB is back. In 16.05+ you
also have:
sacctmgr show runawayjobs
which can clean up those orphaned jobs, so it's not as much of a
concern anymore.
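To make the pause-and-resume routine above concrete, here is a minimal dry-run sketch in Python. It only builds the command lines and runs nothing. The helper names, the partition name and the job IDs are made up for illustration, and the order of the resume steps is my assumption rather than something spelled out above.

```python
# Dry-run sketch: builds the command lines, never executes them.
# Helper names, partition names and job IDs are hypothetical.

def pause_commands(partitions, running_job_ids):
    """Commands to run before starting the upgrade."""
    # Mark every partition DOWN so no new jobs start.
    cmds = [["scontrol", "update", f"PartitionName={p}", "State=DOWN"]
            for p in partitions]
    # Suspend running jobs so none can exit while the DB is being converted.
    cmds += [["scontrol", "suspend", str(j)] for j in running_job_ids]
    return cmds


def resume_commands(partitions, suspended_job_ids):
    """Commands to run once slurmdbd and slurmctld are back (assumed order)."""
    cmds = [["scontrol", "resume", str(j)] for j in suspended_job_ids]
    cmds += [["scontrol", "update", f"PartitionName={p}", "State=UP"]
             for p in partitions]
    # 16.05+: look for orphaned job records left behind in the DB.
    cmds.append(["sacctmgr", "show", "runawayjobs"])
    return cmds


def render(cmds):
    """Join each command into one printable line."""
    return "\n".join(" ".join(c) for c in cmds)


if __name__ == "__main__":
    print(render(pause_commands(["general"], [101, 102])))
    print(render(resume_commands(["general"], [101, 102])))
```

In practice the partition and job lists would come from the cluster itself, for example from sinfo and squeue output, and you would review the printed commands before running any of them by hand.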
Beyond that we follow the guide at the bottom of this page:
https://slurm.schedmd.com/quickstart_admin.html
I haven't tried going two major versions at once though. The docs
indicate that it should work fine. We generally try to keep pace with
current stable.
Given that you only have 100,000 jobs your upgrade should probably go
fairly quickly. I could imagine around 10-15 minutes. Our DB has several
million jobs and it takes about 30 min to an hour depending on what
operations are being done.
-Paul Edmon-
On 06/20/2017 09:37 AM, Nicholas McCollum wrote:
I'm about to update 15.08 to the latest SLURM in August and would
appreciate any notes you have on the process.
I'm especially interested in maintaining the DB as well as
associations. I'd also like to keep the pending job list if possible.
I've only got around 100,000 jobs in the DB so far, since January.
Thanks
Nick McCollum
Alabama Supercomputer Authority
On Jun 20, 2017 8:07 AM, Paul Edmon <ped...@cfa.harvard.edu> wrote:
Yeah, that sounds about right. Changes between major versions can take
quite a bit of time. In the past I've seen upgrades take 2-3 hours for
the DB.
As for ways to speed it up, putting the DB on newer hardware, if you
haven't already, helps quite a bit (how much you gain depends on the
architecture; we went from AMD Abu Dhabi to Intel Broadwell and saw a
factor of 3-4 speed improvement). Upgrading to the latest version of
MariaDB if you are on an old version of MySQL can get you about 30-40%.
Doing all of these whittled our DB upgrade times for major upgrades to
about 30 min or so.
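As a rough sanity check on those figures, the arithmetic can be sketched as below. It rests on two assumptions that are not stated anywhere in this thread: that the hardware and database gains simply multiply, and that "30-40%" means a 30-40% reduction in run time. The function name is made up for illustration.

```python
# Back-of-the-envelope estimate, not a measurement.
# Assumes the two gains multiply and that the MariaDB gain is a
# percentage reduction in run time (both are assumptions).

def estimated_minutes(baseline_min, hardware_speedup, db_reduction_pct):
    """Upgrade time after both changes, given a baseline in minutes."""
    return baseline_min * (100 - db_reduction_pct) / 100 / hardware_speedup


if __name__ == "__main__":
    best = estimated_minutes(120, 4, 40)   # 2 h baseline, 4x hardware, 40% DB gain
    worst = estimated_minutes(180, 3, 30)  # 3 h baseline, 3x hardware, 30% DB gain
    print(f"{best:.0f}-{worst:.0f} min")
```

Under those assumptions a 2-3 hour upgrade comes out at roughly 18 to 42 minutes, which brackets the 30 min or so we actually see.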
Beyond that I imagine some more specific DB optimization tricks could be
done, but I'm not a DB admin so I won't venture to say.
-Paul Edmon-
On 06/20/2017 08:42 AM, Tim Fora wrote:
> Hi,
>
> Upgraded from 15.08 to 17.02. It took about one hour for slurmdbd to
> start. Logs show most of the time was spent on this step and other
> table changes:
>
> adding column admin_comment after account in table
>
> Does this sound right? Any ideas to help speed things up?
>
> Thanks,
> Tim
>
>