If you follow the guide on the Slurm website you shouldn't have many problems. We've made it standard practice here to set all partitions to DOWN and suspend all the jobs when we do upgrades. This has led to far greater stability, and we haven't lost any jobs in an upgrade.

The only weirdness we have seen is when jobs exit while the DB upgrade is running. Sometimes that leaves residual jobs in the DB that were never properly closed out. This is why we pause all the jobs: it ensures that no jobs exit before the DB is back. In 16.05+ you have the

sacctmgr show runawayjobs

feature, which can clean up those orphan jobs, so it's not as much of a concern anymore.
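For illustration, the "partitions DOWN, jobs suspended" step could be scripted roughly as below. This is only a minimal sketch, not the exact procedure used at any site: the helper names (run, down_partitions, suspend_jobs) and the DRY_RUN switch are made up for this example, and by default it only prints the scontrol commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: quiesce a cluster before a Slurm DB upgrade.
# Helper names and the DRY_RUN switch are hypothetical, for illustration only.
# DRY_RUN=1 (the default) prints each command instead of executing it.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Mark each named partition DOWN so that no new jobs are started.
down_partitions() {
    local part
    for part in "$@"; do
        run scontrol update PartitionName="$part" State=DOWN
    done
}

# Suspend each job id given, so that no job exits while the DB is offline.
suspend_jobs() {
    local job
    for job in "$@"; do
        run scontrol suspend "$job"
    done
}
```

The partition list could come from something like `sinfo -h -o %R | sort -u` and the job list from `squeue -h -t RUNNING -o %A`. After the upgrade, the reverse would be `scontrol resume` for each job and `State=UP` for each partition, followed by a check with `sacctmgr show runawayjobs`.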

Beyond that we follow the guide at the bottom of this page:

https://slurm.schedmd.com/quickstart_admin.html

I haven't tried going two major versions at once though. The docs indicate that it should work fine. We generally try to keep pace with current stable.

Given that you only have 100,000 jobs, your upgrade should probably go fairly quickly. I could imagine around 10-15 minutes. Our DB has several million jobs and it takes about 30 min to an hour, depending on what operations are being done.

-Paul Edmon-


On 06/20/2017 09:37 AM, Nicholas McCollum wrote:
I'm about to update 15.08 to the latest SLURM in August and would appreciate any notes you have on the process.

I'm especially interested in maintaining the DB as well as associations. I'd also like to keep the pending job list if possible.

I've only got around 100,000 jobs in the DB so far, since January.

Thanks

Nick McCollum
Alabama Supercomputer Authority


On Jun 20, 2017 8:07 AM, Paul Edmon <ped...@cfa.harvard.edu> wrote:


    Yeah, that sounds about right.  Changes between major versions can
    take quite a bit of time.  In the past I've seen upgrades take 2-3
    hours for the DB.

    As for ways to speed it up: putting the DB on newer hardware, if you
    haven't already, helps quite a bit (how much you gain depends on the
    architecture; we went from AMD Abu Dhabi to Intel Broadwell and saw a
    factor of 3-4 speed improvement).  Upgrading to the latest version of
    MariaDB, if you are on an old version of MySQL, can get you about
    30-40%.

    Doing all of these whittled our DB upgrade times for major upgrades
    down to about 30 min or so.

    Beyond that I imagine some more specific DB optimization tricks could
    be done, but I'm not a DB admin so I won't venture to say.

    -Paul Edmon-


    On 06/20/2017 08:42 AM, Tim Fora wrote:
    > Hi,
    >
    > Upgraded from 15.08 to 17.02. It took about one hour for slurmdbd to
    > start. Logs show most of the time was spent on this step and other
    > table changes:
    >
    > adding column admin_comment after account in table
    >
    > Does this sound right? Any ideas to help speed things up?
    >
    > Thanks,
    > Tim
    >
    >


