Hi Lech,
I've tried to summarize your work on the Slurm database upgrade patch in
my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#database-upgrade-from-slurm-17-02-and-older
Could you kindly check if my notes are correct and complete? Hopefully
this Wiki will also help other users.
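P.S. For readers who haven't done a database upgrade before, the basic
sequence is to back up the accounting database first and then let the new
slurmdbd do the conversion in the foreground. Roughly like this (database
name and paths are just examples, adjust them to your site):

  # Back up the accounting database before upgrading anything
  mysqldump --single-transaction slurm_acct_db > /root/slurm_acct_db.sql
  # Stop the old daemon, install the new slurmdbd packages, then run the
  # conversion in the foreground so you can watch it complete
  systemctl stop slurmdbd
  slurmdbd -D -vvv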
Hi Ole,
your summary is correct as far as I can tell and will hopefully help some users.
One thing I’d add is the remark from the 18.08 Release Notes (
https://github.com/SchedMD/slurm/blob/slurm-18.08/RELEASE_NOTES ), which adds
mysql 5.5 to the list.
They’ve mentioned that mysql 5.5 is the default ...
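In case it helps anyone reading along, it only takes a second to check
which server version slurmdbd will be talking to before starting the
conversion (plain client call, nothing Slurm-specific):

  # Show the MySQL/MariaDB server version
  mysql -u root -p -e "SELECT VERSION();"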
Hi Lech,
Thanks! I added the 18.08 Release Notes reference to
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#database-upgrade-from-slurm-17-02-and-older
I've already upgraded from 17.11 to 18.08 without your patch, and this
went smoothly as expected. We upgraded from 17.02 to 17.11 ...
Hi Paul, thanks for your advice. Actually I already tried what you
suggested. No matter what value I put after PurgeJobAfter, I always
end up with the same error:
sacctmgr archive dump Directory=/home/joule/archives/ PurgeJobAfter=1days
sacctmgr: error: slurmdbd: Getting response to message type ...
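For context, the sort of slurmdbd.conf settings involved here look like
the following (values are only an illustration, not my exact file; the
archive directory is the one from the command above):

  # slurmdbd.conf -- archive/purge settings (example values only)
  ArchiveDir=/home/joule/archives
  ArchiveJobs=yes
  ArchiveSteps=yes
  PurgeJobAfter=1month
  PurgeStepAfter=1month

With these set, slurmdbd purges and archives on its own schedule, so the
big one-off sacctmgr run can sometimes be avoided.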
Same problem here: a job submitted with --gres-flags=disable-bindings is
assigned a node, but then the job step fails because all GPUs on that
node are already in use. Log messages:
[2019-04-05T15:29:05.216] error: gres/gpu: job 92453 node node5
overallocated resources by 1, (9 > 8)
[2019-04-05 ...
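In case it is useful for debugging, this is roughly how I check what the
node itself reports as allocated (I believe the -d flag adds a GresUsed
line on 18.08, not sure about older releases, and %b in squeue prints the
per-job GRES request):

  # Detailed node state, including configured and in-use GRES
  scontrol -d show node node5 | grep -i gres
  # Which jobs on the node asked for GPUs
  squeue -w node5 -o "%i %u %b"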
Did it just time out, or did the failure happen immediately? If it was
immediate you may be hitting a bug. It "should" be safe to upgrade to a
later 15.08.* release; there may be fixes in there related to that. I
would look at the changelog first, though, just to see if there are any
relevant fixes.
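If you have the source tree handy, the changelog is the NEWS file at the
top level, so something like this narrows it down (the search terms are
just a starting point):

  # Look for dbd/archive related fixes in the Slurm changelog
  grep -n -i -e slurmdbd -e archive NEWS | less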
The failure occurs after a few minutes (~10).
And we are running out of space on the slurm controller. The mysql
daemon is at 100% CPU usage all the time. This issue is becoming critical.
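One thing that helps to see where the space is actually going is to list
the table sizes directly (a standard information_schema query; this
assumes the default database name slurm_acct_db):

  # Ten largest tables in the accounting database, in MB
  mysql -e "SELECT table_name,
                   ROUND((data_length + index_length)/1024/1024) AS size_mb
            FROM information_schema.tables
            WHERE table_schema = 'slurm_acct_db'
            ORDER BY size_mb DESC LIMIT 10;"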
On 05/04/2019 16:10, Paul Edmon wrote:
Did it just time out, or did that failure happen immediately? ...
Hi Julien,
Did you optimize the MySQL database, in particular InnoDB?
I have collected some documentation in my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#mysql-configuration
and I also discuss database purging.
Please note that we run Slurm 17.11 (and recently 18.08) on CentOS ...
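Roughly the kind of InnoDB settings discussed on that page look like this
(example values; the right numbers depend on your server's memory, so
please check the page itself rather than copying these blindly):

  # /etc/my.cnf.d/innodb.cnf -- example values, size to your own server
  [mysqld]
  innodb_buffer_pool_size = 1024M
  innodb_log_file_size = 64M
  innodb_lock_wait_timeout = 900

(On older MySQL versions, changing innodb_log_file_size requires a clean
shutdown of the server first.)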
On 4/5/19 4:28 PM, Julien Rey wrote:
The failure occurs after a few minutes (~10).
And we are running out of space on the slurm controller. The mysql
daemon is at 100% CPU usage all the time. This issue is becoming critical.
...
Our Slurm accounting database is growing bigger and bigger (more ...