> On Mar 22, 2019, at 4:22 AM, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> 
> wrote:
> 
> On 3/21/19 6:56 PM, Ryan Novosielski wrote:
>>> On Mar 21, 2019, at 12:21 PM, Loris Bennett <loris.benn...@fu-berlin.de> 
>>> wrote:
>>> 
>>>  Our last cluster only hit around 2.5 million jobs after
>>> around 6 years, so database conversion was never an issue.  For sites
>>> with a higher-throughput things may be different, but I would hope that
>>> at those places, the managers would know the importance of planned
>>> updates and testing.
>> I’d be curious about any database tuning you might have done, or anyone else 
>> here. SchedMD’s guidance is minimal.
>> I’ve never been impressed with the performance on ours, and I’ve also seen 
>> other sites reporting >24-hour database conversion times.
> 
> Database tuning is actually documented by SchedMD, but you have to find the 
> appropriate pages first ;-)

Yeah, I’ve seen it, but there’s very little information provided (similar to 
what you’ve got listed). The main thing theirs adds is the note that “you might 
want to increase innodb_buffer_pool_size quite a bit more than 1024MB.” In my 
conversations with SchedMD I more or less asked, “is that it? If it’s still 
slow, does that mean I should look somewhere else, or keep tweaking?” There is 
also other advice in SchedMD bug reports (including the one you mention on your 
site), but many of them concern dramatically different versions of MySQL or 
SlurmDBD, and it’s not always easy to tell what still applies. It also depends 
on the type of access, the size of the DB, etc., but I only have the one DB 
size I have; presumably the community knows roughly how much is required for a 
given workload, or how many years of X jobs can be kept before most tuning 
settings start causing problems. I’ve taken some advice from mysqltuner.pl in 
some cases too, though I’m basically using the SchedMD recommendations right 
now (the thread_cache_size setting was mine -- can’t recall where I found it, 
but it seemed like a good idea for our workload):

[root@squid ~]# cat /etc/my.cnf.d/slurmdbd.cnf 
[mysqld]
innodb_buffer_pool_size=1G
thread_cache_size=4
innodb_log_file_size = 64M
innodb_lock_wait_timeout = 900
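
For comparison, here’s a sketch of what “quite a bit more than 1024MB” might 
look like. The values are illustrative only, not from SchedMD -- the 50–75%-of-RAM 
figure is common general InnoDB guidance for a mostly dedicated database host, 
and you’d want to leave headroom if slurmdbd shares the machine:

```ini
[mysqld]
# Illustrative sizing for a host with ~64 GB RAM mostly dedicated to
# MariaDB; 50-75% of RAM is general InnoDB guidance, not a SchedMD figure.
innodb_buffer_pool_size=32G
# Larger redo logs can help bulk operations like schema conversions;
# the 64M above is on the small side.
innodb_log_file_size=512M
innodb_lock_wait_timeout=900
```

One caveat: on older MySQL/MariaDB versions, changing innodb_log_file_size 
requires a clean shutdown first (and on very old versions, removing the old 
ib_logfile* files) or the server will refuse to start.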

> I have collected Slurm database information in my Wiki page 
> https://wiki.fysik.dtu.dk/niflheim/Slurm_database.  You may want to focus on 
> these sections:
> 
> * MySQL configuration (Innodb configuration)
> 
> * Setting database purge parameters (prune unwanted old database entries)
> 
> * Backup and restore of database (hopefully everyone does this already)
> 
> * Upgrade of MySQL/MariaDB (MySQL versions)
> 
> * Migrate the slurmdbd service to another server (I decided to do that 
> recently)
> 
> I hope this sheds some light on what needs to be considered.

Thanks, it’s helpful to have more information, particularly on purging and the 
migration process (which doesn’t seem complicated, but it’s nice to simply rip 
off the steps as opposed to having to write them :-D).
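
For anyone else reading along, the purge knobs live in slurmdbd.conf; a sketch 
of the sort of thing Ole’s page covers (the retention periods here are made-up 
examples -- pick your own):

```ini
# /etc/slurm/slurmdbd.conf (excerpt) -- example retention periods only
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveSteps=no
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
```

Worth noting that the first purge after enabling this on a large, 
never-purged database can take quite a while, so it’s probably best done in a 
maintenance window.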

The tug-of-war on our system comes from SlurmDBD often needing quite a bit of 
memory itself for certain operations, and it sits on the MySQL server. I 
sometimes wonder whether it might not be better to colocate SlurmDBD with 
slurmctld, separating them both from the MySQL server.

PS: mainly for Prentice, Ole’s site has the thread from this list that 
mentioned the very large DB upgrade time:
https://lists.schedmd.com/pipermail/slurm-users/2018-February/000612.html — we 
tested the DB upgrade first independently because of that risk.
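
For the archives, that independent test was roughly the following (paths and 
setup are illustrative; this assumes a test box with the same MySQL/MariaDB 
version and the new Slurm version installed):

```shell
# On the production DB host: dump the accounting DB. --single-transaction
# avoids locking the InnoDB tables while the dump runs.
mysqldump --single-transaction slurm_acct_db > slurm_acct_db.sql

# On the test host: load the dump into a scratch database.
mysql -e 'CREATE DATABASE slurm_acct_db'
mysql slurm_acct_db < slurm_acct_db.sql

# Run the NEW slurmdbd in the foreground with verbose logging and time
# the schema conversion -- this is where the multi-hour upgrades show up.
time /path/to/new/slurmdbd -D -vvv
```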

--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'
