Hi Ole,
Thank you for your advice.
As I said in my previous messages, this is how I configured my.cnf:
innodb_buffer_pool_size = 32G
innodb_log_file_size = 64M
innodb_lock_wait_timeout = 3600
I have read the thread "Extreme long db upgrade 16.05.6 -> 17.11.3".
However I have no way to purge the database with the slurmdbd tool. And
I still don't know if I can do it by hand with the mysql command line.
On top of that, the mysql daemon has refused to start since this
afternoon, and I get errors like these:
190408 16:22:08 mysqld_safe Starting mysqld daemon with databases from
/var/lib/mysql
190408 16:22:08 [Warning] Using unique option prefix key_buffer instead
of key_buffer_size is deprecated and will be removed in a future
release. Please use the full name instead.
190408 16:22:08 [Warning] Using unique option prefix myisam-recover
instead of myisam-recover-options is deprecated and will be removed in a
future release. Please use the full name instead.
190408 16:22:08 [Note] Plugin 'FEDERATED' is disabled.
190408 16:22:08 InnoDB: The InnoDB memory heap is disabled
190408 16:22:08 InnoDB: Mutexes and rw_locks use GCC atomic builtins
190408 16:22:08 InnoDB: Compressed tables use zlib 1.2.8
190408 16:22:08 InnoDB: Using Linux native AIO
190408 16:22:08 InnoDB: Initializing buffer pool, size = 32.0G
190408 16:22:09 InnoDB: Completed initialization of buffer pool
190408 16:22:10 InnoDB: highest supported file format is Barracuda.
InnoDB: No valid checkpoint found.
InnoDB: If this error appears when you are creating an InnoDB database,
InnoDB: the problem may be that during an earlier attempt you managed
InnoDB: to create the InnoDB data files, but log file creation failed.
InnoDB: If that is the case, please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/error-creating-innodb.html
190408 16:22:10 [ERROR] Plugin 'InnoDB' init function returned error.
190408 16:22:10 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE
failed.
190408 16:22:10 [ERROR] Unknown/unsupported storage engine: InnoDB
190408 16:22:10 [ERROR] Aborting
190408 16:22:10 [Note] /usr/sbin/mysqld: Shutdown complete
190408 16:22:10 mysqld_safe mysqld from pid file
/var/run/mysqld/mysqld.pid ended
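To make the failure easier to spot in a longer log, here is a minimal Python sketch that keeps only the lines tagged with a given level. The function name extract_by_level and the regular expression are my own illustration and assume the "YYMMDD HH:MM:SS [LEVEL] message" layout seen in the log above; continuation lines and untagged InnoDB lines are skipped.

```python
import re

# Assumed layout, taken from the log above: "190408 16:22:10 [ERROR] message"
LOG_LINE = re.compile(r"^(\d{6}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")


def extract_by_level(log_text, level="ERROR"):
    """Return (date, time, message) tuples for lines tagged with `level`."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) == level:
            hits.append((match.group(1), match.group(2), match.group(4)))
    return hits


# A few lines copied from the log above, used as sample input.
sample = """\
190408 16:22:10 InnoDB: highest supported file format is Barracuda.
InnoDB: No valid checkpoint found.
190408 16:22:10 [ERROR] Plugin 'InnoDB' init function returned error.
190408 16:22:10 [ERROR] Unknown/unsupported storage engine: InnoDB
190408 16:22:10 [ERROR] Aborting
"""

for date, time, message in extract_by_level(sample):
    print(date, time, message)
```

Note that the InnoDB line that actually explains the failure ("No valid checkpoint found") carries no level tag, so the lines just before the first [ERROR] are still worth reading by hand.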
Should I just drop the database altogether and upgrade slurm? What
would be the procedure for re-creating a database from scratch?
J.
On 05/04/2019 16:43, Ole Holm Nielsen wrote:
Hi Julien,
Did you optimize the MySQL database, in particular InnoDB?
I have collected some documentation in my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#mysql-configuration
and I also discuss database purging.
Please note that we run Slurm 17.11 (and recently 18.08) on CentOS 7.6
systems which come with a MariaDB 5.5 database. We have no problems
with the database or the daily purging operations (see
https://wiki.fysik.dtu.dk/niflheim/Slurm_database#setting-database-purge-parameters).
If you are upgrading your Slurm version (or planning to do so), I also
recommend that you read the thread [slurm-users] "Extreme long db
upgrade 16.05.6 -> 17.11.3" from the last few days.
Best regards,
Ole
On 4/5/19 4:28 PM, Julien Rey wrote:
The failure occurs after a few minutes (~10).
And we are running out of space on the slurm controller. The mysql
daemon is at 100% CPU usage all the time. This issue is becoming
critical.
On 05/04/2019 16:10, Paul Edmon wrote:
Did it just time out, or did that failure happen immediately? If
immediate, you may be in a situation where you are hitting a bug. It
"should" be safe to upgrade to a later version of 15.08.*. There may
be fixes in there related to that. I would look at the changelog
though just to see if there is any database work that was done.
-Paul Edmon-
On 4/5/19 9:05 AM, Julien Rey wrote:
Hi Paul, thanks for your advice. Actually, I already tried what you
suggested. No matter what value I put after PurgeJobAfter, I
always end up with the same error:
sacctmgr archive dump Directory=/home/joule/archives/
PurgeJobAfter=1days
sacctmgr: error: slurmdbd: Getting response to message type 1459
sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
Problem dumping archive: Unspecified error
sacctmgr archive dump Directory=/home/joule/archives/
PurgeJobAfter=48months
sacctmgr: error: slurmdbd: Getting response to message type 1459
sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
Problem dumping archive: Unspecified error
Has anyone tried to truncate tables by hand directly from the mysql
command line?
On 04/04/2019 16:13, Paul Edmon wrote:
We ran into this problem in the past. I know that fixes were put
in to deal with large purges as a result of our problems but I
don't recall what version they ended up in, likely newer than
15.08.0.
A solution that can work is to walk up the time so that instead of
one large purge you do several smaller purges. That at least
worked for us in the past.
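A minimal Python sketch of that idea, for illustration only: it prints a series of sacctmgr commands whose PurgeJobAfter value steps down from an old cutoff to the target one, so each run has less to purge. The function name staged_purge_commands, the 48/18/6 month values and the step size are assumptions chosen for the example; the command form and the archive directory are the ones quoted in this thread. Nothing is executed, the commands are only printed for review.

```python
def staged_purge_commands(oldest_months, target_months, step_months, archive_dir):
    """Build sacctmgr commands that lower PurgeJobAfter step by step.

    oldest_months: first (largest) cutoff to try, e.g. the age of the oldest jobs
    target_months: final retention wanted
    step_months:   how much to lower the cutoff between two runs
    """
    if step_months <= 0:
        raise ValueError("step_months must be positive")
    if oldest_months < target_months:
        raise ValueError("oldest_months must be >= target_months")

    commands = []
    months = oldest_months
    while True:
        commands.append(
            f"sacctmgr archive dump Directory={archive_dir} "
            f"PurgeJobAfter={months}months"
        )
        if months == target_months:
            break
        # Never step below the target retention.
        months = max(target_months, months - step_months)
    return commands


# Example values (assumptions): start at 48 months, end at 18, 6 months per step.
for command in staged_purge_commands(48, 18, 6, "/home/joule/archives/"):
    print(command)
```

Each command would be run by hand, waiting for one purge to finish before starting the next; a smaller step means more runs but less work per run.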
-Paul Edmon-
On 4/4/19 9:38 AM, Julien Rey wrote:
Hello,
Our slurm accounting database keeps growing (it is now more
than 100 GB) and is never being purged. We are running slurm
15.08.0-0pre1. I would like to upgrade to a more recent version
of the slurmdbd, but my fear is that it may break everything
during the update of the database.
Here is our slurmdbd.conf:
AuthType=auth/munge
AuthInfo=/var/run/munge/munge.socket.2
DbdHost=localhost
DebugLevel=6
StorageHost=localhost
StorageLoc=slurm_acct_db
StoragePass=shazaam
StorageType=accounting_storage/mysql
StorageUser=slurm
LogFile=/var/log/slurm-llnl/slurmdbd.log
PidFile=/var/run/slurm-llnl/slurmdbd.pid
SlurmUser=slurm
ArchiveDir=/home/joule/archives
PurgeEventAfter=18
PurgeJobAfter=18
PurgeResvAfter=1
PurgeStepAfter=1
PurgeSuspendAfter=1
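For reference, my reading of the slurmdbd.conf man page is that a bare number in these Purge*After settings means months, and that a suffix such as hours, days or months changes the unit. A small Python sketch of that reading follows; parse_purge_value is a hypothetical helper, not part of Slurm, and the accepted suffixes are an assumption based on the man page.

```python
import re

# Assumption: a value is digits optionally followed by hours, days or months.
_PURGE_VALUE = re.compile(r"^(\d+)(hours|days|months)?$", re.IGNORECASE)


def parse_purge_value(value):
    """Return (number, unit); a bare number is treated as months."""
    match = _PURGE_VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised purge value: {value!r}")
    number = int(match.group(1))
    unit = (match.group(2) or "months").lower()
    return number, unit


# Values taken from this thread.
for raw in ("18", "1", "365days", "48months"):
    print(raw, "->", parse_purge_value(raw))
```

Under that reading, PurgeJobAfter=18 keeps 18 months of job records and PurgeStepAfter=1 keeps one month of step records.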
I tried to purge it manually using this command but the slurmdbd
daemon ends up crashing and it doesn't remove anything:
sacctmgr archive dump Directory=/home/joule/archives/
PurgeJobAfter=365days
sacctmgr: error: slurmdbd: Getting response to message type 1459
sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
Problem dumping archive: Unspecified error
Sometimes I have to restart the mysql daemon (we are running
mysql 5.5.39-1). The /var/log/slurm-llnl/slurmdbd.log shows
nothing. The mysql logs are empty.
I tried to increase these values in my.cnf but so far without success:
innodb_buffer_pool_size = 32G
innodb_lock_wait_timeout = 3600
Is there any way to solve this issue? Otherwise, what would be
the procedure for deleting the database records altogether and
starting over with a fresh one?
--
Julien REY
Plate-forme RPBS
Modélisation Computationnelle des Interactions Protéines-Ligand (CMPLI)
Université Paris Diderot - Paris VII
tel : 01 57 27 83 95