Hi all,

I'm seeing this after some hours of MySQL downtime yesterday (to correct something else), but I didn't notice these errors until after I had performed the Slurm update to 18.08, which went through fine in spite of them.
Firstly, when restarting slurmdbd before I started the update:

[2019-02-06T11:28:44.398] slurmdbd version 17.02.7 started
[2019-02-06T11:28:46.194] error: We have more time than is possible (4536000+6566400+0)(11102400) > 10886400 for cluster cluster(3024) from 2019-02-06T07:00:00 - 2019-02-06T08:00:00 tres 1
[2019-02-06T11:28:46.199] error: We have more time than is possible (4536000+6566400+0)(11102400) > 10886400 for cluster cluster(3024) from 2019-02-06T08:00:00 - 2019-02-06T09:00:00 tres 1
[2019-02-06T11:28:46.204] error: We have more time than is possible (4536000+6566400+0)(11102400) > 10886400 for cluster cluster(3024) from 2019-02-06T09:00:00 - 2019-02-06T10:00:00 tres 1
[2019-02-06T11:28:46.210] error: We have more time than is possible (4031100+7070700+0)(11101800) > 10886400 for cluster cluster(3024) from 2019-02-06T10:00:00 - 2019-02-06T11:00:00 tres 1

The first place I spotted it was here:

[2019-02-06T12:23:50.276] Conversion done: success!
[2019-02-06T12:23:50.281] Accounting storage MYSQL plugin loaded
[2019-02-06T12:23:50.734] slurmdbd version 18.08.4 started
[2019-02-06T12:23:50.765] error: We have more time than is possible (3456000+11911388+0)(15367388) > 15336000 for cluster cluster(4624) from 2019-02-06T11:00:00 - 2019-02-06T12:00:00 tres 1

and now it repeats every hour:

[2019-02-06T13:00:00.186] error: We have more time than is possible (3456000+13219200+0)(16675200) > 16646400 for cluster cluster(4624) from 2019-02-06T12:00:00 - 2019-02-06T13:00:00 tres 1
[2019-02-06T14:00:00.283] error: We have more time than is possible (3456000+13212800+0)(16668800) > 16646400 for cluster cluster(4624) from 2019-02-06T13:00:00 - 2019-02-06T14:00:00 tres 1
[2019-02-06T15:00:00.369] error: We have more time than is possible (3456000+13219200+0)(16675200) > 16646400 for cluster cluster(4624) from 2019-02-06T14:00:00 - 2019-02-06T15:00:00 tres 1

15:59:45 [root ~]# sacctmgr list runawayjobs
Runaway Jobs: No runaway jobs found on cluster cluster

And, just because of the convenient timing:

16:04:31 [root ~]# tail /var/log/slurm/slurmdbd.log -n 1
[2019-02-06T16:00:00.917] error: We have more time than is possible (3456000+13219200+0)(16675200) > 16646400 for cluster cluster(4624) from 2019-02-06T15:00:00 - 2019-02-06T16:00:00 tres 1

There are 5 jobs that have been running throughout and are yet to complete. Is it possible this will stop once they have? What else could be causing this?

Thanks,
Antony
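The numbers in these "more time than is possible" errors can be checked mechanically. Below is a minimal Python sketch, assuming the message format is exactly as it appears in the slurmdbd log lines above; `parse` and its result keys are hypothetical names, not part of Slurm. It confirms that the bracketed total is the sum of the three components, computes how many seconds over the limit each hour is, and compares the limit with the count in brackets after the cluster name multiplied by 3600. Which accounting categories the three components represent is not stated in the log, so the code treats them as opaque.

```python
import re

# Matches the tail of the slurmdbd rollup error as printed in the log above.
# Assumption: the format is exactly "(a+b+c)(total) > limit for cluster NAME(COUNT)
# from START - END tres ID"; other Slurm versions may word it differently.
PATTERN = re.compile(
    r"\((\d+)\+(\d+)\+(\d+)\)\((\d+)\) > (\d+) "
    r"for cluster ([^\s(]+)\((\d+)\) from (\S+) - (\S+) tres (\d+)"
)


def parse(line):
    """Return a dict describing one error line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    a, b, c, total, limit = (int(m.group(i)) for i in range(1, 6))
    count = int(m.group(7))
    return {
        "parts": (a, b, c),           # three summed components (meaning not assumed)
        "reported_total": total,      # the second bracketed number
        "limit": limit,               # the number after ">"
        "cluster": m.group(6),
        "tres_count": count,          # the number in brackets after the cluster name
        "start": m.group(8),
        "end": m.group(9),
        "tres_id": int(m.group(10)),
        "sum_matches": a + b + c == total,
        "excess_secs": total - limit,  # how far over the limit this hour is
        "full_hour_limit": count * 3600,
    }


if __name__ == "__main__":
    sample = ("[2019-02-06T13:00:00.186] error: We have more time than is possible "
              "(3456000+13219200+0)(16675200) > 16646400 for cluster cluster(4624) "
              "from 2019-02-06T12:00:00 - 2019-02-06T13:00:00 tres 1")
    print(parse(sample))
```

For the hourly errors on cluster(4624), the limit 16646400 equals 4624 × 3600, consistent with the limit being that count times the seconds in one hour (tres 1 being, to my understanding, the CPU TRES), and the reported total is over it by a steady 28800 or 22400 seconds. The 11:00-12:00 hour that straddles the upgrade has a limit of 15336000, which is not 4624 × 3600.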