Hi All

I'm seeing the errors below after some hours of MySQL downtime yesterday
(to correct something else). I didn't notice them until after I had
performed the Slurm update to 18.08, which went through fine in spite of
them.

First, when restarting slurmdbd before I started the update:

[2019-02-06T11:28:44.398] slurmdbd version 17.02.7 started
[2019-02-06T11:28:46.194] error: We have more time than is possible
(4536000+6566400+0)(11102400) > 10886400 for cluster cluster(3024) from
2019-02-06T07:00:00 - 2019-02-06T08:00:00 tres 1
[2019-02-06T11:28:46.199] error: We have more time than is possible
(4536000+6566400+0)(11102400) > 10886400 for cluster cluster(3024) from
2019-02-06T08:00:00 - 2019-02-06T09:00:00 tres 1
[2019-02-06T11:28:46.204] error: We have more time than is possible
(4536000+6566400+0)(11102400) > 10886400 for cluster cluster(3024) from
2019-02-06T09:00:00 - 2019-02-06T10:00:00 tres 1
[2019-02-06T11:28:46.210] error: We have more time than is possible
(4031100+7070700+0)(11101800) > 10886400 for cluster cluster(3024) from
2019-02-06T10:00:00 - 2019-02-06T11:00:00 tres 1

Where I first spotted it was here, right after the conversion:
[2019-02-06T12:23:50.276] Conversion done: success!
[2019-02-06T12:23:50.281] Accounting storage MYSQL plugin loaded
[2019-02-06T12:23:50.734] slurmdbd version 18.08.4 started
[2019-02-06T12:23:50.765] error: We have more time than is possible
(3456000+11911388+0)(15367388) > 15336000 for cluster cluster(4624) from
2019-02-06T11:00:00 - 2019-02-06T12:00:00 tres 1

And now it repeats every hour:
[2019-02-06T13:00:00.186] error: We have more time than is possible
(3456000+13219200+0)(16675200) > 16646400 for cluster cluster(4624) from
2019-02-06T12:00:00 - 2019-02-06T13:00:00 tres 1
[2019-02-06T14:00:00.283] error: We have more time than is possible
(3456000+13212800+0)(16668800) > 16646400 for cluster cluster(4624) from
2019-02-06T13:00:00 - 2019-02-06T14:00:00 tres 1
[2019-02-06T15:00:00.369] error: We have more time than is possible
(3456000+13219200+0)(16675200) > 16646400 for cluster cluster(4624) from
2019-02-06T14:00:00 - 2019-02-06T15:00:00 tres 1


15:59:45 [root ~]# sacctmgr list runawayjobs
Runaway Jobs: No runaway jobs found on cluster cluster

And, just because of the convenient timing:

16:04:31 [root ~]# tail /var/log/slurm/slurmdbd.log -n 1
[2019-02-06T16:00:00.917] error: We have more time than is possible
(3456000+13219200+0)(16675200) > 16646400 for cluster cluster(4624) from
2019-02-06T15:00:00 - 2019-02-06T16:00:00 tres 1
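
As a sanity check on the numbers in these messages, here is a minimal
Python sketch that pulls the values out of one message and works out the
excess. It assumes the three bracketed terms sum to the fourth value, and
that the limit equals the count in brackets after the cluster name times
3600 seconds. Both hold for the hourly lines above, though not for the
11:00 - 12:00 line, which spans the update. The field names (first, second,
third, used, limit, count) are my own labels, not Slurm's.

```python
import re

# Regex for the (line-wrapped) slurmdbd message. The group names are
# hypothetical labels chosen for this sketch, not Slurm terminology.
PATTERN = re.compile(
    r"\((?P<first>\d+)\+(?P<second>\d+)\+(?P<third>\d+)\)\((?P<used>\d+)\)"
    r"\s*>\s*(?P<limit>\d+)\s+for\s+cluster\s+\S+\((?P<count>\d+)\)"
    r".*?tres\s+(?P<tres>\d+)",
    re.DOTALL,
)


def check(message):
    """Extract the numbers from one message and compute the excess."""
    m = PATTERN.search(message)
    if m is None:
        raise ValueError("not a 'more time than is possible' message")
    v = {key: int(value) for key, value in m.groupdict().items()}
    excess = v["used"] - v["limit"]
    return {
        # Do the three bracketed terms add up to the reported total?
        "sum_matches": v["first"] + v["second"] + v["third"] == v["used"],
        # Assumption: limit = count * 3600 seconds for a full hour.
        "limit_is_count_x_3600": v["limit"] == v["count"] * 3600,
        "excess_seconds": excess,
        # Excess expressed in units of the count (seconds / 3600).
        "excess_units": excess / 3600,
    }


msg = """error: We have more time than is possible
(3456000+13219200+0)(16675200) > 16646400 for cluster cluster(4624) from
2019-02-06T15:00:00 - 2019-02-06T16:00:00 tres 1"""

result = check(msg)
print(result)
```

For the 15:00 - 16:00 message this gives an excess of 28800 seconds, i.e.
8 units over the 4624 counted for that hour.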

There are 5 jobs that have been running throughout and are yet to complete.
Is it possible this will stop once they have finished? What else could be
causing this?

Thanks

Antony
