Hi Doug,

On Wed, 2020-01-08 at 06:38:32 -0800, Douglas Jacobsen wrote:
> Try running `sacctmgr show runawayjobs`; it should give you the list of
> running/pending jobs (from slurmdbd's perspective) that are unknown to
> slurmctld.

Thanks for this suggestion, it was the perfect solution.
No more "error: We have more allocated time than is possible" messages.

> It will give you the option to "fix" it, however note that
> fixing will set the end time of the job to the start time,

Better than nothing, or erratic daily sums.

> so the
> accounting will be defective, and it will re-roll (resummarize) accounting
> statistics back to that point in time. If you fix a pending job, some
> versions of slurm set that re-roll time to 0 -- so it would re roll all
> accounting activity.

This rerolling will take some time, I suppose?
(I'll wait until Monday then before rerunning the summing job.)

> In some cases we've chosen to manually edit the start/end times of these
> runaway jobs in the jobs_table of the database directly instead in order to
> maintain appropriate accounting, however that is fraught with risk as well
> (and unless well timed, it can make it challenging to re-roll the
> statistics well).

This paragraph, I admit, was too frightening - I have already lost
accounting data on another cluster (a HTCondor pool which did its
history rotations too quickly) :(

> These events often trace back to a crash of the slurmctld where some
> messages did not get received by the slurmdbd.

It seems that the slurmdbd indeed got restarted after those jobs had been
submitted (and the log file got zeroed) - although there's no indication
of a slurmctld crash corresponding to that day.

In any case, the situation apparently has been resolved - I've got to wait
for the daily rollup to fix the old accounting data though.

Thanks a lot!
- Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~