Look up orphan jobs and lost.pl (quick script to find orphans) in https://groups.google.com/forum/#!forum/slurm-devel.
Battling this myself right now.

Thank you,
Doug

On Fri, Oct 27, 2017 at 9:00 PM, Bill Broadley <b...@cse.ucdavis.edu> wrote:
>
> I noticed crazy high numbers in my reports, things like sreport user top:
> Top 10 Users 2017-10-20T00:00:00 - 2017-10-26T23:59:59 (604800 secs)
> Use reported in Percentage of Total
> --------------------------------------------------------------------------------
>   Cluster     Login     Proper Name         Account        Used   Energy
> --------- --------- --------------- --------------- ----------- --------
>   MyClust   JoeUser        Joe User            jgrp    3710.15%    0.00%
>
> This was during a period when JoeUser hadn't submitted a single job.
>
> We have been through some slurm upgrades, figured one of the schema tweaks had
> confused things. I looked in the slurm accounting table and found the
> job_table. I found 80,000 jobs with no end_time, that weren't actually running.
> So I set the end_time = begin time for those 80,000 jobs. It didn't help the
> reports.
>
> I then tried deleting all 80,000 jobs from the job_table and that didn't help
> either.
>
> Is there a way to rebuild the accounting data from the information in the
> job_table?
>
> Or any other suggestion for getting some sane numbers out?
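
For anyone following along, here is a minimal sketch of the orphan check described in the quoted message: a job counts as an orphan when it has no end time recorded but is not actually running. The names `JobRecord` and `find_orphans` and the field names are hypothetical and do not mirror the real Slurm accounting schema; in practice the records would come from the accounting database and the set of running job IDs from the scheduler (for example, the output of squeue).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class JobRecord:
    """Hypothetical, simplified view of one accounting row."""
    job_id: int
    start_time: int            # epoch seconds
    end_time: Optional[int]    # None or 0 means no end time was recorded


def find_orphans(records: Iterable[JobRecord], running_ids: Set[int]) -> List[JobRecord]:
    """Return records with no end time whose job is not currently running."""
    return [
        r for r in records
        if not r.end_time and r.job_id not in running_ids
    ]


if __name__ == "__main__":
    # Illustrative data only, not taken from a real cluster.
    records = [
        JobRecord(1, 100, 200),   # finished normally
        JobRecord(2, 100, None),  # no end time, not running: orphan
        JobRecord(3, 100, 0),     # end time stored as 0, not running: orphan
        JobRecord(4, 100, None),  # no end time, but still running
    ]
    running = {4}
    for job in find_orphans(records, running):
        print(job.job_id)
```

The check only reports candidates; it deliberately does not modify or delete anything, since the quoted message found that editing the job rows directly did not fix the reports.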