Look up orphan jobs and lost.pl (quick script to find orphans) in
https://groups.google.com/forum/#!forum/slurm-devel.

Battling this myself right now.

Thank you,
Doug

On Fri, Oct 27, 2017 at 9:00 PM, Bill Broadley <b...@cse.ucdavis.edu> wrote:

>
>
> I noticed crazy high numbers in my reports, things like sreport user top:
> Top 10 Users 2017-10-20T00:00:00 - 2017-10-26T23:59:59 (604800 secs)
> Use reported in Percentage of Total
> --------------------------------------------------------------------------------
>   Cluster     Login     Proper Name         Account        Used   Energy
> --------- --------- --------------- --------------- ----------- --------
>   MyClust   JoeUser        Joe User            jgrp    3710.15%    0.00%
>
> This was during a period when JoeUser hadn't submitted a single job.
>
> We have been through some Slurm upgrades, so I figured one of the schema
> tweaks had confused things.  I looked in the Slurm accounting database and
> found the job_table, which held 80,000 jobs with no end_time that weren't
> actually running.  So I set the end_time = begin time for those 80,000 jobs.
> It didn't help the reports.
>
> I then tried deleting all 80,000 jobs from the job_table and that didn't
> help either.
>
> Is there a way to rebuild the accounting data from the information in the
> job_table?
>
> Or any other suggestion for getting some sane numbers out?
>
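
The orphan check described in the quoted message (jobs with no end time that are not actually running) can be sketched roughly as below. This is only an illustration, not lost.pl and not anything Slurm ships: the function name find_orphans and the record fields id_job, time_start and time_end are assumptions made for the example and may not match your schema, and the list of running job IDs is assumed to come from whatever the scheduler currently reports.

```python
from typing import Iterable, List, Mapping, Set


def find_orphans(db_jobs: Iterable[Mapping[str, int]],
                 running_ids: Set[int]) -> List[int]:
    """Return sorted IDs of jobs that started, never ended, and are not running.

    Field names (id_job, time_start, time_end) are hypothetical; a value of 0
    is assumed to mean "not set".
    """
    orphans = []
    for job in db_jobs:
        has_started = job["time_start"] > 0
        has_no_end = job["time_end"] == 0
        # A started job without an end time is only suspicious if the
        # scheduler does not know about it any more.
        if has_started and has_no_end and job["id_job"] not in running_ids:
            orphans.append(job["id_job"])
    return sorted(orphans)


if __name__ == "__main__":
    # Made-up rows standing in for records read from the accounting database.
    db_jobs = [
        {"id_job": 101, "time_start": 1508457600, "time_end": 1508461200},  # finished
        {"id_job": 102, "time_start": 1508457600, "time_end": 0},           # still running
        {"id_job": 103, "time_start": 1508457600, "time_end": 0},           # orphan
        {"id_job": 104, "time_start": 0, "time_end": 0},                    # never started
    ]
    running_ids = {102}  # assumed to come from the live scheduler
    print(find_orphans(db_jobs, running_ids))
```

Listing the candidates first, rather than updating or deleting rows straight away, leaves room to sanity-check them against the live queue before touching the database.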
