Hi
While gathering statistics on CPU efficiency, I noticed that sacct reports
inexplicably (at least to me) high values for TotalCPU, UserCPU and
SystemCPU. Here is a simple example (each job step is an infinite while loop):
sacct -j 64338003 --format=jobid,elapsed,ncpus,cputime,totalcpu,usercpu,systemcpu,nodelist
       JobID    Elapsed      NCPUS    CPUTime   TotalCPU    UserCPU  SystemCPU        NodeList
------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------------
64338003       00:02:29          4   00:09:56   13:19:41   13:19:36  00:05.054        anode033
64338003.ba+   00:02:31          4   00:10:04  00:09.017  00:04.003  00:05.014        anode033
64338003.ex+   00:02:30          4   00:10:00  00:00.001   00:00:00  00:00.001        anode033
64338003.0     00:02:32          1   00:02:32   03:19:52   03:19:52  00:00.013        anode033
64338003.1     00:02:32          1   00:02:32   03:19:54   03:19:54  00:00.008        anode033
64338003.2     00:02:32          1   00:02:32   03:19:53   03:19:53  00:00.010        anode033
64338003.3     00:02:32          1   00:02:32   03:19:52   03:19:52  00:00.007        anode033
I would expect CPUTime to be the upper limit for TotalCPU.
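To make that expectation concrete, here is a minimal Python sketch that recomputes CPUTime as Elapsed x NCPUS and compares it with TotalCPU for the job-level row above. The helper name to_seconds is my own, and the parser assumes sacct time fields look like [D-]HH:MM:SS or MM:SS.mmm, as they do in this output.

```python
def to_seconds(value: str) -> float:
    """Convert a sacct time field to seconds.

    Assumed formats: 'HH:MM:SS', 'MM:SS.mmm' and 'D-HH:MM:SS'.
    """
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    seconds = 0.0
    # Fold the colon-separated fields from left to right: each step
    # multiplies what we have so far by 60 and adds the next field.
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return days * 86400 + seconds


# Values copied from the job-level row of the sacct output.
elapsed = to_seconds("00:02:29")
ncpus = 4
cputime = to_seconds("00:09:56")
totalcpu = to_seconds("13:19:41")

# CPUTime is the allocated CPU time: Elapsed multiplied by NCPUS.
print("Elapsed * NCPUS:", elapsed * ncpus, "s; reported CPUTime:", cputime, "s")

# TotalCPU (user + system time actually consumed) should not exceed CPUTime.
print("TotalCPU:", totalcpu, "s")
print("TotalCPU / CPUTime: %.1f" % (totalcpu / cputime))
```

For this job, Elapsed x NCPUS matches the reported CPUTime, while TotalCPU is roughly 80 times larger.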
Looking at cpuacct.stat for job step 3:
cat /cgroup/cpuacct/slurm/uid_6994/job_64338003/step_3/cpuacct.stat
user 14902 (~149 s = 00:02:29)
system 0
This value corresponds to the expected CPU usage of a single job step.
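The conversion behind that figure can be sketched in Python as follows. cpuacct.stat reports user and system time in USER_HZ ticks; the default of 100 ticks per second is an assumption (it is the usual Linux value and can be checked with `getconf CLK_TCK`), and the function names are hypothetical.

```python
def parse_cpuacct_stat(text: str, ticks_per_second: int = 100) -> dict:
    """Parse the contents of a cpuacct.stat file into seconds.

    ticks_per_second is USER_HZ; 100 is assumed here, not read from the system.
    """
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            usage[fields[0]] = int(fields[1]) / ticks_per_second
    return usage


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS, dropping the fractional part."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Contents of cpuacct.stat for step 3, as shown above (without the annotation).
sample = "user 14902\nsystem 0\n"
usage = parse_cpuacct_stat(sample)
print("cgroup user time:", format_hms(usage["user"]))

# UserCPU reported by sacct for step 3 is 03:19:52.
sacct_user = 3 * 3600 + 19 * 60 + 52
print("sacct / cgroup ratio: %.1f" % (sacct_user / usage["user"]))
```

Under that assumption, the cgroup value comes out at about 149 seconds, whereas sacct reports about 80 times as much for the same step.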
We are running Slurm 18.08.4 with
JobAcctGatherType=jobacct_gather/cgroup
Does anyone have an explanation for these high values reported by sacct?
Best,
Nico
Universitaet Bern
Abt. Informatikdienste
Nico Färber
High Performance Computing
Gesellschaftsstrasse 6
CH-3012 Bern
Raum 104
Tel. +41 (0)31 631 51 89