On 22/11/18 5:41 am, Ryan Novosielski wrote:
> As you can see, both of the above are examples of jobs whose allocated CPU counts are very different from the actual CPU load: the first is using far more than it was allocated (though they’re in a cgroup, so theoretically isolated from the other users on the machine), and the second asked for all 28 CPUs but is only “using” ~8 of them.
I've just had a quick play with pestat, and it reveals that Slurm 18.08.3 seems to have some odd ideas about the load on nodes. For instance, one of our KNL nodes that is offline is reported with a CPUload of 2.70, but I can see nothing running on it and its load average is around 0.1 (mostly top itself). Conversely, a Skylake node that's flat out, with a load average of 32 (all from compute-bound processes at 100% CPU), is reported with a CPUload of only 2.5. That CPUload value is just taken from the output of "sinfo", and I've confirmed myself that the numbers are off in that output.
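For anyone who wants to reproduce the comparison, here's a rough sketch of cross-checking sinfo's reported load against what the nodes themselves say. The node names, numbers, and the pdsh invocation are illustrative only; the two files stand in for real output of something like `sinfo -h -N -O "NodeList,CPUsLoad"` (the scheduler's view) and `pdsh -w knl001,sky042 cat /proc/loadavg` (the nodes' own view):

```shell
# Fake scheduler-side data standing in for: sinfo -h -N -O "NodeList,CPUsLoad"
cat > /tmp/sinfo_view <<'EOF'
knl001 2.70
sky042 2.50
EOF
# Fake node-side data standing in for the 1-minute field of /proc/loadavg
cat > /tmp/node_view <<'EOF'
knl001 0.10
sky042 32.00
EOF
# Print any node where the two views disagree by more than 1.0
out=$(awk 'NR==FNR { sinfo[$1] = $2; next }
           { d = sinfo[$1] - $2; if (d < 0) d = -d
             if (d > 1) printf "%s: sinfo says %.2f, node says %.2f\n", $1, sinfo[$1], $2 }' \
          /tmp/sinfo_view /tmp/node_view)
printf '%s\n' "$out"
```

With the sample data above it flags both nodes, matching the discrepancies described.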
> If you’re using cgroups, it would seem to me that there must also be a way to see the output of “top” for just a group, or at least something similar. systemd-cgtop does more or less that, but doesn’t seem to show exactly what you’d want here:
[...]
> ...CPU only being shown as an aggregate at the top level

If you run:

  systemd-cgtop -c

it will sort by CPU usage and be more useful! :-)

All the best,
Chris

-- 
 Chris Samuel  :  http://www.csamuel.org/ :  Melbourne, VIC
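P.S. For the curious, the arithmetic behind a per-cgroup CPU figure is simple enough to do by hand: under cgroup v1, if the cpuacct controller is enabled, a job's cumulative CPU time (in nanoseconds) appears under a path like /sys/fs/cgroup/cpuacct/slurm/uid_<UID>/job_<JOBID>/cpuacct.usage, and sampling it twice and dividing by the wall-clock interval gives "CPUs busy". The two sample values below are made up, not real readings:

```shell
# Stand-ins for two readings of cpuacct.usage, one second apart.
t0=1000000000            # first reading, ns of CPU time consumed so far
t1=15000000000           # reading taken one second later
interval_ns=1000000000   # 1 s of wall clock between samples
busy=$(( (t1 - t0) / interval_ns ))
echo "~${busy} CPUs busy over the interval"
```

Fourteen CPU-seconds consumed in one wall-clock second means roughly fourteen cores were busy, which is the kind of number you'd want to compare against a job's allocation.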