Re: [slurm-users] How to check the percent cpu of a job?

Ole Holm Nielsen Thu, 22 Nov 2018 02:11:09 -0800

On 11/22/2018 12:10 AM, Christopher Samuel wrote:

I've just had a quick play with pestat and it reveals that Slurm
18.08.3 seems to have some odd ideas about load on nodes, for instance
one of our KNL nodes that is offline is reported with a CPUload of
2.70, but I can see nothing running on it and the load average is
around 0.1 (which is mostly top).


Conversely a skylake node that's flat out with a load average of 32
(all from compute bound processes at 100% CPU) is reported with a
CPULoad of 2.5.

The CPULoad is just taken from the output of "sinfo", and I've confirmed
myself that the numbers are off in that output.


FYI: Here's the sinfo flags which I use in pestat:

# sinfo output: NODELIST PARTITION CPUS(A/I/O/T) CPU_LOAD MEMORY FREE_MEM STATE THREADS GRES
sinfo -N -o "%N %P %C %O %m %e %t %Z %G"

The CPU_LOAD output should originate from the slurmd daemon running on
each compute node. Chris' observations might indicate that slurmd
version 18.08.3 doesn't show the correct CPU_LOAD numbers. Our cluster
runs 17.11.12 and I don't see any such problems!
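
For anyone who wants to spot such nodes quickly, here is a rough sketch (not the actual pestat logic) that compares the reported CPU_LOAD against the number of allocated CPUs in the sinfo output above. The function name flag_load_mismatch and the default tolerance of 2.0 are my own illustrative choices, and I'm assuming that an unavailable load is printed as "N/A":

```shell
#!/bin/sh
# Flag nodes whose reported CPU_LOAD differs from the allocated CPU count
# by more than a tolerance. Expects lines on stdin in the format of
#   sinfo -h -N -o "%N %P %C %O %m %e %t %Z %G"
# The function name and the default tolerance (2.0) are illustrative only.
flag_load_mismatch() {
    awk -v tol="${1:-2.0}" '
    {
        split($3, cpus, "/")      # %C is allocated/idle/other/total
        alloc = cpus[1] + 0
        if ($4 == "N/A") next     # assumption: no load reported for this node
        load = $4 + 0
        diff = load - alloc
        if (diff < 0) diff = -diff
        if (diff > tol)
            printf "%s alloc=%d load=%.2f\n", $1, alloc, load
    }'
}
```

Usage would be something like: sinfo -h -N -o "%N %P %C %O %m %e %t %Z %G" | flag_load_mismatch 2.0
A fully allocated node running compute-bound jobs should have a load close to its allocated CPU count, so both of the nodes Chris describes would be flagged by this check.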


/Ole
