Hi Will,
I don't, currently, although it's on my list.
However, we had a presentation at a recent Oxford HPC-SIG meeting from a
colleague who implemented a simple job profiler that saves a lot of job
data (including efficiency) and creates plots of the efficiency of the
job run (in a nutshell). We all thought it sounded interesting :)
Code is here: https://github.com/OxfordCBRG/sps
(It's a SPANK plugin, I believe.)
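For a rough cluster-wide view without any extra tooling, a seff-style CPU efficiency figure can also be worked out from accounting data. The following is only a minimal sketch, not what sps does: it assumes pipe-separated text from something like `sacct --allusers --allocations --noheader --parsable2 --format=JobID,User,Elapsed,TotalCPU,AllocCPUS`, and the function names are made up for illustration. Whether TotalCPU on the allocation line covers all job steps on a given site is worth checking against seff for a few jobs first.

```python
def parse_slurm_time(value: str) -> float:
    """Convert a Slurm duration such as 1-02:03:04, 02:03:04 or 03:04.500 to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    # Pad missing leading fields (e.g. MM:SS.mmm has no hours field).
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(sacct_output: str) -> list[tuple[str, str, float]]:
    """Return (job_id, user, efficiency_percent) tuples, least efficient first.

    Assumes each line is: JobID|User|Elapsed|TotalCPU|AllocCPUS
    Efficiency is TotalCPU / (Elapsed * AllocCPUS), as a percentage.
    """
    results = []
    for line in sacct_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 5 or not all(fields):
            continue  # skip blank or incomplete records
        job_id, user, elapsed, total_cpu, alloc_cpus = fields
        core_seconds = parse_slurm_time(elapsed) * int(alloc_cpus)
        if core_seconds <= 0:
            continue  # job never ran, nothing to measure
        percent = 100.0 * parse_slurm_time(total_cpu) / core_seconds
        results.append((job_id, user, percent))
    return sorted(results, key=lambda row: row[2])
```

Sorting by lowest efficiency first puts the jobs most worth a conversation at the top; grouping by user or filtering out very short jobs would be natural next steps.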
Tina
On 24/07/2023 15:37, Will Furnell - STFC UKRI wrote:
Hello,
I am aware of ‘seff’, which allows you to check the efficiency of a
single job, which is good for users, but as a cluster administrator I
would like to be able to track the efficiency of all jobs from all users
on the cluster, so I am able to ‘re-educate’ users who may be running
jobs that have terrible resource usage efficiency.
What do other cluster administrators use for this task? Is there
anything you use and recommend (or don’t recommend) or have heard of
that is able to do this? Even if it’s something like a Grafana dashboard
that hooks up to the SLURM database.
Thank you,
Will.