Those pieces of information are available from squeue / sacct as long as you’re 
happy to have a wrapper which does the aggregation part for you.  The commands 
I parse for our stat summaries are:

scontrol show nodes

squeue -r -O jobid,username,minmemory,numcpus,nodelist

sacct -a -S [one_month_ago] -o jobid,jobname,alloccpus,cputime%15,reqmem,account,submit,elapsed,state
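To illustrate the aggregation-wrapper idea, here is a minimal sketch (the function name and sample columns are my own assumptions, not the actual script) that sums running CPUs per user from `squeue -r -O jobid,username,numcpus`-style output:

```python
# Hypothetical aggregation sketch: sum NumCPUs per user from
# whitespace-padded `squeue -O ...` output captured as text.
from collections import defaultdict

def cpus_by_user(squeue_output: str) -> dict:
    """Total running CPUs per user, skipping the header row."""
    totals = defaultdict(int)
    for line in squeue_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 3:
            continue
        _jobid, user, cpus = fields[0], fields[1], fields[2]
        totals[user] += int(cpus)
    return dict(totals)

sample = """JOBID USER NUMCPUS
101 alice 4
102 alice 8
103 bob 16
"""
print(cpus_by_user(sample))  # → {'alice': 12, 'bob': 16}
```

In a real wrapper you would feed it `subprocess.run(["squeue", "-r", "-O", ...])` output instead of the sample string.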

The only thing I can’t find an easy way to get is the total requested 
memory for a job.  You’d think this would be simple with squeue’s minmemory 
field, except that for some jobs that value applies to the whole job, while for 
others it’s a per-CPU value, so to get the total you have to multiply by the 
number of requested CPUs.  The only place I’ve managed to find that setting is:

scontrol -d show job [jobid]

where you can examine the “MinMemoryCPU” value.  However, this is really slow 
if you’re doing it for thousands of jobs.  If anyone knows how to get this to 
show up correctly in squeue/sacct, that would be super helpful.
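One partial workaround, sketched below: older Slurm releases report sacct’s ReqMem with a trailing “c” (per-CPU) or “n” (per-node) suffix, e.g. “4Gc” or “2000Mn”, which is enough to reconstruct a per-job total without hitting scontrol. The helper name is made up, and you should check which ReqMem format your Slurm version emits:

```python
import re

def total_reqmem_mb(reqmem: str, alloc_cpus: int, nodes: int = 1) -> float:
    """Convert an old-style sacct ReqMem string (e.g. '4Gc', '2000Mn')
    into a per-job total in MB.  Trailing 'c' means per-CPU, 'n' per-node."""
    m = re.fullmatch(r"([\d.]+)([KMGT]?)([cn]?)", reqmem)
    if not m:
        raise ValueError(f"unrecognised ReqMem: {reqmem!r}")
    value, unit, scope = m.groups()
    mb = float(value) * {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 ** 2, "": 1}[unit]
    if scope == "c":
        return mb * alloc_cpus      # per-CPU: scale by allocated CPUs
    return mb * nodes               # per-node (or unsuffixed): scale by nodes

print(total_reqmem_mb("4Gc", 8))            # → 32768.0  (4 GB/CPU * 8 CPUs)
print(total_reqmem_mb("2000Mn", 8, nodes=2))  # → 4000.0
```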

Simon.


From: Davide DelVento <[email protected]>
Sent: 21 August 2024 00:14
To: Kevin Broch <[email protected]>; Simon Andrews 
<[email protected]>
Cc: [email protected]
Subject: Re: [slurm-users] Re: Print Slurm Stats on Login





Thanks Kevin and Simon,

The full system you each describe is indeed overkill for my needs, but I was 
able to learn how to collect and parse some of the information I need.

What I am still unable to get is:

- utilization by queue (or by list of node names), to track actual use of 
expensive resources such as GPUs, high-memory nodes, etc.
- statistics about time jobs spend waiting in the queue due to unavailable 
resources

hopefully both in an sreport-like format, broken down by user and for the 
overall system.

I suspect this information is available from sacct, but needs some 
massaging/consolidation to become useful for what I am looking for. Perhaps 
one (or both) of your scripts already does that somewhere I did not find? 
That would be terrific, and I'd appreciate it if you could point me to it.
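As a sketch of the kind of massaging I mean (my own assumption of an approach, not either of your scripts): queue-wait time is Start minus Submit, and sacct can emit both per job. Assuming pipe-delimited `sacct -a -X -P -o Partition,Submit,Start` output, the made-up helper below averages the wait per partition:

```python
from collections import defaultdict
from datetime import datetime

def mean_wait_by_partition(sacct_output: str) -> dict:
    """Average queue wait in seconds per partition, from pipe-delimited
    `sacct -a -X -P -o Partition,Submit,Start` output."""
    waits = defaultdict(list)
    fmt = "%Y-%m-%dT%H:%M:%S"
    for line in sacct_output.strip().splitlines()[1:]:   # skip header
        part, submit, start = line.split("|")
        if start in ("Unknown", "None"):
            continue                                     # job never started
        delta = datetime.strptime(start, fmt) - datetime.strptime(submit, fmt)
        waits[part].append(delta.total_seconds())
    return {p: sum(v) / len(v) for p, v in waits.items()}

sample = """Partition|Submit|Start
gpu|2024-08-01T10:00:00|2024-08-01T10:30:00
gpu|2024-08-01T11:00:00|2024-08-01T11:10:00
himem|2024-08-01T09:00:00|2024-08-01T09:05:00
"""
print(mean_wait_by_partition(sample))  # → {'gpu': 1200.0, 'himem': 300.0}
```

Grouping by username instead of partition would give the per-user breakdown the same way.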

Thanks again!

On Tue, Aug 20, 2024 at 9:09 AM Kevin Broch via slurm-users 
<[email protected]> wrote:
Heavyweight solution (although if you have grafana and prometheus going already 
a little less so): https://github.com/rivosinc/prometheus-slurm-exporter

On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users 
<[email protected]> wrote:
Possibly a bit more elaborate than you want, but I wrote a web-based monitoring 
system for our cluster.  It mostly uses standard Slurm commands for job 
monitoring, but I've also added storage monitoring, which requires a separate 
cron job to run every night.  It was written for our cluster, but it probably 
wouldn't take much work to adapt to another cluster with a similar structure.

You can see the code and some screenshots at:

https://github.com/s-andrews/capstone_monitor

…and there's a video walkthrough at:

https://vimeo.com/982985174

We've also got friendlier scripts for monitoring current and past jobs on 
the command line.  These are in a private repository, as some of the other 
information there is more sensitive, but I'm happy to share those scripts.  You 
can see the scripts being used in 
https://vimeo.com/982986202

Simon.

-----Original Message-----
From: Paul Edmon via slurm-users <[email protected]>
Sent: 09 August 2024 16:12
To: [email protected]
Subject: [slurm-users] Print Slurm Stats on Login

We are working to make our users more aware of their usage. One of the ideas we 
came up with was to have some basic usage stats printed at login (usage over the 
past day, fairshare, job efficiency, etc.). Does anyone have any scripts or 
methods that they use to do this? Before baking my own I was curious what other 
sites do, and whether they would be willing to share their scripts and 
methodology.
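For concreteness, one minimal sketch of the idea (my own assumption of an approach, not a tested production script): build an sreport call covering the past day and wrap its output for the login message. The report name and the user=/start=/end= filters are real sreport syntax; the helper names are made up:

```python
# Hypothetical login-banner sketch (e.g. invoked from /etc/profile.d):
# build an sreport command for yesterday's usage and wrap its output.
from datetime import date, timedelta

def sreport_cmd(user: str, today: date) -> list:
    """Command line for per-user utilisation over the past day."""
    start = (today - timedelta(days=1)).isoformat()
    return ["sreport", "-n", "cluster", "UserUtilizationByAccount",
            f"user={user}", f"start={start}", "end=now"]

def banner(report: str, user: str) -> str:
    """Wrap raw sreport text in a short login greeting."""
    return f"Usage summary for {user} (past day):\n{report.rstrip()}\n"

print(sreport_cmd("alice", date(2024, 8, 21)))
# → ['sreport', '-n', 'cluster', 'UserUtilizationByAccount',
#    'user=alice', 'start=2024-08-20', 'end=now']
```

At login you would run the command with `subprocess.run(..., capture_output=True, text=True)` and print `banner()` around its stdout; adding fairshare (sshare) or efficiency (seff) lines would follow the same pattern.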

-Paul Edmon-


--
slurm-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

