[slurm-users] Re: Print Slurm Stats on Login

2024-08-20 Thread Ole Holm Nielsen via slurm-users
Hi Davide, Did you already check out what the slurmacct script can do for you? See https://github.com/OleHolmNielsen/Slurm_tools/blob/master/slurmacct/slurmacct What you're asking for seems like a pretty heavy task regarding system resources and Slurm database requests. You don't imagine th

[slurm-users] Re: Print Slurm Stats on Login

2024-08-20 Thread Davide DelVento via slurm-users
Thanks Kevin and Simon, The full thing that you do is indeed overkill; however, I was able to learn how to collect/parse some of the information I need. What I am still unable to get is: - utilization by queue (or list of node names), to track actual use of expensive resources such as GPUs, high

[slurm-users] Slurm hanging behavior

2024-08-20 Thread Richard Yang via slurm-users
Hello Slurm community, We are using Slurm to deploy training jobs on a large GPU cluster, but have encountered some strange behavior. As newcomers, we wonder if this is a known behavior. Below is some more info: * We are running a relatively old version, 22.0.5 * At relatively hig

[slurm-users] Re: Print Slurm Stats on Login

2024-08-20 Thread Kevin Broch via slurm-users
Heavyweight solution (although if you have Grafana and Prometheus going already, a little less so): https://github.com/rivosinc/prometheus-slurm-exporter On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users <slurm-users@lists.schedmd.com> wrote: > Possibly a bit more elaborate than you

[slurm-users] Re: Print Slurm Stats on Login

2024-08-20 Thread Simon Andrews via slurm-users
Possibly a bit more elaborate than you want, but I wrote a web-based monitoring system for our cluster. It mostly uses standard Slurm commands for job monitoring, but I've also added storage monitoring, which requires a separate cron job to run every night. It was written for our cluster, but pr