Hi Davide,
Did you already check out what the slurmacct script can do for you? See
https://github.com/OleHolmNielsen/Slurm_tools/blob/master/slurmacct/slurmacct
What you're asking for seems like a pretty heavy task in terms of system
resources and Slurm database requests. You don't imagine th
Thanks Kevin and Simon,
The full setup you built is indeed overkill; however, I was able to learn
how to collect and parse some of the information I need.
What I am still unable to get is:
- utilization by queue (or list of node names), to track actual use of
expensive resources such as GPUs, high
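For the per-queue GPU utilization question above, here is a minimal sketch of one possible approach. It is not what slurmacct does. It pulls job allocations from `sacct` and sums GPU-hours per partition. It assumes that accounting is enabled and that GPUs appear in the `AllocTRES` field under the generic `gres/gpu` key. The function names `fetch_sacct`, `parse_tres` and `gpu_hours_by_partition` are hypothetical and were chosen for this example.

```python
import subprocess
from collections import defaultdict


def fetch_sacct(start: str, end: str) -> str:
    """Run sacct for all users' job allocations in a time window."""
    cmd = [
        "sacct", "--allusers", "--allocations", "--noheader", "--parsable2",
        f"--starttime={start}", f"--endtime={end}",
        "--format=Partition,ElapsedRaw,AllocTRES",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_tres(tres: str) -> dict[str, str]:
    """Split a TRES string such as 'cpu=4,gres/gpu=2,mem=16G' into a dict."""
    fields = {}
    for item in tres.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def gpu_hours_by_partition(sacct_text: str) -> dict[str, float]:
    """Sum GPU-hours per partition from 'Partition|ElapsedRaw|AllocTRES' lines."""
    totals: dict[str, float] = defaultdict(float)
    for line in sacct_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        partition, elapsed_raw, alloc_tres = line.split("|", 2)
        # Only the untyped 'gres/gpu' count is read, so typed entries such as
        # 'gres/gpu:a100' are not double-counted.
        gpus = int(parse_tres(alloc_tres).get("gres/gpu", 0))
        # ElapsedRaw is in seconds; convert GPU-seconds to GPU-hours.
        totals[partition] += gpus * int(elapsed_raw) / 3600.0
    return dict(totals)
```

A typical call would be `gpu_hours_by_partition(fetch_sacct("2024-08-01", "2024-08-20"))`. Depending on site privacy settings, non-admin users may only see their own jobs. As the first reply warns, long time windows on a busy cluster can also put noticeable load on the Slurm database.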
Hello Slurm community,
We are using Slurm to deploy training jobs on a large GPU cluster, but we
have encountered some strange behavior. As newcomers, we wonder whether this
is known behavior. Below is some more info:
* We are running a relatively old version, 22.0.5
* At relatively hig
Heavyweight solution (although if you already have Grafana and Prometheus
running, a little less so):
https://github.com/rivosinc/prometheus-slurm-exporter
On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users <slurm-users@lists.schedmd.com> wrote:
Possibly a bit more elaborate than you want, but I wrote a web-based monitoring
system for our cluster. It mostly uses standard Slurm commands for job
monitoring, but I've also added storage monitoring, which requires a separate
cron job that runs every night. It was written for our cluster, but pr
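To illustrate monitoring built on standard Slurm commands, here is a minimal sketch. It is not the monitoring system described above. It asks `squeue` for a pipe-delimited job listing and counts jobs per partition and state. The format string and the function names `fetch_squeue`, `parse_squeue` and `state_counts` are assumptions made for this example.

```python
import subprocess
from collections import Counter

# %i = job ID, %u = user, %P = partition, %T = job state, %M = time used
SQUEUE_FORMAT = "%i|%u|%P|%T|%M"


def fetch_squeue() -> str:
    """Run squeue once and return its raw pipe-delimited output."""
    result = subprocess.run(
        ["squeue", "--noheader", f"--format={SQUEUE_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_squeue(text: str) -> list[dict]:
    """Turn squeue output lines into one dict per job."""
    jobs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        job_id, user, partition, state, time_used = (f.strip() for f in line.split("|"))
        jobs.append({"job_id": job_id, "user": user, "partition": partition,
                     "state": state, "time_used": time_used})
    return jobs


def state_counts(jobs: list[dict]) -> Counter:
    """Count jobs per (partition, state) pair, e.g. for a dashboard table."""
    return Counter((job["partition"], job["state"]) for job in jobs)
```

A web page could call `state_counts(parse_squeue(fetch_squeue()))` on each refresh. Heavier collection, like the nightly storage scan mentioned above, is better kept in a separate cron job.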