theus going already
a little less so): https://github.com/rivosinc/prometheus-slurm-exporter
On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Possibly a bit more elaborate than you want, but I wrote a web-based monitoring
system for our cluster. It mostly uses standard Slurm commands for job
monitoring, but I've also added storage monitoring, which requires a separate
cron job to run every night. It was written for our cluster, but pr
Our cluster has developed a strange intermittent behaviour where jobs are being
put into a pending state because they aren't passing the AssocGrpCpuLimit check,
even though the submitting user has enough CPUs for the job to run.
For example:
$ squeue -o "%.6i %.9P %.8j %.8u %.2t %.10M %.7m %.7c %.20R