[slurm-users] How do you guys track which GPU is used by which job ?

Sylvain MARET via slurm-users Wed, 16 Oct 2024 06:11:38 -0700

Hey guys!

I'm looking to improve GPU monitoring on our cluster. I want to install dcgm-exporter (https://github.com/NVIDIA/dcgm-exporter) and saw in the README that it can track job IDs: https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job-mapping-on-dcgm-exporter

However, I haven't been able to find any examples of how to do it, nor does Slurm seem to expose this information by default. Does anyone here do this? If so, do you have any examples I could try to follow? If you have advice on best practices for monitoring GPUs, I'd be happy to hear it!
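For reference, here is a minimal sketch of one way the mapping could be produced from Slurm, under assumptions that should be checked against the README and your Slurm version: (1) dcgm-exporter reads a mapping directory (set with `--hpc-job-mapping-dir` or `DCGM_HPC_JOB_MAPPING_DIR`) containing one file per GPU, named after the GPU index, with one job ID per line; (2) Slurm exports `SLURM_JOB_ID` and `SLURM_JOB_GPUS` (comma-separated GPU indices) to the node Prolog and Epilog; (3) those indices match the GPU IDs dcgm-exporter uses. The script name `gpu_job_map.py`, the helper functions and the default path `/var/run/dcgm-job-mapping` are hypothetical.

```python
"""Hypothetical Slurm prolog/epilog helper for dcgm-exporter HPC job mapping.

Assumed file convention (verify against the dcgm-exporter README): one file
per GPU in the mapping directory, named by GPU index, one job ID per line.
"""
import os
import sys
from pathlib import Path

# Hypothetical default; point this at the directory dcgm-exporter is configured with.
MAPPING_DIR = Path(os.environ.get("DCGM_HPC_JOB_MAPPING_DIR", "/var/run/dcgm-job-mapping"))


def parse_gpu_ids(job_gpus: str) -> list[str]:
    """Turn a value like '0,2,3' into ['0', '2', '3'], ignoring blanks."""
    return [g.strip() for g in job_gpus.split(",") if g.strip()]


def add_job(mapping_dir: Path, job_id: str, gpu_ids: list[str]) -> None:
    """Prolog side: append job_id to each allocated GPU's mapping file."""
    mapping_dir.mkdir(parents=True, exist_ok=True)
    for gpu in gpu_ids:
        path = mapping_dir / gpu
        lines = path.read_text().split() if path.exists() else []
        if job_id not in lines:
            lines.append(job_id)
        # Read-modify-write without locking: fine when GPUs are not shared
        # between jobs, racy if several jobs can land on the same GPU.
        path.write_text("".join(f"{line}\n" for line in lines))


def remove_job(mapping_dir: Path, job_id: str, gpu_ids: list[str]) -> None:
    """Epilog side: drop job_id; delete the file once no job is left on the GPU."""
    for gpu in gpu_ids:
        path = mapping_dir / gpu
        if not path.exists():
            continue
        lines = [line for line in path.read_text().split() if line != job_id]
        if lines:
            path.write_text("".join(f"{line}\n" for line in lines))
        else:
            path.unlink()


def main(mode: str) -> int:
    """Return 0 even when there is nothing to do: a failing Prolog drains the node."""
    job_id = os.environ.get("SLURM_JOB_ID", "")
    gpu_ids = parse_gpu_ids(os.environ.get("SLURM_JOB_GPUS", ""))
    if not job_id or not gpu_ids:
        return 0  # CPU-only job or variables not set: nothing to map
    if mode == "prolog":
        add_job(MAPPING_DIR, job_id, gpu_ids)
    elif mode == "epilog":
        remove_job(MAPPING_DIR, job_id, gpu_ids)
    else:
        print("usage: gpu_job_map.py prolog|epilog", file=sys.stderr)
    return 0


# Only act when a mode argument is given, e.g. `gpu_job_map.py prolog`.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

The idea would be to call it from small wrapper scripts referenced by `Prolog=` and `Epilog=` in slurm.conf, so the files exist while the job runs and disappear when it ends. Deleting empty files, rather than leaving them blank, is a design choice meant to avoid stale labels on idle GPUs.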


Regards,
Sylvain Maret


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
