Hi all,

Happy new year everyone!

I've been looking for a simple tool that reports how many resources a job actually consumes, to help my colleagues and me adjust job requirements. I could not find one that suited us (the tools mentioned on this mailing list were not easy to install and use), so I wrote a new one: https://github.com/CEA-LIST/sprofile

It's a simple Python script that parses cgroup data and NVML data from the NVIDIA driver. It reports the job duration, CPU load, peak RAM usage, GPU load and peak GPU memory, like so:

    -- sprofile report (node03) --
    Time:         0:00:03 / 1:00:00
    CPU load:     2.0 / 4.0
    RAM peak mem: 7G / 8G
    GPU load:     0.2 / 2.0
    GPU peak mem: 7G / 40G
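To illustrate how numbers like the CPU and RAM lines above can be derived from cgroup data, here is a minimal sketch. It is not sprofile's actual code. It assumes cgroup v1 with the cpuacct and memory controllers, reads `cpuacct.usage` (total CPU time in nanoseconds) and `memory.max_usage_in_bytes` (peak memory in bytes) from the job's cgroup directories, and treats "CPU load" as CPU time divided by wall-clock time, i.e. the average number of busy cores. The function names are hypothetical, and where the job's cgroup directories live depends on the local Slurm and cgroup setup.

```python
import os
from datetime import timedelta


def read_int(path):
    """Read a single integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def cpu_load(cpuacct_dir, elapsed_s):
    """Average number of busy cores: total CPU time / wall-clock time.

    Assumes cgroup v1, where cpuacct.usage holds the CPU time consumed by
    all tasks in the cgroup, in nanoseconds.
    """
    cpu_s = read_int(os.path.join(cpuacct_dir, "cpuacct.usage")) / 1e9
    return cpu_s / elapsed_s if elapsed_s > 0 else 0.0


def peak_ram(memory_dir):
    """Peak memory usage in bytes, as recorded by the cgroup v1 memory controller."""
    return read_int(os.path.join(memory_dir, "memory.max_usage_in_bytes"))


def fmt_gib(n_bytes):
    """Format a byte count as whole GiB, e.g. '7G'."""
    return f"{round(n_bytes / 2**30)}G"


def report_lines(elapsed_s, limit_s, load, ncpus, peak, mem_limit):
    """Build 'used / allocated' lines in the style of the report above."""
    return [
        f"Time: {timedelta(seconds=elapsed_s)} / {timedelta(seconds=limit_s)}",
        f"CPU load: {load:.1f} / {ncpus:.1f}",
        f"RAM peak mem: {fmt_gib(peak)} / {fmt_gib(mem_limit)}",
    ]
```

For example, a job that accumulated 6 s of CPU time in 3 s of wall time has a load of 2.0, i.e. it kept two of its four allocated cores busy on average. On cgroup v2 the corresponding files are `cpu.stat` (field `usage_usec`, in microseconds) and `memory.peak`.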

It requires the Slurm cgroup plugin, and GPU accounting must be enabled on the nodes (nvidia-smi --accounting-mode=1).
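GPU accounting is what lets the driver keep per-process GPU statistics after the fact. Below is a rough sketch of how they can be read through NVML's Python bindings (the `pynvml` module from the nvidia-ml-py package). It is not sprofile's implementation. The function names are hypothetical, the job's PIDs are assumed to be known already, and the way per-process records are combined (summing each GPU's highest utilization, taking the largest per-process memory peak) is an assumption chosen for simplicity.

```python
def collect_gpu_stats(job_pids):
    """Return {gpu_index: [accounting record, ...]} for the given PIDs.

    Requires accounting mode to be enabled on the GPUs. pynvml is imported
    here so that summarize_gpu_stats() can be used without a GPU present.
    """
    import pynvml

    pynvml.nvmlInit()
    try:
        stats = {}
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            # Only query PIDs the driver actually has accounting data for.
            pids = set(pynvml.nvmlDeviceGetAccountingPids(handle)) & set(job_pids)
            stats[i] = [pynvml.nvmlDeviceGetAccountingStats(handle, p) for p in pids]
        return stats
    finally:
        pynvml.nvmlShutdown()


def summarize_gpu_stats(stats):
    """Combine accounting records into (GPU load, peak GPU memory in bytes).

    Each record needs gpuUtilization (percent) and maxMemoryUsage (bytes).
    Assumption: per GPU, keep the highest process utilization, then sum over
    GPUs, so 2.0 means two fully busy GPUs.
    """
    load = sum(
        max((r.gpuUtilization for r in recs), default=0) for recs in stats.values()
    ) / 100
    peak = max((r.maxMemoryUsage for recs in stats.values() for r in recs), default=0)
    return load, peak
```

Usage would be along the lines of `summarize_gpu_stats(collect_gpu_stats(pids))`. Keeping the NVML queries separate from the arithmetic makes the summary easy to test on machines without a GPU.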

I hope you find this useful. Let me know if you find bugs or want to contribute.

Regards,
Nicolas Granger
