Janne Blomqvist writes:

> On 14/11/2019 20.41, Prentice Bisbal wrote:
>> Is there any way to see how much a job used the GPU(s) on a cluster
>> using sacct or any other slurm command?
>
> We have created
> https://github.com/AaltoScienceIT/ansible-role-sacct_gpu/ as a quick
> hack to put GPU utilization stats into the comment field at the end of
> the job.
>
> The above is an ansible role, but if you're not using ansible you can
> just pull the scripts from the "files" subdirectory.
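With that approach the stats should then be retrievable after the job
finishes, assuming slurmdbd is configured to store job comments; roughly:

]] sacct -j <jobid> --format=JobID,Elapsed,Comment%40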
I do something similar, but it's optional (on a per-job basis) and
updates regularly. In the job submission script, a user may add

]] source /usr/share/gpu.sbatch

which contains the following:

]] (
]] while true ; do
]]     util=$(nvidia-smi | grep Default | \
]]         cut -d'|' -f4 | grep -o -P '[0-9]+%' | \
]]         tr '\n' ' ')
]]     scontrol update job=$SLURM_JOB_ID comment="GPU: $util"
]]     sleep 15
]] done
]] ) &

and this shows each GPU's utilisation in the comment field while the job
is running. Quite handy. I haven't bothered figuring out how to enable
this for all users, and to be honest I think some users would rather not
let everyone know, due to embarrassment :-)

Of course, it's not particularly efficient and assumes that the compute
mode is set to Default, but it was a quick hack.

Cheers,
Aaron
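P.S. A rough, untested sketch of a variant that avoids the
Default-compute-mode assumption by asking nvidia-smi's query interface
for utilization.gpu directly, instead of scraping the table output:

]] (
]] # poll every 15 s and stash per-GPU utilisation in the job's comment
]] while true ; do
]]     util=$(nvidia-smi --query-gpu=utilization.gpu \
]]         --format=csv,noheader,nounits | sed 's/$/%/' | tr '\n' ' ')
]]     scontrol update job=$SLURM_JOB_ID comment="GPU: $util"
]]     sleep 15
]] done
]] ) &

While the job runs, the comment can be read back with either of:

]] squeue -j <jobid> -h -o '%k'
]] scontrol show job <jobid> | grep -i comment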