Hi,

As their example was limited to "allgpus", I posted my take on this on the NVIDIA developer blog.

It is basically the same, but it looks up the group ID from the dcgmi group JSON output using jp instead of reading it from a file.

https://developer.nvidia.com/blog/job-statistics-nvidia-data-center-gpu-manager-slurm/

prolog
group=$(sudo -u $SLURM_JOB_USER dcgmi group -c j$SLURM_JOB_ID)
if [ $? -eq 0 ]; then
  groupid=$(echo $group | awk '{print $10}')
  sudo -u $SLURM_JOB_USER dcgmi group --group $groupid --add $SLURM_JOB_GPUS
  sudo -u $SLURM_JOB_USER dcgmi stats --group $groupid --enable
  sudo -u $SLURM_JOB_USER dcgmi stats --group $groupid --jstart $SLURM_JOBID
fi
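
The awk call in the prolog relies on the group ID being the tenth whitespace-separated field of the message dcgmi prints when it creates a group. A minimal sketch of that extraction, assuming the message has the form of the sample string below (parse_group_id is a hypothetical helper name, and the sample text is an assumption rather than captured output):

```shell
#!/bin/sh
# Hypothetical helper: print the 10th whitespace-separated field of the
# group-creation message, as the prolog does with awk '{print $10}'.
parse_group_id() {
  printf '%s\n' "$1" | awk '{print $10}'
}

# Assumed message format; check it against your dcgmi version.
sample='Successfully created group "j42" with a group ID of 3'
parse_group_id "$sample"
```

If the message format differs in your DCGM release, the field number needs adjusting, which is one reason the epilog looks the ID up from the JSON listing instead.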


epilog
OUTPUTDIR=/tmp/
sudo -u $SLURM_JOB_USER dcgmi stats --jstop $SLURM_JOBID
sudo -u $SLURM_JOB_USER dcgmi stats --verbose --job $SLURM_JOBID | \
  sudo -u $SLURM_JOB_USER tee $OUTPUTDIR/dcgm-gpu-stats-$HOSTNAME-$SLURM_JOBID.out

groupid=$(sudo -u $SLURM_JOB_USER dcgmi group -l --json | \
  jp "body.Groups.children.[*][0][?children.\"Group Name\".value=='j$SLURM_JOBID'].children.\"Group ID\".value | [0] " | \
  sed s/\"//g)

sudo -u $SLURM_JOB_USER dcgmi group --delete $groupid
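
If no group matches the job name, the jp lookup would typically yield null rather than an ID, and the delete call would then receive a value that is not a group ID. A minimal sketch of a guard, assuming group IDs are plain non-negative integers (is_group_id is a hypothetical helper, not part of dcgmi):

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is a non-empty
# string consisting solely of digits.
is_group_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example with placeholder values; in the epilog, test "$groupid" instead.
for candidate in 3 null ''; do
  if is_group_id "$candidate"; then
    echo "would delete group $candidate"
  else
    echo "skipping '$candidate'"
  fi
done
```

In the epilog this could wrap the delete, for example: is_group_id "$groupid" && sudo -u $SLURM_JOB_USER dcgmi group --delete $groupid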


Best regards
--
Markus Kötter, +49 681 870832434
30159 Hannover, Lange Laube 6
Helmholtz Center for Information Security

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com