Hi,
As their example was limited to "allgpus", I posted my take on this on the NVIDIA developer blog.
It is basically the same, but it looks up the group ID from the dcgmi group JSON output using jp instead of reading it from a file.
https://developer.nvidia.com/blog/job-statistics-nvidia-data-center-gpu-manager-slurm/

prolog
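# Create a per-job GPU group named j<jobid>
group=$(sudo -u $SLURM_JOB_USER dcgmi group -c j$SLURM_JOB_ID)
if [ $? -eq 0 ]; then
    # The group ID is the 10th word of the message dcgmi prints on creation
    groupid=$(echo $group | awk '{print $10}')
    # Add the job's GPUs to the group, then enable and start stats recording
    sudo -u $SLURM_JOB_USER dcgmi group --group $groupid --add $SLURM_JOB_GPUS
    sudo -u $SLURM_JOB_USER dcgmi stats --group $groupid --enable
    sudo -u $SLURM_JOB_USER dcgmi stats --group $groupid --jstart $SLURM_JOBID
fi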
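
The awk '{print $10}' step in the prolog depends on the wording of the message dcgmi prints when it creates a group. Here is a minimal sketch of just that extraction, so it can be checked without a GPU node. The sample message and the helper name extract_group_id are assumptions for illustration, not output captured from a real run.

```shell
# Hypothetical helper: return the 10th whitespace-separated word of a line,
# which is where the prolog expects the new group ID to be.
extract_group_id() {
    printf '%s\n' "$1" | awk '{print $10}'
}

# Assumed sample of the confirmation line for "dcgmi group -c j1234"
sample='Successfully created group "j1234" with a group ID of 2'

groupid=$(extract_group_id "$sample")
echo "$groupid"
```

If a dcgmi version words that message differently, the field number changes too. That fragility is why the epilog uses the JSON lookup instead.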
epilog
OUTPUTDIR=/tmp/
# Stop recording and write the verbose job report to a per-job file
sudo -u $SLURM_JOB_USER dcgmi stats --jstop $SLURM_JOBID
sudo -u $SLURM_JOB_USER dcgmi stats --verbose --job $SLURM_JOBID | sudo -u $SLURM_JOB_USER tee $OUTPUTDIR/dcgm-gpu-stats-$HOSTNAME-$SLURM_JOBID.out
# Look up the ID of the group named j<jobid> in the JSON group listing, then delete it
groupid=$(sudo -u $SLURM_JOB_USER dcgmi group -l --json | jp "body.Groups.children.[*][0][?children.\"Group Name\".value=='j$SLURM_JOBID'].children.\"Group ID\".value | [0] " | sed s/\"//g)
sudo -u $SLURM_JOB_USER dcgmi group --delete $groupid
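
The epilog passes whatever the jp lookup returns straight to dcgmi group --delete. Below is a sketch of a more defensive variant. It assumes jp prints a quoted string on a match and null (or nothing) when no group with that name exists. The helper name clean_group_id is hypothetical, and a stand-in value replaces the real jp output.

```shell
# Hypothetical helper: strip JSON quotes from the lookup result and fail
# when the result is empty or "null" (assumed to mean no matching group).
clean_group_id() {
    id=$(printf '%s' "$1" | sed 's/"//g')
    case "$id" in
        ''|null) return 1 ;;
    esac
    printf '%s\n' "$id"
}

# Stand-in for the jp output; in the epilog this would be the lookup result
if groupid=$(clean_group_id '"2"'); then
    echo "would delete group $groupid"
else
    echo "no group found, nothing to delete"
fi
```

With this check the epilog could skip the delete call when the prolog never created a group, for example for jobs without GPUs.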
Best regards
--
Markus Kötter, +49 681 870832434
30159 Hannover, Lange Laube 6
Helmholtz Center for Information Security
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com