In the message dated: Wed, 14 Aug 2019 10:21:12 -0400,
The pithy ruminations from Dj Merrill on
[[gridengine users] Multi-GPU setup] were:

=> To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
=> single Nvidia GPU cards per compute node. We are contemplating the
=> purchase of a single compute node that has multiple GPU cards in it, and
=> want to ensure that running jobs only have access to the GPU resources
=> they ask for, and don't take over all of the GPU cards in the system.
That's an issue.

=>
=> We define gpu as a resource:
=> qconf -sc:
=> #name    shortcut   type   relop   requestable   consumable   default   urgency
=> gpu      gpu        INT    <=      YES           YES          0         0
=>
=> We define GPU persistence mode and exclusive process on each node:
=> nvidia-smi -pm 1
=> nvidia-smi -c 3

Good.

=>
=> We set the number of GPUs in the host definition:
=> qconf -me (hostname)
=>
=> complex_values gpu=1 for our existing nodes, and this setup has been
=> working fine for us.

Good.

=>
=> With the new system, we would set:
=> complex_values gpu=4

Yes.

=>
=> If a job is submitted asking for one GPU, will it be limited to only
=> having access to a single GPU card on the system, or can it detect the
=> other cards and take up all four (and if so how do we prevent that)?

There are two issues you'll need to deal with:

1. Preventing a job from using more than the requested number of GPUs

   I don't have a great answer for that. As you see, SGE is good at
   keeping track of the number of instances of a resource (the count),
   but not which physical GPU is assigned to a job.

   For a cgroups-like solution, see:

	http://gridengine.org/pipermail/users/2014-November/008128.html
	http://gridengine.org/pipermail/users/2017-October/009952.html
	http://gridengine.org/pipermail/users/2017-February/009581.html

   I don't have experience with the method described there, but the
   trick (using a job prolog to chgrp the /dev/nvidia${GPUNUM} device)
   is on my list of things to do.
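   Purely as an illustration of the idea (not taken from those threads,
   and not something I've tested), a prolog along those lines might look
   like the sketch below. The script name, the "gpuidle" group, the
   claim file under /var/run, and the matching epilog are all my own
   assumptions.

#################################################################
#! /bin/bash
# Hypothetical GPU-fencing prolog -- an untested sketch only.
#
# Assumes:
#   * it is configured to run as root, e.g. via "qconf -mconf" with
#         prolog  root@/usr/local/sbin/gpu_prolog $job_owner $job_id
#     ($job_owner and $job_id are sge_conf(5) pseudo variables),
#   * every /dev/nvidia[0-9]* device normally belongs to group "gpuidle"
#     with mode 0660, and no user is a member of "gpuidle", so an
#     unclaimed GPU cannot be opened,
#   * a matching epilog (not shown) chgrps the device back to "gpuidle"
#     using the claim file written below.

job_owner=$1
job_id=$2
IDLE_GROUP=gpuidle

# Claim the first device node still owned by the idle group.
for dev in /dev/nvidia[0-9]*; do
	[ -e "$dev" ] || continue
	if [ "`stat -c %G "$dev"`" = "$IDLE_GROUP" ] ; then
		# Hand the device to the job owner's primary group so that
		# only this user's processes can open it.
		chgrp "`id -gn "$job_owner"`" "$dev"
		chmod 0660 "$dev"
		# Record the claim so the epilog can undo it.
		echo "${dev#/dev/nvidia}" > "/var/run/sge_gpu.${job_id}"
		exit 0
	fi
done

echo "gpu_prolog: no unclaimed GPU device found for job ${job_id}" 1>&2
exit 1
#################################################################

   One thing to watch with any scheme like this: /dev/nvidiactl and
   /dev/nvidia-uvm still need to be openable by everyone, or nothing
   will be able to talk to the driver at all.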
2. Ensuring that a job tries to use a free GPU, not just _any_ GPU

   Since SGE doesn't explicitly tell the job which GPU to use, we've
   found that a lot of software blindly tries to use GPU #0, apparently
   assuming that it is running on a single-user/single-GPU system
   (python, I'm looking at you).

   Our solution has been to "suggest" that users run a command in their
   submit script to report the number (GPU ID) of the next free GPU.
   This has eliminated most instances of the problem, but there are
   still some race conditions.

#################################################################
#! /bin/bash

# Script to return the GPU Id of an idle GPU, if any.
#
# Used in a submit script, in the form:
#
#	CUDA_VISIBLE_DEVICES=`get_CUDA_VISIBLE_DEVICES` || exit
#	export CUDA_VISIBLE_DEVICES
#	myGPUjob
#
# Some software takes the specification of the GPU device on the command
# line. In that case, the command line might be something like:
#
#	myGPUjob options -dev cuda${CUDA_VISIBLE_DEVICES}
#
# The command:
#	nvidia-smi pmon
# returns output in the form:
#################
# # gpu        pid  type    sm   mem   enc   dec   command
# # Idx          #   C/G     %     %     %     %   name
#     0          -     -     -     -     -     -   -
#################
# Note the absence (-) of a PID to indicate an idle GPU.

which nvidia-smi 1> /dev/null 2>&1
if [ $? != 0 ] ; then
	# no nvidia-smi found!
	echo "-1"
	echo "No 'nvidia-smi' utility found on node `hostname -s` at `date`." 1>&2
	if [ "X$JOB_ID" != "X" ] ; then
		# running as a batch job; this shouldn't happen
		( printf "SGE job ${JOB_ID}: No 'nvidia-smi' utility found on node `hostname -s` at `date`.\n" ) 2>&1 | Mail -s "unexpected: no nvidia-smi utility on `hostname -s`" root
	fi
	exit 1
fi

# count the GPUs, subtracting the two header lines
numGPUs=`nvidia-smi pmon -c 1 | wc -l`
numGPUs=$((numGPUs - 2))

# the first GPU with no PID (column 2 == "-") is considered free
free=`nvidia-smi pmon -c 1 | awk '{if ( $2 == "-" ) {print $1 ; exit}}'`

if [[ "X$free" != "X" && $numGPUs -gt 1 ]] ; then
	# We may have a race condition, where 2 (or more) GPU jobs are
	# probing nvidia-smi at once, and each sees the same GPU reported
	# as free. Sleep a random amount of time and check again....this
	# is not guaranteed to avoid the conflict, but it will help...
	sleep $((RANDOM % 11))
	free=`nvidia-smi pmon -c 1 | awk '{if ( $2 == "-" ) {print $1 ; exit}}'`
fi

if [ "X$free" = "X" ] ; then
	echo "-1"
	echo "SGE job ${JOB_ID} (${JOB_NAME}) failed: no free GPU on node `hostname -s` at `date`." 1>&2
	( printf "SGE job ${JOB_ID}, job name ${JOB_NAME} from $USER\nNo free GPU on node `hostname -s` at `date`.\n\nGPU status:\n==================================\n" ; nvidia-smi ; printf "============================\n\nSGE status on this node:\n=======================================\n" ; qstat -u \* -s rs -l hostname=`hostname` ) 2>&1 | Mail -s "unexpected: no free GPUs on `hostname -s`" root
	exit 1
fi

echo $free
exit 0
#################################################################

Mark

=>
=> Is there something like "cgroups" for gpus?
=>
=> Thanks,
=>
=> -Dj
=>

--
Mark Bergman    Biker, Rock Climber, Unix mechanic, IATSE #1 Stagehand

http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com

I want a newsgroup with an infinite S/N ratio! Now taking CFV on:
rec.motorcycles.stagehands.pet-bird-owners.pinballers.unix-supporters
15+ So Far--Want to join? Check out: http://www.panix.com/~bergman

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users