Barry - Thanks again for the reply. The multiple srun method is very similar to my test #3 case from the original post.
I think my posts have made my question too confusing, so I'll try to simplify it. Given:

% salloc -n 4 -c 2 --gres=gpu:1
% srun env | grep CUDA    # a single srun
# Currently always produces
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0

we would like a single srun or mpirun command such that:

CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=1
CUDA_VISIBLE_DEVICES=2
CUDA_VISIBLE_DEVICES=3

*(given that 4 GPUs are available on the node)*

We want to use *MPI* such that each rank/task uses 1 GPU, but the job can spread its tasks/ranks across the 4 GPUs. Currently we are limited to device 0 only.

*In an MPI context,* we don't want to submit multiple sruns or use a wrapper-based method (the 'index % NUM_GPUS' approach you mentioned from <https://bugs.schedmd.com/show_bug.cgi?id=2626#c3>; a rough sketch of my understanding of that wrapper is below). Though maybe the wrapper hack is the only option -- can the slurm-devel forum confirm? Is there any chance this GPU allocation behavior has changed between Slurm versions 14 and 16?

Thank you for your help so far!
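For reference, this is roughly how I understand that wrapper hack. It is only a sketch: the script name, NUM_GPUS value, and ./my_mpi_app are placeholders of mine, not anything Slurm provides; only SLURM_LOCALID is a real per-task variable set by srun.

#!/bin/bash
# gpu_wrapper.sh (hypothetical name): bind each task to one GPU based on
# its node-local task id. SLURM_LOCALID is exported per task by srun.
NUM_GPUS=4                                           # assumed GPUs per node
export CUDA_VISIBLE_DEVICES=$(( SLURM_LOCALID % NUM_GPUS ))
exec "$@"                                            # launch the real rank, e.g. ./my_mpi_app

used as something like:

% srun ./gpu_wrapper.sh ./my_mpi_app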
