Given:
% salloc -n 4 -c 2 --gres=gpu:1
% srun env | grep CUDA # a single srun
# Currently always produces
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
man salloc:
--gres
... The specified resources will be allocated to the job on each
node. ...
You requested a single GPU per node, which is one GPU for the whole job
when all four tasks share a node; to allocate a separate GPU for every
task, you want --gres=gpu:4.
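Because --gres counts per node, a job's total GPU count is the per-node request multiplied by the number of nodes it spans. The sketch below only illustrates that arithmetic; total_gpus is a hypothetical helper, not a Slurm tool. Pinning the allocation to one node (e.g. salloc -N 1 -n 4 -c 2 --gres=gpu:4) is one way to make gpu:4 mean exactly one GPU per task.

```shell
#!/bin/bash
# Hypothetical helper: --gres=gpu:N is a per-node count, so the job's
# total is N multiplied by the number of nodes it spans.
total_gpus() {
  local per_node=$1 nodes=$2
  echo $(( per_node * nodes ))
}

total_gpus 1 1   # --gres=gpu:1 on one node: 1 GPU shared by 4 tasks
total_gpus 4 1   # --gres=gpu:4 on one node: one GPU per task
total_gpus 4 2   # --gres=gpu:4 spread over 2 nodes: 8 GPUs for 4 tasks
```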
That being said, I don't know of a way to make srun (I have no experience
with mpirun) exclusively assign a separate part of the gres to each task.
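You can do something like this in a Bash wrapper to exclusively assign a
GPU to each task (picking from what is allocated to your job, to be safe):
#!/bin/bash
IFS=, read -ra cuda_devices <<< "$CUDA_VISIBLE_DEVICES"   # "0,1,2,3" -> (0 1 2 3)
[ "$SLURM_LOCALID" -lt "${#cuda_devices[@]}" ] || exit 1  # no GPU left for this task
export CUDA_VISIBLE_DEVICES=${cuda_devices[$SLURM_LOCALID]}  # keep only this task's GPU
exec "$@"  # run the real command with the narrowed device list
and launch it as: srun <wrapper> <command>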
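To check the local-task-ID-to-GPU mapping without a cluster, the same indexing logic can be put in a function and fed simulated values. This is a minimal sketch: pick_gpu is a hypothetical name, and the loop stands in for the SLURM_LOCALID values 0-3 that srun would give four tasks on one node allocated GPUs 0-3.

```shell
#!/bin/bash
# pick_gpu LIST LOCALID: print the entry of the comma-separated GPU LIST
# at index LOCALID, or fail if the task has no GPU of its own.
pick_gpu() {
  local -a devs
  IFS=, read -ra devs <<< "$1"      # IFS change applies to read only
  (( $2 < ${#devs[@]} )) || return 1
  printf '%s\n' "${devs[$2]}"
}

# Simulate four tasks on a node whose job was allocated GPUs 0,1,2,3
for id in 0 1 2 3; do
  echo "task $id -> CUDA_VISIBLE_DEVICES=$(pick_gpu 0,1,2,3 "$id")"
done
```

With the original --gres=gpu:1 request the list is just "0", so pick_gpu fails for tasks 1-3, which is the "to be safe" behaviour: a task exits rather than silently sharing a GPU.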
Alternatively, you may be able to use job arrays, with one array element
per task:
sbatch -a 0-3 -c 2 --gres=gpu:1 <job script>
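A job script for the array approach might look like the sketch below, with four elements (0-3), one per task. Each element is its own job with its own 2-CPU, 1-GPU allocation, so on a typical gres/gpu setup Slurm sets CUDA_VISIBLE_DEVICES for it and no wrapper is needed; SLURM_ARRAY_TASK_ID tells the elements apart. The input_<N>.dat naming is a hypothetical example of per-element work selection.

```shell
#!/bin/bash
#SBATCH --array=0-3
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:1
# Each array element runs as a separate job with one GPU of its own.
# The array index picks this element's share of the work
# (hypothetical file naming; defaults to 0 when run outside Slurm).
task_id=${SLURM_ARRAY_TASK_ID:-0}
input="input_${task_id}.dat"
echo "element $task_id: GPU(s) ${CUDA_VISIBLE_DEVICES:-none}, input $input"
```

Note that array elements are scheduled independently and may not run at the same time, so this fits independent tasks rather than tightly coupled ones such as MPI ranks.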
Best,
Robbert