Given:

% salloc -n 4 -c 2 --gres=gpu:1

% srun env | grep CUDA   # a single srun

# Currently always produces

CUDA_VISIBLE_DEVICES=0

CUDA_VISIBLE_DEVICES=0

CUDA_VISIBLE_DEVICES=0

CUDA_VISIBLE_DEVICES=0

man salloc:
--gres
... The specified resources will be allocated to the job on each node. ...

You requested a single GPU per node, which all tasks on that node share; to get a separate GPU for every task, you want --gres=gpu:4 (assuming all four tasks run on the same node).

That being said, I don't know of a way to make srun (no experience with mpirun) exclusively assign parts of the gres to each task.

You can do something like this (in a Bash wrapper) to exclusively assign a GPU to a task (based on what is allocated to your job to be safe):
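IFS=, read -r -a cuda_devices <<< "$CUDA_VISIBLE_DEVICES"   # split on commas without changing IFS globally
[ "$SLURM_LOCALID" -lt "${#cuda_devices[@]}" ] || exit 1    # more local tasks than GPUs: give up
export CUDA_VISIBLE_DEVICES=${cuda_devices[$SLURM_LOCALID]}  # this task now sees only its own GPU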
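
For reference, a complete wrapper along those lines might look like the sketch below. The file name gpu_wrapper.sh and the helper pick_gpu are made up for illustration, and I have not tested this on a real cluster. It assumes Slurm sets SLURM_LOCALID for each task and that CUDA_VISIBLE_DEVICES holds the comma-separated list of GPUs the job got on that node. It avoids arrays so that it also runs under plain sh.

```shell
#!/bin/sh
# gpu_wrapper.sh (hypothetical name): give each task on a node one GPU
# out of the list that Slurm allocated to the job on that node.

# pick_gpu LIST INDEX
# Print element INDEX (0-based) of the comma-separated LIST.
# Fails if INDEX is out of range. The body runs in a subshell so that
# the IFS and globbing changes do not leak into the caller.
pick_gpu() (
    index=$2
    IFS=,
    set -f                  # do not glob-expand the list elements
    set -- $1               # split LIST on commas into $1, $2, ...
    [ "$index" -lt "$#" ] || exit 1
    shift "$index"
    printf '%s\n' "$1"
)

# Only act when started by srun (SLURM_LOCALID set) with a command to run.
if [ -n "${SLURM_LOCALID:-}" ] && [ "$#" -gt 0 ]; then
    CUDA_VISIBLE_DEVICES=$(pick_gpu "${CUDA_VISIBLE_DEVICES:-}" "$SLURM_LOCALID") || exit 1
    export CUDA_VISIBLE_DEVICES
    exec "$@"               # replace the wrapper with the real program
fi
```

You would then start the tasks through the wrapper, e.g. "srun ./gpu_wrapper.sh <command>" inside an allocation made with --gres=gpu:4. Any task whose local ID has no matching GPU exits with status 1 instead of silently sharing a device.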

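Alternatively, you may be able to use job arrays? Each array element is a separate job with its own GPU allocation; indices 0-3 give you four of them:
sbatch -a 0-3 -c 2 --gres=gpu:1 <command>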
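
If you go the array route, the same options can also live in the batch script itself. The following is only a sketch under the assumption that you want four array elements (indices 0-3, one per task of your original request); the echo is a placeholder for the real work.

```shell
#!/bin/sh
# Four independent array elements, indices 0..3.
#SBATCH --array=0-3
#SBATCH --cpus-per-task=2
# One GPU per array element.
#SBATCH --gres=gpu:1

# SLURM_ARRAY_TASK_ID is set by Slurm for each array element; the default
# of 0 is only there so the script can be dry-run outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-0}
echo "array element ${task_id}: CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"

# Replace the echo above with the real per-element work.
```

Since every array element is scheduled as its own job, no wrapper is needed to keep the elements off each other's GPU.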

Best,

Robbert
