Barry - Thanks again for the reply.  The multiple srun method is very
similar to my test #3 case from the original post.

I think my earlier posts have made the question confusing, so I'll try to
simplify it.

Given:

% salloc -n 4 -c 2 --gres=gpu:1

% srun env | grep CUDA    # a single srun

# Currently always produces

CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=0

What we want is a single srun or mpirun command that produces:

CUDA_VISIBLE_DEVICES=0
CUDA_VISIBLE_DEVICES=1
CUDA_VISIBLE_DEVICES=2
CUDA_VISIBLE_DEVICES=3

*(given that 4 GPUs are available on the node)*

We want to use *MPI* such that each rank/task uses 1 GPU, but the job can
still spread its tasks/ranks across the 4 GPUs.
Currently we are limited to device 0 only.

*In an MPI context,* we don't want to submit multiple sruns or use a wrapper
based method (the 'index % NUM_GPUS' approach you mentioned from the link
<https://bugs.schedmd.com/show_bug.cgi?id=2626#c3>, sketched below).  Though
maybe the wrapper hack is the only option?
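
For reference, the wrapper would be something along these lines (a rough
sketch only; gpu_wrapper.sh and NUM_GPUS are placeholder names, and it
assumes SLURM_LOCALID numbers the tasks on each node starting at 0):

#!/bin/bash
# gpu_wrapper.sh (hypothetical): pin each node-local task to one GPU,
# then exec the real program with its arguments.
NUM_GPUS=4                      # GPUs physically present on the node
export CUDA_VISIBLE_DEVICES=$(( SLURM_LOCALID % NUM_GPUS ))
exec "$@"

used as:

% srun ./gpu_wrapper.sh env | grep CUDA

which should give devices 0-3 as desired, but every launch then has to go
through the wrapper instead of srun/Slurm doing the binding itself.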

Can the slurm-devel forum confirm?

Is there any chance this GPU allocation behavior has changed between Slurm
versions (from 14 to 16)?

Thank you for your help so far!
