On Fri, Oct 27, 2017 at 12:45 PM, Dave Sizer <dsi...@nvidia.com> wrote:
> Also, supposedly adding the "--accel-bind=g" option to srun will do this, 
> though we are observing that this is broken and causes jobs to hang.
>
> Can anyone confirm this?

Not really, it doesn't seem to be hanging for us:

-- 8< -----------------------------------------------------------------------
$ srun  --gres=gpu:1  --accel-bind=g --pty bash
srun: job 2682093 queued and waiting for resources
srun: job 2682093 has been allocated resources
[kilian@sh-113-01 ~]$
[kilian@sh-113-01 ~]$ nvidia-smi topo -m
       GPU0    mlx5_0  CPU Affinity
GPU0     X      PHB     10-10
mlx5_0  PHB      X
[kilian@sh-113-01 ~]$
-- 8< -----------------------------------------------------------------------

How do you submit your job? You can try "srun -vvv" to display
more information about the submission process.
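
To cross-check the binding from inside the job, one option is to print
the CPU list the shell is allowed to run on and compare it with the
"CPU Affinity" column above. This is only a minimal sketch, assuming a
Linux node with /proc mounted; the helper name cpus_allowed is made up
for illustration and is not part of Slurm:

```shell
# Hypothetical helper (not a Slurm command): print the CPUs the current
# process may run on, as reported by the Linux kernel.
cpus_allowed() {
    # awk inherits the calling shell's affinity mask, so /proc/self
    # reports the same list the shell itself is restricted to.
    awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status
}

cpus_allowed
```

Run inside the shell started by "srun --gres=gpu:1 --accel-bind=g --pty bash",
the result can be compared with what "nvidia-smi topo -m" reports.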

Cheers,
-- 
Kilian
