On Fri, Oct 27, 2017 at 12:45 PM, Dave Sizer <dsi...@nvidia.com> wrote:
> Also, supposedly adding the "--accel-bind=g" option to srun will do this,
> though we are observing that this is broken and causes jobs to hang.
>
> Can anyone confirm this?

Not really, it doesn't seem to be hanging for us:

-- 8< -----------------------------------------------------------------------
$ srun --gres=gpu:1 --accel-bind=g --pty bash
srun: job 2682093 queued and waiting for resources
srun: job 2682093 has been allocated resources
[kilian@sh-113-01 ~]$
[kilian@sh-113-01 ~]$ nvidia-smi topo -m
        GPU0    mlx5_0  CPU Affinity
GPU0     X      PHB     10-10
mlx5_0  PHB      X
[kilian@sh-113-01 ~]$
-- 8< -----------------------------------------------------------------------

How do you submit your job? You can try with "srun -vvv" to display some
more information about the submission process.

Cheers,
--
Kilian