On Fri, Oct 27, 2017 at 12:45 PM, Dave Sizer <[email protected]> wrote:
> Also, supposedly adding the "--accel-bind=g" option to srun will do this,
> though we are observing that this is broken and causes jobs to hang.
>
> Can anyone confirm this?
Not really, it doesn't seem to be hanging for us:
-- 8< -----------------------------------------------------------------------
$ srun --gres=gpu:1 --accel-bind=g --pty bash
srun: job 2682093 queued and waiting for resources
srun: job 2682093 has been allocated resources
[kilian@sh-113-01 ~]$
[kilian@sh-113-01 ~]$ nvidia-smi topo -m
        GPU0    mlx5_0  CPU Affinity
GPU0     X      PHB     10-10
mlx5_0  PHB      X
[kilian@sh-113-01 ~]$
-- 8< -----------------------------------------------------------------------
How do you submit your job? You can try "srun -vvv" to display more
information about the submission process.
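
For instance, here is a minimal sketch of such a debugging run. The
--gres and --accel-bind options are the ones from the session above.
The log file name, the 120-second limit and the use of "hostname" as a
non-interactive test command are only placeholders, so adjust them to
your setup:

```shell
# Options copied from the interactive session above; -vvv raises srun's verbosity.
SRUN_ARGS="-vvv --gres=gpu:1 --accel-bind=g"
# Hypothetical log file name, pick whatever suits you.
LOG=srun-debug.log

if command -v srun >/dev/null 2>&1; then
    # "hostname" replaces the interactive shell so the step ends on its own;
    # timeout (coreutils) gives up after 120 seconds if the job hangs or stays queued.
    timeout 120 srun $SRUN_ARGS hostname > "$LOG" 2>&1
    echo "exit status: $? (124 means the timeout fired), details in $LOG"
else
    echo "srun not found in PATH"
fi
```

If the timeout fires, the last lines of the log should show how far the
submission got before it stalled.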
Cheers,
--
Kilian