Put “oob=tcp” in your default MCA param file, so that Open MPI uses only the TCP oob component and doesn't try to set up the "ud" one at all.
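
As a rough sketch, assuming the usual file locations (per-user: $HOME/.openmpi/mca-params.conf; system-wide: $prefix/etc/openmpi-mca-params.conf, where $prefix is wherever your Open MPI is installed, which I'm guessing at here), the entry would look something like:

```
# Restrict the out-of-band (oob) framework to the TCP component,
# so the "ud" component is never opened.
oob = tcp
```

You can try the same setting for a single run first with "mpirun --mca oob tcp ..." before making it the default.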

> On Oct 18, 2017, at 9:00 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
> 
> Hi,
> 
> We're intermittently seeing messages (below) about failing to register memory 
> with openmpi 2.0.2 on centos7 / Mellanox FDR Connect-X 3 and the vanilla IB 
> stack as shipped by centos.
> 
> We're not using any mlx4_core module tweaks at the moment. On earlier 
> machines we used to set registered memory as per the FAQ, but neither 
> log_num_mtt nor num_mtt seem to exist these days (according to 
> /sys/module/mlx4_*/parameters/*), which makes it somewhat difficult to follow 
> the FAQ.
> 
> The output of 'ulimit -l' shows as unlimited for every rank.
> 
> Does anyone have any advice, please?
> 
> Thanks,
> 
> Mark
> 
> -------------------------------------------------------------------------
> Failed to register memory region (MR):
> 
> Hostname: dc1s0b1c
> Address:  ec5000
> Length:   20480
> Error:    Cannot allocate memory
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Open MPI has detected that there are UD-capable Verbs devices on your
> system, but none of them were able to be setup properly.  This may
> indicate a problem on this system.
> 
> You job will continue, but Open MPI will ignore the "ud" oob component
> in this run.
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users

