Hi,

We're intermittently seeing messages (below) about failing to register memory with Open MPI 2.0.2 on CentOS 7 with Mellanox ConnectX-3 FDR adapters and the stock InfiniBand stack as shipped by CentOS.

We're not using any mlx4_core module tweaks at the moment. On earlier machines we used to raise the amount of registerable memory as per the FAQ, but neither log_num_mtt nor num_mtt seems to exist these days (according to /sys/module/mlx4_*/parameters/*), which makes it somewhat difficult to follow the FAQ.
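
For anyone wanting to repeat the check on their own nodes, here is a minimal sketch that lists whatever parameters the loaded mlx4 modules expose and reports whether log_num_mtt or num_mtt is among them. The function name list_module_parameters is made up for illustration, and the only assumption is the /sys/module/mlx4_*/parameters/* layout mentioned above.

```python
import glob
import os


def list_module_parameters(pattern="/sys/module/mlx4_*/parameters/*"):
    """Return {module: {parameter: value}} for every file matching pattern.

    Hypothetical helper: it only reads the sysfs files, it changes nothing.
    """
    found = {}
    for path in sorted(glob.glob(pattern)):
        # Path layout is .../<module>/parameters/<parameter>
        module = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as handle:
                value = handle.read().strip()
        except OSError:
            value = None  # some parameters may not be readable by this user
        found.setdefault(module, {})[os.path.basename(path)] = value
    return found


if __name__ == "__main__":
    params = list_module_parameters()
    for module, values in params.items():
        for name, value in values.items():
            print(f"{module}.{name} = {value}")
    # The two parameter names referred to in the message above.
    for wanted in ("log_num_mtt", "num_mtt"):
        present = any(wanted in values for values in params.values())
        print(f"{wanted}: {'present' if present else 'not exposed'}")
```

On a node without any mlx4 modules loaded the listing is simply empty and both names are reported as not exposed.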

'ulimit -l' reports unlimited for every rank.
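
The locked-memory limit can differ between an interactive shell and the processes a launcher actually starts, so it is worth reading it from inside each process. A rough sketch of one way to do that, using Python's standard resource module; the helper names are illustrative, and OMPI_COMM_WORLD_RANK is the environment variable Open MPI sets for launched processes:

```python
import os
import resource
import socket


def format_limit(value):
    """Render an rlimit value the way 'ulimit -l' does (kB or 'unlimited')."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kB"  # getrlimit reports bytes


def describe_memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = describe_memlock_limit()
    # Falls back to "?" when not started by Open MPI's launcher.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    print(f"{socket.gethostname()} rank {rank}: soft={soft} hard={hard}")
```

Started through mpirun with the same hosts and launcher as the real job, this prints one line per process, so a node whose limit is lower than expected stands out.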

Does anyone have any advice, please?

Thanks,

Mark

-------------------------------------------------------------------------
Failed to register memory region (MR):

Hostname: dc1s0b1c
Address:  ec5000
Length:   20480
Error:    Cannot allocate memory
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly.  This may
indicate a problem on this system.

You job will continue, but Open MPI will ignore the "ud" oob component
in this run.
