Hi,
We're intermittently seeing messages (below) about failing to register
memory with Open MPI 2.0.2 on CentOS 7 with Mellanox FDR ConnectX-3
adapters and the vanilla IB stack as shipped by CentOS.
We're not using any mlx4_core module tweaks at the moment. On earlier
machines we used to raise the registered-memory limit as per the FAQ,
but neither log_num_mtt nor num_mtt seems to exist these days (according
to /sys/module/mlx4_*/parameters/*), which makes it somewhat difficult
to follow the FAQ.
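For anyone wanting to repeat the parameter check on their own nodes, here
is a minimal sketch. It assumes the usual sysfs layout under /sys/module;
the helper name list_mlx4_params is made up for illustration, and it only
reports what the kernel exposes, nothing more.

```python
import glob
import os


def list_mlx4_params(sysfs_root="/sys/module"):
    """Return {module: {parameter: value}} for every loaded mlx4_* module."""
    result = {}
    pattern = os.path.join(sysfs_root, "mlx4_*", "parameters", "*")
    for path in sorted(glob.glob(pattern)):
        # path looks like <root>/mlx4_core/parameters/<name>
        module = path.split(os.sep)[-3]
        name = os.path.basename(path)
        try:
            with open(path) as fh:
                value = fh.read().strip()
        except OSError:
            value = None  # some parameters may not be readable by non-root users
        result.setdefault(module, {})[name] = value
    return result


if __name__ == "__main__":
    params = list_mlx4_params()
    for module, values in params.items():
        for name, value in values.items():
            print(f"{module}.{name} = {value}")
    # The two parameters the FAQ refers to
    for wanted in ("log_num_mtt", "num_mtt"):
        present = any(wanted in values for values in params.values())
        print(f"{wanted}: {'present' if present else 'missing'}")
```

Run on a compute node, this prints every mlx4_* parameter and then says
whether the two FAQ parameters are present.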
The output of 'ulimit -l' shows as unlimited for every rank.
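A rough sketch of one way to check the locked-memory limit from inside
each launched process rather than from a login shell, using Python's
standard resource module. The function names are illustrative only. Note
that getrlimit reports bytes, whereas 'ulimit -l' reports kbytes.

```python
import resource
import socket


def format_limit(value):
    """Render an rlimit value the way 'ulimit' would describe it."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limit()
    print(f"{socket.gethostname()}: memlock soft={soft} hard={hard}")
```

Launching this through mpirun with one process per node shows the limit
the MPI processes actually inherit, which can differ from an interactive
shell if they are started by a daemon.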
Does anyone have any advice, please?
Thanks,
Mark
-------------------------------------------------------------------------
Failed to register memory region (MR):
Hostname: dc1s0b1c
Address: ec5000
Length: 20480
Error: Cannot allocate memory
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.
You job will continue, but Open MPI will ignore the "ud" oob component
in this run.