Hi Greg,

The cause is the age of the openib btl.

You may be able to apply the attached patch.  Note that the 3.1.x release 
stream is no longer supported.

You may want to try using the 4.1.1 release, in which case you’ll want to use 
UCX.
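
A side note on the odd "errno says Success" wording in the report below: the 
libibverbs man pages describe ibv_query_device() as returning the error value 
on failure, so errno itself can still be 0 when a caller prints 
strerror(errno), and glibc renders 0 as "Success". Whether that is exactly what 
happens on this code path is an assumption, not something confirmed in this 
thread. A minimal Python sketch of the effect, where query_device_stub and both 
message helpers are hypothetical names and nothing here calls libibverbs:

```python
import errno
import os


def query_device_stub():
    """Hypothetical stand-in for a verbs-style query call.

    It reports failure through its return value and never sets errno.
    """
    return errno.EINVAL  # assumed failure code, chosen only for illustration


def message_from_errno(current_errno=0):
    # What a caller prints if it consults errno, which nobody set (still 0).
    return "errno says " + os.strerror(current_errno)


def message_from_return_code(rc):
    # What a caller prints if it uses the returned code instead.
    return "call failed: " + os.strerror(rc)


if __name__ == "__main__":
    rc = query_device_stub()
    if rc != 0:
        print(message_from_errno())          # text for errno == 0
        print(message_from_return_code(rc))  # text for the real failure code
```

Under that convention the returned code, not errno, carries the failure reason, 
which would explain why the log text looks like a success.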

Howard


From: users <users-boun...@lists.open-mpi.org> on behalf of "Fischer, Greg A. 
via users" <users@lists.open-mpi.org>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Wednesday, October 13, 2021 at 10:06 AM
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: "Fischer, Greg A." <fisch...@westinghouse.com>
Subject: [EXTERNAL] [OMPI users] OpenMPI 3.1.6 openib failure: "mlx4_0 errno 
says Success"


Hello,

I have compiled OpenMPI 3.1.6 from source on SLES12-SP3, and I am seeing the 
following errors when I try to use the openib btl:

WARNING: There was an error initializing an OpenFabrics device.

  Local host:   bl1308
  Local device: mlx4_0
--------------------------------------------------------------------------
[bl1308][[44866,1],5][../../../../../openmpi-3.1.6/opal/mca/btl/openib/btl_openib_component.c:1671:init_one_device] error obtaining device attributes for mlx4_0 errno says Success

I have disabled UCX ("--without-ucx") because the UCX installation we have 
seems to be too out-of-date. ofed_info says "MLNX_OFED_LINUX-4.1-1.0.2.0". I've 
attached the detailed output of ofed_info and ompi_info.

This issue seems similar to Issue #7461 
(https://github.com/open-mpi/ompi/issues/7461), which I don't see a resolution 
for.

Does anyone know what the likely explanation is? Is the version of OFED on the 
system badly out-of-sync with contemporary OpenMPI?

Thanks,
Greg



Attachment: 0001-patch-ibv_exp_dev_query-function-call.patch
