Hi Greg,

It’s due to the aging of the openib btl.
You may be able to apply the attached patch. Note that the 3.1.x release stream is no longer supported. You may want to try the 4.1.1 release, in which case you’ll want to use UCX.

Howard

From: users <users-boun...@lists.open-mpi.org> on behalf of "Fischer, Greg A. via users" <users@lists.open-mpi.org>
Reply-To: Open MPI Users <users@lists.open-mpi.org>
Date: Wednesday, October 13, 2021 at 10:06 AM
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: "Fischer, Greg A." <fisch...@westinghouse.com>
Subject: [EXTERNAL] [OMPI users] OpenMPI 3.1.6 openib failure: "mlx4_0 errno says Success"

Hello,

I have compiled OpenMPI 3.1.6 from source on SLES12-SP3, and I am seeing the following errors when I try to use the openib btl:

WARNING: There was an error initializing an OpenFabrics device.

  Local host:   bl1308
  Local device: mlx4_0
--------------------------------------------------------------------------
[bl1308][[44866,1],5][../../../../../openmpi-3.1.6/opal/mca/btl/openib/btl_openib_component.c:1671:init_one_device] error obtaining device attributes for mlx4_0 errno says Success

I have disabled UCX ("--without-ucx") because the UCX installation we have seems to be too out-of-date. ofed_info says "MLNX_OFED_LINUX-4.1-1.0.2.0". I've attached the detailed output of ofed_info and ompi_info.

This issue seems similar to Issue #7461 (https://github.com/open-mpi/ompi/issues/7461), for which I don't see a resolution.

Does anyone know what the likely explanation is? Is the version of OFED on the system badly out of sync with contemporary OpenMPI?

Thanks,
Greg

________________________________
This e-mail may contain proprietary information of the sending organization. Any unauthorized or improper disclosure, copying, distribution, or use of the contents of this e-mail and attached document(s) is prohibited. The information contained in this e-mail and attached document(s) is intended only for the personal and private use of the recipient(s) named above.
If you have received this communication in error, please notify the sender immediately by email and delete the original e-mail and attached document(s).
0001-patch-ibv_exp_dev_query-function-call.patch
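The odd "errno says Success" wording has a simple mechanical explanation in C: on glibc, strerror(0) returns "Success", so a message that formats strerror(errno) after a failed call will read "Success" whenever that call reported its failure only through its return value and left errno at 0. Whether this is exactly what happens in the device query inside init_one_device is an assumption, not something the thread confirms. The sketch below uses a hypothetical fake_query_device() in place of the real verbs call and contrasts formatting errno with formatting the returned code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL stand-in for a device-query call (not a real verbs API).
 * It reports failure only through its return value, an errno-style code,
 * and never sets the global errno. */
static int fake_query_device(int supported)
{
    return supported ? 0 : EINVAL;
}

/* Pattern that yields "errno says Success": the caller detects the failure
 * but formats the global errno, which the call never set. errno is cleared
 * first so the demonstration is deterministic. */
static void report_with_errno(char *buf, size_t len)
{
    errno = 0;
    if (fake_query_device(0) != 0) {
        snprintf(buf, len, "errno says %s", strerror(errno));
    }
}

/* Alternative: format the error code the call actually returned. */
static void report_with_rc(char *buf, size_t len)
{
    int rc = fake_query_device(0);
    if (rc != 0) {
        snprintf(buf, len, "query failed: %s", strerror(rc));
    }
}
```

On glibc, report_with_errno() produces "errno says Success", while report_with_rc() produces "query failed: Invalid argument", which names the real failure reason.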