Just to follow up for the email web archives: discussion of this issue continued in https://github.com/open-mpi/ompi/issues/10841.
--
Jeff Squyres
jsquy...@cisco.com

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Rob Kudyba via users <users@lists.open-mpi.org>
Sent: Thursday, September 22, 2022 2:15 PM
To: users@lists.open-mpi.org <users@lists.open-mpi.org>
Cc: Rob Kudyba <rk3...@columbia.edu>
Subject: [OMPI users] --mca parameter explainer; mpirun WARNING: There was an error initializing an OpenFabrics device

We're using OpenMPI 4.1.1 (CUDA-aware), which we load as a module, on a RHEL 8 cluster with an InfiniBand controller (Mellanox Technologies MT28908 Family ConnectX-6). We see this warning when running mpirun without any MCA options/parameters:

    WARNING: There was an error initializing an OpenFabrics device.

      Local host:   xxxx
      Local device: mlx5_0
    ---------------------------------------------

I did add 0x02c9 to our mca-btl-openib-device-params.ini file for the Mellanox ConnectX6 stanza, as we were getting the following warning, which no longer appears:

    WARNING: No preset parameters were found for the device that Open MPI detected:

      Local host:            xxxx
      Device name:           mlx5_0
      Device vendor ID:      0x02c9
      Device vendor part ID: 4123

I found that vendor ID referenced in these comments <https://accserv.classe.cornell.edu/svn/packages/openmpi/opal/mca/btl/openib/mca-btl-openib-device-params.ini>:

    # Note: Several vendors resell Mellanox hardware and put their own firmware
    # on the cards, therefore overriding the default Mellanox vendor ID.
    #
    # Mellanox      0x02c9

Running ompi_info --param btl all, we have:

    MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.1.1)
    MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.1.1)
    MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.1.1)
    MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.1.1)
    MCA btl: smcuda (MCA v2.1.0, API v3.1.0, Component v4.1.1)

So I am trying to wrap my head around the various warnings, and how the various options/parameters available can improve performance and/or when to use them.
I've gone through the OpenMPI run-time tuning documentation <https://www.open-mpi.org/faq/?category=tuning>, and I've used this STREAMS benchmark <https://anilmaurya.wordpress.com/2016/10/12/stream-benchmarks/> as well as these OSU Micro-Benchmarks at https://ulhpc-tutorials.readthedocs.io/en/latest/parallel/mpi/OSU_MicroBenchmarks/

With version 4.1.1, if I use --mca btl 'openib' I get seg faults, which I believe is expected as it's deprecated <https://docs.open-mpi.org/en/v5.0.x/release-notes/networks.html>. I've tried --mca btl '^openib' and --mca btl 'tcp' (or --mca btl 'tcp,self' using the OSU benchmarks), and the benchmark results are very similar even when I use multiple CPUs, threads and/or nodes. They also run without the warning messages. If I don't use a --mca option, I get the WARNING: message.

Does anyone know of a tried and true way to run these benchmarks so I know whether these MCA parameters make a difference, or am I just not understanding how to use them? Perhaps running these benchmarks on a very active cluster with shared CPUs/nodes will affect the results?

I can share any desired results if that helps the discussion.

Thanks!
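
On the question of telling a real difference from noise on a shared cluster: one possible approach is to repeat each configuration several times and compare the spread of the results rather than single runs. The following is only a minimal sketch, not an established procedure. It assumes the saved benchmark output has the two-column layout the OSU point-to-point tests usually print (message size, then a value, with header lines starting with "#"). The function names are hypothetical, and "non-overlapping min/max ranges" is a crude heuristic rather than a proper statistical test.

```python
import statistics


def parse_osu_output(text):
    """Parse two-column benchmark output into {message_size: value}.

    Assumption: data lines look like "<size> <value>", and header or
    comment lines start with '#'. Anything else is skipped.
    """
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            size, value = int(parts[0]), float(parts[1])
        except ValueError:
            continue  # not a data line
        results[size] = value
    return results


def summarize(runs):
    """Given repeated runs (a list of parsed dicts), return
    {size: (median, min, max)} for sizes present in every run."""
    sizes = set(runs[0]).intersection(*runs[1:])
    summary = {}
    for size in sorted(sizes):
        values = [run[size] for run in runs]
        summary[size] = (statistics.median(values), min(values), max(values))
    return summary


def compare(runs_a, runs_b):
    """Compare two configurations measured with repeated runs.

    For each message size, report the ratio of medians (B / A) and
    whether the min/max ranges are fully separated. Overlapping ranges
    suggest the difference may just be run-to-run noise.
    Assumes the measured values are positive (no zero medians).
    """
    sum_a, sum_b = summarize(runs_a), summarize(runs_b)
    report = {}
    for size in sorted(set(sum_a) & set(sum_b)):
        med_a, lo_a, hi_a = sum_a[size]
        med_b, lo_b, hi_b = sum_b[size]
        report[size] = {
            "ratio": med_b / med_a,
            "separated": hi_a < lo_b or hi_b < lo_a,
        }
    return report
```

To use it, one would save the output of several repeated runs for each setting being compared (for example --mca btl '^openib' versus --mca btl 'tcp,self'), pass each saved output through parse_osu_output, and hand the two lists to compare. If "separated" is false at most message sizes, the settings are probably indistinguishable under the current cluster load.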