If you disable it with --mca btl ^openib the warning goes away.
And the performance of openib goes away right along with it.
Prentice
On 3/13/21 5:43 PM, Heinz, Michael William via users wrote:
I’ve begun getting this annoyingly generic warning, too. It appears to be
coming from the openib BTL. If you disable it with --mca btl ^openib the
warning goes away.
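
A minimal sketch of that exclusion, with the assumptions spelled out: openib
is a BTL component (the help file named in the warning is
help-mpi-btl-openib.txt), so the exclusion is written against the btl MCA
parameter. The rank count of 20 and the snappyHexMesh -parallel arguments are
guesses based on the original report, not something confirmed in this thread.
The snippet only prints the command so it can be checked before running it.

```shell
# Assumption: 20 ranks, inferred from "19 more processes" in the report.
NP=20
# "^openib" tells Open MPI to exclude the openib BTL component.
# The application name and -parallel flag are illustrative.
CMD="mpirun --mca btl ^openib -np $NP snappyHexMesh -parallel"
# Print the command for inspection; run it with: eval "$CMD"
echo "$CMD"
```

Excluding openib only silences the warning. Which transport Open MPI falls
back to depends on how it was built, so it is worth comparing run times before
and after.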
On Mar 13, 2021, at 3:28 PM, Bob Beattie via users <users@lists.open-mpi.org>
wrote:
Hi everyone,
To be honest, as an MPI / IB noob, I don't know if this falls under OpenMPI or
Mellanox....
Am running a small cluster of HP DL380 G6/G7 machines.
Each runs Ubuntu server 20.04 and has a Mellanox ConnectX-3 card, connected by
an IS dumb switch.
When I begin my MPI program (snappyHexMesh for OpenFOAM) I get an error
reported.
The error doesn't stop my programs or appear to cause any problems, so this
request for help is more about delving into the why.
OMPI is compiled from source using v4.0.3, which is the default version for
Ubuntu 20.04.
This compiles and works. I did this because I wanted to understand the
compilation process whilst using a known working OMPI version.
The InfiniBand part is the Mellanox MLNX_OFED installer v4.9-0.1.7.0, installed
with the options: --dkms --without-fw-update --hpc --with-nfsrdma
The actual error reported is:
Warning: There was an error initialising an OpenFabrics device.
Local host: of1
Local device: mlx4_0
Then shortly after:
[of1:1015399] 19 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[of1:1015399] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Adding this MCA parameter to the mpirun line simply gives me 20 or so copies of
the first warning.
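
For reference, a sketch of what such an mpirun line can look like. The
parameter name orte_base_help_aggregate comes from the message above; the rank
count of 20 and the snappyHexMesh -parallel arguments are assumptions for
illustration only. The snippet prints the command rather than launching it.

```shell
# Assumption: 20 ranks, inferred from "19 more processes" above.
NP=20
# Setting orte_base_help_aggregate to 0 turns off help-message aggregation,
# so every rank prints its own copy of the warning.
CMD="mpirun --mca orte_base_help_aggregate 0 -np $NP snappyHexMesh -parallel"
# Print the command for inspection; run it with: eval "$CMD"
echo "$CMD"
```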
Any ideas, anyone?
Cheers,
Bob.