Below are the results from the ibnetdiscover command. This command was
run from node smd.
#
# Topology file: generated on Fri May 19 15:59:47 2017
#
# Initiated from node 0002c903000a0a32 port 0002c903000a0a34
vendid=0x8f1
devid=0x5a5a
sysimgguid=0x8f105001094d3
switchguid=0x8f105001094d2(8f1050
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Friday, May 19, 2017 12:16 AM
To: Open MPI Users
Subject: Re: [OMPI users] Many different errors with ompi version 2.1.1
Allan,
i just noted smd has a Mellanox card, while other nodes have QLogic cards.
mtl/psm works best for QL
Allan,
remember that InfiniBand is not Ethernet. You don't NEED to set up IPoIB
interfaces.
Two diagnostics please for you to run:
ibnetdiscover
ibdiagnet
Let us please have the results of ibnetdiscover
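For example, both outputs can be captured to files like this (a minimal sketch; the file names are only suggestions):

# run from a node attached to the fabric, e.g. smd
ibnetdiscover > ibnetdiscover.out 2>&1
ibdiagnet > ibdiagnet.out 2>&1

Depending on the version, ibdiagnet also leaves more detailed report files under /tmp or /var/tmp.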
On 19 May 2017 at 09:25, John Hearns wrote:
> Gilles, Allan,
>
> if the host 'smd'
Gilles, Allan,
if the host 'smd' is acting as a cluster head node, it does not need to have
an InfiniBand card itself.
So you should be able to run jobs across the other nodes, which have QLogic
cards.
I may have something mixed up here; if so, I am sorry.
If you also want to run jobs on the smd host
Allan,
i just noted smd has a Mellanox card, while other nodes have QLogic cards.
mtl/psm works best for QLogic while btl/openib (or mtl/mxm) works best
for Mellanox,
but these are not interoperable. also, i do not think btl/openib can be
used with QLogic cards
(please someone correct me if i am wrong)
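For illustration, explicitly picking one transport or the other looks roughly like this (a sketch only; the hostfile name, the 4-rank count and ./a.out are placeholders):

# QLogic path: cm PML with the psm MTL
mpirun -np 4 --hostfile nodes --mca pml cm --mca mtl psm ./a.out
# Mellanox path: ob1 PML with the openib BTL
mpirun -np 4 --hostfile nodes --mca pml ob1 --mca btl openib,vader,self ./a.out

Since the two fabrics are not interoperable, a single job has to stay on one kind of card (or fall back to btl/tcp).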
Allan,
- on which node is mpirun invoked ?
- are you running from a batch manager ?
- is there any firewall running on your nodes ?
- how many interfaces are part of bond0 ?
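For the last two questions, a quick check on each node (assuming iptables and the Linux bonding driver are what is in use):

# any firewall rules loaded ?
iptables -L -n
# which slave interfaces belong to bond0 ?
cat /proc/net/bonding/bond0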
the error is likely occurring when wiring-up mpirun/orted
what if you
mpirun -np 2 --hostfile nodes --mca oob_tcp_if
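Assuming that last option was meant to be oob_tcp_if_include, the complete command would look something like this (eth0 is only a placeholder for whichever interface actually connects the nodes, and ./a.out for your program):

mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include eth0 ./a.out

oob_tcp_if_exclude works the other way round, if you would rather keep the wire-up off a particular interface.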