Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Allan Overstreet
Below are the results from the ibnetdiscover command. This command was run from node smd. # # Topology file: generated on Fri May 19 15:59:47 2017 # # Initiated from node 0002c903000a0a32 port 0002c903000a0a34 vendid=0x8f1 devid=0x5a5a sysimgguid=0x8f105001094d3 switchguid=0x8f105001094d2(8f1050

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Elken, Tom
users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet Sent: Friday, May 19, 2017 12:16 AM To: Open MPI Users Subject: Re: [OMPI users] Many different errors with ompi version 2.1.1 Allan, I just noted smd has a Mellanox card, while other nodes have QLogic cards. mtl/psm works best for QL

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Allan, remember that InfiniBand is not Ethernet. You don't NEED to set up IPoIB interfaces. Two diagnostics, please, for you to run: ibnetdiscover and ibdiagnet. Let us please have the results of ibnetdiscover. On 19 May 2017 at 09:25, John Hearns wrote: > Gilles, Allan, > > if the host 'smd'
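For reference, both diagnostics are stand-alone tools from the InfiniBand diagnostics packages and need no arguments for a basic sweep; a minimal sketch, assuming they are run as root on a host attached to the fabric (topology.txt is just an example output name):

    ibnetdiscover > topology.txt   # dump the subnet topology as seen from this HCA
    ibdiagnet                      # run the fabric-wide diagnostic checks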

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Gilles, Allan, if the host 'smd' is acting as a cluster head node, it does not need to have an InfiniBand card. So you should be able to run jobs across the other nodes, which have QLogic cards. I may have something mixed up here; if so I am sorry. If you also want to run jobs on the smd hos
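A minimal sketch of restricting a job to the QLogic nodes only, assuming hypothetical node names node01-node04 and a hostfile called nodes_qlogic:

    $ cat nodes_qlogic
    node01 slots=8
    node02 slots=8
    node03 slots=8
    node04 slots=8
    $ mpirun -np 32 --hostfile nodes_qlogic ./my_mpi_app

Leaving smd out of the hostfile keeps the head node free to launch jobs without its Mellanox card being used for MPI traffic.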

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread Gilles Gouaillardet
Allan, I just noted smd has a Mellanox card, while the other nodes have QLogic cards. mtl/psm works best for QLogic while btl/openib (or mtl/mxm) works best for Mellanox, but these are not interoperable. Also, I do not think btl/openib can be used with QLogic cards (please someone correct me i
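For concreteness, the transport can be pinned per run with MCA parameters; a sketch only, with the hostfile names nodes_qlogic and nodes_mlx purely illustrative:

    # QLogic nodes: PSM MTL via the cm PML
    mpirun -np 16 --hostfile nodes_qlogic --mca pml cm --mca mtl psm ./my_mpi_app

    # Mellanox node: openib BTL via the ob1 PML
    mpirun -np 4 --hostfile nodes_mlx --mca pml ob1 --mca btl openib,self,vader ./my_mpi_app

Since the two transports do not interoperate, a single job cannot span both sets of nodes this way.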

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-18 Thread Gilles Gouaillardet
Allan, - on which node is mpirun invoked? - are you running from a batch manager? - is there any firewall running on your nodes? - how many interfaces are part of bond0? The error is likely occurring when wiring up mpirun/orted. What if you mpirun -np 2 --hostfile nodes --mca oob_tcp_if
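The truncated parameter here is presumably oob_tcp_if_include, which restricts the out-of-band TCP wire-up between mpirun and the orted daemons to a named interface; a sketch, with eth0 standing in for whichever non-bonded interface exists on every node:

    mpirun -np 2 --hostfile nodes --mca oob_tcp_if_include eth0 hostname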