We have a couple of nodes with different IB adapters in them:

font1/var/log/lspci:03:00.0 InfiniBand [0c06]: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] (rev 20)
font2/var/log/lspci:03:00.0 InfiniBand [0c06]: QLogic Corp. IBA7220 InfiniBand HCA [1077:7220] (rev 02)
font3/var/log/lspci:03:00.0 InfiniBand [0c06]: QLogic Corp. IBA7220 InfiniBand HCA [1077:7220] (rev 02)

With Open MPI 1.10.3 we saw the following error from mpirun:

[font2.cora.nwra.com:13982] [[23220,1],10] selected pml cm, but peer [[23220,1],0] on font1 selected pml ob1

which crashed MPI_Init.

We worked around this by passing "--mca pml ob1" to mpirun.  With Open MPI
2.0.2 and without that option, I no longer see the error, but the MPI program
hangs shortly after startup.  Re-adding the option makes it work, so I assume
the underlying problem is still the same and that Open MPI has simply stopped
alerting me to it.
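
For reference, a minimal sketch of two ways the workaround can be applied.
The program name ./my_mpi_program and the rank count of 16 are placeholders,
not taken from our setup.  The pml_base_verbose line is an assumption about
what would help show which PML each node selects, not something we have
confirmed against this hang.

```shell
# Option 1: set the MCA parameter in the environment so that every mpirun
# started from this shell forces the ob1 PML on all ranks.
export OMPI_MCA_pml=ob1

# Option 2: pass the parameter on the command line.
# The guard only keeps this sketch harmless on a machine without Open MPI
# or without the placeholder binary.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_mpi_program ]; then
    mpirun --mca pml ob1 -np 16 ./my_mpi_program

    # Assumed diagnostic: raise PML selection verbosity to see what each
    # node picks when the pml is NOT forced.
    env -u OMPI_MCA_pml mpirun --mca pml_base_verbose 10 -np 16 ./my_mpi_program
fi
```

Either form forces the same PML on the Mellanox and QLogic nodes, which is
what avoids the cm/ob1 mismatch reported above.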

Thoughts?

-- 
Orion Poplawski
Technical Manager                          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       or...@nwra.com
Boulder, CO 80301                   http://www.nwra.com
