Hi Orion

Does the problem occur if you only use font2 and font3?  Do you have MXM
installed on the font1 node?

The 2.x series uses PMIx, and it could be that this is affecting the PML
sanity check.
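
To narrow this down, something like the following sketch may help. It
only wraps the "--mca pml ob1" workaround from your mail in a small helper
so the same launch can be repeated on different host subsets. The names
run_pml, DRY_RUN and ./my_mpi_app are placeholders invented for this
example, and pml_base_verbose is, as far as I know, the MCA parameter that
raises PML selection verbosity; treat that as an assumption to verify with
ompi_info on your build.

```shell
#!/bin/sh
# Sketch only. run_pml, DRY_RUN and ./my_mpi_app are hypothetical names;
# the host names are the ones mentioned in this thread.

# run_pml HOSTS PML
#   Launch the test program on a comma-separated host list with the PML
#   forced to the given component and PML selection verbosity raised.
#   With DRY_RUN=1 the command is printed instead of executed.
run_pml() {
    hosts=$1
    pml=$2
    cmd="mpirun --host $hosts --mca pml $pml --mca pml_base_verbose 10 ./my_mpi_app"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$cmd"
    else
        # Intentionally unquoted so the string splits into arguments.
        $cmd
    fi
}

# check_mxm
#   List any MXM-related components this Open MPI build knows about.
check_mxm() {
    ompi_info | grep -i mxm
}
```

Running "run_pml font2,font3 ob1" and then "run_pml font1,font2,font3 ob1"
would show whether font1 is the node that changes the behaviour, and
"check_mxm" on font1 would show whether MXM support is present there.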

Howard


Orion Poplawski <or...@cora.nwra.com> wrote on Mon, 27 Feb 2017 at 14:50:

> We have a couple nodes with different IB adapters in them:
>
> font1/var/log/lspci:03:00.0 InfiniBand [0c06]: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] (rev 20)
> font2/var/log/lspci:03:00.0 InfiniBand [0c06]: QLogic Corp. IBA7220 InfiniBand HCA [1077:7220] (rev 02)
> font3/var/log/lspci:03:00.0 InfiniBand [0c06]: QLogic Corp. IBA7220 InfiniBand HCA [1077:7220] (rev 02)
>
> With 1.10.3 we saw the following errors with mpirun:
>
> [font2.cora.nwra.com:13982] [[23220,1],10] selected pml cm, but peer [[23220,1],0] on font1 selected pml ob1
>
> which crashed MPI_Init.
>
> We worked around this by passing "--mca pml ob1".  I notice now with openmpi
> 2.0.2 without that option I no longer see errors, but the mpi program will
> hang shortly after startup.  Re-adding the option makes it work, so I'm
> assuming the underlying problem is still the same, but openmpi appears to have
> stopped alerting me to the issue.
>
> Thoughts?
>
> --
> Orion Poplawski
> Technical Manager                          720-772-5637
> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
> 3380 Mitchell Lane                       or...@nwra.com
> Boulder, CO 80301                   http://www.nwra.com
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>