I tried this again and it resulted in the same error:

nymph3.29935PSM can't open /dev/ipath for reading and writing (err=23)
nymph3.29937PSM can't open /dev/ipath for reading and writing (err=23)
nymph3.29936PSM can't open /dev/ipath for reading and writing (err=23)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network
link is active on the node and the hardware is functioning.

  Error: Failure in initializing endpoint
--------------------------------------------------------------------------
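Before going further into MCA settings, it may be worth confirming that the /dev/ipath character device exists on the compute node and is readable/writable by the user running the job, since PSM opens it directly and a missing node or restrictive permissions would produce exactly this kind of open failure. A minimal check, assuming the in-kernel ib_qib driver that normally backs a QLE7340 (adjust if a vendor driver stack is installed instead):

    ls -l /dev/ipath*                 # device node present, and its mode lets the job's user open it
    lsmod | grep -E 'ib_qib|ipath'    # driver module actually loaded on this node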
The link is up according to ibstat:

CA 'qib0'
        CA type: InfiniPath_QLE7340
        Number of ports: 1
        Firmware version:
        Hardware version: 2
        Node GUID: 0x001175000076ec76
        System image GUID: 0x001175000076ec76
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 6
                LMC: 0
                SM lid: 7
                Capability mask: 0x0761086a
                Port GUID: 0x001175000076ec76
                Link layer: InfiniBand

Any other ideas?

Dean

> On 27 Jun 2020, at 16:58, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
> On Jun 26, 2020, at 7:30 AM, Peter Kjellström via users
> <users@lists.open-mpi.org> wrote:
>>
>>> The cluster hardware is QLogic infiniband with Intel CPUs. My
>>> understanding is that we should be using the old PSM for networking.
>>>
>>> Any thoughts what might be going wrong with the build?
>>
>> Yes only PSM will perform well on that hardware. Make sure that PSM
>> works on the system. Then make sure you got a mca_mtl_psm built.
>
>
> I think Peter is right: you want to use
>
>     mpirun --mca pml cm --mca mtl psm ...
>
> I *think* Intel InfiniPath is PSM and Intel OmniPath is PSM2, so "psm" is
> what you want (not "psm2").
>
> Don't try to use pml/ob1 + btl/openib, and don't try to use UCX. PSM is
> Intel's native support for its Infinipath network.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
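As a follow-up to Peter's point about making sure mca_mtl_psm was actually built: ompi_info (installed alongside mpirun) lists the compiled-in components, so a quick check against this particular install could look something like the lines below. The exact version strings will differ, and ./a.out is only a placeholder for the real application; if no "MCA mtl: psm" line shows up, the build found no PSM support at configure time and that would need fixing first.

    ompi_info | grep -i mtl                            # look for a "MCA mtl: psm" line
    mpirun --mca pml cm --mca mtl psm -np 2 ./a.out    # Jeff's suggested selection, with a placeholder binary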