Because of the issues we are having with Open MPI and the openib BTL (questions previously asked here), I've been looking into what other transports are available. I was particularly interested in OFI/libfabric support, but I cannot find any information on it more recent than a 2015 reference to the usNIC BTL from Jeff Squyres (Cisco). Unfortunately, the open-mpi.org FAQs covering OpenFabrics support don't mention anything beyond Open MPI 1.8. Given that 3.1 is the current stable series, that seems odd.

That being the case, I thought I'd ask here. After installing the libfabric-devel RPM and building 3.1.0 with --with-libfabric=/usr, I end up with an "ofi" MTL but nothing else. I can run with OMPI_MCA_mtl=ofi and OMPI_MCA_btl="self,vader,openib", but the job eventually crashes in libopen-pal.so, with mpi_waitall() higher up the stack.
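For reference, the build and run steps were roughly the following (the application name and rank count are placeholders for our real job; the install prefix matches the backtrace below):

  $ ./configure --with-libfabric=/usr --prefix=/apps/mpi/intel/2018.1.163/openmpi/3.1.0
  $ make && make install
  $ export OMPI_MCA_mtl=ofi
  $ export OMPI_MCA_btl="self,vader,openib"
  $ mpirun -np 4 ./my_app    # my_app stands in for our actual application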
The crash looks like this:

  GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0. Backtrace:
  /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(+0x9391d)[0x2b4d4b68a91d]
  /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(opal_progress+0x24)[0x2b4d4b632754]
  /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(ompi_request_default_wait_all+0x11f)[0x2b4d47be2a6f]
  /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(PMPI_Waitall+0xbd)[0x2b4d47c2ce4d]

Questions:

1. Am I using the OFI MTL as intended?
2. Should there be an "ofi" BTL?
3. Does anyone use this?

Thanks,
Charlie Taylor
UF Research Computing

PS - If you could use some help updating the FAQs, I'd be willing to put in some time. I'd probably learn a lot.
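PPS - For completeness, the "ofi MTL but nothing else" observation comes from inspecting the built components, e.g.:

  $ ompi_info | grep -i ofi

which lists an ofi entry under "MCA mtl" but no ofi BTL.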