Because of the issues we are having with Open MPI and the openib BTL
(asked about in previous questions here), I've been looking into what other
transports are available. I was particularly interested in OFI/libfabric
support, but I cannot find any information on it more recent than a 2015
reference to the usNIC BTL (Jeff Squyres, Cisco). Unfortunately, the FAQs on
the open-mpi.org website covering OpenFabrics support don't mention anything
beyond Open MPI 1.8. Given that 3.1 is the current stable version, that
seems odd.

That being the case, I thought I'd ask here. After installing the
libfabric-devel RPM and building Open MPI 3.1.0 with --with-libfabric=/usr,
I end up with an "ofi" MTL but nothing else. I can run with OMPI_MCA_mtl=ofi
and OMPI_MCA_btl="self,vader,openib", but the job eventually crashes in
libopen-pal.so, with MPI_Waitall() higher up the stack.
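For concreteness, the build and launch look roughly like this (the install
prefix is taken from the backtrace below; the mpirun line and application
name are placeholders, not our exact command):

  ./configure --prefix=/apps/mpi/intel/2018.1.163/openmpi/3.1.0 \
              --with-libfabric=/usr
  make -j && make install

  export OMPI_MCA_mtl=ofi
  export OMPI_MCA_btl="self,vader,openib"
  mpirun -np 64 ./our_app   # placeholder; any MPI_Waitall-heavy code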

GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0.  
Backtrace:
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(+0x9391d)[0x2b4d4b68a91d]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(opal_progress+0x24)[0x2b4d4b632754]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(ompi_request_default_wait_all+0x11f)[0x2b4d47be2a6f]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(PMPI_Waitall+0xbd)[0x2b4d47c2ce4d]
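The failing call is an ordinary nonblocking exchange completed with
MPI_Waitall(). A minimal sketch of that pattern (a simplified stand-in, not
our actual application code) is:

  /* waitall_min.c - simplified stand-in for the failing pattern.
   * Build with: mpicc waitall_min.c -o waitall_min */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      int peer = rank ^ 1;              /* pair ranks: 0<->1, 2<->3, ... */
      int sendval = rank, recvval = -1;
      MPI_Request reqs[2];

      if (peer < size) {
          MPI_Irecv(&recvval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
          MPI_Isend(&sendval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);
          /* In our runs the segfault is below this call, in opal_progress(). */
          MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
          printf("rank %d received %d from rank %d\n", rank, recvval, peer);
      }

      MPI_Finalize();
      return 0;
  }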

Questions:

- Am I using the OFI MTL as intended?
- Should there be an "ofi" BTL?
- Does anyone use this?

Thanks,

Charlie Taylor
UF Research Computing

PS - If you could use some help updating the FAQs, I'd be willing to put in
some time. I'd probably learn a lot in the process.