FYI… GIZMO: prov/verbs/src/ep_rdm/verbs_tagged_ep_rdm.c:443: fi_ibv_rdm_tagged_release_remote_sbuff: Assertion `0' failed.
GIZMO:10405 terminated with signal 6 at PC=2add5835c1f7 SP=7fff8071b008. Backtrace:
/usr/lib64/libc.so.6(gsignal+0x37)[0x2add5835c1f7]
/usr/lib64/libc.so.6(abort+0x148)[0x2add5835d8e8]
/usr/lib64/libc.so.6(+0x2e266)[0x2add58355266]
/usr/lib64/libc.so.6(+0x2e312)[0x2add58355312]
/lib64/libfabric.so.1(+0x4df43)[0x2add5b87df43]
/lib64/libfabric.so.1(+0x43af2)[0x2add5b873af2]
/lib64/libfabric.so.1(+0x43ea9)[0x2add5b873ea9]

> On Jun 14, 2018, at 7:48 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>
> Hello Charles
>
> You are heading in the right direction.
>
> First you might want to run the libfabric fi_info command to see what
> capabilities you picked up from the libfabric RPMs.
>
> Next, you may well not actually be using the OFI MTL.
>
> Could you run your app with
>
> export OMPI_MCA_mtl_base_verbose=100
>
> and post the output?
>
> It would also help if you described the system you are using: OS,
> interconnect, CPU type, etc.
>
> Howard
>
> Charles A Taylor <chas...@ufl.edu> wrote on Thu., 14 June 2018 at 06:36:
> Because of the issues we are having with OpenMPI and the openib BTL
> (questions previously asked), I’ve been looking into what other transports
> are available. I was particularly interested in OFI/libfabric support but
> cannot find any information on it more recent than a reference to the usNIC
> BTL from 2015 (Jeff Squyres, Cisco). Unfortunately, the open-mpi.org website
> FAQs covering OpenFabrics support don’t mention anything beyond OpenMPI 1.8.
> Given that 3.1 is the current stable version, that seems odd.
>
> That being the case, I thought I’d ask here. After laying down the
> libfabric-devel RPM and building (3.1.0) with --with-libfabric=/usr, I end up
> with an “ofi” MTL but nothing else. I can run with OMPI_MCA_mtl=ofi and
> OMPI_MCA_btl="self,vader,openib" but it eventually crashes in libopen-pal.so
> (mpi_waitall() higher up the stack).
>
> GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0.
> Backtrace:
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(+0x9391d)[0x2b4d4b68a91d]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(opal_progress+0x24)[0x2b4d4b632754]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(ompi_request_default_wait_all+0x11f)[0x2b4d47be2a6f]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(PMPI_Waitall+0xbd)[0x2b4d47c2ce4d]
>
> Questions: Am I using the OFI MTL as intended? Should there be an “ofi”
> BTL? Does anyone use this?
>
> Thanks,
>
> Charlie Taylor
> UF Research Computing
>
> PS - If you could use some help updating the FAQs, I’d be willing to put in
> some time. I’d probably learn a lot.
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.open-2Dmpi.org_mailman_listinfo_users&d=DwIFaQ&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=8sBODgXZKw_dNqkFqkTqbGD3_7nNlm_pat-D6AqiaC8&m=pDOR2yTEZWtS3wHCqrASHkfd22e7kPU3D1XnttWrL7Y&s=UYlpo1EvM2cQqSZ5N-DoOLoE-G9_kWlffvJ2WfuESP4&e=
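Howard's first two suggestions can be scripted: check which providers `fi_info` reports, then rerun with `OMPI_MCA_mtl_base_verbose=100`. Below is a minimal Python sketch that rests on a few assumptions. It assumes `fi_info` is on the `PATH` and that its default output starts each entry with a `provider: <name>` line. The helper names `parse_fi_info_providers`, `ofi_debug_env`, and `list_providers` are hypothetical. The MCA values are the ones quoted in this thread, passed through the environment with the `OMPI_MCA_` prefix.

```python
import os
import shutil
import subprocess


def parse_fi_info_providers(text):
    """Return provider names, in first-seen order, from fi_info output.

    Assumption: each fi_info entry begins with a line 'provider: <name>'.
    """
    providers = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "provider":
            name = value.strip()
            if name and name not in providers:
                providers.append(name)
    return providers


def ofi_debug_env(base=None):
    """Copy an environment and add the MCA settings discussed in the thread."""
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_mtl"] = "ofi"
    env["OMPI_MCA_btl"] = "self,vader,openib"
    # Howard's suggestion: make MTL component selection verbose.
    env["OMPI_MCA_mtl_base_verbose"] = "100"
    return env


def list_providers():
    """Run fi_info if it is on PATH and return the providers it reports."""
    if shutil.which("fi_info") is None:
        return []
    result = subprocess.run(["fi_info"], capture_output=True, text=True, check=False)
    return parse_fi_info_providers(result.stdout)


if __name__ == "__main__":
    print("libfabric providers:", list_providers() or "none found")
```

Pass the returned environment to whatever launches the job, for example `subprocess.run(["mpirun", "-np", "2", "./app"], env=ofi_debug_env())`. Here `./app` is a placeholder for the real binary. The parser is kept separate from the `fi_info` call so it can be checked against saved output copied from a compute node.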