Hi again.

Just a follow-up: it is not only our own codes that crash during MPI initialization; the mpi4py module in SciPy-bundle/2022.05-foss-2022a also segfaults during initialization. We only see this on the OmniPath nodes.
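One experiment that might narrow this down: since the backtrace points into mca_btl_ofi.so, the ofi BTL could be excluded before mpi4py initializes MPI. Open MPI reads MCA parameters from environment variables named OMPI_MCA_<parameter>, and a leading "^" excludes the listed components. The sketch below is untested on these nodes; the helper names are made up for illustration, and whether excluding this component avoids the crash (or which transport then gets selected) is only an assumption.

```python
import os
import subprocess
import sys


def env_without_ofi_btl(base_env=None):
    """Return a copy of the environment with the ofi BTL excluded (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    # Open MPI picks up MCA parameters from OMPI_MCA_<name> variables;
    # "^ofi" asks the btl framework to skip the ofi component.
    env["OMPI_MCA_btl"] = "^ofi"
    return env


def mpi4py_initializes(env):
    """Import mpi4py.MPI in a child process, so a segfault cannot kill this script."""
    result = subprocess.run(
        [sys.executable, "-c", "from mpi4py import MPI"],
        env=env,
    )
    # A process killed by a signal (e.g. SIGSEGV) has a nonzero return code.
    return result.returncode == 0


if __name__ == "__main__":
    print("default MCA settings:", mpi4py_initializes(dict(os.environ)))
    print("with btl=^ofi:       ", mpi4py_initializes(env_without_ofi_btl()))
```

Run on one of the OmniPath nodes with the foss/2022a modules loaded; if only the second line reports True, that would point at the ofi BTL rather than at mpi4py or our own codes.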
Best regards

Jakob

[a022:229451] *** Process received signal ***
[a022:229451] Signal: Segmentation fault (11)
[a022:229451] Signal code:  (-6)
[a022:229451] Failing at address: 0xb230003804b
[a022:229451] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b541020c630]
[a022:229451] [ 1] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_btl_ofi.so(mca_btl_ofi_context_finalize+0x48)[0x2b5417f5bf68]
[a022:229451] [ 2] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_btl_ofi.so(mca_btl_ofi_context_alloc_scalable+0x228)[0x2b5417f5c338]
[a022:229451] [ 3] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_btl_ofi.so(+0x3ee0)[0x2b5417f58ee0]
[a022:229451] [ 4] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/libopen-pal.so.40(mca_btl_base_select+0x102)[0x2b5417613a32]
[a022:229451] [ 5] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x13)[0x2b5417f4ae33]
[a022:229451] [ 6] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/libmpi.so.40(mca_bml_base_init+0x82)[0x2b5417513c42]
[a022:229451] [ 7] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/libmpi.so.40(ompi_mpi_init+0x68f)[0x2b54175540cf]
[a022:229451] [ 8] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/libmpi.so.40(PMPI_Init_thread+0x99)[0x2b54174f80e9]
[a022:229451] [ 9] /home/modules/software/SciPy-bundle/2022.05-foss-2022a/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0xe783d)[0x2b541741783d]
[a022:229451] [10] /home/modules/software/Python/3.10.4-GCCcore-11.3.0/lib/libpython3.10.so.1.0(PyModule_ExecDef+0x6f)[0x2b540f8b849f]
[a022:229451] [11] /home/modules/software/Python/3.10.4-GCCcore-11.3.0/lib/libpython3.10.so.1.0(+0x1adff9)[0x2b540f8b8ff9]
...

-- 
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark

> On 12 Dec 2022, at 14.59, Jakob Schiøtz <[email protected]> wrote:
>
> Hi EasyBuilders,
>
> We are having problems with using the foss/2022a toolchain on some of our
> nodes with OmniPath. The crash happens in
> /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_btl_ofi.so(mca_btl_ofi_context_finalize+0x48)[0x2acdbfee9f68]
> so I suspect it may have something to do with wrong MCA parameters for
> OpenMPI.
>
> We do not manually set any such parameters, and I have no idea what
> parameters to play with. We are having a default installation of foss/2022a
> with OpenMPI/4.1.4-GCC-11.3.0
>
> Do any of you experts have an idea about how to debug this, or what
> environment variables to set to get the right MCA (assuming that the MCA is
> indeed the culprit)?
>
> It is two different codes both giving a segmentation fault during MPI_Init()
> or MPI_Init_thread(), and both work with foss/2020b but fail with foss/2022a.
>
> With my best regards
>
> Jakob
>
> --
> Jakob Schiøtz, professor, Ph.D.
> Department of Physics
> Technical University of Denmark
> DK-2800 Kongens Lyngby, Denmark

