Hi again.

Just a follow-up: it is not just our own codes that crash during MPI 
initialization; the mpi4py module in SciPy-bundle/2022.05-foss-2022a also 
segfaults during initialization.  We only see it on the OmniPath nodes.

Best regards

Jakob

[a022:229451] *** Process received signal ***
[a022:229451] Signal: Segmentation fault (11)
[a022:229451] Signal code:  (-6)
[a022:229451] Failing at address: 0xb230003804b
[a022:229451] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b541020c630]
[a022:229451] [ 1] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_btl_ofi.so(mca_btl_ofi_context_finalize+0x48)[0x2b5417f5bf68]
[a022:229451] [ 2] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_btl_ofi.so(mca_btl_ofi_context_alloc_scalable+0x228)[0x2b5417f5c338]
[a022:229451] [ 3] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_btl_ofi.so(+0x3ee0)[0x2b5417f58ee0]
[a022:229451] [ 4] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/libopen-pal.so.40(mca_btl_base_select+0x102)[0x2b5417613a32]
[a022:229451] [ 5] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x13)[0x2b5417f4ae33]
[a022:229451] [ 6] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/libmpi.so.40(mca_bml_base_init+0x82)[0x2b5417513c42]
[a022:229451] [ 7] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/libmpi.so.40(ompi_mpi_init+0x68f)[0x2b54175540cf]
[a022:229451] [ 8] /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/libmpi.so.40(PMPI_Init_thread+0x99)[0x2b54174f80e9]
[a022:229451] [ 9] /home/modules/software/SciPy-bundle/2022.05-foss-2022a/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0xe783d)[0x2b541741783d]
[a022:229451] [10] /home/modules/software/Python/3.10.4-GCCcore-11.3.0/lib/libpython3.10.so.1.0(PyModule_ExecDef+0x6f)[0x2b540f8b849f]
[a022:229451] [11] /home/modules/software/Python/3.10.4-GCCcore-11.3.0/lib/libpython3.10.so.1.0(+0x1adff9)[0x2b540f8b8ff9]
 ...
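
Since every failing frame sits in mca_btl_ofi.so, one diagnostic experiment 
would be to tell Open MPI to skip the ofi BTL component and see whether 
initialization then succeeds.  Open MPI reads MCA parameters from environment 
variables named OMPI_MCA_<parameter>, and a leading "^" in a component list 
means "exclude these".  The sketch below is only an assumption about how to 
try this from Python; the helper name exclude_mca_component is hypothetical, 
and excluding ofi is a test, not a confirmed fix for this crash.

```python
import os


def exclude_mca_component(env, framework, component):
    """Return a copy of env in which `component` is excluded from an
    Open MPI MCA framework (e.g. framework="btl", component="ofi").

    Hypothetical helper for illustration; it only edits a dict of
    environment variables and does not touch MPI itself.
    """
    new_env = dict(env)
    key = f"OMPI_MCA_{framework}"
    current = new_env.get(key, "")
    if current and not current.startswith("^"):
        # An inclusive list is already set; Open MPI does not allow mixing
        # inclusive and exclusive lists, so leave that decision to the user.
        raise ValueError(f"{key} already holds an inclusive list: {current!r}")
    # Parse any existing exclusion list and append the new component once.
    excluded = [c for c in current.lstrip("^").split(",") if c]
    if component not in excluded:
        excluded.append(component)
    new_env[key] = "^" + ",".join(excluded)
    return new_env


# Intended use (commented out so the sketch runs without MPI installed).
# The variable must be set before mpi4py.MPI is imported, because the
# backtrace above shows that the import itself calls MPI_Init_thread:
#
#   os.environ.update(exclude_mca_component(os.environ, "btl", "ofi"))
#   from mpi4py import MPI
```

The same setting can be tried for compiled codes by exporting 
OMPI_MCA_btl=^ofi in the job script before launching.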



--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark




> On 12 Dec 2022, at 14.59, Jakob Schiøtz <[email protected]> wrote:
> 
> Hi EasyBuilders,
> 
> We are having problems using the foss/2022a toolchain on some of our 
> nodes with OmniPath.  The crash happens in 
> /home/modules/software/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_btl_ofi.so(mca_btl_ofi_context_finalize+0x48)[0x2acdbfee9f68]
> so I suspect it may have something to do with wrong MCA parameters for 
> OpenMPI.  
> 
> We do not manually set any such parameters, and I have no idea what 
> parameters to play with.  We have a default installation of foss/2022a 
> with OpenMPI/4.1.4-GCC-11.3.0.
> 
> Do any of you experts have an idea about how to debug this, or what 
> environment variables to set to get the right MCA (assuming that the MCA is 
> indeed the culprit)?
> 
> Two different codes both give a segmentation fault during MPI_Init() 
> or MPI_Init_thread(); both work with foss/2020b but fail with foss/2022a.
> 
> With my best regards
> 
> Jakob
> 
> --
> Jakob Schiøtz, professor, Ph.D.
> Department of Physics
> Technical University of Denmark
> DK-2800 Kongens Lyngby, Denmark
> 
> 
> 
> 
