Hi PETSc and OpenMPI teams,

I'm running into a deadlock in PETSc 3.4.5 with OpenMPI 1.8.3:
1. PetscCommDestroy calls MPI_Attr_delete
2. MPI_Attr_delete acquires a lock
3. MPI_Attr_delete calls Petsc_DelComm_Outer (through the registered deletion callback)
4. Petsc_DelComm_Outer calls MPI_Attr_get
5. MPI_Attr_get tries to acquire the same lock taken in step 2 and blocks, so the process deadlocks.

Looking at the OpenMPI source code, it appears that you cannot call any MPI_Attr_* function from inside a registered deletion callback (a minimal standalone reproducer sketch is appended after the backtrace below). The OpenMPI source notes that all of these functions acquire a single global lock, which is where the problem comes from. Here are the comments and the lock definition, from ompi/attribute/attribute.c of OpenMPI 1.8.3:

404 /*
405  * We used to have multiple locks for semi-fine-grained locking. But
406  * the code got complex, and we had to spend time looking for subtle
407  * bugs. Craziness -- MPI attributes are *not* high performance, so
408  * just use a One Big Lock approach: there is *no* concurrent access.
409  * If you have the lock, you can do whatever you want and no data will
410  * change/disapear from underneath you.
411  */
412 static opal_mutex_t attribute_lock;

To get it to work, I had to change the definition of this lock to use a recursive mutex:

412 static opal_mutex_t attribute_lock = { .m_lock_pthread = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP };

but this initializer is non-portable (a sketch of a portable alternative is also appended below).

Is this behaviour expected to persist in newer versions of OpenMPI? If so, a new approach might be needed in PETSc. Otherwise, perhaps a per-attribute lock is needed in OpenMPI, but I'm not sure whether the MPI_Attr_get in the callback is operating on the same attribute as the one being deleted.

Thanks,
Ben

Backtrace:

#0 0x00007fd7d5de4264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd7d5ddf508 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007fd7d5ddf3d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd7d27d91bc in ompi_attr_get_c () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#4 0x00007fd7d2803f03 in PMPI_Attr_get () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#5 0x00007fd7d7716006 in Petsc_DelComm_Outer (comm=0x7fd7d2a83b30, keyval=128, attr_val=0x7fff00a20f00, extra_state=0xffffffffffffffff) at pinit.c:406
#6 0x00007fd7d27d8cad in ompi_attr_delete_impl () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#7 0x00007fd7d27d8f2f in ompi_attr_delete () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#8 0x00007fd7d2803dfc in PMPI_Attr_delete () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#9 0x00007fd7d78bf5c5 in PetscCommDestroy (comm=0x7fd7d2a83b30) at tagm.c:256
#10 0x00007fd7d7506f58 in PetscHeaderDestroy_Private (h=0x7fd7d2a83b30) at inherit.c:114
#11 0x00007fd7d75038a0 in ISDestroy (is=0x7fd7d2a83b30) at index.c:225
#12 0x00007fd7d75029b7 in PCReset_ILU (pc=0x7fd7d2a83b30) at ilu.c:42
#13 0x00007fd7d77a9baa in PCReset (pc=0x7fd7d2a83b30) at precon.c:81
#14 0x00007fd7d77a99ae in PCDestroy (pc=0x7fd7d2a83b30) at precon.c:117
#15 0x00007fd7d7557c1a in KSPDestroy (ksp=0x7fd7d2a83b30) at itfunc.c:788
#16 0x00007fd7d91cdcca in linearSystemPETSc<double>::~linearSystemPETSc (this=0x7fd7d2a83b30)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Solver/linearSystemPETSc.hpp:73
#17 0x00007fd7d8ddb63b in GFaceCompound::parametrize (this=0x7fd7d2a83b30, step=128, tom=10620672)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Geo/GFaceCompound.cpp:1672
#18 0x00007fd7d8dda0fe in GFaceCompound::parametrize (this=0x7fd7d2a83b30)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Geo/GFaceCompound.cpp:916
#19 0x00007fd7d8f98b0e in checkMeshCompound (gf=0x7fd7d2a83b30, edges=...)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/meshGFace.cpp:2588
#20 0x00007fd7d8f95c7e in meshGenerator (gf=0xd13020, RECUR_ITER=0, repairSelfIntersecting1dMesh=true, onlyInitialMesh=false, debug=false, replacement_edges=0x0)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/meshGFace.cpp:1075
#21 0x00007fd7d8f9a41e in meshGFace::operator() (this=0x7fd7d2a83b30, gf=0x80, print=false)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/meshGFace.cpp:2562
#22 0x00007fd7d8f8c327 in Mesh2D (m=0x7fd7d2a83b30)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/Generator.cpp:407
#23 0x00007fd7d8f8ad0b in GenerateMesh (m=0x7fd7d2a83b30, ask=128)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/Generator.cpp:641
#24 0x00007fd7d8e43126 in GModel::mesh (this=0x7fd7d2a83b30, dimension=128)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Geo/GModel.cpp:535
#25 0x00007fd7d8c1acd2 in GmshBatch ()
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Common/Gmsh.cpp:240
#26 0x000000000040187a in main (argc=-760726736, argv=0x80)
    at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Common/Main.cpp:27
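
P.S. In case it helps, here is a minimal standalone sketch of the pattern that deadlocks, using the same MPI-1 attribute calls that appear in the backtrace. It is not PETSc code; del_fn and payload are made-up names, and whether it actually hangs will depend on how OpenMPI was built (with the 1.8.3 build above, pthread_mutex_lock blocks inside the callback):

/* deadlock_sketch.c -- hypothetical reproducer, not PETSc code.
 * Mirrors the call chain above: MPI_Attr_delete takes OpenMPI's global
 * attribute lock, invokes the delete callback, and the callback's
 * MPI_Attr_get then blocks on the same (non-recursive) lock. */
#include <mpi.h>
#include <stdio.h>

static int keyval = MPI_KEYVAL_INVALID;

/* Delete callback, analogous to Petsc_DelComm_Outer. */
static int del_fn(MPI_Comm comm, int kv, void *attr_val, void *extra_state)
{
  void *val;
  int   flag;
  /* With the OpenMPI 1.8.3 build above this call never returns: the
   * global attribute lock is already held by the MPI_Attr_delete that
   * invoked this callback. */
  MPI_Attr_get(comm, kv, &val, &flag);
  return MPI_SUCCESS;
}

int main(int argc, char **argv)
{
  static int payload = 42;
  MPI_Init(&argc, &argv);
  MPI_Keyval_create(MPI_NULL_COPY_FN, del_fn, &keyval, NULL);
  MPI_Attr_put(MPI_COMM_WORLD, keyval, &payload);
  MPI_Attr_delete(MPI_COMM_WORLD, keyval);  /* deadlocks inside del_fn */
  printf("not reached\n");
  MPI_Finalize();
  return 0;
}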
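P.P.S. On the portability point: instead of the glibc-specific PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP, the recursive type could be set at runtime with pthread_mutexattr_settype. This is only a sketch of the pthreads calls (attribute_lock_init is a made-up name, and I haven't checked where OpenMPI actually initialises the opal_mutex_t), not a patch against attribute.c:

#include <pthread.h>

/* Sketch: portably create a recursive mutex at startup, e.g. from
 * wherever the attribute lock is currently initialised. */
static pthread_mutex_t attribute_lock_pthread;

static void attribute_lock_init(void)
{
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
  pthread_mutex_init(&attribute_lock_pthread, &attr);
  pthread_mutexattr_destroy(&attr);
}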