Hi PETSc and OpenMPI teams,

I'm running into a deadlock in PETSc 3.4.5 with OpenMPI 1.8.3 (a minimal
sketch of the pattern follows the numbered steps):

 1. PetscCommDestroy calls MPI_Attr_delete
 2. MPI_Attr_delete acquires a lock
 3. MPI_Attr_delete calls Petsc_DelComm_Outer (through a callback)
 4. Petsc_DelComm_Outer calls MPI_Attr_get
 5. MPI_Attr_get tries to acquire the same lock taken in step 2, which is
    still held, so the call deadlocks.

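If it helps, here is a minimal standalone sketch of what I believe is the
same re-entrant pattern (not PETSc code - the keyvals and stored values are
made up purely for illustration):

    /* Hypothetical reproducer sketch, not PETSc code: a keyval delete
     * callback that itself calls MPI_Attr_get, mirroring steps 1-5 above. */
    #include <mpi.h>

    static int other_keyval = MPI_KEYVAL_INVALID;

    static int delete_fn(MPI_Comm comm, int keyval, void *attr_val,
                         void *extra_state)
    {
        void *val;
        int   flag;
        /* Re-entrant attribute call from inside the delete callback: with
         * the One Big Lock in attribute.c this should block forever,
         * because the lock is already held by the enclosing
         * MPI_Attr_delete. */
        MPI_Attr_get(comm, other_keyval, &val, &flag);
        return MPI_SUCCESS;
    }

    int main(int argc, char **argv)
    {
        int keyval;
        MPI_Init(&argc, &argv);
        MPI_Keyval_create(MPI_NULL_COPY_FN, delete_fn, &keyval, NULL);
        MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN,
                          &other_keyval, NULL);
        MPI_Attr_put(MPI_COMM_SELF, keyval, (void *)1);
        MPI_Attr_put(MPI_COMM_SELF, other_keyval, (void *)2);
        MPI_Attr_delete(MPI_COMM_SELF, keyval);  /* invokes delete_fn */
        MPI_Finalize();
        return 0;
    }
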
Looking at the OpenMPI source, it appears that you can't call an
MPI_Attr_* function from inside a registered deletion callback: the
source comments note that all of the attribute functions take a single
global lock, which is where the problem comes from. Here are the
relevant comments and the lock definition, from ompi/attribute/attribute.c
in OpenMPI 1.8.3:

    404 /*
    405  * We used to have multiple locks for semi-fine-grained locking.  But
    406  * the code got complex, and we had to spend time looking for subtle
    407  * bugs.  Craziness -- MPI attributes are *not* high performance, so
    408  * just use a One Big Lock approach: there is *no* concurrent access.
    409  * If you have the lock, you can do whatever you want and no data will
    410  * change/disapear from underneath you.
    411  */
    412 static opal_mutex_t attribute_lock;

To get it to work, I had to modify the definition of this lock to use a
recursive mutex:

    412 static opal_mutex_t attribute_lock = { .m_lock_pthread = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP };

but this is non-portable (PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP is a
glibc-specific extension).
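
For reference, the portable idiom for a recursive pthread mutex is to set
its type at runtime rather than using a static initializer. The sketch
below is plain-pthread illustration only and glosses over how opal_mutex_t
is actually initialized inside OpenMPI:

    /* Plain-pthread illustration only, not a patch against opal_mutex_t:
     * the standards-conforming way to get a recursive mutex is to set
     * PTHREAD_MUTEX_RECURSIVE at runtime instead of relying on the
     * glibc-only PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP initializer. */
    #include <pthread.h>

    static pthread_mutex_t attribute_lock;

    static void attribute_lock_init(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init(&attribute_lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }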

Is this behaviour expected in newer versions of OpenMPI? If so, a new
approach might be needed in PETSc. Otherwise, perhaps a per-attribute
lock is needed in OpenMPI - but I'm not sure whether the get in the
callback is on the same attribute as the one being deleted.
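
For what it's worth, the MPI delete callback already receives the value of
the attribute being deleted, so a callback that only needs that value never
has to make a re-entrant MPI_Attr_get. A hedged sketch of that shape is
below (my_delete_fn and its payload are hypothetical); whether it maps onto
Petsc_DelComm_Outer depends on exactly which keyval the get there is for,
which is the open question above.

    /* Hypothetical sketch: a delete callback that works purely from the
     * value MPI hands it, avoiding any MPI_Attr_get inside the callback. */
    #include <mpi.h>
    #include <stdlib.h>

    static int my_delete_fn(MPI_Comm comm, int keyval, void *attr_val,
                            void *extra_state)
    {
        /* attr_val is the value stored with MPI_Attr_put for this keyval,
         * so there is no need to re-enter the attribute code to fetch it. */
        free(attr_val);
        return MPI_SUCCESS;
    }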

Thanks,
Ben

#0  0x00007fd7d5de4264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fd7d5ddf508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007fd7d5ddf3d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fd7d27d91bc in ompi_attr_get_c () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#4  0x00007fd7d2803f03 in PMPI_Attr_get () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#5  0x00007fd7d7716006 in Petsc_DelComm_Outer (comm=0x7fd7d2a83b30, keyval=128, attr_val=0x7fff00a20f00, extra_state=0xffffffffffffffff) at pinit.c:406
#6  0x00007fd7d27d8cad in ompi_attr_delete_impl () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#7  0x00007fd7d27d8f2f in ompi_attr_delete () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#8  0x00007fd7d2803dfc in PMPI_Attr_delete () from /apps/openmpi/1.8.3/lib/libmpi.so.1
#9  0x00007fd7d78bf5c5 in PetscCommDestroy (comm=0x7fd7d2a83b30) at tagm.c:256
#10 0x00007fd7d7506f58 in PetscHeaderDestroy_Private (h=0x7fd7d2a83b30) at inherit.c:114
#11 0x00007fd7d75038a0 in ISDestroy (is=0x7fd7d2a83b30) at index.c:225
#12 0x00007fd7d75029b7 in PCReset_ILU (pc=0x7fd7d2a83b30) at ilu.c:42
#13 0x00007fd7d77a9baa in PCReset (pc=0x7fd7d2a83b30) at precon.c:81
#14 0x00007fd7d77a99ae in PCDestroy (pc=0x7fd7d2a83b30) at precon.c:117
#15 0x00007fd7d7557c1a in KSPDestroy (ksp=0x7fd7d2a83b30) at itfunc.c:788
#16 0x00007fd7d91cdcca in linearSystemPETSc<double>::~linearSystemPETSc (this=0x7fd7d2a83b30) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Solver/linearSystemPETSc.hpp:73
#17 0x00007fd7d8ddb63b in GFaceCompound::parametrize (this=0x7fd7d2a83b30, step=128, tom=10620672) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Geo/GFaceCompound.cpp:1672
#18 0x00007fd7d8dda0fe in GFaceCompound::parametrize (this=0x7fd7d2a83b30) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Geo/GFaceCompound.cpp:916
#19 0x00007fd7d8f98b0e in checkMeshCompound (gf=0x7fd7d2a83b30, edges=...) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/meshGFace.cpp:2588
#20 0x00007fd7d8f95c7e in meshGenerator (gf=0xd13020, RECUR_ITER=0, repairSelfIntersecting1dMesh=true, onlyInitialMesh=false, debug=false, replacement_edges=0x0) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/meshGFace.cpp:1075
#21 0x00007fd7d8f9a41e in meshGFace::operator() (this=0x7fd7d2a83b30, gf=0x80, print=false) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/meshGFace.cpp:2562
#22 0x00007fd7d8f8c327 in Mesh2D (m=0x7fd7d2a83b30) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/Generator.cpp:407
#23 0x00007fd7d8f8ad0b in GenerateMesh (m=0x7fd7d2a83b30, ask=128) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Mesh/Generator.cpp:641
#24 0x00007fd7d8e43126 in GModel::mesh (this=0x7fd7d2a83b30, dimension=128) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Geo/GModel.cpp:535
#25 0x00007fd7d8c1acd2 in GmshBatch () at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Common/Gmsh.cpp:240
#26 0x000000000040187a in main (argc=-760726736, argv=0x80) at /short/z00/bjm900/build/fluidity/intel15-ompi183/gmsh-2.8.5-source/Common/Main.cpp:27

