On Sat, Apr 02, 2011 at 04:59:34PM -0400, fa...@email.com wrote:
> 
> opal_mutex_lock(): Resource deadlock avoided
> #0  0x0012e416 in __kernel_vsyscall ()
> #1  0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #2  0x01038e42 in abort () at abort.c:92
> #3  0x00d9da68 in ompi_attr_free_keyval (type=COMM_ATTR, key=0xbffda0e4, 
> predefined=0 '\000') at attribute/attribute.c:656
> #4  0x00dd8aa2 in PMPI_Keyval_free (keyval=0xbffda0e4) at pkeyval_free.c:52
> #5  0x01bf3e6a in ADIOI_End_call (comm=0xf1c0c0, keyval=10, 
> attribute_val=0x0, extra_state=0x0) at ad_end.c:82
> #6  0x00da01bb in ompi_attr_delete. (type=UNUSED_ATTR, object=0x6, 
> attr_hash=0x2c64, key=14285602, predefined=232 '\350', need_lock=128 '\200') 
> at attribute/attribute.c:726
> #7  0x00d9fb22 in ompi_attr_delete_all (type=COMM_ATTR, object=0xf1c0c0, 
> attr_hash=0x8d0fee8) at attribute/attribute.c:1043
> #8  0x00dbda65 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:133
> #9  0x00dd12c2 in PMPI_Finalize () at pfinalize.c:46
> #10 0x00d6b515 in mpi_finalize_f (ierr=0xbffda2b8) at pfinalize_f.c:62

I guess I need some OpenMPI eyeballs on this...

ROMIO hooks into the attribute keyval deletion mechanism to clean up
the internal data structures it has allocated.  I suppose since this
is MPI_Finalize, we could just leave those internal data structures
alone and let the OS deal with it. 

What I see happening here is that the OpenMPI finalize routine is
deleting attributes.  One of those attributes is ROMIO's, and its
delete handler in turn tries to free keyvals.  Is the deadlock that
nothing "under" ompi_attr_delete can itself call ompi_* routines
(as ROMIO triggers a call to ompi_attr_free_keyval)?
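
To make the suspected pattern concrete, here is a minimal sketch in plain
pthreads, not OpenMPI code.  All of the model_* names are hypothetical
stand-ins, and it assumes the attribute code holds a single
non-recursive, error-checking mutex while it runs delete callbacks.  I
have not confirmed that this is how OpenMPI's locking is arranged.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the library's attribute lock. */
static pthread_mutex_t attr_lock;

/* Hypothetical stand-in for ompi_attr_free_keyval: it takes the
 * attribute lock before touching the keyval table. */
static int model_free_keyval(void)
{
    int rc = pthread_mutex_lock(&attr_lock);
    if (rc != 0) {
        return rc;      /* EDEADLK if this thread already owns the lock */
    }
    pthread_mutex_unlock(&attr_lock);
    return 0;
}

/* Hypothetical stand-in for ROMIO's delete handler (ADIOI_End_call),
 * which frees a keyval from inside the callback. */
static int model_delete_callback(void)
{
    return model_free_keyval();
}

/* Hypothetical stand-in for ompi_attr_delete_all: it holds the lock
 * while invoking the delete callback (an assumption of this sketch). */
int model_delete_all(void)
{
    pthread_mutexattr_t ma;
    int rc;

    /* An error-checking mutex reports a relock by its owner instead
     * of hanging. */
    rc = pthread_mutexattr_init(&ma);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&attr_lock, &ma);
    assert(rc == 0);
    pthread_mutexattr_destroy(&ma);

    rc = pthread_mutex_lock(&attr_lock);
    assert(rc == 0);
    rc = model_delete_callback();   /* re-enters the lock on this thread */
    pthread_mutex_unlock(&attr_lock);
    pthread_mutex_destroy(&attr_lock);
    return rc;
}
```

Calling model_delete_all() returns EDEADLK, and strerror(EDEADLK) on
glibc is "Resource deadlock avoided", the same text as in the report
above.  If the real code has this shape, the fix would be either to
drop the lock around the callback or to make the lock recursive.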

Here's where ROMIO sets up the keyval and the delete handler:
https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpi/romio/mpi-io/mpir-mpioinit.c#L39

That routine gets called at any "MPI-IO entry point" (open, delete,
register-datarep).  The keyvals help ensure that ROMIO's internal
structures get initialized exactly once, and the delete hooks help us
be good citizens and clean up on exit.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
