On Sat, Apr 02, 2011 at 04:59:34PM -0400, fa...@email.com wrote:
>
> opal_mutex_lock(): Resource deadlock avoided
> #0 0x0012e416 in __kernel_vsyscall ()
> #1 0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #2 0x01038e42 in abort () at abort.c:92
> #3 0x00d9da68 in ompi_attr_free_keyval (type=COMM_ATTR, key=0xbffda0e4,
>     predefined=0 '\000') at attribute/attribute.c:656
> #4 0x00dd8aa2 in PMPI_Keyval_free (keyval=0xbffda0e4) at pkeyval_free.c:52
> #5 0x01bf3e6a in ADIOI_End_call (comm=0xf1c0c0, keyval=10,
>     attribute_val=0x0, extra_state=0x0) at ad_end.c:82
> #6 0x00da01bb in ompi_attr_delete. (type=UNUSED_ATTR, object=0x6,
>     attr_hash=0x2c64, key=14285602, predefined=232 '\350', need_lock=128 '\200')
>     at attribute/attribute.c:726
> #7 0x00d9fb22 in ompi_attr_delete_all (type=COMM_ATTR, object=0xf1c0c0,
>     attr_hash=0x8d0fee8) at attribute/attribute.c:1043
> #8 0x00dbda65 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:133
> #9 0x00dd12c2 in PMPI_Finalize () at pfinalize.c:46
> #10 0x00d6b515 in mpi_finalize_f (ierr=0xbffda2b8) at pfinalize_f.c:62
I guess I need some OpenMPI eyeballs on this...

ROMIO hooks into the attribute keyval deletion mechanism to clean up the internal data structures it has allocated. I suppose that since this is MPI_Finalize, we could just leave those internal data structures alone and let the OS deal with them.

What I see happening here is that the OpenMPI finalize routine is deleting attributes. One of those attributes is ROMIO's, and its delete callback in turn tries to free keyvals. Is the deadlock because nothing "under" ompi_attr_delete can itself call ompi_* routines (as ROMIO does here when it triggers a call to ompi_attr_free_keyval)?

Here's where ROMIO sets up the keyval and the delete handler:
https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpi/romio/mpi-io/mpir-mpioinit.c#L39

That routine gets called upon any "MPI-IO entry point" (open, delete, register-datarep). The keyvals help ensure that ROMIO's internal structures get initialized exactly once, and the delete hooks help us be good citizens and clean up on exit.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA