Hi,

I just discovered a small issue with MPI_Finalize(). While sanity
checking a threaded tool on my NetBSD/amd64 workstation, I turned on the
PTHREAD_DIAGASSERT environment variable to report any issue that may
be triggered ...

A simple MPI test program turned out to be affected:

njoly@issan [tmp/mpi]> mpicc --version
gcc (nb1 20160317) 5.3.0
njoly@issan [tmp/mpi]> cat sample.c 
#include <mpi.h>
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  MPI_Finalize();
  return 0; }
njoly@issan [tmp/mpi]> mpicc sample.c 
njoly@issan [tmp/mpi]> PTHREAD_DIAGASSERT=e ./a.out
a.out: Error detected by libpthread: Destroying locked mutex.
Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_mutex.c", line 148, function "pthread_mutex_destroy".

Checking the MPI code shows that MPI_Finalize() calls
ompi/mca/rte/orte/rte_orte_component.c:rte_orte_close(), which is the
culprit: it locks the component mutex, then destroys it without
unlocking it first:

static int rte_orte_close(void)
{
    opal_mutex_lock(&mca_rte_orte_component.lock);
    OPAL_LIST_DESTRUCT(&mca_rte_orte_component.modx_reqs);
    OBJ_DESTRUCT(&mca_rte_orte_component.lock);

    return OMPI_SUCCESS;
}

According to the pthread_mutex_destroy() specification [1],
destroying a mutex that is still locked results in undefined behaviour.

[...]
It shall be safe to destroy an initialized mutex that is
unlocked. Attempting to destroy a locked mutex or a mutex that is
referenced (for example, while being used in a
pthread_cond_timedwait() or pthread_cond_wait()) by another thread
results in undefined behavior.
[...]

Is there any expected issue with adding an opal_mutex_unlock() call
before destroying the opal_mutex_t object?
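
For illustration, here is a minimal sketch of the ordering I have in
mind, written against plain pthreads rather than the OPAL wrappers so
that it builds on its own. The names struct component, component_init()
and component_close() are hypothetical stand-ins and are not Open MPI
code; the pending counter merely stands in for the modx_reqs list.

```c
#include <pthread.h>

/* Hypothetical stand-in for mca_rte_orte_component; not Open MPI code. */
struct component {
    pthread_mutex_t lock;
    int pending;            /* stands in for the modx_reqs list */
};

/* Initialize the mutex with default attributes; returns 0 on success. */
static int component_init(struct component *c)
{
    c->pending = 3;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Tear down: clean up under the lock, release it, then destroy it. */
static int component_close(struct component *c)
{
    int rc = pthread_mutex_lock(&c->lock);
    if (rc != 0)
        return rc;

    c->pending = 0;         /* cleanup done while holding the lock */

    /* Release the mutex first: destroying a locked mutex is undefined. */
    rc = pthread_mutex_unlock(&c->lock);
    if (rc != 0)
        return rc;

    return pthread_mutex_destroy(&c->lock);
}
```

Calling component_init() followed by component_close() on the same
object should return 0 from both, and should no longer trip the
libpthread diagnostic since the mutex is unlocked when it is destroyed.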

Thanks.

[1] 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html

-- 
Nicolas Joly

Cluster & Computing Group
Biology IT Center
Institut Pasteur, Paris.
