Hi,

I just discovered a small issue with MPI_Finalize(). While sanity checking a threaded tool on my NetBSD/amd64 workstation, I turned on the PTHREAD_DIAGASSERT environment variable to report any issue that may be triggered ...
And a simple MPI test program seemed to be affected:

njoly@issan [tmp/mpi]> mpicc --version
gcc (nb1 20160317) 5.3.0
njoly@issan [tmp/mpi]> cat sample.c
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  MPI_Finalize();
  return 0;
}
njoly@issan [tmp/mpi]> mpicc sample.c
njoly@issan [tmp/mpi]> PTHREAD_DIAGASSERT=e ./a.out
a.out: Error detected by libpthread: Destroying locked mutex.
Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_mutex.c", line 148, function "pthread_mutex_destroy".

Checking the MPI code shows that MPI_Finalize() calls ompi/mca/rte/orte/rte_orte_component.c:rte_orte_close(), which is the culprit. It locks the component mutex and then destroys it without unlocking it first:

static int rte_orte_close(void)
{
    opal_mutex_lock(&mca_rte_orte_component.lock);
    OPAL_LIST_DESTRUCT(&mca_rte_orte_component.modx_reqs);
    OBJ_DESTRUCT(&mca_rte_orte_component.lock);

    return OMPI_SUCCESS;
}

According to the pthread_mutex_destroy() specification [1], destroying a mutex that is still locked results in "undefined behaviour":

  [...]
  It shall be safe to destroy an initialized mutex that is unlocked.
  Attempting to destroy a locked mutex or a mutex that is referenced
  (for example, while being used in a pthread_cond_timedwait() or
  pthread_cond_wait()) by another thread results in undefined behavior.
  [...]

Is there any expected issue in adding an opal_mutex_unlock() call before destroying the opal_mutex_t object?

Thanks.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html

-- 
Nicolas Joly
Cluster & Computing Group
Biology IT Center
Institut Pasteur, Paris.
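
For illustration, the unlock-before-destroy ordering asked about above can be sketched with plain pthreads. This is only a sketch under stated assumptions: `struct component`, `component_open()` and `component_close()` are hypothetical stand-ins, not Open MPI code, and the OPAL wrappers are replaced here by direct pthread calls.

```c
#include <pthread.h>

/* Hypothetical stand-in for a component that owns a lock
 * (not the real mca_rte_orte_component structure). */
struct component {
    pthread_mutex_t lock;
};

/* Initialize the lock with default attributes. */
int component_open(struct component *c)
{
    return pthread_mutex_init(&c->lock, NULL);
}

/* Tear down under the lock, then release it before destroying it,
 * so that pthread_mutex_destroy() only ever sees an unlocked mutex. */
int component_close(struct component *c)
{
    int rc = pthread_mutex_lock(&c->lock);
    if (rc != 0)
        return rc;

    /* ... release the resources protected by the lock here ... */

    rc = pthread_mutex_unlock(&c->lock);
    if (rc != 0)
        return rc;

    return pthread_mutex_destroy(&c->lock);
}
```

Calling component_open() followed by component_close() on the same object should return 0 from both on a conforming implementation, and should not trigger the "Destroying locked mutex" diagnostic. Whether the same ordering is safe inside rte_orte_close() depends on whether another thread can still reference the lock at that point, which this sketch does not model.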