On Thu, May 19, 2016 at 09:13:15AM -0700, Ralph Castain wrote:
> No issue at all - I'll check the latest versions and ensure the
> problem is present in them. Out of curiosity - what version of OMPI
> are you describing?
njoly@lanfeust [tmp/mpi]> mpirun --version
mpirun (Open MPI) 1.10.1

I discovered it with 1.10.1, and was able to reproduce it with the older
versions 1.6.5 and 1.8.8 I had handy.

Thanks.

> > On May 19, 2016, at 9:06 AM, Nicolas Joly <nj...@pasteur.fr> wrote:
> >
> >
> > Hi,
> >
> > I just discovered a small issue with MPI_Finalize(). When sanity
> > checking a threaded tool on my NetBSD/amd64 workstation, I turned on
> > the PTHREAD_DIAGASSERT environment variable to report any issue that
> > may be triggered ...
> >
> > And a simple MPI test program seemed to be affected:
> >
> > njoly@issan [tmp/mpi]> mpicc --version
> > gcc (nb1 20160317) 5.3.0
> > njoly@issan [tmp/mpi]> cat sample.c
> > #include <mpi.h>
> > int main(int argc, char **argv) {
> >   MPI_Init(&argc, &argv);
> >   MPI_Finalize();
> >   return 0; }
> > njoly@issan [tmp/mpi]> mpicc sample.c
> > njoly@issan [tmp/mpi]> PTHREAD_DIAGASSERT=e ./a.out
> > a.out: Error detected by libpthread: Destroying locked mutex.
> > Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_mutex.c",
> > line 148, function "pthread_mutex_destroy".
> >
> > Checking the MPI code shows that MPI_Finalize() calls
> > ompi/mca/rte/orte/rte_orte_component.c:rte_orte_close(), which is the
> > culprit:
> >
> > static int rte_orte_close(void)
> > {
> >     opal_mutex_lock(&mca_rte_orte_component.lock);
> >     OPAL_LIST_DESTRUCT(&mca_rte_orte_component.modx_reqs);
> >     OBJ_DESTRUCT(&mca_rte_orte_component.lock);
> >
> >     return OMPI_SUCCESS;
> > }
> >
> > According to the pthread_mutex_destroy() specification[1],
> > destroying a still-locked mutex results in undefined behaviour.
> >
> > [...]
> > It shall be safe to destroy an initialized mutex that is
> > unlocked. Attempting to destroy a locked mutex or a mutex that is
> > referenced (for example, while being used in a
> > pthread_cond_timedwait() or pthread_cond_wait()) by another thread
> > results in undefined behavior.
> > [...]
> >
> > Any expected issue in adding an opal_mutex_unlock() call before
> > destroying the opal_mutex_t object?
> >
> > Thanks.
> >
> > [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html
> >
> > --
> > Nicolas Joly
> >
> > Cluster & Computing Group
> > Biology IT Center
> > Institut Pasteur, Paris.
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29239.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29240.php

--
Nicolas Joly

Cluster & Computing Group
Biology IT Center
Institut Pasteur, Paris.
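
For reference, here is a minimal sketch of the lock / unlock / destroy ordering asked about in the quoted message, written against plain pthreads rather than the OPAL wrappers. The names demo_component, demo_component_open() and demo_component_close() are hypothetical stand-ins for illustration only; this is not the Open MPI code and not a proposed patch.

```c
#include <pthread.h>

/* Hypothetical stand-in for a component that owns a mutex.
 * This is NOT Open MPI's mca_rte_orte_component type. */
struct demo_component {
    pthread_mutex_t lock;
};

/* Initialize the mutex with default attributes. Returns 0 on success. */
int demo_component_open(struct demo_component *c)
{
    return pthread_mutex_init(&c->lock, NULL);
}

/* Tear the component down. The mutex is released before it is
 * destroyed, because POSIX only guarantees that destroying an
 * unlocked mutex is safe. Returns 0 on success. */
int demo_component_close(struct demo_component *c)
{
    int rc = pthread_mutex_lock(&c->lock);
    if (rc != 0) {
        return rc;
    }

    /* ... release whatever resources the lock protects ... */

    rc = pthread_mutex_unlock(&c->lock);
    if (rc != 0) {
        return rc;
    }

    return pthread_mutex_destroy(&c->lock);
}
```

Calling demo_component_open() followed by demo_component_close() on the same object should not trigger the "Destroying locked mutex" diagnostic, since the mutex is unlocked by the time pthread_mutex_destroy() runs. Whether the same ordering is safe inside rte_orte_close() depends on whether any other thread can still reference the lock at that point, which this sketch does not address.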