Here’s the 1.10 version of the PR: https://github.com/open-mpi/ompi-release/pull/1172
> On May 19, 2016, at 9:18 AM, Nicolas Joly <nj...@pasteur.fr> wrote:
>
> On Thu, May 19, 2016 at 09:13:15AM -0700, Ralph Castain wrote:
>> No issue at all - I'll check the latest versions and ensure the
>> problem is present in them. Out of curiosity - what version of OMPI
>> are you describing?
>
> njoly@lanfeust [tmp/mpi]> mpirun --version
> mpirun (Open MPI) 1.10.1
>
> I discovered it with 1.10.1, and was able to reproduce it with the older
> versions 1.6.5 and 1.8.8 I had handy.
>
> Thanks.
>
>>> On May 19, 2016, at 9:06 AM, Nicolas Joly <nj...@pasteur.fr> wrote:
>>>
>>> Hi,
>>>
>>> I just discovered a small issue with MPI_Finalize(). When sanity
>>> checking a threaded tool on my NetBSD/amd64 workstation, I turned on the
>>> PTHREAD_DIAGASSERT environment variable to report any issue that may
>>> be triggered ...
>>>
>>> And a simple MPI test program seemed to be affected:
>>>
>>> njoly@issan [tmp/mpi]> mpicc --version
>>> gcc (nb1 20160317) 5.3.0
>>> njoly@issan [tmp/mpi]> cat sample.c
>>> #include <mpi.h>
>>> int main(int argc, char **argv) {
>>>   MPI_Init(&argc, &argv);
>>>   MPI_Finalize();
>>>   return 0; }
>>> njoly@issan [tmp/mpi]> mpicc sample.c
>>> njoly@issan [tmp/mpi]> PTHREAD_DIAGASSERT=e ./a.out
>>> a.out: Error detected by libpthread: Destroying locked mutex.
>>> Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_mutex.c",
>>> line 148, function "pthread_mutex_destroy".
>>>
>>> Checking the MPI code shows that MPI_Finalize() calls
>>> ompi/mca/rte/orte/rte_orte_component.c:rte_orte_close(), which is the
>>> culprit:
>>>
>>> static int rte_orte_close(void)
>>> {
>>>     opal_mutex_lock(&mca_rte_orte_component.lock);
>>>     OPAL_LIST_DESTRUCT(&mca_rte_orte_component.modx_reqs);
>>>     OBJ_DESTRUCT(&mca_rte_orte_component.lock);
>>>
>>>     return OMPI_SUCCESS;
>>> }
>>>
>>> According to the pthread_mutex_destroy() specification [1],
>>> destroying a still locked mutex results in an "undefined behaviour".
>>>
>>> [...]
>>> It shall be safe to destroy an initialized mutex that is
>>> unlocked. Attempting to destroy a locked mutex or a mutex that is
>>> referenced (for example, while being used in a
>>> pthread_cond_timedwait() or pthread_cond_wait()) by another thread
>>> results in undefined behavior.
>>> [...]
>>>
>>> Any expected issue in adding an opal_mutex_unlock() call before
>>> destroying the opal_mutex_t object?
>>>
>>> Thanks.
>>>
>>> [1]
>>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html
>>>
>>> --
>>> Nicolas Joly
>>>
>>> Cluster & Computing Group
>>> Biology IT Center
>>> Institut Pasteur, Paris.
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2016/05/29239.php
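
For anyone reading this thread later: the ordering that the POSIX text quoted above asks for (release the lock first, then destroy it) can be sketched with plain pthreads. This is only an illustration under stated assumptions, not the contents of the PR. The names component_t, component_open() and component_close() are hypothetical, and the real code goes through the OPAL wrappers (opal_mutex_t, OBJ_DESTRUCT) rather than raw pthread calls.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a component that owns a lock.
 * This is NOT an Open MPI type; it only illustrates the ordering. */
typedef struct {
    pthread_mutex_t lock;
    int pending;            /* placeholder for state guarded by the lock */
} component_t;

/* Initialize the component and its mutex. Returns 0 on success. */
int component_open(component_t *c)
{
    assert(c != NULL);
    c->pending = 0;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Tear down the component: clean up under the lock, then release
 * the lock BEFORE destroying it, so pthread_mutex_destroy() never
 * sees a locked mutex. Returns 0 on success, or the pthread error. */
int component_close(component_t *c)
{
    int rc;

    assert(c != NULL);

    rc = pthread_mutex_lock(&c->lock);
    if (rc != 0) {
        return rc;
    }

    c->pending = 0;         /* cleanup of the guarded state */

    rc = pthread_mutex_unlock(&c->lock);
    if (rc != 0) {
        return rc;
    }

    return pthread_mutex_destroy(&c->lock);
}
```

A caller would pair component_open(&c) with component_close(&c). With the unlock in place, a run under PTHREAD_DIAGASSERT should no longer have a locked mutex to complain about at destroy time, assuming no other thread still holds or references the lock.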