On Thu, May 19, 2016 at 09:13:15AM -0700, Ralph Castain wrote:
> No issue at all - I'll check the latest versions and ensure the
> problem is present in them. Out of curiosity - what version of OMPI
> are you describing?

njoly@lanfeust [tmp/mpi]> mpirun --version
mpirun (Open MPI) 1.10.1

I discovered it with 1.10.1, and was able to reproduce with older
versions 1.6.5 and 1.8.8 that I had handy.

Thanks.

> > On May 19, 2016, at 9:06 AM, Nicolas Joly <nj...@pasteur.fr> wrote:
> > 
> > 
> > Hi,
> > 
> > I just discovered a small issue with MPI_Finalize(). When sanity
> > checking a threaded tool on my NetBSD/amd64 workstation I turned on the
> > PTHREAD_DIAGASSERT environment variable to report any issue that may
> > be triggered ...
> > 
> > And a simple MPI test program seemed to be affected :
> > 
> > njoly@issan [tmp/mpi]> mpicc --version
> > gcc (nb1 20160317) 5.3.0
> > njoly@issan [tmp/mpi]> cat sample.c 
> > #include <mpi.h>
> > int main(int argc, char **argv) {
> >  MPI_Init(&argc, &argv);
> >  MPI_Finalize();
> >  return 0; }
> > njoly@issan [tmp/mpi]> mpicc sample.c 
> > njoly@issan [tmp/mpi]> PTHREAD_DIAGASSERT=e ./a.out
> > a.out: Error detected by libpthread: Destroying locked mutex.
> > Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_mutex.c", 
> > line 148, function "pthread_mutex_destroy".
> > 
> > Checking the MPI code shows that MPI_Finalize() calls
> > ompi/mca/rte/orte/rte_orte_component.c:rte_orte_close() which is the
> > culprit :
> > 
> > static int rte_orte_close(void)
> > {
> >    opal_mutex_lock(&mca_rte_orte_component.lock);
> >    OPAL_LIST_DESTRUCT(&mca_rte_orte_component.modx_reqs);
> >    OBJ_DESTRUCT(&mca_rte_orte_component.lock);
> > 
> >    return OMPI_SUCCESS;
> > }
> > 
> > According to the pthread_mutex_destroy() specification[1],
> > destroying a mutex that is still locked results in undefined behaviour.
> > 
> > [...]
> > It shall be safe to destroy an initialized mutex that is
> > unlocked. Attempting to destroy a locked mutex or a mutex that is
> > referenced (for example, while being used in a
> > pthread_cond_timedwait() or pthread_cond_wait()) by another thread
> > results in undefined behavior.
> > [...]
> > 
> > Any expected issue in adding an opal_mutex_unlock() call before
> > destroying the opal_mutex_t object ?
> > 
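To illustrate the ordering proposed above, here is a minimal sketch using
plain pthreads instead of the opal_mutex_t wrappers. The struct, its
"pending" field and the function names are hypothetical stand-ins, not Open
MPI code, and the sketch assumes no other thread still uses the component
when it is closed (as should be the case during finalize).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for mca_rte_orte_component: a lock plus state. */
struct component {
    pthread_mutex_t lock;
    int pending;              /* stands in for the modx_reqs list */
};

/* Clean up under the lock, then release it BEFORE destroying it. */
static int component_close(struct component *c)
{
    int rc;

    assert(c != NULL);
    if ((rc = pthread_mutex_lock(&c->lock)) != 0)
        return rc;
    c->pending = 0;           /* cleanup that needs the lock held */
    if ((rc = pthread_mutex_unlock(&c->lock)) != 0)
        return rc;
    /* The mutex is now unlocked, so destroying it is well defined. */
    return pthread_mutex_destroy(&c->lock);
}

/* Initialise a component, close it, and return the resulting error code. */
static int close_demo(void)
{
    struct component c;
    int rc;

    c.pending = 3;
    if ((rc = pthread_mutex_init(&c.lock, NULL)) != 0)
        return rc;
    return component_close(&c);
}
```

Calling close_demo() from a small main() and running the binary with
PTHREAD_DIAGASSERT set would be one way to check that the diagnostic no
longer fires for this ordering.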
> > Thanks.
> > 
> > [1] 
> > http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html
> > 
> > -- 
> > Nicolas Joly
> > 
> > Cluster & Computing Group
> > Biology IT Center
> > Institut Pasteur, Paris.
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/users/2016/05/29239.php
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29240.php
-- 
Nicolas Joly

Cluster & Computing Group
Biology IT Center
Institut Pasteur, Paris.
