Here’s the 1.10 version of the PR:

https://github.com/open-mpi/ompi-release/pull/1172 
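
For readers skimming the thread, the ordering question raised in the quoted report below (unlock before destroy) can be illustrated with plain pthreads. This is only a minimal sketch, not the content of the PR: `struct component`, `component_open()` and `component_close()` are hypothetical names made up for illustration, and the real code goes through the OPAL mutex wrappers rather than calling pthreads directly.

```c
#include <pthread.h>

/* Hypothetical stand-in for a component that owns a lock.
 * Not an Open MPI type; used for illustration only. */
struct component {
    pthread_mutex_t lock;
};

/* Initialise the lock with default attributes.
 * Returns 0 on success, an errno-style code otherwise. */
int component_open(struct component *c)
{
    return pthread_mutex_init(&c->lock, NULL);
}

/* Tear the component down: take the lock, release whatever it
 * guards, then unlock BEFORE destroying. POSIX only guarantees
 * that destroying an initialized, unlocked mutex is safe. */
int component_close(struct component *c)
{
    int rc = pthread_mutex_lock(&c->lock);
    if (rc != 0) {
        return rc;
    }

    /* ... release resources protected by the lock here ... */

    rc = pthread_mutex_unlock(&c->lock);
    if (rc != 0) {
        return rc;
    }

    /* The mutex is unlocked at this point, so destroying it is defined. */
    return pthread_mutex_destroy(&c->lock);
}
```

Calling `component_open()` and then `component_close()` on the same object follows the sequence the POSIX text describes as safe; dropping the unlock step would reproduce the "Destroying locked mutex" diagnostic pattern from the report.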


> On May 19, 2016, at 9:18 AM, Nicolas Joly <nj...@pasteur.fr> wrote:
> 
> On Thu, May 19, 2016 at 09:13:15AM -0700, Ralph Castain wrote:
>> No issue at all - I’ll check the latest versions and ensure the
>> problem is present in them. Out of curiosity - what version of OMPI
>> are you describing?
> 
> njoly@lanfeust [tmp/mpi]> mpirun --version
> mpirun (Open MPI) 1.10.1
> 
> I discovered it with 1.10.1, and was able to reproduce it with the
> older versions 1.6.5 and 1.8.8 I had handy.
> 
> Thanks.
> 
>>> On May 19, 2016, at 9:06 AM, Nicolas Joly <nj...@pasteur.fr> wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> I just discovered a small issue with MPI_Finalize(). When sanity
>>> checking a threaded tool on my NetBSD/amd64 workstation, I turned on the
>>> PTHREAD_DIAGASSERT environment variable to report any issue that may
>>> be triggered ...
>>> 
>>> And a simple MPI test program seemed to be affected :
>>> 
>>> njoly@issan [tmp/mpi]> mpicc --version
>>> gcc (nb1 20160317) 5.3.0
>>> njoly@issan [tmp/mpi]> cat sample.c 
>>> #include <mpi.h>
>>> int main(int argc, char **argv) {
>>> MPI_Init(&argc, &argv);
>>> MPI_Finalize();
>>> return 0; }
>>> njoly@issan [tmp/mpi]> mpicc sample.c 
>>> njoly@issan [tmp/mpi]> PTHREAD_DIAGASSERT=e ./a.out
>>> a.out: Error detected by libpthread: Destroying locked mutex.
>>> Detected by file "/local/src/NetBSD/src/lib/libpthread/pthread_mutex.c", 
>>> line 148, function "pthread_mutex_destroy".
>>> 
>>> Checking the MPI code shows that MPI_Finalize() calls
>>> ompi/mca/rte/orte/rte_orte_component.c:rte_orte_close() which is the
>>> culprit :
>>> 
>>> static int rte_orte_close(void)
>>> {
>>>   opal_mutex_lock(&mca_rte_orte_component.lock);
>>>   OPAL_LIST_DESTRUCT(&mca_rte_orte_component.modx_reqs);
>>>   OBJ_DESTRUCT(&mca_rte_orte_component.lock);
>>> 
>>>   return OMPI_SUCCESS;
>>> }
>>> 
>>> According to the pthread_mutex_destroy() specification[1],
>>> destroying a still-locked mutex results in "undefined behaviour".
>>> 
>>> [...]
>>> It shall be safe to destroy an initialized mutex that is
>>> unlocked. Attempting to destroy a locked mutex or a mutex that is
>>> referenced (for example, while being used in a
>>> pthread_cond_timedwait() or pthread_cond_wait()) by another thread
>>> results in undefined behavior.
>>> [...]
>>> 
>>> Any expected issue in adding an opal_mutex_unlock() call before
>>> destroying the opal_mutex_t object?
>>> 
>>> Thanks.
>>> 
>>> [1] 
>>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html
>>> 
>>> -- 
>>> Nicolas Joly
>>> 
>>> Cluster & Computing Group
>>> Biology IT Center
>>> Institut Pasteur, Paris.
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2016/05/29239.php
>> 
> -- 
> Nicolas Joly
> 
> Cluster & Computing Group
> Biology IT Center
> Institut Pasteur, Paris.
