Hello all,

I sometimes run into deadlocks in Open MPI (1.3.3a1r21206) when
running my MPI+threads PT-Scotch software. Luckily, the test case
is very small, with only 4 procs, so I have been able to investigate
it a bit. It seems that message matching is not done properly on
cloned communicators: in the end, I run into a case where an
MPI_Waitall completes an MPI_Barrier on another proc. The bug is
erratic but, luckily, also quite easy to reproduce.
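
For context, here is a boiled-down sketch of the kind of pattern my
code exercises. This is NOT the actual PT-Scotch code, just an
illustration I wrote for this mail: each process duplicates
MPI_COMM_WORLD twice and runs two threads, one doing an MPI_Barrier
on one clone while the other does nonblocking point-to-point plus
MPI_Waitall on the other clone, under MPI_THREAD_MULTIPLE:

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static MPI_Comm commbar;                  /* clone used by the barrier thread  */
static MPI_Comm commxch;                  /* clone used by the exchange thread */

static void * barrier_thread (void * dummy)
{
  MPI_Barrier (commbar);                  /* collective on its own cloned comm */
  return NULL;
}

static void * exchange_thread (void * dummy)
{
  int          rank, size, peer;
  int          sbuf = 0, rbuf = 0;
  MPI_Request  reqs[2];

  MPI_Comm_rank (commxch, &rank);
  MPI_Comm_size (commxch, &size);
  peer = (rank + 1) % size;

  MPI_Isend (&sbuf, 1, MPI_INT, peer, 0, commxch, &reqs[0]);
  MPI_Irecv (&rbuf, 1, MPI_INT, peer, 0, commxch, &reqs[1]);
  MPI_Waitall (2, reqs, MPI_STATUSES_IGNORE); /* point-to-point on the other clone */
  return NULL;
}

int main (int argc, char * argv[])
{
  int        provided;
  pthread_t  thr1, thr2;

  MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE) {
    fprintf (stderr, "MPI_THREAD_MULTIPLE not available\n");
    MPI_Abort (MPI_COMM_WORLD, 1);
  }

  MPI_Comm_dup (MPI_COMM_WORLD, &commbar); /* cloned communicators */
  MPI_Comm_dup (MPI_COMM_WORLD, &commxch);

  pthread_create (&thr1, NULL, barrier_thread,  NULL);
  pthread_create (&thr2, NULL, exchange_thread, NULL);
  pthread_join (thr1, NULL);
  pthread_join (thr2, NULL);

  MPI_Comm_free (&commbar);
  MPI_Comm_free (&commxch);
  MPI_Finalize ();
  return 0;
}

In the real code, of course, the communicators, peers and message
contents come from the recursive bipartitioning, but the concurrency
pattern is the same: collectives and point-to-point on distinct
clones of the same communicator, issued from different threads.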

To be sure, I ran my code under Valgrind using Helgrind, its
race condition detection tool. It produced a lot of output, most
of which seems innocuous, yet I have some concerns about
messages such as the following ones. The ==12**== traces were
generated when running on 4 procs, while the ==83**== traces were
generated when running on 2 procs:

==8329== Possible data race during write of size 4 at 0x8882200
==8329==    at 0x508B315: sm_fifo_write (btl_sm.h:254)
==8329==    by 0x508B401: mca_btl_sm_send (btl_sm.c:811)
==8329==    by 0x5070A0C: mca_bml_base_send_status (bml.h:288)
==8329==    by 0x50708E6: mca_pml_ob1_send_request_start_copy
(pml_ob1_sendreq.c:567)
==8329==    by 0x5064C30: mca_pml_ob1_send_request_start_btl
(pml_ob1_sendreq.h:363)
==8329==    by 0x5064A19: mca_pml_ob1_send_request_start (pml_ob1_sendreq.h:429)
==8329==    by 0x5064856: mca_pml_ob1_isend (pml_ob1_isend.c:87)
==8329==    by 0x5142C46: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:51)
==8329==    by 0x514F379: ompi_coll_tuned_barrier_intra_two_procs
(coll_tuned_barrier.c:258)
==8329==    by 0x5143252: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:192)
==8329==    by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==8329==    by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==8329==   Old state: shared-readonly by threads #1, #7
==8329==   New state: shared-modified by threads #1, #7
==8329==   Reason:    this thread, #1, holds no consistent locks
==8329==   Location 0x8882200 has never been protected by any lock

==1220== Possible data race during write of size 4 at 0x88CEF88
==1220==    at 0x508CD84: sm_fifo_read (btl_sm.h:272)
==1220==    by 0x508C864: mca_btl_sm_component_progress (btl_sm_component.c:391)
==1220==    by 0x41F72DF: opal_progress (opal_progress.c:207)
==1220==    by 0x40BD67D: opal_condition_wait (condition.h:85)
==1220==    by 0x40BDA96: ompi_request_default_wait_all (req_wait.c:262)
==1220==    by 0x5142C78: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==1220==    by 0x514F07A: ompi_coll_tuned_barrier_intra_recursivedoubling
(coll_tuned_barrier.c:174)
==1220==    by 0x51432A3: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:208)
==1220==    by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==1220==    by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==1220==    by 0x805E2B2: kdgraphMapRbPartFold2 (kdgraph_map_rb_part.c:199)
==1220==    by 0x805EA43: kdgraphMapRbPart2 (kdgraph_map_rb_part.c:331)
==1220==   Old state: shared-readonly by threads #1, #7
==1220==   New state: shared-modified by threads #1, #7
==1220==   Reason:    this thread, #1, holds no consistent locks
==1220==   Location 0x88CEF88 has never been protected by any lock

==1219== Possible data race during write of size 4 at 0x891BC8C
==1219==    at 0x508CD99: sm_fifo_read (btl_sm.h:273)
==1219==    by 0x508C864: mca_btl_sm_component_progress (btl_sm_component.c:391)
==1219==    by 0x41F72DF: opal_progress (opal_progress.c:207)
==1219==    by 0x40BD67D: opal_condition_wait (condition.h:85)
==1219==    by 0x40BDA96: ompi_request_default_wait_all (req_wait.c:262)
==1219==    by 0x5142C78: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==1219==    by 0x514F07A: ompi_coll_tuned_barrier_intra_recursivedoubling
(coll_tuned_barrier.c:174)
==1219==    by 0x51432A3: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:208)
==1219==    by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==1219==    by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==1219==    by 0x805E2B2: kdgraphMapRbPartFold2 (kdgraph_map_rb_part.c:199)
==1219==    by 0x805EA43: kdgraphMapRbPart2 (kdgraph_map_rb_part.c:331)
==1219==   Old state: shared-readonly by threads #1, #7
==1219==   New state: shared-modified by threads #1, #7
==1219==   Reason:    this thread, #1, holds no consistent locks
==1219==   Location 0x891BC8C has never been protected by any lock

==1220== Possible data race during write of size 4 at 0x4243A68
==1220==    at 0x41F72A7: opal_progress (opal_progress.c:186)
==1220==    by 0x40BD67D: opal_condition_wait (condition.h:85)
==1220==    by 0x40BDA96: ompi_request_default_wait_all (req_wait.c:262)
==1220==    by 0x5142C78: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==1220==    by 0x514F07A: ompi_coll_tuned_barrier_intra_recursivedoubling
(coll_tuned_barrier.c:174)
==1220==    by 0x51432A3: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:208)
==1220==    by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==1220==    by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==1220==    by 0x805E2B2: kdgraphMapRbPartFold2 (kdgraph_map_rb_part.c:199)
==1220==    by 0x805EA43: kdgraphMapRbPart2 (kdgraph_map_rb_part.c:331)
==1220==    by 0x805EB86: _SCOTCHkdgraphMapRbPart (kdgraph_map_rb_part.c:421)
==1220==    by 0x8057713: _SCOTCHkdgraphMapSt (kdgraph_map_st.c:182)
==1220==   Old state: shared-readonly by threads #1, #7
==1220==   New state: shared-modified by threads #1, #7
==1220==   Reason:    this thread, #1, holds no consistent locks
==1220==   Location 0x4243A68 has never been protected by any lock

==8328== Possible data race during write of size 4 at 0x4532318
==8328==    at 0x508A9B8: opal_atomic_lifo_pop (opal_atomic_lifo.h:111)
==8328==    by 0x508A69F: mca_btl_sm_alloc (btl_sm.c:612)
==8328==    by 0x5070571: mca_bml_base_alloc (bml.h:241)
==8328==    by 0x5070778: mca_pml_ob1_send_request_start_copy
(pml_ob1_sendreq.c:506)
==8328==    by 0x5064C30: mca_pml_ob1_send_request_start_btl
(pml_ob1_sendreq.h:363)
==8328==    by 0x5064A19: mca_pml_ob1_send_request_start (pml_ob1_sendreq.h:429)
==8328==    by 0x5064856: mca_pml_ob1_isend (pml_ob1_isend.c:87)
==8328==    by 0x5142C46: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:51)
==8328==    by 0x514F379: ompi_coll_tuned_barrier_intra_two_procs
(coll_tuned_barrier.c:258)
==8328==    by 0x5143252: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:192)
==8328==    by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==8328==    by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==8328==   Old state: shared-readonly by threads #1, #8
==8328==   New state: shared-modified by threads #1, #8
==8328==   Reason:    this thread, #1, holds no consistent locks
==8328==   Location 0x4532318 has never been protected by any lock

==8329== Possible data race during write of size 4 at 0x452F238
==8329==    at 0x5067FD3: recv_req_matched (pml_ob1_recvreq.h:219)
==8329==    by 0x5067D95: mca_pml_ob1_recv_frag_callback_match
(pml_ob1_recvfrag.c:191)
==8329==    by 0x508C9BB: mca_btl_sm_component_progress (btl_sm_component.c:426)
==8329==    by 0x41F72DF: opal_progress (opal_progress.c:207)
==8329==    by 0x40BD67D: opal_condition_wait (condition.h:85)
==8329==    by 0x40BDA96: ompi_request_default_wait_all (req_wait.c:262)
==8329==    by 0x5142C78: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==8329==    by 0x514F379: ompi_coll_tuned_barrier_intra_two_procs
(coll_tuned_barrier.c:258)
==8329==    by 0x5143252: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:192)
==8329==    by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==8329==    by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==8329==    by 0x805E2B2: kdgraphMapRbPartFold2 (kdgraph_map_rb_part.c:199)
==8329==   Old state: owned exclusively by thread #7
==8329==   New state: shared-modified by threads #1, #7
==8329==   Reason:    this thread, #1, holds no locks at all

==8329== Possible data race during write of size 4 at 0x452F2DC
==8329==    at 0x40D5946: ompi_convertor_unpack (convertor.c:280)
==8329==    by 0x5067E78: mca_pml_ob1_recv_frag_callback_match
(pml_ob1_recvfrag.c:215)
==8329==    by 0x508C9BB: mca_btl_sm_component_progress (btl_sm_component.c:426)
==8329==    by 0x41F72DF: opal_progress (opal_progress.c:207)
==8329==    by 0x40BD67D: opal_condition_wait (condition.h:85)
==8329==    by 0x40BDA96: ompi_request_default_wait_all (req_wait.c:262)
==8329==    by 0x5142C78: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==8329==    by 0x514F379: ompi_coll_tuned_barrier_intra_two_procs
(coll_tuned_barrier.c:258)
==8329==    by 0x5143252: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:192)
==8329==    by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==8329==    by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==8329==    by 0x805E2B2: kdgraphMapRbPartFold2 (kdgraph_map_rb_part.c:199)
==8329==   Old state: owned exclusively by thread #7
==8329==   New state: shared-modified by threads #1, #7
==8329==   Reason:    this thread, #1, holds no locks at all

I guess the following ones are OK, but I provide them for
reference:

==1220== Possible data race during write of size 4 at 0x8968780
==1220==    at 0x508A619: opal_atomic_unlock (atomic_impl.h:367)
==1220==    by 0x508B468: mca_btl_sm_send (btl_sm.c:811)
==1220==    by 0x5070A0C: mca_bml_base_send_status (bml.h:288)
==1220==    by 0x50708E6: mca_pml_ob1_send_request_start_copy
(pml_ob1_sendreq.c:567)
==1220==    by 0x5064C30: mca_pml_ob1_send_request_start_btl
(pml_ob1_sendreq.h:363)
==1220==    by 0x5064A19: mca_pml_ob1_send_request_start (pml_ob1_sendreq.h:429)
==1220==    by 0x5064856: mca_pml_ob1_isend (pml_ob1_isend.c:87)
==1220==    by 0x5142C46: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:51)
==1220==    by 0x514F07A: ompi_coll_tuned_barrier_intra_recursivedoubling
(coll_tuned_barrier.c:174)
==1220==    by 0x51432A3: ompi_coll_tuned_barrier_intra_dec_fixed
(coll_tuned_decision_fixed.c:208)
==1220==    by 0x40E410C: PMPI_Barrier (pbarrier.c:59)
==1220==    by 0x806C5FB: _SCOTCHdgraphInducePart (dgraph_induce.c:334)
==1220==   Old state: shared-modified by threads #1, #7
==1220==   New state: shared-modified by threads #1, #7
==1220==   Reason:    this thread, #1, holds no consistent locks
==1220==   Location 0x8968780 has never been protected by any lock

ompi_info says:
                 Package: Open MPI pelegrin@brol Distribution
                Open MPI: 1.3.3a1r21206
   Open MPI SVN revision: r21206
   Open MPI release date: Unreleased developer copy
                Open RTE: 1.3.3a1r21206
   Open RTE SVN revision: r21206
   Open RTE release date: Unreleased developer copy
                    OPAL: 1.3.3a1r21206
       OPAL SVN revision: r21206
       OPAL release date: Unreleased developer copy
            Ident string: 1.3.3a1r21206
                  Prefix: /usr/local
 Configured architecture: i686-pc-linux-gnu
          Configure host: brol
           Configured by: pelegrin
           Configured on: Tue May 12 15:50:08 CEST 2009
          Configure host: brol
                Built by: pelegrin
                Built on: Tue May 12 16:17:34 CEST 2009
              Built host: brol
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: yes, progress: no)
           Sparse Groups: no
  Internal debug support: yes
     MPI parameter check: always
Memory profiling support: no
Memory debugging support: yes
         libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: no  (checkpoint thread: no)
           MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.3)
          MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.3.3)
              MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.3)
           MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.3)
               MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.3)
               MCA carto: file (MCA v2.0, API v2.0, Component v1.3.3)
           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.3)
               MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.3)
         MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.3)
         MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.3)
              MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.3)
           MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.3)
           MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.3)
                MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.3)
                MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.3)
                MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.3)
                MCA coll: self (MCA v2.0, API v2.0, Component v1.3.3)
                MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.3)
                MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.3)
                MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.3)
                  MCA io: romio (MCA v2.0, API v2.0, Component v1.3.3)
               MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.3)
               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.3)
               MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
              MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)
                MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.3)
                MCA odls: default (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.3)
               MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.3)
               MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.3)
               MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.3)
              MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.3)
              MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.3)
              MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.3)
               MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.3)
              MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA ess: env (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.3)
             MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.3)
             MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.3)

Thanks in advance for any help / explanation,

                                        f.p.
