Nathan,

I do, but the hang comes later on. It looks like a situation where the
root is way, way faster than the children and induces an overrun in the
unexpected message queue. I think the queue just keeps growing until it
eventually blows up the memory.
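
For reference, here is roughly what the loop in question boils down to (a
minimal sketch written from memory, not Sachin's exact bcast_loop.c; the
iteration count, payload size, and print interval are just placeholders):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, m;
    int buf = 0;                      /* tiny payload, sent eagerly over sm/vader */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (m = 0; m < 100000; m++) {    /* placeholder iteration count */
        buf = m;
        MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (m % 1000 == 0)
            printf("rank %d, m = %d\n", rank, m);
    }

    MPI_Finalize();
    return 0;
}

Since MPI_Bcast gives the root no guarantee that the other ranks have
completed, the root can legally race thousands of iterations ahead, and each
of those small eager sends piles up in the children's unexpected message
queues.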

$/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun -np 3
--display-map -mca btl vader,self ./a.out
 Data for JOB [14187,1] offset 0

 ========================   JOB MAP   ========================

 Data for node: mngx-apl-01     Num slots: 16   Max slots: 0    Num procs: 3
        Process OMPI jobid: [14187,1] App: 0 Process rank: 0
        Process OMPI jobid: [14187,1] App: 0 Process rank: 1
        Process OMPI jobid: [14187,1] App: 0 Process rank: 2

 =============================================================
rank 2, m = 0
rank 0, m = 0
rank 1, m = 0
rank 0, m = 1000
rank 0, m = 2000
rank 0, m = 3000
rank 2, m = 1000
rank 1, m = 1000
rank 0, m = 4000
rank 0, m = 5000
rank 0, m = 6000
rank 0, m = 7000
rank 1, m = 2000
rank 2, m = 2000
rank 0, m = 8000
rank 0, m = 9000
rank 0, m = 10000
rank 0, m = 11000
rank 2, m = 3000
rank 1, m = 3000
rank 0, m = 12000
rank 0, m = 13000
rank 0, m = 14000
rank 1, m = 4000
rank 2, m = 4000
rank 0, m = 15000
rank 0, m = 16000
rank 0, m = 17000
rank 0, m = 18000
rank 1, m = 5000
rank 2, m = 5000
rank 0, m = 19000
rank 0, m = 20000
rank 0, m = 21000
rank 0, m = 22000
rank 2, m = 6000     <--- Finally hangs here: ranks 1 and 2 are at 6000 while
rank 1, m = 6000          rank 0, the root, is already at 22000

It fails with the ompi_coll_tuned_bcast_intra_split_bintree algorithm in the
tuned component - it looks like a scatter/allgather type of operation, and it
is in the allgather phase, during the bidirectional send/recv, that things go
bad. There are no issues running this under the basic colls.
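
In the meantime the bad path can be avoided from the command line. These are
sketches of the usual MCA knobs rather than something I've re-tested on this
exact case, so double-check the parameter names and the algorithm numbering
with ompi_info --all:

# fall back to the basic colls by excluding tuned
mpirun -np 3 -mca btl vader,self -mca coll ^tuned ./a.out

# or keep tuned but pin bcast to a different algorithm (e.g. binomial)
mpirun -np 3 -mca btl vader,self \
       -mca coll_tuned_use_dynamic_rules 1 \
       -mca coll_tuned_bcast_algorithm 6 ./a.out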

Josh




On Mon, Feb 23, 2015 at 4:13 PM, Nathan Hjelm <hje...@lanl.gov> wrote:

>
> Josh, do you see a hang when using vader? It is preferred over the old
> sm btl.
>
> -Nathan
>
> On Mon, Feb 23, 2015 at 03:48:17PM -0500, Joshua Ladd wrote:
> >    Sachin,
> >
> >    I am able to reproduce something funny. Looks like your issue. When I run
> >    on a single host with two ranks, the test works fine. However, when I try
> >    three or more, it looks like only the root, rank 0, is making any progress
> >    after the first iteration.
> >
> >    $/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun -np 3
> >    -mca btl self,sm ./bcast_loop
> >    rank 0, m = 0
> >    rank 1, m = 0
> >    rank 2, m = 0
> >    rank 0, m = 1000
> >    rank 0, m = 2000
> >    rank 0, m = 3000
> >    rank 0, m = 4000
> >    rank 0, m = 5000
> >    rank 0, m = 6000
> >    rank 0, m = 7000
> >    rank 0, m = 8000
> >    rank 0, m = 9000
> >    rank 0, m = 10000
> >    rank 0, m = 11000
> >    rank 0, m = 12000
> >    rank 0, m = 13000
> >    rank 0, m = 14000
> >    rank 0, m = 15000
> >    rank 0, m = 16000   <----- Hanging
> >
> >    After hanging for a while, I get an OOM kernel panic message:
> >
> >    joshual@mngx-apl-01 ~
> >    $
> >    Message from syslogd@localhost at Feb 23 22:42:17 ...
> >     kernel:Kernel panic - not syncing: Out of memory: system-wide
> >    panic_on_oom is enabled
> >
> >    Message from syslogd@localhost at Feb 23 22:42:17 ...
> >     kernel:
> >
> >    With TCP BTL the result is sensible, i.e. I see three ranks reporting for
> >    each multiple of 1000:
> >    $/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun -np 3
> >    -mca btl self,tcp ./a.out
> >    rank 1, m = 0
> >    rank 2, m = 0
> >    rank 0, m = 0
> >    rank 0, m = 1000
> >    rank 2, m = 1000
> >    rank 1, m = 1000
> >    rank 1, m = 2000
> >    rank 0, m = 2000
> >    rank 2, m = 2000
> >    rank 0, m = 3000
> >    rank 2, m = 3000
> >    rank 1, m = 3000
> >    rank 0, m = 4000
> >    rank 1, m = 4000
> >    rank 2, m = 4000
> >    rank 0, m = 5000
> >    rank 2, m = 5000
> >    rank 1, m = 5000
> >    rank 0, m = 6000
> >    rank 1, m = 6000
> >    rank 2, m = 6000
> >    rank 2, m = 7000
> >    rank 1, m = 7000
> >    rank 0, m = 7000
> >    rank 0, m = 8000
> >    rank 2, m = 8000
> >    rank 1, m = 8000
> >    rank 0, m = 9000
> >    rank 2, m = 9000
> >    rank 1, m = 9000
> >    rank 2, m = 10000
> >    rank 0, m = 10000
> >    rank 1, m = 10000
> >    rank 1, m = 11000
> >    rank 0, m = 11000
> >    rank 2, m = 11000
> >    rank 2, m = 12000
> >    rank 1, m = 12000
> >    rank 0, m = 12000
> >    rank 1, m = 13000
> >    rank 0, m = 13000
> >    rank 2, m = 13000
> >    rank 1, m = 14000
> >    rank 2, m = 14000
> >    rank 0, m = 14000
> >    rank 1, m = 15000
> >    rank 0, m = 15000
> >    rank 2, m = 15000
> >    etc...
> >
> >    It looks like a bug in the SM BTL. I can poke some more at this tomorrow.
> >
> >    Josh
> >    On Sun, Feb 22, 2015 at 11:18 PM, Sachin Krishnan <sachk...@gmail.com>
> >    wrote:
> >
> >      George,
> >      I was able to run the code without any errors with an older version of
> >      Open MPI on another machine. It looks like a problem with my machine,
> >      as Josh pointed out.
> >      Adding --mca coll tuned or basic to the mpirun command resulted in an
> >      MPI_Init failed error with the following additional information for the
> >      Open MPI developer:
> >       mca_coll_base_comm_select(MPI_COMM_WORLD) failed
> >        --> Returned "Not found" (-13) instead of "Success" (0)
> >      Thanks for the help.
> >      Sachin
> >      On Mon, Feb 23, 2015 at 4:17 AM, George Bosilca <bosi...@icl.utk.edu>
> >      wrote:
> >
> >        Sachin,
> >        I can't replicate your issue with either the latest 1.8 or the trunk.
> >        I tried using a single host, while forcing SM and then TCP, to no
> >        avail.
> >        Can you try restricting the collective modules in use by adding --mca
> >        coll tuned,basic to your mpirun command?
> >          George.
> >        On Fri, Feb 20, 2015 at 9:31 PM, Sachin Krishnan <sachk...@gmail.com>
> >        wrote:
> >
> >          Josh,
> >          Thanks for the help.
> >          I'm running on a single host. How do I confirm that it is an issue
> >          with the shared memory?
> >          Sachin
> >          On Fri, Feb 20, 2015 at 11:58 PM, Joshua Ladd <jladd.m...@gmail.com>
> >          wrote:
> >
> >            Sachin,
> >
> >            Are you running this on a single host or across multiple hosts
> >            (i.e. are you communicating between processes via networking)? If
> >            it's on a single host, then it might be an issue with shared
> >            memory.
> >
> >            Josh
> >            On Fri, Feb 20, 2015 at 1:51 AM, Sachin Krishnan
> >            <sachk...@gmail.com> wrote:
> >
> >              Hello Josh,
> >
> >              The command I use to compile the code is:
> >
> >              mpicc bcast_loop.c
> >
> >              To run the code I use:
> >
> >              mpirun -np 2 ./a.out
> >
> >              Output is unpredictable. It gets stuck at different places.
> >
> >              I'm attaching lstopo and ompi_info outputs. Do you need any other
> >              info?
> >
> >              lstopo-no-graphics output:
> >
> >              Machine (3433MB)
> >
> >                Socket L#0 + L3 L#0 (8192KB)
> >
> >                  L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
> >
> >                    PU L#0 (P#0)
> >
> >                    PU L#1 (P#4)
> >
> >                  L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
> >
> >                    PU L#2 (P#1)
> >
> >                    PU L#3 (P#5)
> >
> >                  L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
> >
> >                    PU L#4 (P#2)
> >
> >                    PU L#5 (P#6)
> >
> >                  L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
> >
> >                    PU L#6 (P#3)
> >
> >                    PU L#7 (P#7)
> >
> >                HostBridge L#0
> >
> >                  PCI 8086:0162
> >
> >                    GPU L#0 "card0"
> >
> >                    GPU L#1 "renderD128"
> >
> >                    GPU L#2 "controlD64"
> >
> >                  PCI 8086:1502
> >
> >                    Net L#3 "eth0"
> >
> >                  PCI 8086:1e02
> >
> >                    Block L#4 "sda"
> >
> >                    Block L#5 "sr0"
> >
> >              ompi_info output:
> >
> >                               Package: Open MPI builduser@anatol
> Distribution
> >
> >                              Open MPI: 1.8.4
> >
> >                Open MPI repo revision: v1.8.3-330-g0344f04
> >
> >                 Open MPI release date: Dec 19, 2014
> >
> >                              Open RTE: 1.8.4
> >
> >                Open RTE repo revision: v1.8.3-330-g0344f04
> >
> >                 Open RTE release date: Dec 19, 2014
> >
> >                                  OPAL: 1.8.4
> >
> >                    OPAL repo revision: v1.8.3-330-g0344f04
> >
> >                     OPAL release date: Dec 19, 2014
> >
> >                               MPI API: 3.0
> >
> >                          Ident string: 1.8.4
> >
> >                                Prefix: /usr
> >
> >               Configured architecture: i686-pc-linux-gnu
> >
> >                        Configure host: anatol
> >
> >                         Configured by: builduser
> >
> >                         Configured on: Sat Dec 20 17:00:34 PST 2014
> >
> >                        Configure host: anatol
> >
> >                              Built by: builduser
> >
> >                              Built on: Sat Dec 20 17:12:16 PST 2014
> >
> >                            Built host: anatol
> >
> >                            C bindings: yes
> >
> >                          C++ bindings: yes
> >
> >                           Fort mpif.h: yes (all)
> >
> >                          Fort use mpi: yes (full: ignore TKR)
> >
> >                     Fort use mpi size: deprecated-ompi-info-value
> >
> >                      Fort use mpi_f08: yes
> >
> >               Fort mpi_f08 compliance: The mpi_f08 module is available,
> but
> >              due to
> >
> >                                        limitations in the
> /usr/bin/gfortran
> >              compiler, does
> >
> >                                        not support the following: array
> >              subsections,
> >
> >                                        direct passthru (where possible)
> to
> >              underlying Open
> >
> >                                        MPI's C functionality
> >
> >                Fort mpi_f08 subarrays: no
> >
> >                         Java bindings: no
> >
> >                Wrapper compiler rpath: runpath
> >
> >                            C compiler: gcc
> >
> >                   C compiler absolute: /usr/bin/gcc
> >
> >                C compiler family name: GNU
> >
> >                    C compiler version: 4.9.2
> >
> >                          C++ compiler: g++
> >
> >                 C++ compiler absolute: /usr/bin/g++
> >
> >                         Fort compiler: /usr/bin/gfortran
> >
> >                     Fort compiler abs:
> >
> >                       Fort ignore TKR: yes (!GCC$ ATTRIBUTES
> NO_ARG_CHECK ::)
> >
> >                 Fort 08 assumed shape: yes
> >
> >                    Fort optional args: yes
> >
> >                        Fort INTERFACE: yes
> >
> >                  Fort ISO_FORTRAN_ENV: yes
> >
> >                     Fort STORAGE_SIZE: yes
> >
> >                    Fort BIND(C) (all): yes
> >
> >                    Fort ISO_C_BINDING: yes
> >
> >               Fort SUBROUTINE BIND(C): yes
> >
> >                     Fort TYPE,BIND(C): yes
> >
> >               Fort T,BIND(C,name="a"): yes
> >
> >                          Fort PRIVATE: yes
> >
> >                        Fort PROTECTED: yes
> >
> >                         Fort ABSTRACT: yes
> >
> >                     Fort ASYNCHRONOUS: yes
> >
> >                        Fort PROCEDURE: yes
> >
> >                         Fort C_FUNLOC: yes
> >
> >               Fort f08 using wrappers: yes
> >
> >                       Fort MPI_SIZEOF: yes
> >
> >                           C profiling: yes
> >
> >                         C++ profiling: yes
> >
> >                 Fort mpif.h profiling: yes
> >
> >                Fort use mpi profiling: yes
> >
> >                 Fort use mpi_f08 prof: yes
> >
> >                        C++ exceptions: no
> >
> >                        Thread support: posix (MPI_THREAD_MULTIPLE: no,
> OPAL
> >              support: yes,
> >
> >                                        OMPI progress: no, ORTE progress:
> yes,
> >              Event lib:
> >
> >                                        yes)
> >
> >                         Sparse Groups: no
> >
> >                Internal debug support: no
> >
> >                MPI interface warnings: yes
> >
> >                   MPI parameter check: runtime
> >
> >              Memory profiling support: no
> >
> >              Memory debugging support: no
> >
> >                       libltdl support: yes
> >
> >                 Heterogeneous support: no
> >
> >               mpirun default --prefix: no
> >
> >                       MPI I/O support: yes
> >
> >                     MPI_WTIME support: gettimeofday
> >
> >                   Symbol vis. support: yes
> >
> >                 Host topology support: yes
> >
> >                        MPI extensions:
> >
> >                 FT Checkpoint support: no (checkpoint thread: no)
> >
> >                 C/R Enabled Debugging: no
> >
> >                   VampirTrace support: yes
> >
> >                MPI_MAX_PROCESSOR_NAME: 256
> >
> >                  MPI_MAX_ERROR_STRING: 256
> >
> >                   MPI_MAX_OBJECT_NAME: 64
> >
> >                      MPI_MAX_INFO_KEY: 36
> >
> >                      MPI_MAX_INFO_VAL: 256
> >
> >                     MPI_MAX_PORT_NAME: 1024
> >
> >                MPI_MAX_DATAREP_STRING: 128
> >
> >                         MCA backtrace: execinfo (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                          MCA compress: bzip (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                          MCA compress: gzip (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA crs: none (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                                MCA db: hash (MCA v2.0, API v1.0,
> Component
> >              v1.8.4)
> >
> >                                MCA db: print (MCA v2.0, API v1.0,
> Component
> >              v1.8.4)
> >
> >                             MCA event: libevent2021 (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                             MCA hwloc: external (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                                MCA if: posix_ipv4 (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                                MCA if: linux_ipv6 (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                       MCA installdirs: env (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                       MCA installdirs: config (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                        MCA memchecker: valgrind (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                            MCA memory: linux (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA pstat: linux (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA sec: basic (MCA v2.0, API v1.0,
> Component
> >              v1.8.4)
> >
> >                             MCA shmem: mmap (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA shmem: posix (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA shmem: sysv (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA timer: linux (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA dfs: app (MCA v2.0, API v1.0, Component
> >              v1.8.4)
> >
> >                               MCA dfs: orted (MCA v2.0, API v1.0,
> Component
> >              v1.8.4)
> >
> >                               MCA dfs: test (MCA v2.0, API v1.0,
> Component
> >              v1.8.4)
> >
> >                            MCA errmgr: default_app (MCA v2.0, API v3.0,
> >              Component v1.8.4)
> >
> >                            MCA errmgr: default_hnp (MCA v2.0, API v3.0,
> >              Component v1.8.4)
> >
> >                            MCA errmgr: default_orted (MCA v2.0, API v3.0,
> >              Component
> >
> >                                        v1.8.4)
> >
> >                            MCA errmgr: default_tool (MCA v2.0, API v3.0,
> >              Component v1.8.4)
> >
> >                               MCA ess: env (MCA v2.0, API v3.0, Component
> >              v1.8.4)
> >
> >                               MCA ess: hnp (MCA v2.0, API v3.0, Component
> >              v1.8.4)
> >
> >                               MCA ess: singleton (MCA v2.0, API v3.0,
> >              Component v1.8.4)
> >
> >                               MCA ess: tool (MCA v2.0, API v3.0,
> Component
> >              v1.8.4)
> >
> >                             MCA filem: raw (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                           MCA grpcomm: bad (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA iof: hnp (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA iof: mr_hnp (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA iof: mr_orted (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                               MCA iof: orted (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA iof: tool (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA odls: default (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA oob: tcp (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA plm: isolated (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                               MCA plm: rsh (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA ras: loadleveler (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                               MCA ras: simulator (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                             MCA rmaps: lama (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA rmaps: mindist (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA rmaps: ppr (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                             MCA rmaps: rank_file (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                             MCA rmaps: resilient (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                             MCA rmaps: round_robin (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                             MCA rmaps: seq (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                             MCA rmaps: staged (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA rml: oob (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                            MCA routed: binomial (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                            MCA routed: debruijn (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                            MCA routed: direct (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                            MCA routed: radix (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA state: app (MCA v2.0, API v1.0, Component
> >              v1.8.4)
> >
> >                             MCA state: hnp (MCA v2.0, API v1.0, Component
> >              v1.8.4)
> >
> >                             MCA state: novm (MCA v2.0, API v1.0,
> Component
> >              v1.8.4)
> >
> >                             MCA state: orted (MCA v2.0, API v1.0,
> Component
> >              v1.8.4)
> >
> >                             MCA state: staged_hnp (MCA v2.0, API v1.0,
> >              Component v1.8.4)
> >
> >                             MCA state: staged_orted (MCA v2.0, API v1.0,
> >              Component v1.8.4)
> >
> >                             MCA state: tool (MCA v2.0, API v1.0,
> Component
> >              v1.8.4)
> >
> >                         MCA allocator: basic (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                         MCA allocator: bucket (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA bcol: basesmuma (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                              MCA bcol: ptpcoll (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA bml: r2 (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA btl: self (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA btl: sm (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA btl: tcp (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA btl: vader (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA coll: basic (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA coll: hierarch (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                              MCA coll: inter (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA coll: libnbc (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA coll: ml (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                              MCA coll: self (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA coll: sm (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                              MCA coll: tuned (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                               MCA dpm: orte (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA fbtl: posix (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA fcoll: dynamic (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA fcoll: individual (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                             MCA fcoll: static (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA fcoll: two_phase (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                             MCA fcoll: ylib (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                                MCA fs: ufs (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                                MCA io: ompio (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                                MCA io: romio (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA mpool: grdma (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                             MCA mpool: sm (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA osc: rdma (MCA v2.0, API v3.0,
> Component
> >              v1.8.4)
> >
> >                               MCA osc: sm (MCA v2.0, API v3.0, Component
> >              v1.8.4)
> >
> >                               MCA pml: v (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA pml: bfo (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA pml: cm (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA pml: ob1 (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                            MCA pubsub: orte (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                            MCA rcache: vma (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                               MCA rte: orte (MCA v2.0, API v2.0,
> Component
> >              v1.8.4)
> >
> >                              MCA sbgp: basesmsocket (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                              MCA sbgp: basesmuma (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                              MCA sbgp: p2p (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                          MCA sharedfp: individual (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                          MCA sharedfp: lockedfile (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >                          MCA sharedfp: sm (MCA v2.0, API v2.0, Component
> >              v1.8.4)
> >
> >                              MCA topo: basic (MCA v2.0, API v2.1,
> Component
> >              v1.8.4)
> >
> >                         MCA vprotocol: pessimist (MCA v2.0, API v2.0,
> >              Component v1.8.4)
> >
> >              Sachin
> >
> >              >Sachin,
> >
> >              >Can you, please, provide a command line? Additional information
> >              >about your system could be helpful also.
> >
> >              >Josh
> >
> >              >>On Wed, Feb 18, 2015 at 3:43 AM, Sachin Krishnan
> >              <sachkris_at_[hidden]> wrote:
> >
> >              >> Hello,
> >              >>
> >              >> I am new to MPI and also this list.
> >              >> I wrote an MPI code with several MPI_Bcast calls in a loop.
> >              >> My code was getting stuck at random points, i.e. it was not
> >              >> systematic. After a few hours of debugging and googling, I
> >              >> found that the issue may be with the several MPI_Bcast calls
> >              >> in a loop.
> >              >>
> >              >> I stumbled on this test code, which can reproduce the issue:
> >              >> https://github.com/fintler/ompi/blob/master/orte/test/mpi/bcast_loop.c
> >              >>
> >              >> I'm using Open MPI v1.8.4 installed from the official Arch
> >              >> Linux repo.
> >              >>
> >              >> Is it a known issue with Open MPI?
> >              >> Is it some problem with the way Open MPI is configured on my
> >              >> system?
> >              >>
> >              >> Thanks in advance.
> >              >>
> >              >> Sachin
> >              >>
> >              >>
> >              >>