Yes, this is well documented - it may be in the FAQ, and it has certainly come up on the users list multiple times.
The problem is that one process falls behind, which causes it to begin accumulating "unexpected messages" in its queue. This causes the matching logic to run a little slower, thus making the process fall further and further behind. Eventually things hang because everyone is sitting in bcast waiting for the slow proc to catch up, but its queue is saturated and it can't.

The solution is to do exactly what you describe - add some barriers to force the slow process to catch up. This happened often enough that we even added support for it in OMPI itself so you don't have to modify your code. Look at the following from "ompi_info --param coll sync":

    MCA coll: parameter "coll_base_verbose" (current value: <0>, data source: default value)
              Verbosity level for the coll framework (0 = no verbosity)
    MCA coll: parameter "coll_sync_priority" (current value: <50>, data source: default value)
              Priority of the sync coll component; only relevant if barrier_before or barrier_after is > 0
    MCA coll: parameter "coll_sync_barrier_before" (current value: <1000>, data source: default value)
              Do a synchronization before each Nth collective
    MCA coll: parameter "coll_sync_barrier_after" (current value: <0>, data source: default value)
              Do a synchronization after each Nth collective

Take your pick - inserting a barrier before or after doesn't seem to make a lot of difference, but most people use "before". Try different values until you get something that works for you.

On Nov 14, 2011, at 3:10 PM, Tom Rosmond wrote:

> Hello:
>
> A colleague and I have been running a large F90 application that does an enormous number of mpi_bcast calls during execution. I deny any responsibility for the design of the code and why it needs these calls, but it is what we have inherited and have to work with.
>
> Recently we ported the code to an 8 node, 6 processor/node NUMA system (lstopo output attached) running Debian Linux 6.0.3 with Open MPI 1.5.3, and began having trouble with mysterious 'hangs' in the program inside the mpi_bcast calls. The hangs were always in the same calls, but not necessarily at the same time during integration. We originally didn't have NUMA support, so we reinstalled with libnuma support added, but the problem persisted. Finally, just as a wild guess, we inserted 'mpi_barrier' calls just before the 'mpi_bcast' calls, and the program now runs without problems.
>
> I believe conventional wisdom is that properly formulated MPI programs should run correctly without barriers, so do you have any thoughts on why we found it necessary to add them? The code has run correctly on other architectures, e.g. the Cray XE6, so I don't think there is a bug anywhere. My only explanation is that some internal resource gets exhausted because of the large number of 'mpi_bcast' calls in rapid succession, and the barrier calls force synchronization, which allows the resource to be restored. Does this make sense? I'd appreciate any comments and advice you can provide.
>
> I have attached compressed copies of config.log and the ompi_info output for the system. The program is built with ifort 12.0 and typically runs with
>
>     mpirun -np 36 -bycore -bind-to-core program.exe
>
> We have run both interactively and with PBS, but that doesn't seem to make any difference in program behavior.
>
> T. Rosmond
>
> <lstopo_out.txt> <config.log.bz2> <ompi_info.bz2>
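P.S. In case it helps to see the two workarounds side by side, here is a minimal F90 sketch of the manual approach you already tried - an occasional barrier in front of the broadcast. The program name, variable names, iteration count, and the interval of 100 are purely illustrative, not taken from your code:

    program bcast_with_barrier
      use mpi
      implicit none
      integer :: ierr, rank, i
      real    :: buf(1000)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      do i = 1, 100000
         if (rank == 0) buf = real(i)
         ! An occasional barrier keeps a slow rank from piling up
         ! unexpected messages in its queue between broadcasts.
         if (mod(i, 100) == 0) call MPI_Barrier(MPI_COMM_WORLD, ierr)
         call MPI_Bcast(buf, size(buf), MPI_REAL, 0, MPI_COMM_WORLD, ierr)
      end do

      call MPI_Finalize(ierr)
    end program bcast_with_barrier

With the sync coll component you get the same effect without touching the source - something like

    mpirun -mca coll_sync_barrier_before 100 -np 36 -bycore -bind-to-core program.exe

where 100 is again just a starting point; tune it as described above.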