Fair enough. Thanks anyway!

On Nov 30, 2011, at 3:39 PM, Tom Rosmond wrote:
> Jeff,
>
> I'm afraid trying to produce a reproducer of this problem wouldn't be
> worth the effort. It is a legacy code that I wasn't involved in
> developing and that will soon be discarded, so I can't justify spending time
> trying to understand its behavior better. The bottom line is that it
> works correctly with the small 'sync' value, and because it isn't very
> expensive to run, that is enough for us.
>
> T. Rosmond
>
>
> On Wed, 2011-11-30 at 15:29 -0500, Jeff Squyres wrote:
>> Yes, but I'd like to see a reproducer that requires setting
>> sync_barrier_before=5. Your reproducers allowed much higher values, IIRC.
>>
>> I'm curious to know what makes that code require such a low value (i.e., 5)...
>>
>>
>> On Nov 30, 2011, at 1:50 PM, Ralph Castain wrote:
>>
>>> FWIW: we already have a reproducer from prior work I did chasing this down
>>> a couple of years ago. See orte/test/mpi/bcast_loop.c
>>>
>>>
>>> On Nov 29, 2011, at 9:35 AM, Jeff Squyres wrote:
>>>
>>>> That's quite weird/surprising that you would need to set it down to *5* --
>>>> that's really low.
>>>>
>>>> Can you share a simple reproducer code, perchance?
>>>>
>>>>
>>>> On Nov 15, 2011, at 11:49 AM, Tom Rosmond wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> Thanks for the advice. I have to set 'coll_sync_barrier_before=5' to do
>>>>> the job. This is a big change from the default value (1000), so our
>>>>> application seems to be a pretty extreme case.
>>>>>
>>>>> T. Rosmond
>>>>>
>>>>>
>>>>> On Mon, 2011-11-14 at 16:17 -0700, Ralph Castain wrote:
>>>>>> Yes, this is well documented - it may be in the FAQ, and it has certainly
>>>>>> come up on the user list multiple times.
>>>>>>
>>>>>> The problem is that one process falls behind, which causes it to begin
>>>>>> accumulating "unexpected messages" in its queue. This causes the
>>>>>> matching logic to run a little slower, thus making the process fall
>>>>>> further and further behind. Eventually, things hang because everyone is
>>>>>> sitting in bcast waiting for the slow proc to catch up, but its queue
>>>>>> is saturated and it can't.
>>>>>>
>>>>>> The solution is to do exactly what you describe - add some barriers to
>>>>>> force the slow process to catch up. This happened often enough that we even
>>>>>> added support for it in OMPI itself so you don't have to modify your
>>>>>> code. Look at the following from "ompi_info --param coll sync":
>>>>>>
>>>>>> MCA coll: parameter "coll_base_verbose" (current value: <0>, data source: default value)
>>>>>>           Verbosity level for the coll framework (0 = no verbosity)
>>>>>> MCA coll: parameter "coll_sync_priority" (current value: <50>, data source: default value)
>>>>>>           Priority of the sync coll component; only relevant if barrier_before or barrier_after is > 0
>>>>>> MCA coll: parameter "coll_sync_barrier_before" (current value: <1000>, data source: default value)
>>>>>>           Do a synchronization before each Nth collective
>>>>>> MCA coll: parameter "coll_sync_barrier_after" (current value: <0>, data source: default value)
>>>>>>           Do a synchronization after each Nth collective
>>>>>>
>>>>>> Take your pick - inserting a barrier before or after doesn't seem to
>>>>>> make a lot of difference, but most people use "before". Try different
>>>>>> values until you get something that works for you.
>>>>>>
>>>>>>
>>>>>> On Nov 14, 2011, at 3:10 PM, Tom Rosmond wrote:
>>>>>>
>>>>>>> Hello:
>>>>>>>
>>>>>>> A colleague and I have been running a large F90 application that does an
>>>>>>> enormous number of mpi_bcast calls during execution. I deny any
>>>>>>> responsibility for the design of the code and why it needs these calls,
>>>>>>> but it is what we have inherited and have to work with.
>>>>>>>
>>>>>>> Recently we ported the code to an 8 node, 6 processor/node NUMA system
>>>>>>> (lstopo output attached) running Debian Linux 6.0.3 with Open MPI 1.5.3,
>>>>>>> and began having trouble with mysterious 'hangs' in the program inside
>>>>>>> the mpi_bcast calls. The hangs were always in the same calls, but not
>>>>>>> necessarily at the same time during integration. We originally didn't
>>>>>>> have NUMA support, so we reinstalled with libnuma support added, but the
>>>>>>> problem persisted. Finally, just as a wild guess, we inserted
>>>>>>> 'mpi_barrier' calls just before the 'mpi_bcast' calls, and the program
>>>>>>> now runs without problems.
>>>>>>>
>>>>>>> I believe conventional wisdom is that properly formulated MPI programs
>>>>>>> should run correctly without barriers, so do you have any thoughts on
>>>>>>> why we found it necessary to add them? The code has run correctly on
>>>>>>> other architectures, e.g. the Cray XE6, so I don't think there is a bug
>>>>>>> anywhere. My only explanation is that some internal resource gets
>>>>>>> exhausted because of the large number of 'mpi_bcast' calls in rapid
>>>>>>> succession, and the barrier calls force synchronization, which allows the
>>>>>>> resource to be restored. Does this make sense? I'd appreciate any
>>>>>>> comments and advice you can provide.
>>>>>>>
>>>>>>> I have attached compressed copies of config.log and ompi_info output for
>>>>>>> the system. The program is built with ifort 12.0 and typically runs with
>>>>>>>
>>>>>>> mpirun -np 36 -bycore -bind-to-core program.exe
>>>>>>>
>>>>>>> We have run both interactively and with PBS, but that doesn't seem to
>>>>>>> make any difference in program behavior.
>>>>>>>
>>>>>>> T. Rosmond
>>>>>>>
>>>>>>> <lstopo_out.txt><config.log.bz2><ompi_info.bz2>

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
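As a concrete illustration of the workaround Tom describes above (an mpi_barrier placed
immediately before the troublesome mpi_bcast calls), here is a minimal sketch in C. The
real application is Fortran 90 and is not shown in the thread, so the wrapper name,
payload, and arguments below are illustrative assumptions, not anything from the actual
code.

/* Hypothetical sketch (in C; the real application is Fortran 90) of the
 * workaround described in the original post: an MPI_Barrier immediately
 * before each broadcast, so that no rank enters MPI_Bcast far ahead of
 * the others. The name bcast_with_barrier is illustrative only. */
#include <mpi.h>
#include <stdio.h>

static int bcast_with_barrier(void *buf, int count, MPI_Datatype type,
                              int root, MPI_Comm comm)
{
    int rc = MPI_Barrier(comm);    /* let stragglers catch up first */
    if (rc != MPI_SUCCESS)
        return rc;
    return MPI_Bcast(buf, count, type, root, comm);
}

int main(int argc, char **argv)
{
    int rank;
    double value = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42.0;              /* arbitrary payload from the root */

    bcast_with_barrier(&value, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    printf("rank %d received %g\n", rank, value);

    MPI_Finalize();
    return 0;
}

A barrier before every single broadcast adds synchronization cost on each call, which is
presumably why Open MPI's coll sync component described earlier in the thread only
inserts a barrier every Nth collective.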
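For the failure pattern Ralph describes (a slow rank saturating its unexpected-message
queue during a long run of back-to-back broadcasts), the sketch below shows the general
shape of a bcast loop with a periodic barrier. It is not the orte/test/mpi/bcast_loop.c
reproducer mentioned in the thread; the iteration count, payload size, and sync interval
are assumptions chosen only for illustration.

/* Minimal sketch (not orte/test/mpi/bcast_loop.c): many back-to-back
 * MPI_Bcast calls, with an MPI_Barrier every SYNC_INTERVAL iterations
 * to keep slow ranks from accumulating unexpected messages.
 * Build:  mpicc bcast_loop_sketch.c -o bcast_loop_sketch
 * Run:    mpirun -np 36 ./bcast_loop_sketch                           */
#include <mpi.h>
#include <stdio.h>

#define NUM_BCASTS    100000   /* illustrative: an "enormous number" of bcasts */
#define SYNC_INTERVAL 5        /* illustrative: the value reported to work here */

int main(int argc, char **argv)
{
    int rank, i;
    double buf[1024];          /* arbitrary payload */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < NUM_BCASTS; i++) {
        if (rank == 0)
            buf[0] = (double) i;       /* root fills the payload */

        /* Periodic barrier: forces stragglers to catch up before the
         * next broadcast, mimicking what coll_sync_barrier_before does
         * inside Open MPI. Remove it to see the original pattern. */
        if (i % SYNC_INTERVAL == 0)
            MPI_Barrier(MPI_COMM_WORLD);

        MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("completed %d broadcasts\n", NUM_BCASTS);

    MPI_Finalize();
    return 0;
}

The same effect is available without modifying the code by setting the MCA parameter
discussed in the thread on the command line, e.g.
mpirun --mca coll_sync_barrier_before 5 -np 36 program.exe
(5 being the value Tom reported as working for his application).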