Hi

@Ashley: What are the exact semantics of an asynchronous barrier, and is it part of the MPI specs?

Thanks
  Jody

On Thu, Sep 9, 2010 at 9:34 PM, Ashley Pittman <ash...@pittman.co.uk> wrote:
>
> On 9 Sep 2010, at 17:00, Gus Correa wrote:
>
>> Hello All
>>
>> Gabrielle's question, Ashley's recipe, and Dick Treutmann's cautionary
>> words, may be part of a larger context of load balance, or not?
>>
>> Would Ashley's recipe of sporadic barriers be a silver bullet to
>> improve load imbalance problems, regardless of which collectives or
>> even point-to-point calls are in use?
>
> No, it only holds where there is no data dependency between some of the
> ranks. In particular, if there are any non-rooted collectives in an iteration
> of your code then it cannot make any difference at all; likewise, if you have
> a reduce followed by a barrier using the same root, for example, then you
> already have global synchronisation each iteration and it won't help. My
> feeling is that it applies to a significant minority of problems; certainly
> the phrase "adding barriers can make codes faster" should be textbook stuff
> if it isn't already.
>
>> Would sporadic barriers in the flux coupler "shake up" these delays?
>
> I don't fully understand your description but it sounds like it might set the
> program back to a clean slate, which would give you per-iteration delays only
> rather than cumulative or worse delays.
>
>> Ashley: How did you get to the magic number of 25 iterations for the
>> sporadic barriers?
>
> Experience and finger in the air. The major factors in picking this number
> are the likelihood of a positive feedback cycle of delays happening, the
> delays these delays add and the cost of a barrier itself. Having too low a
> value will slightly reduce performance; having too high a value can
> drastically reduce performance.
>
> As a further item (because I like them) the asynchronous barrier is even
> better again if used properly, in the good case it doesn't cause any process
> to block ever so the cost is only that of the CPU cycles the code takes
> itself, in the bad case where it has to delay a rank then this tends to have
> a positive impact on performance.
>
>> Would it be application/communicator pattern dependent?
>
> Absolutely.
>
> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
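
The sporadic-barrier recipe quoted above (a blocking barrier every 25 iterations) might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `run_with_sporadic_barrier` and its parameters `do_iteration`, `barrier` and `interval` are hypothetical and do not come from the thread, and the barrier is passed in as a plain callable so the loop logic can be tried without an MPI installation.

```python
from typing import Callable


def run_with_sporadic_barrier(
    n_iters: int,
    do_iteration: Callable[[int], None],
    barrier: Callable[[], None],
    interval: int = 25,
) -> int:
    """Run n_iters iterations, calling barrier() after every interval-th one.

    All names here are hypothetical. Returns the number of barriers issued.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")

    barriers_issued = 0
    for i in range(1, n_iters + 1):
        # The application's own compute and communication for this iteration.
        do_iteration(i)

        # Every `interval` iterations, resynchronise all ranks so that
        # delays cannot keep accumulating from one iteration to the next.
        if i % interval == 0:
            barrier()
            barriers_issued += 1

    return barriers_issued
```

With mpi4py, for instance, one could pass `MPI.COMM_WORLD.Barrier` as `barrier`. The default of 25 is simply the value mentioned in the thread; as noted there, the right interval depends on the application and its communication pattern.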