Re: [OMPI users] MPI_Reduce performance

jody Thu, 9 Sep 2010 16:10:31 -0400

Hi
@Ashley:
What are the exact semantics of an asynchronous barrier,
and is it part of the MPI spec?


Thanks
  Jody

On Thu, Sep 9, 2010 at 9:34 PM, Ashley Pittman <ash...@pittman.co.uk> wrote:
>
> On 9 Sep 2010, at 17:00, Gus Correa wrote:
>
>> Hello All
>>
>> Gabrielle's question, Ashley's recipe, and Dick Treutmann's cautionary
>> words may be part of a larger context of load balance, or not?
>>
>> Would Ashley's recipe of sporadic barriers be a silver bullet to
>> improve load imbalance problems, regardless of which collectives or
>> even point-to-point calls are in use?
>
> No, it only holds where there is no data dependency between some of the
> ranks.  In particular, if there are any non-rooted collectives in an
> iteration of your code then it cannot make any difference at all.  Likewise,
> if you have a reduce followed by a barrier using the same root, for example,
> then you already have global synchronisation each iteration and it won't
> help.  My feeling is that it applies to a significant minority of problems;
> certainly the phrase "adding barriers can make codes faster" should be
> textbook stuff if it isn't already.
>
>> Would sporadic barriers in the flux coupler "shake up" these delays?
>
> I don't fully understand your description, but it sounds like it might set
> the program back to a clean slate, which would give you per-iteration delays
> only rather than cumulative or worse delays.
>
>> Ashley:  How did you get to the magic number of 25 iterations for the
>> sporadic barriers?
>
> Experience and finger in the air.  The major factors in picking this number
> are the likelihood of a positive feedback cycle of delays happening, the
> delays these delays add, and the cost of a barrier itself.  Having too low a
> value will slightly reduce performance; having too high a value can
> drastically reduce performance.
>
> As a further item (because I like them), the asynchronous barrier is even
> better again if used properly.  In the good case it doesn't cause any process
> to block, ever, so the cost is only that of the CPU cycles the code takes
> itself; in the bad case, where it has to delay a rank, this tends to have
> a positive impact on performance.
>
>> Would it be application/communicator pattern dependent?
>
> Absolutely.
>
> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
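
A toy sketch of the "clean slate" effect of a sporadic barrier discussed
in the quoted message, written in plain Python rather than MPI. Everything
in it is an assumption made for illustration: the function name max_lead,
the integer work costs, and the model itself, in which non-root ranks leave
a rooted reduce as soon as they have sent their contribution while the root
has to wait for every contribution. It only tracks how far the root lags
the fastest other rank; it does not model the cost of the barrier or of
queued messages, so it says nothing about the best interval.

```python
def max_lead(n_ranks, n_iters, root_work, other_work, barrier_every=None):
    """Return the largest time gap between the root (rank 0) and the
    fastest non-root rank, over n_iters iterations of a toy model.

    Hypothetical model, not MPI: each iteration every rank does some work
    and then takes part in a rooted reduce. Non-root ranks continue
    immediately; the root finishes only once all contributions are in.
    """
    if n_ranks < 2:
        raise ValueError("need a root and at least one other rank")

    clock = [0] * n_ranks  # time at which each rank finishes its iteration
    worst = 0
    for it in range(1, n_iters + 1):
        # Non-root ranks: do their work, send, and move on.
        for r in range(1, n_ranks):
            clock[r] += other_work
        # Root: does its own work, but cannot finish before the last
        # contribution for this iteration has arrived.
        clock[0] = max(clock[0] + root_work, max(clock[1:]))
        # How far behind the fastest non-root rank is the root right now?
        worst = max(worst, clock[0] - min(clock[1:]))
        # Sporadic barrier: everyone waits for the slowest rank.
        if barrier_every and it % barrier_every == 0:
            latest = max(clock)
            clock = [latest] * n_ranks
    return worst


if __name__ == "__main__":
    print("no barrier:      ", max_lead(4, 100, 6, 5))
    print("barrier every 25:", max_lead(4, 100, 6, 5, barrier_every=25))
```

With these made-up costs (root 6 units per iteration, others 5) the root
falls one time unit further behind on every iteration when there is no
barrier, whereas a barrier every 25 iterations caps the gap at what can
build up within one interval.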
