On 5 Mar 2009, at 15:25, Jeff Squyres wrote:
> I don't remember who originally said it, but I've repeated the statement: any MPI program that relies on a barrier for correctness is an incorrect MPI application.

I'm not 100% sure this holds, although it's a good rule of thumb. I've certainly written programs which need barriers, but they used one-sided comms, so that's slightly different.

> There's anecdotal evidence that throwing in a barrier every once in a while can help reduce unexpected messages (and other things), and therefore improve performance a bit. But that's very application dependent, and usually not frequent.

I've seen this a number of times. A number of algorithms work fairly well as long as things are vaguely in sync but slow down drastically if they are not, and without barriers there is no way to recover from this slowdown. Basically, if one rank is slow for whatever reason, other ranks try to communicate with it, the unexpected messages cause it to slow down further, and you get a positive feedback loop.

I sometimes feel that barriers have a bad reputation, maybe because they can be used to hide sloppy coding and allow incorrect MPI applications to run. I don't see that as a reason not to use them, however; just be sure you need one.

On 5 Mar 2009, at 15:52, Shanyuan Gao wrote:
> My current research is trying to rewrite some collective MPI operations to work with our system. Barrier is my first step, maybe I will have bcast and reduce in the future. I understand that some applications used too many unnecessary barriers. But here what I want is just an application to measure the performance improvement versus normal MPI_Barrier. And the improvement can only be measured if the barriers are executed many times. I have done some synthetic tests, all I need now are real applications.

I've done a lot of work on Barrier and on collectives in general. My advice would be to implement a non-blocking barrier. Barriers can be slow and *always* delay the application for the duration of the barrier. If you can write a non-blocking barrier and pipeline it with your application steps, then, assuming the application is working well, the CPU cost of the barrier is almost zero (I got it down to 0.15 µs), and if the application isn't working well the barrier will still bring it back in step.

Another interesting challenge is to benchmark MPI_Barrier; it's not as easy as you might think...

Ashley Pittman.
