Jeff Squyres wrote:
As you have noted, MPI_Barrier is the *only* collective operation that MPI guarantees to have any synchronization properties (and it's a fairly weak guarantee at that; no process will exit the barrier until every process has entered the barrier -- but there's no guarantee that all processes leave the barrier at the same time).
Actually, many collectives have that property due to data-causality conditions. E.g., MPI_Allreduce cannot return on any process until every process has entered the call, because each process's result depends on every other process's contribution.
As Jeff mentions, however, exit times can be "ragged" (and unfortunately often are).
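To make the data-causality point concrete, here is a small sketch in Python. It is a shared-memory analogy, not MPI code: threads stand in for MPI processes, and the names `ToyAllreduce` and `run` are hypothetical. Each thread adds its value to a shared sum and then waits until all n contributions have arrived, so no thread can return before the last one has entered. Once released, though, the threads leave in whatever order the scheduler picks, which mirrors the "ragged" exits.

```python
import threading


class ToyAllreduce:
    """One-shot sum-allreduce among n threads (a stand-in for n MPI processes).

    Hypothetical illustration only; real MPI libraries do not implement
    MPI_Allreduce this way. It just has the same data dependency: the
    result needs every participant's contribution.
    """

    def __init__(self, n):
        self.n = n
        self.total = 0
        self.arrived = 0
        self.cond = threading.Condition()
        self.events = []  # ("enter" | "exit", rank) in the order they happened

    def allreduce(self, rank, value):
        with self.cond:
            self.events.append(("enter", rank))
            self.total += value
            self.arrived += 1
            self.cond.notify_all()
            # Data causality: the sum is unknown until every rank has contributed.
            while self.arrived < self.n:
                self.cond.wait()
            result = self.total
            self.events.append(("exit", rank))
        return result


def run(n, values):
    """Start n threads, one per 'rank', and return (results, event log)."""
    ar = ToyAllreduce(n)
    results = [None] * n

    def worker(rank):
        results[rank] = ar.allreduce(rank, values[rank])

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, ar.events
```

For example, `run(4, [1, 2, 3, 4])` gives `[10, 10, 10, 10]` as the results, and in the event log every `"enter"` precedes every `"exit"`; the order of the `"exit"` entries, however, can differ from run to run.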