This problem as nothing to do with stdout...

Example with 3 processes:

P0 hits barrier at t=12
P1 hits barrier at t=27
P2 hits barrier at t=41

In this situation:
P0 waits 41-12 = 29
P1 waits 41-27 = 14
P2 waits 41-41 = 00

So I should see something  like (no ordering is expected):
barrier_time = 14
barrier_time = 00
barrier_time = 29

But what I see is much more like
barrier_time = 22
barrier_time = 29
barrier_time = 25

See? No process has a barrier_time equal to zero !!!



Le 8 sept. 2011 à 14:55, Jeff Squyres a écrit :

> The order in which you see stdout printed from mpirun is not necessarily 
> reflective of what order things were actually printers.  Remember that the 
> stdout from each MPI process needs to flow through at least 3 processes and 
> potentially across the network before it is actually displayed on mpirun's 
> stdout.
> 
> MPI process -> local Open MPI daemon -> mpirun -> printed to mpirun's stdout
> 
> Hence, the ordering of stdout can get transposed.
> 
> 
> On Sep 8, 2011, at 8:49 AM, Ghislain Lartigue wrote:
> 
>> Thank you for this explanation but indeed this confirms that the LAST 
>> process that hits the barrier should go through nearly instantaneously 
>> (except for the broadcast time for the acknowledgment signal).
>> And this is not what happens in my code : EVERY process waits for a very 
>> long time before going through the barrier (thousands of times more than a 
>> broadcast)...
>> 
>> 
>> Le 8 sept. 2011 à 14:26, Jeff Squyres a écrit :
>> 
>>> Order in which processes hit the barrier is only one factor in the time it 
>>> takes for that process to finish the barrier.
>>> 
>>> An easy way to think of a barrier implementation is a "fan in/fan out" 
>>> model.  When each nonzero rank process calls MPI_BARRIER, it sends a 
>>> message saying "I have hit the barrier!" (it usually sends it to its parent 
>>> in a tree of all MPI processes in the communicator, but you can simplify 
>>> this model and consider that it sends it to rank 0).  Rank 0 collects all 
>>> of these messages.  When it has messages from all processes in the 
>>> communicator, it sends out "ok, you can leave the barrier now" messages 
>>> (again, it's usually via a tree distribution, but you can pretend that it 
>>> directly, linearly sends a message to each peer process in the 
>>> communicator).
>>> 
>>> Hence, the time that any individual process spends in the communicator is 
>>> relative to when every other process enters the communicator.  But it's 
>>> also dependent upon communication speed, congestion in the network, etc.
>>> 
>>> 
>>> On Sep 8, 2011, at 6:20 AM, Ghislain Lartigue wrote:
>>> 
>>>> Hello,
>>>> 
>>>> at a given point in my (Fortran90) program, I write:
>>>> 
>>>> ===================
>>>> start_time = MPI_Wtime()
>>>> call MPI_BARRIER(...)
>>>> new_time = MPI_Wtime() - start_time
>>>> write(*,*) "barrier time =",new_time
>>>> ==================
>>>> 
>>>> and then I run my code...
>>>> 
>>>> I expected that the values of "new_time" would range from 0 to Tmax (1700 
>>>> in my case)
>>>> As I understand it, the first process that hits the barrier should print 
>>>> Tmax and the last process that hits the barrier should print 0 (or a very 
>>>> low value).
>>>> 
>>>> But this is not the case: all processes print values in the range 
>>>> 1400-1700!
>>>> 
>>>> Any explanation?
>>>> 
>>>> Thanks,
>>>> Ghislain.
>>>> 
>>>> PS:
>>>> This small code behaves perfectly in other parts of my code...
>>>> 
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>> 
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


Reply via email to