and to fix things, the units I use are not the direct result of MPI_Wtime():

    new_time = (MPI_Wtime() - start_time)*1e9/(36**3)

In other words, the raw MPI_Wtime() difference in seconds is scaled by 1e9/36**3 (roughly 21'400), so one printed unit corresponds to 36**3 = 46656 ns, i.e. about 46.7 microsec.
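For concreteness, a minimal, self-contained sketch of that instrumentation (the program name and the exact output format are illustrative, not from the actual code; the 36**3 divisor is assumed to be a problem-size normalization, e.g. cells per rank):

===================
! Illustrative sketch: times an MPI_Barrier and prints the elapsed time
! in the normalized units discussed above (1 unit = 36**3 ns = 46656 ns).
program barrier_timing
  use mpi
  implicit none
  integer :: ierr, rank
  double precision :: start_time, new_time

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  start_time = MPI_Wtime()
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  ! Seconds -> normalized units: *1e9 converts to ns, /36**3 rescales.
  new_time = (MPI_Wtime() - start_time) * 1d9 / 36**3

  write(*,*) 'rank', rank, 'barrier', new_time
  call MPI_Finalize(ierr)
end program barrier_timing
===================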
On 8 Sep 2011, at 16:42, Ghislain Lartigue wrote:

> I will check that, but as I said in my first email, this strange behaviour
> happens only in one place in my code.
> I have the same time/barrier/time procedure in other places (in the same
> code) and it works perfectly.
>
> In one place I have the following output (sorted):
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 1.0
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 1.0
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 14.2
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 16.3
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 25.1
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 28.4
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 32.6
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 35.3
> ...
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 90.1
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 96.3
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 99.5
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 101.2
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 119.3
> <00>(0) CAST GHOST DATA1 LOOP 1 barrier 169.3
>
> but in the place that concerns me I have (sorted):
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1386.9
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1401.5
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1412.9
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1414.1
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1419.6
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1428.1
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1430.4
> ...
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1632.7
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1635.7
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1660.6
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1685.1
> <00>(0) CAST GHOST DATA2 LOOP 2 barrier 1699.2
>
> These are the same units...
> You can see that in the first case the time to "hit/wait/leave" the
> barrier can be very small, compared to the second output...
>
> On 8 Sep 2011, at 16:35, Teng Ma wrote:
>
>> You'd better check process-core binding in your case. It looks to me
>> like P0 and P1 are on the same node and P2 is on another node, which
>> makes the ack to P0/P1 go through shared memory and the ack to P2 go
>> over the network.
>> 1000x is very possible: shared-memory (sm) latency can be about
>> 0.03 microsec, while Ethernet latency is about 20-30 microsec.
>>
>> Just my guess...
>>
>> Teng
>>
>>> Thanks,
>>>
>>> I understand this, but the delays that I measure are huge compared to
>>> a classical ack procedure... (1000x more)
>>> And this is repeatable: as far as I understand it, this shows that
>>> the network is not involved.
>>>
>>> Ghislain.
>>>
>>> On 8 Sep 2011, at 16:16, Teng Ma wrote:
>>>
>>>> I guess you forgot to count the "leaving time" (fan-out). When
>>>> everyone has hit the barrier, each process still needs an "ack" to
>>>> leave. And remember, in most cases the leader process will send out
>>>> the "acks" sequentially. It's very possible that:
>>>>
>>>> P0 barrier time = 29 + send/recv ack 0
>>>> P1 barrier time = 14 + send ack 0 + send/recv ack 1
>>>> P2 barrier time = 0 + send ack 0 + send ack 1 + send/recv ack 2
>>>>
>>>> That's your measured time.
>>>>
>>>> Teng
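One way to separate the fan-in wait from the fan-out "acks" that Teng describes is to gather every rank's barrier entry and exit times on rank 0. A sketch, with an illustrative subroutine name (note the clock caveat in the comments):

===================
! Sketch: gather each rank's barrier entry/exit times on rank 0, so the
! "waiting for the last arrival" part (fan-in) can be told apart from
! the "waiting for the release ack" part (fan-out).
! Caveat: MPI_Wtime() clocks are only directly comparable across ranks
! when MPI_WTIME_IS_GLOBAL is true, which is rarely the case across
! nodes; within one node the comparison is usually still meaningful.
subroutine barrier_diagnostic(comm)
  use mpi
  implicit none
  integer, intent(in) :: comm
  integer :: ierr, rank, nprocs, i
  double precision :: t_enter, t_exit
  double precision, allocatable :: all_enter(:), all_exit(:)

  call MPI_Comm_rank(comm, rank, ierr)
  call MPI_Comm_size(comm, nprocs, ierr)
  allocate(all_enter(nprocs), all_exit(nprocs))

  t_enter = MPI_Wtime()
  call MPI_Barrier(comm, ierr)
  t_exit = MPI_Wtime()

  call MPI_Gather(t_enter, 1, MPI_DOUBLE_PRECISION, all_enter, 1, &
                  MPI_DOUBLE_PRECISION, 0, comm, ierr)
  call MPI_Gather(t_exit, 1, MPI_DOUBLE_PRECISION, all_exit, 1, &
                  MPI_DOUBLE_PRECISION, 0, comm, ierr)

  if (rank == 0) then
     do i = 1, nprocs
        ! fan-in wait: last arrival minus this rank's arrival
        ! fan-out cost: this rank's exit minus the last arrival
        write(*,*) 'rank', i-1, &
             ' fan-in ', maxval(all_enter) - all_enter(i), &
             ' fan-out ', all_exit(i) - maxval(all_enter)
     end do
  end if
  deallocate(all_enter, all_exit)
end subroutine barrier_diagnostic
===================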
>>>>> This problem has nothing to do with stdout...
>>>>>
>>>>> Example with 3 processes:
>>>>>
>>>>> P0 hits the barrier at t=12
>>>>> P1 hits the barrier at t=27
>>>>> P2 hits the barrier at t=41
>>>>>
>>>>> In this situation:
>>>>> P0 waits 41-12 = 29
>>>>> P1 waits 41-27 = 14
>>>>> P2 waits 41-41 = 00
>>>>>
>>>>> So I should see something like (no ordering is expected):
>>>>> barrier_time = 14
>>>>> barrier_time = 00
>>>>> barrier_time = 29
>>>>>
>>>>> But what I see is much more like:
>>>>> barrier_time = 22
>>>>> barrier_time = 29
>>>>> barrier_time = 25
>>>>>
>>>>> See? No process has a barrier_time equal to zero!!!
>>>>>
>>>>> On 8 Sep 2011, at 14:55, Jeff Squyres wrote:
>>>>>
>>>>>> The order in which you see stdout printed from mpirun is not
>>>>>> necessarily reflective of the order in which things were actually
>>>>>> printed. Remember that the stdout from each MPI process needs to
>>>>>> flow through at least 3 processes, and potentially across the
>>>>>> network, before it is actually displayed on mpirun's stdout:
>>>>>>
>>>>>> MPI process -> local Open MPI daemon -> mpirun -> printed to
>>>>>> mpirun's stdout
>>>>>>
>>>>>> Hence, the ordering of stdout can get transposed.
>>>>>>
>>>>>> On Sep 8, 2011, at 8:49 AM, Ghislain Lartigue wrote:
>>>>>>
>>>>>>> Thank you for this explanation, but indeed it confirms that the
>>>>>>> LAST process that hits the barrier should go through nearly
>>>>>>> instantaneously (except for the broadcast time of the
>>>>>>> acknowledgment signal).
>>>>>>> And this is not what happens in my code: EVERY process waits for
>>>>>>> a very long time before going through the barrier (thousands of
>>>>>>> times longer than a broadcast)...
>>>>>>>
>>>>>>> On 8 Sep 2011, at 14:26, Jeff Squyres wrote:
>>>>>>>
>>>>>>>> The order in which processes hit the barrier is only one factor
>>>>>>>> in the time it takes for each process to finish the barrier.
>>>>>>>>
>>>>>>>> An easy way to think of a barrier implementation is a "fan
>>>>>>>> in/fan out" model. When each nonzero-rank process calls
>>>>>>>> MPI_BARRIER, it sends a message saying "I have hit the
>>>>>>>> barrier!" (it usually sends it to its parent in a tree of all
>>>>>>>> MPI processes in the communicator, but you can simplify this
>>>>>>>> model and consider that it sends it to rank 0). Rank 0 collects
>>>>>>>> all of these messages. When it has messages from all processes
>>>>>>>> in the communicator, it sends out "ok, you can leave the
>>>>>>>> barrier now" messages (again, usually via a tree distribution,
>>>>>>>> but you can pretend that it directly, linearly sends a message
>>>>>>>> to each peer process in the communicator).
>>>>>>>>
>>>>>>>> Hence, the time that any individual process spends in the
>>>>>>>> barrier depends on when every other process enters the barrier.
>>>>>>>> But it is also dependent upon communication speed, congestion
>>>>>>>> in the network, etc.
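Jeff's simplified model can be written out directly as a toy barrier. This is only the linear mental model described above, not Open MPI's actual (tree-based) implementation, and the subroutine name is illustrative:

===================
! Toy "fan in/fan out" barrier, matching the simplified model above:
! every nonzero rank reports to rank 0, then rank 0 releases everyone
! one by one. The sequential release loop is exactly the per-rank
! "ack" cost Teng mentions earlier in the thread.
subroutine toy_barrier(comm)
  use mpi
  implicit none
  integer, intent(in) :: comm
  integer :: ierr, rank, nprocs, i, token
  integer :: status(MPI_STATUS_SIZE)

  token = 0
  call MPI_Comm_rank(comm, rank, ierr)
  call MPI_Comm_size(comm, nprocs, ierr)

  if (rank == 0) then
     ! fan-in: wait for "I have hit the barrier!" from everyone
     do i = 1, nprocs - 1
        call MPI_Recv(token, 1, MPI_INTEGER, i, 0, comm, status, ierr)
     end do
     ! fan-out: release the ranks linearly
     do i = 1, nprocs - 1
        call MPI_Send(token, 1, MPI_INTEGER, i, 1, comm, ierr)
     end do
  else
     call MPI_Send(token, 1, MPI_INTEGER, 0, 0, comm, ierr)
     call MPI_Recv(token, 1, MPI_INTEGER, 0, 1, comm, status, ierr)
  end if
end subroutine toy_barrier
===================

Note that even the last process to arrive still waits for the release messages sent before its own, so a measured barrier time of exactly zero is not expected; it just should not be large.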
>>>>>>>> On Sep 8, 2011, at 6:20 AM, Ghislain Lartigue wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> at a given point in my (Fortran90) program, I write:
>>>>>>>>>
>>>>>>>>> ===================
>>>>>>>>> start_time = MPI_Wtime()
>>>>>>>>> call MPI_BARRIER(...)
>>>>>>>>> new_time = MPI_Wtime() - start_time
>>>>>>>>> write(*,*) "barrier time =", new_time
>>>>>>>>> ===================
>>>>>>>>>
>>>>>>>>> and then I run my code...
>>>>>>>>>
>>>>>>>>> I expected that the values of "new_time" would range from 0 to
>>>>>>>>> Tmax (1700 in my case).
>>>>>>>>> As I understand it, the first process that hits the barrier
>>>>>>>>> should print Tmax and the last process that hits the barrier
>>>>>>>>> should print 0 (or a very low value).
>>>>>>>>>
>>>>>>>>> But this is not the case: all processes print values in the
>>>>>>>>> range 1400-1700!
>>>>>>>>>
>>>>>>>>> Any explanation?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ghislain.
>>>>>>>>>
>>>>>>>>> PS:
>>>>>>>>> This small code behaves perfectly in other parts of my code...
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> jsquy...@cisco.com
>>>>>>>> For corporate legal information go to:
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>> | Teng Ma            Univ. of Tennessee |
>>>> | t...@cs.utk.edu      Knoxville, TN      |
>>>> | http://web.eecs.utk.edu/~tma/         |
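For scale, assuming the normalization given at the top of the thread (new_time = (MPI_Wtime() - start_time)*1e9/36**3) applies to all of the numbers quoted here: one printed unit is 36**3 = 46656 ns, about 46.7 microsec. The problematic values of 1400-1700 then correspond to roughly 65-79 ms spent in the barrier, while the well-behaved DATA1 values of 1.0-169.3 correspond to roughly 47 microsec to 7.9 ms.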