I guess you're perfectly right! I will try to test it tomorrow by putting a call to system("wait(X)") before the barrier!
Thanks,
Ghislain.

PS: if anyone has more information about the implementation of the MPI_IRECV() procedure, I would be glad to learn more about it!

On Sep 8, 2011, at 17:35, Eugene Loh wrote:

> I should know OMPI better than I do, but generally, when you make an MPI
> call, you could be diving into all kinds of other stuff. E.g., with
> non-blocking point-to-point operations, a message might make progress during
> another MPI call. E.g.,
>
> MPI_Irecv(recv_req)
> MPI_Isend(send_req)
> MPI_Wait(send_req)
> MPI_Wait(recv_req)
>
> A receive is started in one call and completed in another, but it's quite
> possible that most of the data transfer (and waiting time) occurs while the
> program is in the calls associated with the send. The accounting gets tricky.
>
> So, I'm guessing that during the second barrier, MPI is busy making progress
> on the pending non-blocking point-to-point operations, wherever progress is
> possible. It isn't purely a barrier operation.
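For illustration, here is a minimal, self-contained sketch of the effect described
above: two non-blocking operations are posted and a subsequent barrier is timed.
While the ranks sit in the barrier, the MPI library is free to progress the pending
transfers, so the measured "barrier" time can include communication time. The buffer
size, the ring-partner choice, and all names below are illustrative assumptions, not
taken from the code in this thread.

================================================================
program barrier_progress
  use mpi
  implicit none

  integer, parameter :: n = 1000000      ! assumed message size (doubles)
  integer :: ierr, rank, nprocs, partner
  integer :: send_req, recv_req
  integer :: status(MPI_STATUS_SIZE)
  double precision :: t0, t_barrier
  double precision, allocatable :: sendbuf(:), recvbuf(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  allocate(sendbuf(n), recvbuf(n))
  sendbuf = dble(rank)
  partner = mod(rank + 1, nprocs)        ! simple ring exchange, for illustration

  ! Post the non-blocking operations; little or no data may move yet.
  call MPI_Irecv(recvbuf, n, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE, 0, &
                 MPI_COMM_WORLD, recv_req, ierr)
  call MPI_Isend(sendbuf, n, MPI_DOUBLE_PRECISION, partner, 0, &
                 MPI_COMM_WORLD, send_req, ierr)

  ! Time the barrier: while ranks wait here, MPI may progress the pending
  ! Irecv/Isend, so this "barrier" time can include transfer time.
  t0 = MPI_Wtime()
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  t_barrier = MPI_Wtime() - t0
  if (rank == 0) print '(A,F12.6,A)', 'barrier after pending isend/irecv: ', &
                                      t_barrier, ' s'

  call MPI_Wait(send_req, status, ierr)
  call MPI_Wait(recv_req, status, ierr)

  deallocate(sendbuf, recvbuf)
  call MPI_Finalize(ierr)
end program barrier_progress
================================================================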
> On 9/8/2011 8:04 AM, Ghislain Lartigue wrote:
>> This behavior happens at every call (first and following).
>>
>> Here is my code (simplified):
>>
>> ================================================================
>> start_time = MPI_Wtime()
>> call mpi_ext_barrier()
>> new_time = MPI_Wtime() - start_time
>> write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
>> call print_message("CAST GHOST DATA2 LOOP 1 barrier "//trim(local_time),0)
>>
>> do conn_index_id = 1, Nconn(conn_type_id)
>>
>>    ! loop over data
>>    this_data => block%data
>>    do while (associated(this_data))
>>
>>       call MPI_IRECV(...)
>>       call MPI_ISEND(...)
>>
>>       this_data => this_data%next
>>    enddo
>>
>> enddo
>>
>> start_time = MPI_Wtime()
>> call mpi_ext_barrier()
>> new_time = MPI_Wtime() - start_time
>> write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
>> call print_message("CAST GHOST DATA2 LOOP 2 barrier "//trim(local_time),0)
>>
>> done = .false.
>> counter = 0
>> do while (.not.done)
>>    do ireq = 1, nreq
>>       if (recv_req(ireq) /= MPI_REQUEST_NULL) then
>>          call MPI_TEST(recv_req(ireq), found, mystatus, icommerr)
>>          if (found) then
>>             call ....
>>             counter = counter + 1
>>          endif
>>       endif
>>    enddo
>>    if (counter == nreq) then
>>       done = .true.
>>    endif
>> enddo
>> ================================================================
>>
>> The first call to the barrier works perfectly fine, but the second one gives
>> the strange behavior...
>>
>> Ghislain.
>>
>> On Sep 8, 2011, at 16:53, Eugene Loh wrote:
>>
>>> On 9/8/2011 7:42 AM, Ghislain Lartigue wrote:
>>>> I will check that, but as I said in my first email, this strange behaviour
>>>> happens only in one place in my code.
>>> Is the strange behavior on the first time, or much later on? (You seem to
>>> imply later on, but I thought I'd ask.)
>>>
>>> I agree the behavior is noteworthy, but it's plausible, and there's not
>>> enough information to explain it based solely on what you've said.
>>>
>>> Here is one scenario. I don't know if it applies to you since I know very
>>> little about what you're doing. I think with VampirTrace, you can collect
>>> performance data into large buffers. Occasionally, the buffers need to be
>>> flushed to disk. VampirTrace will wait for a good opportunity to do so --
>>> e.g., a global barrier. So, you execute lots of barriers, but suddenly you
>>> hit one where VT wants to flush to disk. This takes a long time, and
>>> everyone in that barrier spends a long time in it. Then execution resumes,
>>> and barrier performance again looks like it used to.
>>>
>>> Again, there are various scenarios to explain what you see. More
>>> information would be needed to decide which applies to you.
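As an aside on the completion loop in the quoted code above: the busy-poll over
MPI_TEST can also be written with MPI_WAITANY, which blocks until any one of the
pending receives has completed and lets MPI make progress internally. This is only
a sketch, assuming recv_req(1:nreq) holds the posted receive requests; the
subroutine process_received_data is a hypothetical stand-in for the elided
"call ....".

================================================================
subroutine wait_for_all_receives(nreq, recv_req)
  use mpi
  implicit none
  integer, intent(in)    :: nreq
  integer, intent(inout) :: recv_req(nreq)
  integer :: counter, ireq, icommerr
  integer :: mystatus(MPI_STATUS_SIZE)

  do counter = 1, nreq
     ! Block until any one of the pending receives completes.
     call MPI_Waitany(nreq, recv_req, ireq, mystatus, icommerr)
     ! MPI_Waitany frees the completed request and sets recv_req(ireq)
     ! to MPI_REQUEST_NULL, so it is not returned again.
     call process_received_data(ireq)   ! hypothetical stand-in for "call ...."
  enddo
end subroutine wait_for_all_receives
================================================================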