I guess you're perfectly right!
I will try to test it tomorrow by putting a call to system("wait(X)") before the
barrier!
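
Something like this is what I have in mind (an untested sketch, reusing the
timing pattern from my code below; the pause length is an arbitrary value, and
I am not sure the messages can actually make progress while the process sleeps
outside of MPI):

   start_time = MPI_Wtime()
   call system("sleep 5")        ! arbitrary pause before the barrier
   call mpi_ext_barrier()
   new_time = MPI_Wtime() - start_time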
Thanks,
Ghislain.
PS:
If anyone has more information about the implementation of the MPI_IRECV()
procedure, I would be glad to learn more about it!
On Sept. 8, 2011, at 17:35, Eugene Loh wrote:
> I should know OMPI better than I do, but generally, when you make an MPI
> call, you can end up executing all kinds of other work. In particular, with
> non-blocking point-to-point operations, a message may make progress during
> some other, unrelated MPI call. E.g.:
>
> MPI_Irecv(recv_req)
> MPI_Isend(send_req)
> MPI_Wait(send_req)
> MPI_Wait(recv_req)
>
> A receive is started in one call and completed in another, but it's quite
> possible that most of the data transfer (and waiting time) occurs while the
> program is in the calls associated with the send. The accounting gets tricky.
>
> So, I'm guessing during the second barrier, MPI is busy making progress on
> the pending non-blocking point-to-point operations, where progress is
> possible. It isn't purely a barrier operation.
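>
> For what it's worth, here is one way you could check that (a rough sketch of
> my own, not your code): drain the pending requests first and time that
> separately from the barrier. If my guess is right, the waiting time should
> move out of the barrier and into the MPI_Waitall.
>
>    ! illustration only; assumes recv_req(1:nreq) holds the outstanding
>    ! receive request handles, as in your code below
>    t0 = MPI_Wtime()
>    call MPI_Waitall(nreq, recv_req, MPI_STATUSES_IGNORE, ierr)
>    t1 = MPI_Wtime()        ! t1-t0 ~ time completing pending transfers
>    call MPI_Barrier(MPI_COMM_WORLD, ierr)
>    t2 = MPI_Wtime()        ! t2-t1 ~ time spent purely in the barrier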
>
> On 9/8/2011 8:04 AM, Ghislain Lartigue wrote:
>> This behavior happens on every call (the first and all subsequent ones).
>>
>>
>> Here is my code (simplified):
>>
>> ================================================================
>> start_time = MPI_Wtime()
>> call mpi_ext_barrier()
>> new_time = MPI_Wtime()-start_time
>> write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
>> call print_message("CAST GHOST DATA2 LOOP 1 barrier "//trim(local_time),0)
>>
>> do conn_index_id=1, Nconn(conn_type_id)
>>
>>    ! loop over data
>>    this_data => block%data
>>    do while (associated(this_data))
>>
>>       call MPI_IRECV(...)
>>       call MPI_ISEND(...)
>>
>>       this_data => this_data%next
>>    enddo
>>
>> enddo
>>
>> start_time = MPI_Wtime()
>> call mpi_ext_barrier()
>> new_time = MPI_Wtime()-start_time
>> write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
>> call print_message("CAST GHOST DATA2 LOOP 2 barrier "//trim(local_time),0)
>>
>> done = .false.
>> counter = 0
>> do while (.not.done)
>>    do ireq=1,nreq
>>       if (recv_req(ireq)/=MPI_REQUEST_NULL) then
>>          call MPI_TEST(recv_req(ireq),found,mystatus,icommerr)
>>          if (found) then
>>             call ....
>>             counter = counter+1
>>          endif
>>       endif
>>    enddo
>>    if (counter==nreq) then
>>       done = .true.
>>    endif
>> enddo
>> ================================================================
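>>
>> (Side note, in case the busy-wait matters for the timings: I believe the
>> polling loop above is essentially a hand-rolled MPI_WAITSOME, so a rough,
>> untested equivalent would be the sketch below.)
>>
>>    ! untested; assumes integer :: completed, outcount, i, indices(nreq)
>>    completed = 0
>>    do while (completed < nreq)
>>       call MPI_WAITSOME(nreq, recv_req, outcount, indices, &
>>                         MPI_STATUSES_IGNORE, icommerr)
>>       do i = 1, outcount
>>          call ....   ! same per-request handling as above, for indices(i)
>>       enddo
>>       completed = completed + outcount
>>    enddo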
>>
>> The first call to the barrier works perfectly fine, but the second one gives
>> the strange behavior...
>>
>> Ghislain.
>>
>> On Sept. 8, 2011, at 16:53, Eugene Loh wrote:
>>
>>> On 9/8/2011 7:42 AM, Ghislain Lartigue wrote:
>>>> I will check that, but as I said in my first email, this strange behaviour
>>>> happens only in one place in my code.
>>> Does the strange behavior occur the first time, or only much later on? (You
>>> seem to imply later on, but I thought I'd ask.)
>>>
>>> I agree the behavior is noteworthy, but it's plausible and there's not
>>> enough information to explain it based solely on what you've said.
>>>
>>> Here is one scenario. I don't know if it applies to you since I know very
>>> little about what you're doing. I think with VampirTrace, you can collect
>>> performance data into large buffers. Occasionally, the buffers need to be
>>> flushed to disk. VampirTrace will wait for a good opportunity to do so --
>>> e.g., a global barrier. So, you execute lots of barriers, but suddenly you
>>> hit one where VT wants to flush to disk. The flush takes a long time, so
>>> every process spends a long time in that one barrier. Then execution
>>> resumes and barrier performance looks like it did before.
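>>>
>>> If that scenario applies, you can usually make the flushes rarer by
>>> enlarging the trace buffer. From memory (please double-check the
>>> VampirTrace documentation), the relevant environment variables are
>>> something like:
>>>
>>>    export VT_BUFFER_SIZE=64M   # bigger in-memory trace buffer
>>>    export VT_MAX_FLUSHES=1     # flush count before VT stops tracing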
>>>
>>> Again, there are various scenarios to explain what you see. More
>>> information would be needed to decide which applies to you.
>>
> _______________________________________________
> users mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>