This behavior happens on every call (the first and all subsequent ones).
Here is my code (simplified):
================================================================
start_time = MPI_Wtime()
call mpi_ext_barrier()
new_time = MPI_Wtime()-start_time
write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
call print_message("CAST GHOST DATA2 LOOP 1 barrier "//trim(local_time),0)

do conn_index_id=1, Nconn(conn_type_id)
   ! loop over the linked list of data blocks
   this_data => block%data
   do while (associated(this_data))
      call MPI_IRECV(...)
      call MPI_ISEND(...)
      this_data => this_data%next
   enddo
enddo
start_time = MPI_Wtime()
call mpi_ext_barrier()
new_time = MPI_Wtime()-start_time
write(local_time,'(F9.1)') new_time*1.0e9_WP/(36.0_WP*36.0_WP*36.0_WP)
call print_message("CAST GHOST DATA2 LOOP 2 barrier "//trim(local_time),0)
done=.false.
counter = 0
do while (.not.done)
   do ireq=1,nreq
      if (recv_req(ireq)/=MPI_REQUEST_NULL) then
         call MPI_TEST(recv_req(ireq),found,mystatus,icommerr)
         if (found) then
            call ....  ! handle the completed receive (elided)
            counter=counter+1
         endif
      endif
   enddo
   if (counter==nreq) then
      done=.true.
   endif
enddo
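
(Aside: a minimal sketch of an equivalent completion loop, not the original code. MPI_TEST sets a completed request to MPI_REQUEST_NULL, so the polling loop above can also be written with MPI_WAITANY, which blocks until some request finishes instead of spinning. The names recv_req, nreq, mystatus, and icommerr come from the snippet; handle_request is a hypothetical stand-in for the elided per-request handler.)

   counter = 0
   do while (counter < nreq)
      ! blocks until one of the pending receives completes;
      ! ireq returns the index of the completed request
      call MPI_WAITANY(nreq, recv_req, ireq, mystatus, icommerr)
      if (ireq /= MPI_UNDEFINED) then
         call handle_request(ireq)  ! hypothetical handler for that receive
         counter = counter + 1
      endif
   enddo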
================================================================
The first call to the barrier works perfectly fine, but the second one gives
the strange behavior...
Ghislain.
On 8 Sept. 2011, at 16:53, Eugene Loh wrote:
> On 9/8/2011 7:42 AM, Ghislain Lartigue wrote:
>> I will check that, but as I said in my first email, this strange behaviour
>> happens in only one place in my code.
> Is the strange behavior on the first time, or much later on? (You seem to
> imply later on, but I thought I'd ask.)
>
> I agree the behavior is noteworthy, but it's plausible and there's not enough
> information to explain it based solely on what you've said.
>
> Here is one scenario. I don't know if it applies to you since I know very
> little about what you're doing. I think with VampirTrace, you can collect
> performance data into large buffers. Occasionally, the buffers need to be
> flushed to disk. VampirTrace will wait for a good opportunity to do so --
> e.g., a global barrier. So, you execute lots of barriers, but suddenly you
> hit one where VT wants to flush to disk. This takes a long time and everyone
> in the barrier spends a long time in the barrier. Then, execution resumes
> and barrier performance looks again like what it used to look like.
>
> Again, there are various scenarios to explain what you see. More information
> would be needed to decide which applies to you.
> _______________________________________________
> users mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>