amjad ali wrote:
Dear E. Loh.

Another is whether you can overlap communications and computation.  This does not require persistent channels, but only nonblocking communications (MPI_Isend/MPI_Irecv).  Again, there are no MPI guarantees here, so you may have to break your computation up and insert MPI_Test calls.

You may want to get the basic functionality working first and then run performance experiments to decide whether these really are areas that warrant such optimizations.

         CALL MPI_STARTALL(...)
         ! (A) perform work that can be done with local data
         CALL MPI_WAITALL(...)
         ! (B) perform work using the received data


In the above I have broken up the computation. In (A) I perform the work that can be done with local data. When the received data is required for the remaining computations, I put WAITALL to ensure that the data from the neighbouring processes has been received. I am fine with MPI_IRECV and MPI_ISEND, i.e.,

         CALL MPI_IRECV(...)
         CALL MPI_ISEND(...)
         ! (A) perform work that can be done with local data
         CALL MPI_WAITALL(...)
         ! (B) perform work using the received data
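
For reference (not part of the original message), a self-contained sketch of this IRECV/ISEND pattern could look as follows, assuming a 1-D decomposition in which each process exchanges one double-precision value with its left and right neighbours; the program name, the variables, and the (A)/(B) work are placeholders:

    program overlap_sketch
        use mpi
        implicit none
        integer :: rank, nprocs, left, right, ierr
        integer :: req(4)
        integer :: stats(MPI_STATUS_SIZE, 4)
        double precision :: sendl, sendr, recvl, recvr

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

        ! Neighbours in a non-periodic 1-D decomposition; MPI_PROC_NULL
        ! turns the exchange into a no-op at the domain ends.
        left  = rank - 1
        right = rank + 1
        if (left  < 0)       left  = MPI_PROC_NULL
        if (right >= nprocs) right = MPI_PROC_NULL

        sendl = dble(rank)      ! stand-ins for real boundary data
        sendr = dble(rank)

        ! Post all receives and sends up front.
        call MPI_IRECV(recvl, 1, MPI_DOUBLE_PRECISION, left,  0, MPI_COMM_WORLD, req(1), ierr)
        call MPI_IRECV(recvr, 1, MPI_DOUBLE_PRECISION, right, 1, MPI_COMM_WORLD, req(2), ierr)
        call MPI_ISEND(sendr, 1, MPI_DOUBLE_PRECISION, right, 0, MPI_COMM_WORLD, req(3), ierr)
        call MPI_ISEND(sendl, 1, MPI_DOUBLE_PRECISION, left,  1, MPI_COMM_WORLD, req(4), ierr)

        ! (A) work that needs only local data goes here.

        call MPI_WAITALL(4, req, stats, ierr)

        ! (B) work that needs recvl/recvr goes here.

        call MPI_FINALIZE(ierr)
    end program overlap_sketch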


But I am doubtful whether I am getting computation-communication overlap to save time, or whether I am getting the same performance as could be obtained by,

         CALL MPI_IRECV(...)
         CALL MPI_ISEND(...)
         CALL MPI_WAITALL(...)
         ! (A) perform work that can be done with local data
         ! (B) perform work using the received data


In this case (equivalent to blocking communication), I observed that it takes only around 5% more time.
Right.  Again, MPI makes no guarantees that communications are actually progressing between when you have posted nonblocking operations (like Isend or Irecv) and when you force them to complete with MPI_Wait calls.  Sometimes (depending on the MPI implementation and what interconnect is being used to effect a particular message), you have to decompose the computation more finely.  E.g., your situation might be:

    CALL MPI_ISEND()
    call my_work()      ! no MPI progress is being made here
    CALL MPI_WAIT()

and it's conceivable that you might have better performance with

    CALL MPI_ISEND()
    DO I = 1, N
        call do_a_little_of_my_work()  ! no MPI progress is being made here
        CALL MPI_TEST()            ! enough MPI progress is being made here that the receiver has something to do
    END DO
    CALL MPI_WAIT()

Whether performance improves or not is not guaranteed by the MPI standard.
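
As a concrete, hypothetical illustration of that finer decomposition, the sketch below splits the local work into chunks and calls MPI_TESTALL between chunks so the MPI library gets a chance to progress the pending messages; MPI_TESTALL is used rather than MPI_TEST only because two requests are outstanding here. The chunk count, the pairing of ranks, and the per-chunk "work" are all placeholders:

    program test_loop_sketch
        use mpi
        implicit none
        integer, parameter :: nchunks = 16
        integer :: rank, nprocs, partner, i, ierr
        integer :: req(2)
        integer :: stats(MPI_STATUS_SIZE, 2)
        logical :: done
        double precision :: sendbuf, recvbuf, acc

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

        ! Pair even and odd ranks; an unpaired last rank talks to MPI_PROC_NULL.
        if (mod(rank, 2) == 0) then
            partner = rank + 1
        else
            partner = rank - 1
        end if
        if (partner >= nprocs) partner = MPI_PROC_NULL

        sendbuf = dble(rank)
        call MPI_IRECV(recvbuf, 1, MPI_DOUBLE_PRECISION, partner, 0, MPI_COMM_WORLD, req(1), ierr)
        call MPI_ISEND(sendbuf, 1, MPI_DOUBLE_PRECISION, partner, 0, MPI_COMM_WORLD, req(2), ierr)

        acc = 0.0d0
        do i = 1, nchunks
            ! do_a_little_of_my_work: one slice of the local computation
            acc = acc + dble(i) * dble(rank + 1)
            ! Give MPI a chance to make progress; the completion flag is ignored.
            call MPI_TESTALL(2, req, done, stats, ierr)
        end do

        ! Make sure everything has completed before using recvbuf.
        call MPI_WAITALL(2, req, stats, ierr)

        call MPI_FINALIZE(ierr)
    end program test_loop_sketch

As noted above, whether this version is actually faster than a single MPI_WAIT depends on the MPI implementation and the interconnect.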
And the SECOND desire is to use persistent communication for even better speedup.
Right.  That's a separate issue.
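
For completeness, a minimal sketch of the persistent-communication variant, matching the MPI_STARTALL skeleton earlier in the thread: the requests are set up once with MPI_RECV_INIT/MPI_SEND_INIT, reused every iteration with MPI_STARTALL/MPI_WAITALL, and freed with MPI_REQUEST_FREE at the end. The iteration count, buffers, and neighbours are again placeholders:

    program persistent_sketch
        use mpi
        implicit none
        integer, parameter :: niter = 10
        integer :: rank, nprocs, left, right, it, i, ierr
        integer :: req(4)
        integer :: stats(MPI_STATUS_SIZE, 4)
        double precision :: sendl, sendr, recvl, recvr

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

        left  = rank - 1
        right = rank + 1
        if (left  < 0)       left  = MPI_PROC_NULL
        if (right >= nprocs) right = MPI_PROC_NULL

        ! Set up the persistent requests once, outside the iteration loop.
        call MPI_RECV_INIT(recvl, 1, MPI_DOUBLE_PRECISION, left,  0, MPI_COMM_WORLD, req(1), ierr)
        call MPI_RECV_INIT(recvr, 1, MPI_DOUBLE_PRECISION, right, 1, MPI_COMM_WORLD, req(2), ierr)
        call MPI_SEND_INIT(sendr, 1, MPI_DOUBLE_PRECISION, right, 0, MPI_COMM_WORLD, req(3), ierr)
        call MPI_SEND_INIT(sendl, 1, MPI_DOUBLE_PRECISION, left,  1, MPI_COMM_WORLD, req(4), ierr)

        do it = 1, niter
            sendl = dble(rank * it)         ! refresh the send buffers while the
            sendr = dble(rank * it)         ! requests are inactive
            call MPI_STARTALL(4, req, ierr)
            ! (A) work on local data goes here
            call MPI_WAITALL(4, req, stats, ierr)
            ! (B) work using recvl/recvr goes here
        end do

        ! Persistent requests must be freed explicitly.
        do i = 1, 4
            call MPI_REQUEST_FREE(req(i), ierr)
        end do

        call MPI_FINALIZE(ierr)
    end program persistent_sketch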
