Hi Jeff S.,
Thank you very much for your reply.
I am still a bit confused. Please advise.

> The idea is to do this:
>
>    MPI_Recv_init()
>    MPI_Send_init()
>    for (i = 0; i < 1000; ++i) {
>        MPI_Startall()
>        /* do whatever */
>        MPI_Waitall()
>    }
>    for (i = 0; i < 1000; ++i) {
>        MPI_Request_free()
>    }
>
> So in your inner loop, you just call MPI_Startall() and a corresponding
> MPI_Test* / MPI_Wait* call to complete those requests.
>
> The idea is that the MPI_*_init() functions do some one-time setup on the
> requests and then you just start and complete those same requests over and
> over and over.  When you're done, you free them.
>
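If I understand your suggestion correctly, a concrete C version of that skeleton would look roughly like the small program below. The ring neighbours, buffer names, COUNT, and TAG are placeholders I made up just to make the calls explicit; it is only a sketch of the pattern, not my actual code.

    #include <mpi.h>

    #define COUNT 1024   /* message length, made up for the example */
    #define TAG   99

    int main(int argc, char **argv)
    {
        double sendbuf[COUNT], recvbuf[COUNT];
        MPI_Request reqs[2];
        int rank, size, left, right;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        left  = (rank + size - 1) % size;   /* receive from left neighbour */
        right = (rank + 1) % size;          /* send to right neighbour     */

        /* one-time setup of the persistent requests */
        MPI_Recv_init(recvbuf, COUNT, MPI_DOUBLE, left,  TAG,
                      MPI_COMM_WORLD, &reqs[0]);
        MPI_Send_init(sendbuf, COUNT, MPI_DOUBLE, right, TAG,
                      MPI_COMM_WORLD, &reqs[1]);

        for (int i = 0; i < 1000; ++i) {
            MPI_Startall(2, reqs);                      /* start both transfers */
            /* ... do whatever work overlaps with the communication ... */
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* complete them        */
        }

        /* free the persistent requests when they are no longer needed */
        MPI_Request_free(&reqs[0]);
        MPI_Request_free(&reqs[1]);

        MPI_Finalize();
        return 0;
    }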
Actually, in my code what I was doing is:

CALL a subroutine-(1) 10000 times in the main program.

Subroutine-(1) starts ===================================

   Loop A starts here >>>>>>>>>>>>>>>>>>>> (three passes)
   Call subroutine-(2)

   Subroutine-(2) starts ----------------------------
         Pick local data from array U into separate arrays for each
         neighboring processor
         CALL MPI_IRECV for each neighboring process
         CALL MPI_ISEND for each neighboring process

         ------- perform work that can be done with local data only
         CALL MPI_WAITALL()
         ------- perform work using the received data
   Subroutine-(2) ends ----------------------------

         ------- perform work to update array U
   Loop A ends here >>>>>>>>>>>>>>>>>>>>

Subroutine-(1) ends ====================================
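In rough C terms, the exchange inside subroutine-(2) currently looks something like the fragment below (nnbr, nbr[], sendbuf[], recvbuf[], reqs[], COUNT, and TAG are placeholder names I am using only to show the call pattern; they are declared elsewhere in my code):

    /* pack the local parts of U destined for each neighbour */
    for (int n = 0; n < nnbr; ++n) {
        /* copy local data from U into sendbuf[n] */
    }
    for (int n = 0; n < nnbr; ++n)
        MPI_Irecv(recvbuf[n], COUNT, MPI_DOUBLE, nbr[n], TAG,
                  MPI_COMM_WORLD, &reqs[n]);
    for (int n = 0; n < nnbr; ++n)
        MPI_Isend(sendbuf[n], COUNT, MPI_DOUBLE, nbr[n], TAG,
                  MPI_COMM_WORLD, &reqs[nnbr + n]);

    /* ... work that needs only local data ... */

    MPI_Waitall(2 * nnbr, reqs, MPI_STATUSES_IGNORE);

    /* ... work that uses the received data ... */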

I assume that the above setup already overlaps computation with communication
(hiding communication behind computation).
My intention now is to use persistent communication to gain more efficiency,
but I am not sure how to map your proposed model onto my code. Please
advise.
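
To make my confusion concrete, my tentative understanding is the following
mapping (again in C, with the same placeholder names as above): the
MPI_IRECV/MPI_ISEND calls would be replaced by MPI_Recv_init()/MPI_Send_init()
done once, before the 10000 calls to subroutine-(1), and only
MPI_Startall()/MPI_Waitall() would remain inside subroutine-(2). This seems to
require that the buffer addresses, counts, and neighbour list never change
between iterations. Is that the correct way to apply your model here?

    /* once, before the 10000-iteration loop in the main program;
       reqs[], sendbuf[], recvbuf[], nbr[], nnbr are placeholders and must
       stay unchanged for the lifetime of the requests */
    for (int n = 0; n < nnbr; ++n) {
        MPI_Recv_init(recvbuf[n], COUNT, MPI_DOUBLE, nbr[n], TAG,
                      MPI_COMM_WORLD, &reqs[n]);
        MPI_Send_init(sendbuf[n], COUNT, MPI_DOUBLE, nbr[n], TAG,
                      MPI_COMM_WORLD, &reqs[nnbr + n]);
    }

    /* inside subroutine-(2), on every pass of loop A */
    for (int n = 0; n < nnbr; ++n) {
        /* repack sendbuf[n] from U; only the contents may change,
           not the buffer address or count */
    }
    MPI_Startall(2 * nnbr, reqs);
    /* ... work that needs only local data ... */
    MPI_Waitall(2 * nnbr, reqs, MPI_STATUSES_IGNORE);
    /* ... work that uses the received data ... */

    /* once, after the last call to subroutine-(1) */
    for (int n = 0; n < 2 * nnbr; ++n)
        MPI_Request_free(&reqs[n]);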

best regards,
AA.
