Hi Jeff S. Thank you very much for your reply. I am still somewhat confused; please guide me.
> The idea is to do this:
>
> MPI_Recv_init()
> MPI_Send_init()
> for (i = 0; i < 1000; ++i) {
>     MPI_Startall()
>     /* do whatever */
>     MPI_Waitall()
> }
> for (i = 0; i < 1000; ++i) {
>     MPI_Request_free()
> }
>
> So in your inner loop, you just call MPI_Startall() and a corresponding
> MPI_Test* / MPI_Wait* call to complete those requests.
>
> The idea is that the MPI_*_init() functions do some one-time setup on the
> requests and then you just start and complete those same requests over and
> over and over. When you're done, you free them.

Actually, what my code does is: CALL subroutine-(1) 10000 times from the main program.

Subroutine-(1) starts ===================================
  Loop A starts here >>>>>>>>>>>>>>>>>>>> (three passes)
    CALL subroutine-(2)
    Subroutine-(2) starts ----------------------------
      Pick local data from array U into a separate array for each neighboring process
      CALL MPI_IRECV for each neighboring process
      CALL MPI_ISEND for each neighboring process
      ------- perform work that can be done with local data only
      CALL MPI_WAITALL()
      ------- perform work using the received data
    Subroutine-(2) ends ----------------------------
    ------- perform work to update array U
  Loop A ends here >>>>>>>>>>>>>>>>>>>>
Subroutine-(1) ends ====================================

I assume that this setup already overlaps computation with communication (hiding communication behind computation). My intention now is to use persistent communication to gain more efficiency, but I am unsure how to map your proposed model onto this structure. Please suggest how; to make the question concrete, I have put a sketch of my current exchange, and of my attempt at a persistent version, in a P.S. below.

best regards,
AA.
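P.S. Here is a minimal C sketch of what my subroutine-(2) currently does. My real code is Fortran, and all names here (NNEIGH, COUNT, TAG, neighbor[], sendbuf, recvbuf, exchange) are hypothetical placeholders, not my actual variables; this only shows the shape of the exchange:

#include <mpi.h>

#define NNEIGH 4        /* hypothetical number of neighboring processes */
#define COUNT  1024     /* hypothetical message length per neighbor     */
#define TAG    99       /* hypothetical message tag                     */

void exchange(double sendbuf[NNEIGH][COUNT], double recvbuf[NNEIGH][COUNT],
              const int neighbor[NNEIGH])
{
    MPI_Request reqs[2 * NNEIGH];

    /* post all receives, then all sends, one pair per neighbor */
    for (int n = 0; n < NNEIGH; ++n) {
        MPI_Irecv(recvbuf[n], COUNT, MPI_DOUBLE, neighbor[n], TAG,
                  MPI_COMM_WORLD, &reqs[n]);
        MPI_Isend(sendbuf[n], COUNT, MPI_DOUBLE, neighbor[n], TAG,
                  MPI_COMM_WORLD, &reqs[NNEIGH + n]);
    }

    /* ... work that needs only local data goes here ... */

    MPI_Waitall(2 * NNEIGH, reqs, MPI_STATUSES_IGNORE);

    /* ... work that uses the received data goes here ... */
}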
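And here is my attempt at mapping your persistent model onto that shape, so you can tell me if I have it right. As I understand it, the MPI_Recv_init()/MPI_Send_init() calls would move out of subroutine-(2) entirely and be done once before all the iterations; the packing from U would have to reuse the same fixed send buffers every pass, since the persistent requests are bound to those buffers; and MPI_Request_free() would be called once at the very end. Same hypothetical names as above:

#include <mpi.h>

#define NNEIGH 4        /* hypothetical number of neighboring processes    */
#define COUNT  1024     /* hypothetical message length per neighbor        */
#define TAG    99       /* hypothetical message tag                        */
#define NITER  30000    /* stands for the 10000 calls x 3 passes of loop A */

/* Persistent requests are bound to fixed buffers, so these must be the
 * SAME arrays on every pass (repacked from U before each MPI_Startall). */
static double sendbuf[NNEIGH][COUNT];
static double recvbuf[NNEIGH][COUNT];

void run(const int neighbor[NNEIGH])
{
    MPI_Request reqs[2 * NNEIGH];

    /* one-time setup, done once before all iterations */
    for (int n = 0; n < NNEIGH; ++n) {
        MPI_Recv_init(recvbuf[n], COUNT, MPI_DOUBLE, neighbor[n], TAG,
                      MPI_COMM_WORLD, &reqs[n]);
        MPI_Send_init(sendbuf[n], COUNT, MPI_DOUBLE, neighbor[n], TAG,
                      MPI_COMM_WORLD, &reqs[NNEIGH + n]);
    }

    for (int iter = 0; iter < NITER; ++iter) {
        /* pack local data from U into the fixed sendbuf arrays here */
        MPI_Startall(2 * NNEIGH, reqs);   /* restart all persistent requests */
        /* ... work that needs only local data ... */
        MPI_Waitall(2 * NNEIGH, reqs, MPI_STATUSES_IGNORE);
        /* ... work that uses the received data, then update U ... */
    }

    /* free the persistent requests once, after the last iteration */
    for (int n = 0; n < 2 * NNEIGH; ++n)
        MPI_Request_free(&reqs[n]);
}

Is this the structure you intended?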