>
> You would break the MPI_Irecv and MPI_Isend calls up into two parts:
> MPI_Send_init and MPI_Recv_init in the first part and MPI_Start[all] in the
> second part.  The first part needs to be moved out of the subroutine... at
> least outside of the loop in sub1() and maybe even outside the
> 10000-iteration loop in the main program.  (There would also be
> MPI_Request_free calls that would similarly have to be moved out.)  If the
> overheads are small compared to the other work you're doing per message, the
> savings would be small.  (And, I'm guessing this is the case for you.)
> Further, the code refactoring might not be simple.  So, persistent
> communications *might* not be a fruitful optimization strategy for you.
> Just a warning.
>
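If I understand this correctly, the split for a single neighbor would look
roughly like the sketch below (just a fragment; nbr, sbuf, rbuf, n and req
stand in for my real names):

      ! set up once, outside the loops (this replaces the MPI_Irecv/MPI_Isend
      ! calls that used to sit inside them)
      CALL MPI_Recv_init(rbuf, n, MPI_DOUBLE_PRECISION, nbr, 0, &
                         MPI_COMM_WORLD, req(1), ierr)
      CALL MPI_Send_init(sbuf, n, MPI_DOUBLE_PRECISION, nbr, 0, &
                         MPI_COMM_WORLD, req(2), ierr)

      DO iter = 1, 10000
         ! ... fill sbuf for this iteration ...
         CALL MPI_Startall(2, req, ierr)     ! replaces the Irecv/Isend pair
         ! ... work that needs only local data ...
         CALL MPI_Waitall(2, req, MPI_STATUSES_IGNORE, ierr)
         ! ... work that uses rbuf ...
      END DO

      CALL MPI_Request_free(req(1), ierr)    ! freed once, after the loop
      CALL MPI_Request_free(req(2), ierr)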


Well! If I follow this strategy, then the picture should be as follows.
Correct?
Obviously sub1 and sub2 exist separately outside the main program; the
following is just for understanding.

Main program starts------@@@@@@@@@@@@@@@@@@@@@@@

  CALL MPI_RECV_INIT for each neighboring process
  CALL MPI_SEND_INIT for each neighboring process

  Loop calling subroutine1 starts--------------(10000 times in the main program)

    Call subroutine1

    Subroutine1 starts===================================
      Loop A starts here >>>>>>>>>>>>>>>>>>>> (three passes)
        Call subroutine2

        Subroutine2 starts----------------------------
          Pick local data from array U into separate arrays for each
          neighboring process
          CALL MPI_STARTALL
          -------perform work that can be done with local data
          CALL MPI_WAITALL( )
          -------perform work using the received data
        Subroutine2 ends------------------------------

        -------perform work to update array U
      Loop A ends here >>>>>>>>>>>>>>>>>>>>
    Subroutine1 ends====================================

  Loop calling subroutine1 ends----------------(10000 times in the main program)

  CALL MPI_Request_free( ) for each request

Main program ends------@@@@@@@@@@@@@@@@@@@@@@@
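In actual code I imagine the main program roughly like this (a sketch only;
halo_mod, nnbr, nbr and n are placeholders for my real declarations, and
sub1/sub2 are as in the outline, with sub2 sketched after my question below):

      MODULE halo_mod
         USE mpi
         IMPLICIT NONE
         INTEGER :: nnbr                     ! number of neighboring processes
         INTEGER, ALLOCATABLE :: nbr(:)      ! neighbor ranks
         INTEGER, ALLOCATABLE :: reqs(:)     ! 2*nnbr persistent requests
         DOUBLE PRECISION, ALLOCATABLE :: sbuf(:,:), rbuf(:,:)  ! one column per neighbor
      END MODULE halo_mod

      PROGRAM main
         USE halo_mod
         IMPLICIT NONE
         INTEGER :: ierr, i, iter, n
         CALL MPI_Init(ierr)
         ! ... set nnbr, nbr(:) and message length n here, then
         !     ALLOCATE sbuf(n,nnbr), rbuf(n,nnbr), reqs(2*nnbr) ...
         DO i = 1, nnbr                      ! set up once, outside all loops
            CALL MPI_Recv_init(rbuf(1,i), n, MPI_DOUBLE_PRECISION, nbr(i), 0, &
                               MPI_COMM_WORLD, reqs(i), ierr)
            CALL MPI_Send_init(sbuf(1,i), n, MPI_DOUBLE_PRECISION, nbr(i), 0, &
                               MPI_COMM_WORLD, reqs(nnbr+i), ierr)
         END DO
         DO iter = 1, 10000
            CALL sub1()                      ! sub1 runs loop A and calls sub2
         END DO
         DO i = 1, 2*nnbr                    ! freed once, after the 10000 loop
            CALL MPI_Request_free(reqs(i), ierr)
         END DO
         CALL MPI_Finalize(ierr)
      END PROGRAM main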


But I think in the above case the send and receive buffers would need to be
created in a GLOBAL module, or else passed through the subroutine argument
lists. And there is one point of confusion in the above: the send buffer
appears in the argument list of MPI_SEND_INIT(), but it only gets the values
to be sent inside sub2. Is that possible/correct?
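As I understand it, that should be fine: MPI_SEND_INIT only records the
buffer's address, and the contents are read each time between MPI_STARTALL
and the completing MPI_WAITALL, so sub2 may fill the buffer just before
MPI_STARTALL (and must not touch it again until MPI_WAITALL returns).
Something like this, again a sketch using the assumed halo_mod from above,
with a placeholder for the actual packing:

      SUBROUTINE sub2(U)
         USE halo_mod
         IMPLICIT NONE
         DOUBLE PRECISION, INTENT(IN) :: U(:,:)
         INTEGER :: ierr, i
         DO i = 1, nnbr
            ! placeholder packing: copy the boundary strip of U for neighbor i;
            ! legal here because MPI reads sbuf only after MPI_Startall below
            sbuf(:, i) = 0.0d0
         END DO
         CALL MPI_Startall(2*nnbr, reqs, ierr)
         ! ------- perform work that can be done with local data
         !         (sbuf and rbuf must stay untouched in this region)
         CALL MPI_Waitall(2*nnbr, reqs, MPI_STATUSES_IGNORE, ierr)
         ! ------- perform work using the received data in rbuf
      END SUBROUTINE sub2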

The question is whether the above would actually be communication-efficient
and overlap communication with computation.

best regards,
AA
