> You would break the MPI_Irecv and MPI_Isend calls up into two parts: MPI_Send_init and MPI_Recv_init in the first part and MPI_Start[all] in the second part. The first part needs to be moved out of the subroutine... at least outside of the loop in sub1() and maybe even outside the 10000-iteration loop in the main program. (There would also be MPI_Request_free calls that would similarly have to be moved out.) If the overheads are small compared to the other work you're doing per message, the savings would be small. (And, I'm guessing this is the case for you.) Further, the code refactoring might not be simple. So, persistent communications *might* not be a fruitful optimization strategy for you. Just a warning.
Well! If I follow this strategy, then the picture should be as follows. Correct?? Obviously sub1 and sub2 exist as separate routines outside the main program; the following outline is just for understanding.

```
Main program starts ------@@@@@@@@@@@@@@@@@@@@@@@
  CALL MPI_RECV_INIT for each neighboring process
  CALL MPI_SEND_INIT for each neighboring process
  Loop calling subroutine1 starts -------------------- (10000 times in the main program)
    Call subroutine1
      Subroutine1 starts ===================================
        Loop A starts here >>>>>>>>>>>>>>>>>>>> (three passes)
          Call subroutine2
            Subroutine2 starts ----------------------------
              Pick local data from array U into separate arrays,
                one for each neighboring processor
              CALL MPI_STARTALL
              ------- perform work that could be done with local data
              CALL MPI_WAITALL( )
              ------- perform work using the received data
            Subroutine2 ends ----------------------------
          ------- perform work to update array U
        Loop A ends here >>>>>>>>>>>>>>>>>>>>
      Subroutine1 ends ====================================
  Loop calling subroutine1 ends ------------ (10000 times in the main program)
  CALL MPI_Request_free( )
Main program ends ------@@@@@@@@@@@@@@@@@@@@@@@
```

But I think that in the above case the sending and receiving buffers would need to be created in a GLOBAL module, or else passed through the subroutine argument lists. And there is one point of confusion: the send buffer appears in the argument list of MPI_SEND_INIT(), but it only gets the values to be sent inside sub2. Is that possible/correct? The main question is whether the above will actually be communication-efficient and overlap communication with computation. (I have put a small sketch of what I mean in the P.S. below.)

best regards,
AA
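P.S. To make the buffer question concrete, below is a minimal, self-contained Fortran sketch of the pattern I have in mind. It is only a sketch, not my real code: the module name `halo_mod`, the names `sendbuf`/`recvbuf`/`bufsize`, and the simple left/right ring exchange are placeholders for my actual neighbor lists and array U. My (possibly wrong) understanding is that MPI_SEND_INIT only records the buffer's address, count, and datatype, and the contents are read each time the request is started, so filling `sendbuf` inside sub2 just before MPI_STARTALL should be legal as long as it is the very same array that was handed to MPI_SEND_INIT.

```fortran
! Minimal sketch only: a ring exchange stands in for my real neighbor lists.
module halo_mod
  implicit none
  integer, parameter :: bufsize = 100   ! placeholder message length
  integer, parameter :: nreq    = 4     ! 2 sends + 2 receives per rank
  integer            :: requests(nreq)
  double precision   :: sendbuf(bufsize, 2), recvbuf(bufsize, 2)
end module halo_mod

program persistent_demo
  use mpi
  use halo_mod
  implicit none
  integer :: ierr, rank, nprocs, left, right, iter, i

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  left  = mod(rank - 1 + nprocs, nprocs)   ! ring neighbors as placeholders
  right = mod(rank + 1, nprocs)

  ! Part 1: create the persistent requests ONCE, before the 10000-iteration loop.
  call MPI_Recv_init(recvbuf(:,1), bufsize, MPI_DOUBLE_PRECISION, left,  0, &
                     MPI_COMM_WORLD, requests(1), ierr)
  call MPI_Recv_init(recvbuf(:,2), bufsize, MPI_DOUBLE_PRECISION, right, 1, &
                     MPI_COMM_WORLD, requests(2), ierr)
  call MPI_Send_init(sendbuf(:,1), bufsize, MPI_DOUBLE_PRECISION, right, 0, &
                     MPI_COMM_WORLD, requests(3), ierr)
  call MPI_Send_init(sendbuf(:,2), bufsize, MPI_DOUBLE_PRECISION, left,  1, &
                     MPI_COMM_WORLD, requests(4), ierr)

  do iter = 1, 10000      ! stands for the loop in the main program calling sub1
     call sub2(rank)      ! stands for the exchange done inside sub1 / loop A
  end do

  ! Free the persistent requests once, after the loop.
  do i = 1, nreq
     call MPI_Request_free(requests(i), ierr)
  end do

  call MPI_Finalize(ierr)
end program persistent_demo

! Stands for subroutine2: fill buffers, start, overlap, wait, use received data.
subroutine sub2(rank)
  use mpi
  use halo_mod
  implicit none
  integer, intent(in) :: rank
  integer :: ierr
  integer :: statuses(MPI_STATUS_SIZE, nreq)

  ! Fill the SAME arrays that were handed to MPI_SEND_INIT in the main program;
  ! their contents are only read when the requests are (re)started below.
  sendbuf(:,1) = dble(rank)
  sendbuf(:,2) = dble(rank)

  ! Part 2: (re)start all persistent sends and receives.
  call MPI_Startall(nreq, requests, ierr)

  ! ... work that needs only local data would go here (the overlap window) ...

  call MPI_Waitall(nreq, requests, statuses, ierr)

  ! ... work that uses recvbuf(:,1) and recvbuf(:,2) would go here ...
end subroutine sub2
```

In this sketch the buffers and request handles live in a module so that the main program (which creates the persistent requests) and sub2 (which fills the buffers and starts them) see the same memory; passing them through the subroutine argument lists should work equally well, as long as it is the same arrays every time.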