Hi Gus,

Thanks for your suggestion!
The problem with this two-phase data exchange is as follows. Each rank can
have data blocks that may need to be exchanged with potentially all other
ranks. So if a rank needs to tell all the other ranks which blocks to
receive, phase one would require an all-to-all collective communication
(e.g., MPI_Allgatherv). Because such collective communication is blocking in
the current stable Open MPI (MPI-2), it would hurt the scalability of the
application, especially when we have a large number of MPI ranks. This
negative impact would not be compensated for by the bandwidth saved :-)

What I really need is something like this: Isend sets count to 0 if a block
is not dirty. On the receiving side, MPI_Waitall completes the corresponding
Irecv request immediately and sets its request handle to MPI_REQUEST_NULL,
just as for a normal Irecv. I am wondering if someone could confirm this
behavior for me? I could run an experiment on this too... (A minimal sketch
of the pattern I have in mind is appended below, after the quoted thread.)

Regards,

Jacky


On Wed, May 1, 2013 at 3:46 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Maybe start the data exchange by sending a (presumably short)
> list/array/index-function of the dirty/not-dirty block status
> (say, 0=not-dirty, 1=dirty),
> then putting if conditionals before the Isend/Irecv so that only
> dirty blocks are exchanged?
>
> I hope this helps,
> Gus Correa
>
>
> On 05/01/2013 01:28 PM, Thomas Watson wrote:
>> Hi,
>>
>> I have a program where each MPI rank hosts a set of data blocks. After
>> doing computation over *some of* its local data blocks, each MPI rank
>> needs to exchange data with other ranks. Note that the computation may
>> involve only a subset of the data blocks on an MPI rank. The data
>> exchange is done at each MPI rank through Isend and Irecv, followed by
>> Waitall to complete the requests. Each pair of Isend and Irecv exchanges
>> a corresponding pair of data blocks on different ranks. Right now, we do
>> Isend/Irecv for EVERY block!
>>
>> The idea is that because the computation at a rank may involve only a
>> subset of blocks, we could mark those blocks as dirty during the
>> computation. And to reduce data exchange bandwidth, we could exchange
>> only those *dirty* pairs across ranks.
>>
>> The problem is: if a rank does not compute on a block 'm', and if it
>> does not call Isend for 'm', then the receiving rank must somehow know
>> this and either a) not call Irecv for 'm' either, or b) let the Irecv
>> for 'm' fail gracefully.
>>
>> My questions are:
>> 1. How will Irecv behave (actually, how will MPI_Waitall behave) if the
>> corresponding Isend is missing?
>>
>> 2. If we still post the Isend for 'm', but really do not need to send
>> any data for 'm', can I just set a "flag" in Isend so that MPI_Waitall
>> on the receiving side will "cancel" the corresponding Irecv immediately?
>> For example, I can set the count in Isend to 0, and on the receiving
>> side, when MPI_Waitall sees a message with an empty payload, it reclaims
>> the corresponding Irecv? In my code, the correspondence between a pair
>> of Isend and Irecv is established by a matching TAG.
>>
>> Thanks!
>>
>> Jacky
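
For reference, here is the minimal, untested sketch I mentioned above. The
number of blocks, the block size, the one-tag-per-block scheme, and the
two-rank setup are all made up purely for illustration. Every Irecv is posted
for the full block as usual; clean blocks are sent with count = 0; MPI_Waitall
completes everything; and MPI_Get_count tells the receiver which blocks
actually carried data.

/* Untested sketch: zero-count Isend for clean blocks.
 * NBLOCKS, BLOCK_LEN, and the tag-per-block scheme are made up for
 * illustration; run with exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

#define NBLOCKS   4
#define BLOCK_LEN 1024

int main(int argc, char **argv)
{
    double sendbuf[NBLOCKS][BLOCK_LEN] = {{0}};
    double recvbuf[NBLOCKS][BLOCK_LEN];
    int dirty[NBLOCKS] = {1, 0, 1, 0};   /* which local blocks changed */
    MPI_Request reqs[2 * NBLOCKS];
    MPI_Status  stats[2 * NBLOCKS];
    int rank, peer, b;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                     /* single peer for simplicity */

    for (b = 0; b < NBLOCKS; b++) {
        /* Always post the receive for the full block size. */
        MPI_Irecv(recvbuf[b], BLOCK_LEN, MPI_DOUBLE, peer, b,
                  MPI_COMM_WORLD, &reqs[b]);
        /* Clean block: send 0 elements; dirty block: send the whole block.
         * Matching is by (source, tag, communicator), not by count, so the
         * zero-count message should still complete the Irecv above. */
        MPI_Isend(sendbuf[b], dirty[b] ? BLOCK_LEN : 0, MPI_DOUBLE, peer, b,
                  MPI_COMM_WORLD, &reqs[NBLOCKS + b]);
    }

    /* Completes all requests and sets their handles to MPI_REQUEST_NULL. */
    MPI_Waitall(2 * NBLOCKS, reqs, stats);

    for (b = 0; b < NBLOCKS; b++) {
        int count;
        MPI_Get_count(&stats[b], MPI_DOUBLE, &count);
        if (count == 0)
            printf("rank %d: block %d was clean on rank %d\n", rank, b, peer);
    }

    MPI_Finalize();
    return 0;
}

If this works the way I expect, the receiver can use MPI_Get_count == 0 to
learn that the peer's block was clean, with no extra phase-one exchange.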