Hi Gus,

Thanks for the suggestion. I've been thinking along those lines, but it
seems to have drawbacks. Consider the following MPI conversation:
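To make the pattern concrete, here is roughly the receive sequence I'm
after, with the header itself received non-blockingly so that NODE 1
can keep working at t=1 (just a sketch; the tags, the long-sized header
and the polling loop are made up for illustration):

    #include <mpi.h>
    #include <stdlib.h>

    #define HDR_TAG  100   /* made-up tag for the fixed-size header */
    #define DATA_TAG 101   /* made-up tag for the variable-size payload */

    /* Receiver side: post the small header receive without blocking,
       go back to local work, and post the payload receive as soon as
       the header has arrived. */
    void receive_object(int src, MPI_Comm comm)
    {
        long size;
        MPI_Request hdr_req, data_req;
        int done = 0;
        char *buf;

        MPI_Irecv(&size, 1, MPI_LONG, src, HDR_TAG, comm, &hdr_req);  /* t=1 */

        while (!done) {
            /* ... local work ... */
            MPI_Test(&hdr_req, &done, MPI_STATUS_IGNORE);
        }

        buf = malloc(size);                 /* header told us the size */
        MPI_Irecv(buf, (int)size, MPI_BYTE, src, DATA_TAG, comm, &data_req);

        /* ... more local work overlapping the payload transfer ... */
        MPI_Wait(&data_req, MPI_STATUS_IGNORE);                       /* t=3 */
        free(buf);
    }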
    Time    NODE 1                     NODE 2
    ----    ------                     ------
    0       local work                 local work
    1       post n-b recv              local work
    2       local work                 post n-b send
    3       complete recv from t=1     local work

In an ideal implementation, NODE 1 would be able to go back to local
work immediately after posting a non-blocking receive at t=1. If
blocking message passing is used for the initial header, NODE 1 has to
block at least until t=2, when NODE 2 sends the corresponding message
header. NODE 1 can then go on doing local work while the main message
data is being transferred, but it still wastes one time unit waiting
for the message header to arrive. Is there some clever way around this?
Am I missing something?

/Lars

On Thu, Jun 4, 2009 at 2:34 PM, Gus Correa wrote:

> Hi Lars
>
> I wonder if you could always use blocking message passing on the
> preliminary send/receive pair that transmits the message size/header,
> then use non-blocking mode for the actual message.
> If the "message size/header" part transmits a small buffer,
> the preliminary send/recv pair will use the "eager" communication mode,
> return quickly, and may not reduce performance, I would guess.
>
> For a group of several messages, the preliminary
> send/recv pair could transmit a small (to ensure "eager mode")
> array of message sizes,
> maybe along with the message tags and sender ranks,
> instead of only one size.
>
> Just a thought.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
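For reference, this is how I understand the batched-header variant of
your suggestion (again just a sketch; NMSG and the tags are invented,
and I'm taking the "eager mode" behaviour of the small blocking recv on
faith):

    #include <mpi.h>
    #include <stdlib.h>

    #define NMSG     4     /* invented batch size */
    #define HDR_TAG  100   /* invented tags */
    #define DATA_TAG 101

    /* Receiver: one small, blocking header exchange (hopefully eager,
       so it returns quickly), then non-blocking payload receives. */
    void receive_batch(int src, MPI_Comm comm)
    {
        long sizes[NMSG];
        char *bufs[NMSG];
        MPI_Request reqs[NMSG];
        int i;

        MPI_Recv(sizes, NMSG, MPI_LONG, src, HDR_TAG, comm,
                 MPI_STATUS_IGNORE);

        for (i = 0; i < NMSG; i++) {
            bufs[i] = malloc(sizes[i]);
            MPI_Irecv(bufs[i], (int)sizes[i], MPI_BYTE, src, DATA_TAG + i,
                      comm, &reqs[i]);
        }

        /* ... local work overlaps the payload transfers ... */
        MPI_Waitall(NMSG, reqs, MPI_STATUSES_IGNORE);
        /* ... use and eventually free bufs[i] ... */
    }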
> Lars Andersson wrote:
>
>> Hi,
>>
>> I'm trying to solve a problem of passing serializable, arbitrarily
>> sized objects around using MPI and non-blocking communication. The
>> problem I'm facing is what to do at the receiving end when expecting
>> an object of unknown size, but at the same time not block waiting
>> for it.
>>
>> When using blocking message passing, I have simply solved the problem
>> by first sending a small, fixed-size header containing the size of the
>> rest of the data, which is sent in the following MPI message. When
>> using non-blocking message passing, this doesn't seem to be such a
>> good idea, since we can't post the main data transfer until we have
>> received the message header... It seems to take away most of the
>> advantages of non-blocking I/O in the first place.
>>
>> I've been thinking about solving this using MPI_Probe / MPI_Iprobe,
>> but I'm worried about performance.
>>
>> Question 1:
>>
>> Will MPI_Probe or the underlying MPI implementation actually receive
>> the full message data (assuming a reasonably sized message, say less
>> than 10MB) before MPI_Probe returns? Or will there be a significant
>> data transfer delay (for large messages) when calling MPI_Recv after
>> a successful MPI_Probe?
>>
>> What I want is something like this:
>>
>> 1) Post one or several non-blocking, variable-sized message receives.
>>
>> 2) Do other, non-MPI work, while any incoming messages are fully
>> received into buffers on the local machine.
>>
>> 3) Complete the receives posted in 1). I don't want to wait here
>> unnecessarily for data transfers that could have taken place during 2).
>>
>> Problems:
>>
>> I can't post non-blocking MPI_Irecv() calls in 1), because I don't
>> know the sizes of the incoming messages.
>>
>> If I simply do nothing in 1) and call MPI_Probe in 3), I'm worried
>> that I won't get nice compute/transfer overlap, because the messages
>> won't actually be received locally until I post a Probe or Recv in 3).
>>
>> Question 2:
>>
>> How can I achieve the communication sequence described in 1), 2), 3)
>> above, with overlapping data transfer and local computation during 2)?
>>
>> Question 3:
>>
>> A temporary kludge solution to the problem above might be to allocate
>> a temporary receive buffer of some arbitrary, constant maximum size
>> BUFSIZE in 1) for each non-blocking receive operation, make sure
>> messages sent are not larger than BUFSIZE, and post MPI_Irecv(buffer,
>> BUFSIZE, ...) calls in 1). I haven't been able to figure out whether
>> it's actually correct and portable to receive less data than specified
>> in the count argument to MPI_Irecv.
>>
>> What if the message sent at the other end is 10 bytes, and
>> BUFSIZE = count = 20. Would that be OK?
>>
>> If anyone can shed any light on this, I'd be grateful. FYI, we're
>> using a cluster of 2-8 core x86-64 machines running Linux and
>> connected using ordinary 1Gbit Ethernet.
>>
>> Best regards,
>>
>> Lars Andersson
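PS, on my own Question 3: my reading of the MPI standard is that this
should be fine. The count argument only gives the maximum buffer size;
a shorter incoming message is legal (it is only an error if the message
is longer than the buffer), and MPI_Get_count on the returned status
tells you how much actually arrived. A sketch of what I mean (BUFSIZE
and the tag are arbitrary):

    #include <mpi.h>

    #define BUFSIZE 20   /* arbitrary maximum message size */

    /* Post a fixed-size receive, then ask how much actually arrived. */
    void recv_up_to_bufsize(int src, int tag, MPI_Comm comm)
    {
        char buf[BUFSIZE];
        MPI_Request req;
        MPI_Status status;
        int nbytes;

        MPI_Irecv(buf, BUFSIZE, MPI_BYTE, src, tag, comm, &req);

        /* ... local work; a 10-byte message can complete meanwhile ... */

        MPI_Wait(&req, &status);
        MPI_Get_count(&status, MPI_BYTE, &nbytes);  /* nbytes == 10, not 20 */
        /* only the first nbytes bytes of buf are valid */
    }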