Corey,

The communication pattern looks legitimate, but it is very difficult to see what is going wrong without code to look at. Can you provide a simple case (maybe the skeleton of your application) that we can work from?
  George.

On Dec 20, 2012, at 22:07, Corey Allen <corey.al...@cajkhenderson.com> wrote:

> Hello,
>
> I am trying to confirm that I am using OpenMPI in a correct way. I
> seem to be losing messages but I don't like to assume there's a bug
> when I'm still new to MPI in general.
>
> I have multiple processes in a master / slaves type setup, and I am
> trying to have multiple persistent non-blocking message requests
> between them to prevent starvation. (Tech detail: 4-core Intel running
> Ubuntu 64-bit and OpenMPI 1.4. Everything is local. Total processes is
> 5. One master, four slaves. The problem only surfaces on the slowest
> slave - the one with the most work.)
>
> The setup is like this:
>
> Master:
>
> Create 3 persistent send requests, with three different buffers (in a 2D
> array)
> Load data into each buffer
> Start each send request
> In a loop:
>   TestSome on the 3 sends
>   for each send that's completed
>     load new data into the buffer
>     restart that send
> loop
>
> Slave:
>
> Create 3 persistent receive requests, with three different buffers (in
> a 2D array)
> Start each receive request
> In a loop:
>   WaitAny on the 3 receives
>   Consume data from the one receive buffer from WaitAny
>   Start that receive again
> loop
>
> Basically what I'm seeing is that the master gets a "completed" send
> request from TestSome and loads new data, restarts, etc. but the slave
> never sees that particular message. I was under the impression that
> WaitAny should return only one message but also should eventually
> return every message sent in this situation.
>
> I am operating under the assumption that even if the send request is
> completed and the buffer overwritten in the master, the receive for
> that message eventually occurs with the correct data in the slave. I
> did not think I had to advise the master that the slave was finished
> reading data out of the receive buffer before the master could reuse
> the send buffer.
>
> What it LOOKS like to me is that WaitAny is marking more than one send
> completed, so the master sends the next message, but I can't see it in
> the slave.
>
> I hope this is making sense. Any input on whether I'm doing this wrong
> or a way to see if the message is really being lost would be helpful.
> If there's a good example code of multiple simultaneous asynchronous
> messages to avoid starvation that is set up better than my approach,
> I'd like to see it.
>
> Thanks!
>
> Corey
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
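
On the question of how to see whether a message is really being lost: one option is to have the master write a running sequence number into each buffer before restarting the send, and have the slave record every sequence number it consumes. When several receives are posted at once, MPI_Waitany returns one completed request, chosen arbitrarily if more than one has completed, so the slave may consume buffers in a different order than the master filled them. That may or may not be what is happening here, but a tracker like the one below would separate "arrived out of order" from "never arrived". This is only a sketch and deliberately contains no MPI calls; the names seq_tracker, seq_tracker_record, seq_tracker_missing and the bound MAX_SEQ are hypothetical, not part of Open MPI or of the original application.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEQ 1024  /* hypothetical upper bound on tracked messages */

/* Records which sequence numbers the slave has consumed so far. */
typedef struct {
    unsigned char seen[MAX_SEQ]; /* seen[s] is 1 once message s was consumed */
    int highest;                 /* highest sequence number seen, -1 if none */
    int out_of_order;            /* messages consumed after a higher-numbered one */
    int duplicates;              /* sequence numbers consumed more than once */
} seq_tracker;

void seq_tracker_init(seq_tracker *t) {
    assert(t != NULL);
    memset(t->seen, 0, sizeof t->seen);
    t->highest = -1;
    t->out_of_order = 0;
    t->duplicates = 0;
}

/* Call once per consumed message.
 * Returns 0 on success, -1 if seq is outside [0, MAX_SEQ). */
int seq_tracker_record(seq_tracker *t, int seq) {
    assert(t != NULL);
    if (seq < 0 || seq >= MAX_SEQ)
        return -1;
    if (t->seen[seq]) {
        t->duplicates++;
        return 0;
    }
    t->seen[seq] = 1;
    if (seq < t->highest)
        t->out_of_order++;   /* a later message was consumed before this one */
    else
        t->highest = seq;
    return 0;
}

/* Counts sequence numbers below the highest one seen that were never consumed. */
int seq_tracker_missing(const seq_tracker *t) {
    assert(t != NULL);
    int missing = 0;
    for (int s = 0; s < t->highest; s++)
        if (!t->seen[s])
            missing++;
    return missing;
}
```

In the slave loop, the idea would be to call seq_tracker_record right after MPI_Waitany returns, passing the sequence number read from the buffer that completed (assuming the master stores it in a known position, for example the first element). If seq_tracker_missing is still nonzero after the master has finished and all receives have drained, a message really did not arrive; if only out_of_order is nonzero, everything arrived but in a different order than it was sent.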