On Mon, 21 Nov 2011, Mudassar Majeed wrote:

Thank you for your answer. Actually, I used the term UDP to describe non-connection-oriented messaging. TCP creates a connection between the two communicating parties, but in UDP a message can be sent to any IP/port where a process/thread is listening; if that process is busy doing something else, the received messages are queued for it, and whenever it calls the recv function one message is taken from the queue.

That is how MPI message matching works: messages sit in a queue until you call MPI_Irecv (or MPI_Recv or MPI_Probe, etc.) to retrieve them. Unlike UDP, though, an MPI send is not required to complete on the sender's side until the message is received, so you will probably need to use MPI_Isend to avoid deadlocks.

I am implementing a distributed algorithm that will provide communication-sensitive load balancing for computational loads. For example, suppose we have 10 nodes, each containing 10 cores (100 cores in total). When the MPI application starts (say with 1000 processes, i.e., more than one process per core), I will run my distributed algorithm MPI_Balance (sorry for giving the MPI_ prefix, as it is not part of MPI, but I am trying to make it part of MPI ;) ). The algorithm will place processes that communicate heavily with each other on the same node, while keeping the computational load balanced across that node's 10 cores.

So that was a brief explanation. My distributed algorithm requires that some processes communicate with each other to collaborate on something, so I need the kind of messaging I explained above. It is like UDP messaging: no connection is set up before sending a message, the message is always queued on the receiver's side, and the sender is not blocked; it just sends the message, and the receiver takes it when it gets free from its other tasks.

One difficulty in doing this is managing the MPI requests from the sends and polling them with MPI_Test periodically. You can keep the requests in an array (std::vector in C++) that is expanded as needed: to send a message, call MPI_Isend and append the request to the array, and periodically call MPI_Testany or MPI_Testsome on the array to find completed requests. Note that the data being sent must be kept intact in its buffer until the request completes. Here is a naive version that does extra copies and never shrinks its arrays of requests and buffers:

#include <mpi.h>
#include <vector>

using std::vector;

class message_send_engine {
  vector<MPI_Request> requests;
  vector<vector<char> > buffers;

  public:
  void send(const void* buf, int byte_len, int dest, int tag) {
    // Copy the data so the caller's buffer can be reused immediately;
    // our copy must stay intact until the request completes.
    size_t buf_num = buffers.size();
    buffers.resize(buf_num + 1);
    buffers[buf_num].assign((const char*)buf, (const char*)buf + byte_len);
    requests.resize(buf_num + 1);
    MPI_Isend(&buffers[buf_num][0], byte_len, MPI_BYTE, dest, tag,
              MPI_COMM_WORLD, &requests[buf_num]);
  }

  void poll() { // Call this periodically
    if (requests.empty()) return; // &requests[0] would be invalid
    while (true) {
      int index, flag;
      // MPI_Testany sets a completed request to MPI_REQUEST_NULL,
      // so it is skipped on later calls.
      MPI_Testany((int)requests.size(), &requests[0], &index, &flag,
                  MPI_STATUS_IGNORE);
      if (flag && index != MPI_UNDEFINED) {
        vector<char>().swap(buffers[index]); // Actually release the memory
      } else {
        break;
      }
    }
  }
};

// Returns true if a message is waiting; its source, tag, and size can
// then be read from st.  No data is received here; use MPI_Recv for that.
bool test_for_message(MPI_Status& st) {
  int flag;
  MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
  return (flag != 0);
}

If test_for_message returns true, you can then use MPI_Recv to get the message, using st.MPI_SOURCE, st.MPI_TAG, and MPI_Get_count on st to size the receive buffer.

I have tried combinations of MPI_Send, MPI_Recv, MPI_Iprobe, MPI_Isend, MPI_Irecv, MPI_Test, etc., but I am not getting the behavior I am looking for. I think MPI should provide a way to do this; maybe I just do not know about it. That's why I am asking the experts. I am still looking for it :(

-- Jeremiah Willcock
