On Mon, 21 Nov 2011, Mudassar Majeed wrote:
Thank you for your answer. Actually, I used the term UDP to describe
non-connection-oriented messaging. TCP creates a connection between the two
communicating parties, but in UDP a message can be sent to any IP/port where a
process/thread is listening; if that process is busy doing something else, the
received messages are queued for it, and whenever it calls the recv function
one message is taken from the queue.
That is how MPI message matching works: messages sit in a queue until you
call MPI_Irecv (or MPI_Recv or MPI_Probe, etc.) to get them. Unlike UDP,
however, an MPI send is not guaranteed to complete on the sender's side until
the message is received, so you will probably need to use MPI_Isend to avoid
deadlocks.
I am implementing a distributed algorithm that will provide
communication-sensitive load balancing for computational loads. For example,
suppose we have 10 nodes, each containing 10 cores (100 cores in total). When
the MPI application starts with, say, 1000 processes (more than one process
per core), I will run my distributed algorithm MPI_Balance (sorry for the
MPI_ prefix, as it is not part of MPI, but I am trying to make it part of MPI
;) ). The algorithm will place processes that communicate more with each
other on the same node, while keeping the computational load balanced across
that node's 10 cores.
That was a little bit of explanation. For that, my distributed algorithm
requires that some processes communicate with each other to collaborate on
something.
So I need the kind of messaging I explained above. It is a kind of UDP
messaging: no connection before sending a message, the message is always
queued on the receiver's side, and the sender is not blocked; it just sends
the message, and the receiver takes it when it gets free from its other work.
The one difficulty in doing this is managing the MPI requests from the
sends and polling them periodically with MPI_Test. You can keep the
requests in an array (std::vector in C++) that is expanded as needed: to
send a message, call MPI_Isend and put the request into the array, then
periodically call MPI_Testany or MPI_Testsome on the array to find
completed requests. Note that you must keep the data being sent intact in
its buffer until the request completes. Here's a naive version that does
extra copies and doesn't compact its arrays of requests or buffers:
#include <mpi.h>
#include <vector>

using std::vector;

class message_send_engine {
  vector<MPI_Request> requests;
  vector<vector<char> > buffers;

 public:
  // Copy the message into an internal buffer and start a nonblocking send;
  // the caller's buffer may be reused as soon as this returns.
  void send(const void* buf, int byte_len, int dest, int tag) {
    size_t buf_num = buffers.size();
    buffers.resize(buf_num + 1);
    buffers[buf_num].assign((const char*)buf, (const char*)buf + byte_len);
    requests.resize(buf_num + 1);
    MPI_Isend(&buffers[buf_num][0], byte_len, MPI_BYTE, dest, tag,
              MPI_COMM_WORLD, &requests[buf_num]);
  }

  void poll() { // Call this periodically
    while (!requests.empty()) {
      int index, flag;
      MPI_Testany((int)requests.size(), &requests[0], &index, &flag,
                  MPI_STATUS_IGNORE);
      if (flag && index != MPI_UNDEFINED) {
        vector<char>().swap(buffers[index]); // Actually release the memory
      } else {
        break;
      }
    }
  }
};
// Returns true if a message is waiting; st then gives its source and tag
// (and, via MPI_Get_count, its length).
bool test_for_message(MPI_Status& st) {
  int flag;
  MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
  return flag != 0;
}
If test_for_message returns true, you can then use MPI_Recv to get the
message, using the source and tag from the status (and MPI_Get_count to size
the buffer) so the receive matches the probed message.
I have tried combinations of MPI_Send, MPI_Recv, MPI_Iprobe, MPI_Isend,
MPI_Irecv, MPI_Test, etc., but I am not getting the behavior I am looking
for. I think MPI should provide this as well; maybe it is just not in my
knowledge. That's why I am asking the experts. I am still looking for it :(
-- Jeremiah Willcock