Hi,
I am using Open MPI 1.8.6. I think my question is related to the flow
control algorithm for small messages: how can I avoid the sender being
blocked by the receiver when using the *openib* module with small
messages and *blocking send*? I have looked through this FAQ
(https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot) but
didn't find the answer. My understanding of the "eager sending protocol" is
that if a message is "small", it is transferred to the receiver
immediately, even if the receiver is not ready. As a result, the sender
should not have to wait for the receiver to post the receive operation.
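To make that understanding concrete, here is a minimal sketch of the size check I have in mind. It is not Open MPI's actual code: the names `send_protocol` and `pick_protocol` are hypothetical, the treatment of the boundary case is an assumption, and a real implementation also counts header bytes against the limit, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not part of Open MPI. */
typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } send_protocol;

/* Choose a protocol from the payload size alone.
 * Assumption: a payload that fits within the eager limit is sent
 * immediately; anything larger waits for a handshake with the receiver. */
send_protocol pick_protocol(size_t payload_bytes, size_t eager_limit)
{
    assert(eager_limit > 0);
    return payload_bytes <= eager_limit ? PROTO_EAGER : PROTO_RENDEZVOUS;
}
```

With the btl_openib_eager_limit of 160000 from my configuration, the 16-byte messages in my test would fall on the eager side of this check.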
I am trying to observe this behavior with a simple program of two MPI ranks
(attached). My confusion is that I can see the behavior with the "vader"
module (shared memory) when running the two ranks on the same node (the
sender, rank 0, finishes almost immediately),
[output]
[0] size = 16, loop = 78, *time = 0.00007*
[1] size = 16, loop = 78, *time = 3.42426*
[/output]
but I cannot see it when running them on two nodes using the "openib"
module.
[output]
[0] size = 16, loop = 78, *time = 3.42627*
[1] size = 16, loop = 78, *time = 3.42426*
[/output]
Does anyone know the reason? My runtime configuration is also attached.
Thanks!
Sincerely,
Michael
--
Xiaolong Cui (Michael)
Department of Computer Science
Dietrich School of Arts & Science
University of Pittsburgh
Pittsburgh, PA 15260
btl = openib,vader,self
#btl_base_verbose = 100
btl_openib_use_eager_rdma = 1
btl_openib_eager_limit = 160000
btl_openib_rndv_eager_limit = 160000
btl_openib_max_send_size = 160000
btl_openib_receive_queues = P,128,256,192,64:S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512
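For reference, this is how I read the receive_queues value above. It is a minimal sketch, assuming the field order I understand from the Open MPI FAQ (queue type, buffer size, number of buffers, low watermark, then credit window for per-peer queues or maximum outstanding sends for shared queues). The struct `queue_spec` and the function `parse_receive_queues` are hypothetical names, and the sketch only accepts P and S entries with all four numeric fields present, as in my configuration.

```c
#include <assert.h>
#include <stdio.h>

/* One colon-separated entry of btl_openib_receive_queues.
 * Field meanings are my reading of the FAQ, i.e. an assumption. */
typedef struct {
    char type;          /* 'P' = per-peer queue, 'S' = shared receive queue */
    long buf_size;      /* size of each receive buffer, in bytes */
    long num_bufs;      /* number of receive buffers */
    long low_watermark; /* buffer count below which buffers are reposted */
    long window;        /* P: credit window; S: max outstanding sends */
} queue_spec;

/* Parse spec into out[0..max-1].
 * Returns the number of queues parsed, or -1 on malformed input. */
int parse_receive_queues(const char *spec, queue_spec *out, int max)
{
    assert(spec != NULL && out != NULL);
    int n = 0;
    const char *p = spec;

    while (*p != '\0') {
        queue_spec q = {0};
        int consumed = 0;

        if (n == max)
            return -1;                      /* more entries than out[] holds */
        if (sscanf(p, "%c,%ld,%ld,%ld,%ld%n", &q.type, &q.buf_size,
                   &q.num_bufs, &q.low_watermark, &q.window, &consumed) != 5)
            return -1;                      /* a field is missing */
        if (q.type != 'P' && q.type != 'S')
            return -1;                      /* other queue types not handled */

        out[n++] = q;
        p += consumed;
        if (*p == ':')
            p++;                            /* step over the separator */
        else if (*p != '\0')
            return -1;                      /* trailing garbage */
    }
    return n;
}
```

Applied to my setting, this yields four queues: one per-peer queue with 128-byte buffers and three shared queues with 2048-, 12288- and 160000-byte buffers.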
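#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int size, rank;
    int loops = 78;
    int length = 4;                 /* ints per message (16 bytes) */
    int i, j;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ranks are paired (0 -> 1, 2 -> 3, ...), so the count must be even. */
    if (size % 2 != 0) {
        if (rank == 0)
            fprintf(stderr, "run with an even number of ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* calloc so that the sender does not transmit uninitialized memory */
    int *code = (int *)calloc(length, sizeof(int));
    if (code == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);

    double time_s = MPI_Wtime();

    /* Receivers (odd ranks) compute for a few seconds before posting
     * any receive, so the senders get a head start. */
    if (rank % 2 == 1) {
        /* volatile keeps the compiler from optimizing the delay loop away */
        volatile double a = 0.3, b = 0.5;
        for (i = 0; i < 30000; i++)
            for (j = 0; j < 30000; j++) {
                a = a * 2;
                b = b + a;
            }
    }

    /* Even ranks send to the next rank; odd ranks receive from the previous one. */
    for (i = 0; i < loops; i++) {
        if (rank % 2 == 0)
            MPI_Send(code, length, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        else
            MPI_Recv(code, length, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    double time_e = MPI_Wtime();

    /* %zu matches the size_t produced by length * sizeof(int) */
    printf("[%d] size = %zu, loop = %d, time = %.5f\n", rank, length * sizeof(int), loops, time_e - time_s);

    free(code);
    MPI_Finalize();
    return 0;
}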