Hi,

I am using Open MPI 1.8.6. I think my question is related to the flow
control algorithm for small messages: how can I keep the sender from being
blocked by the receiver when small messages go through the *openib* module
and the sender uses a *blocking send*? I have looked through this FAQ
(https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot) but
didn't find the answer. My understanding of the "eager" sending protocol is
that a "small" message is transferred to the receiver immediately, even if
the receiver has not yet posted the matching receive. As a result, the
sender should not have to wait for the receiver to post that receive.

I am trying to observe this behavior with a simple two-rank MPI program
(attached). What confuses me is that I can see it with the "vader" module
(shared memory) when both ranks run on the same node,
[output]

[0] size = 16, loop = 78, *time = 0.00007*

[1] size = 16, loop = 78, *time = 3.42426*
[/output]
but I cannot see it when running them on two nodes using the "openib"
module.
[output]

[0] size = 16, loop = 78, *time = 3.42627*

[1] size = 16, loop = 78, *time = 3.42426*
[/output]

Does anyone know the reason? My runtime configuration is also attached.
Thanks!

Sincerely,
Michael

-- 
Xiaolong Cui (Michael)
Department of Computer Science
Dietrich School of Arts & Science
University of Pittsburgh
Pittsburgh, PA 15260
btl = openib,vader,self
#btl_base_verbose = 100
btl_openib_use_eager_rdma = 1
btl_openib_eager_limit = 160000
btl_openib_rndv_eager_limit = 160000
btl_openib_max_send_size = 160000
btl_openib_receive_queues = P,128,256,192,64:S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512
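The btl_openib_receive_queues value above can be split into its individual queues. This shows which queue a 16-byte message would use. The sketch below is minimal and rests on several assumptions:

- Each colon-separated entry starts with the queue type (P for per-peer, S for shared), followed by the buffer size in bytes and the number of buffers.
- A message goes to the smallest queue whose buffers can hold it.
- Header overhead is ignored, and the remaining fields are not interpreted.

parse_receive_queues and queue_for_message are hypothetical helpers, not Open MPI functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_QUEUES 8

/* One entry of the receive_queues spec, keeping only the queue type and the
 * first two numeric fields (buffer size, buffer count). Watermarks and other
 * trailing fields are ignored in this sketch. */
typedef struct {
    char type;        /* 'P' = per-peer, 'S' = shared receive queue */
    long buf_size;    /* bytes per receive buffer */
    long num_buffers; /* number of receive buffers */
} queue_spec;

/* Parse a colon-separated spec such as "P,128,256,192,64:S,2048,1024,1008,80".
 * Returns the number of queues parsed, or -1 on a malformed or oversized spec. */
static int parse_receive_queues(const char *spec, queue_spec *out, int max)
{
    char buf[512];
    int n = 0;
    if (strlen(spec) >= sizeof buf)
        return -1;
    strcpy(buf, spec); /* strtok modifies its input, so work on a copy */
    for (char *q = strtok(buf, ":"); q != NULL; q = strtok(NULL, ":")) {
        char type;
        long size, count;
        if (n == max)
            return -1;
        if (sscanf(q, "%c,%ld,%ld", &type, &size, &count) != 3)
            return -1;
        out[n++] = (queue_spec){ type, size, count };
    }
    return n;
}

/* Index of the first (smallest) queue whose buffers can hold `bytes`, or -1.
 * Assumes queues are listed in increasing buffer size, as in the config. */
static int queue_for_message(const queue_spec *q, int n, long bytes)
{
    for (int i = 0; i < n; i++)
        if (q[i].buf_size >= bytes)
            return i;
    return -1;
}
```

Under these assumptions, the attached value yields four queues. A 16-byte payload maps to the first one, the per-peer queue of 128-byte buffers.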
#include "mpi.h" 
#include <mpi-ext.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) 
{ 
   int size, rank, psize; 
   int loops = 78;
   int length = 4;
   MPI_Init(&argc, &argv); 
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   int *code = (int *)malloc(length * sizeof(int));
   MPI_Status status;
   long long i = 0;
   double time_s = MPI_Wtime();

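   /* Odd ranks spin for a few seconds before posting any receive, so an
      eagerly completing sender should finish long before its receiver. */
   if(rank % 2 == 1)
   {
       int i ;
       int j ;
       /* volatile keeps an optimizing compiler from deleting this dead loop */
       volatile double a = 0.3, b = 0.5;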
       for(i = 0; i < 30000; i++)
           for(j = 0; j < 30000; j++){
               a = a * 2;
               b = b + a;
           }
   }

   for(i = 0; i < loops; i++){
       if(rank % 2 == 0){
           MPI_Send(code, length, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
       }
       else if(rank % 2 == 1){
           MPI_Recv(code, length, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
       }
   }
   double time_e = MPI_Wtime();
   /* sizeof yields size_t, so print the byte count with %zu */
   printf("[%d] size = %zu, loop = %d, time = %.5f\n", rank, length * sizeof(int), loops, time_e - time_s);

   free(code);
   MPI_Finalize(); 
   return 0; 
} 
