When using eager_rdma, the sender blocks once it runs out of
"credits". If the receiver enters the MPI library for any reason, the
incoming messages are placed on the ob1 unexpected queue and the
credits are returned to the sender. If you turn off eager_rdma you
will likely see different behavior. That said, the unexpected-message
path is non-optimal; it is best to ensure a matching receive is posted
before the send arrives.
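A minimal sketch of that pre-posting pattern (ranks, sizes, and the placement of the compute phase are illustrative, modeled on the attached test program; this is not the poster's exact code):

```c
/* Sketch: pre-post the receive before the long compute phase so that
 * eagerly sent messages match a posted receive instead of landing on
 * the ob1 unexpected queue. Buffer size and tag are illustrative. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    int buf[4] = {0};
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        /* Post the receive first ... */
        MPI_Irecv(buf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ... run the long computation here ... */
        /* ... then complete the receive; the match was already set up,
         * so the eager data is delivered into buf directly. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 0) {
        MPI_Send(buf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

With the receive pre-posted, the sender's credits are returned as messages are consumed rather than piling up in the unexpected queue.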

Additionally, if you are using InfiniBand, I recommend against adding a
per-peer queue pair to btl_openib_receive_queues. We have not seen any
performance benefit from per-peer queue pairs, and they do not scale
(each peer consumes dedicated receive buffers).
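Concretely, that would mean dropping the leading P,... entry from your setting and keeping only shared receive queues (the buffer counts below are just your existing values, kept for illustration):

```ini
# Shared-receive-queue-only configuration; sizes/counts are illustrative.
btl_openib_receive_queues = S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512
```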

-Nathan Hjelm
HPC-ENV, LANL

On Mon, May 16, 2016 at 12:21:41PM -0400, Xiaolong Cui wrote:
>    Hi,
>    I am using Open MPI 1.8.6. I guess my question is related to the flow
>    control algorithm for small messages. The question is how to avoid the
>    sender being blocked by the receiver when using openib module for small
>    messages and using blocking send. I have looked through this
>    FAQ(https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot)
>    but didn't find the answer. My understanding of "eager sending protocol"
>    is that if a message is "small", it will be transported to the receiver
>    immediately, even if the receiver is not ready. As a result, the sender
>    won't be blocked until the receiver posts the receive operation. 
>    I am trying to observe such behavior with a simple program of two MPI
>    ranks (attached). My confusion is that while I can see the behavior with
>    "vader" module (shared memory) when running the two ranks on the same
>    node, 
>    [output]
> 
>    [0] size = 16, loop = 78, time = 0.00007
> 
>    [1] size = 16, loop = 78, time = 3.42426
> 
>    [/output]
>    but I cannot see it when running them on two nodes using the "openib"
>    module. 
>    [output]
> 
>    [0] size = 16, loop = 78, time = 3.42627
> 
>    [1] size = 16, loop = 78, time = 3.42426
> 
>    [/output]
>    So anyone knows the reason? My runtime configuration is also attached.
>    Thanks!
>    Sincerely,
>    Michael
>    --
>    Xiaolong Cui (Michael)
>    Department of Computer Science
>    Dietrich School of Arts & Science
>    University of Pittsburgh
>    Pittsburgh, PA 15260

> btl = openib,vader,self
> #btl_base_verbose = 100
> btl_openib_use_eager_rdma = 1
> btl_openib_eager_limit = 160000
> btl_openib_rndv_eager_limit = 160000
> btl_openib_max_send_size = 160000
> btl_openib_receive_queues = P,128,256,192,64:S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512

> #include "mpi.h" 
> #include <mpi-ext.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(int argc, char *argv[]) 
> { 
>    int size, rank, psize; 
>    int loops = 78;
>    int length = 4;
>    MPI_Init(&argc, &argv); 
>    MPI_Comm_size(MPI_COMM_WORLD, &size);
>    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>    int *code = (int *)malloc(length * sizeof(int));
>    MPI_Status status;
>    long long i = 0;
>    double time_s = MPI_Wtime();
> 
>    if(rank % 2 == 1)
>    {
>        int i ;
>        int j ;
>        double a = 0.3, b = 0.5;
>        for(i = 0; i < 30000; i++)
>            for(j = 0; j < 30000; j++){
>                a = a * 2;
>                b = b + a;
>            }
>    }
> 
>    for(i = 0; i < loops; i++){
>        if(rank % 2 == 0){
>            MPI_Send(code, length, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
>        }
>        else if(rank % 2 == 1){
>            MPI_Recv(code, length, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>        }
>    }
>    double time_e = MPI_Wtime();
>    printf("[%d] size = %zu, loop = %d, time = %.5f\n", rank, length * sizeof(int), loops, time_e - time_s);
> 
>    MPI_Finalize(); 
>    return 0; 
> } 
> 

> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29224.php
