If it is blocking on the first message then it might be blocked by the connection manager. Removing the per-peer queue pair might help in that case.
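For example, the following is just the receive_queues value from your attached mca-params file with the per-peer (P,...) entry dropped, leaving only the shared receive queues. I have not tested it against your setup, so treat it as a sketch:

    btl_openib_receive_queues = S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512

A couple more sketches for the suggestions from my earlier reply (turning off eager_rdma and pre-posting the receive) are appended below the quoted thread.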
-Nathan

On Mon, May 16, 2016 at 10:11:29PM -0400, Xiaolong Cui wrote:
> Hi Nathan,
>
> Thanks for your answer. The "credits" make sense for the purpose of flow
> control. However, the sender in my case will be blocked even for the first
> message. This doesn't seem to be the symptom of running out of credits. Is
> there any reason for this? Also, is there an MCA parameter for the number
> of credits?
>
> Best,
> Michael
>
> On Mon, May 16, 2016 at 6:35 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>
> > When using eager_rdma the sender will block once it runs out of
> > "credits". If the receiver enters MPI for any reason the incoming
> > messages will be placed in the ob1 unexpected queue and the credits will
> > be returned to the sender. If you turn off eager_rdma you will probably
> > get different results. That said, the unexpected message path is
> > non-optimal and it would be best to ensure a matching receive is posted
> > before the send.
> >
> > Additionally, if you are using InfiniBand I recommend against adding a
> > per-peer queue pair to btl_openib_receive_queues. We have not seen any
> > performance benefit to using per-peer queue pairs and they do not
> > scale.
> >
> > -Nathan Hjelm
> > HPC-ENV, LANL
> >
> > On Mon, May 16, 2016 at 12:21:41PM -0400, Xiaolong Cui wrote:
> > > Hi,
> > >
> > > I am using Open MPI 1.8.6. I guess my question is related to the flow
> > > control algorithm for small messages. The question is how to avoid the
> > > sender being blocked by the receiver when using the openib module for
> > > small messages with blocking sends. I have looked through this FAQ
> > > (https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot)
> > > but didn't find the answer. My understanding of the "eager sending
> > > protocol" is that if a message is "small", it is transported to the
> > > receiver immediately, even if the receiver is not ready. As a result,
> > > the sender won't be blocked until the receiver posts the receive
> > > operation.
> > >
> > > I am trying to observe such behavior with a simple program of two MPI
> > > ranks (attached). My confusion is that while I can see the behavior
> > > with the "vader" module (shared memory) when running the two ranks on
> > > the same node:
> > >
> > > [output]
> > > [0] size = 16, loop = 78, time = 0.00007
> > > [1] size = 16, loop = 78, time = 3.42426
> > > [/output]
> > >
> > > I cannot see it when running them on two nodes using the "openib"
> > > module:
> > >
> > > [output]
> > > [0] size = 16, loop = 78, time = 3.42627
> > > [1] size = 16, loop = 78, time = 3.42426
> > > [/output]
> > >
> > > Does anyone know the reason? My runtime configuration is also
> > > attached. Thanks!
> > > Sincerely,
> > > Michael
> > >
> > > --
> > > Xiaolong Cui (Michael)
> > > Department of Computer Science
> > > Dietrich School of Arts & Science
> > > University of Pittsburgh
> > > Pittsburgh, PA 15260
> > >
> > > btl = openib,vader,self
> > > #btl_base_verbose = 100
> > > btl_openib_use_eager_rdma = 1
> > > btl_openib_eager_limit = 160000
> > > btl_openib_rndv_eager_limit = 160000
> > > btl_openib_max_send_size = 160000
> > > btl_openib_receive_queues = P,128,256,192,64:S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512
> > >
> > > #include "mpi.h"
> > > #include <mpi-ext.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >     int size, rank, psize;
> > >     int loops = 78;
> > >     int length = 4;
> > >     MPI_Init(&argc, &argv);
> > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > >     int *code = (int *)malloc(length * sizeof(int));
> > >     MPI_Status status;
> > >     long long i = 0;
> > >     double time_s = MPI_Wtime();
> > >
> > >     if(rank % 2 == 1)
> > >     {
> > >         int i;
> > >         int j;
> > >         double a = 0.3, b = 0.5;
> > >         for(i = 0; i < 30000; i++)
> > >             for(j = 0; j < 30000; j++){
> > >                 a = a * 2;
> > >                 b = b + a;
> > >             }
> > >     }
> > >
> > >     for(i = 0; i < loops; i++){
> > >         if(rank % 2 == 0){
> > >             MPI_Send(code, length, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
> > >         }
> > >         else if(rank % 2 == 1){
> > >             MPI_Recv(code, length, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> > >         }
> > >     }
> > >     double time_e = MPI_Wtime();
> > >     printf("[%d] size = %d, loop = %d, time = %.5f\n", rank, length * sizeof(int), loops, time_e - time_s);
> > >
> > >     MPI_Finalize();
> > >     return 0;
> > > }
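A couple of sketches for the suggestions in my earlier reply, in case they help. First, since your mca-params file sets btl_openib_use_eager_rdma = 1, the quickest way to test without eager_rdma is to override it on the command line (adjust the process count and the binary name; ./a.out is just a placeholder):

    mpirun -np 2 --mca btl openib,vader,self --mca btl_openib_use_eager_rdma 0 ./a.out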
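Second, on posting the matching receive before the send: below is a minimal sketch of how the receiver in your attached test could pre-post its receives before entering the compute loop and complete them afterwards, so incoming messages have somewhere to land while the receiver is busy. It keeps your loop count and message length, and it uses one buffer per outstanding receive because a buffer must not be reused while a nonblocking receive on it is still pending. Treat it as illustrative, not a drop-in replacement.

/* Sketch: same send/recv pattern as the attached test, but the receiver
 * pre-posts all of its receives before it starts the compute loop. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank;
    const int loops = 78;
    const int length = 4;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* one buffer region per outstanding receive */
    int *buf = malloc((size_t)loops * length * sizeof(int));
    MPI_Request *reqs = malloc((size_t)loops * sizeof(MPI_Request));

    double t0 = MPI_Wtime();

    if (rank % 2 == 1) {
        /* receiver: post everything up front ... */
        for (int i = 0; i < loops; i++)
            MPI_Irecv(buf + (size_t)i * length, length, MPI_INT,
                      rank - 1, 0, MPI_COMM_WORLD, &reqs[i]);

        /* ... then do the artificial compute delay ... */
        double a = 0.3, b = 0.5;
        for (int i = 0; i < 30000; i++)
            for (int j = 0; j < 30000; j++) {
                a = a * 2;
                b = b + a;
            }

        /* ... and complete the receives afterwards */
        MPI_Waitall(loops, reqs, MPI_STATUSES_IGNORE);
    } else {
        /* sender: same blocking sends as the original test */
        for (int i = 0; i < loops; i++)
            MPI_Send(buf, length, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    }

    double t1 = MPI_Wtime();
    printf("[%d] size = %zu, loop = %d, time = %.5f\n",
           rank, length * sizeof(int), loops, t1 - t0);

    free(buf);
    free(reqs);
    MPI_Finalize();
    return 0;
}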