I think it is the connection manager that blocks the first message. If I add a pair of send/recv calls at the very beginning, the problem goes away.
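(For concreteness, the warm-up I added is just one dummy round trip per sender/receiver pair, placed right after MPI_Comm_rank() in the attached test program and before any timing. The variable name and the tag 99 below are arbitrary choices of mine, not anything Open MPI requires:)

    /* dummy round trip inserted right after MPI_Comm_rank() in the attached
     * program, so the openib connection to the peer is already established
     * before the measured sends start */
    int warmup = 0;
    if (rank % 2 == 0) {
        MPI_Send(&warmup, 1, MPI_INT, rank + 1, 99, MPI_COMM_WORLD);
        MPI_Recv(&warmup, 1, MPI_INT, rank + 1, 99, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(&warmup, 1, MPI_INT, rank - 1, 99, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&warmup, 1, MPI_INT, rank - 1, 99, MPI_COMM_WORLD);
    }

With this exchange in place, the timed loop over openib no longer blocks on the first MPI_Send.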
Removing the per-peer queue pair, however, does not help. Do you know of any document that discusses the Open MPI internals, especially the parts relevant to this problem?

On Tue, May 17, 2016 at 11:00 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> If it is blocking on the first message then it might be blocked by the
> connection manager. Removing the per-peer queue pair might help in that
> case.
>
> -Nathan
>
> On Mon, May 16, 2016 at 10:11:29PM -0400, Xiaolong Cui wrote:
> >
> > Hi Nathan,
> >
> > Thanks for your answer. The "credits" make sense for the purpose of
> > flow control. However, the sender in my case is blocked even for the
> > first message, which does not look like the symptom of running out of
> > credits. Is there a reason for this? Also, is there an MCA parameter
> > for the number of credits?
> >
> > Best,
> > Michael
> >
> > On Mon, May 16, 2016 at 6:35 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> > >
> > > When using eager_rdma the sender will block once it runs out of
> > > "credits". If the receiver enters MPI for any reason the incoming
> > > messages will be placed in the ob1 unexpected queue and the credits
> > > will be returned to the sender. If you turn off eager_rdma you will
> > > probably get different results. That said, the unexpected message
> > > path is non-optimal and it would be best to ensure a matching
> > > receive is posted before the send.
> > >
> > > Additionally, if you are using InfiniBand I recommend against adding
> > > a per-peer queue pair to btl_openib_receive_queues. We have not seen
> > > any performance benefit to using per-peer queue pairs and they do
> > > not scale.
> > >
> > > -Nathan Hjelm
> > > HPC-ENV, LANL
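If I understand the suggestion correctly, removing the per-peer queue pair means dropping the leading P,128,256,192,64 entry from btl_openib_receive_queues in my configuration (attached further down), leaving only the shared receive queue entries, i.e. something like

    btl_openib_receive_queues = S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512

and I take "turn off eager_rdma" to mean setting

    btl_openib_use_eager_rdma = 0

The receive_queues change is the one I tried; as noted at the top, it does not change the behavior.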
> > > On Mon, May 16, 2016 at 12:21:41PM -0400, Xiaolong Cui wrote:
> > > >
> > > > Hi,
> > > >
> > > > I am using Open MPI 1.8.6. I guess my question is related to the
> > > > flow control algorithm for small messages: how can I avoid the
> > > > sender being blocked by the receiver when sending small messages
> > > > with blocking sends over the openib BTL? I have looked through this
> > > > FAQ
> > > > (https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot)
> > > > but didn't find the answer. My understanding of the eager send
> > > > protocol is that if a message is "small", it is transported to the
> > > > receiver immediately, even if the receiver is not ready, so the
> > > > sender is not blocked until the receiver posts the receive
> > > > operation.
> > > >
> > > > I am trying to observe this behavior with a simple program of two
> > > > MPI ranks (attached). My confusion is that I can see the behavior
> > > > with the "vader" BTL (shared memory) when running the two ranks on
> > > > the same node:
> > > >
> > > > [output]
> > > > [0] size = 16, loop = 78, time = 0.00007
> > > > [1] size = 16, loop = 78, time = 3.42426
> > > > [/output]
> > > >
> > > > but I cannot see it when running them on two nodes using the
> > > > "openib" BTL:
> > > >
> > > > [output]
> > > > [0] size = 16, loop = 78, time = 3.42627
> > > > [1] size = 16, loop = 78, time = 3.42426
> > > > [/output]
> > > >
> > > > Does anyone know the reason? My runtime configuration is also
> > > > attached. Thanks!
> > > >
> > > > Sincerely,
> > > > Michael
> > > >
> > > > [attached MCA configuration]
> > > >
> > > > btl = openib,vader,self
> > > > #btl_base_verbose = 100
> > > > btl_openib_use_eager_rdma = 1
> > > > btl_openib_eager_limit = 160000
> > > > btl_openib_rndv_eager_limit = 160000
> > > > btl_openib_max_send_size = 160000
> > > > btl_openib_receive_queues = P,128,256,192,64:S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512
> > > >
> > > > [attached test program]
> > > >
> > > > #include "mpi.h"
> > > > #include <mpi-ext.h>
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > >
> > > > int main(int argc, char *argv[])
> > > > {
> > > >     int size, rank;
> > > >     int loops = 78;
> > > >     int length = 4;
> > > >
> > > >     MPI_Init(&argc, &argv);
> > > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > > >
> > > >     int *code = (int *)malloc(length * sizeof(int));
> > > >     long long i = 0;
> > > >     double time_s = MPI_Wtime();
> > > >
> > > >     /* odd ranks compute for a while before entering MPI */
> > > >     if (rank % 2 == 1) {
> > > >         int k, j;
> > > >         double a = 0.3, b = 0.5;
> > > >         for (k = 0; k < 30000; k++)
> > > >             for (j = 0; j < 30000; j++) {
> > > >                 a = a * 2;
> > > >                 b = b + a;
> > > >             }
> > > >     }
> > > >
> > > >     /* even ranks send, odd ranks receive */
> > > >     for (i = 0; i < loops; i++) {
> > > >         if (rank % 2 == 0) {
> > > >             MPI_Send(code, length, MPI_INT, rank + 1, 0,
> > > >                      MPI_COMM_WORLD);
> > > >         } else if (rank % 2 == 1) {
> > > >             MPI_Recv(code, length, MPI_INT, rank - 1, 0,
> > > >                      MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> > > >         }
> > > >     }
> > > >
> > > >     double time_e = MPI_Wtime();
> > > >     printf("[%d] size = %d, loop = %d, time = %.5f\n", rank,
> > > >            (int)(length * sizeof(int)), loops, time_e - time_s);
> > > >
> > > >     MPI_Finalize();
> > > >     return 0;
> > > > }

--
Xiaolong Cui (Michael)
Department of Computer Science
Dietrich School of Arts & Science
University of Pittsburgh
Pittsburgh, PA 15260