I don't know of any documentation on the connection manager other than what is in the code and in my head. I rewrote a lot of that code in 2.x, so you might want to try out the latest 2.x tarball from https://www.open-mpi.org/software/ompi/v2.x/
I know the per-peer queue pair will prevent totally asynchronous connections even in 2.x, but SRQ/XRC-only receive queues should work.
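Concretely, "SRQ/XRC only" just means a btl_openib_receive_queues value with no per-peer (P,...) entry; XRC queues would use X,... entries instead, if the stack supports them. As a minimal sketch, taking the receive_queues line from the attached config and simply dropping its leading P entry would give something like:

    btl_openib_receive_queues = S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512

(The queue sizes above are just the ones from the attached config, not a recommendation.) For a quick comparison with eager RDMA turned off, the btl_openib_use_eager_rdma parameter already in that config can be overridden on the command line, e.g. mpirun --mca btl_openib_use_eager_rdma 0 ...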
-Nathan

On Tue, May 17, 2016 at 11:31:01AM -0400, Xiaolong Cui wrote:
> I think it is the connection manager that blocks the first message. If I
> add a pair of send/recv at the very beginning, the problem is gone. But
> removing the per-peer queue pair does not help.
>
> Do you know of any document that discusses the Open MPI internals,
> especially related to this problem?
>
> On Tue, May 17, 2016 at 11:00 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
>
> If it is blocking on the first message then it might be blocked by the
> connection manager. Removing the per-peer queue pair might help in that
> case.
>
> -Nathan
>
> On Mon, May 16, 2016 at 10:11:29PM -0400, Xiaolong Cui wrote:
> > Hi Nathan,
> >
> > Thanks for your answer.
> >
> > The "credits" make sense for the purpose of flow control. However, the
> > sender in my case will be blocked even for the first message. This
> > doesn't seem to be the symptom of running out of credits. Is there any
> > reason for this? Also, is there an MCA parameter for the number of
> > credits?
> >
> > Best,
> > Michael
> >
> > On Mon, May 16, 2016 at 6:35 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> >
> > When using eager_rdma the sender will block once it runs out of
> > "credits". If the receiver enters MPI for any reason the incoming
> > messages will be placed in the ob1 unexpected queue and the credits
> > will be returned to the sender. If you turn off eager_rdma you will
> > probably get different results. That said, the unexpected message path
> > is non-optimal and it would be best to ensure a matching receive is
> > posted before the send.
> >
> > Additionally, if you are using InfiniBand I recommend against adding a
> > per-peer queue pair to btl_openib_receive_queues. We have not seen any
> > performance benefit to using per-peer queue pairs and they do not
> > scale.
> >
> > -Nathan Hjelm
> > HPC-ENV, LANL
> >
> > On Mon, May 16, 2016 at 12:21:41PM -0400, Xiaolong Cui wrote:
> > > Hi,
> > >
> > > I am using Open MPI 1.8.6. I guess my question is related to the flow
> > > control algorithm for small messages. The question is how to avoid the
> > > sender being blocked by the receiver when using the openib module for
> > > small messages and blocking sends. I have looked through this FAQ
> > > (https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot)
> > > but didn't find the answer. My understanding of the "eager sending
> > > protocol" is that if a message is "small", it will be transported to
> > > the receiver immediately, even if the receiver is not ready. As a
> > > result, the sender won't block waiting for the receiver to post the
> > > receive operation.
> > >
> > > I am trying to observe such behavior with a simple program of two MPI
> > > ranks (attached). My confusion is that while I can see the behavior
> > > with the "vader" module (shared memory) when running the two ranks on
> > > the same node,
> > >
> > > [output]
> > > [0] size = 16, loop = 78, time = 0.00007
> > > [1] size = 16, loop = 78, time = 3.42426
> > > [/output]
> > >
> > > I cannot see it when running them on two nodes using the "openib"
> > > module.
> > >
> > > [output]
> > > [0] size = 16, loop = 78, time = 3.42627
> > > [1] size = 16, loop = 78, time = 3.42426
> > > [/output]
> > >
> > > Does anyone know the reason? My runtime configuration is also
> > > attached. Thanks!
> > >
> > > Sincerely,
> > > Michael
> > >
> > > --
> > > Xiaolong Cui (Michael)
> > > Department of Computer Science
> > > Dietrich School of Arts & Science
> > > University of Pittsburgh
> > > Pittsburgh, PA 15260
> > >
> > > btl = openib,vader,self
> > > #btl_base_verbose = 100
> > > btl_openib_use_eager_rdma = 1
> > > btl_openib_eager_limit = 160000
> > > btl_openib_rndv_eager_limit = 160000
> > > btl_openib_max_send_size = 160000
> > > btl_openib_receive_queues = P,128,256,192,64:S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512
> > >
> > > #include "mpi.h"
> > > #include <mpi-ext.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >     int size, rank;
> > >     int loops = 78;
> > >     int length = 4;
> > >
> > >     MPI_Init(&argc, &argv);
> > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > >
> > >     int *code = (int *)malloc(length * sizeof(int));
> > >     long long i = 0;
> > >     double time_s = MPI_Wtime();
> > >
> > >     /* Odd ranks (the receivers) spin on a long computation before
> > >      * entering the communication loop, so the even ranks' sends
> > >      * arrive before any matching receive is posted. */
> > >     if (rank % 2 == 1) {
> > >         int i, j;
> > >         double a = 0.3, b = 0.5;
> > >         for (i = 0; i < 30000; i++)
> > >             for (j = 0; j < 30000; j++) {
> > >                 a = a * 2;
> > >                 b = b + a;
> > >             }
> > >     }
> > >
> > >     /* Even ranks send small messages; odd ranks receive them. */
> > >     for (i = 0; i < loops; i++) {
> > >         if (rank % 2 == 0) {
> > >             MPI_Send(code, length, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
> > >         } else {
> > >             MPI_Recv(code, length, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
> > >                      MPI_STATUS_IGNORE);
> > >         }
> > >     }
> > >
> > >     double time_e = MPI_Wtime();
> > >     printf("[%d] size = %d, loop = %d, time = %.5f\n", rank,
> > >            (int)(length * sizeof(int)), loops, time_e - time_s);
> > >
> > >     free(code);
> > >     MPI_Finalize();
> > >     return 0;
> > > }
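For what it's worth, the "pair of send/recv at the very beginning" workaround mentioned above could look something like the sketch below. It reuses the even/odd pairing from the attached test program; the tag value (999), the dummy payload, and the echo back to the sender are illustrative choices, not anything required by Open MPI. The point is simply that each sender/receiver pair exchanges one message before the timed loop, so connection setup happens before the measurement starts.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, dummy = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Warm-up exchange: even ranks send one small message to the next odd
     * rank and wait for an echo; odd ranks do the reverse.  This moves
     * connection setup (and any eager-RDMA resource allocation) out of the
     * timed loop. */
    if (rank % 2 == 0 && rank + 1 < size) {
        MPI_Send(&dummy, 1, MPI_INT, rank + 1, 999, MPI_COMM_WORLD);
        MPI_Recv(&dummy, 1, MPI_INT, rank + 1, 999, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else if (rank % 2 == 1) {
        MPI_Recv(&dummy, 1, MPI_INT, rank - 1, 999, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&dummy, 1, MPI_INT, rank - 1, 999, MPI_COMM_WORLD);
    }

    printf("[%d] warm-up exchange done\n", rank);

    /* ... the timed send/recv loop from the attached program would follow
     * here ... */

    MPI_Finalize();
    return 0;
}

If SRQ-only receive queues plus the 2.x connection manager already give fully asynchronous connection setup, the warm-up becomes unnecessary, but it is still a cheap way to keep connection establishment out of the measured time.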