Hi Nathan,

I have one more question. I am measuring how many messages can be sent eagerly with a given SRQ. Again, as illustrated below, my program has two ranks: rank 0 sends a variable number (n) of messages to rank 1, which is not ready to receive.

[image: Inline image 1]
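The timed part is essentially a variant of the test program attached to my first mail, with the loop count made variable. A rough sketch follows; the 16-byte message size, the delay loop on rank 1, and reading n from the command line are placeholders, not the exact values behind the plot below.

/* Sketch of the measurement: rank 0 times n eager sends to a rank 1 that
   delays before posting its receives.  n comes from argv[1]. */
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size;
    int n = (argc > 1) ? atoi(argv[1]) : 1;   /* number of messages to send */
    int length = 4;                           /* 4 ints = 16 bytes (placeholder) */
    int *buf;
    int i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    buf = (int *)calloc(length, sizeof(int));

    if (rank == 0) {
        double t0 = MPI_Wtime();
        for (i = 0; i < n; i++)
            MPI_Send(buf, length, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("[0] n = %d, send time = %.5f\n", n, MPI_Wtime() - t0);
    } else if (rank == 1) {
        /* Receiver is deliberately "not ready": burn some time first. */
        volatile double a = 0.3;
        long long k;
        for (k = 0; k < 2000000000LL; k++)
            a = a + 0.5;
        for (i = 0; i < n; i++)
            MPI_Recv(buf, length, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}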
I measured the time for rank 0 to send out all the messages and, surprisingly, got the result shown below. Do you know why the time drops at n = 127? The SRQ is simply

btl_openib_receive_queues = S,2048,512,494,80

[image: Inline image 2]
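For completeness, the MCA settings for this test look roughly like the file below. Only the receive_queues line is exactly what I quoted above; the btl selection and the eager_rdma setting are just carried over from the configuration attached to my first mail and may not match this exact run.

# Hypothetical MCA params file for the measurement above (sketch only)
btl = openib,vader,self
btl_openib_use_eager_rdma = 1
# variant to rule out eager-RDMA credits, per your earlier suggestion:
# btl_openib_use_eager_rdma = 0
btl_openib_receive_queues = S,2048,512,494,80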
On Tue, May 17, 2016 at 11:49 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
> I don't know of any documentation on the connection manager other than what is in the code and in my head. I rewrote a lot of the code in 2.x, so you might want to try out the latest 2.x tarball from https://www.open-mpi.org/software/ompi/v2.x/
>
> I know the per-peer queue pair will prevent totally asynchronous connections even in 2.x, but SRQ/XRC only should work.
>
> -Nathan
>
> On Tue, May 17, 2016 at 11:31:01AM -0400, Xiaolong Cui wrote:
> > I think it is the connection manager that blocks the first message. If I add a pair of send/recv at the very beginning, the problem is gone. But removing the per-peer queue pair does not help.
> >
> > Do you know of any document that discusses the Open MPI internals, especially as related to this problem?
> >
> > On Tue, May 17, 2016 at 11:00 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
> > If it is blocking on the first message then it might be blocked by the connection manager. Removing the per-peer queue pair might help in that case.
> >
> > -Nathan
> >
> > On Mon, May 16, 2016 at 10:11:29PM -0400, Xiaolong Cui wrote:
> > > Hi Nathan,
> > >
> > > Thanks for your answer. The "credits" make sense for the purpose of flow control. However, the sender in my case is blocked even for the first message, which doesn't seem to be a symptom of running out of credits. Is there any reason for this? Also, is there an MCA parameter for the number of credits?
> > >
> > > Best,
> > > Michael
> > >
> > > On Mon, May 16, 2016 at 6:35 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> > > When using eager_rdma the sender will block once it runs out of "credits". If the receiver enters MPI for any reason the incoming messages will be placed in the ob1 unexpected queue and the credits will be returned to the sender. If you turn off eager_rdma you will probably get different results. That said, the unexpected-message path is non-optimal and it would be best to ensure a matching receive is posted before the send.
> > >
> > > Additionally, if you are using InfiniBand I recommend against adding a per-peer queue pair to btl_openib_receive_queues. We have not seen any performance benefit to using per-peer queue pairs and they do not scale.
> > >
> > > -Nathan Hjelm
> > > HPC-ENV, LANL
> > >
> > > On Mon, May 16, 2016 at 12:21:41PM -0400, Xiaolong Cui wrote:
> > > > Hi,
> > > >
> > > > I am using Open MPI 1.8.6. I guess my question is related to the flow-control algorithm for small messages: how do I avoid the sender being blocked by the receiver when using the openib module with small messages and blocking sends? I have looked through this FAQ (https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot) but didn't find the answer. My understanding of the "eager sending protocol" is that if a message is "small", it will be transported to the receiver immediately, even if the receiver is not ready. As a result, the sender won't be blocked until the receiver posts the receive operation.
> > > >
> > > > I am trying to observe such behavior with a simple program of two MPI ranks (attached). My confusion is that I can see the behavior with the "vader" module (shared memory) when running the two ranks on the same node,
> > > >
> > > > [output]
> > > > [0] size = 16, loop = 78, time = 0.00007
> > > > [1] size = 16, loop = 78, time = 3.42426
> > > > [/output]
> > > >
> > > > but I cannot see it when running them on two nodes using the "openib" module.
> > > >
> > > > [output]
> > > > [0] size = 16, loop = 78, time = 3.42627
> > > > [1] size = 16, loop = 78, time = 3.42426
> > > > [/output]
> > > >
> > > > Does anyone know the reason? My runtime configuration is also attached. Thanks!
> > > >
> > > > Sincerely,
> > > > Michael
> > > >
> > > > [attachment: runtime configuration]
> > > > btl = openib,vader,self
> > > > #btl_base_verbose = 100
> > > > btl_openib_use_eager_rdma = 1
> > > > btl_openib_eager_limit = 160000
> > > > btl_openib_rndv_eager_limit = 160000
> > > > btl_openib_max_send_size = 160000
> > > > btl_openib_receive_queues = P,128,256,192,64:S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512
> > > >
> > > > [attachment: test program]
> > > > #include "mpi.h"
> > > > #include <mpi-ext.h>
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > >
> > > > int main(int argc, char *argv[])
> > > > {
> > > >     int size, rank;
> > > >     int loops = 78;
> > > >     int length = 4;
> > > >
> > > >     MPI_Init(&argc, &argv);
> > > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > > >
> > > >     int *code = (int *)malloc(length * sizeof(int));
> > > >     long long i = 0;
> > > >     double time_s = MPI_Wtime();
> > > >
> > > >     /* Odd ranks burn CPU time first, so the receiver is not ready
> > > >        when the sender starts sending. */
> > > >     if (rank % 2 == 1) {
> > > >         int i, j;
> > > >         double a = 0.3, b = 0.5;
> > > >         for (i = 0; i < 30000; i++)
> > > >             for (j = 0; j < 30000; j++) {
> > > >                 a = a * 2;
> > > >                 b = b + a;
> > > >             }
> > > >     }
> > > >
> > > >     /* Even ranks send `loops` small messages; odd ranks receive them. */
> > > >     for (i = 0; i < loops; i++) {
> > > >         if (rank % 2 == 0) {
> > > >             MPI_Send(code, length, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
> > > >         } else if (rank % 2 == 1) {
> > > >             MPI_Recv(code, length, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> > > >         }
> > > >     }
> > > >
> > > >     double time_e = MPI_Wtime();
> > > >     printf("[%d] size = %d, loop = %d, time = %.5f\n", rank,
> > > >            (int)(length * sizeof(int)), loops, time_e - time_s);
> > > >
> > > >     free(code);
> > > >     MPI_Finalize();
> > > >     return 0;
> > > > }
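PS: the "pair of send/recv at the very beginning" mentioned in the quoted thread is just a one-token exchange performed before any timed sends, so the openib connection is already established when the measurement starts. Roughly like the fragment below; the helper name and the tag value are arbitrary, not taken from the real code.

#include "mpi.h"

/* Hypothetical warm-up exchange: call right after MPI_Comm_rank() and before
   starting the timer, so connection setup is excluded from the measurement. */
static void warmup_exchange(int rank)
{
    int token = 0;
    if (rank % 2 == 0) {
        MPI_Send(&token, 1, MPI_INT, rank + 1, 999, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, rank + 1, 999, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 999, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, rank - 1, 999, MPI_COMM_WORLD);
    }
}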
--
Xiaolong Cui (Michael)
Department of Computer Science
Dietrich School of Arts & Science
University of Pittsburgh
Pittsburgh, PA 15260