Sorry, the figures do not display. They are attached to this message.

On Wed, May 18, 2016 at 3:24 PM, Xiaolong Cui <sunshine...@gmail.com> wrote:
Hi Nathan,

I have one more question. I am measuring the number of messages that can be eagerly sent with a given SRQ. Again, as illustrated in the attached figure, my program has two ranks: rank 0 sends a variable number (n) of messages to rank 1, which is not ready to receive.

[Attached figure 1: the two-rank pattern, rank 0 sending n messages to a not-yet-receiving rank 1]

I measured the time for rank 0 to send out all the messages, and the result surprised me. Do you know why the time drops at n = 127? The SRQ is simply btl_openib_receive_queues = S,2048,512,494,80.

[Attached figure 2: send time as a function of n, with a drop at n = 127]

On Tue, May 17, 2016 at 11:49 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

I don't know of any documentation on the connection manager other than what is in the code and in my head. I rewrote a lot of that code in 2.x, so you might want to try out the latest 2.x tarball from https://www.open-mpi.org/software/ompi/v2.x/

I know the per-peer queue pair will prevent totally asynchronous connections even in 2.x, but an SRQ/XRC-only configuration should work.

-Nathan

On Tue, May 17, 2016 at 11:31:01AM -0400, Xiaolong Cui wrote:

I think it is the connection manager that blocks the first message. If I add a pair of send/recv at the very beginning, the problem is gone. But removing the per-peer queue pair does not help.

Do you know of any document that discusses the Open MPI internals, especially as related to this problem?

On Tue, May 17, 2016 at 11:00 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

If it is blocking on the first message, then it might be blocked by the connection manager. Removing the per-peer queue pair might help in that case.

-Nathan

On Mon, May 16, 2016 at 10:11:29PM -0400, Xiaolong Cui wrote:

Hi Nathan,

Thanks for your answer.

The "credits" make sense for the purpose of flow control. However, the sender in my case is blocked even for the first message, which doesn't look like the symptom of running out of credits. Is there a reason for this? Also, is there an MCA parameter for the number of credits?

Best,
Michael

On Mon, May 16, 2016 at 6:35 PM, Nathan Hjelm <hje...@lanl.gov> wrote:

When using eager_rdma, the sender will block once it runs out of "credits". If the receiver enters MPI for any reason, the incoming messages will be placed in the ob1 unexpected queue and the credits will be returned to the sender. If you turn off eager_rdma you will probably get different results. That said, the unexpected-message path is non-optimal, and it would be best to ensure a matching receive is posted before the send.

Additionally, if you are using InfiniBand, I recommend against adding a per-peer queue pair to btl_openib_receive_queues. We have not seen any performance benefit from per-peer queue pairs, and they do not scale.
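Turning off eager_rdma here corresponds to setting btl_openib_use_eager_rdma = 0 in the configuration quoted below. As for pre-posting the receives, a minimal, untested sketch could look like the following; the counts, buffer names, and two-rank layout simply mirror the attached test program and are not required to be exactly this:

#include <mpi.h>
#include <stdlib.h>

/* Sketch: rank 1 posts all of its receives before the compute delay so the
 * sender's eager messages match immediately instead of landing in the
 * ob1 unexpected queue. */
int main(int argc, char *argv[])
{
    const int loops = 78, length = 4;   /* mirrors the attached test program */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int *buf = malloc(loops * length * sizeof(int));
    MPI_Request *reqs = malloc(loops * sizeof(MPI_Request));

    if (rank == 1) {
        /* Pre-post one receive per expected message. */
        for (int i = 0; i < loops; i++)
            MPI_Irecv(buf + i * length, length, MPI_INT, 0, 0,
                      MPI_COMM_WORLD, &reqs[i]);

        /* ... compute delay would go here ... */

        MPI_Waitall(loops, reqs, MPI_STATUSES_IGNORE);
    } else if (rank == 0) {
        for (int i = 0; i < loops; i++)
            MPI_Send(buf, length, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }

    free(reqs);
    free(buf);
    MPI_Finalize();
    return 0;
}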
-Nathan Hjelm
HPC-ENV, LANL

On Mon, May 16, 2016 at 12:21:41PM -0400, Xiaolong Cui wrote:

Hi,

I am using Open MPI 1.8.6. I guess my question is related to the flow-control algorithm for small messages: how do I keep the sender from being blocked by the receiver when using the openib module with small messages and blocking sends? I have looked through this FAQ (https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot) but didn't find the answer. My understanding of the "eager sending protocol" is that if a message is "small", it is transported to the receiver immediately, even if the receiver is not ready; as a result, the sender should not block waiting for the receiver to post the receive operation.

I am trying to observe this behavior with a simple program of two MPI ranks (attached). My confusion is that I can see the behavior with the "vader" module (shared memory) when running the two ranks on the same node,

[output]
[0] size = 16, loop = 78, time = 0.00007
[1] size = 16, loop = 78, time = 3.42426
[/output]

but I cannot see it when running them on two nodes using the "openib" module:

[output]
[0] size = 16, loop = 78, time = 3.42627
[1] size = 16, loop = 78, time = 3.42426
[/output]

Does anyone know the reason? My runtime configuration is also attached. Thanks!

Sincerely,
Michael

Attached MCA configuration:

btl = openib,vader,self
#btl_base_verbose = 100
btl_openib_use_eager_rdma = 1
btl_openib_eager_limit = 160000
btl_openib_rndv_eager_limit = 160000
btl_openib_max_send_size = 160000
btl_openib_receive_queues = P,128,256,192,64:S,2048,1024,1008,80:S,12288,1024,1008,80:S,160000,1024,512,512

Attached test program:

#include "mpi.h"
#include <mpi-ext.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int size, rank;
    int loops = 78;
    int length = 4;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int *code = (int *)malloc(length * sizeof(int));
    long long i = 0;
    double time_s = MPI_Wtime();

    /* Odd ranks burn time before receiving so the sender runs ahead. */
    if (rank % 2 == 1) {
        int j, k;
        double a = 0.3, b = 0.5;
        for (j = 0; j < 30000; j++)
            for (k = 0; k < 30000; k++) {
                a = a * 2;
                b = b + a;
            }
    }

    /* Even ranks send, odd ranks receive. */
    for (i = 0; i < loops; i++) {
        if (rank % 2 == 0)
            MPI_Send(code, length, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        else
            MPI_Recv(code, length, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    double time_e = MPI_Wtime();
    printf("[%d] size = %d, loop = %d, time = %.5f\n",
           rank, (int)(length * sizeof(int)), loops, time_e - time_s);

    MPI_Finalize();
    return 0;
}
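Regarding the pair of send/recv at the very beginning mentioned earlier in the thread, which makes the first-message blocking go away, a minimal, untested sketch of that warm-up might look like the following. The warm_up helper, the tag value, and the zero-byte messages are arbitrary illustration choices, not anything Open MPI prescribes:

#include <mpi.h>

/* Sketch: a zero-byte exchange right after MPI_Init so the openib
 * connection between the two ranks is already established before any
 * timed communication starts. */
static void warm_up(int rank)
{
    if (rank == 0) {
        MPI_Send(NULL, 0, MPI_BYTE, 1, 99, MPI_COMM_WORLD);
        MPI_Recv(NULL, 0, MPI_BYTE, 1, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(NULL, 0, MPI_BYTE, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(NULL, 0, MPI_BYTE, 0, 99, MPI_COMM_WORLD);
    }
}

int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    warm_up(rank);   /* establish the connection before timing anything */

    /* ... the timed send/recv loop from the test program would follow here ... */

    MPI_Finalize();
    return 0;
}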
--
Xiaolong Cui (Michael)
Department of Computer Science
Dietrich School of Arts & Science
University of Pittsburgh
Pittsburgh, PA 15260