Jeff,

The hardware limitation doesn't allow me to use anything other than TCP...
I think I have a good understanding of what's going on, and may have a solution. I'll test it out. Thanks to you all.

Best regards,
Zhen

On Fri, May 6, 2016 at 7:13 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On May 5, 2016, at 10:09 PM, Zhen Wang <tod...@gmail.com> wrote:
>
> > > It's taking so long because you are sleeping for .1 second between calling MPI_Test().
> > >
> > > The TCP transport is only sending a few fragments of your message during each iteration through MPI_Test (because, by definition, it has to return "immediately"). Other transports do better at handing off large messages like this to hardware for asynchronous progress.
> >
> > This agrees with what I observed. Larger messages need more calls of MPI_Test. What do you mean by other transports?
>
> The POSIX sockets API, commonly used with TCP over Ethernet, is great for most network-based applications, but it has some inherent constraints that limit its performance in HPC-type applications.
>
> That being said, many people just take a bunch of servers and run MPI over TCP/Ethernet, and it works well enough for them. Because of this "good enough" performance, and the fact that every server in the world supports some type of Ethernet capability, all MPI implementations support TCP.
>
> But there are more demanding HPC applications that require higher performance from the network in order to get good overall performance. As such, other networking APIs -- most commonly provided by vendors for HPC-class networks (Ethernet or otherwise) -- do not have the same performance constraints as the POSIX sockets API, and are usually preferred by HPC applications.
>
> There are usually two kinds of performance improvements that such networking APIs offer (in conjunction with the underlying NIC for the HPC-class network):
>
> 1. Improving software API efficiency (e.g., avoiding extra memory copies, bypassing the OS and exposing NIC hardware directly to userspace, etc.)
>
> 2. Exploiting NIC hardware capabilities, usually designed for MPI and/or general high performance (e.g., polling for progress instead of waiting for interrupts, hardware demultiplexing of incoming messages directly to target processes, direct data placement at the target, etc.)
>
> Hence, when I say "other transports", I'm referring to these HPC-class networks (and associated APIs).
>
> > > Additionally, the upcoming v2.0.0 release includes a non-default option to enable an asynchronous progress thread for the TCP transport. We're up to v2.0.0rc2; you can give that async TCP support a whirl, if you want. Pass "--mca btl_tcp_progress_thread 1" on the mpirun command line to enable the TCP progress thread and try it.
> >
> > Does this mean there's an additional thread to transfer data in the background?
>
> Yes.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
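
P.S. For anyone reading this thread later, below is a minimal sketch of the pattern being discussed: a large nonblocking transfer progressed only by periodic MPI_Test calls with a 0.1 s sleep in between. It is not the original program; the message size, tag, and two-rank setup are illustrative assumptions, and it only shows why the sleep between MPI_Test calls stretches out a large transfer over the TCP BTL.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Illustrative message size (100 MB); only ranks 0 and 1 participate. */
        const int count = 100 * 1024 * 1024;
        char *buf = malloc(count);

        if (size >= 2 && rank < 2) {
            MPI_Request req;
            int flag = 0, tests = 0;

            if (rank == 0)
                MPI_Isend(buf, count, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
            else
                MPI_Irecv(buf, count, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req);

            /* Per the discussion above, the TCP BTL only moves a few fragments
               per MPI_Test call, so the 0.1 s sleep between calls dominates the
               total transfer time for a large message. */
            while (!flag) {
                MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
                tests++;
                if (!flag)
                    usleep(100000);  /* 0.1 second, as in the original report */
            }

            printf("rank %d done after %d MPI_Test calls\n", rank, tests);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

To try the asynchronous progress thread Jeff mentioned for v2.0.0rc2, the same program would be launched with the quoted MCA flag, e.g. "mpirun --mca btl_tcp_progress_thread 1 -np 2 ./a.out" (the executable name and process count here are just placeholders).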