On May 5, 2016, at 10:09 PM, Zhen Wang <tod...@gmail.com> wrote:
>
> > It's taking so long because you are sleeping for .1 second between calling
> > MPI_Test().
> >
> > The TCP transport is only sending a few fragments of your message during
> > each iteration through MPI_Test (because, by definition, it has to return
> > "immediately").  Other transports do better handing off large messages
> > like this to hardware for asynchronous progress.
>
> This agrees with what I observed.  Larger messages need more calls to
> MPI_Test.  What do you mean by other transports?
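(For reference, here's a minimal sketch of the kind of pattern we're talking about.  The buffer size, ranks, and so on are just placeholders, not your actual code: one rank posts a large MPI_Isend and then polls MPI_Test with a 0.1 second sleep between calls, so the TCP transport only gets to push a few more fragments out each time MPI_Test is invoked.)

    /* Sketch only: large nonblocking send progressed by polling MPI_Test,
     * with a 0.1 second sleep between polls.  Run with at least 2 processes,
     * e.g. "mpirun -np 2 ./a.out".  Buffer size is a placeholder. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        size_t count = 50 * 1024 * 1024;   /* 50 MB message */
        char *buf = malloc(count);

        if (0 == rank) {
            MPI_Request req;
            int done = 0;
            MPI_Isend(buf, (int) count, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
            while (!done) {
                /* With the TCP BTL, progress happens inside this call */
                MPI_Test(&req, &done, MPI_STATUS_IGNORE);
                usleep(100000);            /* the 0.1 second sleep */
            }
        } else if (1 == rank) {
            MPI_Recv(buf, (int) count, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

Since the TCP transport only sends a few fragments per MPI_Test call, a loop like that needs many iterations (and therefore many sleeps) before a large message completes.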
The POSIX sockets API, commonly used with TCP over Ethernet, is great for most
network-based applications, but it has some inherent constraints that limit its
performance in HPC-style applications.

That being said, many people just take a bunch of servers and run MPI over
TCP/Ethernet, and it works well enough for them.  Because of this "good enough"
performance, and the fact that every server in the world supports some type of
Ethernet capability, all MPI implementations support TCP.

But there are more demanding HPC applications that require higher performance
from the network in order to get good overall performance.  As such, other
networking APIs -- most commonly provided by vendors for HPC-class networks
(Ethernet or otherwise) -- do not have the same performance constraints as the
POSIX sockets API, and are usually preferred by HPC applications.

There are usually two kinds of performance improvements that such networking
APIs offer (in conjunction with the underlying NIC for the HPC-class network):

1. Improving software API efficiency (e.g., avoiding extra memory copies,
   bypassing the OS and exposing NIC hardware directly into userspace, etc.)

2. Exploiting NIC hardware capabilities, usually designed for MPI and/or
   general high performance (e.g., polling for progress instead of waiting for
   interrupts, hardware demultiplexing of incoming messages directly to target
   processes, direct data placement at the target, etc.)

Hence, when I say "other transports", I'm referring to these HPC-class networks
(and associated APIs).

> > Additionally, in the upcoming v2.0.0 release is a non-default option to
> > enable an asynchronous progress thread for the TCP transport.  We're up to
> > v2.0.0rc2; you can give that async TCP support a whirl, if you want.  Pass
> > "--mca btl_tcp_progress_thread 1" on the mpirun command line to enable the
> > TCP progress thread to try it.
>
> Does this mean there's an additional thread to transfer data in the
> background?

Yes.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/