On Aug 27, 2014, at 9:21 AM, Zhang,Lei(Ecom) <zhangle...@baidu.com> wrote:
> The problem is that I profiled the receiving node and found that its network
> bandwidth is used only less than 50%.

How did you profile that?

> That's why I want to find ways to increase the receiving throughput. Any
> ideas ?

A lot of this depends on your networking setup. Is your fabric oversubscribed, perchance? Are there other performance bottlenecks (e.g., less-than-optimal networking hardware)?

What happens when you run MPI NetPipe between the two servers in question -- can you get full bandwidth?

What happens when you run MPI NetPipe between 2 pairs of servers that share some of the networking hardware (e.g., on the same switch) -- do you still get full bandwidth? Repeat the experiment, adding pairs, until you're running (num_switch_ports/2) instances of MPI NetPipe -- between the first half of the ports and the second half of the ports. Do you still get full bandwidth?

Now start this experiment across multiple switches -- first with just a single pair, to ensure that you have a good network path from A to B (where A and B are on different switches). Then start adding more simultaneous pairs of servers running NetPipe to simulate congestion in the network.

This type of experiment will help you characterize your network architecture and see whether the fabric itself is constraining your bandwidth.

(BTW, you may be able to infer much of this information without running all of these tests if you look at the physical and logical connectivity between all your switches -- e.g., do you have 16 leaf servers off a switch, but only 8 uplinks? And so on.)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
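
The uplink example above (16 servers on a leaf switch, but only 8 uplinks) can be turned into a quick back-of-the-envelope check. This is only a minimal sketch, not a measurement: the function names and the 10 Gb/s link speeds are assumptions made up for illustration, and it assumes every server sends off-switch at the same time with traffic spread evenly across the uplinks. Real numbers still have to come from tests such as the NetPipe runs described above.

```python
def oversubscription_ratio(num_servers, num_uplinks,
                           server_link_gbps=10.0, uplink_gbps=10.0):
    """Total server-facing capacity divided by total uplink capacity
    for a single leaf switch. Link speeds are assumed placeholders."""
    if num_servers <= 0 or num_uplinks <= 0:
        raise ValueError("need at least one server and one uplink")
    return (num_servers * server_link_gbps) / (num_uplinks * uplink_gbps)


def worst_case_per_server_gbps(num_servers, num_uplinks,
                               server_link_gbps=10.0, uplink_gbps=10.0):
    """Rough per-server off-switch bandwidth when all servers send at once.
    A ratio at or below 1.0 means the uplinks are not the limiting factor."""
    ratio = oversubscription_ratio(num_servers, num_uplinks,
                                   server_link_gbps, uplink_gbps)
    return server_link_gbps / max(ratio, 1.0)


if __name__ == "__main__":
    # The example from the text: 16 leaf servers, 8 uplinks.
    print("oversubscription:", oversubscription_ratio(16, 8))
    print("worst-case Gb/s per server:", worst_case_per_server_gbps(16, 8))
```

A ratio above 1.0 only says that the fabric could become a bottleneck under load; whether it actually does depends on the traffic pattern, which is what the multi-pair NetPipe experiment is meant to show.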