On Aug 27, 2014, at 9:21 AM, Zhang,Lei(Ecom) <zhangle...@baidu.com> wrote:

> The problem is that I profiled the receiving node and found that less than 
> 50% of its network bandwidth is being used.

How did you profile that?

> That's why I want to find ways to increase the receiving throughput. Any 
> ideas?

A lot of this depends on your networking setup.  Is your fabric oversubscribed, 
perchance?  Are there other performance bottlenecks?  (e.g., using 
less-than-optimal networking hardware, etc.)

What happens when you run MPI NetPipe between the two servers in question -- 
can you get full bandwidth?  

What happens when you run MPI NetPipe between 2 pairs of servers that share 
some of the networking hardware (e.g., on the same switch) -- do you still get 
full bandwidth?

Repeat the experiment until you're running (num_switch_ports/2) instances of 
MPI NetPipe -- between the first half of the ports and the second half of the 
ports.  Do you still get full bandwidth?
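
As a rough sketch of that pairing (first half of the ports against the second 
half), here is one way to generate the server pairs.  The hostnames and the 
helper name half_split_pairs are made up for illustration; substitute the hosts 
that actually hang off your switch, listed in port order.

```python
def half_split_pairs(hosts):
    """Pair the i-th host of the first half with the i-th host of the second half.

    If the list has an odd length, the last host is left unpaired.
    """
    half = len(hosts) // 2
    return list(zip(hosts[:half], hosts[half:half * 2]))


# Hypothetical hostnames, one per switch port, in port order.
hosts = [f"node{i:02d}" for i in range(8)]

for a, b in half_split_pairs(hosts):
    print(f"{a} <-> {b}")
```

Each pair would then run its own NetPipe instance at the same time as the 
others.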

Now start this experiment across multiple switches -- first with just a pair to 
ensure that you have a good network path from A to B (where A and B are on 
different switches).  Now start adding more simultaneous pairs of servers 
running NetPipe to simulate congestion in the network.
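
One way to read the results of such a ramp-up is to compare each pair's 
bandwidth against the single-pair baseline and note where it first drops.  This 
is only a sketch: the function name, the 90% threshold, and the numbers in the 
usage example are placeholders I made up, not measurements.

```python
def first_congested_count(baseline, results, tolerance=0.9):
    """Return the smallest number of simultaneous pairs at which any pair
    falls below tolerance * baseline, or None if that never happens.

    results maps the number of simultaneous pairs to a list of per-pair
    bandwidths, in the same units as baseline.
    """
    for n_pairs in sorted(results):
        if min(results[n_pairs]) < tolerance * baseline:
            return n_pairs
    return None


# Placeholder values for illustration only (not real measurements).
baseline = 9.4
results = {
    1: [9.4],
    2: [9.3, 9.4],
    4: [5.1, 4.9, 5.0, 5.2],
}
print(first_congested_count(baseline, results))
```

If the drop shows up only once traffic crosses between switches, that points at 
the inter-switch links rather than at the servers themselves.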

This type of experiment will help you characterize your network architecture 
and show whether the fabric itself is constraining bandwidth.

(BTW, you may be able to infer much of this information without running all of 
these tests if you look at the physical and logical connectivity between all 
your switches -- e.g., do you have 16 leaf servers off a switch, but only 8 
uplinks?  And so on.)
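
The 16-servers-versus-8-uplinks case works out to a 2:1 oversubscription ratio 
if every link runs at the same speed.  A minimal sketch of that arithmetic 
follows; the function name and the per-link speed parameters are my own 
assumptions, so plug in your real link speeds.

```python
from fractions import Fraction


def oversubscription_ratio(downlinks, uplinks, downlink_gbps=1, uplink_gbps=1):
    """Ratio of total server-facing bandwidth to total uplink bandwidth."""
    down = Fraction(downlinks) * Fraction(downlink_gbps)
    up = Fraction(uplinks) * Fraction(uplink_gbps)
    return down / up


# The example from the text: 16 leaf servers, 8 uplinks, equal link speeds.
ratio = oversubscription_ratio(16, 8)
print(f"{ratio.numerator}:{ratio.denominator}")  # prints 2:1
```

At 2:1, if all 16 servers send across the uplinks at once, each can get at most 
half of its link bandwidth.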

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
