(still trolling through the history in my INBOX...)

On Jul 9, 2010, at 8:56 AM, Andreas Schäfer wrote:

> On 14:39 Fri 09 Jul, Peter Kjellstrom wrote:
> > 8x pci-express gen2 5GT/s should show figures like mine. If it's pci-express
> > gen1 or gen2 2.5GT/s or 4x or if the IB only came up with two lanes then 1500
> > is expected.
> 
> lspci and ibv_devinfo tell me it's PCIe 2.0 x8 and InfiniBand 4x QDR
> (active_width 4X, active_speed 10.0 Gbps), so I /should/ be able to
> get about twice the throughput of what I'm currently seeing.

You'll get different shared memory performance depending on whether you bind both 
local procs to a single socket or to two different sockets.  I don't know much 
about AMDs, so I can't say offhand exactly what it'll do.
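
For example, with Open MPI you can compare the two cases directly.  Exact flag 
names vary a bit across versions (these are from the 1.4/1.5 series, so check 
mpirun --help on your install), and ./osu_bw is just a stand-in for whatever 
benchmark binary you're running:

    # both ranks packed onto the same socket
    mpirun -np 2 --bind-to-core --bycore --report-bindings ./osu_bw

    # one rank on each socket
    mpirun -np 2 --bind-to-socket --bysocket --report-bindings ./osu_bw

--report-bindings prints where each rank actually landed, so you can verify the 
placement before trusting the numbers.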

As for the IB performance, you want to make sure that your MPI process is bound 
to a core that is "near" the HCA for minimum latency and maximum bandwidth.  Then 
also check that your IB fabric is clean (links negotiated at full width/speed, no 
error counters climbing), etc.  I believe that OFED comes with a bunch of 
verbs-level latency and bandwidth unit tests that can measure what you're getting 
across your fabric (i.e., raw network performance without MPI).  It's been a while 
since I've worked deeply with OFED stuff; I don't remember the command names 
offhand -- perhaps ibv_rc_pingpong, or somesuch?
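
A rough sketch of what I'd try (command names from memory, so double-check what 
your OFED install actually ships, and substitute your real device name for 
mlx4_0):

    # see which NUMA node the HCA hangs off of, and bind your MPI procs near it
    cat /sys/class/infiniband/mlx4_0/device/numa_node

    # quick fabric sanity checks (link width/speed, error counters)
    ibstat
    ibcheckerrors

    # raw verbs-level pingpong between two nodes, no MPI involved
    ibv_rc_pingpong                      # on the server
    ibv_rc_pingpong <server-hostname>    # on the client

    # bandwidth test from the OFED perftest package, same server/client pattern
    ib_write_bw                          # on the server
    ib_write_bw <server-hostname>        # on the client

If the raw verbs numbers are also low, the problem is below MPI (PCIe link, HCA 
firmware, cabling/switch); if they look right, then it's an MPI binding/tuning 
issue.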

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

