(still trolling through the history in my INBOX...)

On Jul 9, 2010, at 8:56 AM, Andreas Schäfer wrote:
> On 14:39 Fri 09 Jul, Peter Kjellstrom wrote:
> > 8x pci-express gen2 5GT/s should show figures like mine. If it's pci-express
> > gen1 or gen2 2.5GT/s or 4x, or if the IB only came up with two lanes, then 1500
> > is expected.
>
> lspci and ibv_devinfo tell me it's PCIe 2.0 x8 and InfiniBand 4x QDR
> (active_width 4X, active_speed 10.0 Gbps), so I /should/ be able to
> get about twice the throughput of what I'm currently seeing.

You'll get different shared memory performance depending on whether you bind both local procs to a single socket or to two different sockets. I don't know much about AMDs, so I can't say exactly what it'll do offhand.

As for the IB performance, you want to make sure that your MPI process is bound to a core that is "near" the HCA for minimum latency and maximum bandwidth. Then also check that your IB fabric is clean, etc. I believe that OFED comes with a bunch of verbs-level latency and bandwidth unit tests that can measure what you're getting across your fabric (i.e., raw network performance without MPI). It's been a while since I've worked deeply with OFED stuff; I don't remember the command names offhand -- perhaps ibv_rc_pingpong, or somesuch?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
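(One rough way to double-check what the HCA actually negotiated, beyond ibv_devinfo, is to query the port attributes directly through libibverbs. The sketch below is just an illustration, not anything from OFED itself; it assumes the first device in the list, port number 1, and a link against -libverbs.)

    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) {
            fprintf(stderr, "no IB devices found\n");
            return 1;
        }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        if (!ctx) {
            fprintf(stderr, "ibv_open_device failed\n");
            return 1;
        }

        struct ibv_port_attr pattr;
        if (ibv_query_port(ctx, 1, &pattr)) {   /* port 1 assumed */
            fprintf(stderr, "ibv_query_port failed\n");
            return 1;
        }

        /* active_width codes: 1 = 1X, 2 = 4X, 4 = 8X, 8 = 12X          */
        /* active_speed codes: 1 = 2.5 Gbps (SDR), 2 = 5.0 (DDR),        */
        /*                     4 = 10.0 (QDR)                            */
        printf("state=%d width_code=%d speed_code=%d\n",
               pattr.state, pattr.active_width, pattr.active_speed);

        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }

If the width/speed codes look right, the next step is the raw verbs-level tests mentioned above (ibv_rc_pingpong from the libibverbs examples, or the ib_send_bw / ib_read_bw tools from the perftest package, if they're installed) to see what the fabric delivers without MPI in the picture.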