Hi,

I'm benchmarking our (well-tested) parallel code on an AMD-based system 
featuring 2x AMD Opteron(TM) Processor 6276, with 16 cores each for a total of 
32 cores. The system is running Scientific Linux 6.1 and OpenMPI 1.4.5.

When I run a single-core job the performance is as expected. However, when I 
run with 32 processes the performance drops to about 60%.

Be aware that on AMD CPUs based on the Bulldozer/Interlagos architecture, the two cores of a module share that module's FPU, so on this machine 32 processes doing floating-point work share 16 FPUs. There is also a problem with cross-cache invalidations [1] in earlier kernel versions, so be sure to use an up-to-date kernel (2.6.32-220.7.1).
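
If it helps, here is a minimal sketch for checking the running kernel against the release mentioned above. It assumes that 2.6.32-220.7.1 can be treated as a minimum and that release strings look like the usual RHEL/Scientific Linux ones (for example 2.6.32-220.7.1.el6.x86_64). The helper names release_key and is_at_least are made up for this example and are not part of any tool.

```python
import platform
import re

# Kernel release named in the post; treating it as a minimum is an assumption.
MINIMUM_RELEASE = "2.6.32-220.7.1"


def release_key(release):
    """Turn e.g. '2.6.32-220.7.1.el6.x86_64' into a comparable tuple of ints.

    Only the leading run of numbers separated by '.' or '-' is used;
    suffixes such as '.el6.x86_64' are ignored.
    """
    match = re.match(r"\d+(?:[.-]\d+)*", release)
    if not match:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in re.split(r"[.-]", match.group(0)))


def is_at_least(release, minimum=MINIMUM_RELEASE):
    """Return True if 'release' is the same as or newer than 'minimum'."""
    return release_key(release) >= release_key(minimum)


if __name__ == "__main__":
    running = platform.release()  # same string that 'uname -r' prints
    status = "ok" if is_at_least(running) else "older than " + MINIMUM_RELEASE
    print("%s: %s" % (running, status))
```

The comparison is purely numeric and knows nothing about distribution backports, so take the result as a hint rather than proof that the fix is present.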

Cheers,
Nico

[1] http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
