FWIW: 1.5.5 still doesn't support binding to NUMA regions, for example - and
the script doesn't really do anything more than bind to cores. I believe only
the trunk provides a more comprehensive set of binding options.
Given the described NUMA layout, I suspect bind-to-NUMA is going to make the biggest difference here.
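To see what you are starting from, it can help to check the NUMA layout of the two 6276 packages and where the 1.4-series binding actually places the ranks. A minimal sketch, assuming numactl and hwloc are installed and that your 1.4.5 build accepts the flags below:

# Show the NUMA nodes, which cores belong to each, and the memory per node
numactl --hardware
# hwloc's view of sockets, NUMA nodes, caches and cores
lstopo
# Launch with core binding and print where each rank ended up
mpirun -np 32 --bind-to-core --report-bindings ./YOUR_PROG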
You can try running with this wrapper script:
#!/bin/bash
# Bind each process to the core matching its node-local rank and keep
# its memory allocations on the local NUMA node
s=$OMPI_COMM_WORLD_NODE_RANK
numactl --physcpubind=$s --localalloc ./YOUR_PROG
Instead of 'mpirun ... ./YOUR_PROG', run 'mpirun ... ./SCRIPT'.
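For example (the file name bind_local.sh here is just a placeholder for wherever you save the wrapper):

chmod +x bind_local.sh
# one process per core on the 2 x 16-core node
mpirun -np 32 ./bind_local.sh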
I tried this with openmpi-1.5.4 and it helped.
Best regards, Pavel Mezentsev
I think you'd have much better luck using the developer's trunk, as the binding
support there is considerably more flexible - e.g., you can bind to NUMA regions
instead of just cores. The 1.4 binding is pretty limited.
http://www.open-mpi.org/nightly/trunk/
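For reference, a sketch of what the newer binding options look like; this assumes the '--bind-to <object>' / '--map-by <object>' style options as they appear in later release series, so check 'mpirun --help' on the nightly you grab since the exact spelling may differ:

# Spread ranks round-robin across NUMA nodes and bind each rank to the
# cores of its NUMA node, so first-touch allocations stay in local memory;
# --report-bindings prints the resulting placement
mpirun -np 32 --map-by numa --bind-to numa --report-bindings ./YOUR_PROG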
On Mar 30, 2012, at 5:02 AM, Ricardo Fonseca wrote:
> Hi guys
>
> I'm benchmarking our (well tested) parallel code on an AMD-based system,
> featuring 2x AMD Opteron(TM) Processor 6276, with 16 cores each for a total
> of 32 cores. The system is running Scientific Linux 6.1 and OpenMPI 1.4.5.
>
> When I run a single core job the performance is as expected.