Re: [OMPI users] Help with multicore AMD machine performance

2012-04-02 Thread Nico Mittenzwey
Hi, I'm benchmarking our (well tested) parallel code on an AMD-based system, featuring 2x AMD Opteron(TM) Processor 6276, with 16 cores each for a total of 32 cores. The system is running Scientific Linux 6.1 and OpenMPI 1.4.5. When I run a single-core job the performance is as expected. How

Re: [OMPI users] Help with multicore AMD machine performance

2012-03-30 Thread Ralph Castain
FWIW: 1.5.5 still doesn't support binding to NUMA regions, for example - and the script doesn't really do anything more than bind to cores. I believe only the trunk provides a more comprehensive set of binding options. Given the described NUMA layout, I suspect bind-to-NUMA is going to make the

Re: [OMPI users] Help with multicore AMD machine performance

2012-03-30 Thread Pavel Mezentsev
You can try running using this script:

#!/bin/bash
s=$(($OMPI_COMM_WORLD_NODE_RANK))
numactl --physcpubind=$((s)) --localalloc ./YOUR_PROG

Instead of 'mpirun ... ./YOUR_PROG', run 'mpirun ... ./SCRIPT'. I tried this with openmpi-1.5.4 and it helped. Best regards, Pavel Mezentsev P.S openmpi-1.5.
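The wrapper in this message pins each rank to a single core. Since the thread also raises binding to NUMA regions, here is a minimal sketch of the same wrapper idea that binds each rank to a NUMA node instead. It is an illustration, not something posted in the thread: the helper `numa_node_for_rank` and the `CORES_PER_NODE` variable are hypothetical names, the default of 8 cores per NUMA node is an assumption to check against `numactl --hardware`, and it assumes node-local ranks are numbered in the same order as the cores.

```shell
#!/bin/bash
# Hypothetical wrapper: bind each MPI rank to a NUMA node (CPUs and memory)
# rather than to a single core.
# ASSUMPTION: CORES_PER_NODE defaults to 8; verify with `numactl --hardware`.

# Map a node-local rank to a NUMA node index by integer division.
numa_node_for_rank() {
    local rank=$1 cores_per_node=$2
    echo $(( rank / cores_per_node ))
}

# Only launch when a program was passed on the command line.
if [ "$#" -gt 0 ]; then
    rank=${OMPI_COMM_WORLD_NODE_RANK:-0}
    node=$(numa_node_for_rank "$rank" "${CORES_PER_NODE:-8}")
    # Restrict both CPU scheduling and memory allocation to that NUMA node.
    exec numactl --cpunodebind="$node" --membind="$node" "$@"
fi
```

Saved under a hypothetical name such as `numa_wrap.sh`, it would be used as 'mpirun ... ./numa_wrap.sh ./YOUR_PROG'. Whether node-level binding beats core-level binding depends on the code and the machine, so it is worth benchmarking both.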

Re: [OMPI users] Help with multicore AMD machine performance

2012-03-30 Thread Ralph Castain
I think you'd have much better luck using the developer's trunk as the binding there is much better - e.g., you can bind to NUMA instead of just cores. The 1.4 binding is pretty limited.

http://www.open-mpi.org/nightly/trunk/

On Mar 30, 2012, at 5:02 AM, Ricardo Fonseca wrote:
> Hi guys
>
> I

[OMPI users] Help with multicore AMD machine performance

2012-03-30 Thread Ricardo Fonseca
Hi guys, I'm benchmarking our (well tested) parallel code on an AMD-based system, featuring 2x AMD Opteron(TM) Processor 6276, with 16 cores each for a total of 32 cores. The system is running Scientific Linux 6.1 and OpenMPI 1.4.5. When I run a single-core job the performance is as expected. H