Hi,
I am running the NAS parallel benchmarks and I have a performance problem
depending on the hostfile configuration. I use Open MPI version 1.7.2.

I run the FT benchmark with 16 processes, but I want to overload each core
with 4 processes (yes, this is intentional), so I execute:

time mpirun --hostfile ./hostfile -np 16 --oversubscribe -bind-to core:overload-allowed --ppr 4:core --report-bindings ./ft.C.16

and the hostfile is (each node has 2 octo-core Intel Xeon processors):
compute-0-15 slots=4

I checked the core mapping with the "top" command, and the 16 processes run
on 4 physical cores. The execution time in this configuration is 80 seconds.

The problem is that if I change the hostfile to:
compute-0-15 slots=16

and I run the same mpirun command (overloading each core with 4
processes), the execution time increases to 240 seconds (!).
I checked the core mapping again, and the 16 processes were still running on
the same 4 cores.

Any idea what could explain the performance drop?

Thanks,
Iván Cores.

P.S.:
In both cases the binding is:
[compute-0-15.local:14691] MCW rank 15 bound to socket 0[core 3[hwt 0-1]]: 
[../../../BB/../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: 
[BB/../../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 1 bound to socket 0[core 0[hwt 0-1]]: 
[BB/../../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 2 bound to socket 0[core 0[hwt 0-1]]: 
[BB/../../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 3 bound to socket 0[core 0[hwt 0-1]]: 
[BB/../../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 4 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 5 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 6 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 7 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 8 bound to socket 0[core 2[hwt 0-1]]: 
[../../BB/../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 9 bound to socket 0[core 2[hwt 0-1]]: 
[../../BB/../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 10 bound to socket 0[core 2[hwt 0-1]]: 
[../../BB/../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 11 bound to socket 0[core 2[hwt 0-1]]: 
[../../BB/../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 12 bound to socket 0[core 3[hwt 0-1]]: 
[../../../BB/../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 13 bound to socket 0[core 3[hwt 0-1]]: 
[../../../BB/../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 14 bound to socket 0[core 3[hwt 0-1]]: 
[../../../BB/../../../..][../../../../../../../..]
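
To double-check a report like the one above without counting by eye, here is
a minimal Python sketch that tallies how many ranks are bound to each core.
It is only an illustration and not part of Open MPI: the function name
ranks_per_core and the regular expression are my own, and the pattern assumes
the line format printed above ("MCW rank N bound to socket S[core C[hwt ...").

```python
import re
from collections import Counter

# Assumed format, taken from the report above:
#   "... MCW rank N bound to socket S[core C[hwt ..."
# Only this leading part is matched, so wrapped lines are not a problem.
BINDING_RE = re.compile(r"MCW rank (\d+) bound to socket (\d+)\[core (\d+)\[")


def ranks_per_core(report_text):
    """Return a Counter mapping (socket, core) -> number of ranks bound there."""
    counts = Counter()
    for match in BINDING_RE.finditer(report_text):
        _rank, socket, core = (int(group) for group in match.groups())
        counts[(socket, core)] += 1
    return counts


if __name__ == "__main__":
    # Two entries copied from the report above, used here as sample input.
    sample = """
[compute-0-15.local:14691] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
[BB/../../../../../../..][../../../../../../../..]
[compute-0-15.local:14691] MCW rank 4 bound to socket 0[core 1[hwt 0-1]]:
[../BB/../../../../../..][../../../../../../../..]
"""
    for (socket, core), count in sorted(ranks_per_core(sample).items()):
        print(f"socket {socket} core {core}: {count} rank(s)")
```

Feeding it the saved --report-bindings output from both hostfile
configurations gives a quick way to compare the two runs.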

