[OMPI users] Running on two nodes slower than running on one node

Victor Tue, 28 Jan 2014 21:00:56 -0500 (EST)

I am running cavity3d, a CFD simulation benchmark that ships with Palabos:
http://www.palabos.org/images/palabos_releases/palabos-v1.4r1.tgz


Palabos is a parallel-friendly Lattice Boltzmann solver library.

Palabos publishes cavity3d benchmark results for several different
platforms and settings here:
http://wiki.palabos.org/plb_wiki:benchmark:cavity_n400

My problem is that the benchmark performance on my cluster does not come
close to scaling linearly.

My cluster configuration:

Node1: Dual Xeon 5560, 48 GB RAM
Node2: i5-2400, 24 GB RAM

Gigabit ethernet connection on eth0

OpenMPI 1.6.5 on Ubuntu 12.04.3


Hostfile:

Node1 -slots=4 -max-slots=4
Node2 -slots=4 -max-slots=4

MPI command: mpirun --mca btl_tcp_if_include eth0 --hostfile
/home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400

Problem:

cavity3d 400

When I run mpirun -np 4 on Node1, I get 35.7615 mega site updates per second.
When I run mpirun -np 4 on Node2, I get 30.7972 mega site updates per second.
When I run mpirun --mca btl_tcp_if_include eth0 --hostfile
/home/mpiuser/.mpi_hostfile -np 8 ./cavity3d 400, I get 47.3538 mega site
updates per second.
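
To put a number on the scaling, here is a rough sketch of the arithmetic
using the three figures above. The variable names are only illustrative, and
treating the sum of the two single-node rates as the "ideal" combined rate is
an assumption that ignores all communication cost:

```python
# Measured throughput from the runs above, in mega site updates per second.
node1_alone = 35.7615   # mpirun -np 4 on Node1
node2_alone = 30.7972   # mpirun -np 4 on Node2
both_nodes = 47.3538    # mpirun -np 8 across Node1 and Node2

# Assumed ideal: the two single-node rates simply add up (no network cost).
ideal = node1_alone + node2_alone

# Fraction of that ideal combined rate actually achieved.
efficiency = both_nodes / ideal

# Gain from adding Node2 relative to running on Node1 alone.
speedup_vs_node1 = both_nodes / node1_alone

print(f"ideal combined rate: {ideal:.4f}")
print(f"efficiency: {efficiency:.1%}")
print(f"speedup over Node1 alone: {speedup_vs_node1:.2f}x")
```

By this measure the two-node run reaches roughly 71% of the summed
single-node rates, and about 1.32 times the rate of Node1 alone.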

I understand that there are latencies with GbE and that there is MPI
overhead, but this performance scaling still seems very poor. Are my
expectations of scaling naive, or is there actually something wrong and
fixable that will improve the scaling? Optimistically I would like each
node to add to the cluster performance, not slow it down.

Things get even worse if I run an asymmetric number of MPI processes on each
node. For instance, running -np 12 on Node1 alone is significantly faster than
running -np 16 across Node1 and Node2, so adding Node2 actually slows things
down.

Thanks,

Victor
