Hello,
I'm running experiments with the BT benchmark from the NAS Parallel
Benchmarks (NPB) on OpenMPI. I've found an unexpected performance
degradation in OpenMPI v1.10.2 (and later versions) when the system is
oversubscribed. In particular, note the difference between 1.10.1 and
1.10.2 when running 36 MPI processes on 28 CPUs:
> $HOME/openmpi-bin-1.10.1/bin/mpirun -np 36 taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.36
  -> Time in seconds = 82.79
> $HOME/openmpi-bin-1.10.2/bin/mpirun -np 36 taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.36
  -> Time in seconds = 111.71
When the system is undersubscribed (i.e. 16 MPI processes on 28 CPUs),
the performance is similar in both versions:
> $HOME/openmpi-bin-1.10.1/bin/mpirun -np 16 taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.16
  -> Time in seconds = 96.78
> $HOME/openmpi-bin-1.10.2/bin/mpirun -np 16 taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.16
  -> Time in seconds = 99.35
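To put numbers on the gap, here is a minimal sketch that computes the
relative slowdown of 1.10.2 versus 1.10.1 from the timings reported above.
The function name slowdown_pct is only illustrative; the inputs are the
four "Time in seconds" values from the runs.

```python
def slowdown_pct(baseline_s, measured_s):
    """Relative slowdown of `measured_s` versus `baseline_s`, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100.0


# Timings (seconds) copied from the runs above: (1.10.1, 1.10.2).
runs = {
    "36 processes on 28 CPUs (oversubscribed)": (82.79, 111.71),
    "16 processes on 28 CPUs (undersubscribed)": (96.78, 99.35),
}

for label, (t_1101, t_1102) in runs.items():
    print(f"{label}: {slowdown_pct(t_1101, t_1102):.1f}% slower with 1.10.2")
```

This works out to roughly 34.9% in the oversubscribed case against roughly
2.7% in the undersubscribed one.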
Any idea of what is happening?
Thanks
PS. As the system has 28 cores with hyperthreading enabled, I use taskset
to ensure that only one hardware thread per core is used.
PS2. I have also tested versions 1.10.6, 2.0.1 and 2.0.2, and the
degradation occurs there as well.
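Regarding the taskset mask in the first PS: whether CPUs 0-27 really map
to one hardware thread per core depends on how the kernel numbers the
logical CPUs. A minimal sketch of how this could be checked on Linux is
below. It assumes the standard sysfs topology files are available; the
helper names (parse_cpu_list, shared_cores, sysfs_siblings) are
hypothetical and only for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,28' or '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def shared_cores(cpus, siblings_of):
    """Return sorted pairs of CPUs in `cpus` that are threads of the same core.

    `siblings_of(cpu)` must return the set of logical CPUs sharing a core
    with `cpu` (including `cpu` itself).
    """
    clashes = set()
    for cpu in cpus:
        for sib in siblings_of(cpu) & cpus:
            if sib != cpu:
                clashes.add((min(cpu, sib), max(cpu, sib)))
    return sorted(clashes)


def sysfs_siblings(cpu):
    # Linux-only assumption: the kernel exposes per-CPU topology under sysfs.
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return parse_cpu_list(path.read_text())
```

On the machine in question, shared_cores(parse_cpu_list("0-27"),
sysfs_siblings) returning an empty list would confirm that the mask selects
one thread per core.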