Hello,

I'm running experiments with the BT NAS benchmark on OpenMPI. I've found a very strange performance degradation in OpenMPI v1.10.2 (and later versions) when the system is oversubscribed. In particular, note the performance difference between 1.10.1 and 1.10.2 when running 36 MPI processes on 28 CPUs:

> $HOME/openmpi-bin-1.10.1/bin/mpirun -np 36 taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.36
  -> Time in seconds = 82.79
> $HOME/openmpi-bin-1.10.2/bin/mpirun -np 36 taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.36
  -> Time in seconds = 111.71

When the system is undersubscribed (i.e., 16 MPI processes on 28 CPUs), the performance is very similar in both versions:

> $HOME/openmpi-bin-1.10.1/bin/mpirun -np 16 taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.16
  -> Time in seconds = 96.78
> $HOME/openmpi-bin-1.10.2/bin/mpirun -np 16 taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.16
  -> Time in seconds = 99.35
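To put numbers on the gap, here is a small shell sketch that computes the relative change between the two versions from the timings above. The `slowdown` helper is hypothetical, written only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: percent change of the 1.10.2 time relative to 1.10.1.
# $1 = baseline time in seconds (1.10.1), $2 = new time in seconds (1.10.2)
slowdown() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f%%\n", (new - old) / old * 100 }'
}

# Timings reported above (BT class C)
echo "36 processes on 28 CPUs: $(slowdown 82.79 111.71)"
echo "16 processes on 28 CPUs: $(slowdown 96.78 99.35)"
```

With these inputs it reports roughly 34.9% slower for the oversubscribed run versus about 2.7% for the undersubscribed one.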

Any idea what is happening?

Thanks

PS. As the system has 28 cores with hyperthreading enabled, I use taskset to ensure that only one hardware thread per core is used.
PS2. I have also tested versions 1.10.6, 2.0.1 and 2.0.2, and the degradation occurs there as well.
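Pinning with `taskset -c 0-27` gives one hardware thread per core only if logical CPUs 0-27 sit on distinct physical cores. This is common on Linux, where hyperthread siblings are often numbered N and N+28, but it is not guaranteed. A minimal sketch to check this, assuming `lscpu` from util-linux is available. The `distinct_cores` function is hypothetical and only reads `CPU,CORE` lines such as those printed by `lscpu --parse=CPU,CORE`:

```shell
#!/bin/sh
# Hypothetical helper: count how many logical CPUs in the range [$1, $2]
# are present and how many distinct core IDs they map to.
# Input on stdin: "CPU,CORE" lines; lines starting with '#' are skipped
# (lscpu --parse prints its header as '#' comment lines).
distinct_cores() {
  awk -F, -v lo="$1" -v hi="$2" '
    /^#/ { next }
    $1 >= lo && $1 <= hi {
      n++                                   # logical CPUs in range
      if (!($2 in seen)) { seen[$2] = 1; c++ }  # new physical core ID
    }
    END { printf "%d CPUs on %d distinct cores\n", n, c }'
}

# Usage on the target machine (assumes util-linux lscpu):
#   lscpu --parse=CPU,CORE | distinct_cores 0 27
```

If the 28-core machine reports "28 CPUs on 28 distinct cores", the taskset mask uses exactly one hardware thread per core. A smaller core count would mean some processes share a core with their hyperthread sibling.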

http://bsc.es/disclaimer
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
