Hello,

We have been busy this week comparing five different MPI implementations on a small test cluster. Several notable differences have been observed, but I will limit myself to one particular test in this e-mail (64-rank Intel MPI Benchmark alltoall on 8 dual quad-core nodes).
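
To give a rough feel for the communication volume in this test, here is a small sketch. It assumes the 64 ranks are packed 8 per node and that each rank sends one message directly to each of the other 63 ranks; real MPI libraries may well use other alltoall algorithms, so treat the numbers as an illustration only. The function name `alltoall_traffic` is made up for this example.

```python
def alltoall_traffic(ranks=64, ranks_per_node=8, msg_bytes=64):
    """Count messages for a naive alltoall in which every rank sends
    one message of msg_bytes directly to every other rank.

    Assumption: ranks are packed ranks_per_node to a node, so each rank
    has (ranks_per_node - 1) peers on its own node and the rest off-node.
    """
    peers = ranks - 1
    on_node_peers = ranks_per_node - 1
    off_node_peers = peers - on_node_peers
    return {
        # every rank sends to every other rank
        "total_msgs": ranks * peers,
        # messages that have to cross the interconnect
        "inter_node_msgs": ranks * off_node_peers,
        # payload bytes leaving one node per alltoall call
        "bytes_out_per_node": ranks_per_node * off_node_peers * msg_bytes,
    }


if __name__ == "__main__":
    for size in (8, 64, 1024):
        print(size, alltoall_traffic(msg_bytes=size))
```

Under these assumptions, 56 of the 63 peers of every rank sit on another node, so most of the small messages in the 10 to 1000 byte range have to go over the IB fabric.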

Let's start with the hardware and software conditions:

Hardware:
16 nodes (8 used for this test), each with two Clovertown CPUs (X5355/2.66GHz, quad-core) and 16G RAM. Interconnected with IB 4x SDR on PCI-express (MT25208).

Software:
CentOS-4.3 x86_64 2.6.9-34.0.2smp with OFED-1.1 and Intel compilers 9.1.04x

MPIs tested:
OpenMPI-1.1.3b4, OpenMPI-1.2b3, MVAPICH-0.9.8, MVAPICH2-0.9.8 and ScaMPI-3.10.4 (ScaMPI is a commercial MPI from Scali).

Main question to the OpenMPI developers: why does OpenMPI perform so badly between approx. 10 and 1000 bytes?

Plot:
http://www.nsc.liu.se/~cap/all2all_64pe_clover.png

Notes:
* The OpenMPI run tagged 'basic' was done with "-mca coll self,sm,basic"; all other runs were done with the default settings.
* Both the x- and y-axes are log scaled. The y-axis labels are a bit hard to read, but the first "5.0000" is 50us, the 2nd 500us and so on.

ompi_info:
http://www.nsc.liu.se/~cap/openmpi-1.1.3b4-intel91.info
http://www.nsc.liu.se/~cap/openmpi-1.2b3-intel91.info

Best Regards,
Peter K

--
------------------------------------------------------------
Peter Kjellström
National Supercomputer Centre, Linköping Sweden