Hello,

I observe very poor scaling with Open MPI on an HP blade system
consisting of 8 blades (each with two quad-core AMD Barcelona 2.2 GHz
CPUs) interconnected with an InfiniBand fabric. When running the
standard CPMD 32-water test, I observe the following scaling (the
numbers are elapsed times):

openmpi-1.2.6:

using full blades (8 cores per blade)
np8     7 MINUTES 26.40 SECONDS
np16    4 MINUTES 19.91 SECONDS
np32    2 MINUTES 55.51 SECONDS
np48    2 MINUTES 38.18 SECONDS
np64    3 MINUTES 19.78 SECONDS

I also tried openmpi-1.2.8 and openmpi-1.3, and the results are about
the same. openmpi-1.3 is somewhat better for 32 cores, but in all cases
there is practically no scaling beyond 4 blades (32 cores), and running
on 64 cores is a disaster (slower than on 32 cores). With Intel MPI,
however, I get the following numbers:

Intel MPI-3.2.1.009:

using full blades (8 cores per blade)
np8     7 MINUTES 23.19 SECONDS
np16    4 MINUTES 22.17 SECONDS
np32    2 MINUTES 50.07 SECONDS
np48    1 MINUTES 42.87 SECONDS
np64    1 MINUTES 23.76 SECONDS

So there is reasonably good scaling up to 64 cores. I am running Open
MPI with the option --mca mpi_paffinity_alone 1. I have also tried
-mca btl_openib_use_srq 1, but it had only a marginal effect. With
MVAPICH I get scaling similar to that of Intel MPI. The system is
running the Rocks Clusters distribution 5.1 with the Mellanox OFED 1.4
roll. I would be grateful if somebody could suggest what the origin of
the problem might be and how to tune Open MPI to get better scaling.
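
To make the comparison easier to read, here is a small Python sketch that
converts the elapsed times listed above into speedup and parallel
efficiency relative to the 8-core (single blade) run. The names
OPENMPI_126, INTEL_MPI and scaling() are just labels chosen for this
illustration, and taking the np8 run as the baseline is an assumption,
since no serial timing is available.

```python
# Elapsed times copied from the two tables above, as (minutes, seconds)
# keyed by the number of cores. Variable names are illustrative only.
OPENMPI_126 = {8: (7, 26.40), 16: (4, 19.91), 32: (2, 55.51),
               48: (2, 38.18), 64: (3, 19.78)}
INTEL_MPI = {8: (7, 23.19), 16: (4, 22.17), 32: (2, 50.07),
             48: (1, 42.87), 64: (1, 23.76)}


def scaling(times, base=8):
    """Return {cores: (speedup, efficiency)} relative to the `base`-core run.

    speedup    = T(base) / T(cores)
    efficiency = speedup * base / cores   (1.0 would be ideal scaling)
    """
    t_base = times[base][0] * 60 + times[base][1]
    result = {}
    for cores, (minutes, seconds) in sorted(times.items()):
        elapsed = minutes * 60 + seconds
        speedup = t_base / elapsed
        result[cores] = (speedup, speedup * base / cores)
    return result


if __name__ == "__main__":
    for name, times in (("openmpi-1.2.6", OPENMPI_126),
                        ("Intel MPI-3.2.1.009", INTEL_MPI)):
        print(name)
        for cores, (speedup, eff) in scaling(times).items():
            print(f"  np{cores:<3d} speedup {speedup:5.2f}  efficiency {eff:6.1%}")
```

With these times, the 64-core run comes out at roughly 28% parallel
efficiency for openmpi-1.2.6 versus roughly 66% for Intel MPI, both
measured against their own 8-core runs.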

Many thanks in advance.

Best regards

Roman
