I am using CPMD 3.11.1, not CP2K. Below are the timings for 20 steps of MD for 32 water molecules (one of the standard CPMD benchmarks) with Open MPI, MVAPICH and Intel MPI, running on 64 cores (8 blades, each with 2 quad-core 2.2 GHz AMD Barcelona CPUs).
openmpi-1.3.2, time per one MD step is 3.66 s

summary:
 CPU TIME :    0 HOURS  1 MINUTES 23.85 SECONDS
 ELAPSED TIME :    0 HOURS  1 MINUTES 25.90 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS   70020/ 319128 kBYTES ***
 PROGRAM CPMD ENDED AT:   Tue May 19 11:12:06 2009

 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH   NUMBER OF CALLS =
 = SEND/RECEIVE                 8585. BYTES             48447.  =
 = BROADCAST                   19063. BYTES               396.  =
 = GLOBAL SUMMATION            32010. BYTES               329.  =
 = GLOBAL MULTIPLICATION           0. BYTES                 1.  =
 = ALL TO ALL COMM            102033. BYTES              4221.  =
 = PERFORMANCE                TOTAL TIME                        =
 = SEND/RECEIVE              209.014 MB/S            1.990 SEC  =
 = BROADCAST                  10.485 MB/S            0.720 SEC  =
 = GLOBAL SUMMATION          154.115 MB/S            0.410 SEC  =
 = GLOBAL MULTIPLICATION       0.000 MB/S            0.001 SEC  =
 = ALL TO ALL COMM             7.802 MB/S           55.200 SEC  =
 = SYNCHRONISATION                                   2.440 SEC  =
 ================================================================

mvapich-1.1.0, time per one MD step is 2.55 s

summary:
 CPU TIME :    0 HOURS  0 MINUTES 59.79 SECONDS
 ELAPSED TIME :    0 HOURS  1 MINUTES  0.65 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS   59072/ 182960 kBYTES ***
 PROGRAM CPMD ENDED AT:   Tue May 19 10:34:56 2009

 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH   NUMBER OF CALLS =
 = SEND/RECEIVE                 8585. BYTES             48447.  =
 = BROADCAST                   19063. BYTES               396.  =
 = GLOBAL SUMMATION            32010. BYTES               329.  =
 = GLOBAL MULTIPLICATION           0. BYTES                 1.  =
 = ALL TO ALL COMM            102033. BYTES              4221.  =
 = PERFORMANCE                TOTAL TIME                        =
 = SEND/RECEIVE              170.466 MB/S            2.440 SEC  =
 = BROADCAST                   6.863 MB/S            1.100 SEC  =
 = GLOBAL SUMMATION           61.948 MB/S            1.020 SEC  =
 = GLOBAL MULTIPLICATION       0.000 MB/S            0.001 SEC  =
 = ALL TO ALL COMM            14.815 MB/S           29.070 SEC  =
 = SYNCHRONISATION                                   0.400 SEC  =
 ================================================================

Intel MPI 3.2.1.009, time per one MD step is 1.58 s

summary:
 CPU TIME :    0 HOURS  0 MINUTES 36.11 SECONDS
 ELAPSED TIME :    0 HOURS  0 MINUTES 38.16 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS   65196/ 178736 kBYTES ***
 PROGRAM CPMD ENDED AT:   Tue May 19 10:17:17 2009

 ================================================================
 = COMMUNICATION TASK  AVERAGE MESSAGE LENGTH   NUMBER OF CALLS =
 = SEND/RECEIVE                 8585. BYTES             48447.  =
 = BROADCAST                   19063. BYTES               396.  =
 = GLOBAL SUMMATION            32010. BYTES               329.  =
 = GLOBAL MULTIPLICATION           0. BYTES                 1.  =
 = ALL TO ALL COMM            102033. BYTES              4221.  =
 = PERFORMANCE                TOTAL TIME                        =
 = SEND/RECEIVE              815.562 MB/S            0.510 SEC  =
 = BROADCAST                 754.914 MB/S            0.010 SEC  =
 = GLOBAL SUMMATION          180.535 MB/S            0.350 SEC  =
 = GLOBAL MULTIPLICATION       0.000 MB/S            0.001 SEC  =
 = ALL TO ALL COMM            38.696 MB/S           11.130 SEC  =
 = SYNCHRONISATION                                   0.550 SEC  =
 ================================================================

Clearly the whole difference is basically in the ALL TO ALL COMM time. Running on a single blade (8 cores), all three MPI implementations give a very similar time per step of about 8.6 s.

Open MPI was run with the --mca mpi_paffinity_alone 1 option; for MVAPICH and Intel MPI no particular options were used. I was told by HP that there can be increased latency when all 8 cores in one blade communicate through a single-port HCA to the InfiniBand fabric, but even if that is the case, I am still wondering how there can be such a difference between the implementations.

For CPMD I found that using the keyword TASKGROUP, which introduces a different way of parallelization, it is possible to improve the Open MPI time substantially, from 3.66 s to 1.67 s per step, almost to the value found with Intel MPI.
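For reference, the Open MPI runs were launched along these lines (the hostfile, binary and input file names below are just placeholders; the --mca option is the only setting I changed from the defaults):

  # hostfile, binary and input names are placeholders
  mpirun -np 64 --hostfile ./hosts \
         --mca mpi_paffinity_alone 1 \
         ./cpmd.x wat32.inp > wat32.out

If I read the output of "ompi_info --param coll tuned" correctly, there is a whole set of coll_tuned_* parameters (e.g. coll_tuned_use_dynamic_rules and coll_tuned_alltoall_algorithm) that look relevant to the alltoall behaviour, but I have not experimented with them and do not know which values, if any, would make sense here.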
Is there perhaps any Open MPI parameter that could be tuned to help the scaling without the use of TASKGROUP (maybe some tuning of the collective operations)?

Thanks, best regards
Roman

On Mon, May 18, 2009 at 6:58 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
>
> On May 18, 2009, at 12:50 PM, Pavel Shamis (Pasha) wrote:
>
>> Roman,
>> Can you please share with us Mvapich numbers that you get . Also what is
>> mvapich version that you use.
>> Default mvapich and openmpi IB tuning is very similar, so it is strange to
>> see so big difference. Do you know what kind of collectives operation is
>> used in this specific application.
>
> This code does a bunch of parallel things in various different places
> (mostly dense matrix math, and some FFT stuff that may or may not
> be parallelized). In the standard output there's a summary of the time
> taken by various MPI routines. Perhaps Roman can send them? The
> code also uses ScaLAPACK, but I'm not sure how CP2K labels the
> timing for those routines in the output.
>
> Noam
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>