On Tuesday 19 May 2009, Roman Martonak wrote:
...
> openmpi-1.3.2                           time per one MD step is 3.66 s
>    ELAPSED TIME :    0 HOURS  1 MINUTES 25.90 SECONDS
>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>  = ALL TO ALL COMM             7.802  MB/S          55.200 SEC  =
...
> mvapich-1.1.0                            time per one MD step is 2.55 s
>    ELAPSED TIME :    0 HOURS  1 MINUTES  0.65 SECONDS
>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>  = ALL TO ALL COMM            14.815  MB/S          29.070 SEC  =
...
> Intel MPI 3.2.1.009                 time per one MD step is 1.58 s
>    ELAPSED TIME :    0 HOURS  0 MINUTES 38.16 SECONDS
>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>  = ALL TO ALL COMM            38.696  MB/S          11.130 SEC  =
...
> Clearly the whole difference is basically in the ALL TO ALL COMM time.
> Running on 1 blade (8 cores), all three MPI implementations show a
> very similar time per step of about 8.6 s.

My guess is that what you see is the difference in MPI_Alltoall performance 
between the different MPI implementations (running in your environment on 
your hardware).
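Indeed, the gaps in elapsed time track the gaps in alltoall time closely: 
Open MPI vs Intel MPI is 85.90 - 38.16 = ~47.7 s elapsed against 
55.20 - 11.13 = ~44.1 s alltoall, and Open MPI vs MVAPICH is ~25.3 s 
elapsed against ~26.1 s alltoall.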

You could write a trivial loop like this and try it with the three MPIs:

 MPI_Init
 for i in 1 to 4221
   MPI_Alltoall(size=102033, ...)
 MPI_Finalize

And time it to confirm this.
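For what it's worth, here is a minimal C sketch of such a timing loop. The 
102033-byte and 4221-call figures are taken from your CPMD output; I am 
assuming 102033 bytes is the total buffer per call, split evenly across 
ranks, which may not match exactly what CPMD does internally:

 #include <mpi.h>
 #include <stdio.h>
 #include <stdlib.h>

 int main(int argc, char **argv)
 {
     int nprocs, rank;
     MPI_Init(&argc, &argv);
     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);

     /* CPMD reported 102033 bytes per call over 4221 calls (assumed to
        be the total buffer per call); round the per-peer chunk up so
        every rank exchanges the same amount. */
     int chunk = (102033 + nprocs - 1) / nprocs;
     char *sendbuf = calloc((size_t)chunk * nprocs, 1);
     char *recvbuf = calloc((size_t)chunk * nprocs, 1);

     MPI_Barrier(MPI_COMM_WORLD);
     double t0 = MPI_Wtime();
     for (int i = 0; i < 4221; i++)
         MPI_Alltoall(sendbuf, chunk, MPI_BYTE,
                      recvbuf, chunk, MPI_BYTE, MPI_COMM_WORLD);
     double t1 = MPI_Wtime();

     if (rank == 0)
         printf("4221 alltoalls of ~102033 bytes: %.2f s\n", t1 - t0);

     free(sendbuf);
     free(recvbuf);
     MPI_Finalize();
     return 0;
 }

Compile with mpicc from each of the three MPI stacks and run with the same 
rank count and node placement as your CPMD job, so the benchmark exercises 
the same inter-blade fabric.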

> For CPMD I found that using the keyword TASKGROUP,
> which introduces a different way of parallelization, it is possible to
> improve on the openmpi time substantially and lower it from 3.66 s
> to 1.67 s, almost to the value found with Intel MPI.

I guess this changes the kind of communication that is done, so you no longer 
have to do the 4221 alltoalls of ~100 KB each that seem to hurt Open MPI so 
much.

/Peter
