Jim Kress wrote:
I ran a parallel (MPI-compiled) version of GROMACS using the following
command line:
$ mpirun -np 5 mdrun_mpi -s topol.tpr -np 5 -v
At the end of the file md0.log I found:
M E G A - F L O P S A C C O U N T I N G
Parallel run - timing based on wallclock.
RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
NF=No Forces
Computing: M-Number M-Flops % of Flops
-----------------------------------------------------------------------
Coulomb + LJ [W4-W4] 876.631638 234060.647346 88.0
Outer nonbonded loop 692.459088 6924.590880 2.6
NS-Pairs 457.344228 9604.228788 3.6
Reset In Box 13.782888 124.045992 0.0
Shift-X 137.773776 826.642656 0.3
CG-CoM 3.445722 99.925938 0.0
Sum Forces 206.660664 206.660664 0.1
Virial 70.237023 1264.266414 0.5
Update 68.886888 2135.493528 0.8
Stop-CM 68.880000 688.800000 0.3
P-Coupling 68.886888 413.321328 0.2
Calc-Ekin 68.893776 1860.131952 0.7
Constraint-V 68.886888 413.321328 0.2
Constraint-Vir 51.675498 1240.211952 0.5
Settle 17.225166 5563.728618 2.1
Virtual Site 3 17.221722 637.203714 0.2
-----------------------------------------------------------------------
Total 266063.221098 100.0
-----------------------------------------------------------------------
NODE (s) Real (s) (%)
Time: 3344.000 3344.000 100.0
55:44
(Mnbf/s) (MFlops) (ns/day) (hour/ns)
Performance: 0.262 79.564 0.517 46.444
Detailed load balancing info in percentage of average
Type NODE: 0 1 2 3 4 Scaling
-------------------------------------------
Coulomb + LJ [W4-W4]:118 94 101 104 80 84%
Outer nonbonded loop: 97 98 98 103 102 96%
NS-Pairs:116 94 101 104 82 85%
Reset In Box: 99 100 99 100 99 99%
Shift-X: 99 100 99 100 99 99%
CG-CoM: 99 100 99 100 99 99%
Sum Forces: 99 100 99 99 99 99%
Virial: 99 100 99 100 99 99%
Update: 99 100 99 100 99 99%
Stop-CM: 99 100 99 100 99 99%
P-Coupling: 99 100 99 100 99 99%
Calc-Ekin: 99 100 99 100 99 99%
Constraint-V: 99 100 99 100 99 99%
Constraint-Vir: 99 100 99 100 99 99%
Settle: 99 100 99 100 99 99%
Virtual Site 3: 99 100 99 100 99 99%
Total Force:118 94 101 104 81 84%
Total Shake: 99 100 99 100 99 99%
Total Scaling: 85% of max performance
Finished mdrun on node 0 Sat Jul 14 23:32:32 2007
Now, I tried the same calculation on one node and found the following at
the end of the file md.log:
M E G A - F L O P S A C C O U N T I N G
RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
NF=No Forces
Computing: M-Number M-Flops % of Flops
-----------------------------------------------------------------------
Coulomb + LJ [W4-W4] 875.182588 233673.750996 88.0
Outer nonbonded loop 688.853376 6888.533760 2.6
NS-Pairs 456.997574 9596.949054 3.6
Reset In Box 13.782888 124.045992 0.0
Shift-X 137.773776 826.642656 0.3
CG-CoM 3.445722 99.925938 0.0
Virial 69.156915 1244.824470 0.5
Update 68.886888 2135.493528 0.8
Stop-CM 68.880000 688.800000 0.3
P-Coupling 68.886888 413.321328 0.2
Calc-Ekin 68.893776 1860.131952 0.7
Constraint-V 68.886888 413.321328 0.2
Constraint-Vir 51.675498 1240.211952 0.5
Settle 17.225166 5563.728618 2.1
Virtual Site 3 17.221722 637.203714 0.2
-----------------------------------------------------------------------
Total 265406.885286 100.0
-----------------------------------------------------------------------
NODE (s) Real (s) (%)
Time: 165.870 167.000 99.3
2:45
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 5.276 1.600 10.418 2.304
Finished mdrun on node 0 Thu Jul 12 15:17:49 2007
I didn't expect to find pure linear scaling with GROMACS. However, I
also didn't expect a massive increase in wall-clock time (3344 s on 5
nodes vs. 166 s on one) across my 5-node, gigabit Ethernet cluster.
Does anybody understand why this happened?
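
For reference, the two wall-clock times quoted above imply the following
speedup and parallel efficiency. This is only a minimal Python sketch; the
numbers are taken straight from the two logs:

    # Back-of-the-envelope speedup check using the wall-clock times
    # reported in md.log (1 node) and md0.log (5 nodes) above.
    t_serial = 165.87      # seconds, single-node run
    t_parallel = 3344.0    # seconds, 5-node run
    nprocs = 5

    speedup = t_serial / t_parallel   # ideal would be ~5
    efficiency = speedup / nprocs     # ideal would be ~1.0

    print(f"speedup:    {speedup:.3f}x")   # ~0.05x, i.e. a ~20x slowdown
    print(f"efficiency: {efficiency:.1%}") # ~1%
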
It is just the communication overhead that kills you. With InfiniBand
you might be able to scale it a bit, and it will be better in the next
release, but you should realize that communication times over TCP/IP are
measured in milliseconds, and a millisecond means about 1 million cycles
on a GHz chip. In that time GROMACS computes 5276 nonbonded interactions
for you (see the 5.276 Mnbf/s figure above).
It looks like you're running a small TIP4P box, and smaller systems scale
worse. With the development code we have been able to scale large
protein/water systems to 30-40 processors on Gbit as well.
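
To put rough numbers on that argument, here is a small sketch. The 1 ms
latency and 1 GHz clock are the ballpark figures mentioned above, and the
5.276 Mnbf/s comes from the single-node md.log:

    # Rough cost of one TCP/IP message latency, in the units used above.
    latency_s = 1e-3        # ~1 ms latency assumed for gigabit TCP/IP
    clock_hz = 1e9          # ~1 GHz CPU clock assumed
    nbf_per_s = 5.276e6     # 5.276 Mnbf/s from the single-node md.log

    cycles_lost = latency_s * clock_hz          # ~1e6 cycles per message
    interactions_lost = latency_s * nbf_per_s   # ~5276 nonbonded interactions

    print(f"cycles per latency:       {cycles_lost:.0f}")        # ~1,000,000
    print(f"interactions per latency: {interactions_lost:.0f}")  # ~5276
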
--
David van der Spoel, Ph.D.
Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205. Fax: +4618511755.
[EMAIL PROTECTED] [EMAIL PROTECTED] http://folding.bmc.uu.se