Jim Kress wrote:
I ran a parallel (MPI-compiled) version of GROMACS using the following
command line:
$ mpirun -np 5 mdrun_mpi -s topol.tpr -np 5 -v
At the end of the file md0.log I found:
M E G A - F L O P S A C C O U N T I N G
Parallel run - timing based on wallclock.
RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
NF=No Forces
Computing: M-Number M-Flops % of Flops
-----------------------------------------------------------------------
Coulomb + LJ [W4-W4] 876.631638 234060.647346 88.0
Outer nonbonded loop 692.459088 6924.590880 2.6
NS-Pairs 457.344228 9604.228788 3.6
Reset In Box 13.782888 124.045992 0.0
Shift-X 137.773776 826.642656 0.3
CG-CoM 3.445722 99.925938 0.0
Sum Forces 206.660664 206.660664 0.1
Virial 70.237023 1264.266414 0.5
Update 68.886888 2135.493528 0.8
Stop-CM 68.880000 688.800000 0.3
P-Coupling 68.886888 413.321328 0.2
Calc-Ekin 68.893776 1860.131952 0.7
Constraint-V 68.886888 413.321328 0.2
Constraint-Vir 51.675498 1240.211952 0.5
Settle 17.225166 5563.728618 2.1
Virtual Site 3 17.221722 637.203714 0.2
-----------------------------------------------------------------------
Total 266063.221098 100.0
-----------------------------------------------------------------------
NODE (s) Real (s) (%)
Time: 3344.000 3344.000 100.0
55:44
(Mnbf/s) (MFlops) (ns/day) (hour/ns)
Performance: 0.262 79.564 0.517 46.444
Detailed load balancing info in percentage of average
Type NODE: 0 1 2 3 4 Scaling
-------------------------------------------
Coulomb + LJ [W4-W4]:118 94 101 104 80 84%
Outer nonbonded loop: 97 98 98 103 102 96%
NS-Pairs:116 94 101 104 82 85%
Reset In Box: 99 100 99 100 99 99%
Shift-X: 99 100 99 100 99 99%
CG-CoM: 99 100 99 100 99 99%
Sum Forces: 99 100 99 99 99 99%
Virial: 99 100 99 100 99 99%
Update: 99 100 99 100 99 99%
Stop-CM: 99 100 99 100 99 99%
P-Coupling: 99 100 99 100 99 99%
Calc-Ekin: 99 100 99 100 99 99%
Constraint-V: 99 100 99 100 99 99%
Constraint-Vir: 99 100 99 100 99 99%
Settle: 99 100 99 100 99 99%
Virtual Site 3: 99 100 99 100 99 99%
Total Force:118 94 101 104 81 84%
Total Shake: 99 100 99 100 99 99%
Total Scaling: 85% of max performance
Finished mdrun on node 0 Sat Jul 14 23:32:32 2007
Now, I tried the same calculation on one node and found the following at
the end of the file md.log:
M E G A - F L O P S A C C O U N T I N G
RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
NF=No Forces
Computing: M-Number M-Flops % of Flops
-----------------------------------------------------------------------
Coulomb + LJ [W4-W4] 875.182588 233673.750996 88.0
Outer nonbonded loop 688.853376 6888.533760 2.6
NS-Pairs 456.997574 9596.949054 3.6
Reset In Box 13.782888 124.045992 0.0
Shift-X 137.773776 826.642656 0.3
CG-CoM 3.445722 99.925938 0.0
Virial 69.156915 1244.824470 0.5
Update 68.886888 2135.493528 0.8
Stop-CM 68.880000 688.800000 0.3
P-Coupling 68.886888 413.321328 0.2
Calc-Ekin 68.893776 1860.131952 0.7
Constraint-V 68.886888 413.321328 0.2
Constraint-Vir 51.675498 1240.211952 0.5
Settle 17.225166 5563.728618 2.1
Virtual Site 3 17.221722 637.203714 0.2
-----------------------------------------------------------------------
Total 265406.885286 100.0
-----------------------------------------------------------------------
NODE (s) Real (s) (%)
Time: 165.870 167.000 99.3
2:45
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 5.276 1.600 10.418 2.304
Finished mdrun on node 0 Thu Jul 12 15:17:49 2007
I didn't expect to find pure linear scaling with GROMACS. However, I
also didn't expect a massive increase in wall-clock time (3344 s on 5
nodes vs. 166 s on one) across my 5-node, gigabit Ethernet cluster.
Does anybody understand why this happened?
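
For reference, the two wall-clock times quoted above imply the following
speedup and parallel efficiency. This is only a minimal Python sketch; the
numbers are taken straight from the two logs:

    # Back-of-the-envelope speedup check using the wall-clock times
    # reported in md.log (1 node) and md0.log (5 nodes) above.
    t_serial = 165.87      # seconds, single-node run
    t_parallel = 3344.0    # seconds, 5-node run
    nprocs = 5

    speedup = t_serial / t_parallel   # ideal would be ~5
    efficiency = speedup / nprocs     # ideal would be ~1.0

    print(f"speedup:    {speedup:.3f}x")   # ~0.05x, i.e. a ~20x slowdown
    print(f"efficiency: {efficiency:.1%}") # ~1%
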
It is just the communication overhead that kills you. With InfiniBand
you might be able to scale it a bit, and it will be better in the next
release, but you should realize that communication times over TCP/IP are
measured in milliseconds, and a millisecond means about 1 million cycles
on a GHz chip. In that time GROMACS computes 5276 nonbonded interactions
for you (see the 5.276 Mnbf/s figure above).
It looks like you're running a small TIP4P box, and smaller systems scale
worse. With the development code we have been able to scale large
protein/water systems to 30-40 processors on Gbit as well.
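
To put rough numbers on that argument, here is a small sketch. The 1 ms
latency and 1 GHz clock are the ballpark figures mentioned above, and the
5.276 Mnbf/s comes from the single-node md.log:

    # Rough cost of one TCP/IP message latency, in the units used above.
    latency_s = 1e-3        # ~1 ms latency assumed for gigabit TCP/IP
    clock_hz = 1e9          # ~1 GHz CPU clock assumed
    nbf_per_s = 5.276e6     # 5.276 Mnbf/s from the single-node md.log

    cycles_lost = latency_s * clock_hz          # ~1e6 cycles per message
    interactions_lost = latency_s * nbf_per_s   # ~5276 nonbonded interactions

    print(f"cycles per latency:       {cycles_lost:.0f}")        # ~1,000,000
    print(f"interactions per latency: {interactions_lost:.0f}")  # ~5276
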
--
David van der Spoel, Ph.D.
Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205. Fax: +4618511755.
[EMAIL PROTECTED] [EMAIL PROTECTED] http://folding.bmc.uu.se