Hi,

Just a note for users who might read this: the report is valid, the cause is some code that is not thread-parallel, and we hope to have a fix for 4.6.0.
For updates, follow issue #1121.

Cheers,
--
Szilárd

On Wed, Jan 16, 2013 at 4:45 PM, Berk Hess <g...@hotmail.com> wrote:
>
> The issue I'm referring to is about a factor of 2 in update and
> constraints, but here it's much more.
> I just found out that the SD update is not OpenMP threaded (and I even
> noted in the code why this is).
> I reopened the issue and will find a solution.
>
> Cheers.
>
> Berk
>
> ----------------------------------------
> > Date: Wed, 16 Jan 2013 16:20:32 +0100
> > Subject: Re: [gmx-users] >60% slowdown with GPU / verlet and sd integrator
> > From: mark.j.abra...@gmail.com
> > To: gmx-users@gromacs.org
> >
> > We should probably note this effect on the wiki somewhere?
> >
> > Mark
> >
> > On Wed, Jan 16, 2013 at 3:44 PM, Berk Hess <g...@hotmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Unfortunately this is not a bug, but a feature!
> > > We made the non-bondeds so fast on the GPU that integration and
> > > constraints take more time.
> > > The sd1 integrator is almost as fast as the md integrator, but slightly
> > > less accurate.
> > > In most cases that's a good solution.
> > >
> > > I closed the redmine issue:
> > > http://redmine.gromacs.org/issues/1121
> > >
> > > Cheers,
> > >
> > > Berk
> > >
> > > ----------------------------------------
> > > > Date: Wed, 16 Jan 2013 17:26:18 +0300
> > > > Subject: Re: [gmx-users] >60% slowdown with GPU / verlet and sd integrator
> > > > From: jmsstarli...@gmail.com
> > > > To: gmx-users@gromacs.org
> > > >
> > > > Hi all!
> > > >
> > > > I've also done some calculations with the SD integrator used as the
> > > > thermostat (without t_coupl). With a system of 65k atoms I obtained
> > > > 10 ns/day on a GTX 670 and a 4-core i5.
> > > > I haven't run any simulations with the MD integrator yet, so I should
> > > > test that.
> > > >
> > > > James
> > > >
> > > > 2013/1/15 Szilárd Páll <szilard.p...@cbr.su.se>:
> > > > > Hi Floris,
> > > > >
> > > > > Great feedback, this needs to be looked into. Could you please file a bug
> > > > > report, preferably with a tpr (and/or all inputs) as well as log files.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > --
> > > > > Szilárd
> > > > >
> > > > > On Tue, Jan 15, 2013 at 3:50 AM, Floris Buelens <floris_buel...@yahoo.com> wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> I'm seeing MD simulations running a lot slower with the sd integrator than
> > > > >> with md - ca. 10 vs. 30 ns/day for my 47000 atom system. I found no
> > > > >> documented indication that this should be the case.
> > > > >> Timings and logs are pasted in below - wall time seems to be accumulating
> > > > >> in Update and Rest, adding up to >60% of the total. The effect is still there
> > > > >> without GPU: ca. 40% slowdown when switching from group to Verlet with the
> > > > >> SD integrator.
> > > > >> System: Xeon E5-1620, 1x GTX 680, gromacs
> > > > >> 4.6-beta3-dev-20130107-e66851a-unknown, GCC 4.4.6 and 4.7.0
> > > > >>
> > > > >> I didn't file a bug report yet as I don't have much variety of testing
> > > > >> conditions available right now; I hope someone else has a moment to try to
> > > > >> reproduce.
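For anyone who wants to reproduce the comparison: the exact .mdp files were not posted, but the four CPU cases in the timings below vary the integrator and the cutoff scheme. A minimal sketch, using the standard 4.6 option names and assuming everything else is held fixed:

    integrator     = sd      ; the slow case; md (with a thermostat such as
                             ; tcoupl = v-rescale) or the sd1 Berk mentions avoids it
    cutoff-scheme  = Verlet  ; or group; the GPU runs require Verlet

Note that md needs an explicit thermostat to be a like-for-like replacement for sd, whereas sd and sd1 couple the temperature through the integrator itself.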
> > > > >>
> > > > >> Timings:
> > > > >>
> > > > >> cpu (ns/day)
> > > > >> sd / verlet: 6
> > > > >> sd / group:  10
> > > > >> md / verlet: 9.2
> > > > >> md / group:  11.4
> > > > >>
> > > > >> gpu (ns/day)
> > > > >> sd / verlet: 11
> > > > >> md / verlet: 29.8
> > > > >>
> > > > >>
> > > > >> **************MD integrator, GPU / verlet
> > > > >>
> > > > >>      M E G A - F L O P S   A C C O U N T I N G
> > > > >>
> > > > >>  NB=Group-cutoff nonbonded kernels   NxN=N-by-N cluster Verlet kernels
> > > > >>  RF=Reaction-Field   VdW=Van der Waals   QSTab=quadratic-spline table
> > > > >>  W3=SPC/TIP3p   W4=TIP4p (single or pairs)
> > > > >>  V&F=Potential and force   V=Potential only   F=Force only
> > > > >>
> > > > >>  Computing:                         M-Number         M-Flops  % Flops
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Pair Search distance check      1244.988096       11204.893     0.1
> > > > >>  NxN QSTab Elec. + VdW [F]     194846.615488     7988711.235    91.9
> > > > >>  NxN QSTab Elec. + VdW [V&F]     2009.923008      118585.457     1.4
> > > > >>  1,4 nonbonded interactions        31.616322        2845.469     0.0
> > > > >>  Calc Weights                     703.010574       25308.381     0.3
> > > > >>  Spread Q Bspline               14997.558912       29995.118     0.3
> > > > >>  Gather F Bspline               14997.558912       89985.353     1.0
> > > > >>  3D-FFT                         47658.567884      381268.543     4.4
> > > > >>  Solve PME                         20.580896        1317.177     0.0
> > > > >>  Shift-X                            9.418458          56.511     0.0
> > > > >>  Angles                            21.879375        3675.735     0.0
> > > > >>  Propers                           48.599718       11129.335     0.1
> > > > >>  Virial                            23.498403         422.971     0.0
> > > > >>  Stop-CM                            2.436616          24.366     0.0
> > > > >>  Calc-Ekin                         93.809716        2532.862     0.0
> > > > >>  Lincs                             12.147284         728.837     0.0
> > > > >>  Lincs-Mat                        131.328750         525.315     0.0
> > > > >>  Constraint-V                     246.633614        1973.069     0.0
> > > > >>  Constraint-Vir                    23.486379         563.673     0.0
> > > > >>  Settle                            74.129451       23943.813     0.3
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Total                                           8694798.114   100.0
> > > > >> -----------------------------------------------------------------------------
> > > > >>
> > > > >>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> > > > >>
> > > > >>  Computing:          Nodes  Th.  Count   Wall t (s)   G-Cycles       %
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Neighbor search        1    8     201        0.944     27.206     3.3
> > > > >>  Launch GPU ops.        1    8    5001        0.371     10.690     1.3
> > > > >>  Force                  1    8    5001        2.185     62.987     7.7
> > > > >>  PME mesh               1    8    5001       15.033    433.441    52.9
> > > > >>  Wait GPU local         1    8    5001        1.551     44.719     5.5
> > > > >>  NB X/F buffer ops.     1    8    9801        0.538     15.499     1.9
> > > > >>  Write traj.            1    8       2        0.725     20.912     2.6
> > > > >>  Update                 1    8    5001        2.318     66.826     8.2
> > > > >>  Constraints            1    8    5001        2.898     83.551    10.2
> > > > >>  Rest                   1                     1.832     52.828     6.5
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Total                  1                    28.394    818.659   100.0
> > > > >> -----------------------------------------------------------------------------
> > > > >> -----------------------------------------------------------------------------
> > > > >>  PME spread/gather      1    8   10002        8.745    252.144    30.8
> > > > >>  PME 3D-FFT             1    8   10002        5.392    155.458    19.0
> > > > >>  PME solve              1    8    5001        0.869     25.069     3.1
> > > > >> -----------------------------------------------------------------------------
> > > > >>
> > > > >>  GPU timings
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Computing:                  Count   Wall t (s)   ms/step       %
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Pair list H2D                 201        0.080     0.397     0.4
> > > > >>  X / q H2D                    5001        0.698     0.140     3.7
> > > > >>  Nonbonded F kernel           4400       14.856     3.376    79.1
> > > > >>  Nonbonded F+ene k.            400        1.667     4.167     8.9
> > > > >>  Nonbonded F+prune k.          100        0.441     4.407     2.3
> > > > >>  Nonbonded F+ene+prune k.      101        0.535     5.300     2.9
> > > > >>  F D2H                        5001        0.501     0.100     2.7
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Total                                   18.778     3.755   100.0
> > > > >> -----------------------------------------------------------------------------
> > > > >>
> > > > >> Force evaluation time GPU/CPU: 3.755 ms/3.443 ms = 1.091
> > > > >> For optimal performance this ratio should be close to 1!
> > > > >>
> > > > >>                Core t (s)   Wall t (s)      (%)
> > > > >>        Time:      221.730       28.394    780.9
> > > > >>                  (ns/day)    (hour/ns)
> > > > >> Performance:       30.435        0.789
> > > > >>
> > > > >>
> > > > >> *****************SD integrator, GPU / verlet
> > > > >>
> > > > >>      M E G A - F L O P S   A C C O U N T I N G
> > > > >>
> > > > >>  NB=Group-cutoff nonbonded kernels   NxN=N-by-N cluster Verlet kernels
> > > > >>  RF=Reaction-Field   VdW=Van der Waals   QSTab=quadratic-spline table
> > > > >>  W3=SPC/TIP3p   W4=TIP4p (single or pairs)
> > > > >>  V&F=Potential and force   V=Potential only   F=Force only
> > > > >>
> > > > >>  Computing:                         M-Number         M-Flops  % Flops
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Pair Search distance check      1254.604928       11291.444     0.1
> > > > >>  NxN QSTab Elec. + VdW [F]     197273.059584     8088195.443    91.6
> > > > >>  NxN QSTab Elec. + VdW [V&F]     2010.150784      118598.896     1.3
> > > > >>  1,4 nonbonded interactions        31.616322        2845.469     0.0
> > > > >>  Calc Weights                     703.010574       25308.381     0.3
> > > > >>  Spread Q Bspline               14997.558912       29995.118     0.3
> > > > >>  Gather F Bspline               14997.558912       89985.353     1.0
> > > > >>  3D-FFT                         47473.892284      379791.138     4.3
> > > > >>  Solve PME                         20.488896        1311.289     0.0
> > > > >>  Shift-X                            9.418458          56.511     0.0
> > > > >>  Angles                            21.879375        3675.735     0.0
> > > > >>  Propers                           48.599718       11129.335     0.1
> > > > >>  Virial                            23.498403         422.971     0.0
> > > > >>  Update                           234.336858        7264.443     0.1
> > > > >>  Stop-CM                            2.436616          24.366     0.0
> > > > >>  Calc-Ekin                         93.809716        2532.862     0.0
> > > > >>  Lincs                             24.289712        1457.383     0.0
> > > > >>  Lincs-Mat                        262.605000        1050.420     0.0
> > > > >>  Constraint-V                     246.633614        1973.069     0.0
> > > > >>  Constraint-Vir                    23.486379         563.673     0.0
> > > > >>  Settle                           148.229268       47878.054     0.5
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Total                                           8825351.354   100.0
> > > > >> -----------------------------------------------------------------------------
> > > > >>
> > > > >>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> > > > >>
> > > > >>  Computing:          Nodes  Th.  Count   Wall t (s)   G-Cycles       %
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Neighbor search        1    8     201        0.945     27.212     1.2
> > > > >>  Launch GPU ops.        1    8    5001        0.384     11.069     0.5
> > > > >>  Force                  1    8    5001        2.180     62.791     2.7
> > > > >>  PME mesh               1    8    5001       15.029    432.967    18.5
> > > > >>  Wait GPU local         1    8    5001        3.327     95.844     4.1
> > > > >>  NB X/F buffer ops.     1    8    9801        0.542     15.628     0.7
> > > > >>  Write traj.            1    8       2        0.749     21.582     0.9
> > > > >>  Update                 1    8    5001       28.044    807.908    34.5
> > > > >>  Constraints            1    8   10002        5.562    160.243     6.8
> > > > >>  Rest                   1                    24.488    705.458    30.1
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Total                  1                    81.250   2340.701   100.0
> > > > >> -----------------------------------------------------------------------------
> > > > >> -----------------------------------------------------------------------------
> > > > >>  PME spread/gather      1    8   10002        8.769    252.615    10.8
> > > > >>  PME 3D-FFT             1    8   10002        5.367    154.630     6.6
> > > > >>  PME solve              1    8    5001        0.865     24.910     1.1
> > > > >> -----------------------------------------------------------------------------
> > > > >>
> > > > >>  GPU timings
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Computing:                  Count   Wall t (s)   ms/step       %
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Pair list H2D                 201        0.080     0.398     0.4
> > > > >>  X / q H2D                    5001        0.699     0.140     3.4
> > > > >>  Nonbonded F kernel           4400       16.271     3.698    79.6
> > > > >>  Nonbonded F+ene k.            400        1.827     4.568     8.9
> > > > >>  Nonbonded F+prune k.          100        0.482     4.816     2.4
> > > > >>  Nonbonded F+ene+prune k.      101        0.584     5.787     2.9
> > > > >>  F D2H                        5001        0.505     0.101     2.5
> > > > >> -----------------------------------------------------------------------------
> > > > >>  Total                                   20.448     4.089   100.0
> > > > >> -----------------------------------------------------------------------------
> > > > >>
> > > > >> Force evaluation time GPU/CPU: 4.089 ms/3.441 ms = 1.188
> > > > >> For optimal performance this ratio should be close to 1!
> > > > >>
> > > > >>                Core t (s)   Wall t (s)      (%)
> > > > >>        Time:      643.440       81.250    791.9
> > > > >>                  (ns/day)    (hour/ns)
> > > > >> Performance:       10.636        2.256
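For anyone skimming the logs above, the two GPU runs give the figure in the subject line directly: (30.435 - 10.636) / 30.435 = 0.65, i.e. roughly a 65% drop in ns/day when going from md to sd with the Verlet scheme. In the sd cycle accounting, Update (28.044 s) plus Rest (24.488 s) make up 34.5% + 30.1% = 64.6% of the 81.25 s wall time, against 8.2% + 6.5% = 14.7% with md, which is exactly the non-threaded SD update mentioned at the top. The doubled Constraints count (10002 vs 5001 calls) is consistent with the sd path applying constraints twice per step, but that is a minor cost next to the update itself.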