Re: [gmx-users] How to let Gromacs run 60% faster..

Mark Abraham Sun, 18 Oct 2009 06:43:03 -0700

Vaclav Horacek wrote:

Hello,


I just bumped into the site www.yasara.org, which claims that their just-released 
new MD algorithms are 60% faster than Gromacs.
Actually they don't say 'Gromacs', but 'closest competitor', which I assume is 
Gromacs, looking at the benchmark numbers.

One should always be skeptical about people who mention that they are
better, but don't display their comparisons. It's very difficult to
fairly compare different MD packages, because of fundamental algorithm
differences and optimization levels. See discussion in the GROMACS 4
paper, for example. Even if you can design a fair test, you still need
to be sure you've done the best by all codes with the compiler at hand.
Further, the metric they quote (time for a single integration step) is
not very useful. Anyone doing serious MD is going to run calculations
for at least days, if not months - comparisons need to be over *those*
timeframes. They claim to be doing PME with a 0.786nm real-space
cut-off, which ought to require much smaller than 0.1nm Fourier grid
spacing for the reciprocal-space part, for decent accuracy. Speed is
only one part of the issue. There might be other reasons they aren't
referring to peer-reviewed literature to support these claims :-)

From the numbers, I also saw that they seem to do particularly well on newer 
CPUs like Core 2 Duo and Xeon L5420, using code for SSSE3 and SSE 4.1.

They don't show performance numbers without such extensions being used,
so it looks like marketing hype. I don't see SSE3 or higher being very
useful at all.

I am not an expert in this kind of low-level stuff, but typing SSE4 into 
Wikipedia shows lots of instructions that look useful for MD. For example the 
'dpps' instruction does an entire dot product at once.

IIRC, there's only one dot-product-like operation per interaction in a
PME non-bonded inner loop, which is the operation for
r^2 = (x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2, which is probably already spread
out SIMD-style over several interactions with SSE or SSE2. At best you
might gain 2 flops per interaction which is a percent or two. Whether
that might come at a cost to the existing SSE/SSE2 SIMD is a harder
question.
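
To make "spread out SIMD-style over several interactions" concrete, here is a rough sketch in plain Python, with lists standing in for 4-wide SSE registers. The function names, the lane width constant and the coordinate layout are assumptions made for illustration; this is not code from the GROMACS kernels.

```python
# Sketch only: Python lists play the role of 4-wide SIMD registers.
# Names and data layout are hypothetical, not the GROMACS inner loops.

WIDTH = 4  # single-precision lanes in one SSE register


def r2_horizontal(a, b):
    """One interaction at a time: a 3-element dot product of (a - b) with
    itself, which is the shape a dpps-style instruction would target."""
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    return dx * dx + dy * dy + dz * dz


def r2_vertical(xi, yi, zi, xj, yj, zj):
    """WIDTH interactions at once: every list holds one coordinate for
    WIDTH different pairs, so each arithmetic step maps to one SIMD
    instruction with all lanes doing useful work."""
    dx = [p - q for p, q in zip(xi, xj)]
    dy = [p - q for p, q in zip(yi, yj)]
    dz = [p - q for p, q in zip(zi, zj)]
    return [dx[k] * dx[k] + dy[k] * dy[k] + dz[k] * dz[k]
            for k in range(WIDTH)]
```

In the second form the three squares and two additions are already shared across four pairs, which is consistent with the point above that a dedicated dot-product instruction has little left to save.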


A single-cycle "floating-point distance to nearest integer":

y <- x - floor(x)

would be noteworthy :-)
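
As a small sketch of the operation just mentioned: the expression as written, x - floor(x), yields the fractional part in [0, 1), whereas a signed offset from the nearest integer would be x - floor(x + 0.5). Which of the two was intended is not clear from the message, so both are shown; the function names are made up for illustration.

```python
import math


def frac(x):
    """Fractional part in [0, 1): the expression given in the message."""
    return x - math.floor(x)


def nearest_offset(x):
    """Signed offset from the nearest integer, in [-0.5, 0.5).
    This reading of "distance to nearest integer" is an assumption."""
    return x - math.floor(x + 0.5)
```

For example, frac(2.75) is 0.75 while nearest_offset(2.75) is -0.25.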

I looked at the gmxlib/nonbonded directory and saw that SSE2 seems to be the 
most supported by Gromacs. So maybe adding support for SSE3 and SSE4 can still 
help a lot! Are there any plans for that?


Mark
_______________________________________________
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!

Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.

Can't post? Read http://www.gromacs.org/mailing_lists/users.php
