Vaclav Horacek wrote:
Hello,

I just bumped into the site www.yasara.org, which claims that their just-released 
new MD algorithms are 60% faster than Gromacs.
Actually they don't say 'Gromacs', but 'closest competitor', which I assume is 
Gromacs looking at the benchmark numbers.

One should always be skeptical of people who claim to be better but don't show their comparisons. It's very difficult to compare different MD packages fairly, because of fundamental differences in algorithms and optimization levels; see the discussion in the GROMACS 4 paper, for example. Even if you can design a fair test, you still need to be sure you've done the best by all codes with the compiler at hand.

Further, the metric they quote (time for a single integration step) is not very useful. Anyone doing serious MD is going to run calculations for at least days, if not months - comparisons need to be over *those* timeframes.

They claim to be doing PME with a 0.786nm real-space cut-off, which ought to require a Fourier grid spacing much smaller than 0.1nm for the reciprocal-space part to get decent accuracy. Speed is only one part of the issue. There might be other reasons they aren't referring to peer-reviewed literature to support these claims :-)

From the numbers, I also saw that they seem to do particularly well on newer 
CPUs like Core 2 Duo and Xeon L5420, using code for SSSE3 and SSE 4.1.

They don't show performance numbers without such extensions being used, so it looks like marketing hype. I don't see SSE3 or higher being very useful at all.

I am no expert in this kind of low-level stuff, but typing SSE4 into 
Wikipedia shows lots of instructions that look useful for MD. For example the 
'dpps' instruction does an entire dot product at once.

IIRC, there's only one dot-product-like operation per interaction in a PME non-bonded inner loop: the computation of r^2 = (x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2, which is probably already spread out SIMD-style over several interactions with SSE or SSE2. At best you might gain 2 flops per interaction, which is a percent or two. Whether that might come at a cost to the existing SSE/SSE2 SIMD is a harder question.
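
To make the "spread out over several interactions" point concrete, here is a rough sketch in plain C. It is not the actual GROMACS kernel; the function name squared_distances and the separate x/y/z array layout are assumptions made up for illustration. With this layout each SIMD lane would handle one i-j pair, so the three multiplies and two adds are already ordinary full-width operations and no horizontal dot product within a register (which is what 'dpps' provides) is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from GROMACS: squared distances from one
 * atom i to n neighbour atoms j. The neighbour coordinates are assumed to
 * be stored as separate x, y and z arrays (structure-of-arrays), so that
 * iteration k of the loop maps naturally onto one SIMD lane. */
void squared_distances(float xi, float yi, float zi,
                       const float *xj, const float *yj, const float *zj,
                       float *r2, size_t n)
{
    assert(xj != NULL && yj != NULL && zj != NULL && r2 != NULL);

    for (size_t k = 0; k < n; ++k) {
        /* three subtractions per interaction */
        float dx = xi - xj[k];
        float dy = yi - yj[k];
        float dz = zi - zj[k];

        /* three multiplies and two adds per interaction; done for four
         * values of k at once, each of these is one packed SSE operation */
        r2[k] = dx * dx + dy * dy + dz * dz;
    }
}
```

Processed four pairs at a time, every arithmetic line above corresponds to a single packed single-precision instruction, which is why a per-pair dot-product instruction would have little left to save.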

A single-cycle "floating-point distance to nearest integer":

y <- x - floor(x)

would be noteworthy :-)

I looked at the gmxlib/nonbonded directory and saw that SSE2 seems to be the 
most supported by Gromacs. So maybe adding support for SSE3 and SSE4 can still 
help a lot! Are there any plans for that?

Mark
