Mark Abraham wrote:
Nicolas wrote:
Hello,
I'm trying to do a benchmark with Gromacs 4 on our cluster, but I
don't completely understand the results I obtain. The system I used
is a 128 DOPC bilayer hydrated by ~18800 SPC for a total of ~70200
atoms. The size of the system is 9.6x9.6x10.1 nm^3. I'm using the
following parameters:
* nstlist = 10
* rlist = 1
* Coulombtype = PME
* rcoulomb = 1
* fourier spacing = 0.12
* vdwtype = Cutoff
* rvdw = 1
The cluster itself has 2 processors per node, connected by 100 MB/s Ethernet.
Ethernet and Gigabit ethernet are not fast enough to get reasonable
scaling. There've been quite a few posts on this topic in the last six
months.
Hmm, I see you've corrected your post to refer to Infiniband with four
cores/node. That should be reasonable, as I understand it (but search the
archive).
You should also check that your benchmark calculation is long enough
that you are measuring a simulation time that isn't being dominated by
setup costs. Some years ago a non-MD sysadmin complained of poor
scaling when he was testing over 10 or so MD steps!
My computations last at least 10 min (20000 steps). I think that's
enough. By the way, can the message passing interface
significantly influence the performance? I'm using MPICH-1.2. Should I
consider using LAM or MPICH2?
Nicolas
I'm using mpiexec to run Gromacs. When I use -npme 2 -ddorder
interleave, I get:
ncore   Perf (ns/day)   PME (%)
    1            0.00         0
    2            0.00         0
    3            0.00         0
    4            1.35        28
    5            1.84        31
    6            2.08        27
    8            2.09        21
   10            2.25        17
   12            2.02        15
   14            2.20        13
   16            2.04        11
   18            2.18        10
   20            2.29         9
So, above 6-8 cores, the PP nodes spend too much time waiting
for the PME nodes and the performance plateaus.
That's not surprising - the heuristic is that about a third to a
quarter of the cores want to be PME-only nodes. Of course, that
depends on the relative size of the real- and reciprocal-space parts
of the calculation.
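To make that heuristic concrete, here is a minimal Python sketch that turns "a quarter to a third of the cores" into a range of PME-only rank counts and compares it with the fixed -npme 2 used above. The function name and the rounding choices (round the quarter up, the third down) are assumptions for illustration, not anything mdrun itself does.

```python
import math


def pme_rank_range(ncores):
    """Return (low, high) PME-only rank counts for 1/4 to 1/3 of ncores.

    Rounding is an illustrative assumption: the quarter is rounded up
    and the third is rounded down, with high never below low.
    """
    low = math.ceil(ncores / 4)
    high = math.floor(ncores / 3)
    return low, max(low, high)


if __name__ == "__main__":
    for ncores in (8, 12, 16, 20):
        low, high = pme_rank_range(ncores)
        fixed_fraction = 2 / ncores  # fraction of ranks doing PME with -npme 2
        print(f"{ncores:2d} cores: heuristic suggests {low}-{high} PME-only ranks; "
              f"-npme 2 dedicates {fixed_fraction:.0%} of the ranks")
```

For 20 cores this gives 5 to 6 PME-only ranks, whereas -npme 2 dedicates only 10% of the ranks to PME, which is consistent with the PP nodes ending up waiting.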
When I use -npme 0, I get:
ncore   Perf (ns/day)   PME (%)
    1            0.43        33
    2            0.92        34
    3            1.34        35
    4            1.69        36
    5            2.17        33
    6            2.56        32
    8            3.24        33
   10            3.84        34
   12            4.34        35
   14            5.05        32
   16            5.47        34
   18            5.54        37
   20            6.13        36
I obtain much better performance when there are no PME nodes, while I
was expecting the opposite. Does someone have an explanation for
that? Does that mean domain decomposition is useless below a certain
real-space cutoff? I'm quite confused.
The relevant observations are for 4, 5, 6 and 8 cores, for which shared-duty
is out-performing -npme 2. I think your observations support the
conclusion that your network hardware is more limiting for PME-only
nodes than shared-duty nodes. They don't support the conclusion that
DD is useless, since you haven't compared with PD.
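The comparison can be made explicit with a short Python sketch over the ns/day figures posted in this thread. It computes the shared-duty to -npme 2 performance ratio at the core counts both runs have in common, plus the parallel efficiency of the -npme 0 runs relative to one core. The dictionary and function names are illustrative assumptions; only the numbers come from the tables.

```python
# ns/day figures copied from the two tables in this thread.
# The 1-3 core rows of the -npme 2 table (0.00 ns/day) are left out.
PERF_NPME2 = {4: 1.35, 5: 1.84, 6: 2.08, 8: 2.09, 10: 2.25, 12: 2.02,
              14: 2.20, 16: 2.04, 18: 2.18, 20: 2.29}
PERF_NPME0 = {1: 0.43, 2: 0.92, 3: 1.34, 4: 1.69, 5: 2.17, 6: 2.56,
              8: 3.24, 10: 3.84, 12: 4.34, 14: 5.05, 16: 5.47,
              18: 5.54, 20: 6.13}


def shared_vs_dedicated(ncores):
    """Ratio of shared-duty (-npme 0) to dedicated (-npme 2) performance."""
    return PERF_NPME0[ncores] / PERF_NPME2[ncores]


def efficiency_npme0(ncores):
    """Parallel efficiency of the -npme 0 runs relative to the 1-core run."""
    return PERF_NPME0[ncores] / (ncores * PERF_NPME0[1])


if __name__ == "__main__":
    for n in (4, 5, 6, 8):
        print(f"{n} cores: shared-duty is {shared_vs_dedicated(n):.2f}x the -npme 2 rate")
    for n in (8, 20):
        print(f"{n} cores: -npme 0 parallel efficiency {efficiency_npme0(n):.0%}")
```

A ratio above 1 at every one of these core counts is what "shared-duty is out-performing -npme 2" means here; it says nothing about DD versus PD, which would need a separate set of runs.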
You can play with the PME parameters to shift more load into the
real-space part - IIRC Carsten suggested a heuristic a few months back.
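One common way to shift load, sketched below in Python, is to scale the Coulomb cutoff and the Fourier grid spacing by the same factor. This is an assumption on my part and not necessarily the heuristic Carsten suggested. The cost estimates are rough: the number of pairs inside the cutoff grows with the cube of the cutoff, and the number of grid points shrinks with the cube of the spacing. The function name is hypothetical, and other cutoffs (rlist, rvdw) may have to change to stay consistent, so check the GROMACS documentation before applying this.

```python
def scale_pme_settings(rcoulomb, fourier_spacing, factor):
    """Scale cutoff and grid spacing together (illustrative sketch only).

    Returns the new cutoff, the new spacing, and rough relative costs:
    real-space pair count ~ factor**3, PME grid point count ~ factor**-3.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return {
        "rcoulomb": rcoulomb * factor,
        "fourier_spacing": fourier_spacing * factor,
        "real_space_cost": factor ** 3,
        "grid_points": factor ** -3,
    }


if __name__ == "__main__":
    # Starting values from the original post: rcoulomb = 1, spacing = 0.12
    for factor in (1.0, 1.1, 1.2):
        s = scale_pme_settings(1.0, 0.12, factor)
        print(f"factor {factor:.1f}: rcoulomb {s['rcoulomb']:.2f} nm, "
              f"spacing {s['fourier_spacing']:.3f} nm, "
              f"real-space work x{s['real_space_cost']:.2f}, "
              f"grid points x{s['grid_points']:.2f}")
```

The actual effect on wall-clock time depends on the network and on how mdrun picks the grid dimensions, so any such change should be benchmarked rather than trusted from these estimates.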
Mark
_______________________________________________
gmx-users mailing list gmx-users@gromacs.org
http://www.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before
posting!
Please don't post (un)subscribe requests to the list. Use the www
interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/mailing_lists/users.php
begin:vcard
fn:Nicolas Sapay
n:Sapay;Nicolas
org:University of Calgary;Biological department
adr:;;2500 University drive NW;Calgary;AB;T2N 1N4;Canada
email;internet:nsa...@ucalgary.ca
title:Post-doctoral fellow
tel;work:403-220-6869
x-mozilla-html:TRUE
url:http://moose.bio.ucalgary.ca/
version:2.1
end:vcard