Hi,

On Nov 13, 2012, at 2:22 PM, Thomas Schlesier <[email protected]> wrote:
> Sorry for reposting, but I forgot one comment and added it now below:
>
> On 13.11.2012 06:16, gmx-users-request at gromacs.org wrote:
> >> Dear all,
> >> I did some scaling tests for a cluster and I'm a little bit clueless
> >> about the results.
> >> So first the setup:
> >>
> >> Cluster:
> >> Saxonid 6100, Opteron 6272 16C 2.100GHz, Infiniband QDR
> >> GROMACS version: 4.0.7 and 4.5.5
> >> Compiler: GCC 4.7.0
> >> MPI: Intel MPI 4.0.3.008
> >> FFT library: ACML 5.1.0 fma4
> >>
> >> System:
> >> 895 SPC/E water molecules
>
> > This is a somewhat small system, I would say.
>
> >> Simulation time: 750 ps (0.002 ps timestep)
> >> Cut-off: 1.0 nm,
> >> but with long-range corrections (DispCorr = EnerPres; PME with standard
> >> settings - but in each case no extra CPU solely for PME)
> >> V-rescale thermostat and Parrinello-Rahman barostat
> >>
> >> I get the following timings (seconds), where each is given as the time
> >> that would be needed on 1 CPU (so if a job on 2 CPUs took X s, the
> >> listed time is 2 * X s).
> >> These timings were taken from the *.log file, at the end of the
> >> 'real cycle and time accounting' section.
> >>
> >> Timings:
> >> gmx-version   1cpu   2cpu   4cpu
> >> 4.0.7         4223   3384   3540
> >> 4.5.5         3780   3255   2878
>
> > Do you mean CPUs or CPU cores? Are you using the IB network or are you
> > running single-node?
>
> Meant number of cores, and all cores are on the same node.
>
> >> I'm a little bit clueless about the results. I always thought that if I
> >> have a non-interacting system and double the number of CPUs, I
>
> > You do use PME, which means a global interaction of all charges.
>
> >> would get a simulation which takes only half the time (so the times as
> >> defined above would be equal). If the system does have interactions, I
> >> would lose some performance due to communication. Due to node imbalance
> >> there could be a further loss of performance.
> >>
> >> Keeping this in mind, I can only explain the timings for version 4.0.7,
> >> 2cpu -> 4cpu (2cpu a little bit faster, since going to 4cpu leads to
> >> more communication -> loss of performance).
> >>
> >> All the other timings, especially that 1cpu takes in each case longer
> >> than the other cases, I do not understand.
> >> Probably the system is too small and/or the simulation time is too
> >> short for a scaling test. But I would assume that the amount of time to
> >> set up the simulation would be equal for all three cases of one
> >> GROMACS version.

For somewhat cleaner benchmark numbers that exclude any setup and
load-balancing equilibration time, you can set the "-resethway" switch of
mdrun (see the example command at the end of this mail). This way, it will
only report timings for the last half of the time steps.

> >> Only other explanation which comes to my mind would be that something
> >> went wrong during the installation of the programs?

I think it is the small size of your system. Try a benchmark with e.g. 10k
particles; only if that looks as bad would I assume something is wrong with
the installation.

Carsten

> > You might want to take a closer look at the timings in the md.log output
> > files; this will give you a clue where the bottleneck is, and also tell
> > you about the communication-computation ratio.
> >
> > Best,
> >   Carsten

> >> Please, can somebody enlighten me?
>
> Here are the timings from the log file (for GMX 4.5.5):
>
> #cores: 1, 2, 4 (all cores are on the same node)
> Computing:           1 core   2 cores   4 cores
> -----------------------------------------------------
> Domain decomp.                   41.7      47.8   up
> DD comm. load                     0.0       0.0   -
> Comm. coord.                     17.8      30.5   up
> Neighbor search       614.1     355.4     323.7   down
> Force                2401.6    1968.7    1676.0   down
> Wait + Comm. F                   15.1      31.4   up
> PME mesh              596.3     710.4     639.1   -
> Write traj.             1.2       0.8       0.6   down
> Update                 49.7      44.0      37.6   down
> Constraints            79.3      70.4      60.0   down
> Comm. energies                    3.2       5.3   up
> Rest                   38.3      27.1      25.4   down
> -----------------------------------------------------
> Total                3780.5    3254.6    2877.5   down
> -----------------------------------------------------
> -----------------------------------------------------
> PME redist. X/F                 133.0     120.5   down
> PME spread/gather     511.3     465.7     396.8   down
> PME 3D-FFT             59.4      88.9     102.2   up
> PME solve              25.2      22.2      18.9   down
> -----------------------------------------------------
>
> The two calculation parts for which the most time is saved when going
> parallel are:
> 1) Forces
> 2) Neighbor search (OK, going from 2 cores to 4 cores does not make a big
>    difference, but going from 1 core to 2 or 4 saves much time)
>
> For GMX 4.0.7 it looks similar, although the difference between 2 and 4
> cores is not as high as for GMX 4.5.5.
>
> Is there any good explanation for this time saving?
> I would have thought that the system has a set number of interactions and
> one has to calculate all these interactions. If I divide the set into 2 or
> 4 smaller sets, the number of interactions shouldn't change, and so the
> calculation time shouldn't change?
>
> Or is there something fancy in the algorithm which reduces the time spent
> for calling up the arrays if the calculation is for a smaller set of
> interactions?

--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
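P.S.: Here is the benchmark command referred to above - a minimal sketch,
assuming an MPI-enabled 4.5.5 build installed as "mdrun_mpi" and an existing
"topol.tpr" (the binary and file names are only placeholders for whatever
your installation uses):

    # hypothetical names: adjust mdrun_mpi / topol.tpr / bench_np4 to your setup
    mpirun -np 4 mdrun_mpi -s topol.tpr -deffnm bench_np4 -resethway -noconfout

-resethway resets the cycle counters after half of the steps, so setup and
initial load balancing drop out of the reported timings; -noconfout merely
skips writing the final configuration. Repeating the run with -np 1 and
-np 2 gives the other two data points.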

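To put a number on the anomaly you describe: since your listed timings are
already "number of cores * wall-clock time", the parallel efficiency is just
the ratio of the listed totals,

    efficiency(n) = T_listed(1 core) / T_listed(n cores)

    4.5.5:  efficiency(2) = 3780.5 / 3254.6 ≈ 1.16
            efficiency(4) = 3780.5 / 2877.5 ≈ 1.31

i.e. more than 100%, apparently super-linear scaling, which fits the
small-system explanation above.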

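The communication-computation ratio can also be read directly off the table
you posted. Counting only the rows that are explicitly communication (some
communication is also hidden inside the PME mesh time, e.g. in
PME redist. X/F), the 4-core run gives

    DD comm. load + Comm. coord. + (Wait + Comm. F) + Comm. energies
      = 0.0 + 30.5 + 31.4 + 5.3 ≈ 67 s  out of 2877.5 s total,

i.e. only about 2%; the bulk of the time is still spent in Force, PME mesh
and Neighbor search.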