On Fri, May 17, 2013 at 2:48 PM, Djurre de Jong-Bruinink
<djurredej...@yahoo.com> wrote:
>
>>The answer is in the log files, in particular the performance summary
>>should indicate where the performance difference is. If you post your
>>log files somewhere we can probably give further tips on optimizing
>>your run configurations.
>
> I put the log files for 72 CPUs, using GMX455, GMX461+group and GMX461+verlet
> here:
> http://md.chem.rug.nl/~djurre/logs/N6_gmx455.log
> http://md.chem.rug.nl/~djurre/logs/N6_gmx461_group.log
> http://md.chem.rug.nl/~djurre/logs/N6_gmx461_verlet.log
That tells much more.

> It would be great if you could point out some possible optimizations.

Here you go:

- You seem to be using 2 fs time-steps, so you don't need to constrain all
  bonds; constraining only h-bonds is enough. This will reduce the cell size
  requirement posed by LINCS and will allow further decomposition.
  Additionally, you can also tweak the LINCS order and number of iterations.

- With the Verlet scheme you can use OpenMP parallelization to reduce the
  pressure on the domain decomposition, e.g. by using 2 OpenMP threads (at
  least for PP) you'd need only 24 domains instead of 48. OpenMP
  parallelization is not very efficient on the old-ish AMD processors you are
  using, but 2 threads per MPI rank should still help at very high
  parallelization (<200 atoms/core).

- On the AMD Istanbul (K10) processors that you are using, gcc generates
  rather poor non-bonded kernel code; icc will make the non-bondeds run
  10-20% faster.

- With the Verlet scheme you can safely increase nstlist, so at 72 cores, and
  especially at higher core counts, values of 12, 15, or even 20 might give
  better performance.

- Your 4.6 group scheme run shows a large PP-PME imbalance, try increasing
  the number of PME ranks!

(A sketch of what these settings could look like is at the end of this mail.)

>>Note that with such a small system the scaling with the group scheme
>>surely becomes limited by imbalance and probably it won't scale much
>>further than 72 cores. At the same time, simulations with the verlet
>>scheme have shown scaling to below 100 atoms/core.
>
> I tried running on 84 CPUs (56 PP cores = 400 atoms/PP core), but I got a
> domain decomposition error. Maybe I could optimize -rcon and -dds further.
> However, although the scaling to more CPUs is better with the verlet scheme,
> I think you will never win: with 72 CPUs Verlet is almost as fast as group
> at 60 CPUs, but compared to 24 CPUs the scaling per CPU is already down to
> 60%.
>
> But as Mark Abraham mentioned, it might be that my system is just too small
> to get the advantage of scaling that will be there in larger systems.

As I mentioned before, you *should* be able to use 2x more cores (or perhaps
even more), but of course the parallel efficiency will decrease.

Cheers,
--
Szilard

>
> Groetnis,
> Djurre
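P.S. In case it is useful, here is a rough sketch of what the suggestions
above could look like in practice. This is illustrative only: the binary name
(mdrun_mpi), the -deffnm name (N6, guessed from your log file names) and the
exact rank/thread counts are assumptions you will need to adapt to your setup.

  ; .mdp fragment for the Verlet scheme
  cutoff-scheme   = Verlet
  nstlist         = 20        ; with Verlet, 12/15/20 can pay off at high core counts
  constraints     = h-bonds   ; enough with 2 fs steps, relaxes the LINCS cell-size limit
  lincs-order     = 4         ; default; lowering it (and compensating with an
  lincs-iter      = 1         ; extra lincs-iter) further relaxes the DD minimum cell size

  # mdrun on 72 cores: 24 PP ranks x 2 OpenMP threads + 24 single-threaded PME ranks
  mpirun -np 48 mdrun_mpi -ntomp 2 -ntomp_pme 1 -npme 24 -deffnm N6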