Hi,

The Intel compilers are only recommended for pre-Bulldozer AMD processors (K10: Magny-Cours, Istanbul, Barcelona, etc.). On these, the PME non-bonded kernels (not the RF or plain cut-off ones!) are 10-30% slower when compiled with gcc than with icc. The icc-gcc difference is smallest with gcc 4.7, typically around 10-15% with the Verlet scheme and, AFAIR, a bit larger with the group scheme.
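If it helps, a back-to-back gcc/icc comparison can be set up roughly along these lines; the install prefixes, build-directory layout, and -j value below are just examples, so adapt them to your system and check the CMake options against your GROMACS version:

    # Hypothetical side-by-side builds from the same source tree; adjust paths.
    # gcc build:
    mkdir build-gcc && cd build-gcc
    CC=gcc CXX=g++ cmake .. -DGMX_MPI=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-gcc
    make -j 8 && make install
    cd ..
    # icc build:
    mkdir build-icc && cd build-icc
    CC=icc CXX=icpc cmake .. -DGMX_MPI=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-icc
    make -j 8 && make install

Keeping the two installs in separate prefixes makes it easy to run the exact same .tpr with both binaries.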
This is a performance issue in gcc specific to our non-bonded kernels on AMD K10. On all other Intel and AMD architectures we have tried, gcc 4.7/4.8 always matched or slightly outperformed icc 12 (icc 13 is typically slightly slower). Note that other parts of the code, where gcc applies some AMD-specific optimizations (while icc won't), can/will be faster with gcc; e.g. PME is typically faster. Therefore, the icc-gcc difference on AMD K10 will depend on factors like the cut-off (PP/PME ratio) and the cut-off scheme, but typically icc will result in overall slightly (1-10%) faster binaries. Your processors are K10/Istanbul, so icc should be faster in your case.

To see the details of where the performance difference comes from, I suggest you compare the performance stats table at the end of the log file. Tip: for easier comparison, run a fixed number of steps and compare the cycles columns (e.g. in a diff tool).

Cheers,
--
Szilárd

On Wed, Jun 26, 2013 at 9:30 AM, Djurre de Jong-Bruinink
<djurredej...@yahoo.com> wrote:
> >> You're using a real-MPI process per core, and you have six cores per
> >> processor.
>
> I was using the current setup, which is indeed not fully optimized, just to
> see how much the speed-up is between the Intel- and gcc-compiled versions.
>
> >> The recommended procedure is to map cores to OpenMP threads, and choose
> >> the number of MPI processes per processor (and thus the number of OpenMP
> >> threads per MPI process) to maximize performance. See
> >> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Multi-level_parallelization.3a_MPI.2fthread-MPI_.2b_OpenMP
>
> I have optimized this before. In my experience one only gets a speedup from
> using OpenMP at high parallelization (+/-200 particles per PP core) and if I
> use #mpi = total number of cores AND 2 OpenMP threads per MPI process. The
> total number of processes is then double the number of cores, so you are
> effectively overloading/hyperthreading the cores (and thus the number of
> particles per PP process is +/-100). I have a similar experience on a newer,
> Intel-based system, although there the advantage already starts at lower
> parallelization. I was wondering if OpenMP is always used in combination
> with hyperthreading?

No, not necessarily/not only. While multi-threading should *in theory* nearly always help, there are two caveats:

- There are parts of the code, mostly data/cache-intensive ones like integration or domain decomposition, which (unlike e.g. the PP force calculation, which is flop-intensive) don't scale very well with threads. Parallelization inefficiencies get amplified with many threads, both on AMD (due to its weaker cache performance compared to Intel) and on Intel with HT (2x threads sharing the same cache).

- OpenMP has an additional overhead which should be negligible in most cases, but not always.

At the same time, multi-threading has numerous advantages. Therefore, when running without DD (a single process), using OpenMP only is typically fastest on Intel with up to 12-24 threads (even across sockets and with HT), and with 4-6 threads on AMD. With DD, however, the trade-off changes.

> On the machine from my previous email, using OpenMP gives the warning:
>
> "Can not set thread affinities on the current platform. On NUMA systems this
> can cause performance degradation. If you think your platform should support
> setting affinities, contact the GROMACS developers."
>
> With the gcc-compiled version, using 72 cores / 700 particles per PP core,
> this indeed leads to slightly lower performance.
> However, using the Intel-compiled version the simulations get orders of
> magnitude slower.
>
> Groetnis,
> Djurre
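PS: in case it is useful, here is roughly how I would set up such a fixed-step comparison with a hybrid MPI + OpenMP launch. The binary names (mdrun_mpi, mdrun_icc_mpi), file names, and rank/thread counts below are only examples, and the exact mdrun options can differ a bit between versions, so adapt as needed:

    # Run the same .tpr for a fixed number of steps with each binary;
    # nsteps is set in the .mdp so both runs do identical work.
    grompp -f bench.mdp -c conf.gro -p topol.top -o bench.tpr

    # Example hybrid launch on 72 cores: 12 MPI ranks x 6 OpenMP threads
    # (one rank per 6-core processor); -ntomp sets threads per rank and
    # -pin on asks mdrun to pin the threads to cores.
    mpirun -np 12 mdrun_mpi     -s bench.tpr -deffnm run_gcc -ntomp 6 -pin on
    mpirun -np 12 mdrun_icc_mpi -s bench.tpr -deffnm run_icc -ntomp 6 -pin on

    # The cycle/time accounting table is printed near the end of each log;
    # grab the tail of both logs and compare them side by side.
    tail -n 60 run_gcc.log > cycles_gcc.txt
    tail -n 60 run_icc.log > cycles_icc.txt
    diff -y cycles_gcc.txt cycles_icc.txt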