Hi,

The Intel compilers are only recommended for pre-Bulldozer AMD processors (K10: Magny-Cours, Istanbul, Barcelona, etc.). On these, the PME non-bonded kernels (not the RF or plain cut-off ones!) are 10-30% slower when compiled with gcc than with icc. The icc-gcc difference is smallest with gcc 4.7, typically around 10-15% with the Verlet scheme and, AFAIR, a bit larger with the group scheme.
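If it helps, a back-to-back gcc/icc comparison can be set up roughly along these lines; the install prefixes, build-directory layout, and -j value below are just examples, so adapt them to your system and check the CMake options against your GROMACS version:

    # Hypothetical side-by-side builds from the same source tree; adjust paths.
    # gcc build:
    mkdir build-gcc && cd build-gcc
    CC=gcc CXX=g++ cmake .. -DGMX_MPI=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-gcc
    make -j 8 && make install
    cd ..
    # icc build:
    mkdir build-icc && cd build-icc
    CC=icc CXX=icpc cmake .. -DGMX_MPI=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-icc
    make -j 8 && make install

Keeping the two installs in separate prefixes makes it easy to run the exact same .tpr with both binaries.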
This is a performance issue in gcc specific to our non-bonded kernels on AMD K10. On all other Intel and AMD architectures we have tried, gcc 4.7/4.8 always matched or slightly outperformed icc 12 (icc 13 is typically slightly slower). Note that other parts of the code, where gcc applies some AMD-specific optimizations (while icc won't), can/will be faster with gcc; e.g. PME is typically faster. Therefore, the icc-gcc difference on AMD K10 will depend on factors like the cut-off (PP/PME ratio) and the cut-off scheme, but typically icc will result in overall slightly (1-10%) faster binaries. Your processors are K10/Istanbul, so icc should be faster in your case.

To see the details of where the performance difference comes from, I suggest you compare the performance stats table at the end of the log file. Tip: for easier comparison, run a fixed number of steps and compare the cycles columns (e.g. in a diff tool).

Cheers,
--
Szilárd

On Wed, Jun 26, 2013 at 9:30 AM, Djurre de Jong-Bruinink
<djurredej...@yahoo.com> wrote:
> >> You're using a real-MPI process per core, and you have six cores per
> >> processor.
>
> I was using the current setup, which is indeed not fully optimized, just to
> see how much the speed-up is between the Intel- and gcc-compiled versions.
>
> >> The recommended procedure is to map cores to OpenMP threads, and choose
> >> the number of MPI processes per processor (and thus the number of OpenMP
> >> threads per MPI process) to maximize performance. See
> >> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Multi-level_parallelization.3a_MPI.2fthread-MPI_.2b_OpenMP
>
> I have optimized this before. In my experience one only gets a speedup from
> using OpenMP at high parallelization (+/-200 particles per PP core) and if I
> use #mpi = total number of cores AND 2 OpenMP threads per MPI process. The
> total number of processes is then double the number of cores, so you are
> effectively overloading/hyperthreading the cores (and thus the number of
> particles per PP process is +/-100). I have a similar experience on a newer,
> Intel-based system, although there the advantage already starts at lower
> parallelization. I was wondering if OpenMP is always used in combination
> with hyperthreading?

No, not necessarily/not only. While multi-threading should *in theory* nearly always help, there are two caveats:

- There are parts of the code, mostly data/cache-intensive ones like integration or domain decomposition, which (unlike e.g. the PP force calculation, which is flop-intensive) don't scale very well with threads. Parallelization inefficiencies get amplified with many threads, both on AMD (due to its weaker cache performance compared to Intel) and on Intel with HT (2x threads sharing the same cache).

- OpenMP has an additional overhead which should be negligible in most cases, but not always.

At the same time, multi-threading has numerous advantages. Therefore, when running without DD (a single process), using OpenMP only is typically fastest on Intel with up to 12-24 threads (even across sockets and with HT), and with 4-6 threads on AMD. With DD, however, the trade-off changes.

> On the machine from my previous email, using OpenMP gives the warning:
>
> "Can not set thread affinities on the current platform. On NUMA systems this
> can cause performance degradation. If you think your platform should support
> setting affinities, contact the GROMACS developers."
>
> With the gcc-compiled version, using 72 cores / 700 particles per PP core,
> this indeed leads to slightly lower performance.
> However, using the Intel-compiled version the simulations get orders of
> magnitude slower.
>
> Groetnis,
> Djurre
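PS: in case it is useful, here is roughly how I would set up such a fixed-step comparison with a hybrid MPI + OpenMP launch. The binary names (mdrun_mpi, mdrun_icc_mpi), file names, and rank/thread counts below are only examples, and the exact mdrun options can differ a bit between versions, so adapt as needed:

    # Run the same .tpr for a fixed number of steps with each binary;
    # nsteps is set in the .mdp so both runs do identical work.
    grompp -f bench.mdp -c conf.gro -p topol.top -o bench.tpr

    # Example hybrid launch on 72 cores: 12 MPI ranks x 6 OpenMP threads
    # (one rank per 6-core processor); -ntomp sets threads per rank and
    # -pin on asks mdrun to pin the threads to cores.
    mpirun -np 12 mdrun_mpi     -s bench.tpr -deffnm run_gcc -ntomp 6 -pin on
    mpirun -np 12 mdrun_icc_mpi -s bench.tpr -deffnm run_icc -ntomp 6 -pin on

    # The cycle/time accounting table is printed near the end of each log;
    # grab the tail of both logs and compare them side by side.
    tail -n 60 run_gcc.log > cycles_gcc.txt
    tail -n 60 run_icc.log > cycles_icc.txt
    diff -y cycles_gcc.txt cycles_icc.txt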