On Wed, Jun 6, 2018 at 11:10 PM Zan Lynx <zl...@acm.org> wrote:
>
> On 06/06/2018 10:22 AM, Dmitry Mikushin wrote:
> > The opinion you've mentioned is common in scientific community. However, in
> > more detail it often surfaces that the used set of GCC compiler options
> > simply does not correspond to that "fast" version of Intel. For instance,
> > when you do "-O3" for Intel it actually corresponds to (at least) "-O3
> > -ffast-math -march=native" of GCC. Omitting "-ffast-math" obviously
> > introduces significant performance gap.
> >
>
> Please note that if your compute cluster uses different models of CPU,
> be extremely careful with -march=native.
>
> I've been bitten by it in VMs, several times. Unless you always run on
> the same system that did the build, you are running a risk of illegal
> instructions.

Yes.  Note this is where ICC has an advantage because it supports
automagically doing runtime versioning based on the CPU instruction set
for vectorized loops.  We only support that in an awkward explicit way
(the manual talks about this in the 'Function Multiversioning' section).

But in the end it's just a "detail" that can be worked around with a
little inconvenience ;)  (I've yet to see a heterogeneous cluster
where the instruction set differences make a performance difference
over choosing the lowest common one)

Richard.

> --
> Knowledge is Power -- Power Corrupts
> Study Hard -- Be Evil
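
For illustration, here is a minimal sketch of what the explicit route can look like on the GCC side. It uses the target_clones function attribute, which is related to, but not the same as, the C++ per-version target overloads that the manual's 'Function Multiversioning' section describes. Assumptions: GCC 6 or later (or a recent Clang) on x86-64 Linux with glibc, since the dispatch relies on ifunc support. The kernel name saxpy_add, the MULTIVERSION macro, and the chosen ISA list are hypothetical and not taken from this thread.

```c
#include <stddef.h>
#include <stdio.h>   /* included first so that __GLIBC__ is defined on glibc systems */

/* Assumption: target_clones needs compiler and ifunc support, which in
   practice means x86-64 Linux with glibc. Elsewhere the macro expands to
   nothing and only one baseline version of the function is built. */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define MULTIVERSION
#endif

/* Hypothetical example kernel: y[i] += a * x[i].
   With the attribute active, the compiler emits one clone per listed target
   and a resolver that picks a clone once, at load time, based on the CPU
   the program is actually running on. */
MULTIVERSION
void saxpy_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Callers just call saxpy_add() as usual. The point of the sketch is that the translation unit can be built without -march=native (the loop still needs the vectorizer enabled, e.g. -O3, to benefit), so the binary should not hit illegal instructions on the older nodes while the AVX2 clone is still used where available. The inconvenience is that each hot function has to be annotated by hand.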