One case where ICC can sometimes generate much faster code is when loops use the nontemporal pragma [https://software.intel.com/en-us/node/524559].
AFAIK, there's no such equivalent pragma in gcc [https://gcc.gnu.org/ml/gcc/2012-01/msg00028.html].

When I tried this simple example

    https://github.com/rnburn/square_timing/blob/master/bench.cpp

that measures times for this loop:

    void compute(const double* x, index_t N, double* y) {
      #pragma vector nontemporal
      for(index_t i=0; i<N; ++i) y[i] = x[i]*x[i];
    }

with and without nontemporal, I got these times (N = 1,000,000):

    Temporal        1,042,080
    Non-Temporal      538,842

So running with the non-temporal pragma was nearly twice as fast.

An equivalent non-temporal pragma for GCC would, IMO, certainly be a very good feature to add.

On Wed, Jun 6, 2018 at 12:22 PM, Dmitry Mikushin <dmi...@kernelgen.org> wrote:
> Dear Paul,
>
> The opinion you've mentioned is common in scientific community. However, in
> more detail it often surfaces that the used set of GCC compiler options
> simply does not correspond to that "fast" version of Intel. For instance,
> when you do "-O3" for Intel it actually corresponds to (at least) "-O3
> -ffast-math -march=native" of GCC. Omitting "-ffast-math" obviously
> introduces significant performance gap.
>
> Kind regards,
> - Dmitry Mikushin | Applied Parallel Computing LLC |
> https://parallel-computing.pro
>
>
> 2018-06-06 18:51 GMT+03:00 Paul Menzel <pmenzel+gcc.gnu....@molgen.mpg.de>:
>
>> Dear GCC folks,
>>
>>
>> Some scientists in our organization still want to use the Intel compiler,
>> as they say, it produces faster code, which is then executed on clusters.
>> Some resources on the Web [1][2] confirm this. (I am aware, that it’s
>> heavily dependent on the actual program.)
>>
>> My question is, is it realistic, that GCC could catch up and that the
>> scientists will start to use it over Intel’s compiler? Or will Intel
>> developers always have the lead, because they have secret documentation and
>> direct contact with the processor designers?
>>
>> If it is realistic, how can we get there? Would first the program be
>> written, and then the compiler be optimized for that? Or are just more GCC
>> developers needed?
>>
>>
>> Kind regards,
>>
>> Paul
>>
>>
>> [1]: https://colfaxresearch.com/compiler-comparison/
>> [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf
>>
>>
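
P.S. Without such a pragma, one way to request non-temporal stores from GCC on x86 is to write them by hand with the SSE2 intrinsic _mm_stream_pd. The following is only a minimal sketch of the loop from the top of this message, not what ICC necessarily emits and not code from bench.cpp. The name compute_nt and the index_t alias are assumptions made for illustration, and I have not timed this variant. On targets without SSE2 it falls back to the plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2: _mm_loadu_pd, _mm_mul_pd, _mm_stream_pd
#endif

// Assumption: index_t in bench.cpp is a signed integer type.
using index_t = std::ptrdiff_t;

// Hypothetical hand-written variant of compute(): y[i] = x[i]*x[i],
// using streaming (non-temporal) stores for y where SSE2 is available.
void compute_nt(const double* x, index_t N, double* y) {
  assert(N >= 0);
  index_t i = 0;
#if defined(__SSE2__)
  // _mm_stream_pd needs a 16-byte-aligned destination, so peel scalar
  // iterations until y + i is aligned (or the input is exhausted).
  while (i < N && (reinterpret_cast<std::uintptr_t>(y + i) & 15u) != 0) {
    y[i] = x[i] * x[i];
    ++i;
  }
  // Main loop: two doubles per iteration, stored with a non-temporal hint.
  for (; i + 2 <= N; i += 2) {
    __m128d v = _mm_loadu_pd(x + i);         // unaligned load is fine for x
    _mm_stream_pd(y + i, _mm_mul_pd(v, v));  // store that bypasses the cache
  }
  _mm_sfence();  // order the streaming stores before any later stores
#endif
  // Scalar remainder (and the whole loop when SSE2 is not available).
  for (; i < N; ++i) y[i] = x[i] * x[i];
}
```

Something like g++ -O3 -march=native should be enough to build it. Whether it helps will depend on N and on whether y is read again soon after being written, since streaming stores avoid filling the cache with the output.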