On Wed, Jun 6, 2018 at 8:31 PM Ryan Burn <rnickb...@gmail.com> wrote:
>
> One case where ICC can generate much faster code sometimes is by using
> the nontemporal pragma [https://software.intel.com/en-us/node/524559]
> with loops.
>
> AFAIK, there's no such equivalent pragma in gcc
> [https://gcc.gnu.org/ml/gcc/2012-01/msg00028.html].
>
> When I tried this simple example
> https://github.com/rnburn/square_timing/blob/master/bench.cpp that
> measures times for this loop:
>
>   void compute(const double* x, index_t N, double* y) {
>     #pragma vector nontemporal
>     for(index_t i=0; i<N; ++i) y[i] = x[i]*x[i];
>   }
>
> with and without nontemporal I got these times (N = 1,000,000)
>
>   Temporal      1,042,080
>   Non-Temporal    538,842
>
> So running with the non-temporal pragma was nearly twice as fast.
>
> An equivalent non-temporal pragma for GCC would, IMO, certainly be a
> very good feature to add.
GCC has robust infrastructure for loop pragmas now; it's just that the set
of available pragmas isn't very big. It would be interesting to know which
ICC ones people use regularly so we can support those in GCC as well.

Note that using #pragmas is very much hand-optimizing the code for the
compiler you use, something that is possible for GCC as well.

Richard.

> On Wed, Jun 6, 2018 at 12:22 PM, Dmitry Mikushin <dmi...@kernelgen.org> wrote:
> > Dear Paul,
> >
> > The opinion you've mentioned is common in scientific community. However, in
> > more detail it often surfaces that the used set of GCC compiler options
> > simply does not correspond to that "fast" version of Intel. For instance,
> > when you do "-O3" for Intel it actually corresponds to (at least) "-O3
> > -ffast-math -march=native" of GCC. Omitting "-ffast-math" obviously
> > introduces significant performance gap.
> >
> > Kind regards,
> > - Dmitry Mikushin | Applied Parallel Computing LLC |
> > https://parallel-computing.pro
> >
> >
> > 2018-06-06 18:51 GMT+03:00 Paul Menzel <pmenzel+gcc.gnu....@molgen.mpg.de>:
> >
> >> Dear GCC folks,
> >>
> >>
> >> Some scientists in our organization still want to use the Intel compiler,
> >> as they say, it produces faster code, which is then executed on clusters.
> >> Some resources on the Web [1][2] confirm this. (I am aware, that it’s
> >> heavily dependent on the actual program.)
> >>
> >> My question is, is it realistic, that GCC could catch up and that the
> >> scientists will start to use it over Intel’s compiler? Or will Intel
> >> developers always have the lead, because they have secret documentation and
> >> direct contact with the processor designers?
> >>
> >> If it is realistic, how can we get there? Would first the program be
> >> written, and then the compiler be optimized for that? Or are just more GCC
> >> developers needed?
> >>
> >>
> >> Kind regards,
> >>
> >> Paul
> >>
> >>
> >> [1]: https://colfaxresearch.com/compiler-comparison/
> >> [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf