https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89371
Bug ID: 89371
Summary: missed vectorisation with "#pragma omp simd collapse(2)"
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: arnaud02 at users dot sourceforge.net
Target Milestone: ---

void ff(double* res, double const* a, double const* b, int ncell, int neq)
{
    #pragma omp simd collapse(2)
    for(int icell=0; icell < ncell; ++icell)
    {
        for(int ieq=0; ieq<neq; ++ieq)
        {
            res[icell*neq+ieq] = a[icell*neq+ieq]-b[icell*neq+ieq];
        }
    }
}

Built by gcc 8.2 on x86_64 with "-std=c++14 -O3 -mavx -fopenmp-simd", this
results in simd instructions being emitted. Run-time tests, with
ncell=100'000 and neq=3 for instance, confirm that the code is slower with
"#pragma omp simd collapse(2)" than without it. Am I missing something?

Ideally, I would like the compiler to flatten the loop nest into the
equivalent of:

void ff(double* res, double const* a, double const* b, int ncell, int neq)
{
    for(int j=0; j < ncell*neq; ++j)
        res[j] = a[j]-b[j];
}
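
As a possible workaround until the collapsed form vectorises well, the flattening can be written by hand. The sketch below is not from the report: the names ff_nested and ff_flat are hypothetical, and widening the trip count to std::ptrdiff_t is an added assumption, so that ncell*neq cannot overflow int. ff_nested is the loop nest from the report without the pragma, kept only as a reference to compare against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: the loop nest from the report, without any pragma.
void ff_nested(double* res, double const* a, double const* b, int ncell, int neq)
{
    for (int icell = 0; icell < ncell; ++icell)
        for (int ieq = 0; ieq < neq; ++ieq)
            res[icell*neq + ieq] = a[icell*neq + ieq] - b[icell*neq + ieq];
}

// Hand-flattened version (hypothetical name). Because the index
// icell*neq + ieq visits 0 .. ncell*neq - 1 exactly once and in order,
// a single loop over that range touches the same elements.
void ff_flat(double* res, double const* a, double const* b, int ncell, int neq)
{
    // An empty nest in the original means no work here either.
    if (ncell <= 0 || neq <= 0)
        return;

    // Assumption: compute the trip count in a wider type so the product
    // does not overflow int for large inputs.
    std::ptrdiff_t const n = static_cast<std::ptrdiff_t>(ncell) * neq;

    // As with the pragma in the report, this asserts that iterations are
    // independent, i.e. res does not overlap a or b at a shifted offset.
    #pragma omp simd
    for (std::ptrdiff_t j = 0; j < n; ++j)
        res[j] = a[j] - b[j];
}
```

Both functions should produce identical results for the same inputs. Compiling with the flags from the report plus "-fopt-info-vec" should show whether gcc vectorised the single loop; whether it is actually faster for a given ncell and neq would need to be measured.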