https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89371

--- Comment #3 from Arnaud Desitter <arnaud02 at users dot sourceforge.net> ---
Consider the following test case (main2.cpp):
#include <vector>
#include <iostream>
#include <numeric>

void ff(double* res, double const* a, double const* b, int n1, int n2)
{
#pragma omp simd collapse(2)
  for(int i1=0; i1 < n1; ++i1)
  {
    for(int i2=0; i2 < n2; ++i2)
    {
      res[i1*n2+i2] = a[i1*n2+i2]-b[i1*n2+i2];
    }
  }
}

int main()
{
  const auto repeat = 100*100;

  const std::size_t n1 = 100*1000;
  const std::size_t n2 = 3;

  std::vector<double> res(n1*n2), a(n1*n2), b(n1*n2);
  std::iota(a.begin(), a.end(), 1.0);
  std::iota(b.begin(), b.end(), -200.0);

  for(int r=repeat; r>0; --r)
    ff(res.data(), a.data(), b.data(), n1, n2);

  std::cout << res[0] << '\n';
}

Using clang 8.0:
>clang++ -O3 main2.cpp
>/usr/bin/time ./a.out > /dev/null
2.93user 0.00system 0:02.94elapsed 99%CPU (0avgtext+0avgdata 8424maxresident)k
>clang++ -fopenmp-simd -O3 main2.cpp > /dev/null
>/usr/bin/time ./a.out > /dev/null
2.83user 0.00system 0:02.83elapsed 99%CPU (0avgtext+0avgdata 8492maxresident)k
0inputs+0outputs (0major+2215minor)pagefaults 0swaps

Using gcc 9.1.0:
>g++ -O3 main2.cpp
>/usr/bin/time ./a.out > /dev/null
3.49user 0.00system 0:03.50elapsed 99%CPU (0avgtext+0avgdata 8488maxresident)k
0inputs+0outputs (0major+2215minor)pagefaults 0swaps
>g++ -fopenmp-simd -O3 main2.cpp
>/usr/bin/time ./a.out > /dev/null
5.83user 0.00system 0:05.84elapsed 99%CPU (0avgtext+0avgdata 8492maxresident)k
0inputs+0outputs (0major+2215minor)pagefaults 0swaps

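With -fopenmp-simd, clang 8.0 is able to produce vectorised code for
"#pragma omp simd collapse(2)", whereas gcc 9.1.0 cannot: its user time
gets worse with the flag (5.83s versus 3.49s without it), while clang's
improves slightly (2.83s versus 2.93s).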

For the record, clang 7.0 produces terrible code for this example.
