On Fri, Oct 12, 2018 at 07:35:09PM +0200, Thomas Schwinge wrote:
>     int a[NJ][NI];
> 
>     #pragma acc loop collapse(2)
>     for (int j = 0; j < N_J; ++j)
>       for (int i = 0; i < N_I; ++i)
>         a[j][i] = 0;

For example, for
int a[128][128];

void
foo (int m, int n)
{
  #pragma omp for simd collapse(2)
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++)
      a[i][j]++;
}
we emit in the inner loop:
  <bb 8> :
  i = i.0;
  j = j.1;
  _1 = a[i][j];
  _2 = _1 + 1;
  a[i][j] = _2;
  .iter.4 = .iter.4 + 1;
  j.1 = j.1 + 1;
  D.2912 = j.1 < n.7 ? 0 : 1;
  i.0 = D.2912 + i.0;
  j.1 = j.1 < n.7 ? j.1 : 0;
  
  <bb 9> :
  if (.iter.4 < D.2902)
    goto <bb 8>; [87.50%]
  else
    goto <bb 10>; [12.50%]
to make it more vectorization-friendly (though in this particular case it
isn't vectorized either) and to avoid the expensive % and / operations
inside the inner loop.  Without -fopenmp there is no collapse, and only the
inner loop is vectorized.

        Jakub
