On Fri, Oct 12, 2018 at 07:35:09PM +0200, Thomas Schwinge wrote:
> int a[N_J][N_I];
>
> #pragma acc loop collapse(2)
> for (int j = 0; j < N_J; ++j)
>   for (int i = 0; i < N_I; ++i)
>     a[j][i] = 0;
For example, for
int a[128][128];
void
foo (int m, int n)
{
  #pragma omp for simd collapse(2)
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++)
      a[i][j]++;
}
we emit the following in the inner loop:
<bb 8> :
i = i.0;
j = j.1;
_1 = a[i][j];
_2 = _1 + 1;
a[i][j] = _2;
.iter.4 = .iter.4 + 1;
j.1 = j.1 + 1;
D.2912 = j.1 < n.7 ? 0 : 1;
i.0 = D.2912 + i.0;
j.1 = j.1 < n.7 ? j.1 : 0;
<bb 9> :
if (.iter.4 < D.2902)
goto <bb 8>; [87.50%]
else
goto <bb 10>; [12.50%]
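We do it this way to make the loop more vectorization-friendly (although
this particular case isn't vectorized either) and to avoid the expensive %
and / operations inside the inner loop: i and j are advanced incrementally
alongside .iter.4 instead of being recomputed from it with / and % on every
iteration. Without -fopenmp there is no collapse, and only the inner loop
gets vectorized.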
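To illustrate the difference, here is a plain C sketch of the two ways a collapsed 2-level nest can recover (i, j) from one linear counter. This is only an illustrative model, not GCC's actual lowering code; the function names (inc_divmod, inc_incremental, variants_agree) and the NMAX bound are hypothetical. The first variant computes iter / n and iter % n every iteration; the second mirrors the dump above (bump j, carry into i when j reaches n, wrap j to 0), and variants_agree checks that both touch each element of the m x n region exactly once.

```c
#include <assert.h>
#include <string.h>

#define NMAX 128 /* hypothetical bound, matches int a[128][128] above */

/* Hypothetical variant 1: one linear counter over the collapsed m*n
   space; i and j are recomputed with / and % on every iteration.  */
static void
inc_divmod (int a[NMAX][NMAX], int m, int n)
{
  long total = (long) m * n;
  for (long iter = 0; iter < total; iter++)
    {
      int i = (int) (iter / n);
      int j = (int) (iter % n);
      a[i][j]++;
    }
}

/* Hypothetical variant 2: same linear counter, but i and j are advanced
   incrementally, mirroring the GIMPLE dump (no / or % in the loop).  */
static void
inc_incremental (int a[NMAX][NMAX], int m, int n)
{
  long total = (long) m * n;
  int i = 0, j = 0;
  for (long iter = 0; iter < total; iter++)
    {
      a[i][j]++;
      j = j + 1;
      int carry = j < n ? 0 : 1; /* plays the role of D.2912 */
      i = carry + i;
      j = j < n ? j : 0;
    }
}

/* Returns 1 if both variants increment exactly the elements of the
   m x n region, each exactly once, and nothing else.  */
static int
variants_agree (int m, int n)
{
  static int x[NMAX][NMAX], y[NMAX][NMAX];
  assert (m >= 0 && m <= NMAX && n >= 0 && n <= NMAX);
  memset (x, 0, sizeof x);
  memset (y, 0, sizeof y);
  inc_divmod (x, m, n);
  inc_incremental (y, m, n);
  for (int i = 0; i < NMAX; i++)
    for (int j = 0; j < NMAX; j++)
      if (x[i][j] != (i < m && j < n))
        return 0;
  return memcmp (x, y, sizeof x) == 0;
}
```

Both variants execute the same m*n iterations; the incremental one replaces a division and a modulo per iteration with a compare and two selects, which is what the dump above does.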
Jakub