On Fri, Oct 12, 2018 at 9:52 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Fri, Oct 12, 2018 at 07:35:09PM +0200, Thomas Schwinge wrote:
> >     int a[N_J][N_I];
> >
> >     #pragma acc loop collapse(2)
> >     for (int j = 0; j < N_J; ++j)
> >       for (int i = 0; i < N_I; ++i)
> >         a[j][i] = 0;
>
> For example, for
> int a[128][128];
>
> void
> foo (int m, int n)
> {
>   #pragma omp for simd collapse(2)
>   for (int i = 0; i < m; i++)
>     for (int j = 0; j < n; j++)
>       a[i][j]++;
> }
> we emit in the inner loop:
>   <bb 8> :
>   i = i.0;
>   j = j.1;
>   _1 = a[i][j];
>   _2 = _1 + 1;
>   a[i][j] = _2;
>   .iter.4 = .iter.4 + 1;
>   j.1 = j.1 + 1;
>   D.2912 = j.1 < n.7 ? 0 : 1;
>   i.0 = D.2912 + i.0;
>   j.1 = j.1 < n.7 ? j.1 : 0;
>
>   <bb 9> :
>   if (.iter.4 < D.2902)
>     goto <bb 8>; [87.50%]
>   else
>     goto <bb 10>; [12.50%]
> to make it more vectorization friendly (though, in this particular case it
> isn't vectorized either) and not do the expensive % and / operations inside
> of the inner loop.  Without -fopenmp it does vectorize only the inner loop,
> there is no collapse.
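
For reference, a minimal C sketch of the three forms under discussion: the
loop nest as written, a collapsed loop that recovers i and j with / and %,
and a collapsed loop that updates them incrementally the way the dump above
does.  The array bounds (ROWS, COLS) and all function names are made up for
this illustration; this is not what the compiler emits verbatim.

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 8, COLS = 8 };   /* hypothetical small bounds for the sketch */

/* Reference: the loop nest as written in the source.  */
static void
nested (int a[ROWS][COLS], int m, int n)
{
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++)
      a[i][j]++;
}

/* Collapsed form that recomputes i and j with / and % in every iteration.  */
static void
collapsed_divmod (int a[ROWS][COLS], int m, int n)
{
  if (m <= 0 || n <= 0)
    return;
  long count = (long) m * n;
  for (long iter = 0; iter < count; iter++)
    {
      int i = (int) (iter / n);
      int j = (int) (iter % n);
      a[i][j]++;
    }
}

/* Collapsed form modelled on the dump: j steps by one, is reset to 0 when
   it reaches n, and the reset carries into i.  */
static void
collapsed_incremental (int a[ROWS][COLS], int m, int n)
{
  if (m <= 0 || n <= 0)
    return;
  long count = (long) m * n;
  int i = 0, j = 0;
  for (long iter = 0; iter < count; iter++)
    {
      a[i][j]++;
      j = j + 1;
      int carry = j < n ? 0 : 1;   /* plays the role of D.2912 */
      i = carry + i;
      j = j < n ? j : 0;
    }
}

/* Returns 1 if all three forms leave the array in the same state.  */
static int
collapse_forms_agree (int m, int n)
{
  assert (m <= ROWS && n <= COLS);
  int x[ROWS][COLS], y[ROWS][COLS], z[ROWS][COLS];
  memset (x, 0, sizeof x);
  memset (y, 0, sizeof y);
  memset (z, 0, sizeof z);
  nested (x, m, n);
  collapsed_divmod (y, m, n);
  collapsed_incremental (z, m, n);
  return memcmp (x, y, sizeof x) == 0 && memcmp (x, z, sizeof x) == 0;
}
```

The incremental form trades the division and modulo for two conditional
moves and an add per iteration, which is the point made above.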

Yeah.  Note that this still makes the IVs not analyzable, since i now
effectively wraps in the inner loop.  For some special values we might
get away with a wrapping CHREC in a bit-precision type, but we cannot
represent wrapping at some arbitrary (possibly non-constant) value.
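
To illustrate the wrapping, here is a small C sketch.  It assumes the usual
reading of an affine CHREC {base, +, step} as base + step * k after k
iterations, and it tracks the inner index of the collapsed loop (j in the
quoted dump).  The helper names are hypothetical, and the n == 256 case is
only one example of a "special value" where a narrow unsigned type happens
to wrap at the right place.

```c
#include <stdint.h>

/* Value of an affine IV {base, +, step} after k iterations.  */
static long
affine_iv (long base, long step, long k)
{
  return base + step * k;
}

/* Value of the inner index after k iterations of the collapsed loop body:
   it steps by one and is reset to 0 whenever it reaches n.  */
static int
collapsed_j (int n, long k)
{
  int j = 0;
  for (long it = 0; it < k; it++)
    {
      j = j + 1;
      j = j < n ? j : 0;
    }
  return j;
}

/* First iteration count at which the collapsed index stops matching
   {0, +, 1}, or -1 if they agree for all k in [0, limit].  */
static long
first_divergence (int n, long limit)
{
  for (long k = 0; k <= limit; k++)
    if (collapsed_j (n, k) != affine_iv (0, 1, k))
      return k;
  return -1;
}

/* Special case n == 256: an 8-bit unsigned counter wraps on its own, so
   {0, +, 1} evaluated in that type matches the collapsed index.  */
static int
matches_uint8_wrap (long limit)
{
  uint8_t w = 0;
  for (long k = 0; k <= limit; k++)
    {
      if (collapsed_j (256, k) != w)
        return 0;
      w = (uint8_t) (w + 1);
    }
  return 1;
}
```

For a general n the index leaves the affine sequence exactly when it first
reaches n, and no fixed-width type wraps at a run-time value of n.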

So - collapsing loops is a bad idea.  Why is that done anyway?

Richard.

>
>         Jakub
