On Fri, Oct 12, 2018 at 9:52 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Fri, Oct 12, 2018 at 07:35:09PM +0200, Thomas Schwinge wrote:
> >     int a[NJ][NI];
> >
> >     #pragma acc loop collapse(2)
> >     for (int j = 0; j < N_J; ++j)
> >       for (int i = 0; i < N_I; ++i)
> >         a[j][i] = 0;
>
> For e.g.
> int a[128][128];
>
> void
> foo (int m, int n)
> {
> #pragma omp for simd collapse(2)
>   for (int i = 0; i < m; i++)
>     for (int j = 0; j < n; j++)
>       a[i][j]++;
> }
> we emit in the inner loop:
>   <bb 8> :
>   i = i.0;
>   j = j.1;
>   _1 = a[i][j];
>   _2 = _1 + 1;
>   a[i][j] = _2;
>   .iter.4 = .iter.4 + 1;
>   j.1 = j.1 + 1;
>   D.2912 = j.1 < n.7 ? 0 : 1;
>   i.0 = D.2912 + i.0;
>   j.1 = j.1 < n.7 ? j.1 : 0;
>
>   <bb 9> :
>   if (.iter.4 < D.2902)
>     goto <bb 8>; [87.50%]
>   else
>     goto <bb 10>; [12.50%]
> to make it more vectorization friendly (though, in this particular case it
> isn't vectorized either) and not do the expensive % and / operations inside
> of the inner loop.  Without -fopenmp it does vectorize only the inner loop,
> there is no collapse.
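
The update sequence in the dump quoted above can be mimicked in plain C. What follows is only a sketch for illustration: the function names `collapsed_divmod` and `collapsed_incremental` and the `DIM` macro are hypothetical, and neither function is GCC's actual lowering code. The first recovers (i, j) from the logical iteration counter with / and %, the second follows the carry-and-reset pattern of the GIMPLE.

```c
#include <assert.h>

#define DIM 128

/* Naive collapse: recover (i, j) from the single logical iteration
   counter with a division and a modulo on every iteration.  */
void
collapsed_divmod (int a[DIM][DIM], int m, int n)
{
  assert (m >= 0 && m <= DIM && n >= 0 && n <= DIM);
  long total = (long) m * n;
  for (long iter = 0; iter < total; iter++)
    {
      int i = (int) (iter / n);
      int j = (int) (iter % n);
      a[i][j]++;
    }
}

/* Incremental form, modelled on the dump: bump j, and when it reaches n
   carry one into i and reset j to 0.  No / or % inside the loop.  */
void
collapsed_incremental (int a[DIM][DIM], int m, int n)
{
  assert (m >= 0 && m <= DIM && n >= 0 && n <= DIM);
  long total = (long) m * n;
  int i = 0, j = 0;
  for (long iter = 0; iter < total; iter++)
    {
      a[i][j]++;
      j = j + 1;
      int carry = j < n ? 0 : 1;   /* plays the role of D.2912 */
      i = carry + i;
      j = j < n ? j : 0;
    }
}
```

Both functions touch the same elements in the same order; the second trades the division and modulo for two conditional selects per iteration.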

Yeah.  Note this still makes the IVs not analyzable since i now
effectively becomes wrapping in the inner loop.  For some special
values we might get away with a wrapping CHREC in a bit-precision
type but we cannot represent wrapping at some (possibly non-constant)
value.

So - collapsing loops is a bad idea.  Why's that done anyways?

Richard.

>
>         Jakub
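
To make the analyzability point concrete, here is a small sketch that records the values i and j take on each logical iteration of the collapsed loop. The names `trace_collapsed` and `has_constant_step` are hypothetical, and the constant-step check is only a crude stand-in for "representable as an affine recurrence {base, +, step}"; it is not how GCC's scalar evolution analysis works.

```c
#include <assert.h>

/* Record i and j for each logical iteration of the collapsed loop,
   using the same carry-and-reset update as the dump.  Writes at most
   cap entries and returns the number written.  */
int
trace_collapsed (int m, int n, int *i_trace, int *j_trace, int cap)
{
  assert (m >= 0 && n >= 0 && cap >= 0);
  int i = 0, j = 0, iter = 0;
  int total = m * n;
  for (; iter < total && iter < cap; iter++)
    {
      i_trace[iter] = i;
      j_trace[iter] = j;
      j = j + 1;
      i = (j < n ? 0 : 1) + i;
      j = j < n ? j : 0;
    }
  return iter;
}

/* Return 1 if consecutive values differ by one constant step, as a
   simple affine recurrence would; 0 otherwise.  */
int
has_constant_step (const int *v, int len)
{
  for (int k = 2; k < len; k++)
    if (v[k] - v[k - 1] != v[1] - v[0])
      return 0;
  return 1;
}
```

With m = 3 and n = 4 the j trace is 0 1 2 3 0 1 2 3 0 1 2 3 and the i trace is 0 0 0 0 1 1 1 1 2 2 2 2, so neither advances by a fixed step across the collapsed iteration space. In the uncollapsed nest, by contrast, j steps by 1 and i is invariant within the inner loop.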