On Fri, Oct 12, 2018 at 07:35:09PM +0200, Thomas Schwinge wrote:
>     int a[NJ][NI];
>
>     #pragma acc loop collapse(2)
>     for (int j = 0; j < N_J; ++j)
>       for (int i = 0; i < N_I; ++i)
>         a[j][i] = 0;

For e.g.
int a[128][128];
void
foo (int m, int n)
{
  #pragma omp for simd collapse(2)
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++)
      a[i][j]++;
}
we emit in the inner loop:
  <bb 8> :
  i = i.0;
  j = j.1;
  _1 = a[i][j];
  _2 = _1 + 1;
  a[i][j] = _2;
  .iter.4 = .iter.4 + 1;
  j.1 = j.1 + 1;
  D.2912 = j.1 < n.7 ? 0 : 1;
  i.0 = D.2912 + i.0;
  j.1 = j.1 < n.7 ? j.1 : 0;

  <bb 9> :
  if (.iter.4 < D.2902)
    goto <bb 8>; [87.50%]
  else
    goto <bb 10>; [12.50%]
to make it more vectorization friendly (though, in this particular case it
isn't vectorized either) and not do the expensive % and / operations inside
of the inner loop.  Without -fopenmp it does vectorize only the inner loop,
there is no collapse.

	Jakub
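
To illustrate the point about keeping % and / out of the inner loop, here is
a rough hand-written C sketch of the two ways a collapse(2) nest can be
flattened into a single loop.  It is not compiler output: the names
foo_divmod, foo_incremental, ROWS and COLS are made up for this example, and
the second function only mirrors the shape of the dump above (bump j, carry
into i with a conditional, reset j with a conditional).

```c
#include <assert.h>

#define ROWS 128   /* hypothetical bounds, matching int a[128][128] */
#define COLS 128

static int a[ROWS][COLS];

/* Flattened loop that recovers (i, j) from the flat iteration number
   with a division and a modulo on every iteration.  */
void
foo_divmod (int m, int n)
{
  assert (m <= ROWS && n <= COLS);
  if (m <= 0 || n <= 0)
    return;
  long count = (long) m * n;
  for (long iter = 0; iter < count; iter++)
    {
      int i = (int) (iter / n);
      int j = (int) (iter % n);
      a[i][j]++;
    }
}

/* Flattened loop that keeps i and j as running values and updates them
   with an increment plus two conditional selects, the same shape as
   the statements in <bb 8> of the dump.  */
void
foo_incremental (int m, int n)
{
  assert (m <= ROWS && n <= COLS);
  if (m <= 0 || n <= 0)
    return;
  long count = (long) m * n;
  int i = 0, j = 0;
  for (long iter = 0; iter < count; iter++)
    {
      a[i][j]++;
      j = j + 1;
      i += (j < n) ? 0 : 1;   /* corresponds to D.2912 in the dump */
      j = (j < n) ? j : 0;    /* wrap j once the inner bound is hit */
    }
}
```

Both functions touch exactly the elements a[0..m-1][0..n-1] once per call;
the difference is only in how the indices are derived from the single
iteration counter.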