On Mon, Oct 15, 2018 at 10:55:26AM +0200, Richard Biener wrote:
> Yeah.  Note this still makes the IVs not analyzable since i now effectively
> becomes wrapping in the inner loop.  For some special values we might
> get away with a wrapping CHREC in a bit-precision type but we cannot
> represent wrapping at some (possibly non-constant) value.
>
> So - collapsing loops is a bad idea.  Why's that done anyways?
Because both standards (OpenMP and OpenACC) mandate it when collapse(2)
or higher is used.  The semantics are that the specified number of nested
loops form a single, larger iteration space, which is then handled
according to the rules of the particular construct.  Sometimes that can be
very beneficial, sometimes less so, but with OpenMP the user at least has
the option to say what they want.  They can e.g. do:

  #pragma omp distribute
  for (int i = 0; i < M; i++)
    #pragma omp parallel for
    for (int j = 0; j < N; j++)
      #pragma omp simd
      for (int k = 0; k < O; k++)
        do_something (i, j, k);

and that way distribute the outermost loop, parallelize the middle one and
vectorize the innermost one, or they can do:

  #pragma omp distribute parallel for simd collapse (3)
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < O; k++)
        do_something (i, j, k);

and let the implementation split the M x N x O iteration space itself (or
use clauses to say how exactly that is done).  Say, if O is very large and
N small and there are many cores, it might be more beneficial to
parallelize more, etc.

If we come up with some way to help the vectorizer with the collapsed
loop, whether in the form of some loop flags, or internal fns, whatever,
I'm all for it.

	Jakub