On Mon, Oct 15, 2018 at 10:55:26AM +0200, Richard Biener wrote:
> Yeah.  Note this still makes the IVs not analyzable since i now effectively
> becomes wrapping in the inner loop.  For some special values we might
> get away with a wrapping CHREC in a bit-precision type but we cannot
> represent wrapping at some (possibly non-constant) value.
> 
> So - collapsing loops is a bad idea.  Why's that done anyways?

Because the standards (both OpenMP and OpenACC) mandate it if one uses
collapse(2) or higher.  The semantics is that that many nested loops form a
larger iteration space which is then handled according to the rules of the
particular construct.  Sometimes it can be very beneficial, sometimes less
so, but e.g. with OpenMP the user has the option to say what they want.
They can e.g. do:
  #pragma omp distribute
  for (int i = 0; i < M; i++)
    #pragma omp parallel for
    for (int j = 0; j < N; j++)
      #pragma omp simd
      for (int k = 0; k < O; k++)
        do_something (i, j, k);
and that way distribute the outermost loop, parallelize the middle one and
vectorize the innermost one, or they can do:
  #pragma omp distribute parallel for simd collapse (3)
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < O; k++)
        do_something (i, j, k);
and let the implementation split the M x N x O iteration space itself (or
use clauses to say how exactly that is done).  Say if O is very large, N is
small and there are many cores, it might be more beneficial to parallelize
across more of the combined iteration space, etc.
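To illustrate why this trips up the IV analysis, here is a hand-written
sketch of roughly what the collapse (3) lowering amounts to (this is not
the code GCC actually emits, and lb/ub just stand for whatever chunk of
the linear iteration space a particular thread ends up with):
  /* Hypothetical sketch: the three loops are linearized into a single
     M x N x O space and i, j, k are recovered by division and modulo,
     which is exactly why the original variables no longer look like
     simple inner-loop IVs.  [lb, ub) is this thread's chunk of the
     [0, M * N * O) space.  */
  for (long idx = lb; idx < ub; idx++)
    {
      int i = idx / ((long) N * O);
      int j = (idx / O) % N;
      int k = idx % O;
      do_something (i, j, k);
    }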
If we come up with some way to help the vectorizer with the collapsed loop,
whether in the form of some loop flags, or internal fns, whatever, I'm all
for it.

        Jakub
