Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

Jakub Jelinek Mon, 15 Oct 2018 02:46:57 -0700

On Mon, Oct 15, 2018 at 11:30:56AM +0200, Richard Biener wrote:
> But isn't _actual_ collapsing an implementation detail?


No, it is required by the OpenMP standard, and in many cases it is very much
observable.
#pragma omp parallel for schedule(nonmonotonic: static, 23) collapse (2)
for (int i = 0; i < 64; i++)
  for (int j = 0; j < 16; j++)
    a[i][j] = omp_get_thread_num ();
The standard says that from the logical iteration space 64 x 16, the
first 23 iterations go to the first thread (i.e. i=0, j=0..15 and i=1,
j=0..6), then the next 23 iterations go to the second thread, etc.
In other constructs, e.g. the new loop construct, collapsing is a request to
distribute, parallelize and vectorize as much as possible, with an optional
guarantee of no cross-iteration dependencies at all; but even in that case
using the source loops might not always be a win, e.g. the loop nest could be
5 loops deep and the iteration space might be diagonal or otherwise not
exactly rectangular.

> That is, can we delay the actual collapsing until after vectorization
> for example?

No.  We can come up with some way to propagate some of the original info to
the vectorizer if it helps (or teach the vectorizer to recognize whatever we
produce), but the mandatory transformation needs to be done right away,
before later optimizations make it impossible.

        Jakub

Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?