On Mon, Oct 15, 2018 at 11:45 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Mon, Oct 15, 2018 at 11:30:56AM +0200, Richard Biener wrote:
> > But isn't _actual_ collapsing an implementation detail?
>
> No, it is required by the standard and in many cases it is very much
> observable.
> #pragma omp parallel for schedule(nonmonotonic: static, 23) collapse (2)
> for (int i = 0; i < 64; i++)
>   for (int j = 0; j < 16; j++)
>     a[i][j] = omp_get_thread_num ();
> The standard says that from the logical iteration space 64 x 16,
> first 23 iterations go to the first thread (i.e. i=0, j=0..15 and i=1,
> j=0..6), then 23 iterations go to the second thread, etc.
> In other constructs, e.g. the new loop construct, it is a request to
> distribute, parallelize and vectorize as much as possible with an optional
> guarantee of no cross-iteration dependencies at all, but even in that case
> using the source loops might not always be a win, e.g. the loop nest could
> be 5 loops deep and the iteration space might be diagonal or otherwise not
> exactly rectangular.

But then you could do

  for (int i = si1; i < n1; i++)
    for (int j = sj1; j < m1; j++)
      {
        a[i][j] = omp_get_thread_num ();
      }
  if (m_tail1)
    for (int j = 0; j < m_tail1; j++)
      ...

with appropriate start/end for the i/j loop and the "epilogue" loop?
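
To make that concrete, here is a rough sketch of the split for one chunk
of the 64 x 16 example: the chunk [lo, hi) of the logical iteration space
becomes a head (rest of the first row), a rectangular body of complete
rows, and a tail/epilogue (start of the last row).  The names walk_chunk,
walk_chunk_linear, N_I, N_J and tag are made up for illustration, and
this is not what the OMP expansion currently emits.

```c
#include <assert.h>

#define N_I 64   /* outer trip count from the example above */
#define N_J 16   /* inner trip count from the example above */

/* Store TAG into a[i][j] for logical iterations [lo, hi) of the collapsed
   N_I x N_J space, using a rectangular nest plus head and tail loops.
   (Hypothetical helper, for illustration only.)  */
static void
walk_chunk (int a[N_I][N_J], long lo, long hi, int tag)
{
  assert (0 <= lo && lo <= hi && hi <= (long) N_I * N_J);
  if (lo == hi)
    return;

  int i_first = (int) (lo / N_J), j_first = (int) (lo % N_J);
  int i_last = (int) ((hi - 1) / N_J), j_end = (int) ((hi - 1) % N_J) + 1;

  if (i_first == i_last)
    {
      /* The whole chunk lies within a single row.  */
      for (int j = j_first; j < j_end; j++)
        a[i_first][j] = tag;
      return;
    }

  /* Head: remainder of the first row.  */
  for (int j = j_first; j < N_J; j++)
    a[i_first][j] = tag;

  /* Body: complete rows, an ordinary rectangular nest.  */
  for (int i = i_first + 1; i < i_last; i++)
    for (int j = 0; j < N_J; j++)
      a[i][j] = tag;

  /* Tail ("epilogue"): start of the last row.  */
  for (int j = 0; j < j_end; j++)
    a[i_last][j] = tag;
}

/* Reference: the same chunk walked through the linearized index.  */
static void
walk_chunk_linear (int a[N_I][N_J], long lo, long hi, int tag)
{
  for (long k = lo; k < hi; k++)
    a[k / N_J][k % N_J] = tag;
}
```

For the first chunk, walk_chunk (a, 0, 23, tag) covers i=0, j=0..15 and
i=1, j=0..6, the same elements as the linearized walk.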

> > That is, can we delay the actual collapsing until after vectorization
> > for example?
>
> No.  We can come up with some way to propagate some of the original info to
> the vectorizer if it helps (or teach vectorizer to recognize whatever we
> produce), but the mandatory transformation needs to be done
> immediately, before optimizations make it impossible.

The issue is that with refs like

  a[i % m] = a[(i + 1) % m];

you do not know whether you have a backward or a forward dependence,
so I do not see how you could perform loop vectorization.  That implies
that one option might be to have the OMP lowering unroll & interleave
loops when asked for SIMD so that the SLP vectorizer could pick up
things?
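
One way to see the concern, as a small experiment rather than anything
the vectorizer does: emulate a naive 4-lane vector step (all loads of a
group before any of its stores) and compare it with the scalar loop.
VF, run_scalar, run_vector_like and same_result are hypothetical names,
and the sizes are arbitrary.  When m is larger than the trip count every
read is ahead of the corresponding write and both variants agree; when m
is small the wrap-around turns that into a read of an element written a
few iterations earlier, and the results can differ.

```c
#include <assert.h>

#define VF 4   /* hypothetical vector width in lanes */

/* Scalar semantics of the loop body a[i % m] = a[(i + 1) % m].  */
static void
run_scalar (int *a, int m, int n)
{
  for (int i = 0; i < n; i++)
    a[i % m] = a[(i + 1) % m];
}

/* Emulates naive vectorization: each group of VF iterations performs all
   of its loads before any of its stores.  n must be a multiple of VF.  */
static void
run_vector_like (int *a, int m, int n)
{
  assert (n % VF == 0);
  for (int i = 0; i < n; i += VF)
    {
      int tmp[VF];
      for (int l = 0; l < VF; l++)
        tmp[l] = a[(i + l + 1) % m];
      for (int l = 0; l < VF; l++)
        a[(i + l) % m] = tmp[l];
    }
}

/* Returns 1 if both variants leave the same contents in an array
   initialized to 0, 1, ..., m - 1, and 0 otherwise.  */
static int
same_result (int m, int n)
{
  int s[64], v[64];
  assert (0 < m && m <= 64);
  for (int k = 0; k < m; k++)
    s[k] = v[k] = k;
  run_scalar (s, m, n);
  run_vector_like (v, m, n);
  for (int k = 0; k < m; k++)
    if (s[k] != v[k])
      return 0;
  return 1;
}
```

With these definitions same_result (64, 32) reports agreement while
same_result (2, 4) does not, and m is not known at compile time.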

But then how is safelen() defined in the context of collapse()?

Richard.

>         Jakub
