On Mon, Oct 15, 2018 at 11:45 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Mon, Oct 15, 2018 at 11:30:56AM +0200, Richard Biener wrote:
> > But isn't _actual_ collapsing an implementation detail?
>
> No, it is required by the standard and in many cases it is very much
> observable.
> #pragma omp parallel for schedule(nonmonotonic: static, 23) collapse (2)
> for (int i = 0; i < 64; i++)
>   for (int j = 0; j < 16; j++)
>     a[i][j] = omp_get_thread_num ();
> The standard says that from the logical iteration space 64 x 16,
> first 23 iterations go to the first thread (i.e. i=0, j=0..15 and i=1,
> j=0..6), then 23 iterations go to the second thread, etc.
> In other constructs, e.g. the new loop construct, it is a request to
> distribute, parallelize and vectorize as much as possible with optional
> guarantee of no cross-iteration dependencies at all, but even in that case
> using the source loops might not be always a win, e.g. the loopnest could be
> 5 loops and the iteration space might be diagonal or other not exactly
> rectangular.
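
To make the quoted mapping concrete, here is a minimal sketch of how a logical iteration of the collapsed 64 x 16 space maps to a thread under schedule(static, 23). The helper names logical_iter and owner_thread are hypothetical (they are not OpenMP runtime calls), and the team size used in the usage note is an assumption. The sketch relies only on the row-major numbering of collapsed iterations and on static chunks being handed out round-robin in thread-number order.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not part of any OpenMP API. */

/* Logical iteration number of (i, j) in a collapse(2) nest whose inner
   loop runs inner_trip times (row-major linearization).  */
static long logical_iter(long i, long j, long inner_trip)
{
  assert(inner_trip > 0 && i >= 0 && j >= 0 && j < inner_trip);
  return i * inner_trip + j;
}

/* Thread owning a logical iteration under schedule(static, chunk):
   chunks are assigned round-robin in thread-number order.  */
static int owner_thread(long logical, long chunk, int nthreads)
{
  assert(logical >= 0 && chunk > 0 && nthreads > 0);
  return (int) ((logical / chunk) % nthreads);
}
```

With an assumed team of 4 threads, owner_thread (logical_iter (1, 6, 16), 23, 4) is 0 and owner_thread (logical_iter (1, 7, 16), 23, 4) is 1, so the first chunk ends in the middle of the i=1 row.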

But then you could do

  for (int i = si1; i < n1; i++)
    for (int j = sj1; j < m1; j++)
      {
        a[i][j] = omp_get_thread_num ();
      }
  if (m_tail1)
    for (int j = 0; j < m_tail1; j++)
      ...

with appropriate start/end for the i/j loop and the "epilogue" loop?

> > That is, can we delay the actual collapsing until after vectorization
> > for example?
>
> No.  We can come up with some way to propagate some of the original info to
> the vectorizer if it helps (or teach vectorizer to recognize whatever we
> produce), but the mandatory transformation needs to be done
> immediately before optimizations make those impossible.

The issue is that with refs like

  a[i % m] = a[(i + 1) % m];

you do not know whether you have a backwards or forward dependence
so I do not see how you could perform loop vectorization.  That implies
that one option might be to have the OMP lowering unroll & interleave
loops when asked for SIMD so that the SLP vectorizer could pick up
things?  But then how is safelen() defined in the context of collapse()?

Richard.

> Jakub
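
As a rough sketch of the split loop nest suggested earlier in this message: one chunk [start, end) of logical iterations can be run as a head partial row, a rectangular block of full rows, and a tail partial row. The function run_chunk and its tag parameter are hypothetical, N and M are taken from the 64 x 16 example, and the explicit head loop is an assumption about what "appropriate start/end" would have to cover when a chunk begins mid-row. This is not what GCC's OMP lowering emits.

```c
#include <assert.h>

enum { N = 64, M = 16 };  /* trip counts from the example in this thread */

/* Hypothetical helper: store TAG into every element covered by logical
   iterations [start, end) of the collapsed N x M space, using only
   rectangular loops over the original i/j indices.  */
static void run_chunk(int a[N][M], long start, long end, int tag)
{
  assert(0 <= start && start <= end && end <= (long) N * M);
  if (start == end)
    return;

  long i = start / M, j = start % M;
  long i_end = end / M, m_tail = end % M;

  if (i == i_end) {
    /* The whole chunk lies inside a single row.  */
    for (; j < m_tail; j++)
      a[i][j] = tag;
    return;
  }

  /* Head: finish the row the chunk starts in, if it starts mid-row.  */
  if (j != 0) {
    for (; j < M; j++)
      a[i][j] = tag;
    i++;
  }

  /* Body: full rows, the part that keeps the original loop shape.  */
  for (; i < i_end; i++)
    for (long jj = 0; jj < M; jj++)
      a[i][jj] = tag;

  /* Tail ("epilogue"): leading part of the last, partial row.  */
  for (long jj = 0; jj < m_tail; jj++)
    a[i_end][jj] = tag;
}
```

For example, run_chunk (a, 23, 46, 2) writes a[1][7..15] in the head loop and a[2][0..13] in the tail loop, with no full rows in between.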