On Mon, Apr 27, 2015 at 7:06 PM, Jeff Law <l...@redhat.com> wrote:
> On 04/27/2015 10:12 AM, Alan Lawrence wrote:
>>
>> After copyrename3, immediately prior to dom1, the loop body looks like:
>>
>>   <bb 2>:
>>
>>   <bb 3>:
>>   # i_11 = PHI <i_9(7), 0(2)>
>>   _5 = a[i_11];
>>   _6 = i_11 & _5;
>>   if (_6 != 0)
>>     goto <bb 4>;
>>   else
>>     goto <bb 5>;
>>
>>   <bb 4>:
>>
>>   <bb 5>:
>>   # m_2 = PHI <5(4), 4(3)>
>>   _7 = m_2 * _5;
>>   b[i_11] = _7;
>>   i_9 = i_11 + 1;
>>   if (i_9 != 32)
>>     goto <bb 7>;
>>   else
>>     goto <bb 6>;
>>
>>   <bb 7>:
>>   goto <bb 3>;
>>
>>   <bb 6>:
>>   return;
>>
>> dom1 then peeled part of the first loop iteration, producing:
>
> Yup.  The jump threading code realized that if we traverse the edge 2->3,
> then we'll always traverse 3->5.  The net effect is like peeling the first
> iteration because we copy bb3.  The original will be reached via 7->3 (ie,
> the loop iterating), the copy via 2->3', and 3' will have its conditional
> removed and will unconditionally transfer control to bb5.
>
> This is a known problem, but we really don't have any good heuristics for
> when to do this vs. when not to do it.
>
>> In contrast, a slightly-different testcase:
>>
>> #define N 32
>>
>> int a[N];
>> int b[N];
>>
>> int foo ()
>> {
>>   for (int i = 0; i < N; i++)
>>     {
>>       int cond = (a[i] & i) ? -1 : 0;  // extra variable here
>>       int m = (cond) ? 5 : 4;
>>       b[i] = a[i] * m;
>>     }
>> }
>>
>> after copyrename3, just before dom1, is only slightly different:
>>
>>   <bb 2>:
>>
>>   <bb 3>:
>>   # i_15 = PHI <i_10(7), 0(2)>
>>   _6 = a[i_15];
>>   _7 = i_15 & _6;
>>   if (_7 != 0)
>>     goto <bb 4>;
>>   else
>>     goto <bb 6>;
>>
>>   <bb 4>:
>>   # m_3 = PHI <4(6), 5(3)>
>>   _8 = m_3 * _6;
>>   b[i_15] = _8;
>>   i_10 = i_15 + 1;
>>   if (i_10 != 32)
>>     goto <bb 7>;
>>   else
>>     goto <bb 5>;
>>
>>   <bb 7>:
>>   goto <bb 3>;
>>
>>   <bb 5>:
>>   return;
>>
>>   <bb 6>:
>>   goto <bb 4>;
>>
>> with bb6 being out-of-line at the end of the function, rather than bb4
>> falling through from just above bb5.
>> However, this prevents dom1 from doing the partial peeling, and dom1
>> only changes the "goto bb7" into a "goto bb3":
>
> I would have still expected it to thread 2->3, 3->6->4.
>
>> (1) dom1 should really, in the second case, perform the same partial
>> peeling that it does in the first testcase, if that is what it thinks is
>> desirable.  (Of course, we might want to fix that only later, as ATM
>> that'd take us backwards.)
>
> Please file a BZ.  It could be something simple, or we might be hitting
> one of Zdenek's heuristics around keeping the overall loop structure.
>
>> Alternatively, maybe we don't want dom1 doing that sort of thing (?),
>> but I'm inclined to think that if it's doing such optimizations, it's
>> for a good reason ;)  I guess there'll be other times where we *cannot*
>> do partial peeling of later iterations...
>
> It's an open question -- we never reached any kind of conclusion when it
> was last discussed with Zdenek.  I think the fundamental issue is that we
> can't really predict when threading the loop is going to interfere with
> later optimizations.  The heuristics we have are marginal at best.
>
> The one thought we've never explored was re-rolling that first iteration
> back into the loop in the vectorizer.
Well.  In this case we hit

  /* If one of the loop header's edge is an exit edge then do not apply
     if-conversion.  */
  FOR_EACH_EDGE (e, ei, loop->header->succs)
    if (loop_exit_edge_p (loop, e))
      return false;

which is simply because even after if-conversion we'll at least end up
with a non-empty latch block, which is what the vectorizer doesn't
support.  DOM rotated the loop into this non-canonical form.  Running
loop header copying again would probably undo this.

Richard.

>
> Jeff