Re: dom1 prevents vectorization via partial loop peeling?

Jeff Law Mon, 27 Apr 2015 10:07:12 -0700

On 04/27/2015 10:12 AM, Alan Lawrence wrote:


After copyrename3, immediately prior to dom1, the loop body looks like:

   <bb 2>:

   <bb 3>:
   # i_11 = PHI <i_9(7), 0(2)>
   _5 = a[i_11];
   _6 = i_11 & _5;
   if (_6 != 0)
     goto <bb 4>;
   else
     goto <bb 5>;

   <bb 4>:

   <bb 5>:
   # m_2 = PHI <5(4), 4(3)>
   _7 = m_2 * _5;
   b[i_11] = _7;
   i_9 = i_11 + 1;
   if (i_9 != 32)
     goto <bb 7>;
   else
     goto <bb 6>;

   <bb 7>:
   goto <bb 3>;

   <bb 6>:
   return;

dom1 then peeled part of the first loop iteration, producing:

Yup. The jump threading code realized that if we traverse the edge2->3, then we'll always traverse 3->5. The net effect is like peelingthe first iteration because we copy bb3. The original will be reachedvia 7->3 (it, loop iterating), the copy via 2->3' and 3' will have itsconditional removed and will unconditionally transfer control to bb5.

This is a known problem, but we really don't have any good heuristicsfor when to do this vs when not to do it.


In contrast, a slightly-different testcase:

#define N 32

int a[N];
int b[N];

int foo ()
{
   for (int i = 0; i < N ; i++)
     {
       int cond = (a[i] & i) ? -1 : 0; // extra variable here
       int m = (cond) ? 5 : 4;
       b[i] = a[i] * m;
     }
}

after copyrename3, just before dom1, is only slightly different:

   <bb 2>:

   <bb 3>:
   # i_15 = PHI <i_10(7), 0(2)>
   _6 = a[i_15];
   _7 = i_15 & _6;
   if (_7 != 0)
     goto <bb 4>;
   else
     goto <bb 6>;

   <bb 4>:
   # m_3 = PHI <4(6), 5(3)>
   _8 = m_3 * _6;
   b[i_15] = _8;
   i_10 = i_15 + 1;
   if (i_10 != 32)
     goto <bb 7>;
   else
     goto <bb 5>;

   <bb 7>:
   goto <bb 3>;

   <bb 5>:
   return;

   <bb 6>:
   goto <bb 4>;

with bb6 being out-of-line at the end of the function, rather than bb4
falling through from just above bb5. However, this prevents dom1 from
doing the partial peeling, and dom1 only changes the "goto bb7" into a
"goto bb3":

I would have still expected it to thread 2->3, 3->6->4


(1) dom1 should really, in the second case, perform the same partial
peeling that it does in the first testcase, if that is what it thinks is
desirable. (Of course, we might want to fix that only later, as ATM
that'd take us backwards).

Please a file a BZ. It could be something simple, or we might behitting one of Zdenek's heuristics around keeping overall loop structure.


Alternatively, maybe we don't want dom1 doing that sort of thing (?),
but I'm inclined to think that if it's doing such optimizations, it's
for a good reason ;) I guess there'll be other times where we *cannot*
do partially peeling of later iterations...

It's an open question -- we never reached any kind of conclusion when itwas last discussed with Zdenek. I think the fundamental issue is wecan't really predict when threading the loop is going to interfere withlater optimizations or not. The heuristics we have are marginal at best.

The one thought we've never explored was re-rolling that first iterationback into the loop in the vectorizer.


Jeff

Re: dom1 prevents vectorization via partial loop peeling?

Reply via email to