https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119351

--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Sorry for the delay, had a few days off.

So looking at this again, it's happening When next_ci gets inlined into
nbnxn_make_pairlist_part, the while loop

while (next_ci(iGrid, nth, ci_block, &ci_x, &ci_y, &ci_b, &ci))

becomes vectorizable with early break.

The code is quite a deep nesting of C++ templated function so it's hard to map
it back to source code.

the ifcvt loop that's getting vectorized is:

  <bb 384> [local count: 9554422]:
  _1643 = ci_y_1924 + 1;
  if (_1643 == _1649)
    goto <bb 385>; [5.50%]
  else
    goto <bb 458>; [94.50%]

  <bb 458> [local count: 9028929]:
  goto <bb 387>; [100.00%]

  <bb 385> [local count: 525493]:
  _1646 = ci_x_1920 + 1;

  <bb 386> [local count: 1081571]:
  # ci_x_1920 = PHI <ci_x_1919(382), _1646(385)>
  # ci_y_1923 = PHI <ci_y_1922(382), 0(385)>
  _1650 = _1649 * ci_x_1920;

  <bb 387> [local count: 10110500]:
  # ci_y_1924 = PHI <_1643(458), ci_y_1923(386)>
  _1651 = _1650 + ci_y_1924;
  _1652 = _1651 + 1;
  _1653 = (long unsigned int) _1652;
  _1655 = _1653 * 4;
  _1656 = _1654 + _1655;
  _1657 = *_1656;
  if (_1657 <= ci_1918)
    goto <bb 384>; [94.50%]
  else
    goto <bb 388>; [5.50%]

Which doesn't look like it should have been vectorized. BB 384 has a
non-exiting control flow.
BB 385 and 386 should only conditionally be executed if _1643 != 1649

But they're not an exit, The generated code:

     504:       05a03984        mov     z4.s, w12
     508:       25ac0fef        whilelo p15.s, wzr, w12
     50c:       0b01018e        add     w14, w12, w1
     510:       04a40785        sub     z5.s, z28.s, z4.s
     514:       25ae0fe7        whilelo p7.s, wzr, w14
     518:       cb2c4019        sub     x25, x0, w12, uxtw
     51c:       04a600a7        add     z7.s, z5.s, z6.s
     520:       25075fe0        not     p0.b, p7/z, p15.b
     524:       8b190bcd        add     x13, x30, x25, lsl #2
     528:       14000005        b       53c 
     52c:       91001210        add     x16, x16, #0x4
     530:       25a0c087        add     z7.s, z7.s, #4
     534:       25ae0e00        whilelo p0.s, w16, w14
     538:       54000480        b.eq    5c8  // b.none
     53c:       a55041ab        ld1w    {z11.s}, p0/z, [x13, x16, lsl #2]
     540:       248b8181        cmpge   p1.s, p0/z, z12.s, z11.s
     544:       250e7a22        not     p2.b, p14/z, p1.b
     548:       2550f840        ptest   p14, p2.b
     54c:       54ffff00        b.eq    52c // b.none
     550:       1e2600f1        fmov    w17, s7

Is really broken since it only preserved the exit check. the check for 

  if (_1643 == _1649)
    goto <bb 385>; [5.50%]
  else
    goto <bb 458>; [94.50%]

seems to have disappeared

and indeed:

pairlist.cpp:2590:5: note:   ==> examining statement: if (_1643 == _1649)
pairlist.cpp:2590:5: note:   skip.
pairlist.cpp:2590:5: note:   ==> examining pattern def stmt: patt_3554 = _1643
== _1649;
pairlist.cpp:2590:5: note:   skip.
pairlist.cpp:2590:5: note:   ==> examining pattern statement: if (patt_3554 !=
0)
pairlist.cpp:2590:5: note:   skip.

the compiler knows it can't vectorize this. so why didn't we stop earlier.

This should have been checked in vect_analyze_loop_form

at:

  /* Check if we have any control flow that doesn't leave the loop.  */
  basic_block *bbs = get_loop_body (loop);
  for (unsigned i = 0; i < loop->num_nodes; i++)
    if (EDGE_COUNT (bbs[i]->succs) != 1
        && (EDGE_COUNT (bbs[i]->succs) != 2
            || !loop_exits_from_bb_p (bbs[i]->loop_father, bbs[i])))
      {
        free (bbs);
        return opt_result::failure_at (vect_location,
                                       "not vectorized:"
                                       " unsupported control flow in loop.\n");
      }

looks like this doesn't recognize this diamond layout.

Indeed it thinks 384 -> 385 is an exit..

pairlist.cpp:2590:5: note:   using as main loop exit: 384 -> 385 [AUX: (nil)]
pairlist.cpp:2590:5: note:    === get_loop_niters ===
pairlist.cpp:2590:5: note:    Loop has 2 exits.
pairlist.cpp:2590:5: note:    Analyzing exit 0...
pairlist.cpp:2590:5: note:    Analyzing exit 1...

huh, interestingly it doesn't consider 385 as part of the loop..

iterations by profile: 8.347976 (unreliable, maybe flat) entry count:1081571
(estimated locally, freq 3.3915))
{
  bb_384 (preds = {bb_387 }, succs = {bb_385 bb_458 })
  bb_458 (preds = {bb_384 }, succs = {bb_387 })
  bb_387 (preds = {bb_458 bb_386 }, succs = {bb_384 bb_388 })
}

But it clearly is..

;;   basic block 385, loop depth 2, count 525493 (estimated locally, freq
1.6478), maybe hot
;;    prev block 458, next block 386, flags: (NEW, REACHABLE, VISITED)
;;    pred:       384 [5.5% (guessed)]  count:525493 (estimated locally, freq
1.6478) (TRUE_VALUE,EXECUTABLE)
  # RANGE [irange] int [1, +INF]
  _1646 = ci_x_1920 + 1;
;;    succ:       386 [always]  count:525493 (estimated locally, freq 1.6478)
(FALLTHRU,DFS_BACK,EXECUTABLE)

;;   basic block 386, loop depth 2, count 1081571 (estimated locally, freq
3.3915), maybe hot
;;    prev block 385, next block 387, flags: (NEW, REACHABLE, VISITED)
;;    pred:       382 [always]  count:556078 (estimated locally, freq 1.7437)
(FALLTHRU,EXECUTABLE)
;;                385 [always]  count:525493 (estimated locally, freq 1.6478)
(FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_1725 = PHI <.MEM_171(382), .MEM_1725(385)>
  # RANGE [irange] int [0, +INF]
  # ci_x_1920 = PHI <ci_x_1919(382), _1646(385)>
  # RANGE [irange] int [0, +INF]
  # ci_y_1923 = PHI <ci_y_1922(382), 0(385)>
  _1650 = _1649 * ci_x_1920;
;;    succ:       387 [always]  count:1081571 (estimated locally, freq 3.3915)
(FALLTHRU,EXECUTABLE)

Oh I see..

BB 385 is at a loop depth of 2, but BB 384 is at a loop depth of 3.

So it does leave the loop, but then re-enters it into the header, skipping the
pre-header.

The exit has a phi that's within a loop iteration set by another loop:

;;   basic block 387, loop depth 3, count 10110500 (estimated locally, freq
31.7034), maybe hot
;;    prev block 386, next block 388, flags: (NEW, REACHABLE, VISITED)
;;    pred:       458 [always]  count:9028929 (estimated locally, freq 28.3119)
(FALLTHRU,DFS_BACK,EXECUTABLE)
;;                386 [always]  count:1081571 (estimated locally, freq 3.3915)
(FALLTHRU,EXECUTABLE)
  # RANGE [irange] int [0, +INF]

so this can't really be vectorized with early exits.

So we need something like this

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bb1138bfcfb..05ae7beb1cc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1883,6 +1883,12 @@ vect_analyze_loop_form (class loop *loop, gimple
*loop_vectorized_call,
        return opt_result::failure_at (vect_location,
                                       "not vectorized:"
                                       " abnormal loop exit edge.\n");
+
+      for (auto pred : e->src->preds)
+       if (pred->src->loop_father != loop)
+         return opt_result::failure_at (vect_location,
+                                      "not vectorized:"
+                                      " abnormal loop incoming edge.\n");
     }

   info->conds

but should also be ok if one of the preds is the loop pre-header (so the patch
above is too strict).
This weirdness is happening because next_ci itself is a loop, that gets inlined
into the while condition.

Working on a small reproducer and patch.

Reply via email to