https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117875

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
What remains is

 fast_algorithms.c:133:19: optimized: Loop 3 distributed: split to 3 loops and 0 library calls.
 fast_algorithms.c:133:19: optimized: Loop 5 distributed: split to 2 loops and 0 library calls.
 fast_algorithms.c:133:19: optimized: loop vectorized using 32 byte vectors
 fast_algorithms.c:133:19: optimized:  loop versioned for vectorization because of possible aliasing
 fast_algorithms.c:133:19: optimized: loop vectorized using 32 byte vectors
 fast_algorithms.c:133:19: optimized:  loop versioned for vectorization because of possible aliasing
-fast_algorithms.c:133:19: optimized: loop vectorized using 32 byte vectors
-fast_algorithms.c:133:19: optimized:  loop versioned for vectorization because of possible aliasing

specifically

fast_algorithms.c:133:19: note:   using as main loop exit: 58 -> 63 [AUX: (nil)]
fast_algorithms.c:133:19: note:  LOOP VECTORIZED

is no longer vectorized.  We have

fast_algorithms.c:133:19: note:   Build SLP for _ifc__350 = _1750;
fast_algorithms.c:133:19: missed:   Build SLP failed: operation unsupported _ifc__350 = _1750;

which came up before.  This is emitted from if-conversion:

   # DEBUG BEGIN_STMT
-  if (_1359 < -987654321)
-    goto <bb 62>; [50.00%]
-  else
-    goto <bb 61>; [50.00%]
-
-  <bb 59> [local count: 489894279]:
-  goto <bb 58>; [100.00%]
-
-  <bb 60> [local count: 550443010]:
+  _227 = _1359 < -987654321;
+  # DEBUG BEGIN_STMT
+  _ifc__347 = _227 ? -987654321 : _1355;
+  _1750 = MAX_EXPR <_1359, -987654321>;
+  _ifc__350 = _1750;
+  *_1347 = _ifc__350;
   # DEBUG BEGIN_STMT
   k_1361 = k_1291 + 1;
   # DEBUG k => k_1361
@@ -2619,14 +2683,77 @@
   else
     goto <bb 63>; [11.00%]

-  <bb 61> [local count: 374301246]:
-  *_1347 = _1359;
-  goto <bb 60>; [100.00%]
+  <bb 59> [local count: 489894279]:

specifically from VN run on the body:

Value numbering stmt = _ifc__350 = _1287 ? _ifc__348 : _ifc__349;
Applying pattern match.pd:6569, gimple-match-3.cc:47593
Setting value number of _ifc__350 to _ifc__350 (changed)
Applying pattern match.pd:5271, gimple-match-4.cc:7879
Applying pattern match.pd:6365, gimple-match-5.cc:6177
Applying pattern match.pd:6569, gimple-match-3.cc:47593
gimple_simplified to _1750 = MAX_EXPR <_1359, -987654321>;
_ifc__350 = _1750;
Making available beyond BB58 _ifc__350 for value _ifc__350

where

  (cond (cmp:c (nop_convert1?@c0 @0) (nop_convert2?@c1 @1))
        (convert3? @0) (convert4? @1))

is simplified as

       (convert (max @c0 @c1)))))

and the conversion turns out to be unnecessary.  There's a long-standing
issue with how match produces this; we generate

                              {
                                res_op->set_op (NOP_EXPR, type, 1);
                                {
                                  tree _o1[2], _r1;
                                  _o1[0] = captures[0];
                                  _o1[1] = captures[2];
                                  gimple_match_op tem_op (res_op->cond.any_else (), MAX_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
                                  tem_op.resimplify (lseq, valueize);
                                  _r1 = maybe_push_res_to_seq (&tem_op, lseq);
                                  if (!_r1) goto next_after_fail1387;
                                  res_op->ops[0] = _r1;
                                }
                                res_op->resimplify (lseq, valueize);

so we push the inner expression without considering the outer conversion.
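
For a source-level picture of the statements in the dump above, here is a
minimal C sketch.  It is not the actual fast_algorithms.c code; the function
names, the FLOOR macro and the separate dst/src arrays are made up for
illustration.  The first loop is the branchy clamp, the second mirrors what
if-conversion plus simplification leaves behind: a MAX followed by a plain
copy whose only job is to carry the (here no-op) conversion.

```c
#include <assert.h>
#include <stddef.h>

#define FLOOR (-987654321)  /* the constant that appears in the dump */

/* Branchy form: store FLOOR when the value is below it, else the value.  */
static void clamp_branch (int *dst, const int *src, size_t n)
{
  for (size_t k = 0; k < n; k++)
    {
      if (src[k] < FLOOR)
        dst[k] = FLOOR;
      else
        dst[k] = src[k];
    }
}

/* Shape after if-conversion and simplification: MAX plus a trivial copy.  */
static void clamp_max_copy (int *dst, const int *src, size_t n)
{
  for (size_t k = 0; k < n; k++)
    {
      int max = src[k] > FLOOR ? src[k] : FLOOR;  /* _1750 = MAX_EXPR <...> */
      int ifc = max;                              /* _ifc__350 = _1750 */
      dst[k] = ifc;                               /* *_1347 = _ifc__350 */
    }
}

/* Check that both forms agree on values around the clamp boundary.  */
static void self_check (void)
{
  const int in[] = { -2000000000, FLOOR - 1, FLOOR, FLOOR + 1, 0, 42 };
  int a[6], b[6];
  clamp_branch (a, in, 6);
  clamp_max_copy (b, in, 6);
  for (size_t i = 0; i < 6; i++)
    assert (a[i] == b[i]);
}
```

The copy in the second loop is semantically redundant, which is why it
seems reasonable for SLP build to just look through it.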

The other thing is that VN elimination folds the stmts we substitute into,
but even in non-iterating mode we do not value-number all of the resulting
stmts; we merely assign value numbers to the resulting defs.

So those copies survive.  I had thought we should fix this somewhere other
than by allowing SSA copies in SLP build, but this shows that allowing them
will be the easiest fix, given that non-SLP handles this as a vector
operation just fine.

So I'll go for this, filing separate issues for the two above.
