12 Regression] Different results with -fpeel-loops -ftree-loop-vectorize options

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 28 Jul 2021 02:05:34 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu.org

--- Comment #33 from Richard Biener <rguenth at gcc dot gnu.org> ---
Issues with multiple_of_p:

    case MINUS_EXPR:
      /* It is impossible to prove if op0 - op1 is multiple of bottom
         precisely, so be conservative here checking if both op0 and op1
         are multiple of bottom.  Note we check the second operand first
         since it's usually simpler.  */
      return (multiple_of_p (type, TREE_OPERAND (top, 1), bottom)
              && multiple_of_p (type, TREE_OPERAND (top, 0), bottom));

wrong for unsigned type and 0 - 3 with bottom == 3, -3u is not a multiple of 3.

    case PLUS_EXPR:
      /* The same as MINUS_EXPR, but handle cases like op0 + 0xfffffffd
         as op0 - 3 if the expression has unsigned type.  For example,
         (X / 3) + 0xfffffffd is multiple of 3, but 0xfffffffd is not.  */
      op1 = TREE_OPERAND (top, 1);
      if (TYPE_UNSIGNED (type)
          && TREE_CODE (op1) == INTEGER_CST && tree_int_cst_sign_bit (op1))
        op1 = fold_build1 (NEGATE_EXPR, type, op1);
      return (multiple_of_p (type, op1, bottom)
              && multiple_of_p (type, TREE_OPERAND (top, 0), bottom));

it probably means (X * 3) + 0xfffffffd but then for X == 0 this is wrong
again?

Of course it might hint at that multiple_of_p wants to interpret everything
as signed, thus compute whether (signed)top % (signed)bottom is zero and
maybe this is why we pass in a custom type rather than using the type of
'top'?

But then we have

    case INTEGER_CST:
      if (TREE_CODE (bottom) != INTEGER_CST
          || integer_zerop (bottom)
          || (TYPE_UNSIGNED (type)
              && (tree_int_cst_sgn (top) < 0
                  || tree_int_cst_sgn (bottom) < 0)))
        return 0;
      return wi::multiple_of_p (wi::to_widest (top), wi::to_widest (bottom),
                                SIGNED);

that doesn't agree with such interpretation.

Doing such re-interpretation might help avoid regressions for queries
on {TYPE,DECL}_SIZE top which are generally unsigned but any symbolic
computation is not expected to wrap.  We'd then pass in ssizetype
but make sure to adjust that accordingly when facing non-sign-changing
conversion?

fold_binary does

      /* Fold (X * Y) & -(1 << CST) to X * Y if Y is a constant
         multiple of 1 << CST.  */
      if (TREE_CODE (arg1) == INTEGER_CST)
        {
          wi::tree_to_wide_ref cst1 = wi::to_wide (arg1);
          wide_int ncst1 = -cst1;
          if ((cst1 & ncst1) == ncst1
              && multiple_of_p (type, arg0,
                                wide_int_to_tree (TREE_TYPE (arg1), ncst1)))
            return fold_convert_loc (loc, type, arg0);
        }

which passes in the original type of the AND but then arg0 has
nop-conversions stripped which means it can have different sign.

      /* If arg0 is a multiple of arg1, then rewrite to the fastest div
         operation, EXACT_DIV_EXPR.

         Note that only CEIL_DIV_EXPR and FLOOR_DIV_EXPR are rewritten now.
         At one time others generated faster code, it's not clear if they do
         after the last round to changes to the DIV code in expmed.c.  */
      if ((code == CEIL_DIV_EXPR || code == FLOOR_DIV_EXPR)
          && multiple_of_p (type, arg0, arg1))
        return fold_build2_loc (loc, EXACT_DIV_EXPR, type,
                                fold_convert (type, arg0),
                                fold_convert (type, arg1));

happily does the wrong thing for (unsigned * 3) / 3.  The round_{up,down}_loc
helpers have similar correctness issues.

The user I'm most worried about is stor-layout.c:place_field which
uses multiple_of_p to optimize DECL_OFFSET_ALIGN, where losing that
might have severe impact on Ada.

For each individual operation - apart from checking TYPE_OVERFLOW_UNDEFINED -
we can compute a value-range for its operands and then determine whether the
operation itself can overflow.  With ranger that should eventually be
caching, but I'm not sure about the GENERIC part where this operation would
be quadratic when we recurse down the tree.  In general that wouldn't help
the stor-layout.c case since it's looking at a series of unsigned computes
in the cases we care about.  It looks like for the purpose of sizes
we'd want sth like a EXACT_MULT_EXPR (denoting no overflow), likewise a
EXACT_PLUS_EXPR.  OTOH I remember offsets can be effectively negative in
Ada but they are still unsigned?

[Bug tree-optimization/100499] [9/10/11/12 Regression] Different results with -fpeel-loops -ftree-loop-vectorize options

Reply via email to