Re: Multiply Optimization in match and Simplify

Richard Biener Fri, 30 Oct 2015 03:08:21 -0700

On Thu, Oct 29, 2015 at 5:34 AM, Hurugalawadi, Naveen
<naveen.hurugalaw...@caviumnetworks.com> wrote:
> Hi,
>
> Please find attached the patch that moves some multiply optimizations
> from fold-const using simplify and match.
>
> Please review the patch and let me know if any modifications are required.
>
> Tested the patch on X86.
>
> Observing following failures:-
>
>>> FAIL: gcc.dg/fold-plusmult.c scan-tree-dump-times original "<a> \\* 4" 2
>
> Should the testcase be changed to suit current pattern?
>
>>> FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times vrp2 " & 1;" 0
>>> FAIL: gcc.dg/tree-ssa/vrp59.c scan-tree-dump-not vrp1 " & 3;"
>
> Its due to the following pattern. Pattern seems to be right.
> Fold X & (X ^ Y) as X & ~Y
> The test PASSes when we have the result as ~X & Y in some
> of the following combinations which is wrong.
> (bit_and (convert? (bit_xor:c @0 @1) (convert? @0) ))


Please do not drop A - B -> A + (-B) from fold-const as match.pd
doesn't implement all of fold-const.c negate_expr_p support.

+/* Fold X & (X ^ Y) as X & ~Y.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1)))
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (convert (bit_and @0 (bit_not @1)))))

Ok, so the VRP regression is because we convert

  _8 = x_2(D) ^ 1;
  _4 = (_Bool) _8;
  _5 = (int) _4;

to

  _7 = ~x_2(D);
  _9 = _7 & 1;

via the new pattern and

    /* A truncation to an unsigned type (a zero-extension) should be
       canonicalized as bitwise and of a mask.  */
    (if (final_int && inter_int && inside_int
         && final_prec == inside_prec
         && final_prec > inter_prec
         && inter_unsignedp)
     (convert (bit_and @0 { wide_int_to_tree
                              (inside_type,
                               wi::mask (inter_prec, false,
                                         TYPE_PRECISION (inside_type))); })))

Previously VRP ended up with

  <bb 3>:
  _8 = x_2(D) ^ 1;

and now we have

  _7 = ~x_2(D);
  _9 = _7 & 1;

which is more expensive.  This means that we miss a

(bit_and (bit_not @0) INTEGER_CST@1)

-> (bit_xor @0 @1)

pattern that applies when VRP knows the range of x_2(D)
(all masked bits are know to zero).

+/* Convert X * -C into -X * C.  */
+(simplify
+ (mult:c (convert? negate_expr_p@0) INTEGER_CST@1)
+  (if (tree_int_cst_sgn (@1) == -1)
+   (with { tree tem = const_unop (NEGATE_EXPR, type, @1); }
+    (if (!TREE_OVERFLOW (tem) && wi::ne_p (tem, @1)
+         && tree_nop_conversion_p (type, TREE_TYPE (@0)))
+     (mult (convert (negate @0)) @1)))))

as said above match.pd negate_expr_p doesn't capture everything
fold-const.c does so moving the above isn't a good idea.

+/* Convert (A + A) * C -> A * 2 * C.  */
+(simplify
+ (mult:c (convert? (plus @0 @0)) (convert? @1))
+  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
+(simplify
+ (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
+  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))

fold-const.c only handles constant C, so we only need to 2nd pattern.
Also the :c on the mult in that is not needed due to canonicalization rules.
Please build the result of the inner multiplication directly.
I think the fold-const.c code has an overflow issue when the outer
multiplication
is signed and the inner addition unsigned.  (signed)((unsigned)INT_MAX
+ (unsigned)INT_MAX)*2
is valid but INT_MAX * 4 is not as it overflows.  So I think we should
_not_ allow
nop conversions here (it's fine if all ops are signed or unsigned).

Richard.



> Thanks,
> Naveen
>
> ChangeLog
>
> 2015-10-29  Naveen H.S  <naveen.hurugalaw...@caviumnetworks.com>
>
>         * fold-const.c (fold_binary_loc) : Remove A - B -> A + (-B) if B
>         is easily negatable as its already present.
>         Move x * -C into -x * C if x is easily negatable to match.pd.
>         Move (A + A) * C -> A * 2 * C to match.pd.
>         Move Fold (X ^ Y) & Y as ~X & Y to match.pd.
>         Move Fold (X ^ Y) & X as ~Y & X to match.pd.
>         Move Fold X & (X ^ Y) as X & ~Y to match.pd.
>         Move Fold X & (Y ^ X) as ~Y & X to match.pd.
>
>         * match.pd (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1))):
>         New simplifier.
>         (mult:c (convert? negate_expr_p@0) INTEGER_CST@1): New simplifier.
>         (mult:c (convert? (plus @0 @0)) (convert? @1)): New simplifier.
>         (mult:c (convert? (plus @0 @0)) INTEGER_CST@1): New simplifier.

Re: Multiply Optimization in match and Simplify

Reply via email to