Re: [PATCH] Final value replacement improvements for until-wrap loops.

Richard Biener via Gcc-patches Tue, 30 Nov 2021 02:00:29 -0800

On Mon, Nov 29, 2021 at 10:07 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This middle-end patch is inspired by Richard Biener's until-wrap
> loop example in PR tree-optimization/101145.
>
> unsigned foo(unsigned val, unsigned start)
> {
>   unsigned cnt = 0;
>   for (unsigned i = start; i > val; ++i)
>     cnt++;
>   return cnt;
> }
>
> For this loop, the tree optimizers currently generate:
>
> unsigned int foo (unsigned int val, unsigned int start)
> {
>   unsigned int cnt;
>   unsigned int _1;
>   unsigned int _5;
>
>   <bb 2> [local count: 118111600]:
>   if (start_3(D) > val_4(D))
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 4>; [11.00%]
>
>   <bb 3> [local count: 105119324]:
>   _1 = start_3(D) + 1;
>   _5 = -start_3(D);
>   cnt_2 = _1 > val_4(D) ? _5 : 1;
>
>   <bb 4> [local count: 118111600]:
>   # cnt_11 = PHI <cnt_2(3), 0(2)>
>   return cnt_11;
> }
>
> or perhaps slightly easier to read:
>
>   if (start > val) {
>     cnt = (start+1) > val ? -start : 1;
>   } else cnt = 0;
>
> In this snippet, if we know start > val, then (start+1) > val
> unless start+1 overflows, i.e. (start+1) == 0 and start == ~0.
> We can use this (loop header) context to simplify the ternary
> expression to "(start != -1) ? -start : 1", which with a little
> help from match.pd can be folded to -start.  Hence the optimal
> final value replacement should be:
>
>   cnt = (start > val) ? -start : 0;
>
> Or as now generated by this patch:
>
> unsigned int foo (unsigned int val, unsigned int start)
> {
>   unsigned int cnt;
>
>   <bb 2> [local count: 118111600]:
>   if (start_3(D) > val_4(D))
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 4>; [11.00%]
>
>   <bb 3> [local count: 105119324]:
>   cnt_2 = -start_3(D);
>
>   <bb 4> [local count: 118111600]:
>   # cnt_11 = PHI <cnt_2(3), 0(2)>
>   return cnt_11;
> }
>
>
> We can also improve until-wrap loops that don't have a (suitable) loop
> header, as determined by simplify_using_initial_conditions.
>
> unsigned bar(unsigned val, unsigned start)
> {
>   unsigned cnt = 0;
>   unsigned i = start;
>   do {
>     cnt++;
>     i++;
>   } while (i > val);
>   return cnt;
> }
>
> which is currently optimized to:
>
> unsigned int foo (unsigned int val, unsigned int start)
> {
>   unsigned int cnt;
>   unsigned int _9;
>   unsigned int _10;
>
>   <bb 2> [local count: 118111600]:
>   _9 = start_4(D) + 1;
>   _10 = -start_4(D);
>   cnt_3 = val_7(D) < _9 ? _10 : 1;
>   return cnt_3;
> }
>
> Here we have "val < (start+1) ? -start : 1", which again with the
> help of match.pd can be slightly simplified to "val <= start ? -start : 1"
> when dealing with unsigned types, because at the complicating value where
> start == ~0, we fortunately have -start == 1, hence it doesn't matter
> whether the second or third operand of the ternary operator is returned.
>
> To summarize, this patch (in addition to tweaking may_be_zero in
> number_of_iterations_until_wrap) adds three new constant folding
> transforms to match.pd.
>
> X != C1 ? -X : C2 simplifies to -X when -C1 == C2.
> which is the generalized form of the simplification above.
>
> X != C1 ? ~X : C2 simplifies to ~X when ~C1 == C2.
> which is the BIT_NOT_EXPR analog of the NEGATE_EXPR case.
>
> and the "until-wrap final value replacement without context":
>
> (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
> X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?


+/* X != C1 ? -X : C2 simplifies to -X when -C1 == C2.  */
+(simplify
+ (cond (ne @0 INTEGER_CST@1) (negate@3 @0) INTEGER_CST@2)
+ (if ((!wi::only_sign_bit_p (wi::to_wide (@1))
+       || TYPE_UNSIGNED (type)
+       || TYPE_OVERFLOW_WRAPS (type)
+       || (!TYPE_OVERFLOW_SANITIZED (type)
+          && !TYPE_OVERFLOW_TRAPS (type)
+          && !TYPE_SATURATING (type)))

I'm wondering about TYPE_UNSIGNED && TYPE_SATURATING.
Also I think unsigned cannot trap or be sanitized and with
TYPE_OVERFLOW_WRAPS sanitizing or trapping cannot
be true either.  Also unsigned types always wrap on overflow.  So maybe

   (if (!TYPE_SATURATING (type)
        && (TYPE_OVERFLOW_WRAPS (type)
               || !wi::only_sign_bit_p (..)))

?

+      && wi::eq_p (wi::neg (wi::to_wide (@1)), wi::to_wide (@2)))
+  @3))

+/* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
+   X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
+(simplify
+ (cond (gt (plus @0 integer_onep) @1) (negate @0) integer_onep@2)
+ (if (TYPE_UNSIGNED (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0)))

I think the second test is redundant since @0 participates in both
the comparison and the condition value.

+  (cond (ge @0 @1) (negate @0) @2)))

Otherwise looks OK to me.

Thanks,
Richard.

>
> 2021-11-29  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
>         Check if simplify_using_initial_conditions allows us to
>         simplify the expression for may_be_zero.
>         * match.pd (X != C ? -X : -C -> -X): New transform.
>         (X != C ? ~X : ~C -> ~X): Likewise.
>         ((X+1) > Y ? -X : 1 -> X >= Y ? -X : 1): Likewise.
>
> gcc/testsuite/ChangeLog
>         * gcc.dg/fold-condneg-1.c: New test case.
>         * gcc.dg/fold-condneg-2.c: New test case.
>         * gcc.dg/fold-condnot-1.c: New test case.
>         * gcc.dg/pr101145-1.c: New test case.
>         * gcc.dg/pr101145-2.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>

Re: [PATCH] Final value replacement improvements for until-wrap loops.

Reply via email to