On Wed, Jan 5, 2022 at 7:15 AM Zhao Wei Liew via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> match.pd/95424: Simplify 1 / X for integer X
>
> This patch implements an optimization for the following C++ code:
>
> int f(int x) {
>     return 1 / x;
> }
>
> int f(unsigned int x) {
>     return 1 / x;
> }
>
> Before this patch, x86-64 gcc -std=c++20 -O3 produces the following
> assembly:
>
> f(int):
>     xor edx, edx
>     mov eax, 1
>     idiv edi
>     ret
> f(unsigned int):
>     xor edx, edx
>     mov eax, 1
>     div edi
>     ret
>
> In comparison, clang++ -std=c++20 -O3 produces the following assembly:
>
> f(int):
>     lea ecx, [rdi + 1]
>     xor eax, eax
>     cmp ecx, 3
>     cmovb eax, edi
>     ret
> f(unsigned int):
>     xor eax, eax
>     cmp edi, 1
>     sete al
>     ret
>
> Clang's output is more efficient as it avoids expensive div operations.
>
> With this patch, GCC now produces the following assembly:
> f(int):
>     lea eax, [rdi + 1]
>     cmp eax, 2
>     mov eax, 0
>     cmovbe eax, edi
>     ret
> f(unsigned int):
>     xor eax, eax
>     cmp edi, 1
>     sete al
>     ret
>
> which is virtualy identical to Clang's assembly output. Any slight
> differences
> in the output for f(int) is related to another possible missed optimization.
>
> gcc/ChangeLog:
>
>         * match.pd: Simplify 1 / X where X is an integer.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/divide-6.c: New test.
>         * gcc.dg/tree-ssa/divide-7.c: New test.
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 84c9b918041..5edae1818bb 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -422,7 +422,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (div:C @0 (negate @0))
>     (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>   && TYPE_OVERFLOW_UNDEFINED (type))
> -    { build_minus_one_cst (type); })))
> +    { build_minus_one_cst (type); }))
> +
> + /* 1 / X -> X == 1 for unsigned integer X
> +    1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X
> +    But not for 1 / 0 so that we can get proper warnings and errors. */
> + (simplify
> +   (div integer_onep@0 @1)
> +   (switch
> +     (if (!integer_zerop (@1)
> +          && INTEGRAL_TYPE_P (TREE_TYPE (@1))

you can use INTEGRAL_TYPE_P (type), the type of @0, @1 and the
result ('type') are the same.

> +          && TYPE_UNSIGNED (TREE_TYPE (@1)))
> +      (eq @0 @1))
> +     (if (!integer_zerop (@1)
> +          && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +          && !TYPE_UNSIGNED (TREE_TYPE (@1)))

Please refactor as

    (if (INTEGRAL_TYPE_P (type)
         && !integer_zerop (@1))
     (if (TYPE_UNSIGNED (....))
      (eq @0 @1)
      (cond (...

> +      (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); })
> +                     (le @1 { build_one_cst (integer_type_node); }))

You should use build_[minus_]one_cst (type) to get -1/1 of the correct
type here.

X >= -1 && X <= 1 is (hopefully?) going to be simplified
as  (unsigned)X + 1 <= 2, it might be good to simplify it this way
here as well?

Finally I'm not sure whether 1 /[ceil] 2 isn't 1, similar -1 /[floor]
2 should be -1
so the transform looks to be correct only for TRUNC_DIV_EXPR, not all 'div'?

Thanks,
Richard.

> +        @1 { build_zero_cst (type); })))))
>
>  /* For unsigned integral types, FLOOR_DIV_EXPR is the same as
>     TRUNC_DIV_EXPR.  Rewrite into the latter in this case.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
> b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
> new file mode 100644
> index 00000000000..a9fc4c04058
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +unsigned int f(unsigned int x) {
> +  return 1 / x;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */
> +/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
> b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
> new file mode 100644
> index 00000000000..285279af7c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +int f(int x) {
> +  return 1 / x;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */
> +/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */

Reply via email to