On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew <zhaoweil...@gmail.com> wrote: > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > here as well? > > Yup, GCC does simplify it that way in the end, so I didn't really bother to > simplify it here. That said, I'm open to simplifying it here as well, but I'm > not sure how to do the unsigned cast.
You'd do sth like (with { tree utype = unsigned_type_for (type); } (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) { build_int_cst (utype, 2); }) ...) extra tricky will be 1 bit integer types, I guess it might be easiest to exclude them and special case them separately - X / Y is always X for them I think, for signed 1 bit types X / Y is always undefined when X is not 0 (the division overflows or is by zero). Not sure if worth though ;) (might appear when bools end up divided by bools ...) > Besides that, thanks for the rest of your suggestions! I'm testing the > changes and will post a v2 soon. > > On Wed, 5 Jan 2022 at 16:18, Richard Biener <richard.guent...@gmail.com> > wrote: > > > > On Wed, Jan 5, 2022 at 7:15 AM Zhao Wei Liew via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > match.pd/95424: Simplify 1 / X for integer X > > > > > > This patch implements an optimization for the following C++ code: > > > > > > int f(int x) { > > > return 1 / x; > > > } > > > > > > int f(unsigned int x) { > > > return 1 / x; > > > } > > > > > > Before this patch, x86-64 gcc -std=c++20 -O3 produces the following > > > assembly: > > > > > > f(int): > > > xor edx, edx > > > mov eax, 1 > > > idiv edi > > > ret > > > f(unsigned int): > > > xor edx, edx > > > mov eax, 1 > > > div edi > > > ret > > > > > > In comparison, clang++ -std=c++20 -O3 produces the following assembly: > > > > > > f(int): > > > lea ecx, [rdi + 1] > > > xor eax, eax > > > cmp ecx, 3 > > > cmovb eax, edi > > > ret > > > f(unsigned int): > > > xor eax, eax > > > cmp edi, 1 > > > sete al > > > ret > > > > > > Clang's output is more efficient as it avoids expensive div operations. > > > > > > With this patch, GCC now produces the following assembly: > > > f(int): > > > lea eax, [rdi + 1] > > > cmp eax, 2 > > > mov eax, 0 > > > cmovbe eax, edi > > > ret > > > f(unsigned int): > > > xor eax, eax > > > cmp edi, 1 > > > sete al > > > ret > > > > > > which is virtualy identical to Clang's assembly output. Any slight > > > differences > > > in the output for f(int) is related to another possible missed > > > optimization. > > > > > > gcc/ChangeLog: > > > > > > * match.pd: Simplify 1 / X where X is an integer. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/tree-ssa/divide-6.c: New test. > > > * gcc.dg/tree-ssa/divide-7.c: New test. > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > index 84c9b918041..5edae1818bb 100644 > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -422,7 +422,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > (div:C @0 (negate @0)) > > > (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) > > > && TYPE_OVERFLOW_UNDEFINED (type)) > > > - { build_minus_one_cst (type); }))) > > > + { build_minus_one_cst (type); })) > > > + > > > + /* 1 / X -> X == 1 for unsigned integer X > > > + 1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X > > > + But not for 1 / 0 so that we can get proper warnings and errors. */ > > > + (simplify > > > + (div integer_onep@0 @1) > > > + (switch > > > + (if (!integer_zerop (@1) > > > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > > > > you can use INTEGRAL_TYPE_P (type), the type of @0, @1 and the > > result ('type') are the same. > > > > > + && TYPE_UNSIGNED (TREE_TYPE (@1))) > > > + (eq @0 @1)) > > > + (if (!integer_zerop (@1) > > > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > > > + && !TYPE_UNSIGNED (TREE_TYPE (@1))) > > > > Please refactor as > > > > (if (INTEGRAL_TYPE_P (type) > > && !integer_zerop (@1)) > > (if (TYPE_UNSIGNED (....)) > > (eq @0 @1) > > (cond (... > > > > > + (cond (bit_and (ge @1 { build_minus_one_cst (integer_type_node); }) > > > + (le @1 { build_one_cst (integer_type_node); })) > > > > You should use build_[minus_]one_cst (type) to get -1/1 of the correct > > type here. > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > here as well? > > > > Finally I'm not sure whether 1 /[ceil] 2 isn't 1, similar -1 /[floor] > > 2 should be -1 > > so the transform looks to be correct only for TRUNC_DIV_EXPR, not all 'div'? > > > > Thanks, > > Richard. > > > > > + @1 { build_zero_cst (type); }))))) > > > > > > /* For unsigned integral types, FLOOR_DIV_EXPR is the same as > > > TRUNC_DIV_EXPR. Rewrite into the latter in this case. */ > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > > b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > > new file mode 100644 > > > index 00000000000..a9fc4c04058 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c > > > @@ -0,0 +1,9 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O -fdump-tree-optimized" } */ > > > + > > > +unsigned int f(unsigned int x) { > > > + return 1 / x; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */ > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > > b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > > new file mode 100644 > > > index 00000000000..285279af7c2 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c > > > @@ -0,0 +1,9 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O -fdump-tree-optimized" } */ > > > + > > > +int f(int x) { > > > + return 1 / x; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */