Minor refinement of the iteration non-overflow check, plus a testcase, for stage 1.

This "subtract/extend/add" sequence has existed for a long time and still
annoys us (PR37451, part of PR61837) when converting from 32 bits to 64 bits,
because the ctr register is 64 bits wide on powerpc64. Andrew Pinski had a
patch, but it caused some issues and was reverted by Joseph S. Myers
(PR37451, PR37782).

Andrew:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html

Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

We can simplify "subtract/extend/add" to a single extend when the number of
loop iterations is known to be less than MODE_MAX-1 (do NOT simplify when
counter+0x1 would overflow).

Bootstrapped and regression tested on Power8-LE.

gcc/ChangeLog

    2020-05-12  Xiong Hu Luo  <luo...@linux.ibm.com>

    PR rtl-optimization/37451, part of PR target/61837
    * loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
    to zero_ext when not wrapping overflow.

gcc/testsuite/ChangeLog

    2020-05-12  Xiong Hu Luo  <luo...@linux.ibm.com>

    PR rtl-optimization/37451, part of PR target/61837
    * gcc.target/powerpc/doloop-2.c: New test.
---
 gcc/loop-doloop.c                           | 46 ++++++++++++++++++++-
 gcc/testsuite/gcc.target/powerpc/doloop-2.c | 14 +++++++
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/doloop-2.c

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..16372382a22 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,51 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
+
+         73: r145:SI=r123:DI#0-0x1
+         74: r144:DI=zero_extend (r145:SI)
+         75: r143:DI=r144:DI+0x1
+         ...
+         31: r135:CC=cmp (r123:DI,0)
+         72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
+         scratch;clobber scratch;}
+
+         r123:DI#0-0x1 is param count derived from loop->niter_expr equal to the
+         loop iterations, if loop iterations expression doesn't overflow, then
+         (zero_extend (r123:DI#0-1))+1 could be simplified to zero_extend only.
+       */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (GET_CODE (count) == ZERO_EXTEND && GET_CODE (extop0) == PLUS)
+        {
+          rtx addop0 = XEXP (extop0, 0);
+          rtx addop1 = XEXP (extop0, 1);
+
+          int nonoverflow = 0;
+          unsigned int_mode
+            = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
+          unsigned HOST_WIDE_INT int_mode_max
+            = (HOST_WIDE_INT_1U << (int_mode - 1) << 1) - 1;
+          if (get_max_loop_iterations (loop, &iterations)
+              && wi::ltu_p (iterations, int_mode_max))
+            nonoverflow = 1;
+
+          if (nonoverflow
+              && CONST_SCALAR_INT_P (addop1)
+              && GET_MODE_PRECISION (mode) == int_mode * 2
+              && addop1 == GEN_INT (-1))
+            {
+              count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+                                          GET_MODE (addop0));
+              simplify_zext = true;
+            }
+        }
+
+      if (!simplify_zext)
+        count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
diff --git a/gcc/testsuite/gcc.target/powerpc/doloop-2.c b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
new file mode 100644
index 00000000000..dc8516bb0ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-O2 -fno-unroll-loops" } */
+
+int f(int l, int *a)
+{
+  int i;
+  for(i = 0;i < l; i++)
+    a[i] = i;
+  return l;
+}
+
+/* { dg-final { scan-assembler-not "-1" } } */
+/* { dg-final { scan-assembler "bdnz" } } */
+/* { dg-final { scan-assembler-times "mtctr" 1 } } */
-- 
2.21.0.777.g83232e3864
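
For readers who want to check the arithmetic behind the fold, here is a minimal standalone sketch. It is not part of the patch and uses no GCC internals. It models SImode as uint32_t and DImode as uint64_t, and the names count_unsimplified, count_simplified, fold_is_safe and check_fold_examples are hypothetical helpers made up for illustration. It shows that (zero_extend (n - 1)) + 1 equals zero_extend (n) except when the 32-bit subtraction wraps, which is the case the max-iterations guard is meant to exclude.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsimplified form: (plus:DI (zero_extend:DI (plus:SI n -1)) 1).
   uint32_t stands in for SImode, uint64_t for DImode (an assumption
   made for this illustration only).  */
static uint64_t
count_unsimplified (uint32_t n)
{
  uint32_t niter = n - 1u;        /* 32-bit subtract; wraps when n == 0.  */
  return (uint64_t) niter + 1u;   /* zero-extend, then 64-bit add.  */
}

/* Simplified form: (zero_extend:DI n).  */
static uint64_t
count_simplified (uint32_t n)
{
  return (uint64_t) n;
}

/* Hypothetical mirror of the guard in the patch: the fold is taken only
   when the maximum iteration count is strictly below the 32-bit maximum,
   so that adding 1 cannot overflow the narrow mode.  */
static bool
fold_is_safe (uint32_t max_niter)
{
  return max_niter < UINT32_MAX;
}

/* Self-check covering the ordinary cases and the wrapping case.  */
static void
check_fold_examples (void)
{
  /* Ordinary values: both forms agree.  */
  assert (count_unsimplified (1) == count_simplified (1));
  assert (count_unsimplified (UINT32_MAX) == count_simplified (UINT32_MAX));

  /* n == 0: n - 1 wraps to UINT32_MAX, so the two forms differ.  */
  assert (count_unsimplified (0) == 0x100000000ull);
  assert (count_simplified (0) == 0);

  /* The wrapped iteration count is rejected by the guard.  */
  assert (!fold_is_safe (0u - 1u));
}
```

Calling check_fold_examples () from a small main is enough to exercise it. The only input on which the two forms disagree is the one whose iteration count reaches the 32-bit maximum, so rejecting that value is sufficient in this model.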