Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > "Richard Earnshaw (lists)" <richard.earns...@arm.com> writes: > >> On 09/02/2021 16:27, Andrea Corallo via Gcc-patches wrote: >>> Jakub Jelinek <ja...@redhat.com> writes: >>> >>>> On Tue, Feb 09, 2021 at 03:09:43PM +0100, Jakub Jelinek via Gcc-patches >>>> wrote: >>>>>> "TARGET_32BIT && TARGET_HAVE_LOB" >>>>>> - "le\t%|lr, %l0") >>>>>> + "* >>>>>> + if (get_attr_length (insn) == 4) >>>>>> + return \"le\\t%|lr, %l0\"; >>>>>> + else >>>>>> + return \"subs\\t%|lr, #1\;bne\\t%l0\"; >>>>>> + " >>>>> >>>>> Why not >>>>> { >>>>> if (get_attr_length (insn) == 4) >>>>> return "le\t%|lr, %l0"; >>>>> else >>>>> return "subs\t%|lr, #1;bne\t%l0"; >>>>> } >>>>> instead? Seems the arm backend uses "*..." more than the more modern {}, >>>>> but one needs to backslash prefix a lot which makes it less readable? >>>> >>>> Where "more modern" is introduced 19.5 years ago ;) >>>> >>>> Jakub >>> >>> I guess we really like traditions :) >>> >>> Attached second version addressing this. >>> >>> Thanks >>> >>> Andrea >>> >> >> You're missing a clobber of the condition codes in the RTL. This wasn't >> needed before, but is now. >> >> R. > > Hi Richard, > > thanks for reviewing, I guess this is going to be a good learning moment > for me. > > What we originally expand is: > > (insn 2396 2360 2397 3 (parallel [ > (set (reg:CC_NZ 100 cc) > (compare:CC_NZ (plus:SI (reg:SI 14 lr) > (const_int -1 [0xffffffffffffffff])) > (const_int 0 [0]))) > (set (reg:SI 14 lr) > (plus:SI (reg:SI 14 lr) > (const_int -1 [0xffffffffffffffff]))) > ]) "p1.c":4:21 -1 > (nil)) > (jump_insn 2397 2396 2365 3 (set (pc) > (if_then_else (ne (reg:CC_NZ 100 cc) > (const_int 0 [0])) > (label_ref:SI 2361) > (pc))) "p1.c":4:21 273 {arm_cond_branch} > (expr_list:REG_DEAD (reg:CC_NZ 100 cc) > (int_list:REG_BR_PROB 1062895996 (nil))) > -> 2361) > > Combine recognizing cc:CC_NZ as a dead reg and rewriting the two insns > as: > > (jump_insn 2397 2396 2365 3 (parallel [ > (set (pc) > (if_then_else (ne (reg:SI 14 lr) > (const_int 1 [0x1])) > (label_ref:SI 2361) > (pc))) > (set (reg:SI 14 lr) > (plus:SI (reg:SI 14 lr) > (const_int -1 [0xffffffffffffffff]))) > ]) "p1.c":4:21 1047 {*doloop_end_internal} > (int_list:REG_BR_PROB 1062895996 (nil)) > -> 2361) > > I originally thought that because the write of reg:CC_NZ is explicit in > the first pattern we expand this was sufficient, but I now understand > I'm wrong and combine should produce a pattern still expressing this. > Now the question is how to do that. > > If I add the clobber '(clobber (reg:CC CC_REGNUM))' inside the parallel > of *doloop_end_internal as last element of the vector we ICE in > 'add_clobbers' called during combine, apparently 'add_clobbers' does not > handle the insn_code_number. > > If I add it as second element of the parallel combine is not combining > the two insns. > > If I place the clobber outside the parallel as a second element of the > insn vector combine is crashing in 'recog_for_combine_1'. > > So the question is probably: where should the clobber be positioned > canonically to have this working? > > Thanks! > > Andrea
Righ, I've been explained by a knowledgeable colleague that the 'parallel' is implicit in the 'define_insn' and there's no need to express it (interestgly this is confusing the code generating 'add_clobbers'). The attached patch rewrites the pattern as such and adds the missing clobber. Tests are running, okay for trunk when done with these? Regards Andrea
>From c822f9374f1c16a0f863ca521fd87a350b616db4 Mon Sep 17 00:00:00 2001 From: Andrea Corallo <andrea.cora...@arm.com> Date: Wed, 3 Feb 2021 15:21:54 +0100 Subject: [PATCH] arm: Low overhead loop handle long range branches [PR98931] gcc/ChangeLog 2021-02-05 Andrea Corallo <andrea.cora...@arm.com> * config/arm/thumb2.md (*doloop_end_internal): Generate alternative sequence to handle long range branches. gcc/testsuite/Changelog 2021-02-08 Andrea Corallo <andrea.cora...@arm.com> * gcc.target/arm/pr98931.c: New testcase. --- gcc/config/arm/thumb2.md | 28 ++++++++++++++++++-------- gcc/testsuite/gcc.target/arm/pr98931.c | 17 ++++++++++++++++ 2 files changed, 37 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/pr98931.c diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index bd53bf320de..320fe998075 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1711,15 +1711,27 @@ ;; Originally expanded by 'doloop_end'. (define_insn "*doloop_end_internal" - [(parallel [(set (pc) - (if_then_else - (ne (reg:SI LR_REGNUM) (const_int 1)) - (label_ref (match_operand 0 "" "")) - (pc))) - (set (reg:SI LR_REGNUM) - (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])] + [(set (pc) + (if_then_else + (ne (reg:SI LR_REGNUM) (const_int 1)) + (label_ref (match_operand 0 "" "")) + (pc))) + (set (reg:SI LR_REGNUM) + (plus:SI (reg:SI LR_REGNUM) (const_int -1))) + (clobber (reg:CC CC_REGNUM))] "TARGET_32BIT && TARGET_HAVE_LOB" - "le\t%|lr, %l0") + { + if (get_attr_length (insn) == 4) + return "le\t%|lr, %l0"; + else + return "subs\t%|lr, #1;bne\t%l0"; + } + [(set (attr "length") + (if_then_else + (lt (minus (pc) (match_dup 0)) (const_int 1024)) + (const_int 4) + (const_int 6))) + (set_attr "type" "branch")]) (define_expand "doloop_begin" [(match_operand 0 "" "") diff --git a/gcc/testsuite/gcc.target/arm/pr98931.c b/gcc/testsuite/gcc.target/arm/pr98931.c new file mode 100644 index 00000000000..313876a3912 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr98931.c @@ -0,0 +1,17 @@ +/* { dg-do assemble } */ +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */ +/* { dg-options "-march=armv8.1-m.main -O3 --param=max-completely-peeled-insns=1300 --save-temps" } */ + +extern long long a[][20][26][26][22]; + +void +foo () +{ + for (short d = 0; d + 1; d++) + for (unsigned e = 0; e < 25; e += 4) + for (unsigned f = 0; f < 25; f += 4) + for (int g = 0; g < 21; g += 4) + a[4][d][e][f][g] = 0; +} + +/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */ -- 2.20.1