On Mon, 10 Feb 2020 at 17:45, Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
>
> On 10/02/2020 09:27, Christophe Lyon wrote:
> > On Fri, 7 Feb 2020 at 17:55, Richard Earnshaw (lists)
> > <richard.earns...@arm.com> wrote:
> >>
> >> On 07/02/2020 16:43, Christophe Lyon wrote:
> >>> On Fri, 7 Feb 2020 at 14:49, Richard Earnshaw (lists)
> >>> <richard.earns...@arm.com> wrote:
> >>>>
> >>>> On 07/02/2020 13:19, Christophe Lyon wrote:
> >>>>> When running the testsuite with -fdisable-rtl-fwprop2 and -mpure-code
> >>>>> for cortex-m0, I noticed that some testcases were failing because we
> >>>>> still generate "ldr rX, .LCY", which is what we want to avoid with
> >>>>> -mpure-code. This is latent since a recent improvement in fwprop
> >>>>> (PR88833).
> >>>>>
> >>>>> In this patch I change the thumb1_movsi_insn pattern so that it emits
> >>>>> the desired instruction sequence when arm_disable_literal_pool is set.
> >>>>>
> >>>>> I tried to add a define_split instead, but couldn't make it work: the
> >>>>> compiler then complains it cannot split the instruction, even though my
> >>>>> new define_split accepts the same operand types as thumb1_movsi_insn:
> >>>>>
> >>>>> c-c++-common/torture/complex-sign-mixed-add.c:41:1: error: could not split insn
> >>>>> (insn 2989 425 4844 (set (reg/f:SI 3 r3 [1342])
> >>>>>         (symbol_ref/u:SI ("*.LC6") [flags 0x2])) 836 {*thumb1_movsi_insn}
> >>>>>      (expr_list:REG_EQUIV (symbol_ref/u:SI ("*.LC6") [flags 0x2])
> >>>>>         (nil)))
> >>>>> during RTL pass: final
> >>>>>
> >>>>> (define_split
> >>>>>   [(set (match_operand:SI 0 "register_operand" "")
> >>>>>         (match_operand:SI 1 "general_operand" ""))]
> >>>>>   "TARGET_THUMB1
> >>>>>    && arm_disable_literal_pool
> >>>>>    && GET_CODE (operands[1]) == SYMBOL_REF"
> >>>>>   [(clobber (const_int 0))]
> >>>>>   "
> >>>>>   gen_thumb1_movsi_symbol_ref (operands[0], operands[1]);
> >>>>>   DONE;
> >>>>>   "
> >>>>> )
> >>>>>
> >>>>> and I put this in thumb1_movsi_insn:
> >>>>>   if (GET_CODE (operands[1]) == SYMBOL_REF && arm_disable_literal_pool)
> >>>>>     {
> >>>>>       return \"#\";
> >>>>>     }
> >>>>>   return \"ldr\\t%0, %1\";
> >>>>>
> >>>>> 2020-02-07  Christophe Lyon  <christophe.l...@linaro.org>
> >>>>>
> >>>>>       * config/arm/thumb1.md (thumb1_movsi_insn): Fix ldr alternative to
> >>>>>       work with -mpure-code.
> >>>>>
> >>>>
> >>>> +    case 0:
> >>>> +    case 1:
> >>>> +      return \"movs %0, %1\";
> >>>> +    case 2:
> >>>> +      return \"movw %0, %1\";
> >>>>
> >>>> This is OK, but please replace the hard tab in the strings for MOVS/MOVW
> >>>> with \\t.
> >>>>
> >>>
> >>> OK, that was merely a cut & paste from the existing code.
> >>>
> >>> I'm concerned that the length attribute is becoming wrong with my
> >>> patch; isn't this a problem?
> >>>
> >>
> >> Potentially yes.  The branch range code needs this to handle overly long
> >> jumps correctly.
> >>
> >
> > Do you mean that the probability of problems due to that shortcoming
> > is low enough that I can commit my patch?
>
> It's hard to say.  The number of instructions you generate for this
> is significant, so it likely will throw off the calculations and
> somebody will get unlucky.  On the other hand, I don't think we should
> pessimize code for the non-pure-code variants by inflating the size for
> this unconditionally.
>
> It seems there are two ways to fix this.
>
> 1) Create a new alternative for the pure-code variant with its own
> length attribute, then only enable that for the case you need.  That
> would also allow you to go back to the templated asm format.
>
> 2) Use a level of indirection to calculate the length - I haven't tried
> this, but it would be something like:
>
>    - create a new attribute - let's say default_length
>    - rename length for this pattern to default_length
>    - create another new attribute - let's say purecode_length, and add
>      lengths for each alternative, adjusted for the purecode variant
>    - make the length attribute for this pattern select, based on the
>      disable_literal_pool variable, between the default_length and
>      purecode_length attributes
>
Hi Richard,

Thanks for the suggestions. I've implemented option (1) above; does it
match what you had in mind?

Tested on arm-eabi with -mpure-code forced: no regression. I manually
checked that the expected sequence is generated with
-fdisable-rtl-fwprop2.

Thanks,

Christophe

> R.
>
> >
> > Thanks,
> >
> > Christophe
> >
> >> R.
> >>
> >>> Thanks,
> >>>
> >>> Christophe
> >>>
> >>>> R.
> >>
>
[ARM] Fix -mpure-code for v6m

When running the testsuite with -fdisable-rtl-fwprop2 and -mpure-code
for cortex-m0, I noticed that some testcases were failing because we
still generate "ldr rX, .LCY", which is what we want to avoid with
-mpure-code. This is latent since a recent improvement in fwprop
(PR88833).

In this patch I change the thumb1_movsi_insn pattern so that it emits
the desired instruction sequence when arm_disable_literal_pool is set.

To achieve that, I introduce a new required_for_purecode attribute to
enable the corresponding alternative in thumb1_movsi_insn and take the
actual instruction sequence length into account.

gcc/ChangeLog:

2020-02-13  Christophe Lyon  <christophe.l...@linaro.org>

	* config/arm/arm.md (required_for_purecode): New attribute.
	(enabled): Handle required_for_purecode.
	* config/arm/thumb1.md (thumb1_movsi_insn): Add alternative to
	work with -mpure-code.
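For reference, the new alternative in the patch below builds the 32-bit
value one byte at a time with a movs/lsls/adds chain over the
:upper8_15:/:upper0_7:/:lower8_15:/:lower0_7: relocation operators. As a
sanity check of the arithmetic, here is a minimal Python model (the
helper variable names are mine, mirroring the relocation operators):

```python
def build_const_pure_code(x: int) -> int:
    """Model the movs/lsls/adds sequence emitted under -mpure-code."""
    upper8_15 = (x >> 24) & 0xFF  # #:upper8_15:x
    upper0_7 = (x >> 16) & 0xFF   # #:upper0_7:x
    lower8_15 = (x >> 8) & 0xFF   # #:lower8_15:x
    lower0_7 = x & 0xFF           # #:lower0_7:x

    r = upper8_15                             # movs rX, #:upper8_15:x
    r = ((r << 8) + upper0_7) & 0xFFFFFFFF    # lsls rX, #8 ; adds rX, #:upper0_7:x
    r = ((r << 8) + lower8_15) & 0xFFFFFFFF   # lsls rX, #8 ; adds rX, #:lower8_15:x
    r = ((r << 8) + lower0_7) & 0xFFFFFFFF    # lsls rX, #8 ; adds rX, #:lower0_7:x
    return r

assert build_const_pure_code(0x12345678) == 0x12345678
```

The chain is seven 2-byte Thumb-1 instructions, which is consistent with
the length of 14 the patch assigns to the new alternative.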
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f89a2d4..aa8c34d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -97,6 +97,11 @@
 ; an IT block in their expansion which is not a short IT.
 (define_attr "enabled_for_short_it" "no,yes" (const_string "yes"))
 
+; Mark an instruction sequence as the required way of loading a
+; constant when -mpure-code is enabled (which implies
+; arm_disable_literal_pool)
+(define_attr "required_for_purecode" "no,yes" (const_string "no"))
+
 ;; Operand number of an input operand that is shifted.  Zero if the
 ;; given instruction does not shift one of its input operands.
 (define_attr "shift" "" (const_int 0))
@@ -230,6 +235,10 @@
 	  (match_test "arm_restrict_it"))
      (const_string "no")
 
+     (and (eq_attr "required_for_purecode" "yes")
+	  (not (match_test "arm_disable_literal_pool")))
+     (const_string "no")
+
      (eq_attr "arch_enabled" "no")
      (const_string "no")]
     (const_string "yes")))
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 613cf9c..2486163 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -691,8 +691,8 @@
 )
 
 (define_insn "*thumb1_movsi_insn"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=l,l,r,l,l,l,>,l, m,*l*h*k")
-	(match_operand:SI 1 "general_operand"      "l, I,j,J,K,>,l,mi,l,*l*h*k"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=l,l,r,l,l,l,>,l, l, m,*l*h*k")
+	(match_operand:SI 1 "general_operand"      "l, I,j,J,K,>,l,i, mi,l,*l*h*k"))]
   "TARGET_THUMB1
    && (   register_operand (operands[0], SImode)
        || register_operand (operands[1], SImode))"
@@ -704,14 +704,16 @@
    #
    ldmia\\t%1, {%0}
    stmia\\t%0, {%1}
+   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, #:lower0_7:%1
    ldr\\t%0, %1
    str\\t%1, %0
    mov\\t%0, %1"
-  [(set_attr "length" "2,2,4,4,4,2,2,2,2,2")
-   (set_attr "type" "mov_reg,mov_imm,mov_imm,multiple,multiple,load_4,store_4,load_4,store_4,mov_reg")
-   (set_attr "pool_range" "*,*,*,*,*,*,*,1018,*,*")
-   (set_attr "arch" "t1,t1,v8mb,t1,t1,t1,t1,t1,t1,t1")
-   (set_attr "conds" "set,clob,nocond,*,*,nocond,nocond,nocond,nocond,nocond")])
+  [(set_attr "length" "2,2,4,4,4,2,2,14,2,2,2")
+   (set_attr "type" "mov_reg,mov_imm,mov_imm,multiple,multiple,load_4,store_4,alu_sreg,load_4,store_4,mov_reg")
+   (set_attr "pool_range" "*,*,*,*,*,*,*, *,1018,*,*")
+   (set_attr "arch" "t1,t1,v8mb,t1,t1,t1,t1,t1,t1,t1,t1")
+   (set_attr "required_for_purecode" "no,no,no,no,no,no,no,yes,no,no,no")
+   (set_attr "conds" "set,clob,nocond,*,*,nocond,nocond,nocond,nocond,nocond,nocond")])
 
 ; Split the load of 64-bit constant into two loads for high and low 32-bit parts respectively
 ; to see if we can load them in fewer instructions or fewer cycles.