ср, 19 июн. 2024 г. в 15:19, Richard Earnshaw (lists) <richard.earns...@arm.com>: > > On 18/06/2024 19:14, Siarhei Volkau wrote: > > If the address register is dead after load/store operation it looks > > beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions, > > at least if optimizing for size. > > > > Changes v1 -> v2: > > - switching to peephole2 approach > > - added test case > > > > gcc/ChangeLog: > > > > * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New. > > (peephole2 to rewrite DI/DF store): New. > > (thumb1_movdi_insn): Handle overlapped regs ldmia case. > > (thumb_movdf_insn): Likewise. > > > > * config/arm/iterators.md (DIDF): New. > > > > gcc/testsuite: > > > > * gcc.target/arm/thumb1-load-store-64bit.c: Add new test. > > > > Signed-off-by: Siarhei Volkau <lis8...@gmail.com> > > --- > > gcc/config/arm/iterators.md | 3 +++ > > gcc/config/arm/thumb1.md | 27 ++++++++++++++++++- > > .../gcc.target/arm/thumb1-load-store-64bit.c | 16 +++++++++++ > > 3 files changed, 45 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c > > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md > > index 8d066fcf05d..09046bff83b 100644 > > --- a/gcc/config/arm/iterators.md > > +++ b/gcc/config/arm/iterators.md > > @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI]) > > ;; A list of the 32bit and 64bit integer modes > > (define_mode_iterator SIDI [SI DI]) > > > > +;; A list of the 64bit modes for thumb1. > > +(define_mode_iterator DIDF [DI DF]) > > + > > ;; A list of atomic compare and swap success return modes > > (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")]) > > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > > index d7074b43f60..ed4b706773a 100644 > > --- a/gcc/config/arm/thumb1.md > > +++ b/gcc/config/arm/thumb1.md > > @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn" > > gcc_assert (TARGET_HAVE_MOVT); > > return \"movw\\t%Q0, %L1\;movs\\tR0, #0\"; > > case 4: > > + if (reg_overlap_mentioned_p (operands[0], operands[1])) > > + return \"ldmia\\t%m1, {%0, %H0}\"; > > See below for why I don't think this is a case we need to consider here. > > > return \"ldmia\\t%1, {%0, %H0}\"; > > case 5: > > return \"stmia\\t%0, {%1, %H1}\"; > > @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn" > > return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\"; > > return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\"; > > case 1: > > + if (reg_overlap_mentioned_p (operands[0], operands[1])) > > + return \"ldmia\\t%m1, {%0, %H0}\"; > > return \"ldmia\\t%1, {%0, %H0}\"; > > case 2: > > return \"stmia\\t%0, {%1, %H1}\"; > > @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn" > > (set_attr "conds" "clob") > > (set_attr "type" "multiple")] > > ) > > - > > + > > +;; match patterns usable by ldmia/stmia > > +(define_peephole2 > > + [(set (match_operand:DIDF 0 "low_register_operand" "") > > + (mem:DIDF (match_operand:SI 1 "low_register_operand")))] > > + "TARGET_THUMB1 > > + && (peep2_reg_dead_p (1, operands[1]) > > + || REGNO (operands[0]) + 1 == REGNO (operands[1]))" > > I don't understand this second condition (partial overlap of the base address > with the value loaded), what are you guarding against here? The instruction > specification says that there is no writeback if the base register has any > overlap with any of the loaded registers, so really we should reject the > peephole if that's true (we'd have invalid RTL as well as there would end up > being two writes to the same register). >
The first condition here is for ldmia cases with writeback, the second condition is for cases without writeback. I falsely thought that base register has to be the last register in the list similarly like for stmia it has to be first. So, I have to reject the no-writeback case there because it emits incorrect RTL, this is clear now. However, to not miss possible optimization, no-writeback cases check might be moved to instruction itself as it was done in v1 patch (incorrect instruction length will appear again then). Any objections on that? > > + [(set (match_dup 0) > > + (mem:DIDF (post_inc:SI (match_dup 1))))] > > + "" > > This is not enough, unfortunately. MEM() objects carry attributes about the > memory accessed (alias sets, known alignment, etc) and these will not be > propagated correctly if you rewrite the pattern this way. The correct > solution is to match the entire mem as operand1, then use change_address to > rewrite that. Something like: > > operands[1] = change_address (operands[1], VOIDmode, > gen_rtx_POST_INC (SImode, > XEXP (operands[1], 0))); > Got it, will rewrite, thanks. > > +) > > + > > +(define_peephole2 > > + [(set (mem:DIDF (match_operand:SI 1 "low_register_operand")) > > + (match_operand:DIDF 0 "low_register_operand" ""))] > > + "TARGET_THUMB1 > > + && peep2_reg_dead_p (1, operands[1])" > > + [(set (mem:DIDF (post_inc:SI (match_dup 1))) > > + (match_dup 0))] > > There's a similar address rewriting issue here as well, but we also have to > guard against the (unlikely) case where the base register is part of the > value stored if it isn't the lowest numbered register in the list. In that > case the value stored would be undefined. That is: > > stmia r0!, {r0, r1} > > is ok, but > > stmia r1!, {r0, r1} > > is not. > I don't think that base register will appear also in the list here, because it can't be Pmode and DI/DFmode at the same time, can it? > > + "" > > +) > > diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c > > b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c > > new file mode 100644 > > index 00000000000..167fa9ec876 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c > > @@ -0,0 +1,16 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-mthumb -Os" } */ > > +/* { dg-require-effective-target arm_thumb1_ok } */ > > + > > +void copy_df(double *dst, const double *src) > > +{ > > + *dst = *src; > > +} > > + > > +void copy_di(unsigned long long *dst, const unsigned long long *src) > > +{ > > + *dst = *src; > > +} > > + > > +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */ > > +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */ > > R. > BR, Siarhei