ср, 19 июн. 2024 г. в 15:19, Richard Earnshaw (lists)
<richard.earns...@arm.com>:
>
> On 18/06/2024 19:14, Siarhei Volkau wrote:
> > If the address register is dead after load/store operation it looks
> > beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> > at least if optimizing for size.
> >
> > Changes v1 -> v2:
> >  - switching to peephole2 approach
> >  - added test case
> >
> > gcc/ChangeLog:
> >
> >         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
> >         (peephole2 to rewrite DI/DF store): New.
> >         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
> >         (thumb_movdf_insn): Likewise.
> >
> >       * config/arm/iterators.md (DIDF): New.
> >
> > gcc/testsuite:
> >
> >         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
> >
> > Signed-off-by: Siarhei Volkau <lis8...@gmail.com>
> > ---
> >  gcc/config/arm/iterators.md                   |  3 +++
> >  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
> >  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
> >  3 files changed, 45 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index 8d066fcf05d..09046bff83b 100644
> > --- a/gcc/config/arm/iterators.md
> > +++ b/gcc/config/arm/iterators.md
> > @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
> >  ;; A list of the 32bit and 64bit integer modes
> >  (define_mode_iterator SIDI [SI DI])
> >
> > +;; A list of the 64bit modes for thumb1.
> > +(define_mode_iterator DIDF [DI DF])
> > +
> >  ;; A list of atomic compare and swap success return modes
> >  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
> >
> > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > index d7074b43f60..ed4b706773a 100644
> > --- a/gcc/config/arm/thumb1.md
> > +++ b/gcc/config/arm/thumb1.md
> > @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
> >        gcc_assert (TARGET_HAVE_MOVT);
> >        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
> >      case 4:
> > +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> > +     return \"ldmia\\t%m1, {%0, %H0}\";
>
> See below for why I don't think this is a case we need to consider here.
>
> >        return \"ldmia\\t%1, {%0, %H0}\";
> >      case 5:
> >        return \"stmia\\t%0, {%1, %H1}\";
> > @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
> >       return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
> >        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
> >      case 1:
> > +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> > +     return \"ldmia\\t%m1, {%0, %H0}\";
> >        return \"ldmia\\t%1, {%0, %H0}\";
> >      case 2:
> >        return \"stmia\\t%0, {%1, %H1}\";
> > @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
> >     (set_attr "conds" "clob")
> >     (set_attr "type" "multiple")]
> >  )
> > -
> > +
> > +;; match patterns usable by ldmia/stmia
> > +(define_peephole2
> > +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> > +     (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
> > +  "TARGET_THUMB1
> > +   && (peep2_reg_dead_p (1, operands[1])
> > +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
>
> I don't understand this second condition (partial overlap of the base address 
> with the value loaded), what are you guarding against here?  The instruction 
> specification says that there is no writeback if the base register has any 
> overlap with any of the loaded registers, so really we should reject the 
> peephole if that's true (we'd have invalid RTL as well as there would end up 
> being two writes to the same register).
>

The first condition here is for ldmia cases with writeback, the second
condition is for cases without writeback. I falsely thought that base
register has to be the last register in the list similarly like for
stmia it has to be first.
So, I have to reject the no-writeback case there because it emits
incorrect RTL, this is clear now.

However, to not miss possible optimization, no-writeback cases check
might be moved to instruction itself as it was done in v1 patch
(incorrect instruction length will appear again then).
Any objections on that?

> > +  [(set (match_dup 0)
> > +     (mem:DIDF (post_inc:SI (match_dup 1))))]
> > +  ""
>
> This is not enough, unfortunately.  MEM() objects carry attributes about the 
> memory accessed (alias sets, known alignment, etc) and these will not be 
> propagated correctly if you rewrite the pattern this way.  The correct 
> solution is to match the entire mem as operand1, then use change_address to 
> rewrite that.  Something like:
>
>      operands[1] = change_address (operands[1], VOIDmode,
>                                    gen_rtx_POST_INC (SImode,
>                                                      XEXP (operands[1], 0)));
>

Got it, will rewrite, thanks.

> > +)
> > +
> > +(define_peephole2
> > +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
> > +     (match_operand:DIDF 0 "low_register_operand" ""))]
> > +  "TARGET_THUMB1
> > +   && peep2_reg_dead_p (1, operands[1])"
> > +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
> > +     (match_dup 0))]
>
> There's a similar address rewriting issue here as well, but we also have to 
> guard against the (unlikely) case where the base register is part of the 
> value stored if it isn't the lowest numbered register in the list.  In that 
> case the value stored would be undefined.  That is:
>
>   stmia r0!, {r0, r1}
>
> is ok, but
>
>   stmia r1!, {r0, r1}
>
> is not.
>

I don't think that base register will appear also in the list here,
because it can't be Pmode and DI/DFmode at the same time, can it?

> > +  ""
> > +)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c 
> > b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> > new file mode 100644
> > index 00000000000..167fa9ec876
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mthumb -Os" }  */
> > +/* { dg-require-effective-target arm_thumb1_ok } */
> > +
> > +void copy_df(double *dst, const double *src)
> > +{
> > +    *dst = *src;
> > +}
> > +
> > +void copy_di(unsigned long long *dst, const unsigned long long *src)
> > +{
> > +    *dst = *src;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
> > +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */
>
> R.
>

BR,
Siarhei

Reply via email to