Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes:
> Hi Richard, thanks for the review!
>
> We've tested the pre-patch version on PowerPC and there was indeed an
> issue, which is now fixed with the patched version.

Thanks for the extra testing.  The patch is ok for trunk and for
any necessary backports with the subreg_size_lowpart_offset change
mentioned below.

It doesn't look like you have commit access yet.  If you'd like it,
please follow the instructions at https://gcc.gnu.org/gitwrite.html
(I'll sponsor).

Richard

> Konstantinos
>
> On Fri, Jul 4, 2025 at 9:34 AM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes:
>> > On Wed, May 7, 2025 at 11:29 AM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >> But I thought the code was allowing multiple stores to be forwarded to
>> >> a single (wider) load.  E.g. 4 individual byte stores at address X, X+1,
>> >> X+2 and X+3 could be forwarded to a 4-byte load at address X.  And the code
>> >> I mentioned is handling the least significant byte by zero-extending it.
>> >>
>> >> For big-endian targets, the least significant byte should come from
>> >> address X+3 rather than address X.  The byte at address X (i.e. the
>> >> byte with the equal offset) should instead go in the most significant
>> >> byte, typically using a shift left.
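>> >>
>> >> As a purely illustrative C-level picture of that pattern (function name
>> >> and values made up):
>> >>
>> >> unsigned int
>> >> four_stores_one_load (unsigned char *p)
>> >> {
>> >>   /* Four narrow stores ...  */
>> >>   p[0] = 0x11; p[1] = 0x22; p[2] = 0x33; p[3] = 0x44;
>> >>   /* ... forwarded to one wider load.  A little-endian target reads
>> >>      0x44332211 (the byte at p[0] is least significant); a big-endian
>> >>      target reads 0x11223344 (the byte at p[3] is least significant).  */
>> >>   unsigned int v;
>> >>   __builtin_memcpy (&v, p, sizeof v);
>> >>   return v;
>> >> }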
>> > Hi, I'm attaching a patch that we prepared for this. It would be of
>> > great help if someone could test it on a big-endian target, preferably
>> > one with BITS_BIG_ENDIAN == 0 as we were having issues with that in
>> > the past.
>> >
>> > From 278f83b834a97541fe0a2d2bbad84aca34601fed Mon Sep 17 00:00:00 2001
>> > From: Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu>
>> > Date: Tue, 3 Jun 2025 09:16:17 +0200
>> > Subject: [PATCH] asf: Fix offset check in base reg initialization for
>> >  big-endian targets
>> >
>> > During the base register initialization, when we are eliminating the
>> > load instruction, we were using `offset == 0` to find the store
>> > instruction that has the same offset as the load.  This does not work
>> > on big-endian targets, where byte 0 is the most significant byte.
>> >
>> > This patch updates the condition to take into account the target's 
>> > endianness.
>> >
>> > We also remove the adjustment of the starting position for the bitfield
>> > insertion when BYTES_BIG_ENDIAN != BITS_BIG_ENDIAN.  This is supposed to
>> > be handled inside `store_bit_field` and is no longer needed after the
>> > offset fix.
>>
>> Yeah.  In particular, BITS_BIG_ENDIAN describes how bits are measured
>> by insv and extv.  Its effect is localised as much as possible, so it
>> doesn't affect how store_bit_field measures bits.  store_bit_field
>> instead always measures bits in memory order (a bit offset of
>> BITS_PER_UNIT is the second byte in memory order, etc.).
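>>
>> In other words, the insertion in your patch boils down to keeping the
>> byte-offset-based position (as in the hunk below):
>>
>>   /* Bit positions here are counted in memory order, independently of
>>      BITS_BIG_ENDIAN.  */
>>   unsigned HOST_WIDE_INT bitsize = store_size * BITS_PER_UNIT;
>>   unsigned HOST_WIDE_INT start = store_info->offset * BITS_PER_UNIT;
>>   store_bit_field (dest, bitsize, start, 0, 0, GET_MODE (mov_reg),
>>                    mov_reg, false, false);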
>>
>> Out of curiosity, did you try this on a little-endian POWER system?
>> That has BITS_BIG_ENDIAN==1, BYTES_BIG_ENDIAN==0, so I would have
>> expected something to have gone wrong with the pre-patch code.  It would
>> be good to check if you could (there are some machines in the compile farm).
>>
>> > gcc/ChangeLog:
>> >
>> >       * avoid-store-forwarding.cc (generate_bit_insert_sequence):
>> >       Remove adjustment of bitfield insertion's starting position
>> >       when BYTES_BIG_ENDIAN != BITS_BIG_ENDIAN.
>> >       (process_store_forwarding): Update offset check in base reg
>> >       initialization to take into account the target's endianness.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >       * gcc.target/aarch64/avoid-store-forwarding-be.c: New test.
>> > ---
>> >  gcc/avoid-store-forwarding.cc                 | 18 ++++-----------
>> >  .../aarch64/avoid-store-forwarding-be.c       | 23 +++++++++++++++++++
>> >  2 files changed, 28 insertions(+), 13 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-be.c
>> >
>> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
>> > index 6825d0426ecc..457d1d0c200c 100644
>> > --- a/gcc/avoid-store-forwarding.cc
>> > +++ b/gcc/avoid-store-forwarding.cc
>> > @@ -119,17 +119,6 @@ generate_bit_insert_sequence (store_fwd_info *store_info, rtx dest)
>> >    unsigned HOST_WIDE_INT bitsize = store_size * BITS_PER_UNIT;
>> >    unsigned HOST_WIDE_INT start = store_info->offset * BITS_PER_UNIT;
>> >
>> > -  /* Adjust START for machines with BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
>> > -     Given that the bytes will be reversed in this case, we need to
>> > -     calculate the starting position from the end of the destination
>> > -     register.  */
>> > -  if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
>> > -    {
>> > -      unsigned HOST_WIDE_INT load_mode_bitsize
>> > -     = (GET_MODE_BITSIZE (GET_MODE (dest))).to_constant ();
>> > -      start = load_mode_bitsize - bitsize - start;
>> > -    }
>> > -
>> >    rtx mov_reg = store_info->mov_reg;
>> >    store_bit_field (dest, bitsize, start, 0, 0, GET_MODE (mov_reg), mov_reg,
>> >                  false, false);
>> > @@ -248,11 +237,14 @@ process_store_forwarding (vec<store_fwd_info> &stores, rtx_insn *load_insn,
>> >      {
>> >        it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
>> >        rtx_insn *insns = NULL;
>> > -      const bool has_zero_offset = it->offset == 0;
>> > +      HOST_WIDE_INT store_size = MEM_SIZE (it->store_mem).to_constant ();
>> > +      const bool has_base_offset = BYTES_BIG_ENDIAN
>> > +                                ? it->offset == load_size - store_size
>> > +                                : it->offset == 0;
>>
>> Although this looks correct for pure-endian systems, could you instead use:
>>
>>       const bool has_base_offset
>>         = known_eq (poly_uint64 (it->offset),
>>                     subreg_size_lowpart_offset (MEM_SIZE (it->store_mem),
>>                                                 load_size));
>>
>> That should cope with WORDS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
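>>
>> For the common pure-endian configurations it behaves like this little
>> model (just a sketch with a made-up name, not the real implementation,
>> and assuming the store is no wider than the load):
>>
>> static unsigned int
>> lowpart_offset_model (unsigned int store_bytes, unsigned int load_bytes,
>>                       bool bytes_big_endian)
>> {
>>   /* The least significant part starts at byte 0 on little-endian and
>>      at the end of the wider value on big-endian.  */
>>   return bytes_big_endian ? load_bytes - store_bytes : 0;
>> }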
>>
>> I've tested the patch with that change on aarch64_be-none-elf, where it
>> does fix the testcase and shows no regressions.  So the patch is ok with
>> the change above.  Like I say, though, it would be good to double-check
>> on a little-endian POWER system to see whether removing the
>> BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN adjustment has a noticeable effect.
>> Hopefully the patch fixes something there too.
>>
>> Thanks,
>> Richard
>>
>> >
>> >        /* If we're eliminating the load then find the store with zero offset
>> >       and use it as the base register to avoid a bit insert if possible.  */
>> > -      if (load_elim && has_zero_offset)
>> > +      if (load_elim && has_base_offset)
>> >       {
>> >         start_sequence ();
>> >
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-be.c b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-be.c
>> > new file mode 100644
>> > index 000000000000..2e8946b25e34
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-be.c
>> > @@ -0,0 +1,23 @@
>> > +/* { dg-do run } */
>> > +/* { dg-require-effective-target aarch64_big_endian } */
>> > +/* { dg-options "-O2 -favoid-store-forwarding" } */
>> > +
>> > +typedef union {
>> > +    char arr[2];
>> > +    short value;
>> > +} DataUnion;
>> > +
>> > +short __attribute__ ((noinline))
>> > +ssll (DataUnion *data, char x, char y)
>> > +{
>> > +  data->arr[0] = x;
>> > +  data->arr[1] = y;
>> > +  return data->value;
>> > +}
>> > +
>> > +int main () {
>> > +  DataUnion data = {};
>> > +  short value = ssll (&data, 0, 1);
>> > +  if (value != 1)
>> > +    __builtin_abort ();
>> > +}
>> > \ No newline at end of file
