Applied to trunk (with fixups to the ChangeLog and after checking again that the subreg_size_lowpart_offset change is included). Thanks!
--Philipp.

On Tue, 15 Jul 2025 at 15:43, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes:
> > Hi Richard, thanks for the review!
> >
> > We've tested the pre-patch version on PowerPC and there was indeed an
> > issue, which is now fixed with the patched version.
>
> Thanks for the extra testing.  The patch is ok for trunk and for
> any necessary backports with the subreg_size_lowpart_offset change
> mentioned below.
>
> It doesn't look like you have commit access yet.  If you'd like it,
> please follow the instructions at https://gcc.gnu.org/gitwrite.html
> (I'll sponsor).
>
> Richard
>
> > Konstantinos
> >
> > On Fri, Jul 4, 2025 at 9:34 AM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> writes:
> >> > On Wed, May 7, 2025 at 11:29 AM Richard Sandiford
> >> > <richard.sandif...@arm.com> wrote:
> >> >> But I thought the code was allowing multiple stores to be forwarded to
> >> >> a single (wider) load.  E.g. 4 individual byte stores at address X, X+1,
> >> >> X+2 and X+3 could be forwarded to a 4-byte load at address X.  And the
> >> >> code I mentioned is handling the least significant byte by zero-extending it.
> >> >>
> >> >> For big-endian targets, the least significant byte should come from
> >> >> address X+3 rather than address X.  The byte at address X (i.e. the
> >> >> byte with the equal offset) should instead go in the most significant
> >> >> byte, typically using a shift left.
> >> > Hi, I'm attaching a patch that we prepared for this. It would be of
> >> > great help if someone could test it on a big-endian target, preferably
> >> > one with BITS_BIG_ENDIAN == 0 as we were having issues with that in
> >> > the past.
> >> >
> >> > From 278f83b834a97541fe0a2d2bbad84aca34601fed Mon Sep 17 00:00:00 2001
> >> > From: Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu>
> >> > Date: Tue, 3 Jun 2025 09:16:17 +0200
> >> > Subject: [PATCH] asf: Fix offset check in base reg initialization for
> >> >  big-endian targets
> >> >
> >> > During the base register initialization, in the case that we are
> >> > eliminating the load instruction, we are using `offset == 0` in order
> >> > to find the store instruction that has the same offset as the load. This
> >> > would not work on big-endian targets where byte 0 would be the MS byte.
> >> >
> >> > This patch updates the condition to take into account the target's
> >> > endianness.
> >> >
> >> > We are, also, removing the adjustment of the starting position for the
> >> > bitfield insertion, when BYTES_BIG_ENDIAN != BITS_BIG_ENDIAN. This is
> >> > supposed to be handled inside `store_bit_field` and it's not needed
> >> > anymore after the offset fix.
> >>
> >> Yeah.  In particular BITS_BIG_ENDIAN describes how bits are measured
> >> by insv and extv.  Its effect is localised as much as possible and so it
> >> doesn't affect how store_bit_field measures bits.  store_bit_field
> >> instead always measures in memory order (BITS_PER_UNIT == second byte
> >> in memory order, etc.)
> >>
> >> Out of curiosity, did you try this on a little-endian POWER system?
> >> That has BITS_BIG_ENDIAN==1, BYTES_BIG_ENDIAN==0, so I would have
> >> expected something to have gone wrong with the pre-patch code.  It would
> >> be good to check if you could (there are some machines in the compile
> >> farm).
> >>
> >> > gcc/ChangeLog:
> >> >
> >> >         * avoid-store-forwarding.cc (generate_bit_insert_sequence):
> >> >         Remove adjustment of bitfield insertion's starting position
> >> >         when BYTES_BIG_ENDIAN != BITS_BIG_ENDIAN.
> >> >         * avoid-store-forwarding.cc (process_store_forwarding):
> >> >         Update offset check in base reg initialization to take
> >> >         into account the target's endianness.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >         * gcc.target/aarch64/avoid-store-forwarding-be.c: New test.
> >> > ---
> >> >  gcc/avoid-store-forwarding.cc                   | 18 ++++-----------
> >> >  .../aarch64/avoid-store-forwarding-be.c         | 23 +++++++++++++++++++
> >> >  2 files changed, 28 insertions(+), 13 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-be.c
> >> >
> >> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> >> > index 6825d0426ecc..457d1d0c200c 100644
> >> > --- a/gcc/avoid-store-forwarding.cc
> >> > +++ b/gcc/avoid-store-forwarding.cc
> >> > @@ -119,17 +119,6 @@ generate_bit_insert_sequence (store_fwd_info *store_info, rtx dest)
> >> >    unsigned HOST_WIDE_INT bitsize = store_size * BITS_PER_UNIT;
> >> >    unsigned HOST_WIDE_INT start = store_info->offset * BITS_PER_UNIT;
> >> >
> >> > -  /* Adjust START for machines with BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
> >> > -     Given that the bytes will be reversed in this case, we need to
> >> > -     calculate the starting position from the end of the destination
> >> > -     register.  */
> >> > -  if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
> >> > -    {
> >> > -      unsigned HOST_WIDE_INT load_mode_bitsize
> >> > -        = (GET_MODE_BITSIZE (GET_MODE (dest))).to_constant ();
> >> > -      start = load_mode_bitsize - bitsize - start;
> >> > -    }
> >> > -
> >> >    rtx mov_reg = store_info->mov_reg;
> >> >    store_bit_field (dest, bitsize, start, 0, 0, GET_MODE (mov_reg), mov_reg,
> >> >                     false, false);
> >> > @@ -248,11 +237,14 @@ process_store_forwarding (vec<store_fwd_info> &stores, rtx_insn *load_insn,
> >> >      {
> >> >        it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
> >> >        rtx_insn *insns = NULL;
> >> > -      const bool has_zero_offset = it->offset == 0;
> >> > +      HOST_WIDE_INT store_size = MEM_SIZE (it->store_mem).to_constant ();
> >> > +      const bool has_base_offset = BYTES_BIG_ENDIAN
> >> > +                                   ? it->offset == load_size - store_size
> >> > +                                   : it->offset == 0;
> >>
> >> Although this looks correct for pure-endian systems, could you instead use:
> >>
> >>   const bool has_base_offset
> >>     = known_eq (poly_uint64 (it->offset),
> >>                 subreg_size_lowpart_offset (MEM_SIZE (it->store_mem),
> >>                                             load_size));
> >>
> >> That should cope with WORDS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
> >>
> >> I've tested the patch with that change on aarch64_be-none-elf, where it
> >> does fix the testcase and shows no regressions.  So the patch is ok with
> >> the change above.  Like I say, though, it would be good to double-check
> >> on a little-endian POWER system to see whether removing the
> >> BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN adjustment has a noticeable effect.
> >> Hopefully the patch fixes something there too.
> >>
> >> Thanks,
> >> Richard
> >>
> >> >
> >> >        /* If we're eliminating the load then find the store with zero offset
> >> >           and use it as the base register to avoid a bit insert if possible.  */
> >> > -      if (load_elim && has_zero_offset)
> >> > +      if (load_elim && has_base_offset)
> >> >         {
> >> >           start_sequence ();
> >> >
> >> > diff --git a/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-be.c b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-be.c
> >> > new file mode 100644
> >> > index 000000000000..2e8946b25e34
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-be.c
> >> > @@ -0,0 +1,23 @@
> >> > +/* { dg-do run } */
> >> > +/* { dg-require-effective-target aarch64_big_endian } */
> >> > +/* { dg-options "-O2 -favoid-store-forwarding" } */
> >> > +
> >> > +typedef union {
> >> > +  char arr[2];
> >> > +  short value;
> >> > +} DataUnion;
> >> > +
> >> > +short __attribute__ ((noinline))
> >> > +ssll (DataUnion *data, char x, char y)
> >> > +{
> >> > +  data->arr[0] = x;
> >> > +  data->arr[1] = y;
> >> > +  return data->value;
> >> > +}
> >> > +
> >> > +int main () {
> >> > +  DataUnion data = {};
> >> > +  short value = ssll (&data, 0, 1);
> >> > +  if (value != 1)
> >> > +    __builtin_abort ();
> >> > +}
> >> > \ No newline at end of file
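
For readers following along, here is a minimal standalone sketch of the scenario Richard describes above (not part of the patch; store_then_load and the byte values are made up for illustration): four byte stores forwarded to one 4-byte load, where the store at offset 0 supplies the least significant byte on little-endian targets but the most significant byte on big-endian ones, so on big-endian it is the store at offset load_size - store_size that provides the low part.

  #include <stdint.h>
  #include <string.h>

  uint32_t
  store_then_load (unsigned char *p)
  {
    /* Four narrow stores at X, X+1, X+2 and X+3.  */
    p[0] = 0x11;  /* Offset 0: LSB on little-endian, MSB on big-endian.  */
    p[1] = 0x22;
    p[2] = 0x33;
    p[3] = 0x44;  /* Offset 3 (load_size - store_size): the low byte on big-endian.  */

    uint32_t value;
    memcpy (&value, p, sizeof value);  /* The wider 4-byte load at X.  */
    return value;  /* 0x44332211 on little-endian, 0x11223344 on big-endian.  */
  }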
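The offset check being discussed can be summarised with a hedged C sketch (is_base_offset and its parameters are invented names, not GCC internals; the real pass compares poly_int offsets via known_eq): on pure little-endian targets the low part of the loaded value sits at offset 0, on pure big-endian targets at load_size - store_size, which is also what subreg_size_lowpart_offset yields for those targets while additionally coping with WORDS_BIG_ENDIAN != BYTES_BIG_ENDIAN.

  #include <stdbool.h>

  /* Sketch of the pure-endian byte arithmetic only.  */
  static bool
  is_base_offset (long offset, long store_size, long load_size,
                  bool bytes_big_endian)
  {
    long lowpart_offset = bytes_big_endian ? load_size - store_size : 0;
    return offset == lowpart_offset;
  }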