On Wed, Jul 31, 2024 at 9:41 AM Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> The testcase contains the constant:
>
>   arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
>
> which was initially hoisted by hand, but which gimple optimisers later
> propagated to each use (as expected).  The constant was then expanded
> as a load-and-duplicate from the constant pool.  Normally that load
> should then be hoisted back out of the loop, but may_trap_or_fault_p
> stopped that from happening in this case.
>
> The code responsible was:
>
>   if (/* MEM_NOTRAP_P only relates to the actual position of the memory
>          reference; moving it out of context such as when moving code
>          when optimizing, might cause its address to become invalid.  */
>       code_changed
>       || !MEM_NOTRAP_P (x))
>     {
>       poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
>       return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
>                                     GET_MODE (x), code_changed);
>     }
>
> where code_changed is true.  (Arguably it doesn't need to be true in
> this case, if we inserted invariants on the preheader edge, but it
> would still need to be true for conditionally executed loads.)
>
> Normally this wouldn't be a problem, since rtx_addr_can_trap_p_1
> would recognise that the address refers to the constant pool.
> However, the SVE load-and-replicate instructions have a limited
> offset range, so it isn't possible for them to have a LO_SUM address.
> All we have is a plain pseudo base register.
>
> MEM_READONLY_P is defined as:
>
>   /* 1 if RTX is a mem that is statically allocated in read-only memory.  */
>   (RTL_FLAG_CHECK1 ("MEM_READONLY_P", (RTX), MEM)->unchanging)
>
> and so I think it should be safe to move memory references if both
> MEM_READONLY_P and MEM_NOTRAP_P are true.
>
> The testcase isn't a minimal reproducer, but I think it's good
> to have a realistic full routine in the testsuite.
>
> Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?
This is breaking the build on a few targets (x86_64 and powerpc64le so
far reported, see PR 116200).

From what I can tell, it is treating
`(plus:DI (ashift:DI (reg:DI 0 ax [690]) (const_int 3 [0x3])) (label_ref:DI 1620))`
as non-trapping, which allows it to be moved before the check that ax
is in the range [0..2]; in the failing case eax holds
(unsigned long)(unsigned int)-9.  So we get a bogus address which will
trap.

I put my findings in PR 116200 too.

Thanks,
Andrew Pinski

>
> Richard
>
>
> gcc/
>     PR rtl-optimization/116145
>     * rtlanal.cc (may_trap_p_1): Trust MEM_NOTRAP_P even for code
>     movement if MEM_READONLY_P is also true.
>
> gcc/testsuite/
>     PR rtl-optimization/116145
>     * gcc.target/aarch64/sve/acle/general/pr116145.c: New test.
> ---
>  gcc/rtlanal.cc                                  | 14 ++++--
>  .../aarch64/sve/acle/general/pr116145.c         | 46 +++++++++++++++++++
>  2 files changed, 56 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
>
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index 4158a531bdd..893a6afbbc5 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -3152,10 +3152,16 @@ may_trap_p_1 (const_rtx x, unsigned flags)
>  	  && MEM_VOLATILE_P (x)
>  	  && XEXP (x, 0) == stack_pointer_rtx)
>  	return true;
> -      if (/* MEM_NOTRAP_P only relates to the actual position of the memory
> -	     reference; moving it out of context such as when moving code
> -	     when optimizing, might cause its address to become invalid.  */
> -	  code_changed
> +      if (/* MEM_READONLY_P means that the memory is both statically
> +	     allocated and readonly, so MEM_NOTRAP_P should remain true
> +	     even if the memory reference is moved.  This is certainly
> +	     true for the important case of force_const_mem.
> +
> +	     Otherwise, MEM_NOTRAP_P only relates to the actual position
> +	     of the memory reference; moving it out of context such as
> +	     when moving code when optimizing, might cause its address
> +	     to become invalid.  */
> +	  (code_changed && !MEM_READONLY_P (x))
>  	  || !MEM_NOTRAP_P (x))
>  	{
>  	  poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
> new file mode 100644
> index 00000000000..a3d93d3e1c8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
> @@ -0,0 +1,46 @@
> +// { dg-options "-O2" }
> +
> +#include <stdlib.h>
> +#include <arm_sve.h>
> +
> +#pragma GCC target "+sve2"
> +
> +typedef unsigned char uchar;
> +
> +const uchar *
> +search_line_fast (const uchar *s, const uchar *end)
> +{
> +  size_t VL = svcntb();
> +  svuint8_t arr1, arr2;
> +  svbool_t pc, pg = svptrue_b8();
> +
> +  // This should not be loaded inside the loop every time.
> +  arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
> +
> +  for (; s+VL <= end; s += VL) {
> +    arr1 = svld1_u8(pg, s);
> +    pc = svmatch_u8(pg, arr1, arr2);
> +
> +    if (svptest_any(pg, pc)) {
> +      pc = svbrkb_z(pg, pc);
> +      return s+svcntp_b8(pg, pc);
> +    }
> +  }
> +
> +  // Handle remainder.
> +  if (s < end) {
> +    pg = svwhilelt_b8((size_t)s, (size_t)end);
> +
> +    arr1 = svld1_u8(pg, s);
> +    pc = svmatch_u8(pg, arr1, arr2);
> +
> +    if (svptest_any(pg, pc)) {
> +      pc = svbrkb_z(pg, pc);
> +      return s+svcntp_b8(pg, pc);
> +    }
> +  }
> +
> +  return end;
> +}
> +
> +// { dg-final { scan-assembler {:\n\tld1b\t[^\n]*\n\tmatch\t[^\n]*\n\tb\.} } }
> --
> 2.25.1
>