On Wed, Jul 31, 2024 at 9:41 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> The testcase contains the constant:
>
>   arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
>
> which was initially hoisted by hand, but which gimple optimisers later
> propagated to each use (as expected).  The constant was then expanded
> as a load-and-duplicate from the constant pool.  Normally that load
> should then be hoisted back out of the loop, but may_trap_or_fault_p
> stopped that from happening in this case.
>
> The code responsible was:
>
>       if (/* MEM_NOTRAP_P only relates to the actual position of the memory
>              reference; moving it out of context such as when moving code
>              when optimizing, might cause its address to become invalid.  */
>           code_changed
>           || !MEM_NOTRAP_P (x))
>         {
>           poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
>           return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
>                                         GET_MODE (x), code_changed);
>         }
>
> where code_changed is true.  (Arguably it doesn't need to be true in
> this case, if we inserted invariants on the preheader edge, but it
> would still need to be true for conditionally executed loads.)
>
> Normally this wouldn't be a problem, since rtx_addr_can_trap_p_1
> would recognise that the address refers to the constant pool.
> However, the SVE load-and-replicate instructions have a limited
> offset range, so it isn't possible for them to have a LO_SUM address.
> All we have is a plain pseudo base register.
>
> MEM_READONLY_P is defined as:
>
> /* 1 if RTX is a mem that is statically allocated in read-only memory.  */
>   (RTL_FLAG_CHECK1 ("MEM_READONLY_P", (RTX), MEM)->unchanging)
>
> and so I think it should be safe to move memory references if both
> MEM_READONLY_P and MEM_NOTRAP_P are true.
>
> The testcase isn't a minimal reproducer, but I think it's good
> to have a realistic full routine in the testsuite.
>
> Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?


This is breaking the build on a few targets (x86_64 and powerpc64le so
far reported, see PR 116200).

From what I can tell, it is treating `(plus:DI (ashift:DI
(reg:DI 0 ax [690]) (const_int 3 [0x3])) (label_ref:DI 1620))` as not
trapping, which allows it to be moved before the check that ax is in
the range [0..2].  Since eax holds (unsigned long)(unsigned int)-9 at
that point, we end up with a bogus address which will trap.  I put my
findings in PR 116200 too.

Thanks,
Andrew Pinski


>
> Richard
>
>
> gcc/
>         PR rtl-optimization/116145
>         * rtlanal.cc (may_trap_p_1): Trust MEM_NOTRAP_P even for code
>         movement if MEM_READONLY_P is also true.
>
> gcc/testsuite/
>         PR rtl-optimization/116145
>         * gcc.target/aarch64/sve/acle/general/pr116145.c: New test.
> ---
>  gcc/rtlanal.cc                                | 14 ++++--
>  .../aarch64/sve/acle/general/pr116145.c       | 46 +++++++++++++++++++
>  2 files changed, 56 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
>
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index 4158a531bdd..893a6afbbc5 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -3152,10 +3152,16 @@ may_trap_p_1 (const_rtx x, unsigned flags)
>           && MEM_VOLATILE_P (x)
>           && XEXP (x, 0) == stack_pointer_rtx)
>         return true;
> -      if (/* MEM_NOTRAP_P only relates to the actual position of the memory
> -            reference; moving it out of context such as when moving code
> -            when optimizing, might cause its address to become invalid.  */
> -         code_changed
> +      if (/* MEM_READONLY_P means that the memory is both statically
> +            allocated and readonly, so MEM_NOTRAP_P should remain true
> +            even if the memory reference is moved.  This is certainly
> +            true for the important case of force_const_mem.
> +
> +            Otherwise, MEM_NOTRAP_P only relates to the actual position
> +            of the memory reference; moving it out of context such as
> +            when moving code when optimizing, might cause its address
> +            to become invalid.  */
> +         (code_changed && !MEM_READONLY_P (x))
>           || !MEM_NOTRAP_P (x))
>         {
>           poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
> new file mode 100644
> index 00000000000..a3d93d3e1c8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
> @@ -0,0 +1,46 @@
> +// { dg-options "-O2" }
> +
> +#include <stdlib.h>
> +#include <arm_sve.h>
> +
> +#pragma GCC target "+sve2"
> +
> +typedef unsigned char uchar;
> +
> +const uchar *
> +search_line_fast (const uchar *s, const uchar *end)
> +{
> +  size_t VL = svcntb();
> +  svuint8_t arr1, arr2;
> +  svbool_t pc, pg = svptrue_b8();
> +
> +  // This should not be loaded inside the loop every time.
> +  arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
> +
> +  for (; s+VL <= end; s += VL) {
> +    arr1 = svld1_u8(pg, s);
> +    pc = svmatch_u8(pg, arr1, arr2);
> +
> +    if (svptest_any(pg, pc)) {
> +      pc = svbrkb_z(pg, pc);
> +      return s+svcntp_b8(pg, pc);
> +    }
> +  }
> +
> +  // Handle remainder.
> +  if (s < end) {
> +    pg = svwhilelt_b8((size_t)s, (size_t)end);
> +
> +    arr1 = svld1_u8(pg, s);
> +    pc = svmatch_u8(pg, arr1, arr2);
> +
> +    if (svptest_any(pg, pc)) {
> +      pc = svbrkb_z(pg, pc);
> +      return s+svcntp_b8(pg, pc);
> +    }
> +  }
> +
> +  return end;
> +}
> +
> +// { dg-final { scan-assembler {:\n\tld1b\t[^\n]*\n\tmatch\t[^\n]*\n\tb\.} } }
> --
> 2.25.1
>
