Andrew Pinski <quic_apin...@quicinc.com> writes: > The split for movv8di is not ready to handle the case where the setting > register overlaps with the address of the memory that is being load. > Fixing the split than just making the output constraint as an early clobber > for this alternative. The split would first need to figure out which register > is overlapping with the address and then only emit that move last.
I was curious how strained that detection would be in practice, and in the end it didn't seem too bad. I pushed the following variant after testing on aarch64-linux-gnu. Thanks, Richard The LS64 movv8di pattern didn't handle loads that overlapped with the address register (unless the overlap happened to be in the last subload). gcc/ PR target/113550 * config/aarch64/aarch64-simd.md: In the movv8di splitter, check whether each split instruction is a load that clobbers the source address. Emit that instruction last if so. gcc/testsuite/ PR target/113550 * gcc.target/aarch64/pr113550.c: New test. --- gcc/config/aarch64/aarch64-simd.md | 18 ++++++-- gcc/testsuite/gcc.target/aarch64/pr113550.c | 48 +++++++++++++++++++++ 2 files changed, 62 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113550.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 48f0741e7d0..f036f6ce997 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -8221,11 +8221,21 @@ (define_split || (memory_operand (operands[0], V8DImode) && register_operand (operands[1], V8DImode))) { + std::pair<rtx, rtx> last_pair = {}; for (int offset = 0; offset < 64; offset += 16) - emit_move_insn (simplify_gen_subreg (TImode, operands[0], - V8DImode, offset), - simplify_gen_subreg (TImode, operands[1], - V8DImode, offset)); + { + std::pair<rtx, rtx> pair = { + simplify_gen_subreg (TImode, operands[0], V8DImode, offset), + simplify_gen_subreg (TImode, operands[1], V8DImode, offset) + }; + if (register_operand (pair.first, TImode) + && reg_overlap_mentioned_p (pair.first, pair.second)) + last_pair = pair; + else + emit_move_insn (pair.first, pair.second); + } + if (last_pair.first) + emit_move_insn (last_pair.first, last_pair.second); DONE; } else diff --git a/gcc/testsuite/gcc.target/aarch64/pr113550.c b/gcc/testsuite/gcc.target/aarch64/pr113550.c new file mode 100644 index 00000000000..0ff3c7b5c00 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr113550.c @@ -0,0 +1,48 @@ +/* { dg-options "-O" } */ +/* { dg-do run } */ + +#pragma GCC push_options +#pragma GCC target "+ls64" +#pragma GCC aarch64 "arm_acle.h" +#pragma GCC pop_options + +#define DEF_FUNCTION(NAME, ARGS) \ + __attribute__((noipa)) \ + __arm_data512_t \ + NAME ARGS \ + { \ + return *ptr; \ + } + +DEF_FUNCTION (f0, (__arm_data512_t *ptr)) +DEF_FUNCTION (f1, (int x0, __arm_data512_t *ptr)) +DEF_FUNCTION (f2, (int x0, int x1, __arm_data512_t *ptr)) +DEF_FUNCTION (f3, (int x0, int x1, int x2, __arm_data512_t *ptr)) +DEF_FUNCTION (f4, (int x0, int x1, int x2, int x3, __arm_data512_t *ptr)) +DEF_FUNCTION (f5, (int x0, int x1, int x2, int x3, int x4, + __arm_data512_t *ptr)) +DEF_FUNCTION (f6, (int x0, int x1, int x2, int x3, int x4, int x5, + __arm_data512_t *ptr)) +DEF_FUNCTION (f7, (int x0, int x1, int x2, int x3, int x4, int x5, int x6, + __arm_data512_t *ptr)) + +int +main (void) +{ + __arm_data512_t x = { 0, 10, 20, 30, 40, 50, 60, 70 }; + __arm_data512_t res[8] = + { + f0 (&x), + f1 (0, &x), + f2 (0, 1, &x), + f3 (0, 1, 2, &x), + f4 (0, 1, 2, 3, &x), + f5 (0, 1, 2, 3, 4, &x), + f6 (0, 1, 2, 3, 4, 5, &x), + f7 (0, 1, 2, 3, 4, 5, 6, &x) + }; + for (int i = 0; i < 8; ++i) + if (__builtin_memcmp (&x, &res[i], sizeof (x)) != 0) + __builtin_abort (); + return 0; +} -- 2.25.1