mem-shift-canonical.c started failing after the fix for PR97684. We used to generate things like:
	add	x7, x1, x5, lsl 2	// 54	[c=4 l=4]  *add_lsl_di
	ld1	{v0.s}[1], [x7]		// 18	[c=4 l=4]  aarch64_simd_vec_setv4si/2

where the add_lsl_di was generated by LRA.  After PR97684 we generate:

	ldr	s1, [x1, x5, lsl 2]	// 17	[c=4 l=4]  *zero_extendsidi2_aarch64/3
	ins	v0.s[1], v1.s[0]	// 18	[c=4 l=4]  aarch64_simd_vec_setv4si/0

Which one is better is an interesting question.  However, it was really
only a fluke that we generated the original code.  The pseudo that
becomes s1 in the new code above has a REG_EQUIV note:

(insn 17 16 18 3 (set (reg:SI 111 [ MEM[(int *)b_25(D) + ivtmp.9_30 * 4] ])
        (mem:SI (plus:DI (mult:DI (reg:DI 101 [ ivtmp.9 ])
                    (const_int 4 [0x4]))
                (reg/v/f:DI 105 [ b ])) [2 MEM[(int *)b_25(D) + ivtmp.9_30 * 4]+0 S4 A32])) "gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c":21:18 52 {*movsi_aarch64}
     (expr_list:REG_EQUIV (mem:SI (plus:DI (mult:DI (reg:DI 101 [ ivtmp.9 ])
                    (const_int 4 [0x4]))
                (reg/v/f:DI 105 [ b ])) [2 MEM[(int *)b_25(D) + ivtmp.9_30 * 4]+0 S4 A32])
        (nil)))
(insn 18 17 19 3 (set (reg:V4SI 109 [ MEM[(int *)a_23(D) + ivtmp.9_30 * 4] ])
        (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 111 [ MEM[(int *)b_25(D) + ivtmp.9_30 * 4] ]))
            (subreg:V4SI (reg:SI 110 [ MEM[(int *)a_23(D) + ivtmp.9_30 * 4] ]) 0)
            (const_int 2 [0x2]))) "gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c":21:18 1683 {aarch64_simd_vec_setv4si}
     (expr_list:REG_DEAD (reg:SI 111 [ MEM[(int *)b_25(D) + ivtmp.9_30 * 4] ])
        (expr_list:REG_DEAD (reg:SI 110 [ MEM[(int *)a_23(D) + ivtmp.9_30 * 4] ])
            (nil))))

Before the PR, IRA didn't allocate a register to r111 and so LRA
rematerialised the REG_EQUIV note inside insn 18, leading to the reload.
Now IRA allocates a register instead.

So I think this is working as expected, in the sense that IRA is now
doing what the backend asked it to do.  If the backend prefers the first
version (and it might not), it needs to do more than it's currently
doing to promote the use of lane loads.  E.g. it should probably have
a combine define_split that splits the combination of insn 17 and
insn 18 into an ADD + an LD1.
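A rough, untested sketch of what such a splitter could look like for
the V4SI case (it would live somewhere like aarch64-simd.md; the
predicates, the guard and the hard-coded element size below are purely
illustrative, and a real pattern would also need to preserve the MEM
attributes and handle the other modes):

;; Untested, illustrative sketch only -- not part of this patch.
;; Match a lane insert whose scalar operand is loaded from a
;; [base + index << 2] address.  Inside a mem the canonical form of
;; the scaled index is (mult ... 4); when the address calculation is
;; split out into its own add, the canonical form is (ashift ... 2),
;; which is what *add_lsl_di matches.
(define_split
  [(set (match_operand:V4SI 0 "register_operand")
	(vec_merge:V4SI
	  (vec_duplicate:V4SI
	    (mem:SI (plus:DI
		      (mult:DI (match_operand:DI 1 "register_operand")
			       (const_int 4))
		      (match_operand:DI 2 "register_operand"))))
	  (match_operand:V4SI 3 "register_operand")
	  (match_operand:SI 4 "immediate_operand")))]
  "TARGET_SIMD && can_create_pseudo_p ()"
  [(set (match_dup 5)
	(plus:DI (ashift:DI (match_dup 1) (const_int 2))
		 (match_dup 2)))
   (set (match_dup 0)
	(vec_merge:V4SI (vec_duplicate:V4SI (mem:SI (match_dup 5)))
			(match_dup 3)
			(match_dup 4)))]
{
  operands[5] = gen_reg_rtx (DImode);
})

Whether forcing the ADD + LD1 form like that is actually a win is a
separate question, per the above.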
I think for now the best thing is to use a different approach to
triggering the original bug.  The asm in the new test ICEs with the
r11-2903 LRA patch reverted and passes with it applied.

Tested on aarch64-linux-gnu, pushed to trunk.

Richard


gcc/testsuite/
	* gcc.target/aarch64/mem-shift-canonical.c: Use an asm instead
	of relying on vectorisation.
---
 .../gcc.target/aarch64/mem-shift-canonical.c | 26 ++++---------------
 1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
index 47479ffff29..e3c83e8b32d 100644
--- a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
+++ b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
@@ -1,28 +1,12 @@
-/* This test is a copy of gcc.dg/torture/pr34330.c: here we are looking for
-   specific patterns being matched in the AArch64 backend.  */
-
 /* { dg-do compile } */
-/* { dg-options "-Os -ftree-vectorize -dp" } */
+/* { dg-options "-O2 -dp" } */
 /* { dg-require-effective-target lp64 } */
-
-struct T
-{
-  int t;
-  struct { short s1, s2, s3, s4; } *s;
-};
-
 void
-foo (int *a, int *b, int *c, int *d, struct T *e)
+f (int x0, int *x1, long x2)
 {
-  int i;
-  for (i = 0; i < e->t; i++)
-    {
-      e->s[i].s1 = a[i];
-      e->s[i].s2 = b[i];
-      e->s[i].s3 = c[i];
-      e->s[i].s4 = d[i];
-    }
+  asm volatile ("// foo %0 %1"
+		:: "r,w" (x0), "Q,m" (x1[x2]));
 }
-/* { dg-final { scan-assembler-times "add_lsl_di" 3 } } */
+/* { dg-final { scan-assembler-times "add_lsl_di" 1 } } */