Hi,
The gcc.target/aarch64/ldp_stp_3.c test fails on aarch64-none-elf.
Instead of merging the loads into an ldp, the compiler generates:
foo:
adrp x1, .LANCHOR0
add x1, x1, :lo12:.LANCHOR0
ldr w0, [x1, 4]
ldr w3, [x1, 20]
ldr w2, [x1, 32]
ldr w1, [x1, 16]
add x2, x3, x2
add x0, x0, x1
add x0, x2, x0
ret
Once register allocation decides to load [x1, 16] into x1 (w1), as below:
14: x0:DI = zero_extend([x1:DI+0x4])
7: x3:DI = zero_extend([x1:DI+0x14])
10: x2:DI = zero_extend([x1:DI+0x20])
17: x1:DI = zero_extend([x1:DI+0x10])
Instructions 14/7/10 are anti-dependent on insn 17, but sched_fusion orders
the ready list (14/7/10) in ascending order of address. As a result, insn 10
intervenes between insns 7 and 17, so the pair cannot be fused.
This patch fixes the failures by making the test cases less vulnerable to
this scheduling order. One possible real fix would be to move sched_fusion
after regrename, which does help a lot; I didn't do that because regrename
is currently disabled.
Tested on aarch64-elf. Is it OK?
Thanks,
bin
gcc/testsuite/ChangeLog
2014-12-11 Bin Cheng <bin.ch...@arm.com>
* gcc.target/aarch64/ldp_stp_2.c: Make test less vulnerable.
* gcc.target/aarch64/ldp_stp_3.c: Ditto.
Index: gcc/testsuite/gcc.target/aarch64/ldp_stp_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/ldp_stp_2.c (revision 218558)
+++ gcc/testsuite/gcc.target/aarch64/ldp_stp_2.c (working copy)
@@ -7,10 +7,8 @@ long long
foo ()
{
long long ll = 0;
- ll += arr[0][1];
ll += arr[1][0];
ll += arr[1][1];
- ll += arr[2][0];
return ll;
}
Index: gcc/testsuite/gcc.target/aarch64/ldp_stp_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/ldp_stp_3.c (revision 218558)
+++ gcc/testsuite/gcc.target/aarch64/ldp_stp_3.c (working copy)
@@ -7,10 +7,8 @@ unsigned long long
foo ()
{
unsigned long long ll = 0;
- ll += arr[0][1];
ll += arr[1][0];
ll += arr[1][1];
- ll += arr[2][0];
return ll;
}