http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53241
Bug #: 53241
Summary: Bad pre increment insn for ARM vfp store instructions
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: [email protected]
ReportedBy: [email protected]
Target: arm-unknown-linux-gnueabi
Compile the following code with options -march=armv7-a -mfloat-abi=softfp
-mfpu=neon -mthumb -Os:
void t0o(double* p0, double* p1, double* p2)
{
  int i;
  for (i=0; i<10; i++)
    p0[i+2] = p1[i] + p2[i];
}
GCC generates:
t0o:
adds r0, r0, #8
movs r3, #0
push {r4, r5, lr}
.L3:
adds r5, r1, r3
adds r4, r2, r3
fldd d17, [r5, #0]
fldd d16, [r4, #0]
faddd d16, d17, d16
adds r3, r3, #8
cmp r3, #80
fmrrd r4, r5, d16 // A
strd r4, [r0, #8]! // B
bne .L3
pop {r4, r5, pc}
If we change instructions A and B to
fstd d16, [r0, #8]
adds r0, r0, #8
the result is better in both performance and code size: adds is shorter than
strd, and the move between a vfp register and core registers may be expensive
on some implementations. The current result also needs two extra core
registers.
-O2 has the same problem.
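To make the claimed equivalence concrete, here is a minimal C sketch of the two
store forms. The helper names (store_pre_inc, store_then_add, forms_agree) are
hypothetical and exist only for illustration; they model the addressing
behaviour on a plain double array and say nothing about the actual code GCC
emits.

```c
#include <assert.h>
#include <stddef.h>

/* Models "strd r4, [r0, #8]!": advance the base by one double (8 bytes),
   then store through the updated base. */
static double *store_pre_inc(double *base, double value)
{
    base += 1;
    *base = value;
    return base;
}

/* Models "fstd d16, [r0, #8]" followed by "adds r0, r0, #8":
   store at base + 8 bytes, then advance the base as a separate step. */
static double *store_then_add(double *base, double value)
{
    base[1] = value;
    return base + 1;
}

/* Runs both forms for three iterations and reports whether the stored
   values and the final base offsets match. */
static int forms_agree(void)
{
    double a[4] = {0}, b[4] = {0};
    double *pa = a, *pb = b;

    for (int i = 0; i < 3; i++) {
        pa = store_pre_inc(pa, i + 1.0);
        pb = store_then_add(pb, i + 1.0);
    }
    assert(pa - a == 3);            /* base advanced once per store */

    for (size_t i = 0; i < 4; i++)
        if (a[i] != b[i])
            return 0;
    return (pa - a) == (pb - b);
}
```

Both helpers leave memory and the base pointer in the same state, so splitting
the pre increment into a plain offset store plus an add does not change the
loop's behaviour.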
Before pass auto_inc_dec, the code is in good shape:
(insn 42 41 43 3 (set (reg:SI 181 [ ivtmp.29 ])
        (plus:SI (reg:SI 181 [ ivtmp.29 ])
            (const_int 8 [0x8]))) 4 {*arm_addsi3}
     (nil))
...
(insn 48 47 49 3 (set (mem:DF (reg:SI 181 [ ivtmp.29 ]) [2 MEM[base: D.5019_41, offset: 0B]+0 S8 A64])
        (reg:DF 191 [ D.4979 ])) src/t0o.c:5 653 {*thumb2_movdf_vfp}
     (expr_list:REG_DEAD (reg:DF 191 [ D.4979 ])
        (nil)))
Pass auto_inc_dec wrongly combined these two insns:
(insn 48 47 49 3 (set (mem:DF (pre_inc:SI (reg:SI 181 [ ivtmp.29 ])) [2 MEM[base: D.5019_41, offset: 0B]+0 S8 A64])
        (reg:DF 191 [ D.4979 ])) src/t0o.c:5 653 {*thumb2_movdf_vfp}
     (expr_list:REG_INC (reg:SI 181 [ ivtmp.29 ])
        (expr_list:REG_DEAD (reg:DF 191 [ D.4979 ])
            (nil))))
Although arm supports pre increment for normal memory accesses, it doesn't
support pre increment for vfp store instructions. So in the later reload pass,
the vfp register is moved to core registers, which are then stored with pre
increment. We should prevent pre increment from being generated in such cases.
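As a rough illustration of the kind of check meant here, the following C
sketch models an address test that takes the register class of the stored
value into account. Every name in it (addr_kind, reg_class, store_address_ok)
is hypothetical; this is a toy model of the idea, not GCC's actual data
structures or target hooks.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; these are not GCC's real types. */
enum addr_kind { ADDR_REG, ADDR_REG_PLUS_OFFSET, ADDR_PRE_INC };
enum reg_class { CORE_REG, VFP_REG };

/* Returns true if a store of a value held in register class `cls` can use
   address form `kind` directly. As described in this report, vfp stores
   have no pre increment form, so that combination is rejected. */
static bool store_address_ok(enum reg_class cls, enum addr_kind kind)
{
    if (kind == ADDR_PRE_INC)
        return cls == CORE_REG;
    return true;
}
```

With a test of this shape, the pre increment form would be refused for the DF
store above while the plain register and register+offset forms would still be
accepted.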