http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29294
Siarhei Siamashka <siarhei.siamashka at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siarhei.siamashka at gmail
                   |                            |dot com

--- Comment #9 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2012-12-20 04:45:10 UTC ---
(In reply to comment #3)
> Actually this case should not be using post modify at all except how many bits
> does ARM have to use for an offset? I thought 16bits which means you don't need
> that at all and GCC should generate it without an increment. Oh and this is a
> RTL opt issue.

It seems that gcc 4.7.2 and 4.8.0 20121219 (experimental) already do this,
which hides the postincrement issue for the currently attached testcase.
However, postincrement is still a performance problem on ARM. The code I'm
having trouble with is the following:

/*******************************************/
typedef unsigned long long T;

void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0) {
        *buf++ = v;
        *buf++ = v;
    }
}
/*******************************************/

$ arm-none-eabi-gcc-4.7.2 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o

00000000 <fill>:
   0:   e2511010        subs    r1, r1, #16
   4:   412fff1e        bxmi    lr
   8:   e2511010        subs    r1, r1, #16
   c:   e1c020f0        strd    r2, [r0]
  10:   e1c020f8        strd    r2, [r0, #8]
  14:   e2800010        add     r0, r0, #16
  18:   5afffffa        bpl     8 <fill+0x8>
  1c:   e12fff1e        bx      lr

$ arm-none-eabi-gcc-4.8.0 -O2 -mcpu=cortex-a8 -c test.c
$ objdump -d test.o

00000000 <fill>:
   0:   e351000f        cmp     r1, #15
   4:   d12fff1e        bxle    lr
   8:   e2411010        sub     r1, r1, #16
   c:   e280c010        add     ip, r0, #16
  10:   e3c1100f        bic     r1, r1, #15
  14:   e08c1001        add     r1, ip, r1
  18:   e1c020f0        strd    r2, [r0]
  1c:   e2800010        add     r0, r0, #16
  20:   e14020f8        strd    r2, [r0, #-8]
  24:   e1500001        cmp     r0, r1
  28:   1afffffa        bne     18 <fill+0x18>
  2c:   e12fff1e        bx      lr
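To make the two listings easier to compare, here is a rough C sketch of the loop shapes the disassembly appears to implement. `fill_offsets` and `fill_end_pointer` are hypothetical names introduced only for illustration. They are hand readings of the objdump output, not code taken from GCC, and compiling them is not guaranteed to reproduce either listing. The sketch assumes, as the testcase implies, that `n` is a byte count and each iteration stores two 8-byte values.

```c
typedef unsigned long long T;

/* The testcase from the report, unchanged. */
void fill(T *buf, int n, T v)
{
    while ((n -= 16) >= 0) {
        *buf++ = v;
        *buf++ = v;
    }
}

/* Hypothetical source-level reading of the 4.7.2 listing:
   both stores use base+offset addressing, then one pointer bump. */
void fill_offsets(T *buf, int n, T v)
{
    while ((n -= 16) >= 0) {       /* subs r1, r1, #16 ; bpl      */
        buf[0] = v;                /* strd r2, [r0]                */
        buf[1] = v;                /* strd r2, [r0, #8]            */
        buf += 2;                  /* add  r0, r0, #16             */
    }
}

/* Hypothetical source-level reading of the 4.8.0 listing:
   the counter is replaced by a precomputed end pointer, and the
   base is bumped between the two stores, so the second store
   needs a negative offset. */
void fill_end_pointer(T *buf, int n, T v)
{
    if (n <= 15)                   /* cmp r1, #15 ; bxle lr        */
        return;
    /* sub r1, r1, #16 ; add ip, r0, #16 ; bic r1, r1, #15 ;
       add r1, ip, r1  ->  end = buf + 16 + ((n - 16) & ~15) bytes */
    T *end = (T *)((char *)buf + 16 + ((n - 16) & ~15));
    T *p = buf;
    do {
        p[0] = v;                  /* strd r2, [r0]                */
        p += 2;                    /* add  r0, r0, #16             */
        p[-1] = v;                 /* strd r2, [r0, #-8]           */
    } while (p != end);            /* cmp r0, r1 ; bne             */
}
```

For `n >= 16`, the end pointer lies `16 * floor(n / 16)` bytes past `buf`, which matches the number of iterations of the original `while` loop. The point of interest in the 4.8.0 shape is that the pointer update still sits between the two stores rather than being folded into a post-increment or a single trailing add.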