http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58623
            Bug ID: 58623
           Summary: lack of ldp/stp optimization
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: b.grayson at samsung dot com
            Target: AArch64
             Build: 20130602

The following C code:

    long long a, b;
    int c, d;
    int foo() { return a+b; }
    int bar() { return c+d; }

generates this assembly code under -O3 -fsection-anchors -fno-common:

    foo:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
        ldr     x2, [x1]
        ldr     x0, [x1,8]
        add     w0, w2, w0
        ret
    bar:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
        ldr     w2, [x1,16]
        ldr     w0, [x1,20]
        add     w0, w2, w0
        ret

Note that in foo() the ldr x2 and ldr x0 could have been merged into a
single ldp. Similarly, in bar() the ldr w2 and ldr w0 (32-bit loads) could
have been merged into a single ldp. The same optimization applies to stores
as well.

I am not sure if this would be handled by the proposed (but apparently not
accepted) patch from March 2013:
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg01051.html
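
For the store side, a minimal test case along the same lines might look like
the sketch below. The function names store_foo and store_bar are illustrative
and not part of the original report, and the generated assembly has not been
checked here. The only expectation is that each pair of adjacent stores is a
candidate for a single stp.

```c
/* Same globals as in the load test case; with -fsection-anchors and
 * -fno-common they are expected to be placed next to each other. */
long long a, b;
int c, d;

/* Two adjacent 64-bit stores: candidate for one "stp xN, xM, [base]". */
void store_foo(long long x, long long y)
{
    a = x;
    b = y;
}

/* Two adjacent 32-bit stores: candidate for one "stp wN, wM, [base, off]". */
void store_bar(int x, int y)
{
    c = x;
    d = y;
}
```

Compiling this with the same flags (-O3 -fsection-anchors -fno-common) and
inspecting the output with -S should show whether the str pairs are merged.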