https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100219
Bug ID: 100219
Summary: Arm/Cortex-M: Suboptimal code returning unaligned struct with non-empty stack frame
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: matthijs at stdin dot nl
Target Milestone: ---
Consider the program below, which defines functions returning a struct with
two members, either from a literal value or by forwarding the return value
of another function. When the struct has no alignment attribute, GCC generates
suboptimal code: the struct, which is returned in a single register (r0), is
split into its members, which are then reassembled into that same register,
where doing absolutely nothing would have been correct. Giving the struct an
explicit alignment somehow prevents this problem from occurring.
Consider this program:
$ cat Foo.c
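struct Result { char a, b; }
#if defined(ALIGN)
__attribute__((aligned(ALIGN)))
#endif
;

/* Only declared; the pointer argument lets func1 pass a stack address. */
struct Result other(const int*);

/* Forwards other()'s result; taking &x forces a non-empty stack frame. */
struct Result func1(void) {
    int x;
    return other(&x);
}

/* Returns a literal struct. */
struct Result func2(void) {
    struct Result y = {0x12, 0x34};
    return y;
}

/* Forwards other()'s result without any stack variables. */
struct Result func3(void) {
    return other(0);
}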
This produces the following code:
$ arm-linux-gnueabi-gcc-10 --version
arm-linux-gnueabi-gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
$ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c && objdump -d Foo.o
00000000 <func1>:
0: b500 push {lr}
2: b083 sub sp, #12
4: a801 add r0, sp, #4
6: f7ff fffe bl 0 <other>
a: 4603 mov r3, r0
c: b2da uxtb r2, r3
e: 2000 movs r0, #0
10: f362 0007 bfi r0, r2, #0, #8
14: f3c3 2307 ubfx r3, r3, #8, #8
18: f363 200f bfi r0, r3, #8, #8
1c: b003 add sp, #12
1e: f85d fb04 ldr.w pc, [sp], #4
22: bf00 nop
00000024 <func2>:
24: f243 4312 movw r3, #13330 ; 0x3412
28: f003 0212 and.w r2, r3, #18
2c: 2000 movs r0, #0
2e: f362 0007 bfi r0, r2, #0, #8
32: 0a1b lsrs r3, r3, #8
34: b082 sub sp, #8
36: f363 200f bfi r0, r3, #8, #8
3a: b002 add sp, #8
3c: 4770 bx lr
3e: bf00 nop
00000040 <func3>:
40: b082 sub sp, #8
42: 2000 movs r0, #0
44: b002 add sp, #8
46: f7ff bffe b.w 0 <other>
4a: bf00 nop
Note especially func2, which correctly materializes the struct as a single
literal (movw r3, #0x3412) and then breaks it apart and rebuilds it anyway.
Note that I added -fno-stack-protector to make the generated code more concise,
but the problem occurs even without this option.
The alignment somehow influences this, since adding an explicit alignment makes
the problem disappear:
$ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c -DALIGN=2 && objdump -d Foo.o
Foo.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <func1>:
0: b500 push {lr}
2: b083 sub sp, #12
4: a801 add r0, sp, #4
6: f7ff fffe bl 0 <other>
a: b003 add sp, #12
c: f85d fb04 ldr.w pc, [sp], #4
00000010 <func2>:
10: f243 4012 movw r0, #13330 ; 0x3412
14: 4770 bx lr
16: bf00 nop
00000018 <func3>:
18: 2000 movs r0, #0
1a: f7ff bffe b.w 0 <other>
1e: bf00 nop
Other things I've observed:
- With ALIGN=2 or ALIGN=4, the problem disappears as shown above.
  ALIGN=1 is equivalent to no alignment. ALIGN=8 also makes the problem
  disappear, but it seems this causes the return value to be passed in memory
  rather than directly in r0.
- With -mcpu=arm8, arm7tdmi, and some other Arm CPUs I tried, the problem
  disappears. With all Cortex variants I tried, the problem remains, though it
  is sometimes slightly less severe.
- I could not reproduce this on x86_64.
- Using a struct with just 1 char, the problem disappears.
- Using a struct with 4 chars, the problem stays (and becomes more pronounced
because there's more work to rebuild the struct).
- Using a struct with 2 shorts, the problem disappears for func2, but stays
for func1.
- Writing something equivalent in C++, the problem also appears (I originally
saw this problem in C++ and then tried reproducing in C).
- When running with -Os, the problem disappears for func2 but stays for func1.
Also note that in almost all cases (except with ALIGN=4 and no stack
variables), the stack frame seems to be 8 bytes larger than I'd expect,
and in some cases there are pointless add/sub instructions adjusting the
stack for no reason apparent to me.
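A rough accounting of the func1 frame, assuming only the AAPCS requirement that sp be 8-byte aligned at public interfaces: lr and x need 4 bytes each, which is already a multiple of 8, while the generated code (with or without ALIGN=2) reserves 16.

```latex
\begin{aligned}
\text{needed}   &= \underbrace{4}_{\texttt{lr}} + \underbrace{4}_{\texttt{int x}} = 8 \\
\text{observed} &= \underbrace{4}_{\texttt{push \{lr\}}} + \underbrace{12}_{\texttt{sub sp, \#12}} = 16 \\
\text{excess}   &= 16 - 8 = 8
\end{aligned}
```

The same 8-byte excess shows up in the unaligned func2 and func3, where the sub sp, #8 / add sp, #8 pair reserves space that nothing uses.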