https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103500
Bug ID: 103500
Summary: Stack slots for overaligned stack temporaries are not properly aligned
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: acoplan at gcc dot gnu.org
Target Milestone: ---

gcc/testsuite/gcc.target/aarch64/aapcs64/rec_align-8.c has the following
struct declaration:

    /* The alignment also gives this size 32, so will be passed by reference. */
    typedef struct __attribute__ ((__aligned__ (32))) {
      long x;
      long y;
    } overaligned;

and as the comment suggests, AAPCS64 requires that the struct is passed by
reference. The test proceeds to check that the copy of the passed struct is
32-byte aligned (as required by the PCS), with:

    long addr = ((long) &x1) & 31;
    if (addr != 0)
      {
        __builtin_printf ("Alignment was %d\n", addr);
        abort ();
      }

but because GCC "knows" the struct is aligned, the expression assigned to addr
is folded to zero by the frontend (even at -O0). With -fdump-tree-original I
see:

    long int addr = 0;

Moreover, it turns out that GCC is not actually aligning the struct copy
properly in the call here. Consider the simplified testcase:

    typedef struct __attribute__((aligned(32))) {
      long x,y;
    } S;
    S x;
    void f(S);
    void g(void) { f(x); }

for which we currently generate (at -O2):

    g:
            adrp    x1, .LANCHOR0
            add     x1, x1, :lo12:.LANCHOR0
            stp     x29, x30, [sp, -48]!
            mov     x29, sp
            ldp     q0, q1, [x1]
            add     x0, sp, 16
            stp     q0, q1, [sp, 16]
            bl      f
            ldp     x29, x30, [sp], 48
            ret

i.e. the struct is stored at sp + 16, but the stack pointer is only guaranteed
to be 16-byte aligned, so the stack slot here is only 16-byte aligned.

In fact, tweaking the testcase (rec_align-8.c) to __builtin_snprintf the
pointer into a buffer and __builtin_sscanf it out again before performing the
alignment check (to prevent the folding by the frontend), we can see the
execution test failing (sporadically, if ASLR is enabled) on aarch64 linux.
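The pointer round-trip described above could look roughly like the sketch
below. This is not the actual change made to rec_align-8.c; the helper names
launder_address and misalignment are hypothetical, and plain snprintf/sscanf
are used in place of the __builtin_ variants. The idea is only that, after the
address has passed through text, the compiler can no longer assume anything
about its low bits from the declared alignment of the type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print the pointer to a buffer and parse it back,
   so the declared alignment of the pointee is not visible to the
   compiler when the low bits are inspected.  */
static uintptr_t launder_address(const void *p)
{
    char buf[64];
    void *out = 0;

    snprintf(buf, sizeof buf, "%p", (void *) p);
    sscanf(buf, "%p", &out);
    return (uintptr_t) out;
}

/* Hypothetical helper: number of bytes by which p misses the requested
   alignment (0 means properly aligned).  align must be a power of two.  */
static unsigned misalignment(const void *p, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (unsigned) (launder_address(p) & (align - 1));
}
```

In the callee of the test, a check along the lines of
"if (misalignment (&x1, 32) != 0) abort ();" would then be evaluated at run
time instead of being folded away.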

Note that for the related:

    void f2(S*);
    void g2(void) {
      S x;
      f2(&x);
    }

we generate:

    g2:
            stp     x29, x30, [sp, -64]!
            add     x0, sp, 47
            mov     x29, sp
            and     x0, x0, -32
            bl      f2
            ldp     x29, x30, [sp], 64
            ret

i.e. we actually align the stack slot properly. We should do the same for the
PCS-mandated passed-by-reference struct.

I have a patch to fix the issue in the mid-end which I will post to the list
shortly to get some feedback.
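
To make the difference between the two sequences concrete, here is a small
sketch of the address arithmetic in C. The function names (align_up, g_slot,
g2_slot) are made up for illustration and the sp values in the comments are
arbitrary 16-byte-aligned examples; this only models the add/and instructions
quoted above, not anything inside GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two).  */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Slot address used by g: "add x0, sp, 16".  No realignment is done, so
   the result is 32-byte aligned only when sp + 16 happens to be.  */
static uintptr_t g_slot(uintptr_t sp)
{
    return sp + 16;
}

/* Slot address used by g2: "add x0, sp, 47" then "and x0, x0, -32",
   which is align_up (sp + 16, 32).  */
static uintptr_t g2_slot(uintptr_t sp)
{
    return (sp + 47) & ~(uintptr_t) 31;
}
```

For a 16-byte-aligned sp such as 0x1000, g_slot gives 0x1010 (misaligned by
16) while g2_slot gives 0x1020; for sp = 0x1010 both give 0x1020. Since only
16-byte alignment of sp is guaranteed, either case can occur for g, which is
consistent with the sporadic failures seen under ASLR.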