https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100
Bug ID: 103100
Summary: unaligned access generated when zero-initializing large locals with SIMD-instructions and -O2 -mstrict-align
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: felix at breitweiser dot de
Target Milestone: ---

Created attachment 51738 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51738&action=edit
source code that generates the faulty assembly

When zero-initializing large local variables, gcc 11.2 (with -O2 and -O3) uses SIMD registers to store a pair of 16-byte registers into memory at once. When doing so, gcc can generate code that does not access memory on a 16-byte aligned boundary, even though, with strict alignment checking in effect (the case -mstrict-align exists for), the aarch64 architecture requires accesses through the full 16-byte SIMD registers to be 16-byte aligned. This happens even with -mstrict-align enabled.

For example:

    static void (*use)(unsigned char*); // to suppress optimizations

    extern "C" void _start() {
        unsigned char t2[216] = {};
        use(t2);
    }

when compiled with "gcc -save-temps -O2 -mstrict-align", generates the following assembly:

    _start:
        stp  x29, x30, [sp, #-240]!   // assuming sp is aligned to 16 bytes here
        mov  x1, #0x0
        movi v0.4s, #0x0
        add  x2, sp, #0x28            // the value in x2 is 8-byte aligned, but not 16-byte aligned
        mov  x29, sp
        stp  xzr, xzr, [sp, #24]
        add  x0, sp, #0x18
        stp  q0, q0, [x2]             // x2 is not 16-byte aligned, so the store is not aligned
        add  x2, sp, #0x48
        str  xzr, [sp, #232]
        stp  q0, q0, [x2]
        add  x2, sp, #0x68
        stp  q0, q0, [x2]
        add  x2, sp, #0x88
        stp  q0, q0, [x2]
        add  x2, sp, #0xa8
        stp  q0, q0, [x2]
        add  x2, sp, #0xc8
        stp  q0, q0, [x2]
        blr  x1
        ldp  x29, x30, [sp], #240
        ret

I have seen https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71727; even though that bug is marked as fixed, this issue persists in gcc 11.2.