https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100

            Bug ID: 103100
           Summary: unaligned access generated when zero-initializing
                    large locals with SIMD-instructions and -O2
                    -mstrict-align
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: felix at breitweiser dot de
  Target Milestone: ---

Created attachment 51738
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51738&action=edit
source code that generates the faulty assembly

When zero-initializing large local variables, gcc 11.2 (with -O2 and -O3) uses
SIMD registers to store a pair of 16-byte registers to memory at once. In
doing so, gcc can generate stores whose address is not 16-byte aligned, even
though the aarch64 architecture requires memory accesses to be 16-byte aligned
when using the full 16-byte SIMD registers. This happens even with
-mstrict-align enabled.

For example:

static void (*use)(unsigned char*); // to suppress optimizations

extern "C" void _start() {
    unsigned char t2[216]={};
    use(t2);
}

When compiled with "gcc -save-temps -O2 -mstrict-align", this generates the
following assembly:
_start:
        stp     x29, x30, [sp, #-240]! // assuming sp is aligned to 16 bytes
                                       // here
        mov     x1, #0x0
        movi    v0.4s, #0x0
        add     x2, sp, #0x28 // the value in x2 is 8-byte aligned, but not
                              // 16-byte aligned
        mov     x29, sp
        stp     xzr, xzr, [sp, #24]
        add     x0, sp, #0x18
        stp     q0, q0, [x2] // x2 is not 16-byte aligned, so the store is not
                             // aligned
        add     x2, sp, #0x48
        str     xzr, [sp, #232]
        stp     q0, q0, [x2]
        add     x2, sp, #0x68
        stp     q0, q0, [x2]
        add     x2, sp, #0x88
        stp     q0, q0, [x2]
        add     x2, sp, #0xa8
        stp     q0, q0, [x2]
        add     x2, sp, #0xc8
        stp     q0, q0, [x2]
        blr     x1
        ldp     x29, x30, [sp], #240
        ret
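
The offsets in the listing can be checked mechanically. The following is only
a minimal sketch and not part of the attached source: the names
kQStoreOffsets, is_aligned and count_aligned_q_stores are made up for
illustration, and it assumes sp itself is 16-byte aligned at that point, as
noted in the first comment of the listing. It needs C++14 or later.

```cpp
#include <cstddef>

// Offsets from sp that the listing above loads into x2 before each
// "stp q0, q0, [x2]". Copied by hand from the assembly.
constexpr std::size_t kQStoreOffsets[] = {0x28, 0x48, 0x68, 0x88, 0xa8, 0xc8};

// True if `offset` is a multiple of `alignment`. With sp assumed to be
// 16-byte aligned, this equals the alignment of the address sp + offset
// for any alignment up to 16.
constexpr bool is_aligned(std::size_t offset, std::size_t alignment) {
    return offset % alignment == 0;
}

// Counts how many of the q-register pair stores start on a 16-byte boundary.
constexpr std::size_t count_aligned_q_stores() {
    std::size_t n = 0;
    for (std::size_t off : kQStoreOffsets) {
        if (is_aligned(off, 16)) {
            ++n;
        }
    }
    return n;
}

// 0x28 = 40 = 2 * 16 + 8: 8-byte aligned, but not 16-byte aligned.
static_assert(is_aligned(0x28, 8), "first q store is 8-byte aligned");
static_assert(!is_aligned(0x28, 16), "first q store is not 16-byte aligned");

// Every other offset is 0x20 further on, so none reaches a 16-byte boundary.
static_assert(count_aligned_q_stores() == 0,
              "no q store in the listing is 16-byte aligned");
```

All six offsets are congruent to 8 modulo 16, so under that assumption every
stp of q0 in the listing is misaligned, not just the first one.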

I have seen https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71727 and, even though
that is marked as fixed, this issue persists in gcc 11.2.
