https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Here is what the first testcase looks like at -O1 -mstrict-align on
aarch64-linux-gnu for GCC 10.3.0:

test:
.LFB1:
        .cfi_startproc
        adrp    x0, output_len
        add     x1, x0, :lo12:output_len
        ldrb    w2, [x0, #:lo12:output_len]
        ldrb    w0, [x1, 1]
        orr     x2, x2, x0, lsl 8
        ldrb    w0, [x1, 2]
        orr     x0, x2, x0, lsl 16
        ldrb    w1, [x1, 3]
        orr     w0, w0, w1, lsl 24
        ret
        .cfi_endproc
.LFE1:
        .size   test, .-test
        .ident  "GCC: (GNU) 10.3.0"
        .section        .note.GNU-stack,"",@progbits

This is doing the correct thing in splitting up the load into byte loads.
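
For readers less familiar with AArch64, the ldrb/orr sequence above is roughly equivalent to the C below. This is only a sketch: the function name load_u32_bytewise and its pointer parameter are hypothetical and not taken from the testcase, which reads the global output_len directly. It assumes a little-endian target, matching the shift amounts in the listing.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the testcase: assemble a 32-bit
   little-endian value from four single-byte loads, so that no access
   wider than one byte is ever made to a possibly unaligned address. */
uint32_t load_u32_bytewise(const unsigned char *p)
{
    uint32_t v = p[0];            /* ldrb, bits 0..7              */
    v |= (uint32_t)p[1] << 8;     /* ldrb + orr ..., lsl 8        */
    v |= (uint32_t)p[2] << 16;    /* ldrb + orr ..., lsl 16       */
    v |= (uint32_t)p[3] << 24;    /* ldrb + orr ..., lsl 24       */
    return v;
}
```

Calling it on an odd address, for example buf + 1 where buf holds the bytes 0xAA 0x78 0x56 0x34 0x12, yields 0x12345678 without performing a misaligned word load.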