https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81356
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed              |Added
----------------------------------------------------------------------------
          Component|tree-optimization    |target

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is already done during builtin expansion; it is just that the aarch64
backend makes the wrong choices.  Take:

void f(char *a)
{
  __builtin_strcpy (a, "Hi!");
}

On x86_64 (even with GCC 4.4) this produces:

        movl    $2189640, (%rdi)
        ret

while on aarch64 it produces:

f:
        adrp    x1, .LC0
        add     x1, x1, :lo12:.LC0
        ldr     w1, [x1]
        str     w1, [x0]
        ret

Why not just (for little-endian):

        movz    w1, #0x6948
        movk    w1, #0x21, lsl #16
        str     w1, [x0]
        ret

STORE_BY_PIECES controls this (MOVE_BY_PIECES and MOVE_RATIO are related),
but it is disabled on aarch64:

/* MOVE_RATIO dictates when we will use the move_by_pieces infrastructure.
   move_by_pieces will continually copy the largest safe chunks.  So a
   7-byte copy is a 4-byte + 2-byte + byte copy.  This proves inefficient
   for both size and speed of copy, so we will instead use the "movmem"
   standard name to implement the copy.  This logic does not apply when
   targeting -mstrict-align, so keep a sensible default in that case.  */
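For reference, the x86_64 immediate is simply the four bytes of "Hi!"
(including the NUL terminator) packed into one little-endian 32-bit word.
A minimal, self-contained C check of that constant (illustrative only, and
it assumes a little-endian host):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
  /* "Hi!" plus its NUL terminator occupies exactly four bytes.  */
  const char s[4] = "Hi!";
  uint32_t word;

  /* Reinterpret the four bytes as one 32-bit word, the same value a
     single store-by-pieces store would use.  On a little-endian host
     this is 0x00216948, i.e. the 2189640 seen in the x86_64 output.  */
  memcpy(&word, s, sizeof word);
  printf("0x%08x = %u\n", (unsigned) word, (unsigned) word);
  return 0;
}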
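The quoted comment sits next to the MOVE_RATIO definition in
gcc/config/aarch64/aarch64.h.  As a rough sketch of what that definition
looked like at the time (the exact constants are an assumption and vary
between releases), the ratio is only kept at a sensible level under
-mstrict-align, which is why store_by_pieces does not fire in the
ordinary case:

/* Sketch only -- not a verbatim copy of aarch64.h; the exact constants
   are an assumption and differ between GCC releases.  The point is that
   unless -mstrict-align is in effect, MOVE_RATIO stays very small, so
   move_by_pieces/store_by_pieces is never chosen and the backend falls
   back to the "movmem" expansion instead.  */
#define MOVE_RATIO(speed) \
  (!STRICT_ALIGNMENT ? 2 : (((speed) ? 15 : AARCH64_CALL_RATIO) / 2))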