https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81356

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is already done during builtin expansion; it is just that the aarch64
backend makes the wrong choices.

Take:
void f(char *a)
{
  __builtin_strcpy (a, "Hi!");
}
On x86_64 (even with GCC 4.4) this produces:
        movl    $2189640, (%rdi)
        ret
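
The immediate is just the string bytes packed little-endian. A quick
standalone check (hypothetical file, not part of the testcase):

#include <assert.h>

int
main (void)
{
  /* 'H' = 0x48, 'i' = 0x69, '!' = 0x21, NUL = 0x00, packed LSB-first.  */
  unsigned v = (unsigned)'H' | (unsigned)'i' << 8 | (unsigned)'!' << 16;
  assert (v == 0x00216948 && v == 2189640);  /* the movl immediate above */
  return 0;
}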

While on aarch64 it produces:
f:
        adrp    x1, .LC0              // form the address of the
        add     x1, x1, :lo12:.LC0    //   string constant .LC0
        ldr     w1, [x1]              // load the 4 bytes from memory
        str     w1, [x0]              // store them to the destination
        ret

Why not just materialize the constant directly (for little-endian):
        mov     w1, #0x6948           // 'H' (0x48) and 'i' (0x69)
        movk    w1, #0x21, lsl #16    // '!' (0x21) and the trailing NUL
        str     w1, [x0]
        ret
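
For reference, the expansion being asked for amounts to a single 32-bit
store of the packed string bytes. A minimal C rendition (a sketch;
f_expanded and the memcpy idiom are illustrative, not GCC internals):

#include <stdint.h>
#include <string.h>

/* What the expanded strcpy boils down to on a little-endian target:
   one word store of "Hi!" plus its NUL terminator.  */
void
f_expanded (char *a)
{
  uint32_t v = 0x00216948;   /* NUL:'!':'i':'H' packed little-endian */
  memcpy (a, &v, sizeof v);  /* optimizes to a single word store */
}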

STORE_BY_PIECES controls this (MOVE_BY_PIECES and MOVE_RATIO are related),
but it is effectively disabled on aarch64; from aarch64.h:
/* MOVE_RATIO dictates when we will use the move_by_pieces infrastructure.
   move_by_pieces will continually copy the largest safe chunks.  So a
   7-byte copy is a 4-byte + 2-byte + byte copy.  This proves inefficient
   for both size and speed of copy, so we will instead use the "movmem"
   standard name to implement the copy.  This logic does not apply when
   targeting -mstrict-align, so keep a sensible default in that case.  */
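
To make the chunking concrete, here is the 7-byte case from that comment
written out in C (a hypothetical rendition of what move_by_pieces emits,
not GCC source):

#include <string.h>

/* A 7-byte copy done by pieces: largest safe chunks first,
   so 4 bytes + 2 bytes + 1 byte.  */
static void
copy7_by_pieces (char *dst, const char *src)
{
  memcpy (dst, src, 4);           /* 4-byte chunk  */
  memcpy (dst + 4, src + 4, 2);   /* 2-byte chunk  */
  dst[6] = src[6];                /* trailing byte */
}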
