https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66152

            Bug ID: 66152
           Summary: suboptimal load bytes to stack
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: SztfG at yandex dot ru
  Target Milestone: ---

For the following code:

void foo(char *);

void bar(void)
{
  char a[] = {0,1,2,3,4,5,6,7};
  foo(a);
}

gcc generates a separate movb instruction for each byte:
        subq    $24, %rsp
        movq    %rsp, %rdi
        movb    $0, (%rsp)
        movb    $1, 1(%rsp)
        movb    $2, 2(%rsp)
        movb    $3, 3(%rsp)
        movb    $4, 4(%rsp)
        movb    $5, 5(%rsp)
        movb    $6, 6(%rsp)
        movb    $7, 7(%rsp)

clang produces:
        pushq   %rax
        movabsq $506097522914230528, %rax # imm = 0x706050403020100
        movq    %rax, (%rsp)
        leaq    (%rsp), %rdi
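
The immediate in the clang output is the eight initializer bytes packed in
little-endian order. A minimal C sketch of that equivalence follows.
pack_le64 and init_with_one_store are hypothetical names used only for
illustration, and the memcpy variant assumes a little-endian host such as
x86-64. It is not a claim about how either compiler implements the
transformation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the report): pack eight bytes into a
   64-bit word so that bytes[0] lands in the least significant byte,
   which is the layout a little-endian 64-bit store writes to memory. */
uint64_t pack_le64(const unsigned char bytes[8])
{
    uint64_t word = 0;
    for (int i = 0; i < 8; i++)
        word |= (uint64_t)bytes[i] << (8 * i);
    return word;
}

/* Hypothetical helper: fill an 8-byte buffer with a single 64-bit copy
   instead of eight byte stores. Assumes a little-endian host. */
void init_with_one_store(char dst[8])
{
    const unsigned char init[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    uint64_t word = pack_le64(init);

    /* Same value as the movabsq immediate in the clang output. */
    assert(word == UINT64_C(0x0706050403020100));
    memcpy(dst, &word, sizeof word);
}
```

After init_with_one_store(buf) on a little-endian machine, buf holds the
same bytes as the array a in bar().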

For a 16-byte array, gcc 5.1.0 emits the array as a separate constant (.LC0)
and copies it to the stack using movdqa and movaps instructions, i.e.:
  char a[] = {0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7};
produces:
        subq    $24, %rsp
        movdqa  .LC0(%rip), %xmm0
        movq    %rsp, %rdi
        movaps  %xmm0, (%rsp)
But if I make it a 17-byte array, gcc again emits a movb instruction for each byte.
