https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66152
Bug ID: 66152
Summary: suboptimal load bytes to stack
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: SztfG at yandex dot ru
Target Milestone: ---

For this code:

    void foo(char *);
    void bar(void)
    {
      char a[] = {0,1,2,3,4,5,6,7};
      foo(a);
    }

gcc generates one movb instruction per byte:

    subq $24, %rsp
    movq %rsp, %rdi
    movb $0, (%rsp)
    movb $1, 1(%rsp)
    movb $2, 2(%rsp)
    movb $3, 3(%rsp)
    movb $4, 4(%rsp)
    movb $5, 5(%rsp)
    movb $6, 6(%rsp)
    movb $7, 7(%rsp)

clang produces:

    pushq %rax
    movabsq $506097522914230528, %rax # imm = 0x706050403020100
    movq %rax, (%rsp)
    leaq (%rsp), %rdi

For a 16-byte array, gcc 5.1.0 builds the array separately (as the constant .LC0) and copies it using movdqa and movaps instructions, i.e.:

    char a[] = {0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7};

produces:

    subq $24, %rsp
    movdqa .LC0(%rip), %xmm0
    movq %rsp, %rdi
    movaps %xmm0, (%rsp)

But if I make it a 17-byte array, gcc again emits a movb instruction for each byte.
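
For reference, here is a minimal sketch of where the immediate in the clang output comes from: the eight initializer bytes are packed in little-endian order into a single 64-bit value, which is then written with one store. The helper names pack_le64 and fill_single_store are hypothetical and are not part of gcc or clang. The single-store variant assumes a little-endian target such as x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: combine 8 bytes into one 64-bit value,
   with b[0] as the least significant byte (little-endian order). */
uint64_t pack_le64(const unsigned char *b)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Hypothetical helper: initialize a[0..7] to {0,...,7} with a single
   64-bit write instead of eight byte writes.
   Assumption: the host is little-endian (true for x86-64). */
void fill_single_store(char a[8])
{
    static const unsigned char init[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    uint64_t imm = pack_le64(init);

    /* Same constant as the movabsq immediate in the clang output. */
    assert(imm == 0x0706050403020100ULL);

    /* memcpy of a fixed 8-byte size is normally lowered to one movq. */
    memcpy(a, &imm, sizeof imm);
}
```

Calling fill_single_store(a) before foo(a) would be a source-level workaround. Whether a given gcc version lowers it to a single movq depends on the optimization level and was not checked here.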