https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70784
Bug ID: 70784
Summary: Merge multiple short stores of immediates into wider stores
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
CC: ramana at gcc dot gnu.org
Target Milestone: ---

Consider the code:

struct bar {
  int a;
  char b;
  char c;
  char d;
  char e;
  char f;
  char g;
}; // packed 64-bit structure

void
foozero (struct bar *p)
{
  p->b = 0;
  p->a = 0;
  p->c = 0;
  p->d = 0;
  p->e = 0;
}

On aarch64 we currently generate:

foozero:
        str     wzr, [x0]
        strb    wzr, [x0, 4]
        strb    wzr, [x0, 5]
        strb    wzr, [x0, 6]
        strb    wzr, [x0, 7]
        ret

But we could generate a single 64-bit store:

foozero:
        str     xzr, [x0]
        ret

Also, for:

void
foo (struct bar *p)
{
  p->b = 0;
  p->a = 0;
  p->c = 0;
  p->d = 1;
  p->e = 0;
}

we could generate:

foo:
        mov     x1, #0x1000000000000
        str     x1, [x0]

Other targets could benefit from this too.
x86_64 currently generates for 'foo':

foo:
.LFB0:
        .cfi_startproc
        movl    $0, (%rdi)
        movb    $0, 4(%rdi)
        movb    $0, 5(%rdi)
        movb    $1, 6(%rdi)
        movb    $0, 7(%rdi)
        ret

but could generate:

foo:
.LFB0:
        .cfi_startproc
        movabsq $281474976710656, %rax
        movq    %rax, (%rdi)
        ret
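To illustrate why the merged immediate for 'foo' is 0x1000000000000, here is a minimal sketch in C. It assumes a little-endian target with a 4-byte int, so that p->d sits at byte offset 6 and contributes 1 << 48 to the 64-bit value. The names FOO_MERGED_IMM, foo_narrow, foo_wide and stores_match are hypothetical helpers for this illustration only; they are not part of GCC or of any proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bar { int a; char b; char c; char d; char e; char f; char g; };

/* Layout assumptions for this sketch: a occupies bytes 0..3, d is byte 6. */
static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");
static_assert(offsetof(struct bar, d) == 6, "sketch assumes d at offset 6");

/* Little-endian assumption: byte 6 holds bits 48..55 of the 64-bit value. */
#define FOO_MERGED_IMM (UINT64_C(1) << 48) /* 0x1000000000000 */

/* The five narrow stores from the report. */
static void foo_narrow(struct bar *p)
{
    p->b = 0;
    p->a = 0;
    p->c = 0;
    p->d = 1;
    p->e = 0;
}

/* Hypothetical merged form: one 64-bit store covering bytes 0..7. */
static void foo_wide(struct bar *p)
{
    uint64_t v = FOO_MERGED_IMM;
    memcpy(p, &v, sizeof v);
}

/* Returns 1 if both versions leave the same bytes behind, 0 otherwise. */
int stores_match(void)
{
    struct bar x, y;
    memset(&x, 0xAA, sizeof x); /* sentinel pattern to catch stray writes */
    memset(&y, 0xAA, sizeof y);
    foo_narrow(&x);
    foo_wide(&y);
    /* Bytes 0..7 must agree, and f and g must be untouched in both. */
    return memcmp(&x, &y, 8) == 0 && x.f == y.f && x.g == y.g;
}
```

On such a target stores_match() should return 1: the narrow stores write exactly bytes 0..7, which is the range a single 64-bit store covers, and f and g are left alone either way. On a big-endian target the immediate would differ, so the check is expected to fail there.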