https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70784

            Bug ID: 70784
           Summary: Merge multiple short stores of immediates into wider
                    stores
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: ramana at gcc dot gnu.org
  Target Milestone: ---

Consider the code:
struct bar {
  int a;
  char b;
  char c;
  char d;
  char e;
  char f;
  char g;
}; // fields a through e occupy the first 64 bits

void
foozero (struct bar *p)
{
  p->b = 0;
  p->a = 0;
  p->c = 0;
  p->d = 0;
  p->e = 0;
}

On aarch64 we currently generate:
foozero:
        str     wzr, [x0]
        strb    wzr, [x0, 4]
        strb    wzr, [x0, 5]
        strb    wzr, [x0, 6]
        strb    wzr, [x0, 7]
        ret

But could generate a single 64-bit store:
foozero:
        str     xzr, [x0]
        ret
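
To make the claim concrete, here is a minimal sketch of the equivalence the
optimization would rely on. It assumes a 4-byte int and that b..e sit at
offsets 4..7, which is checked at compile time. foozero_merged and
same_result are hypothetical names used only for illustration. They are not
part of GCC.

```c
#include <assert.h>   /* for the usage shown after the block */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bar {
  int a;
  char b, c, d, e, f, g;
};

/* Assumptions of this sketch: int is 4 bytes and e is the 8th byte.  */
_Static_assert (sizeof (int) == 4, "sketch assumes 4-byte int");
_Static_assert (offsetof (struct bar, e) == 7, "sketch assumes e at offset 7");

/* The original sequence of narrow stores from the report.  */
void
foozero (struct bar *p)
{
  p->b = 0;
  p->a = 0;
  p->c = 0;
  p->d = 0;
  p->e = 0;
}

/* Hand-merged version: one 8-byte store covering a, b, c, d and e.  */
void
foozero_merged (struct bar *p)
{
  uint64_t zero = 0;
  memcpy (p, &zero, sizeof zero);
}

/* Returns 1 if both versions leave the same field contents.  Padding
   bytes are deliberately not compared.  */
int
same_result (void)
{
  struct bar x, y;
  memset (&x, 0x5A, sizeof x);
  memset (&y, 0x5A, sizeof y);
  foozero (&x);
  foozero_merged (&y);
  return memcmp (&x, &y, 8) == 0 && x.f == y.f && x.g == y.g;
}
```

Calling assert (same_result ()); should pass: f and g are untouched by both
versions, so only the first 8 bytes differ from the fill pattern.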

Also, for:
void
foo (struct bar *p)
{
  p->b = 0;
  p->a = 0;
  p->c = 0;
  p->d = 1;
  p->e = 0;
}

we could generate:
foo:
        mov     x1, #0x1000000000000
        str     x1, [x0]
        ret
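
The immediate can be derived by packing the stored bytes by offset. A minimal
sketch, assuming a little-endian target and the layout above (a at offsets
0..3, b, c, d, e at offsets 4, 5, 6, 7). merge_le and foo_constant are
hypothetical helpers, not GCC functions.

```c
#include <assert.h>   /* for the usage shown after the block */
#include <stdint.h>

/* Pack eight byte-sized immediates into one 64-bit value, with the byte
   at offset i landing in bits 8*i .. 8*i+7 (little-endian).  */
uint64_t
merge_le (const unsigned char bytes[8])
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v |= (uint64_t) bytes[i] << (8 * i);
  return v;
}

/* The stores in 'foo': everything is zero except d = 1 at offset 6.  */
uint64_t
foo_constant (void)
{
  const unsigned char bytes[8] = { 0, 0, 0, 0, 0, 0, 1, 0 };
  return merge_le (bytes);
}
```

Here assert (foo_constant () == 0x1000000000000ULL); holds, since the only
set bit is 1 << 48. In decimal that is 281474976710656.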

Other targets could benefit from this too.
x86_64 currently generates for 'foo':
foo:
.LFB0:
        .cfi_startproc
        movl    $0, (%rdi)
        movb    $0, 4(%rdi)
        movb    $0, 5(%rdi)
        movb    $1, 6(%rdi)
        movb    $0, 7(%rdi)
        ret

but could generate:
foo:
.LFB0:
        .cfi_startproc
        movabsq $281474976710656, %rax
        movq    %rax, (%rdi)
        ret
