https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90202
            Bug ID: 90202
           Summary: AVX-512 instructions not used
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following test program:


// 16 x 32-bit ints = 64 bytes, exactly the width of one 512-bit zmm register.
struct v {
    int val[16];
};

v test(v a, v b) {
    v res;

    for (int i = 0; i < 16; i++)
        res.val[i] = a.val[i] + b.val[i];

    return res;
}


When compiled with `g++ -O3 -march=skylake-avx512`, the following assembly is
produced:
test(v, v):
  push rbp
  mov rax, rdi
  mov rbp, rsp
  vmovdqu32 ymm1, YMMWORD PTR [rbp+16]
  vmovdqu32 ymm2, YMMWORD PTR [rbp+48]
  vpaddd ymm0, ymm1, YMMWORD PTR [rbp+80]
  vmovdqu32 YMMWORD PTR [rdi], ymm0
  vpaddd ymm0, ymm2, YMMWORD PTR [rbp+112]
  vmovdqu32 YMMWORD PTR [rdi+32], ymm0
  vzeroupper
  pop rbp
  ret

This seems suboptimal: the 512-bit zmm registers are available, and better
code is possible:
test(v, v):
  vmovdqu32 zmm0, zmmword ptr [rsp + 72]
  vpaddd zmm0, zmm0, zmmword ptr [rsp + 8]
  vmovdqu32 zmmword ptr [rdi], zmm0
  mov rax, rdi
  vzeroupper
  ret
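For comparison, the same operation can be written with GCC's generic vector
extensions, which express the whole 64-byte add as one vector operation. This
is a sketch, not a proposed fix: the name `test_vec` and the `v16si` typedef
are hypothetical, and whether it compiles to a single zmm `vpaddd` under
`-O3 -march=skylake-avx512` depends on the GCC version and has not been
verified here. On targets without AVX-512, GCC lowers the 64-byte vector to
narrower operations, so the code stays portable. GCC's
`-mprefer-vector-width=512` option may also be relevant, since the
skylake-avx512 tuning prefers 256-bit vectors by default.

```cpp
#include <cstring>

struct v {
    int val[16];
};

// Hypothetical typedef: 16 x 32-bit ints = 64 bytes = one 512-bit vector.
typedef int v16si __attribute__((vector_size(64)));

// Hypothetical variant of test() that asks for a single 64-byte add.
v test_vec(v a, v b) {
    v16si va, vb;
    // memcpy avoids assuming 64-byte alignment of the struct arguments.
    std::memcpy(&va, a.val, sizeof va);
    std::memcpy(&vb, b.val, sizeof vb);
    v16si vr = va + vb;  // element-wise 32-bit integer add
    v res;
    std::memcpy(res.val, &vr, sizeof vr);
    return res;
}
```

The result matches the scalar loop element by element; only the way the add
is expressed to the compiler differs.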
