https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90202

            Bug ID: 90202
           Summary: AVX-512 instructions not used
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the following test program:

struct v {
    int val[16];
};

v test(v a, v b) {
    v res;

    for (int i = 0; i < 16; i++)
        res.val[i] = a.val[i] + b.val[i];

    return res;
}

When compiled with `g++ -O3 -march=skylake-avx512`, the following assembly is
produced:

test(v, v):
  push rbp
  mov rax, rdi
  mov rbp, rsp
  vmovdqu32 ymm1, YMMWORD PTR [rbp+16]
  vmovdqu32 ymm2, YMMWORD PTR [rbp+48]
  vpaddd ymm0, ymm1, YMMWORD PTR [rbp+80]
  vmovdqu32 YMMWORD PTR [rdi], ymm0
  vpaddd ymm0, ymm2, YMMWORD PTR [rbp+112]
  vmovdqu32 YMMWORD PTR [rdi+32], ymm0
  vzeroupper
  pop rbp
  ret

This seems suboptimal: the 512-bit registers are available, and better assembly
is possible:

test(v, v):
  vmovdqu32 zmm0, zmmword ptr [rsp + 72]
  vpaddd zmm0, zmm0, zmmword ptr [rsp + 8]
  vmovdqu32 zmmword ptr [rdi], zmm0
  mov rax, rdi
  vzeroupper
  ret
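
For anyone reproducing this, here is a self-contained sketch of the same test
case with a small correctness check added. The `check_test` helper and the
file name `repro.cpp` are hypothetical and not part of the original report;
the comments assume a 32-bit `int`, as on x86-64. It may also be worth
comparing the output with `-mprefer-vector-width=512` added to the command
line; whether that option accounts for the 256-bit code shown above is an
assumption here, not something established in this report.

```cpp
// Reduced test case from the report, plus a hypothetical sanity check.
// To inspect the generated assembly (Intel syntax), something like:
//   g++ -O3 -march=skylake-avx512 -masm=intel -S repro.cpp
#include <cassert>

struct v {
    int val[16];  // 16 x 32-bit int = 512 bits, the width of one zmm register
};

// Element-wise sum of two 16-int aggregates, passed and returned by value.
v test(v a, v b) {
    v res;

    for (int i = 0; i < 16; i++)
        res.val[i] = a.val[i] + b.val[i];

    return res;
}

// Hypothetical helper (not in the report): asserts that test() adds
// corresponding elements, so the codegen comparison is between
// functionally identical builds.
void check_test() {
    v a, b;
    for (int i = 0; i < 16; i++) {
        a.val[i] = i;
        b.val[i] = 100 * i;
    }

    v r = test(a, b);
    for (int i = 0; i < 16; i++)
        assert(r.val[i] == 101 * i);
}
```

Calling `check_test()` from a `main` is enough to confirm the function behaves
the same regardless of which vector width the compiler picks; the interesting
part remains the assembly emitted for `test(v, v)`.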