https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95791
Bug ID: 95791
Summary: Unnecessary vzeroupper when only using zmm16 through zmm31
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Keywords: missed-optimization, ssemmx
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: josephcsible at gmail dot com
Target Milestone: ---
Target: x86_64-linux-gnu

Consider this C code:

void f(void) {
    __asm__ __volatile__("" ::: "zmm16");
}

When compiled with "-O2 -mavx512f", GCC generates a vzeroupper instruction for this function. This is unnecessary: zmm16 through zmm31 don't cause the AVX-SSE transition penalty that vzeroupper exists to avoid, and in fact they aren't even affected by vzeroupper.
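
For comparison, here is a minimal sketch (not part of the original test case) that puts the reported function next to one clobbering a low register, so the two can be inspected side by side in the output of "gcc -O2 -mavx512f -S". The names clobber_low, clobber_high and vzeroupper_affects are hypothetical and chosen only for illustration. The preprocessor guards are an assumption added so that the file still builds when AVX-512 is not enabled. The helper merely restates the rule from the report for 64-bit mode, namely that vzeroupper touches only registers 0 through 15; it does not describe GCC's actual implementation.

```c
/* Illustrative sketch only; function names are hypothetical. */

/* Clobbers a low register. The upper bits of zmm0..zmm15 are what
   vzeroupper clears, so a vzeroupper before returning may be justified. */
void clobber_low(void)
{
#if defined(__x86_64__) && defined(__AVX512F__)
    __asm__ __volatile__("" ::: "zmm0");
#endif
}

/* The case from the report: only zmm16 is clobbered, and vzeroupper
   leaves zmm16..zmm31 unmodified, so emitting it gains nothing. */
void clobber_high(void)
{
#if defined(__x86_64__) && defined(__AVX512F__)
    __asm__ __volatile__("" ::: "zmm16");
#endif
}

/* Model of the rule stated in the report (64-bit mode): vzeroupper
   affects vector registers 0..15 and leaves 16..31 untouched. */
int vzeroupper_affects(int reg_index)
{
    return reg_index >= 0 && reg_index <= 15;
}
```

If the optimization requested here were implemented, one would expect clobber_high to compile to a bare return, while clobber_low could still legitimately end with vzeroupper.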