https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121688
Bug ID: 121688
Summary: F16C/AVX512F cvtph2ps and cvtps2ph not used on __builtin_convertvector
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mkretz at gcc dot gnu.org
Target Milestone: ---
Target: x86-64-*-*, i686-*-*

Test case (https://compiler-explorer.com/z/Yz7hvxGd1):

using v4hf [[gnu::vector_size(8)]] = _Float16;
using v8hf [[gnu::vector_size(16)]] = _Float16;
using v16hf [[gnu::vector_size(32)]] = _Float16;
using v4sf [[gnu::vector_size(16)]] = float;
using v8sf [[gnu::vector_size(32)]] = float;
using v16sf [[gnu::vector_size(64)]] = float;

v4sf cvtph2ps(v4hf x) { return __builtin_convertvector(x, v4sf); }
v4hf cvtps2ph(v4sf x) { return __builtin_convertvector(x, v4hf); }
v8sf cvtph2ps(v8hf x) { return __builtin_convertvector(x, v8sf); }
v8hf cvtps2ph(v8sf x) { return __builtin_convertvector(x, v8hf); }
v16sf cvtph2ps(v16hf x) { return __builtin_convertvector(x, v16sf); }
v16hf cvtps2ph(v16sf x) { return __builtin_convertvector(x, v16hf); }

Compile with -O2 -march=x86-64-v4 (or -v3).

All of these functions should get translated to a single cvtph2ps/cvtps2ph
instruction + ret. Similar to when compiling with '-mavx512fp16', except that
the 'x' from the instruction needs to be removed 😉.

(This seems to be a prerequisite for PR121587.)