https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542
Bug ID: 117542
Summary: Missed loop vectorization for truncate from float to __bf16.
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-* i?86-*-*

For loop vectorization, GCC relies on the optab vec_pack_trunc_m to check whether the backend supports the narrowing. But that optab is already used for truncation from float to _Float16 and can't be overloaded. The documentation only mentions that the dest has 2*N elements of size S/2; it doesn't specify the dest mode, and there are 2 kinds of half-precision floating-point.

------
‘vec_pack_trunc_m’
Narrow (demote) and merge the elements of two vectors. Operands 1 and 2 are vectors of the same mode having N integral or floating point elements of size S. Operand 0 is the resulting vector in which 2*N elements of size S/2 are concatenated after narrowing them down using truncation.
----------

void foo (__bf16* a, float* b)
{
  for (int i = 0; i != 10000; i++)
    a[i] = b[i];
}

couldn't vectorize loop
not vectorized: no vectype for stmt: _4 = *_3;
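
For reference, here is a portable sketch of what each iteration of the loop computes, written with plain integers so it builds without __bf16 support. It is an illustration only, not GCC's implementation: the helper names float_to_bf16_bits and foo_bits are hypothetical, and round-to-nearest-even is assumed for the narrowing. bfloat16 keeps the 8 exponent bits of float and 7 mantissa bits, whereas _Float16 has 5 exponent bits and 10 mantissa bits, so the two 16-bit destinations are not interchangeable even though both are "size S/2".

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a GCC function): narrow one float to the
   16-bit bfloat16 encoding (1 sign, 8 exponent, 7 mantissa bits).
   Assumption: round to nearest, ties to even.  */
static uint16_t float_to_bf16_bits (float f)
{
  uint32_t x;
  memcpy (&x, &f, sizeof x);            /* well-defined type pun */

  /* NaN: keep the sign and exponent, force a quiet-NaN mantissa bit.  */
  if ((x & 0x7FFFFFFFu) > 0x7F800000u)
    return (uint16_t) ((x >> 16) | 0x0040u);

  /* Round to nearest even on the 16 bits that are dropped.  */
  uint32_t lsb = (x >> 16) & 1u;
  x += 0x7FFFu + lsb;
  return (uint16_t) (x >> 16);
}

/* Scalar equivalent of the loop in foo, storing raw bfloat16 bits.  */
void foo_bits (uint16_t *a, const float *b, int n)
{
  for (int i = 0; i != n; i++)
    a[i] = float_to_bf16_bits (b[i]);
}
```

A vectorized version would perform this narrowing on two N-element float vectors and concatenate the results into one 2*N-element vector, which is the shape vec_pack_trunc_m describes.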