https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105966
Bug ID: 105966
Summary: x86: operations on certain few-element vectors yield
very inefficient code
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jbeulich at suse dot com
Target Milestone: ---
FMA operations on vectors with more than one element, but with too few
elements to fill the narrowest available vector register, are decomposed into
scalar FMA insns. While this may be on par for small element counts, it
certainly generates absurd code for e.g. AVX512-FP16 with, say, 128- or 256-bit
vectors but AVX512VL not enabled. It would be far more efficient to
zero-extend the vectors to 512 bits (to avoid exceptions on the unused
elements), emit the FMA insn on %zmm registers, and then use just the low
part of the result. (The same likely applies to e.g. plain addition,
subtraction, and multiplication.)
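
The widening described above can be sketched at source level with GNU C
vector extensions. This is only an illustration, not the example code from
bug 105965: the type and function names (v2sf, v4sf, fma2_direct,
fma2_widened) are hypothetical, and it uses 2 x float padded to 128 bits
rather than _Float16 padded to 512 bits so that it builds without
AVX512-FP16 support. It assumes a compiler accepting vector_size and vector
subscripting (GCC or Clang).

```c
/* Hypothetical illustration; none of these names come from the report. */
typedef float v2sf __attribute__((vector_size(8)));   /* 2 x float,  64 bits */
typedef float v4sf __attribute__((vector_size(16)));  /* 4 x float, 128 bits */

/* Direct form: element-wise a * b + c on the narrow vector type. */
v2sf fma2_direct(v2sf a, v2sf b, v2sf c)
{
    return a * b + c;
}

/* Widened form: pad the inputs with zeros, operate at full width, and keep
   only the low elements.  The zero padding means the unused lanes compute
   0 * 0 + 0 and so cannot raise floating-point exceptions. */
v2sf fma2_widened(v2sf a, v2sf b, v2sf c)
{
    v4sf wa = { a[0], a[1], 0.0f, 0.0f };
    v4sf wb = { b[0], b[1], 0.0f, 0.0f };
    v4sf wc = { c[0], c[1], 0.0f, 0.0f };
    v4sf wr = wa * wb + wc;          /* one full-width operation */
    v2sf r = { wr[0], wr[1] };       /* use just the low part */
    return r;
}
```

Both functions return the same values for the same inputs; the second
merely spells out by hand the transformation the compiler could apply
internally.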
If necessary, the example code from bug 105965 can be re-used to easily see the
odd behavior.