https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873
            Bug ID: 88873
           Summary: missing vectorization for decomposed operations on a
                    vector type
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincent-gcc at vinc17 dot net
  Target Milestone: ---

To compute a vectorized fma, one needs to apply it on the decomposed vector
components. Here's an example with a structure type and with a vector type.
The structure type solution is just given for comparison. This bug is about
the vector type solution.

#include <math.h>

typedef struct { double x, y; } s_t;
typedef double v2df __attribute__ ((vector_size (2 * sizeof(double))));

s_t foo (s_t a, s_t b, s_t c)
{
  return (s_t) { fma(a.x, b.x, c.x), fma (a.y, b.y, c.y) };
}

v2df bar (v2df a, v2df b, v2df c)
{
  v2df r;
  r[0] = fma (a[0], b[0], c[0]);
  r[1] = fma (a[1], b[1], c[1]);
  return r;
}

With -O3, I get on x86_64:

* For function foo (struct type):

[...]
        vfmadd132pd     -40(%rsp), %xmm7, %xmm6
[...]

This is vectorized as expected, though this solution is affected by bug 65847.

* For function bar (vector type):

bar:
.LFB1:
        .cfi_startproc
        vmovapd %xmm0, %xmm3
        vunpckhpd       %xmm0, %xmm0, %xmm0
        vfmadd132sd     %xmm1, %xmm2, %xmm3
        vunpckhpd       %xmm1, %xmm1, %xmm1
        vunpckhpd       %xmm2, %xmm2, %xmm2
        vfmadd132sd     %xmm1, %xmm2, %xmm0
        vunpcklpd       %xmm0, %xmm3, %xmm0
        ret
        .cfi_endproc

This is not vectorized: one has 2 vfmadd132sd instead of a single vfmadd132pd.

Note: The problem is the same with addition, but in the addition case, one can
simply do a + b. This is not possible with fma.

This bug seems similar to bug 77399.
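
For reference, the addition case mentioned in the note can be sketched as follows. This is only an illustration: the function names add_decomposed and add_whole are hypothetical and do not appear in the report. The first function uses the same per-element pattern as bar, and the second uses the vector extension's built-in + operator, which has no counterpart for fma.

```c
/* GCC/Clang vector extension: two doubles in one 16-byte vector,
   same typedef as in the report. */
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));

/* Decomposed form: one scalar addition per element, written the
   same way as bar in the report (hypothetical name). */
v2df add_decomposed (v2df a, v2df b)
{
  v2df r;
  r[0] = a[0] + b[0];
  r[1] = a[1] + b[1];
  return r;
}

/* Whole-vector form: the + operator applies element-wise to the
   vector type, so no decomposition is needed (hypothetical name). */
v2df add_whole (v2df a, v2df b)
{
  return a + b;
}
```

Both functions return the same values. Comparing the assembly generated for each at -O3 would show whether the decomposed form is recognized as a single vector operation.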