double bar(double x, double y)
{
  double tmp = 0.1234 * y;
  return ((x + tmp) * (x - tmp));
}
GCC should use fused multiply-add and negated multiply-add instructions when that is cheaper than a separate multiplication followed by an addition and a subtraction. With -mfma4 on x86_64, instead of

        vmulsd    .LC0(%rip), %xmm1, %xmm1
        vaddsd    %xmm1, %xmm0, %xmm2
        vsubsd    %xmm1, %xmm0, %xmm0
        vmulsd    %xmm0, %xmm2, %xmm0

it should generate

        vmovsd    .LC0(%rip), %xmm3
        vfmaddsd  %xmm0, %xmm3, %xmm1, %xmm2
        vfnmaddsd %xmm0, %xmm3, %xmm1, %xmm0
        vmulsd    %xmm0, %xmm2, %xmm0

See also PR19988. FMA opportunities of this kind should probably be detected during RTL expansion, similar to widening multiplications.

-- 
           Summary: FMAs not exploited
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rguenth at gcc dot gnu dot org
GCC target triplet: powerpc64-*-*, x86_64-*-*
OtherBugsDependingOnThis: 19988

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42802