https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---

This might be true for A57, but for our chip (ThunderX), using fused multiply-add is better. On most normal processors, using fused multiply-add is also an improvement.

The other question here is: are there denormals happening? That might cause some performance differences between using fmadd and fmul/fadd.

Can you attach the preprocessed source and say what options you are using?