https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94212
--- Comment #8 from Dmitrij Pochepko <dpochepk at gmail dot com> ---
(In reply to Richard Biener from comment #7)
> (In reply to Dmitrij Pochepko from comment #6)
> > Just checked: non-vectorized assembly for aarch64 (O2) is using fmadd and
> > fmsub intensively.
>
> Try with -ffp-contract=off then. Note due to effective unrolling of
> the loop with vectorization we might end up forming "different" fmadd
> groups. So you might also want to check whether the vectorized loop still
> sees fmadd use.

Both -O2 -ffp-contract=off and -O3 -ffp-contract=off produce the same calculation result as -O2.

Regarding assembly: the vectorized version uses fmla and fmls, the vector forms of multiply-add and multiply-subtract. Without a detailed step-by-step comparison, though, it's hard to tell how the grouping of multiplications and additions/subtractions differs between the two versions.