https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581
--- Comment #3 from ktkachov at gcc dot gnu.org ---
I can't reproduce the difference on my machine. Judging by your -mcpu option, is this on a Cortex-A5? As far as codegen goes, the major difference I can see is that the vfpv4 version generates vfma instructions instead of vmla ones. Also, there are cases where the vfpv3 version will generate multiple vmls instructions, whereas the vfpv4 one will generate an explicit vneg followed by vfma instructions.
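
For anyone wanting to look at the codegen difference themselves, here is a minimal sketch of the kind of source patterns involved. The function names are hypothetical and are not taken from the testcase in this report. The assumption is that multiply-accumulate and multiply-subtract expressions like these are what end up as vmla/vmls with a vfpv3 FPU and as vfma (possibly preceded by a vneg) with a vfpv4 FPU; the exact instructions will depend on the GCC version, -mcpu/-mfpu and the floating-point contraction settings.

```c
/* Hypothetical kernels for inspecting codegen; not from the bug's testcase. */

/* acc + a*b: a candidate for vmla (vfpv3) or vfma (vfpv4). */
float mul_acc(float acc, float a, float b)
{
    return acc + a * b;
}

/* acc - a*b: a candidate for vmls (vfpv3); with vfpv4 the compiler
   may instead negate one operand (vneg) and use vfma. */
float mul_sub(float acc, float a, float b)
{
    return acc - a * b;
}

/* Two chained multiply-subtracts: the "multiple vmls" case. */
float mul_sub2(float acc, float a, float b, float c, float d)
{
    return acc - a * b - c * d;
}
```

Compiling this with -O2 -S once with -mfpu=vfpv3 and once with -mfpu=vfpv4 and diffing the assembly should show whether the same substitution happens with a given toolchain. Note that vfma is a fused operation with a single rounding step while vmla rounds the product before the add, so results can differ in the last bit between the two builds.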