https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102604
--- Comment #3 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #2)
> Right, using -Os makes these tests pass (but vsqrt.f32 and vsqrt.f64 would
> fail),

Ah, because sqrt() is expected to set errno?  Would changing the code to
__builtin_sqrt() be enough?  This test shouldn't really be about other
optimizations.

> but I'm still wondering about the purpose of vmla?
>
> Rather than benchmarking, the costs may come from the Architecture
> documentation? But then, if vmla is so costly, when is it supposed to be
> used? Only when optimizing for size?

It's part of the ISA, so it has to be implemented.  As to why it's not
preferred, I can only speculate.

>
> Note that the DP/f64 version does not have this problem.

The DP versions simply don't exist on most M-profile CPUs (they certainly
don't exist on the Cortex-M4 and Cortex-M7), and this is another case of a
test forcing instructions into a compilation where they make no sense at all.
Consequently, there are no costs associated with these operations: they are
simply irrelevant when the instruction doesn't exist (the opcode will just
trap).