https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106187
--- Comment #7 from Jan Wassenberg <jan.wassenberg at gmail dot com> ---
The easiest way to reduce the amount of code in the binary is to comment out all the HWY_EXPORT_AND_TEST_P lines in mul_test.cc except the one with TestAllMulEven. The actual miscompilation is probably happening within ops/emu128-inl.h.

You can further reduce instantiations by replacing
  ForFloatTypes(ForPartialVectors<TestMulAdd>());
with
  TestMulAdd()(float(), FixedTag<float, 4>());

That gets us down to a fairly minimal single TU (mul_test.cc).
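
To illustrate why the direct call produces fewer instantiations, here is a rough standalone sketch. It does not use the real Highway headers: everything in namespace sketch is a hypothetical stand-in that only mimics the call shapes quoted above, and the lane counts and types that the stand-in helpers iterate over are assumptions chosen for illustration, not what Highway actually expands to.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical stand-ins, NOT the real Highway API

// Stand-in for a fixed-size vector tag: carries only lane type and count.
template <typename T, std::size_t N>
struct FixedTag {
  using Lane = T;
  static constexpr std::size_t kLanes = N;
};

// Counts how many (type, tag) combinations were actually exercised.
inline int& CallCount() {
  static int count = 0;
  return count;
}

// Stand-in test functor with the (T, D) call shape from the comment.
// Each distinct (T, D) pair is a separate template instantiation.
struct TestMulAdd {
  template <typename T, class D>
  void operator()(T /*unused*/, D /*d*/) const {
    ++CallCount();
  }
};

// Stand-in wrapper that fans one test out over several lane counts.
// The counts 1, 2, 4 are an assumption made for this sketch.
template <class Test>
struct ForPartialVectors {
  template <typename T>
  void operator()(T t) const {
    Test()(t, FixedTag<T, 1>());
    Test()(t, FixedTag<T, 2>());
    Test()(t, FixedTag<T, 4>());
  }
};

// Stand-in helper that fans a functor out over floating-point lane types.
// The type list (float, double) is an assumption made for this sketch.
template <class Func>
void ForFloatTypes(const Func& func) {
  func(float());
  func(double());
}

// Original form: every type times every lane count gets instantiated.
inline int CountFanOut() {
  CallCount() = 0;
  ForFloatTypes(ForPartialVectors<TestMulAdd>());
  return CallCount();
}

// Reduced form: exactly one instantiation, float with 4 lanes.
inline int CountDirect() {
  CallCount() = 0;
  TestMulAdd()(float(), FixedTag<float, 4>());
  return CallCount();
}

}  // namespace sketch
```

In this sketch CountFanOut() exercises six instantiations of the functor while CountDirect() exercises one, which is the kind of reduction that makes the resulting object code easier to inspect.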