https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127

--- Comment #4 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> More so, gcc variant occupies 2 reservation station entries (2 fused uOps) vs
> 4 entries by de-transformed sequence.

I don't think this is true for the test at hand? With a base+offset memory
operand the renaming stage already sees two separate uops for each fma, so
the reservation station etc. should also see two for each fma, 4 uops in
total. And they will not be fused.

It would be true if the memory operands required just one register (and then
pressure on the renaming stage would be the same for both variants).
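
To make the counting above concrete, here is a minimal sketch of the model
this comment describes. It is not a simulation of any real CPU. It only
encodes the assumption that an fma with a memory source stays one fused uop
at rename when its address needs one register, and is seen as two uops
when the address needs two. The function names are hypothetical.

```python
def uops_at_rename(address_registers: int) -> int:
    """Uops the renamer sees for one fma with a memory source operand.

    Assumption (taken from the comment, not measured): a one-register
    address keeps the load+fma pair fused; a two-register address does not.
    """
    if address_registers not in (1, 2):
        raise ValueError("an address uses one or two registers in this model")
    return 1 if address_registers == 1 else 2


def total_uops(num_fmas: int, address_registers: int) -> int:
    """Total uops at rename for a sequence of identical memory-source fmas."""
    return num_fmas * uops_at_rename(address_registers)


if __name__ == "__main__":
    # The test at hand: two fmas, each with a two-register address.
    print(total_uops(2, 2))
    # The hypothetical case: two fmas, each with a one-register address.
    print(total_uops(2, 1))
```

Under this model the two-register case gives 4 uops for the two fmas, and
the one-register case gives 2, which is the case the quoted claim of
"2 fused uOps" would apply to.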


> For me it's enough to know that it *is* slower.

Understood, but I hope GCC developers want to understand the nature of the
slowdown before attempting to fix it.
