https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70461

Alexander Fomin <afomin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #38134|0                           |1
        is obsolete|                            |

--- Comment #5 from Alexander Fomin <afomin at gcc dot gnu.org> ---
Created attachment 38184
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38184&action=edit
Another reproducer

Thanks, performance is back on Core CPUs.

However, I've noticed that given a slightly different testcase compiled with
-m32 -O2 we also generate extra insns for the loop (the degradation can be
seen on some other CPUs, e.g. when specifying -march=slm).

What I see in RTL ira dump is (with some identical lines removed):

+--------------------------------------+------------------------+
| Before r234527                       | After r234527          |
+--------------------------------------+------------------------+
| Assigning 0 to a26r113               | Assigning 4 to a14r144 |
| Assigning 0 to a27r181               | Assigning 4 to a42r113 |
| Spilling a29r178 for a28r180         | Assigning 4 to a46r137 |
| Assigning 0 to a28r180               | Assigning 4 to a50r128 |
| Assigning 0 to a30r137               | Assigning 4 to a54r121 |
| Assigning 0 to a31r177               | Assigning 4 to a26r113 |
| Spilling a33r174 for a32r176         | Assigning 4 to a30r137 |
| Assigning 0 to a32r176               | Assigning 4 to a34r128 |
| Assigning 0 to a34r128               | Assigning 4 to a38r121 |
| Assigning 0 to a35r173               |                        |
| Spilling a37r170 for a36r172         |                        |
| Assigning 0 to a36r172               |                        |
| Assigning 0 to a38r121               |                        |
| Assigning 0 to a39r169               |                        |
| Spilling a41r166 for a40r168         |                        |
| Assigning 0 to a40r168               |                        |
| a41(r166,l1) -- (...) assign memory  |                        |
| a29(r178,l1) -- (...) assign memory  |                        |
| a33(r174,l1) -- (...) assign memory  |                        |
| a37(r170,l1) -- (...) assign memory  |                        |
+--------------------------------------+------------------------+

Looks like we don't consider spilling and memory more profitable anymore...
Could you please take a look?