https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117957
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[15 regression]             |[15 regression]
                   |vectorization pesimises     |vectorization pessimizes
                   |std::vector push/pop test   |std::vector push/pop test
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2025-01-17
             Status|UNCONFIRMED                 |NEW

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #5)
> > I suspect the issue is very similar (or the same) as PR 115777.
> Yep, I think it is store-to-load forwarding. The stack is organized in
> pairs that are likely written independently and loaded together.
> Sadly I think it is a relatively common pattern that stack values are
> reused quickly. A wider store fed into smaller loads is fine though.

In this case it looks like the GPR -> XMM move might be the real issue. As

         │      movd      %ebx,%xmm2
    8.23 │      punpckldq %xmm2,%xmm0
   24.84 │      movq      %xmm0,-0x8(%rax)

is just V2SImode, we should have been able to turn this into a left-shift of
%ebx and a movq from %rbx?

The vectorizer costs are "OK", but we suffer from generally very high
load/store costs compared to other ops.

For the store/load, what might happen is that the tricks that, for example,
Zen2 can do to rename some memory ops into registers are made impossible if
the access sizes no longer match. Zen3 dropped this feature (temporarily; I
think Zen4 brought it back), so measuring across uarchs might be interesting
as well.

Btw, I see we elide the zero store somewhere on RTL when not vectorizing and
just keep one movd.

Confirmed on Zen4 and Zen2, with way more effect on Zen4. The actual
regression is likely that C++ standard library changes enabled vectorization.
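
To make the forwarding hazard from comment #5 concrete, here is a minimal
C++ sketch (function and variable names are hypothetical, not from the
testcase): two independent 32-bit stores followed by a 64-bit load covering
both is the pattern that stalls, while one wide store read back by a
narrower load forwards fine.

  #include <cstdint>
  #include <cstring>

  // Hazard: the pair is written independently and loaded together.
  // Typical x86 cores cannot forward two separate store-buffer entries
  // into one wider load, so the load waits for the stores to retire.
  uint64_t written_independently_loaded_together(uint32_t a, uint32_t b) {
      uint32_t pair[2];
      pair[0] = a;               // 32-bit store
      pair[1] = b;               // independent 32-bit store
      uint64_t v;
      std::memcpy(&v, pair, 8);  // 64-bit load of both halves -> STLF stall
      return v;
  }

  // Fine: a wider store fed into a smaller load forwards successfully.
  uint32_t wide_store_narrow_load(uint64_t v) {
      uint64_t buf;
      std::memcpy(&buf, &v, 8);  // one 64-bit store
      uint32_t lo;
      std::memcpy(&lo, &buf, 4); // 32-bit load fully covered by the store
      return lo;
  }

(At -O2 a compiler may fold these away; the sketch only illustrates the
access-size mismatch, it is not a benchmark.)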
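
The suggested GPR alternative for the V2SImode store is, roughly, a shlq
$32 of the high dword, an or with the low half, and a single movq store;
when the low half is the elided zero store, the or disappears entirely. A
C++ sketch of the equivalent scalar sequence (hypothetical names, not the
PR's code):

  #include <cstdint>
  #include <cstring>

  // Assemble the {lo, hi} dword pair in a GPR and store it with one
  // 64-bit movq, avoiding the movd/punpckldq round-trip through XMM.
  void store_pair(uint32_t lo, uint32_t hi, unsigned char *p) {
      uint64_t pair = (uint64_t)hi << 32 | lo;  // shift + or in GPRs
      std::memcpy(p, &pair, sizeof pair);       // single GPR 64-bit store
  }

GCC typically compiles this source pattern to a pure-GPR shift/or/store
sequence, which is the code the vectorized variant would ideally match.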
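
For reference, the push/pop pattern named in the summary looks roughly like
the following. This is an illustrative sketch, not the testcase attached to
the bug, and the two-dword element type is an assumption matching the
V2SImode store above.

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct Pair { uint32_t a, b; };  // assumed 8-byte, two-dword element

  uint64_t churn(std::size_t n) {
      std::vector<Pair> stack;
      stack.reserve(n);
      uint64_t sum = 0;
      for (std::size_t i = 0; i < n; ++i) {
          stack.push_back({uint32_t(i), uint32_t(i ^ 1)});
          sum += stack.back().a;  // the just-stored value is reused quickly
          stack.pop_back();
      }
      return sum;
  }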