On May 25, 2023, Richard Biener <richard.guent...@gmail.com> wrote:

> On Thu, May 25, 2023 at 1:10 PM Alexandre Oliva <ol...@adacore.com> wrote:
>>
>> On May 25, 2023, Richard Biener <richard.guent...@gmail.com> wrote:
>>
>> > I mean we could do what RTL expansion would do later and do
>> > by-pieces, thus emit multiple loads/stores but not n loads and then
>> > n stores but interleaved.
>>
>> That wouldn't help e.g. gcc.dg/memcpy-6.c's fold_move_8, because
>> MOVE_MAX and MOVE_MAX_PIECES currently limits inline expansion to 4
>> bytes on x86 without SSE, both in gimple and RTL, and interleaved loads
>> and stores wouldn't help with memmove.  We can't fix that by changing
>> code that uses MOVE_MAX and/or MOVE_MAX_PIECES, when these limits are
>> set too low.
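
To illustrate the memmove point quoted above, here is a minimal sketch,
not GCC code.  The helper names move8_interleaved and
move8_load_then_store are hypothetical, and the 4-byte piece size is
only an assumption standing in for a MOVE_MAX of 4.  It suggests why an
8-byte overlapping move cannot simply be split into interleaved 4-byte
loads and stores:

```c
#include <string.h>

/* Hypothetical helper: move 8 bytes as two 4-byte pieces, storing each
   piece right after loading it.  With overlapping buffers the first
   store may clobber source bytes that the second load still needs.  */
static void move8_interleaved(unsigned char *dst, const unsigned char *src)
{
    unsigned char piece[4];

    memcpy(piece, src, 4);      /* load source bytes 0..3 */
    memcpy(dst, piece, 4);      /* store them */
    memcpy(piece, src + 4, 4);  /* load source bytes 4..7, possibly clobbered */
    memcpy(dst + 4, piece, 4);  /* store them */
}

/* Hypothetical helper: load all 8 bytes before storing any of them,
   which is what memmove semantics require when the buffers overlap.  */
static void move8_load_then_store(unsigned char *dst, const unsigned char *src)
{
    unsigned char lo[4], hi[4];

    memcpy(lo, src, 4);         /* load source bytes 0..3 */
    memcpy(hi, src + 4, 4);     /* load source bytes 4..7 */
    memcpy(dst, lo, 4);         /* only now start storing */
    memcpy(dst + 4, hi, 4);
}
```

Moving 8 bytes from buf to buf + 2 with the first helper corrupts the
upper half of the result, whereas the second matches memmove.  Loading
everything first needs either enough registers or a single wide enough
move, which is where the piece-size limit comes in.
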
> Btw, there was a short period where the MOVE_MAX limit was restricted
> but that had fallout and we've reverted since then.

Erhm...  Are we even talking about the same issue?

A change to i386/i386.h reduced the 32-bit non-SSE MOVE_MAX from 16 to
4, which broke this test; I'm proposing to bounce it back up to 8, so
that we get a little more memmove inlining, enough for tests that expect
that much to pass.

You may be focusing on the gimple-fold bit, because I mentioned it, but
even the rtl expander is failing to expand the memmove because of the
setting, as evidenced by the test's failure in the scan for memmove in
the final dump.

That MOVE_MAX change was a significant regression in codegen for 32-bit
non-SSE x86, and I'm proposing to fix that.  Compensating for that
regression elsewhere doesn't seem desirable to me: MOVE_MAX can be much
higher even on other x86 variants, so the effects of such attempts may
quite significantly harm more modern CPUs.

Conversely, I don't expect the reduction of MOVE_MAX on SSE-less x86 a
couple of years ago to have been measured for performance effects, given
the little overall relevance of such CPUs, and the very visible and
undesirable effects on codegen that the change brought onto them.  And
yet, I'm being very conservative in the proposed reversion, because
benchmarking such targets in any meaningful way would be somewhat
challenging for me as well.

So, could we please have this narrow fix of this limited regression
accepted at the spot where it was introduced, rather than debating
tangents?

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist                       GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>