On Jan 27, 2021, Richard Biener <richard.guent...@gmail.com> wrote:

> That said, rather than not transforming the loop as you do I'd
> say we want to re-inline small copies more forcefully during
> loop distribution code-gen so we turn a loop that sets
> 3 'short int' to zero into a 'int' store and a 'short' store for example
> (we have more code-size leeway here because we formerly had
> a loop).

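To make the quoted suggestion concrete, here is a minimal hand-written sketch of the two forms: the original loop that zeroes three 'short int' elements, and an equivalent made of one 32-bit store plus one 16-bit store. The function names zero3_loop and zero3_wide are hypothetical and used only for illustration; this is not output from GCC's loop distribution pass, only the shape of the transformation being discussed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two 16-bit elements occupy exactly one 32-bit store. */
static_assert(sizeof(uint32_t) == 2 * sizeof(int16_t),
              "one 32-bit store must cover two 16-bit elements");

/* Loop form: three separate 16-bit stores. */
void zero3_loop(int16_t *p)
{
    for (int i = 0; i < 3; i++)
        p[i] = 0;
}

/* Hand-written stand-in for the suggested expansion: one 32-bit store
   followed by one 16-bit store.  Fixed-size memcpy is used so the code
   makes no aliasing or alignment assumptions. */
void zero3_wide(int16_t *p)
{
    const uint32_t z32 = 0;
    const uint16_t z16 = 0;

    memcpy(p, &z32, sizeof z32);      /* covers p[0] and p[1] */
    memcpy(p + 2, &z16, sizeof z16);  /* covers p[2] */
}
```

Both functions leave the same bytes in memory; the second needs two stores instead of three and no loop control.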
Ok, that makes sense to me, if there's a chance of growing the access
stride.

> Since you don't add a testcase

Uhh, sorry, I mentioned TFmode emulation calls, but I wasn't explicit I
meant the soft-fp ones from libgcc.

./xgcc -B./ -O2 $srcdir/libgcc/soft-fp/fixtfdi.c \
  -I$srcdir/libgcc/config/riscv -I$srcdir/include \
  -S -o - | grep memset

> I can't see whether the actual case would be fixed by setting SSA
> pointer alignment on the memset arguments

The dest pointer alignment is known at the builtin expansion time,
that's not a problem.  What isn't known (*) is that the length is a
multiple of the word size: what gets to the expander is that it's
between 4 and 12 bytes (targeting 32-bit risc-v), but not that it's
either 4, 8 or 12.  Coming up with an efficient inline expansion becomes
a bit of a challenge without that extra knowledge.

(*) at least before my related patch for get_nonzero_bits
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564344.html

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist         GNU Toolchain Engineer
        Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
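
As an illustration of why the missing "multiple of the word size" fact matters, the sketch below contrasts two hand-written clears for a 32-bit target. The helpers clear_words and clear_range are hypothetical names, not GCC code, and this is not what the builtin expander emits; it only shows that knowing the length is 4, 8 or 12 permits whole-word stores only, while knowing just the 4..12 range forces extra tail handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: DST is word-aligned and LEN is known to be 4, 8 or 12.
   At most three word stores are needed and there is no byte tail. */
void clear_words(uint32_t *dst, size_t len)
{
    assert(len >= 4 && len <= 12 && len % 4 == 0);
    dst[0] = 0;
    if (len > 4)
        dst[1] = 0;
    if (len > 8)
        dst[2] = 0;
}

/* Assumption: only the range 4..12 is known, so LEN may be 5, 6, 7, ...
   Whole words are cleared first, then a byte-by-byte tail is required. */
void clear_range(void *dst, size_t len)
{
    unsigned char *p = dst;
    const uint32_t zero = 0;

    assert(len >= 4 && len <= 12);
    while (len >= sizeof zero) {
        memcpy(p, &zero, sizeof zero);  /* one word at a time */
        p += sizeof zero;
        len -= sizeof zero;
    }
    while (len > 0) {                   /* leftover 1..3 bytes */
        *p++ = 0;
        len--;
    }
}
```

The second form carries the extra loop and branches for the tail, which is the cost an inline expansion has to pay when it cannot prove the length is a multiple of the word size.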