On 10/18/24 7:12 AM, Craig Blackmore wrote:
> `expand_vec_setmem` only generated vectorized memset if it fitted into a
> single vector store. Extend it to generate a loop for longer and
> unknown lengths.
>
> The test cases now use -O1 so that they are not sensitive to scheduling.
>
> gcc/ChangeLog:
>
> 	* config/riscv/riscv-string.cc
> 	(use_vector_stringop_p): Add comment.
> 	(expand_vec_setmem): Use use_vector_stringop_p instead of
> 	check_vectorise_memory_operation. Add loop generation.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/rvv/base/setmem-1.c: Use -O1. Expect a loop
> 	instead of a libcall. Add test for unknown length.
> 	* gcc.target/riscv/rvv/base/setmem-2.c: Likewise.
> 	* gcc.target/riscv/rvv/base/setmem-3.c: Likewise and expect smaller
> 	lmul.
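For readers less familiar with RVV, a loop that handles longer and run-time-unknown lengths is typically a strip-mined store loop: each iteration asks for a vector length, stores that many bytes, and advances. The following is a minimal portable C model of that shape only, not the RTL or assembly that `expand_vec_setmem` actually emits. `HYPOTHETICAL_VLMAX`, `model_vsetvl`, `model_vse8` and `strip_mined_memset` are illustrative names, and taking vl = min(remaining, VLMAX) is an assumption (it is one of the values the RVV spec permits vsetvli to return).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of bytes one vector store can write
   (VLEN/8 * LMUL); chosen only for illustration.  */
#define HYPOTHETICAL_VLMAX 32

/* Models vsetvli: vl = min(remaining, VLMAX), one permitted choice.  */
static size_t model_vsetvl(size_t remaining)
{
    return remaining < HYPOTHETICAL_VLMAX ? remaining : HYPOTHETICAL_VLMAX;
}

/* Models one unit-stride byte store of vl copies of val.  */
static void model_vse8(uint8_t *dst, uint8_t val, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        dst[i] = val;
}

/* Strip-mined memset: one modeled vector store per iteration, so the
   same code covers short, long and run-time-only lengths.  */
static void *strip_mined_memset(void *dst, int c, size_t n)
{
    uint8_t *p = dst;
    uint8_t val = (uint8_t)c;

    while (n > 0) {
        size_t vl = model_vsetvl(n);
        model_vse8(p, val, vl);
        p += vl;
        n -= vl;
    }
    return dst;
}
```

When n fits within VLMAX, the loop runs once, which corresponds to the single-store case the old code already handled.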

So why handle memset differently from the other mem* routines, where we
limit ourselves to what we can handle without needing loops?

My suspicion is that once we're moving enough data that we can't do it
with a single big-lmul store, calling out to the library variant
probably isn't a big deal for memset. Do you have data that suggests
otherwise?

Jeff