On 19/10/2024 14:05, Jeff Law wrote:


On 10/18/24 7:12 AM, Craig Blackmore wrote:
`expand_vec_setmem` only generated vectorized memset if it fitted into a
single vector store.  Extend it to generate a loop for longer and
unknown lengths.
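
For illustration only, the loop shape described above can be sketched in portable C. This is not the code the patch emits: `sketch_vsetvl`, `sketch_vec_setmem` and `SKETCH_VLMAX_BYTES` are hypothetical names, the inner byte loop stands in for a single vector store, and the "grant min(requested, VLMAX)" rule is a simplified model of what `vsetvli` does on real hardware.

```c
#include <stddef.h>

/* Hypothetical stand-in for the number of bytes one vector store can
   cover; on real hardware this depends on VLEN and the chosen LMUL.  */
#define SKETCH_VLMAX_BYTES 64

/* Simplified model of vsetvli: grant min(requested, VLMAX) bytes.  */
static size_t
sketch_vsetvl (size_t requested)
{
  return requested < SKETCH_VLMAX_BYTES ? requested : SKETCH_VLMAX_BYTES;
}

/* Strip-mined memset: each iteration asks how many bytes it may store,
   stores that many, then advances.  The same loop handles lengths that
   are longer than one vector and lengths unknown at compile time.  */
static void *
sketch_vec_setmem (void *dst, int value, size_t length)
{
  unsigned char *p = dst;
  unsigned char byte = (unsigned char) value;

  while (length > 0)
    {
      size_t vl = sketch_vsetvl (length);

      /* Stands in for one vector store of vl bytes.  */
      for (size_t i = 0; i < vl; i++)
        p[i] = byte;

      p += vl;
      length -= vl;
    }
  return dst;
}
```

With a length of 150 and the assumed 64-byte limit, the loop runs three times (64, 64 and 22 bytes), and no separate tail handling is needed.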

The test cases now use -O1 so that they are not sensitive to scheduling.

gcc/ChangeLog:

    * config/riscv/riscv-string.cc
    (use_vector_stringop_p): Add comment.
    (expand_vec_setmem): Use use_vector_stringop_p instead of
    check_vectorise_memory_operation.  Add loop generation.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/setmem-1.c: Use -O1.  Expect a loop
    instead of a libcall.  Add test for unknown length.
    * gcc.target/riscv/rvv/base/setmem-2.c: Likewise.
    * gcc.target/riscv/rvv/base/setmem-3.c: Likewise and expect smaller
    lmul.
So why handle memset differently than the other mem* routines where we limit ourselves to what we can handle without needing loops?

My suspicion is that once we're moving enough data that we can't do it with a single big lmul store, calling out to the library variant probably isn't a big deal for memset.  Do you have data which suggests otherwise?
I don't have data for this yet. My thinking was that the glibc and newlib memset implementations are scalar, and they also do byte stores to reach alignment, which is unnecessary on targets with fast unaligned access.
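
To make the alignment point concrete, here is a simplified sketch of the general shape of a scalar memset. It is not the actual glibc or newlib source; `sketch_scalar_memset` is a hypothetical name and the word size is whatever `unsigned long` is on the host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified scalar memset: byte stores up to a word boundary, word
   stores for the bulk, byte stores for the tail.  Illustrative only.  */
static void *
sketch_scalar_memset (void *dst, int value, size_t length)
{
  unsigned char *p = dst;
  unsigned char byte = (unsigned char) value;
  unsigned long word = 0;

  /* Head: byte stores until p is word-aligned.  This is the part that
     is unnecessary when unaligned accesses are fast.  */
  while (length > 0 && ((uintptr_t) p % sizeof (unsigned long)) != 0)
    {
      *p++ = byte;
      length--;
    }

  /* Replicate the byte into every lane of a word.  */
  for (size_t i = 0; i < sizeof word; i++)
    word = (word << 8) | byte;

  /* Body: one word per store; memcpy keeps the store well-defined.  */
  while (length >= sizeof word)
    {
      memcpy (p, &word, sizeof word);
      p += sizeof word;
      length -= sizeof word;
    }

  /* Tail: remaining bytes.  */
  while (length > 0)
    {
      *p++ = byte;
      length--;
    }
  return dst;
}
```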

This patch may still be useful in the meantime if I remove the loop generation parts, as it would still allow us to generate vector setmem for smaller lengths than currently allowed.

Craig


Jeff

