Re: [PATCH 6/7] RISC-V: Make vectorized memset handle more cases

Craig Blackmore Wed, 30 Oct 2024 10:21:54 -0700


On 29/10/2024 15:09, Jeff Law wrote:

On 10/29/24 7:59 AM, Craig Blackmore wrote:
On 19/10/2024 14:05, Jeff Law wrote:
On 10/18/24 7:12 AM, Craig Blackmore wrote:
`expand_vec_setmem` only generated vectorized memset if it fitted into a
single vector store.  Extend it to generate a loop for longer and
unknown lengths.
The test cases now use -O1 so that they are not sensitive to scheduling.
gcc/ChangeLog:

    * config/riscv/riscv-string.cc
    (use_vector_stringop_p): Add comment.
    (expand_vec_setmem): Use use_vector_stringop_p instead of
    check_vectorise_memory_operation.  Add loop generation.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/setmem-1.c: Use -O1.  Expect a loop
    instead of a libcall.  Add test for unknown length.
    * gcc.target/riscv/rvv/base/setmem-2.c: Likewise.
    * gcc.target/riscv/rvv/base/setmem-3.c: Likewise and expect smaller
    lmul.
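
For readers following the thread, the loop shape described in the commit message can be sketched in plain C. This is only a scalar model under stated assumptions, not the RTL that `expand_vec_setmem` emits: the function name `model_vec_setmem` and the `max_chunk` parameter are hypothetical, with `max_chunk` standing in for the number of bytes a single vector store can cover.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar model of a strip-mined vector memset loop.
   MAX_CHUNK stands in for the most bytes one vector store can cover;
   it is an illustration parameter, not a GCC or RVV name.  */
static void *
model_vec_setmem (void *dst, int c, size_t len, size_t max_chunk)
{
  unsigned char *p = dst;

  assert (max_chunk > 0);
  while (len > 0)
    {
      /* Pick this iteration's length: min (len, max_chunk).  */
      size_t vl = len < max_chunk ? len : max_chunk;

      /* Models one vector store of VL bytes.  */
      memset (p, c, vl);
      p += vl;
      len -= vl;
    }
  return dst;
}
```

With a length known to be at most `max_chunk`, the loop body runs once, which corresponds to the single-store case the existing expansion already handled; longer or unknown lengths are what need the loop.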
So why handle memset differently than the other mem* routines where we limit ourselves to what we can handle without needing loops?
My suspicion is that once we're moving enough data that we can't do it with a single big lmul store that calling out to the library variant probably isn't a big deal for memset. Do you have data which suggests otherwise?
I don't have data for this yet. My thinking was that the glibc and newlib memset implementations are scalar and they also do byte stores to reach alignment which is unnecessary on fast unaligned access targets.
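
The pattern referred to here (byte stores up to an alignment boundary, then word stores) looks roughly like the sketch below. It is a simplified illustration, not the actual glibc or newlib source; the name `model_scalar_memset` and the 8-byte word size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a scalar memset that aligns before storing
   words.  Not taken from glibc or newlib.  */
static void *
model_scalar_memset (void *dst, int c, size_t len)
{
  unsigned char *p = dst;
  unsigned char byte = (unsigned char) c;
  /* Replicate the byte into every lane of a 64-bit word.  */
  uint64_t word = 0x0101010101010101ULL * byte;

  assert (dst != NULL || len == 0);

  /* Head: byte stores until P is word aligned.  On a target with fast
     unaligned access this phase is the avoidable part.  */
  while (len > 0 && ((uintptr_t) p % sizeof word) != 0)
    {
      *p++ = byte;
      len--;
    }

  /* Body: aligned word stores.  */
  while (len >= sizeof word)
    {
      memcpy (p, &word, sizeof word);
      p += sizeof word;
      len -= sizeof word;
    }

  /* Tail: remaining bytes.  */
  while (len > 0)
    {
      *p++ = byte;
      len--;
    }
  return dst;
}
```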
Consider that a short term problem, at least for glibc. I've got the magic ifunc bits which introduce vector versions and also check for fast unaligned support. Does that change the calculus in your mind?

Yes, with those bits in place it would seem less of an obvious win.

This patch may still be useful in the meantime if I removed the loop generation parts as it would still allow us to generate vector setmem for smaller lengths than currently allowed.
Yea, which would unblock #7 of the series. Then we could circle back on whether or not we should let setmem loop when expanded by the compiler?

Ok, I'll follow up with a non-loop version of this patch.

Thanks,

Craig

Jeff
