On Thu, Oct 23, 2025 at 10:15 AM H.J. Lu <[email protected]> wrote:
>
> Inline memmove in 64-bit mode only, since there are far fewer registers
> available in 32-bit mode:
>
> 1. Load all sources into registers and store them together to avoid
>    possible address overlap between source and destination.
> 2. For known size, first try to fully unroll with 8 registers.
> 3. For size <= 2 * MOVE_MAX, load all sources into 2 registers first
>    and then store them together.
> 4. For size > 2 * MOVE_MAX and size <= 4 * MOVE_MAX, load all sources
>    into 4 registers first and then store them together.
> 5. For size > 4 * MOVE_MAX and size <= 8 * MOVE_MAX, load all sources
>    into 8 registers first and then store them together.
> 6. For size > 8 * MOVE_MAX,
>    a. If address of destination > address of source, copy backward
>       with a 4 * MOVE_MAX loop with unaligned loads and stores.  Load
>       the first 4 * MOVE_MAX into 4 registers before the loop and
>       store them after the loop to support overlapping addresses.
>    b. Otherwise, copy forward with a 4 * MOVE_MAX loop with unaligned
>       loads and stores.  Load the last 4 * MOVE_MAX into 4 registers
>       before the loop and store them after the loop to support
>       overlapping addresses.
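
For readers following the list above, here is a minimal C sketch of the
load-everything-then-store idea in cases 3 and 6.  It is not the code the
patch emits: CHUNK is a hypothetical 8-byte stand-in for MOVE_MAX, the
helper names (load_chunk, store_chunk, load4, store4, move_upto_2,
move_large) are made up for illustration, and the use of overlapping
head/tail chunks in case 3 is an assumption about how "2 registers" cover
a size that is not an exact multiple of the chunk size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MOVE_MAX: one 8-byte GPR-sized chunk.  */
#define CHUNK 8
typedef uint64_t chunk_t;

/* Unaligned load and store of one chunk.  */
static inline chunk_t load_chunk(const unsigned char *p)
{
  chunk_t v;
  memcpy(&v, p, CHUNK);
  return v;
}

static inline void store_chunk(unsigned char *p, chunk_t v)
{
  memcpy(p, &v, CHUNK);
}

/* Case 3 (assumed shape): CHUNK <= n <= 2 * CHUNK.  Both loads finish
   before the first store, so overlapping buffers are safe.  */
static void move_upto_2(unsigned char *dst, const unsigned char *src, size_t n)
{
  assert(n >= CHUNK && n <= 2 * CHUNK);
  chunk_t head = load_chunk(src);
  chunk_t tail = load_chunk(src + n - CHUNK);  /* may overlap head */
  store_chunk(dst, head);
  store_chunk(dst + n - CHUNK, tail);
}

/* Load or store 4 consecutive chunks ("4 registers").  */
static inline void load4(chunk_t r[4], const unsigned char *p)
{
  for (int i = 0; i < 4; i++)
    r[i] = load_chunk(p + i * CHUNK);
}

static inline void store4(unsigned char *p, const chunk_t r[4])
{
  for (int i = 0; i < 4; i++)
    store_chunk(p + i * CHUNK, r[i]);
}

/* Case 6: loop in 4 * CHUNK blocks.  The block at the far end of the
   copy direction is loaded before the loop and stored after it.  */
static void move_large(unsigned char *dst, const unsigned char *src, size_t n)
{
  const size_t blk = 4 * CHUNK;
  chunk_t saved[4], r[4];
  assert(n >= blk);

  if ((uintptr_t)dst > (uintptr_t)src) {
    /* 6a: copy backward; keep the first block in registers.  */
    load4(saved, src);
    for (size_t end = n; end > blk; end -= blk) {
      load4(r, src + end - blk);
      store4(dst + end - blk, r);
    }
    store4(dst, saved);
  } else {
    /* 6b: copy forward; keep the last block in registers.  */
    load4(saved, src + n - blk);
    for (size_t off = 0; off + blk < n; off += blk) {
      load4(r, src + off);
      store4(dst + off, r);
    }
    store4(dst + n - blk, saved);
  }
}
```

The sketch only needs n >= 4 * CHUNK to be correct; the patch description
reserves the loop for sizes above 8 * MOVE_MAX and handles smaller sizes
with the straight-line cases.
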
>
> Verified and benchmarked memmove implementations inlined with GPR, SSE2,
> AVX2 and AVX512 using glibc memmove tests.  It is available at
>
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/test/memmove
>
> Their performance is comparable to that of the optimized memmove
> implementations in glibc on Intel Core i7-1195G7.
I'll measure performance on SPEC and get back to you later; it could take a couple of days.
>
> gcc/
>
> PR target/90262
> * config/i386/i386-expand.cc (ix86_expand_unroll_movmem): New.
> (ix86_expand_n_move_movmem): Likewise.
> (ix86_expand_load_movmem): Likewise.
> (ix86_expand_store_movmem): Likewise.
> (ix86_expand_n_overlapping_move_movmem): Likewise.
> (ix86_expand_less_move_movmem): Likewise.
> (ix86_expand_movmem): Likewise.
> * config/i386/i386-protos.h (ix86_expand_movmem): Likewise.
> * config/i386/i386.md (movmem<mode>): Likewise.
>
> gcc/testsuite/
>
> * gcc.target/i386/builtin-memmove-1a.c: New test.
> * gcc.target/i386/builtin-memmove-1b.c: Likewise.
> * gcc.target/i386/builtin-memmove-1c.c: Likewise.
> * gcc.target/i386/builtin-memmove-1d.c: Likewise.
> * gcc.target/i386/builtin-memmove-2a.c: Likewise.
> * gcc.target/i386/builtin-memmove-2b.c: Likewise.
> * gcc.target/i386/builtin-memmove-2c.c: Likewise.
> * gcc.target/i386/builtin-memmove-2d.c: Likewise.
> * gcc.target/i386/builtin-memmove-3a.c: Likewise.
> * gcc.target/i386/builtin-memmove-3b.c: Likewise.
> * gcc.target/i386/builtin-memmove-3c.c: Likewise.
> * gcc.target/i386/builtin-memmove-4a.c: Likewise.
> * gcc.target/i386/builtin-memmove-4b.c: Likewise.
> * gcc.target/i386/builtin-memmove-4c.c: Likewise.
> * gcc.target/i386/builtin-memmove-5a.c: Likewise.
> * gcc.target/i386/builtin-memmove-5b.c: Likewise.
> * gcc.target/i386/builtin-memmove-5c.c: Likewise.
> * gcc.target/i386/builtin-memmove-6.c: Likewise.
> * gcc.target/i386/builtin-memmove-7.c: Likewise.
> * gcc.target/i386/builtin-memmove-8.c: Likewise.
> * gcc.target/i386/builtin-memmove-9.c: Likewise.
> * gcc.target/i386/builtin-memmove-10.c: Likewise.
> * gcc.target/i386/builtin-memmove-11a.c: Likewise.
> * gcc.target/i386/builtin-memmove-11b.c: Likewise.
> * gcc.target/i386/builtin-memmove-11c.c: Likewise.
> * gcc.target/i386/builtin-memmove-12.c: Likewise.
> * gcc.target/i386/builtin-memmove-13.c: Likewise.
> * gcc.target/i386/builtin-memmove-14.c: Likewise.
> * gcc.target/i386/builtin-memmove-15.c: Likewise.
>
> OK for master?
>
> Thanks.
>
> --
> H.J.



-- 
BR,
Hongtao
