Simplify memcpy and memset inline strategies to avoid branches:

1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
   load and store for up to 16 * 16 (256) bytes when the data size is
   fixed and known.
2. Inline only if the data size is known to be <= 256.
   a. Use "rep movsb/stosb" with a simple code sequence if the data size
      is a constant.
   b. Use a loop if the data size is not a constant.
3. Use the memcpy/memset library function if the data size is unknown
   or > 256.
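The decision in steps 2 and 3 can be modeled as a small pure function. This is a hedged sketch, not GCC's implementation: the enum, `choose_strategy`, `by_pieces_limit` and `INLINE_MAX_BYTES` are hypothetical names, and the sketch assumes step 1 (expansion by pieces under MOVE_RATIO/CLEAR_RATIO) has already declined the operation. It also shows the arithmetic behind "16 * 16 (256) bytes": a ratio of 17 allows at most 16 moves, each 16 bytes wide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names modeling the cover letter; not GCC's stringop_alg. */
enum inline_strategy {
  STRATEGY_LIBCALL,    /* call the memcpy/memset library function */
  STRATEGY_REP_PREFIX, /* inline "rep movsb" / "rep stosb" */
  STRATEGY_LOOP        /* inline copy/clear loop */
};

/* Assumed inline limit from the cover letter: 16 moves of 16 bytes. */
#define INLINE_MAX_BYTES 256

/* Largest size handled purely by moves when fewer than RATIO moves are
   allowed, each MOVE_WIDTH bytes wide (17 and 16 give 256).  */
size_t
by_pieces_limit (unsigned ratio, size_t move_width)
{
  return (size_t) (ratio - 1) * move_width;
}

/* SIZE_IS_CONSTANT: the exact size is a compile-time constant.
   MAX_SIZE_KNOWN: an upper bound on the size is known.
   MAX_SIZE: that upper bound (or the constant size itself).  */
enum inline_strategy
choose_strategy (bool size_is_constant, bool max_size_known, size_t max_size)
{
  /* Step 3: unknown size, or larger than 256 bytes: library call.  */
  if (!max_size_known || max_size > INLINE_MAX_BYTES)
    return STRATEGY_LIBCALL;

  /* Step 2a: constant size <= 256: simple rep movsb/stosb sequence.  */
  if (size_is_constant)
    return STRATEGY_REP_PREFIX;

  /* Step 2b: non-constant size bounded by 256: inline loop.  */
  return STRATEGY_LOOP;
}
```

For example, `choose_strategy (false, true, 200)` yields `STRATEGY_LOOP`, while any size above 256 or with no known bound falls back to `STRATEGY_LIBCALL`. Keeping the checks flat mirrors the stated goal of a branch-light strategy table.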
There are no significant performance impacts on SPEC CPU 2017. There are
visible performance improvements on EEMBC benchmarks, with one regression.

H.J. Lu (3):
  x86: Update memcpy/memset inline strategies for Ice Lake
  x86: Update memcpy/memset inline strategies for Skylake family CPUs
  x86: Update memcpy/memset inline strategies for -mtune=generic

 gcc/config/i386/i386-expand.c                 |  11 +-
 gcc/config/i386/i386-options.c                |  12 +-
 gcc/config/i386/i386.h                        |   2 +
 gcc/config/i386/x86-tune-costs.h              | 185 ++++++++++++++++--
 gcc/config/i386/x86-tune.def                  |   6 +
 .../gcc.target/i386/memcpy-strategy-10.c      |  11 ++
 .../gcc.target/i386/memcpy-strategy-11.c      |  18 ++
 .../gcc.target/i386/memcpy-strategy-12.c      |   9 +
 .../gcc.target/i386/memcpy-strategy-13.c      |  11 ++
 .../gcc.target/i386/memcpy-strategy-5.c       |  11 ++
 .../gcc.target/i386/memcpy-strategy-6.c       |  18 ++
 .../gcc.target/i386/memcpy-strategy-7.c       |   9 +
 .../gcc.target/i386/memcpy-strategy-8.c       |  18 ++
 .../gcc.target/i386/memcpy-strategy-9.c       |   9 +
 .../gcc.target/i386/memset-strategy-10.c      |  11 ++
 .../gcc.target/i386/memset-strategy-11.c      |   9 +
 .../gcc.target/i386/memset-strategy-3.c       |  17 ++
 .../gcc.target/i386/memset-strategy-4.c       |  17 ++
 .../gcc.target/i386/memset-strategy-5.c       |  11 ++
 .../gcc.target/i386/memset-strategy-6.c       |   9 +
 .../gcc.target/i386/memset-strategy-7.c       |  11 ++
 .../gcc.target/i386/memset-strategy-8.c       |   9 +
 .../gcc.target/i386/memset-strategy-9.c       |  17 ++
 gcc/testsuite/gcc.target/i386/shrink_wrap_1.c |   2 +-
 gcc/testsuite/gcc.target/i386/sw-1.c          |   2 +-
 25 files changed, 413 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-9.c

-- 
2.30.2