Simplify memcpy and memset inline strategies to avoid branches:

1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
   loads and stores for up to 16 * 16 (256) bytes, i.e. at most 16
   moves of up to 16 bytes each, when the data size is fixed and known.
2. Inline only if data size is known to be <= 256.
   a. Use "rep movsb/stosb" with a simple code sequence if the data
      size is a constant.
   b. Use a loop if the data size is not a constant.
3. Use the memcpy/memset library function if the data size is unknown
   or > 256.
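
To make the three cases above concrete, here is a minimal sketch of the
kinds of call sites they refer to.  The struct and function names are
hypothetical and are not taken from the patches or the new tests, and
the comments describe which item each call is a candidate for, not
verified compiler output.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */
#include <string.h>   /* memcpy, memset */

/* Hypothetical 64-byte record, used only for illustration. */
struct packet {
    unsigned char payload[64];
};

/* The inline paths described above apply only up to 256 bytes. */
static_assert(sizeof(struct packet) <= 256,
              "packet exceeds the 256-byte inline limit");

/* Size is a compile-time constant <= 256: a candidate for item 1
   (plain loads and stores) or item 2a. */
void copy_fixed(struct packet *dst, const struct packet *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Size is not a constant, but on x86 an unsigned char is limited to
   0..255, so the compiler may be able to prove it is <= 256: a
   candidate for item 2b. */
void clear_bounded(unsigned char *buf, unsigned char n)
{
    memset(buf, 0, n);
}

/* Size is unknown and unbounded: a candidate for item 3, a call to
   the library memcpy. */
void copy_unknown(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

One way to see which path a given compiler picks is to inspect the
generated assembly, for example with "gcc -O2 -mtune=generic -S".  The
result depends on the GCC version, the -mtune/-march selection and the
optimization level.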

There is no significant performance impact on SPEC CPU 2017.  There
are visible performance improvements on EEMBC benchmarks, with one
regression.

H.J. Lu (3):
  x86: Update memcpy/memset inline strategies for Ice Lake
  x86: Update memcpy/memset inline strategies for Skylake family CPUs
  x86: Update memcpy/memset inline strategies for -mtune=generic

 gcc/config/i386/i386-expand.c                 |  11 +-
 gcc/config/i386/i386-options.c                |  12 +-
 gcc/config/i386/i386.h                        |   2 +
 gcc/config/i386/x86-tune-costs.h              | 185 ++++++++++++++++--
 gcc/config/i386/x86-tune.def                  |   6 +
 .../gcc.target/i386/memcpy-strategy-10.c      |  11 ++
 .../gcc.target/i386/memcpy-strategy-11.c      |  18 ++
 .../gcc.target/i386/memcpy-strategy-12.c      |   9 +
 .../gcc.target/i386/memcpy-strategy-13.c      |  11 ++
 .../gcc.target/i386/memcpy-strategy-5.c       |  11 ++
 .../gcc.target/i386/memcpy-strategy-6.c       |  18 ++
 .../gcc.target/i386/memcpy-strategy-7.c       |   9 +
 .../gcc.target/i386/memcpy-strategy-8.c       |  18 ++
 .../gcc.target/i386/memcpy-strategy-9.c       |   9 +
 .../gcc.target/i386/memset-strategy-10.c      |  11 ++
 .../gcc.target/i386/memset-strategy-11.c      |   9 +
 .../gcc.target/i386/memset-strategy-3.c       |  17 ++
 .../gcc.target/i386/memset-strategy-4.c       |  17 ++
 .../gcc.target/i386/memset-strategy-5.c       |  11 ++
 .../gcc.target/i386/memset-strategy-6.c       |   9 +
 .../gcc.target/i386/memset-strategy-7.c       |  11 ++
 .../gcc.target/i386/memset-strategy-8.c       |   9 +
 .../gcc.target/i386/memset-strategy-9.c       |  17 ++
 gcc/testsuite/gcc.target/i386/shrink_wrap_1.c |   2 +-
 gcc/testsuite/gcc.target/i386/sw-1.c          |   2 +-
 25 files changed, 413 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memcpy-strategy-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/memset-strategy-9.c

-- 
2.30.2
