Hi Kewen,
  Thanks for your comments.

On 2024/5/9 13:44, Kewen.Lin wrote:
> Hi,
>
> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>> Hi,
>>   This patch enables overlapped by-piece operations. On rs6000, default
>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>> by-pieces.
>
> Thanks for enabling this, did you evaluate if it can help some benchmark?
I tested it with SPEC2017 and saw no obvious performance impact; I think
memory compare is probably not hot enough there. With my micro benchmark,
I see a 5-10% performance gain when the compare length is 7.
>
>>
>> Bootstrapped and tested on powerpc64-linux BE and LE with no
>> regressions. Is it OK for the trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> rs6000: Enable overlapped by-pieces operations
>>
>> This patch enables overlapped by-piece operations by defining
>> TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear
>> ratio is 2. So the overlap is only enabled with compare by-pieces.
>>
>> gcc/
>> 	* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>
>> gcc/testsuite/
>> 	* gcc.target/powerpc/block-cmp-9.c: New.
>>
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 6b9a40fcc66..2b5f5cf1d86 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
>>  #undef TARGET_CONST_ANCHOR
>>  #define TARGET_CONST_ANCHOR 0x8000
>>
>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>> +
>>
>>
>>  /* Processor table.  */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>> new file mode 100644
>> index 00000000000..b5f51affbb7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>
> Why does it need power8 forced here?

I just want to exclude P7 LE, as targetm.slow_unaligned_access returns
false for it and the cmpmemsi expander won't be invoked.

>
> BR,
> Kewen
>
>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>> +
>> +/* Test if by-piece overlap compare is enabled and following case is
>> +   implemented by two overlap word loads and compares.  */
>> +
>> +int foo (const char* s1, const char* s2)
>> +{
>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>> +}
>
Thanks
Gui Haochen
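For anyone reading along, below is a rough source-level model of what the
overlapped compare in block-cmp-9.c amounts to. Without overlap, the 7 bytes
would be split into 4 + 2 + 1 byte pieces (word, halfword and byte loads,
i.e. the lwz/lhz/lbz the dg-final scan rules out); with overlap, the last
piece becomes a second 4-byte load at offset 3 that re-reads byte 3. This is
only an illustration under stated assumptions: it models the equality-only
use in the test (memcmp (...) == 0), not the ordered memcmp result, the
helper name eq7_overlap is hypothetical, and it is not the RTL or assembly
GCC actually emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 iff the first 7 bytes of S1 and S2 are
   equal, using two overlapping 4-byte loads per operand.  Models only
   the "== 0" case of memcmp, as in block-cmp-9.c.  */
static int
eq7_overlap (const char *s1, const char *s2)
{
  uint32_t a0, b0, a1, b1;

  /* First piece: bytes 0..3.  memcpy avoids unaligned-access UB.  */
  memcpy (&a0, s1, 4);
  memcpy (&b0, s2, 4);

  /* Second piece: bytes 3..6, overlapping byte 3 with the first piece
     instead of falling back to a halfword plus a byte load.  */
  memcpy (&a1, s1 + 3, 4);
  memcpy (&b1, s2 + 3, 4);

  /* All 7 bytes match iff both word pairs match.  */
  return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

Re-reading the overlapping byte is harmless for an equality test, which is
why two word loads can cover an odd length like 7.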