Hi,

on 2024/5/9 15:35, HAO CHEN GUI wrote:
> Hi Kewen,
>   Thanks for your comments.
>
> On 2024/5/9 13:44, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>>> Hi,
>>>   This patch enables overlapped by-piece operations. On rs6000, default
>>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>>> by-pieces.
>>
>> Thanks for enabling this, did you evaluate if it can help some benchmark?
>
> Tested it with SPEC2017. No obvious performance impact. I think memory
> compare might not be hot enough.
>
> Tested it with my micro benchmark. 5-10% performance gain when compare
> length is 7.

Nice!

>
>>
>>>
>>> Bootstrapped and tested on powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> rs6000: Enable overlapped by-pieces operations
>>>
>>> This patch enables overlapped by-piece operations by defining
>>> TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear
>>> ratio is 2. So the overlap is only enabled with compare by-pieces.
>>>
>>> gcc/
>>> 	* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>>
>>> gcc/testsuite/
>>> 	* gcc.target/powerpc/block-cmp-9.c: New.
>>>
>>>
>>> patch.diff
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 6b9a40fcc66..2b5f5cf1d86 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const
>>> rs6000_attribute_table[] =
>>>  #undef TARGET_CONST_ANCHOR
>>>  #define TARGET_CONST_ANCHOR 0x8000
>>>
>>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>>> +
>>>
>>>
>>>  /* Processor table. */
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> new file mode 100644
>>> index 00000000000..b5f51affbb7
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>>
>> Why does it need power8 forced here?
>
> I just want to exclude P7 LE as targetm.slow_unaligned_access return false
> for it and the expand cmpmemsi won't be invoked.
>   I think it over. It's no need. For the sub-targets which library is
> called, l[hb]z won't be generated too.

Thanks for checking, OK with dropping this forced power8.

BR,
Kewen

>
>>
>> BR,
>> Kewen
>>
>>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>>> +
>>> +/* Test if by-piece overlap compare is enabled and following case is
>>> +   implemented by two overlap word loads and compares.  */
>>> +
>>> +int foo (const char* s1, const char* s2)
>>> +{
>>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>>> +}
>>
>
> Thanks
> Gui Haochen
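
For readers following the thread, here is a minimal C sketch of what "two overlap word loads and compares" means for the 7-byte equality test in block-cmp-9.c. It is only an illustration of the idea, not the code GCC emits and not part of the patch: the helper names load32 and eq7_overlap are hypothetical, and the sketch assumes unaligned 4-byte loads are cheap on the target. Without overlap, 7 bytes would be compared as word + halfword + byte (hence the l[hb]z the test scans for); with overlap, the second word load starts at offset 3 and re-reads byte 3.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 4 bytes from a possibly unaligned address.
   memcpy keeps this well-defined C; compilers typically turn it into a
   single word load on targets where unaligned access is fast.  */
static uint32_t load32 (const char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Hypothetical illustration of an overlapped 7-byte equality compare.
   Bytes 0..3 are covered by the first word, bytes 3..6 by the second;
   byte 3 is compared twice, which is harmless for an equality test.  */
int eq7_overlap (const char *s1, const char *s2)
{
  return load32 (s1) == load32 (s2)
	 && load32 (s1 + 3) == load32 (s2 + 3);
}
```

The result should match `__builtin_memcmp (s1, s2, 7) == 0` for any two buffers of at least 7 bytes. Endianness does not matter here because only equality is tested, not ordering.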