Hi Kewen,
  Thanks for your comments.

On 2024/5/9 13:44, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>> Hi,
>>   This patch enables overlapped by-piece operations. On rs6000, default
>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>> by-pieces.
> 
> Thanks for enabling this, did you evaluate if it can help some benchmark?

I tested it with SPEC2017 and saw no obvious performance impact. I think
memory compare might not be hot enough there.

I also tested it with my micro benchmark, which shows a 5-10% performance
gain when the compare length is 7.
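
To illustrate what the overlap means for a length of 7, here is a minimal C
sketch. It is not the code GCC emits, and the helper name eq7_overlap is made
up for illustration. Without overlap, the 7 bytes are split into 4 + 2 + 1
byte pieces; with overlap, two word-sized pieces at offsets 0 and 3 cover
them, and byte 3 is simply compared twice.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: test 7 bytes for equality with two overlapping
   4-byte pieces (bytes 0..3 and bytes 3..6).  memcpy is used for the
   unaligned loads so the sketch stays valid C.  */
static int
eq7_overlap (const char *s1, const char *s2)
{
  uint32_t a0, b0, a1, b1;

  memcpy (&a0, s1, 4);      /* bytes 0..3 */
  memcpy (&b0, s2, 4);
  memcpy (&a1, s1 + 3, 4);  /* bytes 3..6, overlapping byte 3 */
  memcpy (&b1, s2 + 3, 4);

  /* Equal only if both pieces match.  */
  return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

This only covers the equality form used in the test case
(__builtin_memcmp (s1, s2, 7) == 0), so byte order does not matter here.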

> 
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>> regressions. Is it OK for the trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> rs6000: Enable overlapped by-pieces operations
>>
>> This patch enables overlapped by-piece operations by defining
>> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
>> ratio is 2.  So the overlap is only enabled with compare by-pieces.
>>
>> gcc/
>>      * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>
>> gcc/testsuite/
>>      * gcc.target/powerpc/block-cmp-9.c: New.
>>
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 6b9a40fcc66..2b5f5cf1d86 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
>>  #undef TARGET_CONST_ANCHOR
>>  #define TARGET_CONST_ANCHOR 0x8000
>>
>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>> +
>>  
>>
>>  /* Processor table.  */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>> new file mode 100644
>> index 00000000000..b5f51affbb7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> 
> Why does it need power8 forced here?

I just want to exclude P7 LE, as targetm.slow_unaligned_access returns false
for it and the cmpmemsi expander won't be invoked.

> 
> BR,
> Kewen
> 
>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>> +
>> +/* Test if by-piece overlap compare is enabled and following case is
>> +   implemented by two overlap word loads and compares.  */
>> +
>> +int foo (const char* s1, const char* s2)
>> +{
>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>> +}
> 

Thanks
Gui Haochen
