On 17 November 2017 at 22:41, James Greenhalgh <james.greenha...@arm.com> wrote:
> On Fri, Sep 15, 2017 at 07:22:39PM +0100, Steve Ellcey wrote:
>> PR 81356 points out that doing a __builtin_strcpy of an empty string on
>> aarch64 does a copy from memory instead of just writing out a zero byte.
>> In looking at this I found that it was because of
>> aarch64_use_by_pieces_infrastructure_p, which returns false for
>> STORE_BY_PIECES.  The comment says:
>>
>>   /* STORE_BY_PIECES can be used when copying a constant string, but
>>      in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>>      For now we always fail this and let the move_by_pieces code copy
>>      the string from read-only memory.  */
>>
>> But this doesn't seem to be the case anymore.  When I remove this function
>> and the TARGET_USE_BY_PIECES_INFRASTRUCTURE_P macro that uses it the code
>> for __builtin_strcpy of a constant string seems to be either better or the
>> same.  The only time I got more instructions after removing this function
>> was on an 8 byte __builtin_strcpy where we now generate a mov and 3 movk
>> instructions to create the source followed by a store instead of doing a
>> load/store of 8 bytes.  The comment may have been applicable for
>> -mstrict-align at one time but it doesn't seem to be the case now.  I still
>> get better code without this routine under that option as well.
>
> I've been trying to remember why this is the way it is, It looks like it dates
> back to the initial port, and the only note I have on it points at an
> incorrect PR number. So, I think this is probably a safe and sensible
> choice.
>
> OK.
>

As a result of this patch, we now have:
XPASS: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2 "memset" 0
instead of:
XFAIL: gfortran.dg/pr45636.f90   -O   scan-tree-dump-times forwprop2
"memset" 0 (found 2 times)

Is it expected?

Christophe


> Reviewed-by: James Greenhalgh <james.greenha...@arm.com>
>
> Thanks,
> James
>
>>
>> Bootstrapped and tested without regressions, OK to checkin?
>>
>> Steve Ellcey
>> sell...@cavium.com
>>
>>
>>
>> 2017-09-15  Steve Ellcey  <sell...@cavium.com>
>>
>>       PR target/81356
>>       * config/aarch64/aarch64.c (aarch64_use_by_pieces_infrastructure_p):
>>       Remove.
>>       (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Remove define.
>>
>>
>> 2017-09-15  Steve Ellcey  <sell...@cavium.com>
>>
>>       * gcc.target/aarch64/pr81356.c: New test.
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 1c14008..fc72236 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -14118,22 +14118,6 @@ aarch64_asan_shadow_offset (void)
>>    return (HOST_WIDE_INT_1 << 36);
>>  }
>>
>> -static bool
>> -aarch64_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
>> -                                     unsigned int align,
>> -                                     enum by_pieces_operation op,
>> -                                     bool speed_p)
>> -{
>> -  /* STORE_BY_PIECES can be used when copying a constant string, but
>> -     in that case each 64-bit chunk takes 5 insns instead of 2 (LDR/STR).
>> -     For now we always fail this and let the move_by_pieces code copy
>> -     the string from read-only memory.  */
>> -  if (op == STORE_BY_PIECES)
>> -    return false;
>> -
>> -  return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
>> -}
>> -
>>  static rtx
>>  aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
>>                       int code, tree treeop0, tree treeop1)
>> @@ -15631,10 +15615,6 @@ aarch64_libgcc_floating_mode_supported_p
>>  #undef TARGET_LEGITIMIZE_ADDRESS
>>  #define TARGET_LEGITIMIZE_ADDRESS aarch64_legitimize_address
>>
>> -#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
>> -#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>> -  aarch64_use_by_pieces_infrastructure_p
>> -
>>  #undef TARGET_SCHED_CAN_SPECULATE_INSN
>>  #define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
>>
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr81356.c 
>> b/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> index e69de29..9fd6baa 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr81356.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +void f(char *a)
>> +{
>> +  __builtin_strcpy (a, "");
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "ldrb" } } */
>

Reply via email to