Hi Jeff,

在 2024/7/24 5:57, Jeff Law 写道:
> 
> 
> On 7/21/24 7:58 PM, HAO CHEN GUI wrote:
>> Hi,
>>    This patch adds const0 move checking for CLEAR_BY_PIECES. The original
>> vec_duplicate handles duplicates of non-constant inputs. But 0 is a
>> constant. So even a platform doesn't support vec_duplicate, it could
>> still do clear by pieces if it supports const0 move by that mode.
>>
>>    Compared to the previous version, the main change is to do const0
>> direct move for by-piece clear if the target supports const0 move by
>> that mode.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643063.html
>>
>>    Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. There are several regressions on aarch64. They could be
>> fixed by enhancing const0 move on V2x8QImode. Is it OK for trunk?
> Can you be more specific about the aarch64 regressions?  Execution? Scan-asm? 
>  ICE?
> 
> Ideally we'd include a patch to fix those regressions as well.
> 
For aarch64, it supports const0 move on V2x8QImode (the first vector mode it
could find for 16-byte length). But the generated instructions are not
preferable. aarch64 has a better "stp" instruction which leverages its zero
register to store 16-byte zeros into memory.

I tested following experiment patch on aarch64. It converts a const0 move on
V2x8QImode to a const0 move on TImode which generates "stp" instruction. It
fixed all regressions.

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 01b084d8ccb..8aa72940b12 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7766,7 +7766,14 @@ (define_expand "mov<mode>"
        (match_operand:VSTRUCT_QD 1 "general_operand"))]
   "TARGET_FLOAT"
 {
-  if (can_create_pseudo_p ())
+  if (<MODE>mode == V2x8QImode
+      && operands[1] == CONST0_RTX (V2x8QImode)
+      && MEM_P (operands[0]))
+
+      operands[0] = adjust_address (operands[0], TImode, 0);
+      operands[1] = CONST0_RTX (TImode);
+    }
+  else if (can_create_pseudo_p ())
     {
       if (GET_CODE (operands[0]) != REG)
        operands[1] = force_reg (<MODE>mode, operands[1]);

I still have some questions about the patch to discuss.

1. Originally it tests vec_duplicate_optab for clear by pieces. According to
the words in md.texi, this pattern should only handle non-constant inputs.
Constant vector should go through the move pattern instead. So shall we remove
vec_duplicate_optab checking thoroughly?

2. The clear by pieces final calls emit_move_insn to do the memory set. I
noticed that emit_move_insn doesn't check the predicate of move expand. So any
operand (mem/reg/constant) should be fine when the certain mode of mov_optab
exists. So shall we need following predicate checking for const0 move?
insn_operand_matches (icode, 1, CONST0_RTX (mode))

3. For clear by pieces, currently the zero is not directly passed to generate
function. It uses help fucntion - builtin_memset_read_str to store the zero
in a pseudo then do the move with the pseudo. I saw i386 can benefit with this
process as it calls gen_memset_value_from_prev to generate the value from
previous one. The i386 can utilize its variable length register for storing
different modes of constant zero. If we pass zero directly to generate function,
this optimization will lost. Shall we set a target hook for it?

Looking forward to your advice.

Thanks
Gui Haochen
> 
> jeff
> 

Reply via email to