Hi,
  I submitted a patch to change the mode checking for
CLEAR_BY_PIECES.
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html

  It causes some regressions on aarch64. With the patch, clear
by pieces uses V2x8QImode instead of TImode, because vector modes
are preferred and V2x8QImode supports a const0 store. As a
result, the efficient "stp" instructions can no longer be
generated.
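
  For illustration, here is a minimal sketch of the kind of code
affected. It is not taken from the testcases; the struct and
function names are hypothetical, and I assume a fixed-size 16-byte
clear compiled with optimization on aarch64. A clear like this is
the kind that should be stored as a pair of zero registers with
"stp".

```c
#include <string.h>

/* Hypothetical 16-byte aggregate; the name is illustrative only.  */
struct sixteen_bytes
{
  unsigned char bytes[16];
};

/* A fixed-size 16-byte clear, small enough to be expanded by
   pieces rather than by a call to memset.  */
void
clear_sixteen_bytes (struct sixteen_bytes *p)
{
  memset (p, 0, sizeof (*p));
}
```

  The mode chosen for this single 16-byte store decides which
instruction is emitted, which is why the switch from TImode to
V2x8QImode matters here.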

  I drafted the following patch to fix the problem. It fixes the
regressions found in memset-corner-cases.c, memset-q-reg.c,
auto-init-padding-11.c and auto-init-padding-5.c.

  I am not sure whether this should be done for all 16-byte
vector modes, or whether the patch is the proper approach, so I
am sending it as an RFC.

Thanks
Gui Haochen

ChangeLog
aarch64: Implement 16-byte vector mode const0 store by TImode

gcc/
        * config/aarch64/aarch64-simd.md (mov<mode> for VSTRUCT_QD):
        Expand V2x8QImode const0 store by TImode.


patch.diff
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 01b084d8ccb..8aa72940b12 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7766,7 +7766,14 @@ (define_expand "mov<mode>"
        (match_operand:VSTRUCT_QD 1 "general_operand"))]
   "TARGET_FLOAT"
 {
-  if (can_create_pseudo_p ())
+  if (<MODE>mode == V2x8QImode
+      && operands[1] == CONST0_RTX (V2x8QImode)
+      && MEM_P (operands[0]))
+    {
+      operands[0] = adjust_address (operands[0], TImode, 0);
+      operands[1] = CONST0_RTX (TImode);
+    }
+  else if (can_create_pseudo_p ())
     {
       if (GET_CODE (operands[0]) != REG)
        operands[1] = force_reg (<MODE>mode, operands[1]);
