Hi,
  This patch adds const0 move checking for CLEAR_BY_PIECES. The original
vec_duplicate check handles duplicates of non-constant inputs, but 0 is a
constant. So even if a platform doesn't support vec_duplicate, it can
still do clear by pieces if it supports a const0 move in that mode.
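
  To make the change in the check concrete, here is a small standalone
sketch. It is a toy model, not GCC code: VectorModeCaps, Op,
supported_before and supported_after are hypothetical names that only
mirror the conditions for vector modes in by_pieces_mode_supported_p
before and after the patch.

```cpp
// Toy model only: these types are hypothetical stand-ins, not GCC's own.
#include <cassert>

enum class Op { Set, Clear };

// What a hypothetical target supports for one vector mode.
struct VectorModeCaps {
  bool has_mov;             // mov_optab has a handler for the mode
  bool has_vec_duplicate;   // vec_duplicate_optab has a handler
  bool mov_accepts_const0;  // operand 1 of the move pattern matches const0
};

// Check before the patch: SET and CLEAR both require vec_duplicate.
bool supported_before (const VectorModeCaps &m, Op)
{
  return m.has_mov && m.has_vec_duplicate;
}

// Check after the patch: CLEAR only needs a move that accepts const0.
bool supported_after (const VectorModeCaps &m, Op op)
{
  if (!m.has_mov)
    return false;
  if (op == Op::Set)
    return m.has_vec_duplicate;
  return m.mov_accepts_const0;
}

void check_examples ()
{
  // No vec_duplicate, but the move accepts const0: CLEAR becomes usable.
  VectorModeCaps no_dup = { true, false, true };
  assert (!supported_before (no_dup, Op::Clear));
  assert (supported_after (no_dup, Op::Clear));
  assert (!supported_after (no_dup, Op::Set));

  // vec_duplicate exists, but the move predicate rejects const0:
  // CLEAR was usable before and is rejected now.
  VectorModeCaps no_const0 = { true, true, false };
  assert (supported_before (no_const0, Op::Clear));
  assert (!supported_after (no_const0, Op::Clear));
}
```

  The second case in check_examples corresponds to the V16QI situation
on i386 described below, where the move predicate does not include
const0.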

  Compared to the previous version, the main change is to add a new
function that generates const0 for a given mode and to use that function
as the by_pieces_constfn for CLEAR_BY_PIECES.
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions.

  On i386, the patch causes several regressions. One issue is that the
predicate of the V16QI move expander doesn't include const0, so V16QI
mode can't be used for clear by pieces with the patch. The second issue
is that const0 is now passed directly to the move expander. Previously it
was forced into a pseudo, and i386 could leverage that earlier data for
optimization.

  The patch also causes several regressions on aarch64. V2x8QImode
replaces TImode for 16-byte clear by pieces, because the V2x8QImode move
expander supports const0 and vector modes are preferred. I drafted a
patch to address the issue; it will be sent for review in a separate
email. Another problem is that V8QImode replaces DImode for 8-byte clear
by pieces. This seems to produce a different instruction order, but the
actual instructions are the same.

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but 0 is a
constant.  So even if a platform doesn't support vec_duplicate, it can
still do clear by pieces if it supports a const0 move.  This patch adds
that check.

gcc/
        * expr.cc (by_pieces_mode_supported_p): Add const0 move checking
        for CLEAR_BY_PIECES.
        (set_zero): New.
        (clear_by_pieces): Pass set_zero as by_pieces_constfn.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index ffbac513692..7199e0956f8 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1014,14 +1014,20 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
     return false;

-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
       && VECTOR_MODE_P (mode)
       && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
     return false;

+  if (op == CLEAR_BY_PIECES
+      && VECTOR_MODE_P (mode)
+      && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+    return false;
+
   if (op == COMPARE_BY_PIECES
       && !can_compare_p (EQ, mode, ccp_jump))
     return false;
@@ -1840,16 +1846,20 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
     return to;
 }

+static rtx
+set_zero (void *, void *, HOST_WIDE_INT, fixed_size_mode mode)
+{
+  return CONST0_RTX (mode);
+}
+
 void
 clear_by_pieces (rtx to, unsigned HOST_WIDE_INT len, unsigned int align)
 {
   if (len == 0)
     return;

-  /* Use builtin_memset_read_str to support vector mode broadcast.  */
-  char c = 0;
-  store_by_pieces_d data (to, builtin_memset_read_str, &c, len, align,
-                         CLEAR_BY_PIECES);
+  /* Use set_zero to generate const0 for the given mode.  */
+  store_by_pieces_d data (to, set_zero, NULL, len, align, CLEAR_BY_PIECES);
   data.run ();
 }
