Hi,
This patch adds const0 move checking for CLEAR_BY_PIECES. The original vec_duplicate check handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it can still clear by pieces if it supports a const0 move in that mode.
Compared to the previous version, the main change is a new function that generates const0 for a given mode; it is used as the by_pieces_constfn for CLEAR_BY_PIECES.
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html

Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

On i386, the patch causes several regressions. The first issue is that the predicate of the V16QI move expander doesn't accept const0, so V16QI mode can't be used for clear by pieces with the patch. The second issue is that const0 is now passed directly to the move expander. Originally it was forced into a pseudo, and i386 could leverage the previous data to do optimization.

The patch also causes several regressions on aarch64. V2x8QImode replaces TImode for 16-byte clear by pieces, because the V2x8QImode move expander supports const0 and vector modes are preferred. I drafted a patch to address the issue; it will be sent for review in a separate email. Another problem is that V8QImode replaces DImode for 8-byte clear by pieces. This seems to produce a different instruction sequence, but the actual instructions are the same.

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES

The vec_duplicate optab handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it can still clear by pieces if it supports a const0 move. This patch adds the check.

gcc/
	* expr.cc (by_pieces_mode_supported_p): Add const0 move checking for
	CLEAR_BY_PIECES.
	(set_zero): New.
	(clear_by_pieces): Pass set_zero as by_pieces_constfn.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index ffbac513692..7199e0956f8 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1014,14 +1014,20 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
     return false;
 
-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
       && VECTOR_MODE_P (mode)
       && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
     return false;
 
+  if (op == CLEAR_BY_PIECES
+      && VECTOR_MODE_P (mode)
+      && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+    return false;
+
   if (op == COMPARE_BY_PIECES
       && !can_compare_p (EQ, mode, ccp_jump))
     return false;
@@ -1840,16 +1846,20 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
   return to;
 }
 
+static rtx
+set_zero (void *, void *, HOST_WIDE_INT, fixed_size_mode mode)
+{
+  return CONST0_RTX (mode);
+}
+
 void
 clear_by_pieces (rtx to, unsigned HOST_WIDE_INT len, unsigned int align)
 {
   if (len == 0)
     return;
 
-  /* Use builtin_memset_read_str to support vector mode broadcast.  */
-  char c = 0;
-  store_by_pieces_d data (to, builtin_memset_read_str, &c, len, align,
-			  CLEAR_BY_PIECES);
+  /* Use set_zero to generate const0 of certain mode.  */
+  store_by_pieces_d data (to, set_zero, NULL, len, align, CLEAR_BY_PIECES);
   data.run ();
 }
 
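
For anyone who wants to reason about the predicate change without building GCC, the following is a minimal standalone C++ sketch of the decision logic before and after the patch. It is only a toy model: ModeCaps, PiecesOp, mode_supported_before, mode_supported_after and self_check are hypothetical names invented for illustration, and each boolean field merely stands in for the GCC query named in its comment. It does not call any GCC internals.

```cpp
// Toy model only: every type and function here is a hypothetical stand-in,
// not part of GCC.
#include <cassert>

enum class PiecesOp { Move, Set, Clear, Compare };

// Capabilities a hypothetical target reports for one machine mode.
struct ModeCaps {
  bool is_vector;           // stands in for VECTOR_MODE_P (mode)
  bool has_mov;             // mov_optab handler != CODE_FOR_nothing
  bool has_vec_duplicate;   // vec_duplicate_optab handler != CODE_FOR_nothing
  bool mov_accepts_const0;  // insn_operand_matches (icode, 1, CONST0_RTX (mode))
  bool can_compare_eq;      // can_compare_p (EQ, mode, ccp_jump)
};

// Logic before the patch: SET and CLEAR both require vec_duplicate
// for vector modes.
bool mode_supported_before(const ModeCaps &m, PiecesOp op) {
  if (!m.has_mov)
    return false;
  if ((op == PiecesOp::Set || op == PiecesOp::Clear)
      && m.is_vector && !m.has_vec_duplicate)
    return false;
  if (op == PiecesOp::Compare && !m.can_compare_eq)
    return false;
  return true;
}

// Logic after the patch: CLEAR instead requires that the move pattern
// accepts a zero constant as its source operand.
bool mode_supported_after(const ModeCaps &m, PiecesOp op) {
  if (!m.has_mov)
    return false;
  if (op == PiecesOp::Set && m.is_vector && !m.has_vec_duplicate)
    return false;
  if (op == PiecesOp::Clear && m.is_vector && !m.mov_accepts_const0)
    return false;
  if (op == PiecesOp::Compare && !m.can_compare_eq)
    return false;
  return true;
}

// Checks the two situations discussed in the mail.
void self_check() {
  // Vector mode with no vec_duplicate whose move accepts zero:
  // usable for CLEAR only after the patch, still not usable for SET.
  const ModeCaps no_dup{true, true, false, true, true};
  assert(!mode_supported_before(no_dup, PiecesOp::Clear));
  assert(mode_supported_after(no_dup, PiecesOp::Clear));
  assert(!mode_supported_after(no_dup, PiecesOp::Set));

  // Vector mode with vec_duplicate whose move predicate rejects zero
  // (the situation reported for V16QI on i386): CLEAR is lost.
  const ModeCaps no_zero{true, true, true, false, true};
  assert(mode_supported_before(no_zero, PiecesOp::Clear));
  assert(!mode_supported_after(no_zero, PiecesOp::Clear));
}
```

Calling self_check () from a small main exercises both cases. The second case is why the patch can regress a target: a mode that used to qualify through vec_duplicate stops qualifying for CLEAR_BY_PIECES once its move predicate has to accept const0.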