HAO CHEN GUI <guih...@linux.ibm.com> writes: > Hi, > This patch adds const0 move checking for CLEAR_BY_PIECES. The original > vec_duplicate handles duplicates of non-constant inputs. But 0 is a > constant. So even a platform doesn't support vec_duplicate, it could > still do clear by pieces if it supports const0 move by that mode. > > Compared to the previous version, the main change is to set up a > new function to generate const0 for certain modes and use the function > as by_pieces_constfn for CLEAR_BY_PIECES. > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html > > Bootstrapped and tested on powerpc64-linux BE and LE with no > regressions. > > On i386, it got several regressions. One issue is the predicate of > V16QI move expand doesn't include const0. Thus V16QI mode can't be used > for clear by pieces with the patch. The second issue is the const0 is > passed directly to the move expand with the patch. Originally it is > forced to a pseudo and i386 can leverage the previous data to do > optimization.
The patch looks good to me, but I suppose we'll need to decide what to do about x86. It's not obvious to me why movv16qi requires a nonimmediate_operand source, especially since ix86_expand_vector_mode does have code to cope with constant operand[1]s. emit_move_insn_1 doesn't check the predicates anyway, so the predicate will have little effect. A workaround would be to check legitimate_constant_p instead of the predicate, but I'm not sure that that should be necessary. Has this already been discussed? If not, we should loop in the x86 maintainers (but I didn't do that here in case it would be a repeat). As far as the second issue goes, I suppose there are at least three ways of handling shared constants: (1) Force the zero into a register and leave later optimisations to propagate the zero where profitable. (2) Emit stores of zero and expect a later pass to share constants where beneficial. (3) Generate stores of zero and leave the target expanders to force constants into registers on the fly if reuse seems plausibly beneficial. where (3) is a middle ground between (1) and (2). Thanks, Richard > The patch also raises several regressions on aarch64. The V2x8QImode > replaces TImode to do 16-byte clear by pieces as V2x8QImode move expand > supports const0 and vector mode is preferable. I drafted a patch to > address the issue. It will be sent for review in a separate email. > Another problem is V8QImode replaces DImode to do 8-byte clear by pieces. > It seems cause different sequences of instructions but the actually > instructions are the same. > > Thanks > Gui Haochen > > ChangeLog > expand: Add const0 move checking for CLEAR_BY_PIECES optabs > > vec_duplicate handles duplicates of non-constant inputs. The 0 is a > constant. So even a platform doesn't support vec_duplicate, it could > still do clear by pieces if it supports const0 move. This patch adds > the checking. > > gcc/ > * expr.cc (by_pieces_mode_supported_p): Add const0 move checking > for CLEAR_BY_PIECES. > (set_zero): New. > (clear_by_pieces): Pass set_zero as by_pieces_constfn. > > patch.diff > diff --git a/gcc/expr.cc b/gcc/expr.cc > index ffbac513692..7199e0956f8 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -1014,14 +1014,20 @@ can_use_qi_vectors (by_pieces_operation op) > static bool > by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op) > { > - if (optab_handler (mov_optab, mode) == CODE_FOR_nothing) > + enum insn_code icode = optab_handler (mov_optab, mode); > + if (icode == CODE_FOR_nothing) > return false; > > - if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES) > + if (op == SET_BY_PIECES > && VECTOR_MODE_P (mode) > && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing) > return false; > > + if (op == CLEAR_BY_PIECES > + && VECTOR_MODE_P (mode) > + && !insn_operand_matches (icode, 1, CONST0_RTX (mode))) > + return false; > + > if (op == COMPARE_BY_PIECES > && !can_compare_p (EQ, mode, ccp_jump)) > return false; > @@ -1840,16 +1846,20 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len, > return to; > } > > +static rtx > +set_zero (void *, void *, HOST_WIDE_INT, fixed_size_mode mode) > +{ > + return CONST0_RTX (mode); > +} > + > void > clear_by_pieces (rtx to, unsigned HOST_WIDE_INT len, unsigned int align) > { > if (len == 0) > return; > > - /* Use builtin_memset_read_str to support vector mode broadcast. */ > - char c = 0; > - store_by_pieces_d data (to, builtin_memset_read_str, &c, len, align, > - CLEAR_BY_PIECES); > + /* Use set_zero to generate const0 of centain mode. */ > + store_by_pieces_d data (to, set_zero, NULL, len, align, CLEAR_BY_PIECES); > data.run (); > }