On Tue, Aug 20, 2024 at 2:50 PM Hongtao Liu <crazy...@gmail.com> wrote: > > On Tue, Aug 20, 2024 at 2:12 PM HAO CHEN GUI <guih...@linux.ibm.com> wrote: > > > > Hi, > > Add Hongtao Liu as the patch affects x86. > > > > 在 2024/8/20 6:32, Richard Sandiford 写道: > > > HAO CHEN GUI <guih...@linux.ibm.com> writes: > > >> Hi, > > >> This patch adds const0 move checking for CLEAR_BY_PIECES. The original > > >> vec_duplicate handles duplicates of non-constant inputs. But 0 is a > > >> constant. So even a platform doesn't support vec_duplicate, it could > > >> still do clear by pieces if it supports const0 move by that mode. > > >> > > >> Compared to the previous version, the main change is to set up a > > >> new function to generate const0 for certain modes and use the function > > >> as by_pieces_constfn for CLEAR_BY_PIECES. > > >> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html > > >> > > >> Bootstrapped and tested on powerpc64-linux BE and LE with no > > >> regressions. > > >> > > >> On i386, it got several regressions. One issue is the predicate of > > >> V16QI move expand doesn't include const0. Thus V16QI mode can't be used > > >> for clear by pieces with the patch. The second issue is the const0 is > > >> passed directly to the move expand with the patch. Originally it is > > >> forced to a pseudo and i386 can leverage the previous data to do > > >> optimization. > > > > > > The patch looks good to me, but I suppose we'll need to decide what > > > to do about x86. > > > > > > It's not obvious to me why movv16qi requires a nonimmediate_operand > > > source, especially since ix86_expand_vector_mode does have code to > > > cope with constant operand[1]s. emit_move_insn_1 doesn't check the > > > predicates anyway, so the predicate will have little effect. > > > > > > A workaround would be to check legitimate_constant_p instead of the > > > predicate, but I'm not sure that that should be necessary. > > > > > > Has this already been discussed? If not, we should loop in the x86 > > > maintainers (but I didn't do that here in case it would be a repeat). > > > > I also noticed it. Not sure why movv16qi requires a > > nonimmediate_operand, while ix86_expand_vector_mode could deal with > > constant op. Looking forward to Hongtao's comments. > The code has been there since 2005 before I'm involved. > It looks to me at the beginning both mov<mode> and > *mov<mode>_internal only support nonimmediate_operand for the > operands[1]. > And r0-75606-g5656a184e83983 adjusted the nonimmediate_operand to > nonimmediate_or_sse_const_operand for *mov<mode>_internal, but not for > mov<mode>. > I think we can align the predicate between mov<mode> and *mov<mode>_internal. > I'll do some tests and reach back to you. r15-3058-gbb42c551905024 support const0 operand for movv16qi, please rebase your patch and see if there's still the regressions. > > > > > > > > As far as the second issue goes, I suppose there are at least three > > > ways of handling shared constants: > > > > > > (1) Force the zero into a register and leave later optimisations to > > > propagate the zero where profitable. > > The zero can be propagated into the store, but the address adjustment > > may not be combined into insn properly. For instance, if zero is > > forced to a register, "movv2x8qi" insn is generated. The address > > adjustment becomes a separate insn as "movv2x8qi" insn doesn't support > > d-from address. When zero is propagated, it converts "movv2x8qi" to > > "movti". "movti" supports d-from as well as post/inc address. Probably, > > the auto_inc_dec pass combines address adjustment insn into previous > > "movti" to generate a post inc "movti". The expected optimization might > > be to combine address adjustment insn into second "movit" and generate a > > d-form "movti". It's a regression issue I found in aarch64. > > > > Also we checks if const0 is supported for mov optab. But finally we > > force the const0 to a register and generate a store with the register. > > Seems it's not reasonable. > > > > > > > > (2) Emit stores of zero and expect a later pass to share constants > > > where beneficial. > > Not sure which pass can optimize it. > > > > > > > > (3) Generate stores of zero and leave the target expanders to force > > > constants into registers on the fly if reuse seems plausibly > > > beneficial. > > > > > The constant zero with different modes are not relevant. Not sure > > which pass can optimize it. The compiler should be taught that > > reg 102 can be expressed as a subreg of reg 100. > > > > (insn 6 5 7 2 (set (reg:V32QI 100) > > (const_vector:V32QI [ > > (const_int 0 [0]) repeated x32 > > ])) > > > > (insn 8 7 0 2 (set (reg:V16QI 102) > > (const_vector:V16QI [ > > (const_int 0 [0]) repeated x16 > > ])) > > > It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080, I have some > experimental patch which tries to eliminate those redundancy in cse. > > > I tested a case with one 32-byte and one 16-byte memory clear on x86. > > These two sets can't be optimized. They can be optimized only when they > > are in the same memory clear operation (for example, a 48-byte memory > > clear). > > > > Thanks > > Gui Haochen > > > > > where (3) is a middle ground between (1) and (2). > > > > > > Thanks, > > > Richard > > > > > >> The patch also raises several regressions on aarch64. The V2x8QImode > > >> replaces TImode to do 16-byte clear by pieces as V2x8QImode move expand > > >> supports const0 and vector mode is preferable. I drafted a patch to > > >> address the issue. It will be sent for review in a separate email. > > >> Another problem is V8QImode replaces DImode to do 8-byte clear by pieces. > > >> It seems cause different sequences of instructions but the actually > > >> instructions are the same. > > >> > > >> Thanks > > >> Gui Haochen > > >> > > >> ChangeLog > > >> expand: Add const0 move checking for CLEAR_BY_PIECES optabs > > >> > > >> vec_duplicate handles duplicates of non-constant inputs. The 0 is a > > >> constant. So even a platform doesn't support vec_duplicate, it could > > >> still do clear by pieces if it supports const0 move. This patch adds > > >> the checking. > > >> > > >> gcc/ > > >> * expr.cc (by_pieces_mode_supported_p): Add const0 move checking > > >> for CLEAR_BY_PIECES. > > >> (set_zero): New. > > >> (clear_by_pieces): Pass set_zero as by_pieces_constfn. > > >> > > >> patch.diff > > >> diff --git a/gcc/expr.cc b/gcc/expr.cc > > >> index ffbac513692..7199e0956f8 100644 > > >> --- a/gcc/expr.cc > > >> +++ b/gcc/expr.cc > > >> @@ -1014,14 +1014,20 @@ can_use_qi_vectors (by_pieces_operation op) > > >> static bool > > >> by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation > > >> op) > > >> { > > >> - if (optab_handler (mov_optab, mode) == CODE_FOR_nothing) > > >> + enum insn_code icode = optab_handler (mov_optab, mode); > > >> + if (icode == CODE_FOR_nothing) > > >> return false; > > >> > > >> - if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES) > > >> + if (op == SET_BY_PIECES > > >> && VECTOR_MODE_P (mode) > > >> && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing) > > >> return false; > > >> > > >> + if (op == CLEAR_BY_PIECES > > >> + && VECTOR_MODE_P (mode) > > >> + && !insn_operand_matches (icode, 1, CONST0_RTX (mode))) > > >> + return false; > > >> + > > >> if (op == COMPARE_BY_PIECES > > >> && !can_compare_p (EQ, mode, ccp_jump)) > > >> return false; > > >> @@ -1840,16 +1846,20 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT > > >> len, > > >> return to; > > >> } > > >> > > >> +static rtx > > >> +set_zero (void *, void *, HOST_WIDE_INT, fixed_size_mode mode) > > >> +{ > > >> + return CONST0_RTX (mode); > > >> +} > > >> + > > >> void > > >> clear_by_pieces (rtx to, unsigned HOST_WIDE_INT len, unsigned int align) > > >> { > > >> if (len == 0) > > >> return; > > >> > > >> - /* Use builtin_memset_read_str to support vector mode broadcast. */ > > >> - char c = 0; > > >> - store_by_pieces_d data (to, builtin_memset_read_str, &c, len, align, > > >> - CLEAR_BY_PIECES); > > >> + /* Use set_zero to generate const0 of centain mode. */ > > >> + store_by_pieces_d data (to, set_zero, NULL, len, align, > > >> CLEAR_BY_PIECES); > > >> data.run (); > > >> } > > > > -- > BR, > Hongtao
-- BR, Hongtao