On Fri, Jul 30, 2021 at 2:05 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > "H.J. Lu" <hjl.to...@gmail.com> writes: > > On Tue, Jul 27, 2021 at 8:31 AM H.J. Lu <hjl.to...@gmail.com> wrote: > >> > >> On Mon, Jul 26, 2021 at 4:19 PM H.J. Lu <hjl.to...@gmail.com> wrote: > >> > > >> > On Mon, Jul 26, 2021 at 3:56 PM H.J. Lu <hjl.to...@gmail.com> wrote: > >> > > > >> > > On Mon, Jul 26, 2021 at 2:53 PM Richard Sandiford > >> > > <richard.sandif...@arm.com> wrote: > >> > > > > >> > > > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > >> > > > > On Mon, Jul 26, 2021 at 11:42 AM Richard Sandiford > >> > > > > <richard.sandif...@arm.com> wrote: > >> > > > >> > >> > > > >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > >> > > > >> > +to avoid stack realignment when expanding memset. The default > >> > > > >> > is > >> > > > >> > +@code{gen_reg_rtx}. > >> > > > >> > +@end deftypefn > >> > > > >> > + > >> > > > >> > @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST > >> > > > >> > (unsigned @var{nunroll}, class loop *@var{loop}) > >> > > > >> > This target hook returns a new value for the number of times > >> > > > >> > @var{loop} > >> > > > >> > should be unrolled. The parameter @var{nunroll} is the number > >> > > > >> > of times > >> > > > >> > […] > >> > > > >> > @@ -1446,7 +1511,10 @@ can_store_by_pieces (unsigned > >> > > > >> > HOST_WIDE_INT len, > >> > > > >> > max_size = STORE_MAX_PIECES + 1; > >> > > > >> > while (max_size > 1 && l > 0) > >> > > > >> > { > >> > > > >> > - scalar_int_mode mode = widest_int_mode_for_size > >> > > > >> > (max_size); > >> > > > >> > + /* Since this can be called before virtual registers > >> > > > >> > are ready > >> > > > >> > + to use, avoid QI vector mode here. */ > >> > > > >> > + fixed_size_mode mode > >> > > > >> > + = widest_fixed_size_mode_for_size (max_size, false); > >> > > > >> > >> > > > >> I think I might have asked this before, sorry, but: when is that > >> > > > >> true > >> > > > >> and why does it matter? > >> > > > > > >> > > > > can_store_by_pieces may be called: > >> > > > > > >> > > > > value-prof.c: if (!can_store_by_pieces (val, > >> > > > > builtin_memset_read_str, > >> > > > > value-prof.c: if (!can_store_by_pieces (val, > >> > > > > builtin_memset_read_str, > >> > > > > > >> > > > > before virtual registers can be used. When true is passed to > >> > > > > widest_fixed_size_mode_for_size, virtual registers may be used > >> > > > > to expand memset to broadcast, which leads to ICE. Since for the > >> > > > > purpose of can_store_by_pieces, we don't need to expand memset > >> > > > > to broadcast and pass false here can avoid ICE. > >> > > > > >> > > > Ah, I see, thanks. > >> > > > > >> > > > That sounds like a problem in the way that the memset const function > >> > > > is > >> > > > written though. can_store_by_pieces is just a query function, so I > >> > > > don't > >> > > > think it should be trying to create new registers for > >> > > > can_store_by_pieces, > >> > > > even if it could. At the same time, can_store_by_pieces should make > >> > > > the > >> > > > same choices as the real expander would. > >> > > > > >> > > > I think this means that: > >> > > > > >> > > > - gen_memset_broadcast should be inlined into its callers, with the > >> > > > builtin_memset_read_str getting the CONST_INT_P case and > >> > > > builtin_memset_gen_str getting the variable case. > >> > > > > >> > > > - builtin_memset_read_str should then stop at and return the > >> > > > gen_const_vec_duplicate when the prev argument is null. > >> > > >> > This doesn't work since can_store_by_pieces has > >> > > >> > cst = (*constfun) (constfundata, nullptr, offset, mode); > >> > if (!targetm.legitimate_constant_p (mode, cst)) > >> > >> We can add a target hook, targetm.legitimate_memset_constant_p, > >> which defaults to targetm.legitimate_constant_p. Will it be acceptable? > > > > In the v5 patch, I changed it to > > > > cst = (*constfun) (constfundata, nullptr, offset, mode); > > /* All CONST_VECTORs are legitimate if vec_duplicate > > is supported. */ > > Maybe “can be loaded” rather than “are legitimate”, since they're
Fixed. > not necessarily legitimate in the sense of legitimate_constant_p > (hence the patch). Also, since we assume elsewhere that > vec_duplicate is a precondition for picking a vector mode, > I think we should do the same here (and note that in the comment). Fixed. > So… > > > if (!((memsetp > > && VECTOR_MODE_P (mode) > > && GET_MODE_INNER (mode) == QImode > > && (optab_handler (vec_duplicate_optab, mode) > > != CODE_FOR_nothing)) > > I think we need only the (memsetp && VECTOR_MODE_P (mode)) check. > > This feels a bit of a hack TBH. I think the same principles apply > to vectors and integers here: forcing the constant to memory is > still likely to be an optimisation, but is an extra overhead that > we should probably account for. Fixed. > However, I agree this is probably the most practical way forward > at the moment. I sent out the v6 patch. Thanks. > Thanks, > Richard > > > || targetm.legitimate_constant_p (mode, cst))) > > return 0; > > > >> > ix86_legitimate_constant_p only allows 0 or -1 for CONST_VECTOR. > >> > can_store_by_pieces doesn't work well for vector modes. > >> > > >> > > > Only when prev is nonnull should it go on to call the hook > >> > > > and copy the constant to the register that the hook returns. > >> > > > >> > > How about keeping gen_memset_broadcast and passing PREV to it: > >> > > > >> > > rtx target; > >> > > if (CONST_INT_P (data)) > >> > > { > >> > > rtx const_vec = gen_const_vec_duplicate (mode, data); > >> > > if (prev == NULL) > >> > > /* Return CONST_VECTOR when called by a query function. */ > >> > > target = const_vec; > >> > > else > >> > > { > >> > > /* Use the move expander with CONST_VECTOR. */ > >> > > target = targetm.gen_memset_scratch_rtx (mode); > >> > > emit_move_insn (target, const_vec); > >> > > } > >> > > } > >> > > else > >> > > { > >> > > target = targetm.gen_memset_scratch_rtx (mode); > >> > > class expand_operand ops[2]; > >> > > create_output_operand (&ops[0], target, mode); > >> > > create_input_operand (&ops[1], data, QImode); > >> > > expand_insn (icode, 2, ops); > >> > > if (!rtx_equal_p (target, ops[0].value)) > >> > > emit_move_insn (target, ops[0].value); > >> > > } > >> > > > >> > > > I admit that's uglier than the current version, but it looks like > >> > > > that's > >> > > > what the current interface expects. > >> > > > > >> > > > Thanks, > >> > > > Richard > >> > > > >> > > > >> > > > >> > > -- > >> > > H.J. > >> > > >> > > >> > > >> > -- > >> > H.J. > >> > >> > >> > >> -- > >> H.J. -- H.J.