"H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > On Fri, Jul 16, 2021 at 4:38 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: >> > 1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with >> > vec_duplicate_optab to duplicate QI value to TI/OI/XI value. >> > 2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard >> > scratch register to avoid stack realignment when expanding memset. >> > >> > PR middle-end/90773 >> > * builtins.c (gen_memset_value_from_prev): New function. >> > (gen_memset_broadcast): Likewise. >> > (builtin_memset_read_str): Use gen_memset_value_from_prev >> > and gen_memset_broadcast. >> > (builtin_memset_gen_str): Likewise. >> > * target.def (gen_memset_scratch_rtx): New hook. >> > * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX. >> > * doc/tm.texi: Regenerated. >> > --- >> > gcc/builtins.c | 123 +++++++++++++++++++++++++++++++++++++-------- >> > gcc/doc/tm.texi | 5 ++ >> > gcc/doc/tm.texi.in | 2 + >> > gcc/target.def | 7 +++ >> > 4 files changed, 116 insertions(+), 21 deletions(-) >> > >> > diff --git a/gcc/builtins.c b/gcc/builtins.c >> > index 39ab139b7e1..c1758ae2efc 100644 >> > --- a/gcc/builtins.c >> > +++ b/gcc/builtins.c >> > @@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx target) >> > return NULL_RTX; >> > } >> > >> > -/* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (MODE) >> > - bytes from constant string DATA + OFFSET and return it as target >> > - constant. If PREV isn't nullptr, it has the RTL info from the >> > +/* Return the RTL of a register in MODE generated from PREV in the >> > previous iteration. */ >> > >> > -rtx >> > -builtin_memset_read_str (void *data, void *prevp, >> > - HOST_WIDE_INT offset ATTRIBUTE_UNUSED, >> > - scalar_int_mode mode) >> > +static rtx >> > +gen_memset_value_from_prev (void *prevp, scalar_int_mode mode) >> > { >> > + rtx target = nullptr; >> > by_pieces_prev *prev = (by_pieces_prev *) prevp; >> > if (prev != nullptr && prev->data != nullptr) >> > { >> > /* Use the previous data in the same mode. */ >> > if (prev->mode == mode) >> > return prev->data; >> > + >> > + rtx prev_rtx = prev->data; >> > + machine_mode prev_mode = prev->mode; >> > + unsigned int word_size = GET_MODE_SIZE (word_mode); >> > + if (word_size < GET_MODE_SIZE (prev->mode) >> > + && word_size > GET_MODE_SIZE (mode)) >> > + { >> > + /* First generate subreg of word mode if the previous mode is >> > + wider than word mode and word mode is wider than MODE. */ >> > + prev_rtx = simplify_gen_subreg (word_mode, prev_rtx, >> > + prev_mode, 0); >> > + prev_mode = word_mode; >> > + } >> > + if (prev_rtx != nullptr) >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0); >> > } >> > + return target; >> > +} >> > + >> > +/* Return the RTL of a register in MODE broadcasted from DATA. */ >> > + >> > +static rtx >> > +gen_memset_broadcast (rtx data, scalar_int_mode mode) >> > +{ >> > + /* Skip if regno_reg_rtx isn't initialized. */ >> > + if (!regno_reg_rtx) >> > + return nullptr; >> > + >> > + rtx target = nullptr; >> > + >> > + unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode); >> > + machine_mode vector_mode; >> > + if (!mode_for_vector (QImode, nunits).exists (&vector_mode)) >> > + gcc_unreachable (); >> >> Sorry, I realise it's a bit late to be raising this objection now, >> but I don't think it's a good idea to use scalar integer modes as >> a proxy for vector modes. In principle there's no reason why a >> target has to define an integer mode for every vector mode. > > A target always defines the largest integer mode.
Right. But a target shouldn't *need* to define an integer mode for every vector mode. >> If we want the mode to be a vector then I think the by-pieces >> infrastructure should be extended to support vectors directly, >> rather than assuming that each piece can be represented as >> a scalar_int_mode. >> > > The current by-pieces infrastructure operates on scalar_int_mode. > Only for memset, there is > > /* Callback routine for store_by_pieces. Return the RTL of a register > containing GET_MODE_SIZE (MODE) consecutive copies of the unsigned > char value given in the RTL register data. For example, if mode is > 4 bytes wide, return the RTL for 0x01010101*data. If PREV isn't > nullptr, it has the RTL info from the previous iteration. */ > > static rtx > builtin_memset_gen_str (void *data, void *prevp, > HOST_WIDE_INT offset ATTRIBUTE_UNUSED, > scalar_int_mode mode) > > It is a broadcast. If a target can broadcast a byte to a wider integer, > can you suggest a way to use it in the current by-pieces infrastructure? I meant that we should extend the by-pieces infrastructure so that it no longer requires scalar_int_mode, rather than work within the current limits. IMO the fact that we're (legitimately) going through vec_duplicate_optab shows that we really are dealing with vectors here, not integers. Thanks, Richard