On Fri, Jul 16, 2021 at 6:24 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > > On Fri, Jul 16, 2021 at 4:38 AM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > >> > 1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with > >> > vec_duplicate_optab to duplicate QI value to TI/OI/XI value. > >> > 2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard > >> > scratch register to avoid stack realignment when expanding memset. > >> > > >> > PR middle-end/90773 > >> > * builtins.c (gen_memset_value_from_prev): New function. > >> > (gen_memset_broadcast): Likewise. > >> > (builtin_memset_read_str): Use gen_memset_value_from_prev > >> > and gen_memset_broadcast. > >> > (builtin_memset_gen_str): Likewise. > >> > * target.def (gen_memset_scratch_rtx): New hook. > >> > * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX. > >> > * doc/tm.texi: Regenerated. > >> > --- > >> > gcc/builtins.c | 123 +++++++++++++++++++++++++++++++++++++-------- > >> > gcc/doc/tm.texi | 5 ++ > >> > gcc/doc/tm.texi.in | 2 + > >> > gcc/target.def | 7 +++ > >> > 4 files changed, 116 insertions(+), 21 deletions(-) > >> > > >> > diff --git a/gcc/builtins.c b/gcc/builtins.c > >> > index 39ab139b7e1..c1758ae2efc 100644 > >> > --- a/gcc/builtins.c > >> > +++ b/gcc/builtins.c > >> > @@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx target) > >> > return NULL_RTX; > >> > } > >> > > >> > -/* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (MODE) > >> > - bytes from constant string DATA + OFFSET and return it as target > >> > - constant. If PREV isn't nullptr, it has the RTL info from the > >> > +/* Return the RTL of a register in MODE generated from PREV in the > >> > previous iteration. */ > >> > > >> > -rtx > >> > -builtin_memset_read_str (void *data, void *prevp, > >> > - HOST_WIDE_INT offset ATTRIBUTE_UNUSED, > >> > - scalar_int_mode mode) > >> > +static rtx > >> > +gen_memset_value_from_prev (void *prevp, scalar_int_mode mode) > >> > { > >> > + rtx target = nullptr; > >> > by_pieces_prev *prev = (by_pieces_prev *) prevp; > >> > if (prev != nullptr && prev->data != nullptr) > >> > { > >> > /* Use the previous data in the same mode. */ > >> > if (prev->mode == mode) > >> > return prev->data; > >> > + > >> > + rtx prev_rtx = prev->data; > >> > + machine_mode prev_mode = prev->mode; > >> > + unsigned int word_size = GET_MODE_SIZE (word_mode); > >> > + if (word_size < GET_MODE_SIZE (prev->mode) > >> > + && word_size > GET_MODE_SIZE (mode)) > >> > + { > >> > + /* First generate subreg of word mode if the previous mode is > >> > + wider than word mode and word mode is wider than MODE. */ > >> > + prev_rtx = simplify_gen_subreg (word_mode, prev_rtx, > >> > + prev_mode, 0); > >> > + prev_mode = word_mode; > >> > + } > >> > + if (prev_rtx != nullptr) > >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0); > >> > } > >> > + return target; > >> > +} > >> > + > >> > +/* Return the RTL of a register in MODE broadcasted from DATA. */ > >> > + > >> > +static rtx > >> > +gen_memset_broadcast (rtx data, scalar_int_mode mode) > >> > +{ > >> > + /* Skip if regno_reg_rtx isn't initialized. */ > >> > + if (!regno_reg_rtx) > >> > + return nullptr; > >> > + > >> > + rtx target = nullptr; > >> > + > >> > + unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode); > >> > + machine_mode vector_mode; > >> > + if (!mode_for_vector (QImode, nunits).exists (&vector_mode)) > >> > + gcc_unreachable (); > >> > >> Sorry, I realise it's a bit late to be raising this objection now, > >> but I don't think it's a good idea to use scalar integer modes as > >> a proxy for vector modes. In principle there's no reason why a > >> target has to define an integer mode for every vector mode. > > > > A target always defines the largest integer mode. > > Right. But a target shouldn't *need* to define an integer mode > for every vector mode. > > >> If we want the mode to be a vector then I think the by-pieces > >> infrastructure should be extended to support vectors directly, > >> rather than assuming that each piece can be represented as > >> a scalar_int_mode. > >> > > > > The current by-pieces infrastructure operates on scalar_int_mode. > > Only for memset, there is > > > > /* Callback routine for store_by_pieces. Return the RTL of a register > > containing GET_MODE_SIZE (MODE) consecutive copies of the unsigned > > char value given in the RTL register data. For example, if mode is > > 4 bytes wide, return the RTL for 0x01010101*data. If PREV isn't > > nullptr, it has the RTL info from the previous iteration. */ > > > > static rtx > > builtin_memset_gen_str (void *data, void *prevp, > > HOST_WIDE_INT offset ATTRIBUTE_UNUSED, > > scalar_int_mode mode) > > > > It is a broadcast. If a target can broadcast a byte to a wider integer, > > can you suggest a way to use it in the current by-pieces infrastructure? > > I meant that we should extend the by-pieces infrastructure so that > it no longer requires scalar_int_mode, rather than work within the > current limits. > > IMO the fact that we're (legitimately) going through vec_duplicate_optab
If vec_duplicate_optab is the only blocking issue, can we add an integer broadcast optab? I want to avoid a major surgery just because we don't support integer broadcast from byte. > shows that we really are dealing with vectors here, not integers. > Are you suggesting we should add the optional vector mode support to by-pieces for memset? -- H.J.