On Tue, Mar 22, 2022 at 9:20 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Tue, Mar 22, 2022 at 3:51 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Mon, Mar 14, 2022 at 8:44 AM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> > >
> > > Richard Biener <richard.guent...@gmail.com> writes:
> > > > On Wed, Mar 9, 2022 at 7:04 PM Richard Sandiford
> > > > <richard.sandif...@arm.com> wrote:
> > > >>
> > > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > >> > On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >> >>
> > > >> >> On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
> > > >> >> > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
> > > >> >> > <gcc-patches@gcc.gnu.org> wrote:
> > > >> >> > >
> > > >> >> > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to 
> > > >> >> > > fold memcpy.
> > > >> >> > > The default is
> > > >> >> > >
> > > >> >> > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
> > > >> >> > >
> > > >> >> > > For x86, it is MOVE_MAX to restore the old behavior before
> > > >> >> >
> > > >> >> > I know we've discussed this to death in the PR, I just want to 
> > > >> >> > repeat here
> > > >> >> > that the GIMPLE folding expects to generate a single load and a 
> > > >> >> > single
> > > >> >> > store (that is what it does on the GIMPLE level) which is why 
> > > >> >> > MOVE_MAX
> > > >> >> > was chosen originally (it's documented to what a "single 
> > > >> >> > instruction" does).
> > > >> >> > In practice MOVE_MAX does not seem to cover vector register sizes
> > > >> >> > so Richard pulled MOVE_RATIO which is really intended to cover
> > > >> >> > the case of using multiple instructions for moving memory (but 
> > > >> >> > then I
> > > >> >> > don't remember whether for the ARM case the single load/store 
> > > >> >> > GIMPLE
> > > >> >> > will be expanded to multiple load/store instructions).
> > > >> >> >
> > > >> >> > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
> > > >> >> > being very specific for memcpy folding (we also fold memmove btw).
> > > >> >> >
> > > >> >> > There is also MOVE_MAX_PIECES which _might_ be more appropriate
> > > >> >> > than MOVE_MAX here and still honor the idea of single 
> > > >> >> > instructions.
> > > >> >> > Now neither arm nor aarch64 define this and it defaults to 
> > > >> >> > MOVE_MAX,
> > > >> >> > not MOVE_MAX * MOVE_RATIO.
> > > >> >> >
> > > >> >> > So if we need a new hook then that hook should at least get the
> > > >> >> > 'speed' argument of MOVE_RATIO and it should get a better name.
> > > >> >> >
> > > >> >> > I still think that it should be possible to improve the insn 
> > > >> >> > check to
> > > >> >> > avoid use of "disabled" modes, maybe that's also a point to add
> > > >> >> > a new hook like .move_with_mode_p or so?  To quote, we do
> > > >> >>
> > > >> >> Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.
> > > >> >
> > > >> > Again I'd like to shine light on MOVE_MAX_PIECES which explicitely
> > > >> > mentions "a load or store used TO COPY MEMORY" (emphasis mine)
> > > >> > and whose x86 implementation would already be fine (doing larger 
> > > >> > moves
> > > >> > and also not doing too large moves).  But appearantly the arm folks
> > > >> > decided that that's not fit and instead (mis-?)used MOVE_MAX * 
> > > >> > MOVE_RATIO.
> > > >>
> > > >> It seems like there are old comments and old documentation that justify
> > > >> both interpretations, so there are good arguments on both sides.  But
> > > >> with this kind of thing I think we have to infer the meaning of the
> > > >> macro from the way it's currently used, rather than trusting such old
> > > >> and possibly out-of-date and contradictory information.
> > > >>
> > > >> FWIW, I agree that (if we exclude old reload, which we should!) the
> > > >> only direct uses of MOVE_MAX before the patch were not specific to
> > > >> integer registers and so MOVE_MAX should include vectors if the
> > > >> target wants vector modes to be used for general movement.
> > > >>
> > > >> Even if people disagree that that's the current meaning, I think it's
> > > >> at least a sensible meaning.  It provides information that AFAIK isn't
> > > >> available otherwise, and it avoids overlap with MAX_FIXED_MODE_SIZE.
> > > >>
> > > >> So FWIW, I think it'd be reasonable to change non-x86 targets if they
> > > >> want vector modes to be used for single-insn copies.
> > > >
> > > > Note a slight complication in the GIMPLE folding case is that we
> > > > do not end up using vector modes but we're using "fake"
> > > > integer modes like OImode which x86 has move patterns for.
> > > > If we'd use vector modes we could use existing target hooks to
> > > > eventually decide whether auto-using those is desired or not.
> > >
> > > Hmm, yeah.  Certainly we shouldn't require the target to support
> > > a scalar integer equivalent of every vector mode.
> > >
> >
> > I'd like to resolve this before GCC 12 is released.
>
> I've come to the conclusion that we should revert r12-3482-g5f6a6c91d7c592,
> it abuses MOVE_MAX * MOVE_RATIO to trick GIMPLE into thinking we can
> move a larger amount of data with a single instruction while in reality it 
> wants
> to use multiple load/store stmts - that's something the GIMPLE folding
> doesn't do.
> If arm wants to use larger moves with a single instruction it should
> adjust MOVE_MAX
> instead.  In fact the motivating testcase doesn't use larger loads -
> we are just able
> to elide the memcpy without the stack copy.

Btw, the memcpy would have been folded by the following code but
chickens out due to arm being STRICT_ALIGNMENT.

      /* Now that we chose an access type express the other side in
         terms of it if the target allows that with respect to alignment
         constraints.  */
      if (srcvar == NULL_TREE)
        {
          if (src_align >= TYPE_ALIGN (desttype))
            srcvar = fold_build2 (MEM_REF, desttype, src, off0);
          else
            {
              if (STRICT_ALIGNMENT)
                return false;

that path could be improved to handle the case where the desttype mode
is not BLKmode and thus we can check movmisalign_optab.

I'm going to test the attached.  I've verified that with this I can revert
r12-3482-g5f6a6c91d7c592 without regressing

        uint64_t bar64(const uint8_t *rData1)
        {
            uint64_t buffer;
            memcpy(&buffer, rData1, sizeof(buffer));
            return buffer;
        }

on arm.

Richard.

>
> Alternatively as Richard hints above, the folding code should try
> using integer vector
> modes (and adhere to the vector target hooks as to which modes to "auto" use).
>
> That said, iff we do not want to "regress" arm by reversion the new
> hackish target hook
> should be on the arm side - sth like MOVE_MAX_FOR_GIMPLE, defaulting
> to MOVE_MAX.
>
> If we want to use vector integer types/modes or fold to multiple (up
> to MOVE_RATIO) stmts
> that's stage1 material.
>
> Richard.
>
>
>
> > Thanks.
> >
> >
> > --
> > H.J.

Reply via email to