Re: [RS6000] rs6000_rtx_costs reduce cost for SETs

Segher Boessenkool Fri, 18 Sep 2020 11:15:48 -0700

On Fri, Sep 18, 2020 at 01:08:42PM +0930, Alan Modra wrote:
> On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> > > -      if (CONST_INT_P (XEXP (x, 1))
> > > -   && satisfies_constraint_I (XEXP (x, 1)))
> > > +      if (!speed)
> > > + /* A little more than one insn so that nothing is tempted to
> > > +    turn a shift left into a multiply.  */
> > > + *total = COSTS_N_INSNS (1) + 1;
> > 
> > Please don't.  We have a lot of code elsewhere to handle this directly,
> > already.  Also, this is just wrong for size.  Five shifts is *not*
> > better than four muls.  If that is the only way to get good results,
> > than unfortunately we probably have to; but do not do this without any
> > proof.
> 
> Huh.  If a cost of 5 is "just wrong for size" then you prefer a cost
> of 12 for example (power9 mulsi or muldi rs6000_cost)?  Noticing that
> result for !speed rs6000_rtx_costs is the entire basis for the !speed
> changes.  I don't have any proof that this is correct.


No, just 4, like all other insns -- it is the size of the insn, after
all!  Not accidentally using the speed cost is a fine change of course,
but the + 1 isn't based on anything afaics, so it can only hurt.

> > > -   if (GET_MODE (XEXP (x, 1)) == DImode)
> > > +   if (!speed)
> > > +     *total = COSTS_N_INSNS (1) + 2;
> > > +   else if (GET_MODE (XEXP (x, 1)) == DImode)
> > >       *total = rs6000_cost->divdi;
> > >     else
> > >       *total = rs6000_cost->divsi;
> > 
> > (more)
> 
> OK, I can remove all the !speed changes.

No, those are quite okay (as a separate patch though).  But you
shouldn't add random numbers to it, one insn is just one insn.  The cost
function for -O2 is just code size, nothing more, nothing less.

> > >      case AND:
> > > +      *total = COSTS_N_INSNS (1);
> > >        right = XEXP (x, 1);
> > >        if (CONST_INT_P (right))
> > >   {
> > > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > > outer_code,
> > >          || left_code == LSHIFTRT)
> > >         && rs6000_is_valid_shift_mask (right, left, mode))
> > >       {
> > > -       *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > > -       if (!CONST_INT_P (XEXP (left, 1)))
> > > -         *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
> > > speed);
> > > -       *total += COSTS_N_INSNS (1);
> > > +       rtx reg_op = XEXP (left, 0);
> > > +       if (!REG_P (reg_op))
> > > +         *total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > > +       reg_op = XEXP (left, 1);
> > > +       if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > > +         *total += rtx_cost (reg_op, mode, left_code, 1, speed);
> > >         return true;
> > >       }
> > >   }
> > > -
> > > -      *total = COSTS_N_INSNS (1);
> > >        return false;
> > 
> > This doesn't improve anything?  It just makes it different from all
> > surrounding code?
> 
> So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
> regs, like it doesn't for const_int.

I meant that *total set only.  It is fine to have one extra source line.

Does not recursing for regs help anything?  Yes, for CONST_INT that is
questionable as well, but adding more stuff like it does not help
(without actual justification).

This routine is not hot at all.  Maybe decades ago it was, and back then
the thought was that micro-optimisations like this were useful (and
maybe they were, _back then_ :-) )

> > > +    case SET:
> > > +      /* The default cost of a SET is the number of general purpose
> > > +  regs being set multiplied by COSTS_N_INSNS (1).  That only
> > > +  works where the incremental cost of the operation and
> > > +  operands is zero, when the operation performed can be done in
> > > +  one instruction.  For other cases where we add COSTS_N_INSNS
> > > +  for some operation (see point 5 above), COSTS_N_INSNS (1)
> > > +  should be subtracted from the total cost.  */
> > 
> > What does "incremental cost" mean there?  If what increases?

Can you improve that comment please?

> > > +      {
> > > + rtx_code src_code = GET_CODE (SET_SRC (x));
> > > + if (src_code == CONST_INT
> > > +     || src_code == CONST_DOUBLE
> > > +     || src_code == CONST_WIDE_INT)
> > > +   return false;
> > > + int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > > +                 + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> > 
> > This should use set_src_cost, if anything.  But that will recurse then,
> > currently?  Ugh.
> > 
> > Using rtx_cost for SET_DEST is problematic, too.
> > 
> > What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
> > that, hrm.
> > 
> > rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
> > (ab)used for that, sigh.
> > 
> > Many targets have something for it already, but all quite different from
> > this.
> 
> Right, you are starting to understand just how difficult it is to do
> anything at all to rs6000_rtx_costs.

"Starting to understand", lol :-)  I created insn_cost back in 2017, for
this and may related problems: trying to derive the cost of an RTL
expression does not make sense at all in almost all places, what we want
to know is if the machine code is faster (or smaller for -Os).  That was
after many years of fighting this (for rs6000 and other targets; hardly
any of this is specific to our port).

Eventually we should be able to delete this completely.  That will be a
good day.

Currently mostly set_rtx_cost and set_src_cost are in the way of getting
there, and mostly because many ways those are used simply make no sense
at all.

Other than those, there is just a loooong tail of random stuff.


Improving this is fine of course: it will be quite a long time before it
isn't useful anymore, and longer until it isn't used anymore.


> > > + if (set_cost >= COSTS_N_INSNS (1))
> > > +   *total += set_cost - COSTS_N_INSNS (1);
> > 
> > I don't understand this part at all, for example.  Why not just
> >   *total += set_cost - COSTS_N_INSNS (1);
> > ?  If set_cost is lower than one insn's cost, don't we have a problem
> > already?
> 
> The set_cost I calculate here from src and dest can easily be zero.
> (set (reg) (reg)) and (set (reg) (const_int 0)) for example have a
> dest cost of zero and a src cost of zero.  That can't change without
> breaking places where rtx_costs is called to compare pieces of RTL.
> Here though we happen to be looking at a SET, so have an entire
> instruction.  The value returned should be comparable to our
> instruction costs.  That's tricky to do, and this change is just a
> hack.  Without the hack I saw some testcases regress.
> 
> I don't like this hack any more than you do reviewing it!

So maybe handle all SETs with a zero set_cost before the general case?
Isn't that what we used to do, and what the general code does?

> > Generic things.  Please split this patch up when sending it again, it
> > does too many different things, and many of those are not obvious.
> > 
> > All such changes that aren't completely obvious (like the previous ones
> > were) should have some measurement.  We are in stage1, and we will
> > notice (non-trivial) degradations, but if we can expect degradations
> > (like for this patch), it needs benchmarking.
> 
> Pat did benchmark these changes..  I was somewhat surprised to see
> a small improvement in spec results.

Thanks (to both of you).  Interesting!  Which of these unrelated changes
does this come from?


Segher

Re: [RS6000] rs6000_rtx_costs reduce cost for SETs

Reply via email to