Re: [RS6000] rs6000_rtx_costs reduce cost for SETs

Segher Boessenkool Thu, 17 Sep 2020 10:54:25 -0700

Hi!

On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote:
> Also use rs6000_cost only for speed.


More directly: use something completely different for !speed, namely,
code size.

> -      if (CONST_INT_P (XEXP (x, 1))
> -       && satisfies_constraint_I (XEXP (x, 1)))
> +      if (!speed)
> +     /* A little more than one insn so that nothing is tempted to
> +        turn a shift left into a multiply.  */
> +     *total = COSTS_N_INSNS (1) + 1;

Please don't.  We have a lot of code elsewhere to handle this directly,
already.  Also, this is just wrong for size.  Five shifts is *not*
better than four muls.  If that is the only way to get good results,
than unfortunately we probably have to; but do not do this without any
proof.

>      case FMA:
> -      if (mode == SFmode)
> +      if (!speed)
> +     *total = COSTS_N_INSNS (1) + 1;

Not here, either.

>      case DIV:
>      case MOD:
>        if (FLOAT_MODE_P (mode))
>       {
> -       *total = mode == DFmode ? rs6000_cost->ddiv
> -                               : rs6000_cost->sdiv;
> +       if (!speed)
> +         *total = COSTS_N_INSNS (1) + 2;

And why + 2 even?

> -       if (GET_MODE (XEXP (x, 1)) == DImode)
> +       if (!speed)
> +         *total = COSTS_N_INSNS (1) + 2;
> +       else if (GET_MODE (XEXP (x, 1)) == DImode)
>           *total = rs6000_cost->divdi;
>         else
>           *total = rs6000_cost->divsi;

(more)

> @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> outer_code,
>        return false;
>  
>      case AND:
> +      *total = COSTS_N_INSNS (1);
>        right = XEXP (x, 1);
>        if (CONST_INT_P (right))
>       {
> @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> outer_code,
>              || left_code == LSHIFTRT)
>             && rs6000_is_valid_shift_mask (right, left, mode))
>           {
> -           *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> -           if (!CONST_INT_P (XEXP (left, 1)))
> -             *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
> speed);
> -           *total += COSTS_N_INSNS (1);
> +           rtx reg_op = XEXP (left, 0);
> +           if (!REG_P (reg_op))
> +             *total += rtx_cost (reg_op, mode, left_code, 0, speed);
> +           reg_op = XEXP (left, 1);
> +           if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> +             *total += rtx_cost (reg_op, mode, left_code, 1, speed);
>             return true;
>           }
>       }
> -
> -      *total = COSTS_N_INSNS (1);
>        return false;

This doesn't improve anything?  It just makes it different from all
surrounding code?

> @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> outer_code,
>        if (outer_code == TRUNCATE
>         && GET_CODE (XEXP (x, 0)) == MULT)
>       {
> -       if (mode == DImode)
> +       if (!speed)
> +         *total = COSTS_N_INSNS (1) + 1;

(more)

> +    case SET:
> +      /* The default cost of a SET is the number of general purpose
> +      regs being set multiplied by COSTS_N_INSNS (1).  That only
> +      works where the incremental cost of the operation and
> +      operands is zero, when the operation performed can be done in
> +      one instruction.  For other cases where we add COSTS_N_INSNS
> +      for some operation (see point 5 above), COSTS_N_INSNS (1)
> +      should be subtracted from the total cost.  */

What does "incremental cost" mean there?  If what increases?

> +      {
> +     rtx_code src_code = GET_CODE (SET_SRC (x));
> +     if (src_code == CONST_INT
> +         || src_code == CONST_DOUBLE
> +         || src_code == CONST_WIDE_INT)
> +       return false;
> +     int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> +                     + rtx_cost (SET_DEST (x), mode, SET, 0, speed));

This should use set_src_cost, if anything.  But that will recurse then,
currently?  Ugh.

Using rtx_cost for SET_DEST is problematic, too.

What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
that, hrm.

rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
(ab)used for that, sigh.

Many targets have something for it already, but all quite different from
this.

> +     if (set_cost >= COSTS_N_INSNS (1))
> +       *total += set_cost - COSTS_N_INSNS (1);

I don't understand this part at all, for example.  Why not just
  *total += set_cost - COSTS_N_INSNS (1);
?  If set_cost is lower than one insn's cost, don't we have a problem
already?


Generic things.  Please split this patch up when sending it again, it
does too many different things, and many of those are not obvious.

All such changes that aren't completely obvious (like the previous ones
were) should have some measurement.  We are in stage1, and we will
notice (non-trivial) degradations, but if we can expect degradations
(like for this patch), it needs benchmarking.

Since you add !speed all over the place, maybe we should just have a
separate function that does !speed?  It looks like quite a few things
will simplify.


Segher

Re: [RS6000] rs6000_rtx_costs reduce cost for SETs

Reply via email to