Roger Sayle <ro...@nextmovesoftware.com> wrote on Fri, Dec 29, 2023 at 00:54:
>
> The current (default) behavior is that when the target doesn’t define
> TARGET_INSN_COST the middle-end uses the backend’s
> TARGET_RTX_COSTS, so multiplications are slower than additions,
> but about the same size when optimizing for size (with -Os or -Oz).
>
> All of this gets disabled with your proposed patch.
> [If you don’t check speed, you probably shouldn’t touch insn_cost].
>
> I agree that a backend can fine tune the (speed and size) costs of
> instructions (especially complex !single_set instructions) via
> attributes in the machine description, but these should be used
> to override/fine-tune rtx_costs, not override/replace/duplicate them.
>
> Having accurate rtx_costs also helps RTL expansion and the earlier
> optimizers, but insn_cost is used by combine and the later RTL
> optimization passes, once instructions have been recognized.

Yes. I ran into this problem while trying to combine sign_extend and
zero_extract.
When I tried to add a new define_insn for

(set (reg/v:DI 200 [ val ])
    (sign_extend:DI
        (ior:SI (and:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
                (const_int 16777215 [0xffffff]))
            (ashift:SI (subreg:SI (reg:QI 205 [ MEM[(const unsigned char *)buf_8(D) + 3B] ]) 0)
                (const_int 24 [0x18])))))

to generate an `ins` instruction, it was rejected by
`combine_validate_cost`.
`combine_validate_cost` considers this RTX to cost COSTS_N_INSNS (3)
instead of COSTS_N_INSNS (1).
So we need a way to tell combine that this pattern is a single
instruction.

I guess we need a framework for all ports.
`rtx_cost` should also report how many instructions it believes the
RTX expands to.
That would let us accept some more complex RTX patterns and convert
them to 1 or 2 instructions, so we could combine insns more
aggressively.

With that information we can calculate a ratio: total / insn_count.
For MUL/DIV, the ratio may be greater than COSTS_N_INSNS (1).
For our example above, the ratio will be COSTS_N_INSNS (1).
We can then use the ratio to decide whether to accept the new RTX.
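
To make the ratio idea concrete, here is a minimal standalone sketch. It
is not GCC code: `struct cost_estimate`, `cost_per_insn` and
`is_count_dominated` are hypothetical names I made up for illustration,
and the only thing taken from GCC is the COSTS_N_INSNS scaling
(one simple instruction costs 4 units).

```c
#include <assert.h>

/* Same scaling as GCC's rtl.h: one simple instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical: what rtx_cost could report if it also counted insns.  */
struct cost_estimate
{
  int total;       /* Total cost, as rtx_cost reports it today.  */
  int insn_count;  /* Assumed extra field: number of insns it believes
                      the RTX expands to.  */
};

/* The proposed ratio: total / insn_count.  */
static int
cost_per_insn (struct cost_estimate est)
{
  assert (est.insn_count > 0);
  return est.total / est.insn_count;
}

/* Hypothetical predicate: nonzero when the cost comes only from the
   number of instructions, not from any single slow operation.  */
static int
is_count_dominated (struct cost_estimate est)
{
  return cost_per_insn (est) <= COSTS_N_INSNS (1);
}
```

For the `ins` pattern above, { COSTS_N_INSNS (3), 3 } gives a ratio of
COSTS_N_INSNS (1), so the cost is only a matter of instruction count.
For a multiply estimated at, say, { COSTS_N_INSNS (5), 1 } (the 5 is an
illustrative value only), the ratio stays at COSTS_N_INSNS (5).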

>
> Might I also recommend that instead of insn_count*perf_ratio*4,
> or even the slightly better COSTS_N_INSNS (insn_count*perf_ratio),
> that you encode the relative cost in the attribute, avoiding the
> multiplication (at runtime), and allowing fine tuning like
> “COSTS_N_INSNS (2) - 1”.
>
> Likewise, COSTS_N_BYTES is a very useful macro for a backend to
> define/use in rtx_costs.  Conveniently for many RISC machines,
> 1 instruction takes about 4 bytes, so COSTS_N_INSNS (1) is
> (approximately) comparable to COSTS_N_BYTES (4).
>
> I hope this helps.  Perhaps something like:
>
> static int
> mips_insn_cost (rtx_insn *insn, bool speed)
> {
>   int cost;
>   if (recog_memoized (insn) >= 0)
>     {
>       if (speed)
>         {
>           /* Use cost if provided.  */
>           cost = get_attr_cost (insn);
>           if (cost > 0)
>             return cost;
>         }
>       else
>         {
>           /* If optimizing for size, we want the insn size.  */
>           return get_attr_length (insn);
>         }
>     }
>
>   if (rtx set = single_set (insn))
>     cost = set_rtx_cost (set, speed);
>   else
>     cost = pattern_cost (PATTERN (insn), speed);
>   /* If the cost is zero, then it's likely a complex insn.  We don't
>      want the cost of these to be less than something we know about.  */
>   return cost ? cost : COSTS_N_INSNS (2);
> }
