Roger Sayle <ro...@nextmovesoftware.com> wrote on Fri, Dec 29, 2023 at 00:54:
> >
> > The current (default) behavior is that when the target doesn’t define
> > TARGET_INSN_COST the middle-end uses the backend’s
> > TARGET_RTX_COSTS, so multiplications are slower than additions,
> > but about the same size when optimizing for size (with -Os or -Oz).
> >
> > All of this gets disabled with your proposed patch.
> > [If you don’t check speed, you probably shouldn’t touch insn_cost].
> >
> > I agree that a backend can fine tune the (speed and size) costs of
> > instructions (especially complex !single_set instructions) via
> > attributes in the machine description, but these should be used
> > to override/fine-tune rtx_costs, not override/replace/duplicate them.
> >
> > Having accurate rtx_costs also helps RTL expansion and the earlier
> > optimizers, but insn_cost is used by combine and the later RTL
> > optimization passes, once instructions have been recognized.
> >

Yes. I ran into this problem while trying to combine sign_extend and
zero_extract. When I add a new define_insn for

(set (reg/v:DI 200 [ val ])
     (sign_extend:DI
       (ior:SI (and:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
                       (const_int 16777215 [0xffffff]))
               (ashift:SI (subreg:SI (reg:QI 205 [ MEM[(const unsigned char *)buf_8(D) + 3B] ]) 0)
                          (const_int 24 [0x18])))))

to generate an `ins` instruction, it is rejected by `combine_validate_cost`,
which considers this RTX to cost COSTS_N_INSNS(3) instead of
COSTS_N_INSNS(1). So we need a way to override that cost.

I think we need a framework that works for all ports: `rtx_cost` should
also report how many instructions it believes the RTX corresponds to.
That may help us accept some more complex RTX_INSNs and convert them to
1 or 2 instructions, so we can combine INSNs more aggressively.

With that information we can calculate a ratio: total / insn_count.
For MUL/DIV, the ratio may be greater than COSTS_N_INSNS (1).
For the example above, the ratio will be COSTS_N_INSNS (1).
We can then use the ratio to decide whether to accept the new RTX.

> > Might I also recommend that instead of insn_count*perf_ratio*4,
> > or even the slightly better COSTS_N_INSNS (insn_count*perf_ratio),
> > that encode the relative cost in the attribute, avoiding the multiplication
> > (at runtime), and allowing fine tuning like “COSTS_N_INSNS(2) – 1”.
> > Likewise, COSTS_N_BYTES is a very useful macro for a backend to
> > define/use in rtx_costs. Conveniently for many RISC machines,
> > 1 instruction takes about 4 bytes, for COSTS_N_INSNS (1) is
> > (approximately) comparable to COSTS_N_BYTES (4).
> >
> > I hope this helps. Perhaps something like:
> >
> >
> > static int
> > mips_insn_cost (rtx_insn *insn, bool speed)
> > {
> >   int cost;
> >   if (recog_memoized (insn) >= 0)
> >     {
> >       if (speed)
> >         {
> >           /* Use cost if provided.  */
> >           cost = get_attr_cost (insn);
> >           if (cost > 0)
> >             return cost;
> >         }
> >       else
> >         {
> >           /* If optimizing for size, we want the insn size.  */
> >           return get_attr_length (insn);
> >         }
> >     }
> >
> >   if (rtx set = single_set (insn))
> >     cost = set_rtx_cost (set, speed);
> >   else
> >     cost = pattern_cost (PATTERN (insn), speed);
> >   /* If the cost is zero, then it's likely a complex insn.  We don't
> >      want the cost of these to be less than something we know about.  */
> >   return cost ? cost : COSTS_N_INSNS (2);
> > }
> >
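
To make the total / insn_count ratio mentioned above a bit more concrete, here is a minimal standalone sketch. It is not GCC code: `struct cost_estimate`, `cost_per_insn` and `is_chain_of_cheap_ops` are hypothetical names used only for illustration, and the acceptance rule (ratio no larger than COSTS_N_INSNS (1)) is just one possible reading of the idea. Only the `COSTS_N_INSNS` definition mirrors the one in GCC's rtl.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors GCC's rtl.h: one simple instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical: what an extended rtx_cost might report for an RTX.  */
struct cost_estimate
{
  int total;       /* Total cost, in COSTS_N_INSNS units.  */
  int insn_count;  /* How many instructions rtx_cost thinks the RTX is.  */
};

/* The ratio from the mail: total / insn_count (integer division).  */
static int
cost_per_insn (struct cost_estimate est)
{
  assert (est.insn_count > 0);
  return est.total / est.insn_count;
}

/* Assumed rule: if every counted instruction is as cheap as a simple
   one, the RTX is a chain of cheap operations, so a single define_insn
   matching it is a plausible candidate for COSTS_N_INSNS (1).  */
static bool
is_chain_of_cheap_ops (struct cost_estimate est)
{
  return cost_per_insn (est) <= COSTS_N_INSNS (1);
}
```

For the `ins` pattern above, with total = COSTS_N_INSNS (3) and insn_count = 3, the ratio is COSTS_N_INSNS (1), so the rule would let the pattern through. A multiply reported as, say, total = COSTS_N_INSNS (5) with insn_count = 1 (a made-up figure) has a ratio well above COSTS_N_INSNS (1) and would keep its higher cost.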