On Sun, Aug 21, 2011 at 12:01 PM, Georg-Johann Lay <a...@gjlay.de> wrote:
>
> Richard Sandiford wrote:
>>
>> Georg-Johann Lay <a...@gjlay.de> writes:
>>
>>> Richard Sandiford wrote:
>>>
>>>> I've been working on some patches to make insn_rtx_cost take account
>>>> of the cost of SET_DESTs as well as SET_SRCs. But I'm slowly beginning
>>>> to realise that I don't understand what rtx costs are supposed to
>>>> represent.
>>>>
>>>> AIUI the rules have historically been:
>>>>
>>>> 1) Registers have zero cost.
>>>>
>>>> 2) Constants have a cost relative to that of registers. By extension,
>>>> constants have zero cost if they are as cheap as a register.
>>>>
>>>> 3) With an outer code of SET, actual operations have the cost
>>>> of the associated instruction. E.g. the cost of a PLUS
>>>> is the cost of an addition instruction.
>>>>
>>>> 4) With other outer codes, actual operations have the cost
>>>> of the combined instruction, if available, or the cost of
>>>> a separate instruction otherwise. E.g. the cost of a NEG
>>>> inside an AND might be zero on targets that support BIC-like
>>>> instructions, and COSTS_N_INSNS (1) on most others.
>>>>
>>>> [...]
>>>>
>>>> But that hardly seems clean either. Perhaps we should instead make
>>>> the SET_SRC always include the cost of the SET, even for registers,
>>>> constants and the like. Thoughts?
>>>
>>> IMO a clean approach would be to query the costs of a whole insn (resp.
>>> its pattern) rather than the cost of an RTX. COSTS_N_INSNS already
>>> indicates that the costs are compared to *insn* costs, i.e. the cost of
>>> the whole pattern (modulo clobbers).
>>
>> The problem is that we sometimes want the cost of something that cannot
>> be done using a single instruction. E.g. some CONST_INTs take several
>> instructions to create on MIPS. In this case the costs are really
>> measuring the cost of an emit_move_insn sequence, not a single insn.
>>
>> I suppose we could use emit_move_insn to create a temporary sequence
>> and sum the cost of each individual instruction. But that's potentially
>> expensive.
>
> No, that complexity is not needed. For (set (reg) (const_int)) the BE can
> just return the cost of the expanded sequence because it knows how it will
> be expanded and how much it will cost. There's no need to really expand the
> sequence.
>
> That's the way, e.g., the AVR backend works: shifts/mul/div must be expanded
> because the hardware does not support them natively. The rtx_cost for such
> an expression (which is always interpreted as the RHS of a (set (reg) ...))
> is the sum over the costs of all insns the expander will produce.
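
To make that style concrete, here is a minimal sketch (not actual AVR code,
and with made-up per-count numbers) of a cost hook that prices a constant
HImode shift at the size of the sequence the backend knows its expander would
emit, without ever generating it. The signature follows the 4.6-era
TARGET_RTX_COSTS interface; later releases add parameters.

    static bool
    example_rtx_costs (rtx x, int code, int outer_code, int *total, bool speed)
    {
      switch (code)
        {
        case ASHIFT:
          /* Price a constant HImode shift at the length of the expansion
             the backend knows it will emit, without expanding anything.  */
          if (GET_MODE (x) == HImode && CONST_INT_P (XEXP (x, 1)))
            {
              int count = INTVAL (XEXP (x, 1));
              /* Hypothetical expansion: a shift by 8 becomes a move plus a
                 clear; other counts become roughly one insn per bit.  */
              *total = (count == 8) ? COSTS_N_INSNS (2) : COSTS_N_INSNS (count);
              return true;  /* Cost is final; don't recurse into operands.  */
            }
          break;

        default:
          break;
        }

      return false;  /* Fall back to the generic rtx_cost computation.  */
    }
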
One of my problems with this approach is that the logic that's put into an
expander definition preparation statement (or, in the case of AVR, the
function invoked by the insn output statement) gets replicated abstractly in
rtx_costs: both places have long switch statements on the operand mode and
the constant shift value, to determine which instructions get emitted (in the
former) or how many of them there are (in the latter). How likely is it that
the two are kept consistent over the years? I'm working on the (not yet
pushed upstream) back-end for the TI MSP430, which has some historical
relationship to AVR from about a decade ago, and the answer to that question
is "not very likely".

I've changed the msp430 back-end so that, instead of putting all that logic
in the output statement for the insn, it goes into a preparation statement
for a standard expander. This way the individual insns that result in (say)
a constant shift of 8 bits using xor and bswap are available for the
optimizer and register allocator to improve. This works pretty well, but it
still leaves me with problems when it comes to computing RTX costs, because
there seems to be some strength-reduction optimization for multiplication
that asks for the cost of shifting each integer type by 1 to 15 bits, when in
fact no such insn should ever be produced if real code were being generated.
I think this is an example of the case Richard's describing.

If, in rtx_costs, I could detect an unexpected insn, deduce the correct
expander function, call it, then recurse on the sequence it generated, I'd
get the right answer, though I'd infinitely prefer not to be asked to
calculate the cost of an unexpected insn at all. Doing this expansion would
probably be very expensive, though, and given the side effects that come with
emit_insn, I don't know how to safely call code that invokes it when what
gets emitted is not meant to become part of the actual insn stream. (A rough
sketch of the kind of helper I have in mind follows below, after the quoted
text.)

>>
>> Also, any change along these lines is similar to the "tie costs to
>> .md patterns" thing that I mentioned at the end of the message.
>> I don't really have time to work on anything so invasive, so the
>> question is really whether we can sensibly change the costs within
>> the current framework.
>>
>>> E.g. the cost of a CONST_INT is meaningless if you don't know what to do
>>> with the constant. (set (reg:QI) (const_int 0)) might have different cost
>>> than (set (reg:SI) (const_int 0)).
>>
>> That's the old modeless-CONST_INT problem though. I agree the mode
>> makes a difference, but I think it's a separate issue.
>
> It's the same bottom line: If you see the insn, you know the cost.

But can you know it without duplicating all the effort that goes into
expanding it?

>> If we did pass complete instructions, each caller would have to provide
>> a SET whose SET_DEST is an arbitrary pseudo register of the required mode.
>> In all likelihood, we'd define a new utility function to do this for us.
>>
>> So in the modeless-CONST_INT world, we would need to pass the mode
>> to that utility function. But if we're prepared to pass such extra
>> information around, we could simply do it by passing the mode directly
>> to rtx_cost.
>>
>> I think the cost of a full rvalue (rather than a complete SET)
>> should be based on the assumption that the destination is a register.
>> So I don't think the backend is missing any information in that respect.
>> I.e. the problem doesn't seem to be lack of information, but rather
>> lack of consistency (between registers, constants, operators,
>> and so on).
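
For what it's worth, the "expand into a throwaway sequence and sum it up"
idea Richard mentions (and that I'd rather avoid) would look roughly like the
sketch below. It relies on the standard start_sequence/end_sequence
machinery so nothing lands in the real insn stream, but it still pays the
full price of expansion, and it assumes the expander has no side effects
beyond creating new pseudos. The helper name is made up; treat it as an
illustration of the shape rather than working code.

    static int
    cost_of_move_via_expansion (rtx dest, rtx src, bool speed)
    {
      rtx seq, insn;
      int cost = 0;

      /* Expand into a detached sequence so nothing reaches the real
         insn stream.  */
      start_sequence ();
      emit_move_insn (dest, src);
      seq = get_insns ();
      end_sequence ();

      /* Sum the cost of each insn the expansion produced.  */
      for (insn = seq; insn != NULL_RTX; insn = NEXT_INSN (insn))
        cost += insn_rtx_cost (PATTERN (insn), speed);

      return cost;
    }
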
>>
>>> Something similar applies for an addition if you don't know if it's for
>>> arithmetic or address offset computation inside a MEM.
>>
>> Yeah, that'd be bad, but I don't think we take costs at that level.
>> Only a few places take the cost of anything "smaller" than a complete
>> rvalue, and they either take the cost of a condition or take the cost of
>> an operand to a "top-level" operator. The target then has the option of
>> stopping the recursion whenever it likes.
>>
>> Richard
>
> Johann

Peter
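
For readers who haven't been inside a cost hook: the "stopping the recursion"
Richard refers to is the TARGET_RTX_COSTS convention of returning true once
*total holds the final cost, versus returning false to let rtx_cost keep
walking the operands. A purely illustrative fragment, slotting into the same
kind of switch as the shift sketch earlier in the thread and using the
BIC-style NEG-inside-AND example from Richard's first message (the target is
imaginary):

        case NEG:
          /* Imaginary BIC-like target: a complement feeding an AND folds
             into the and-with-complement instruction.  */
          if (outer_code == AND)
            {
              *total = 0;
              return true;   /* Final answer; stop the recursion here.  */
            }
          *total = COSTS_N_INSNS (1);
          return false;      /* Let rtx_cost add the operand costs itself.  */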