Peter Bigot wrote: > On Sun, Aug 21, 2011 at 12:01 PM, Georg-Johann Lay <a...@gjlay.de> wrote: >> Richard Sandiford schrieb: >>> Georg-Johann Lay <a...@gjlay.de> writes: >>> >>>> Richard Sandiford schrieb: >>>> >>>>> I've been working on some patches to make insn_rtx_cost take account >>>>> of the cost of SET_DESTs as well as SET_SRCs. But I'm slowly >>>>> beginning to realise that I don't understand what rtx costs are >>>>> supposed to represent. >>>>> >>>>> AIUI the rules have historically been: >>>>> >>>>> 1) Registers have zero cost. >>>>> >>>>> 2) Constants have a cost relative to that of registers. By >>>>> extension, constants have zero cost if they are as cheap as a >>>>> register. >>>>> >>>>> 3) With an outer code of SET, actual operations have the cost of the >>>>> associated instruction. E.g. the cost of a PLUS is the cost of an >>>>> addition instruction. >>>>> >>>>> 4) With other outer codes, actual operations have the cost of the >>>>> combined instruction, if available, or the cost of a separate >>>>> instruction otherwise. E.g. the cost of a NEG inside an AND might >>>>> be zero on targets that support BIC-like instructions, and >>>>> COSTS_N_INSNS (1) on most others. >>>>> >>>>> [...] >>>>> >>>>> But that hardly seems clean either. Perhaps we should instead make >>>>> the SET_SRC always include the cost of the SET, even for registers, >>>>> constants and the like. Thoughts? >>>> IMO a clean approach would be to query the costs of a whole insn >>>> (resp. it's pattern) rather than the cost of an RTX. COSTS_N_INSNS >>>> already indicates that the costs are compared to *insn* costs i.e. >>>> cost of the whole pattern (modulo clobbers). >>> The problem is that we sometimes want the cost of something that cannot >>> be done using a single instruction. E.g. some CONST_INTs take several >>> instructions to create on MIPS. In this case the costs are really >>> measuring the cost of an emit_move_insn sequence, not a single insn. >>> >>> I suppose we could use emit_move_insn to create a temporary sequence and >>> sum the cost of each individual instruction. But that's potentially >>> expensive. >> No, that complexity is not needed. For (set (reg) (const_int)) the BE can >> just return the cost of the expanded sequence because it knows how it will >> be expanded and how much it will cost. There's no need to really expand >> the sequence. >> >> That's the way, e.g. AVR backend works: Shifts/mul/div must be expanded >> because the hardware does not support them natively. The rtx_cost for >> such an expression (which are always interpreted as RHS of a (set (reg) >> ...)) are the sum over the costs of all insns the expander will produce. > > One of my problems with this approach is that the logic that's put into an > expander definition preparation statement (or, in the case of AVR, the > function invoked by the insn output statement) gets replicated abstractly in > rtx_costs: both places have long switch statements on operand mode and const > shift value to determine the instructions that get emitted (in the former) > or how many of them there are (in the latter). How likely is it the two are > kept consistent over the years?
Yes, it's hard and uncomfortable to have the information duplicated in several places. But recursing into the RTXes won't solve that problem because the costs cannot be separated into a sum in general. In the case where an expression is lowered down from tree to rtl and the expander has to weight several approaches against each other (like to shift or to multiply), it's not even possible to get the costs from insn attribute because there are no such insn. Getting costs from attributes -- which avoids keeping information in several places -- works fine from expand to reload. But for costs of insns sequences that are to be expanded I don't know how to avoid that duplication. > I'm working on the (not yet pushed upstream) back-end for the TI MSP430, > which has some historical relationship to AVR from about a decade ago, and > the answer to that question is "not very likely". I've changed the msp430 > back-end so that instead of putting all that logic in the output statement > for the insn, it goes into a preparation statement for a standard expander. > This way the individual insns that result in (say) a constant shift of 8 > bits using xor and bswap are available for the optimizer and register > allocator to improve. Don't know if that works fine with AVR. For shifts it's a bit similar to MSP because it can just shift by 1. However, I observed in many situations that splitting too early leads to bad code, mainly when there are SUBREGs all over the place which register allocation does handle optimally, i.e. you will see moves all over the place. Some patterns are split late because that depends on register allocation or rtl peephole (clobber available). > This works pretty well, but still leaves me with problems when it comes to > computing RTX costs, because there seems to be some strength reduction > optimization for multiplication that's asking for the costs to shift each > integer type by 1 to 15 bits, when in fact no such insn should ever be > produced if real code was being generated. I think this is an example of > the case Richard's describing. > > If, in rtx_costs, I could detect an unexpected insn, deduce the correct > expander function, call it, then recurse on the sequence it generated, I'd > get the right answer---though I'd infinitely prefer not to be asked to > calculate the cost of an unexpected insn. Doing this expansion would > probably be very expensive, though, and with the side effects that are part > of emit_insn I don't know how to safely call things that invoke it when what > gets emitted isn't part of the actual stream. Agree, but if there is no insn/expander for a, say, specific arithmetic it might still be worth to get it's cost to determine if expanding to some insn sequence is better than library call. >>> Also, any change along these lines is similar to the "tie costs to .md >>> patterns" thing that I mentioned at the end of the message. I don't >>> really have time to work on anything so invasive, so the question is >>> really whether we can sensibly change the costs within the current >>> framework. What I did in a port to get the costs for insns is all inside rtx_costs and does not rely on any extension in the compiler. FYI, rtx_costs works as follows: 1) Handle basic arithmetic like unary and binary operations and BLKmode stuff. 2) Query insn attributes: /* Some rtx singletons frequently used in rtx-cost computation */ static GTY(()) rtx fake_insn = NULL_RTX; static GTY(()) rtx fake_jump_insn = NULL_RTX; static GTY(()) rtx fake_pattern = NULL_RTX; static GTY(()) rtx fake_jump_pattern = NULL_RTX; ... /* Section 2: Query the machine description for rtx costs. We find it too tedious to write all or most of the patterns down twice: once as fancy rtl and a second time as even more braindead and hard-to-maintain XEXP-orgy. We build a fake insn and look for insn attribute "ticks" resp. "space". */ if (SET == outer_code && mode != BLKmode) { int num_clobbers; rtx pattern, dest; rtx insn; int icode = -1; /* Some passes like if-convert-after-reload call for rtx costs after reload_completed. We have no idea how the set-dest looks like, GCC developers once more make a mistery around information which is actually present. We return 'unknown', i.e. 0 in that case. */ if (reload_completed) { *total = 0; return true; } /* Set up an insn to be recognized */ if (NULL_RTX == fake_insn) { /* Doh! We've got no wrapping insn yet. Cook one up. */ fake_insn = make_insn_raw (NULL_RTX); fake_jump_insn = make_jump_insn_raw (NULL_RTX); fake_pattern = gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, 99999), gen_rtx_REG (SImode, 99999)); fake_jump_pattern = gen_rtx_SET (VOIDmode, pc_rtx, gen_rtx_REG (SImode, 99999)); } if (VOIDmode == mode) { /* This is for a conditional jump */ dest = pc_rtx; insn = fake_jump_insn; } else { /* This is for ordinary insn */ dest = gen_rtx_REG (mode, 99999); insn = fake_insn; } pattern = gen_rtx_SET (VOIDmode, dest, x); PATTERN (insn) = pattern; /* Open gate for COST_INSNs */ have_cost_insns = 1; if (icode = recog (pattern, insn, &num_clobbers), icode != -1) { int cost = (speed) ? get_attr_ticks (insn) : get_attr_space (insn); *total = cost; } /* Close gate for COST_INSNs */ have_cost_insns = 0; /* ??? We set the pattern to a valid rtx construct because we observed ggc aborting for complex programs due to invalid set_dest in the pattern which originated in x. Up to now, the following fix works... */ PATTERN (fake_insn) = fake_pattern; PATTERN (fake_jump_insn) = fake_jump_pattern; The target has quite a number of combine patterns and they all are handled by this few lines of code :-) Playing around, I added some "costs insns". These insns' condition is have_cost_insn. That way I can write arbitrary RTXes and attach costs to them which is better to read than bulk of XEXP/GET_CoDE/GET_MODE etc. >>>> E.g. the cost of a CONST_INT is meaningless if you don't know what to >>>> do with the constant. (set (reg:QI) (const_int 0)) might have >>>> different cost than (set (reg:SI) (const_int 0)). >>> That's the old modeless-CONST_INT problem though. I agree the mode >>> makes a difference, but I think it's a separate issue. >> It's the same bottom line: If you see the insn, you know the cost. > > But can you know it without duplicating all the effort that goes into > expanding it? For many cases the costs are not really known and you have to provide a good estimated value; in particular if costs depend on register allocation. I have that case on the target: multiply-add and loading constants highly depend ond register allocation. Even knowing the mode for CONST_INTs would not help because they all have word_mode. >>> If we did pass complete instructions, each caller would have to provide >>> a SET whose SET_DEST is an arbitrary psuedo register of the required >>> mode. In all likelihood, we'd define a new utility function do this for >>> us. >>> >>> So in the modeless-CONST_INT world, we would need to pass the mode to >>> that utility function. But if we're prepared to pass such extra >>> information around, we could simply do it by passing the mode directly >>> to rtx_cost. >>> >>> I think the cost of a full rvalue (rather than a complete SET), should >>> be based on the assumption that the destination is a register. So I >>> don't the backend is missing any information in that respect. I.e. the >>> problem doesn't seem to be lack of information, but rather lack of >>> consistency (between registers, constants, operators, and so on). >>> >>>> Similar applies for an addition if you don't know if it's for >>>> arithmetic or address offset computation inside a MEM. >>> Yeah, that'd be bad, but I don't think we take costs at that level. Only >>> a few places take the cost of anything "smaller" than a complete rvalue, >>> and they either take the cost of a condition or take the cost of an >>> operand to a "top-level" operator. The target then has the option of >>> stopping the recursion whenever it likes. >>> >>> Richard >> Johann > > Peter Johann