Adam Nemet <ane...@caviumnetworks.com> writes:
> Richard Sandiford writes:
>> Adam Nemet <ane...@caviumnetworks.com> writes:
>> > * Synthesizing multi-insns constants from const anchors make sense 
>> > regardless
>> > whether the constant is used as an address so I am adjusting the cost 
>> > system
>> > to make those stick independent of the context.  I still need to benchmark
>> > this.
>> 
>> Out of interest, what sort of changes did you need to make?  Do you care
>> about the cases where rtx_costs is applied to a pattern that won't later
>> be checked by recog (such as the calls from the expander)?  Or do you
>> just care about the cases where rtx_costs is used to check whether a
>> validate_change would be profitable?
>
> I am glad you asked because I myself was going to ask you some questions in
> this area ;).
>
> I think the best way to formulate the profitability question for multi-insn
> constants in CSE is this.  We found an expression using a const-anchor that is
> equivalent to the REG_EQUAL note on the insn.  We go with the const-anchor
> expression if it is cheaper than the REG_EQUAL expression.
>
> Comparing the cost of the actual source in the insn against the const-anchor
> expression would ignore the fact that the preceding instructions buiding up
> the constants become redundant with the const-anchor expression.
>
> I also have an alternative solution that feels more hackish and gives less
> control to the backend but works with the current cost model.  For a
> multi-insn constant if the current cost and the cost of the new expression are
> equal we prefer the new expression.  This works for MIPS because AFAICT we
> always replace a binary expression with another one of the same cost.

Nicely explained, thanks.  I see where you're coming from.

So it's the CSE REG_EQUAL costs that are causing problems.  We're applying
rtx_costs to something that isn't usually a valid insn pattern (the first
case above).  Hmm.

> The problem with the first approach and I feel you were anticipating this with
> your question is that multi-insn constants have a cost of zero in the MIPS
> backend currently.  Or with the code from mips_rtx_costs:
>
>     case CONST_INT:
> [...]
>         /* When not optimizing for size, we care more about the cost
>            of hot code, and hot code is often in a loop.  If a constant
>            operand needs to be forced into a register, we will often be
>            able to hoist the constant load out of the loop, so the load
>            should not contribute to the cost.  */
>         if (!optimize_size
>             || mips_immediate_operand_p (outer_code, INTVAL (x)))
>           {
>             *total = 0;
>             return true;
>           }
>
> That is really hard to beat for an immediate add, which is COSTS_N_INSNS (1).

Right. ;)

> Since you already offered :), can you please explain where the above code is
> needed and whether we can refine the picture a little bit.  It definitely
> feels like the above cost value is only true in certain contexts.  This might
> be a naive comment but maybe somehow limiting the effect to that particular
> context would be a good improvement.

OK, well, I should probably start by saying that this behaviour is
traditional.  The version in 3.0 was:

  case CONST_INT:                                                       \
    if (! TARGET_MIPS16)                                                \
      {                                                                 \
        /* Always return 0, since we don't have different sized         \
           instructions, hence different costs according to Richard     \
           Kenner */                                                    \
        return 0;                                                       \
      }                                                                 \

And like you, I found it somewhat strange at first.  (The comment didn't
exactly help.)  But when I began to experiment with costs as part of
CodeSourcery and MTI's 2007-09-11 -Os patch, it did indeed seem to be
right thing when optimising for speed rather than size.  The comment
in trunk is my attempt to explain why.  (I'm not sure now that it's
any better than the 3.0 comment, but never mind.)

If we have an instruction:

    A: (set (reg Z) (plus (reg X) (const_int 0xdeadbeef)))

we will need to use something like:

       (set (reg Y) (const_int 0xdead0000))
       (set (reg Y) (ior (reg Y) (const_int 0xbeef)))
    B: (set (reg Z) (plus (reg X) (reg Y)))

But if A is in a loop, the Y loads can be hoisted, and the cost
of A is effectively the same as the cost of B.  In other words,
the (il)legitimacy of the constant operand doesn't really matter.

(Of course, anything that relies on a hoisted value can increase register
pressure, but rtx_costs is too blunt a tool to determine when that's
actually a problem.  I remember experimenting with adding a sub-insn
cost (like 1 rather than COSTS_N_INSNS(1)), but these subinsn costs
can aggregate into a misleading whole-insn cost.)

And since hot code tends to be in a loop that the compiler knows about,
this case is usually the most important when optimising for speed.
We're only usually interested in the relative cost of insns that would
stay in a loop.

Likewise plain:

      (set (reg Z) (const_int 0xdeadbeef))

isn't really any more costly than:

      (set (reg Z) (const_int 0xbeef))

because the 0xdead0000 load can be hoisted.  (But see [1] below.)

In summary, the current costs generally work because:

  (a) We _usually_ only apply costs to arbitrary instructions
      (rather than candidate instruction patterns) before
      loop optimisation.

  (b) It doesn't matter what we return for invalid candidate
      instruction patterns, because recog will reject them anyway.

So I suppose my next question is: are you seeing this problem with cse1
or cse2?  The reasoning behind the zero cost might still be valid for
REG_EQUAL notes in cse1.  However, it's probably not right for cse2,
which runs after loop hoisting.

Perhaps we could add some kind of context parameter to rtx_costs
to choose between the hoisting and non-hoisting cost.  As well as
helping with your case, it could let us use the non-hoisting cost
before loop optimisation in cases where the insn isn't going to
go in a loop.  The drawback is that we then have to replicate
even more of the .md file in rtx_costs.

Alternatively, perhaps we could just assume that rtx_costs always
returns the hoisted cost when optimising for speed, in which case
I think your alternative solution would be theoretically correct
(i.e. not a hack ;)).

A third option is to say that rtx_costs really should be the cost of
the individual instructions needed to perform that pattern.  But I think
we'd then need to adjust many of the existing rtx_costs calls so that they
take potential hoisting into account.

E.g. suppose we're deciding how to implement an in-loop multiplication.
We calculate the cost of a multiplication instruction vs. the cost of a
shift/add sequence, but we don't consider whether any of the backend-specific
shift/add set-up instructions could be hoisted.  This would lead to us
using multiplication insns in cases where we don't want to.

(This was one of the most common situations in which the zero cost helped.)

[1] I think the need to return 0 for immediate sets is a bit unfortunate.
If we don't, we'll consider:

    (set (reg X) (reg Y))

to be cheaper than:

    (set (reg X) (const_int 1))

even though it isn't.  And we'll therefore hoist every constant argument
to an in-loop function call, which is definitely a bad idea when optimising
for size.  (At least, that's what we used to do.)  OTOH:

    (set (reg X) (const_int 1))

is really the same cost as:

    (set (reg X) (plus (reg Y) (const_int 1)))
or:
    (set (reg X) (ior (reg Y) (const_int 1)))
    
because that's how we actually implement it.  So this seems like another
situation where rtx_costs is being used for two different things: cases
where register-to-register sets should have zero cost (becuase we're
confident that we can get rid of them later) and cases where they should
have the same cost as any other simple insn.  Until that happens, giving
immediate sets a cost of 1 seems wrong.

A lot of this was probably stating the obvious or going over old ground,
so sorry in advance if so.

Richard

Reply via email to