On 09/17/2015 03:52 AM, Simon Dardis wrote:
> The profitability of using an ordinary branch over a delay slot branch
> depends on how the delay slot is filled. If a delay slot can be filled from
> an instruction preceding the branch, or from instructions following it that
> must be executed on both sides, then it is profitable to use a delay slot
> branch.
Agreed. It's an over-simplification, but for the purposes of this
discussion it's close enough.
> For cases when instructions are chosen from one side of the branch,
> the proposed optimization strategy is to not speculatively execute
> instructions when ordinary branches could be used. Performance-wise
> this avoids executing instructions which the eager delay filler picked
> wrongly.
Are you trying to say that you have the option as to what kind of branch
to use? i.e., an "ordinary" one, presumably without a delay slot, or one
with a delay slot?
Is the "ordinary" actually just a nullified delay slot or some form of
likely/not likely static hint?
> Since most branches have a compact form, disabling the eager delay filler
> should be no worse than altering it not to fill delay slots in this case.
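The selection policy Simon describes can be sketched as a small decision function. This is a minimal illustration under stated assumptions: `FillSource`, `choose_branch`, and the returned labels are hypothetical names, not GCC internals, and "compact branch" is taken to mean a branch form without a delay slot.

```python
from enum import Enum

# Hypothetical names for illustration only; not GCC's reorg.c data structures.
class FillSource(Enum):
    BEFORE = "insn preceding the branch"
    BOTH_PATHS = "insn executed on both sides of the branch"
    ONE_PATH = "insn taken from only one side (speculative)"
    NONE = "no candidate; the slot would hold a nop"

def choose_branch(source: FillSource, has_compact_form: bool) -> str:
    # Non-speculative fills always do useful work, so keep the delay slot.
    if source in (FillSource.BEFORE, FillSource.BOTH_PATHS):
        return "delay-slot branch"
    # Otherwise avoid speculation whenever a compact (no delay slot) form exists.
    if has_compact_form:
        return "compact branch"
    # No compact form: fall back to an eager fill, or a nop if nothing fits.
    if source is FillSource.ONE_PATH:
        return "delay-slot branch (eager fill)"
    return "delay-slot branch (nop)"
```

Under this policy a one-sided (speculative) fill is only chosen when no compact form is available, which is why disabling the eager filler outright should cost little when most branches have a compact form.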
But what is the compact form at the micro-architectural level? My
MIPS-fu has diminished greatly, but my recollection is the bubble is
always there. Is that not the case?
fill_eager_delay_slots is most definitely speculative and its
profitability is largely dependent on the cost of what insns it finds to
fill those delay slots and whether they're from the common or uncommon path.
If it is able to find insns from the commonly executed path that don't
have a long latency, then the fill is usually profitable (since the
pipeline bubble always exists). However, pulling a long latency
instruction (say anything that might cache miss or an fdiv/fsqrt) off
the slow path and conditionally nullifying it can be *awful*.
Everything else is in between.
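The trade-off above can be made concrete with a toy expected-cost model. This is a sketch under loud assumptions: it is not how reorg.c or any real MIPS core computes profitability, the one-cycle bubble for compact branches follows the recollection above, and `p_path` (probability the filler's path is taken) and `wasted_cost` (cycles lost when it is not) are hypothetical parameters, with 20 cycles standing in for an fdiv or cache miss.

```python
from dataclasses import dataclass

# Hypothetical cost model for illustration only. Costs are expected extra
# cycles per branch, beyond the useful work the branch region performs.

@dataclass
class SlotFill:
    source: str               # "before", "both", "one_side", or "none" (nop)
    p_path: float = 1.0       # probability the filler's own path is taken
    wasted_cost: float = 1.0  # cycles lost when that path is NOT taken

def delay_branch_cost(fill: SlotFill) -> float:
    """Expected extra cycles for a delay-slot branch with this fill."""
    if fill.source in ("before", "both"):
        return 0.0  # the slot always does useful work
    if fill.source == "none":
        return 1.0  # a nop fills the slot: the bubble is pure overhead
    if fill.source == "one_side":
        # Useful when the filler's path is taken; otherwise pay wasted_cost.
        return (1.0 - fill.p_path) * fill.wasted_cost
    raise ValueError(f"unknown fill source: {fill.source}")

def compact_branch_cost(bubble: float = 1.0) -> float:
    """Assumes the pipeline bubble is still paid without a delay slot."""
    return bubble

# Cheap insn from the common path vs. a long-latency insn from the slow path.
cheap_common = SlotFill("one_side", p_path=0.9, wasted_cost=1.0)
fdiv_slow = SlotFill("one_side", p_path=0.1, wasted_cost=20.0)
print(delay_branch_cost(cheap_common), compact_branch_cost())
print(delay_branch_cost(fdiv_slow), compact_branch_cost())
```

With these made-up numbers the cheap common-path fill costs about 0.1 expected cycles against 1 for the compact branch, while the slow-path fdiv costs about 18, which is the "*awful*" case; fills with intermediate probabilities or latencies land in between.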
Jeff