Re: FW: [PATCH] Target hook for disabling the delay slot filler.

Jeff Law Thu, 08 Oct 2015 12:44:21 -0700

On 09/18/2015 05:10 AM, Simon Dardis wrote:

Are you trying to say that you have the option as to what kind of
branch to use?  ie, "ordinary", presumably without a delay slot or one
with a delay slot?

Is the "ordinary" actually just a nullified delay slot or some form of
likely/not likely static hint?


Specifically for MIPSR6: the ISA possesses traditional delay slot branches and
a normal branch (no delay slots, not annulling, no hints, subtle static hazard),
aka "compact branch" in MIPS terminology. They could be described as nullify
on taken delay slot branch but we saw little to no value in that.

Matthew Fortune provided a writeup with their handling in GCC:

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01892.html

Thanks. I never looked at that message, almost certainly because it was
MIPS specific. I'm trying hard to stay out of backends that have good
active maintainers, and MIPS certainly qualifies on that point.

But what is the compact form at the micro-architectural level?  My
mips-fu has diminished greatly, but my recollection is the bubble is
always there.   Is that not the case?


The pipeline bubble will exist but the performance impact varies across
R6 cores. High-end OoO cores won't be impacted as much, but lower
end cores will. microMIPSR6 removes delay slot branches altogether which
pushes the simplest micro-architectures to optimize away the cost of a
pipeline bubble.

[ ... snip more micro-architecture stuff ... ]

Thanks. That helps a lot. I didn't realize the bubble was being
squashed to varying degrees. And FWIW, I wouldn't be surprised if you
reach a point on the OoO cores where you'll just want to move away from
delay slots totally and rely on your compact branches as much as
possible. It may give your hardware guys a degree of freedom that helps
them in the common case (compact branches) at the expense of slowing
down code with old fashioned delay slots.

Compact branches do have a strange restriction in that they cannot be followed
by a CTI. This is apparently to simplify branch predictors, but it may be lifted
in future ISA releases.

Come on! :-) There's some really neat things you can do when you allow
branches in delay slots. The PA was particularly fun in that regard.
My recollection is HP had some hand written assembly code in their
libraries which exploited the out-of-line execution you could get in
this case. We never tried to exploit it in GCC simply because the
opportunities didn't seem all that common or profitable.

If it is able to find insns from the commonly executed path that don't
have a long latency, then the fill is usually profitable (since the
pipeline bubble always exists).  However, pulling a long latency
instruction (say anything that might cache miss or an fdiv/fsqrt) off
the slow path and conditionally nullifying it can be *awful*.
Everything else is in-between.


I agree. The variability in profit/loss is a concern and I see two ways to deal
with it:

A) modify the delay slot filler so that it chooses speculative instructions of
less than some $cost and avoids instruction duplication when the eager filler
picks an instruction from a block with multiple predecessors. Making such
changes would be invasive and require more target specific hooks.

The cost side here should be handled by existing mechanisms. You just
never allow anything other than simple arith, logicals & copies.
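
To make the cost rule concrete, here is a minimal sketch of such a filter. It is a toy model in Python, not GCC code: the `Insn` type, the `kind` labels, and `ok_for_speculative_slot` are hypothetical names invented for illustration, and the existing mechanisms referred to above are not modelled.

```python
from dataclasses import dataclass

# Hypothetical instruction classes; these labels are illustrative only
# and do not correspond to anything in GCC.
CHEAP_KINDS = {"arith", "logical", "copy"}


@dataclass(frozen=True)
class Insn:
    name: str
    kind: str  # e.g. "arith", "logical", "copy", "load", "fdiv"


def ok_for_speculative_slot(insn: Insn) -> bool:
    """Accept only cheap, short-latency instructions for speculative filling."""
    return insn.kind in CHEAP_KINDS


candidates = [
    Insn("addu", "arith"),
    Insn("lw", "load"),      # might cache miss, so rejected
    Insn("move", "copy"),
    Insn("div.d", "fdiv"),   # long latency, so rejected
]
eligible = [insn.name for insn in candidates if ok_for_speculative_slot(insn)]
```

Under this rule only the add and the copy remain candidates; the load and the divide are never pulled off the slow path.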


You'd need a hook to avoid this when copying was needed.

You'd probably also need some kind of target hook to indicate the level
of prediction where this is profitable since the cost varies across your
micro-architectures.

And you'd also have to worry about the special code which triggers when
there's a well predicted branch, but a resource conflict. In that case
reorg can fill the slot from the predicted path and insert compensation
code on the non-predicted path.


B) Use compact branches instead of speculative delay slot execution and forsake
variable performance for a consistent pipeline bubble by not using the
speculative delay filler altogether.

Between these two choices, B seems to be the better option due to its sheer simplicity.
Choosing neither gives speculative instruction execution when there could be a
small consistent penalty instead.

B is certainly easier.

The main objection I had was given my outdated knowledge of the MIPS
processors it seemed like you were taking a step backwards. You've
cleared that up and if you're comfortable with the tradeoff, then I
won't object to the target hook to disable eager filling.
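
As a rough illustration of what such a hook gates, the sketch below models the decision in Python. It is an assumption-laden toy, not the patch under review: `Target`, `use_eager_delay_filler`, and `fill_delay_slot` are hypothetical names, and real reorg works on RTL rather than strings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Target:
    # Hypothetical flag standing in for the proposed target hook.
    use_eager_delay_filler: bool = True


def fill_delay_slot(
    target: Target,
    safe_candidate: Optional[str],
    speculative_candidate: Optional[str],
) -> Tuple[Optional[str], str]:
    """Return (slot contents, how the slot was filled)."""
    # A non-speculative fill is always acceptable.
    if safe_candidate is not None:
        return safe_candidate, "simple"
    # Speculative (eager) filling happens only if the target allows it.
    if target.use_eager_delay_filler and speculative_candidate is not None:
        return speculative_candidate, "eager"
    # Slot left empty; in this model an R6 target would then pick a
    # compact branch instead of a delay slot branch.
    return None, "unfilled"


r6_like = Target(use_eager_delay_filler=False)
result = fill_delay_slot(r6_like, None, "lw")
```

With the flag off, the speculative candidate is ignored and the slot stays unfilled, which corresponds to option B: a consistent bubble instead of variable speculative cost.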

Can you repost that patch? Given I was the last one to do major work on
reorg (~20 years ago mind you) it probably makes the most sense for me
to own the review.


jeff
