> -----Original Message----- > From: Jeff Law [mailto:l...@redhat.com] > Sent: 08 October 2015 20:44 > To: Simon Dardis; Bernd Schmidt > Cc: gcc-patches@gcc.gnu.org > Subject: Re: FW: [PATCH] Target hook for disabling the delay slot filler. > > On 09/18/2015 05:10 AM, Simon Dardis wrote: > >> Are you trying to say that you have the option as to what kind of > >> branch to use? ie, "ordinary", presumably without a delay slot or > >> one with a delay slot? > > > >> Is the "ordinary" actually just a nullified delay slot or some form > >> of likely/not likely static hint? > > > > Specifically for MIPSR6: the ISA possesses traditional delay slot > > branches and a normal branch (no delay slots, not annulling, no hints, > > subtle static hazard), aka "compact branch" in MIPS terminology. They > > could be described as nullify on taken delay slot branch but we saw little > > to > no value in that. > > > > Matthew Fortune provided a writeup with their handling in GCC: > > > > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01892.html > Thanks. I never looked at that message, almost certainly because it was MIPS > specific. I'm trying hard to stay out of backends that have good active > maintainers, and MIPS certainly qualifies on that point. > > > > > >> But what is the compact form at the micro-architectural level? My > >> mips-fu has diminished greatly, but my recollection is the bubble is > >> always there. Is that not the case? > > > > The pipeline bubble will exist but the performance impact varies > > across > > R6 cores. High-end OoO cores won't be impacted as much, but lower end > > cores will. microMIPSR6 removes delay slot branches altogether which > > pushes the simplest micro-architectures to optimize away the cost of a > > pipeline bubble. > [ ... snip more micro-archticture stuff ... ] Thanks. That helps a lot. I > didn't > realize the bubble was being squashed to varying degrees. And FWIW, I > wouldn't be surprised if you reach a point on the OoO cores where you'll just > want to move away from delay slots totally and rely on your compact > branches as much as possible. It may give your hardware guys a degree of > freedom that helps them in the common case (compact branches) at the > expense of slowing down code with old fashioned delay slots. > > > Compact branches do a strange restriction in that they cannot be > > followed by a CTI. This is to simplify branch predictors apparently > > but this may be lifted in future ISA releases. > Come on! :-) There's some really neat things you can do when you allow > branches in delay slots. The PA was particularly fun in that regard. > My recollection is HP had some hand written assembly code in their libraries > which exploited the out-of-line execution you could get in this case. We > never tried to exploit in GCC simply because the opportunities didn't see all > that common or profitable. > > > > > > >> If it is able to find insns from the commonly executed path that > >> don't have a long latency, then the fill is usually profitable (since > >> the pipeline bubble always exists). However, pulling a long latency > >> instruction (say anything that might cache miss or an fdiv/fsqrt) off > >> the slow path and conditionally nullifying it can be *awful*. > >> Everything else is in-between. > > > > I agree. The variability in profit/loss in a concern and I see two > > ways to deal with it: > > > > A) modify the delay slot filler so that it choses speculative > > instructions of less than some $cost and avoid instruction duplication > > when the eager filler picks an instruction from a block with multiple > > predecessors. Making such changes would be invasive and require more > target specific hooks. > The cost side here should be handled by existing mechanisms. You just > never allow anything other than simple arith, logicals & copies. > > You'd need a hook to avoid this when copying was needed. > > You'd probably also need some kind of target hook to indicate the level of > prediction where this is profitable since the cost varies across your micro- > architectures. > > And you'd also have to worry about the special code which triggers when > there's a well predicted branch, but a resource conflict. In that case reorg > can > fill the slot from the predicted path and insert compensation code on the > non-predicted path. > > > > > > > B) Use compact branches instead of speculative delay slot execution > > and forsake variable performance for a consistent pipeline bubble by > > not using the speculative delay filler altogether. > > > > Between these two choices, B seems to better option as due to sheer > simplicity. > > Choosing neither gives speculative instruction execution when there > > could be a small consistent penalty instead. > B is certainly easier. > > The main objection I had was given my outdated knowledge of the MIPS > processors it seemed like you were taking a step backwards. You've cleared > that up and if you're comfortable with the tradeoff, then I won't object to > the target hook to disable eager filling. > > Can you repost that patch? Given I was the last one to do major work on > reorg (~20 years ago mind you) it probably makes the most sense for me to > own the review. > > jeff > Thanks for getting back to me.
Patch below. Target hook renamed to TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P. Tested on mips-img-elf, no new regressions. Thanks, Simon Index: gcc/config/mips/mips.c =================================================================== --- gcc/config/mips/mips.c (revision 229123) +++ gcc/config/mips/mips.c (working copy) @@ -4312,6 +4312,14 @@ { return mips_address_insns (addr, mode, false); } + +/* Implement TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P. */ + +static bool +mips_no_speculation_in_delay_slots_p () +{ + return TARGET_CB_MAYBE; +} /* Information about a single instruction in a multi-instruction asm sequence. */ @@ -19823,6 +19831,9 @@ #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST mips_address_cost +#undef TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P +#define TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P mips_no_speculation_in_delay_slots_p + #undef TARGET_IN_SMALL_DATA_P #define TARGET_IN_SMALL_DATA_P mips_in_small_data_p Index: gcc/doc/tm.texi =================================================================== --- gcc/doc/tm.texi (revision 229123) +++ gcc/doc/tm.texi (working copy) @@ -6444,6 +6444,16 @@ registers on machines with lots of registers. @end deftypefn +@deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void) +This predicate controls the use of the eager delay slot filler to disallow +speculatively executed instructions being placed in delay slots. Targets +such as certain MIPS architectures possess both branches with and without +delay slots. As the eager delay slot filler can decrease performance, +disabling it is beneficial when ordinary branches are available. Use of +delay slot branches filled using the basic filler is often still desirable +as the delay slot can hide a pipeline bubble. +@end deftypefn + @node Scheduling @section Adjusting the Instruction Scheduler Index: gcc/doc/tm.texi.in =================================================================== --- gcc/doc/tm.texi.in (revision 229123) +++ gcc/doc/tm.texi.in (working copy) @@ -4724,6 +4724,8 @@ @hook TARGET_ADDRESS_COST +@hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P + @node Scheduling @section Adjusting the Instruction Scheduler Index: gcc/reorg.c =================================================================== --- gcc/reorg.c (revision 229123) +++ gcc/reorg.c (working copy) @@ -3747,7 +3747,8 @@ { fill_simple_delay_slots (1); fill_simple_delay_slots (0); - fill_eager_delay_slots (); + if (!targetm.no_speculation_in_delay_slots_p ()) + fill_eager_delay_slots (); relax_delay_slots (first); } Index: gcc/target.def =================================================================== --- gcc/target.def (revision 229123) +++ gcc/target.def (working copy) @@ -3498,6 +3498,19 @@ int, (rtx address, machine_mode mode, addr_space_t as, bool speed), default_address_cost) +/* Permit speculative instructions in delay slots during delayed-branch + scheduling. */ +DEFHOOK +(no_speculation_in_delay_slots_p, + "This predicate controls the use of the eager delay slot filler to disallow\n\ +speculatively executed instructions being placed in delay slots. Targets\n\ +such as certain MIPS architectures possess both branches with and without\n\ +delay slots. As the eager delay slot filler can decrease performance,\n\ +disabling it is beneficial when ordinary branches are available. Use of\n\ +delay slot branches filled using the basic filler is often still desirable\n\ +as the delay slot can hide a pipeline bubble.", bool, (void), + hook_bool_void_false) + /* Return where to allocate pseudo for a given hard register initial value. */ DEFHOOK (allocate_initial_value, Index: gcc/testsuite/gcc.target/mips/ds-schedule-1.c =================================================================== --- gcc/testsuite/gcc.target/mips/ds-schedule-1.c (revision 0) +++ gcc/testsuite/gcc.target/mips/ds-schedule-1.c (working copy) @@ -0,0 +1,29 @@ +/* { dg-options "isa_rev>=6 -mcompact-branches=optimal -mno-abicalls -G4" } */ +/* { dg-final { scan-assembler-not "bne\t" } } */ +/* { dg-final { scan-assembler-not "beq\t" } } */ +/* { dg-final { scan-assembler-times "\\(foo\\)" 1 } } */ + +/* Test that when compact branches are used, that a compact branch is + produced in the case where code expansion would have occurred if a + delay slot branch would have be used. 'foo' should only be + referenced once in the program text. */ + +struct list +{ + struct list *next; + int element; +}; + +struct list *gr; + +int foo; + +extern void t (int, int, int*); + +void +f (struct list **ptr) +{ + if (gr) + *ptr = gr->next; + t (1, foo, &gr->element); +} Index: gcc/testsuite/gcc.target/mips/ds-schedule-2.c =================================================================== --- gcc/testsuite/gcc.target/mips/ds-schedule-2.c (revision 0) +++ gcc/testsuite/gcc.target/mips/ds-schedule-2.c (working copy) @@ -0,0 +1,28 @@ +/* { dg-options "-mcompact-branches=never -mno-abicalls -G4" } */ +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" "-Os" } { "" } } */ +/* { dg-final { scan-assembler "beq.*\n\tlw" } } */ +/* { dg-final { scan-assembler-times "\\(foo\\)" 2 } } */ + +/* Test that when compact branches are explicitly disabled, that a non-compact + branch is produced. 'foo' should be referenced twice in the program text as the + eager delay slot filler will duplicate the load of foo. */ + +struct list +{ + struct list *next; + int element; +}; + +struct list *gr; + +int foo; + +extern void t (int, int, int*); + +void +f (struct list **ptr) +{ + if (gr) + *ptr = gr->next; + t (1, foo, &gr->element); +}