RE: FW: [PATCH] Target hook for disabling the delay slot filler.

Simon Dardis Fri, 23 Oct 2015 07:58:22 -0700


> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: 08 October 2015 20:44
> To: Simon Dardis; Bernd Schmidt
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: FW: [PATCH] Target hook for disabling the delay slot filler.
> 
> On 09/18/2015 05:10 AM, Simon Dardis wrote:
> >> Are you trying to say that you have the option as to what kind of
> >> branch to use?  ie, "ordinary", presumably without a delay slot or
> >> one with a delay slot?
> >
> >> Is the "ordinary" actually just a nullified delay slot or some form
> >> of likely/not likely static hint?
> >
> > Specifically for MIPSR6: the ISA possesses traditional delay slot
> > branches and a normal branch (no delay slots, not annulling, no hints,
> > subtle static hazard), aka "compact branch" in MIPS terminology. They
> > could be described as nullify on taken delay slot branch but we saw little 
> > to
> no value in that.
> >
> > Matthew Fortune provided a writeup with their handling in GCC:
> >
> > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01892.html
> Thanks. I never looked at that message, almost certainly because it was MIPS
> specific.  I'm trying hard to stay out of backends that have good active
> maintainers, and MIPS certainly qualifies on that point.
> 
> 
> >
> >> But what is the compact form at the micro-architectural level?  My
> >> mips-fu has diminished greatly, but my recollection is the bubble is
> >> always there.   Is that not the case?
> >
> > The pipeline bubble will exist but the performance impact varies
> > across
> > R6 cores. High-end OoO cores won't be impacted as much, but lower end
> > cores will. microMIPSR6 removes delay slot branches altogether which
> > pushes the simplest micro-architectures to optimize away the cost of a
> > pipeline bubble.
> [ ... snip more micro-archticture stuff ... ] Thanks.  That helps a lot.  I 
> didn't
> realize the bubble was being squashed to varying degrees.  And FWIW, I
> wouldn't be surprised if you reach a point on the OoO cores where you'll just
> want to move away from delay slots totally and rely on your compact
> branches as much as possible.  It may give your hardware guys a degree of
> freedom that helps them in the common case (compact branches) at the
> expense of slowing down code with old fashioned delay slots.
> 
> > Compact branches do a strange restriction in that they cannot be
> > followed by a CTI. This is to simplify branch predictors apparently
> > but this may be lifted in future ISA releases.
> Come on! :-)  There's some really neat things you can do when you allow
> branches in delay slots.  The PA was particularly fun in that regard.
> My recollection is HP had some hand written assembly code in their libraries
> which exploited the out-of-line execution you could get in this case.  We
> never tried to exploit in GCC simply because the opportunities didn't see all
> that common or profitable.
> 
> 
> 
> >
> >> If it is able to find insns from the commonly executed path that
> >> don't have a long latency, then the fill is usually profitable (since
> >> the pipeline bubble always exists).  However, pulling a long latency
> >> instruction (say anything that might cache miss or an fdiv/fsqrt) off
> >> the slow path and conditionally nullifying it can be *awful*.
> >> Everything else is in-between.
> >
> > I agree. The variability in profit/loss in a concern and I see two
> > ways to deal with it:
> >
> > A) modify the delay slot filler so that it choses speculative
> > instructions of less than some $cost and avoid instruction duplication
> > when the eager filler picks an instruction from a block with multiple
> > predecessors. Making such changes would be invasive and require more
> target specific hooks.
> The cost side here should be handled by existing mechanisms.  You just
> never allow anything other than simple arith, logicals & copies.
> 
> You'd need a hook to avoid this when copying was needed.
> 
> You'd probably also need some kind of target hook to indicate the level of
> prediction where this is profitable since the cost varies across your micro-
> architectures.
> 
> And you'd also have to worry about the special code which triggers when
> there's a well predicted branch, but a resource conflict.  In that case reorg 
> can
> fill the slot from the predicted path and insert compensation code on the
> non-predicted path.
> 
> 
> 
> >
> > B) Use compact branches instead of speculative delay slot execution
> > and forsake variable performance for a consistent pipeline bubble by
> > not using the speculative delay filler altogether.
> >
> > Between these two choices, B seems to better option as due to sheer
> simplicity.
> > Choosing neither gives speculative instruction execution when there
> > could be a small consistent penalty instead.
> B is certainly easier.
> 
> The main objection I had was given my outdated knowledge of the MIPS
> processors it seemed like you were taking a step backwards.  You've cleared
> that up and if you're comfortable with the tradeoff, then I won't object to
> the target hook to disable eager filling.
> 
> Can you repost that patch?  Given I was the last one to do major work on
> reorg (~20 years ago mind you) it probably makes the most sense for me to
> own the review.
> 
> jeff
>
Thanks for getting back to me.


Patch below. Target hook renamed to TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P.

Tested on mips-img-elf, no new regressions.

Thanks,
Simon

Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c      (revision 229123)
+++ gcc/config/mips/mips.c      (working copy)
@@ -4312,6 +4312,14 @@
 {
   return mips_address_insns (addr, mode, false);
 }
+
+/* Implement TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P.  */
+
+static bool
+mips_no_speculation_in_delay_slots_p ()
+{
+  return TARGET_CB_MAYBE;
+}
 

 /* Information about a single instruction in a multi-instruction
    asm sequence.  */
@@ -19823,6 +19831,9 @@
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST mips_address_cost
 
+#undef TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
+#define TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P 
mips_no_speculation_in_delay_slots_p
+
 #undef TARGET_IN_SMALL_DATA_P
 #define TARGET_IN_SMALL_DATA_P mips_in_small_data_p
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi     (revision 229123)
+++ gcc/doc/tm.texi     (working copy)
@@ -6444,6 +6444,16 @@
 registers on machines with lots of registers.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
+This predicate controls the use of the eager delay slot filler to disallow
+speculatively executed instructions being placed in delay slots.  Targets
+such as certain MIPS architectures possess both branches with and without
+delay slots.  As the eager delay slot filler can decrease performance,
+disabling it is beneficial when ordinary branches are available.  Use of
+delay slot branches filled using the basic filler is often still desirable
+as the delay slot can hide a pipeline bubble.
+@end deftypefn
+
 @node Scheduling
 @section Adjusting the Instruction Scheduler
 
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in  (revision 229123)
+++ gcc/doc/tm.texi.in  (working copy)
@@ -4724,6 +4724,8 @@
 
 @hook TARGET_ADDRESS_COST
 
+@hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
+
 @node Scheduling
 @section Adjusting the Instruction Scheduler
 
Index: gcc/reorg.c
===================================================================
--- gcc/reorg.c (revision 229123)
+++ gcc/reorg.c (working copy)
@@ -3747,7 +3747,8 @@
     {
       fill_simple_delay_slots (1);
       fill_simple_delay_slots (0);
-      fill_eager_delay_slots ();
+      if (!targetm.no_speculation_in_delay_slots_p ())
+       fill_eager_delay_slots ();
       relax_delay_slots (first);
     }
 
Index: gcc/target.def
===================================================================
--- gcc/target.def      (revision 229123)
+++ gcc/target.def      (working copy)
@@ -3498,6 +3498,19 @@
  int, (rtx address, machine_mode mode, addr_space_t as, bool speed),
  default_address_cost)
 
+/* Permit speculative instructions in delay slots during delayed-branch 
+   scheduling.  */
+DEFHOOK
+(no_speculation_in_delay_slots_p,
+ "This predicate controls the use of the eager delay slot filler to disallow\n\
+speculatively executed instructions being placed in delay slots.  Targets\n\
+such as certain MIPS architectures possess both branches with and without\n\
+delay slots.  As the eager delay slot filler can decrease performance,\n\
+disabling it is beneficial when ordinary branches are available.  Use of\n\
+delay slot branches filled using the basic filler is often still desirable\n\
+as the delay slot can hide a pipeline bubble.", bool, (void),
+  hook_bool_void_false)
+
 /* Return where to allocate pseudo for a given hard register initial value.  */
 DEFHOOK
 (allocate_initial_value,
Index: gcc/testsuite/gcc.target/mips/ds-schedule-1.c
===================================================================
--- gcc/testsuite/gcc.target/mips/ds-schedule-1.c       (revision 0)
+++ gcc/testsuite/gcc.target/mips/ds-schedule-1.c       (working copy)
@@ -0,0 +1,29 @@
+/* { dg-options "isa_rev>=6 -mcompact-branches=optimal -mno-abicalls -G4" } */
+/* { dg-final { scan-assembler-not "bne\t" } } */
+/* { dg-final { scan-assembler-not "beq\t" } } */
+/* { dg-final { scan-assembler-times "\\(foo\\)" 1 } } */
+
+/* Test that when compact branches are used, that a compact branch is
+   produced in the case where code expansion would have occurred if a
+   delay slot branch would have be used.  'foo' should only be
+   referenced once in the program text.  */
+
+struct list
+{
+  struct list *next;
+  int element;
+};
+
+struct list *gr;
+
+int foo;
+
+extern void t (int, int, int*);
+
+void
+f (struct list **ptr)
+{
+  if (gr)
+    *ptr = gr->next;
+  t (1, foo, &gr->element);
+}
Index: gcc/testsuite/gcc.target/mips/ds-schedule-2.c
===================================================================
--- gcc/testsuite/gcc.target/mips/ds-schedule-2.c       (revision 0)
+++ gcc/testsuite/gcc.target/mips/ds-schedule-2.c       (working copy)
@@ -0,0 +1,28 @@
+/* { dg-options "-mcompact-branches=never -mno-abicalls -G4" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" "-Os" } { "" } } */
+/* { dg-final { scan-assembler "beq.*\n\tlw" } } */
+/* { dg-final { scan-assembler-times "\\(foo\\)" 2 } } */
+
+/* Test that when compact branches are explicitly disabled, that a non-compact
+   branch is produced. 'foo' should be referenced twice in the program text as 
the
+   eager delay slot filler will duplicate the load of foo. */
+
+struct list
+{
+  struct list *next;
+  int element;
+};
+
+struct list *gr;
+
+int foo;
+
+extern void t (int, int, int*);
+
+void
+f (struct list **ptr)
+{
+  if (gr)
+    *ptr = gr->next;
+  t (1, foo, &gr->element);
+}

RE: FW: [PATCH] Target hook for disabling the delay slot filler.

Reply via email to