On Mon, Oct 4, 2021 at 6:29 PM Michael Matz <m...@suse.de> wrote: > > Hello, > > On Mon, 4 Oct 2021, Jeff Law wrote: > > > And just in case it got lost. Here's the analysis on 960218-1 on visium: > > > > We've got this going into ethread: > > > > int f (int x) > > { > > int D.1420; > > int a; > > > > ;; basic block 2, loop depth 0, maybe hot > > ;; prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) > > ;; pred: ENTRY (FALLTHRU,EXECUTABLE) > > a_4 = ~x_3(D); > > goto <bb 4>; [INV] > > ;; succ: 4 (FALLTHRU,EXECUTABLE) > > > > ;; basic block 3, loop depth 1, maybe hot > > ;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) > > ;; pred: 4 (TRUE_VALUE,EXECUTABLE) > > gl = a_1; > > ;; succ: 4 (FALLTHRU,DFS_BACK,EXECUTABLE) > > > > ;; basic block 4, loop depth 1, maybe hot > > ;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) > > ;; pred: 2 (FALLTHRU,EXECUTABLE) > > ;; 3 (FALLTHRU,DFS_BACK,EXECUTABLE) > > # a_1 = PHI <a_4(2), 0(3)> > > if (a_1 != 0) > > goto <bb 3>; [INV] > > else > > goto <bb 5>; [INV] > > ;; succ: 3 (TRUE_VALUE,EXECUTABLE) > > ;; 5 (FALSE_VALUE,EXECUTABLE) > > > > ;; basic block 5, loop depth 0, maybe hot > > ;; prev block 4, next block 1, flags: (NEW, REACHABLE, VISITED) > > ;; pred: 4 (FALSE_VALUE,EXECUTABLE) > > return; > > ;; succ: EXIT > > > > } > > First notice that this doesn't have an empty latch block to start with > (in fact, there is no separate latch block at all), so the loop optimizers > haven't been initialized for simple latches at this point. Still, I would > say that even then we want to be careful with the latch edge, as putting > code onto it will most probably create a problem downstream once we _do_ > want to intialize the loop optimizers for simple latches. So, that we are > careful here is okay, we are just too careful in this specific case. > > > There's a pretty obvious jump thread in there. 3->4->5. > > > > We used to allow this, but it's currently being rejected and I'm not > > sure it should. > > > > In simplest terms we're threading from inside the loop, through the > > latch to a point outside the loop. At first glance, that seems safe. > > Is at least the unrestricted jump threader (the one after loop optimizers) > picking this up? > > Independend of that, I agree, that this specific instance of threading > through the latch block, even though followed by to-be-copied > instructions, is okay. We need to determine precisely when that is okay, > though. In this case there are several reasons I could see why this is > so: > (a) while the thread is through the latch block, it's _not_ through the > latch edge (which is 4->3). That would create the problem, so here > we're fine.
It seems we all agree Jeff's finding should be allowed, so let's attack this one first, since it gets almost all of his visium failures. I can submit the rest of the cases separately. The attached patch catches the IL discussed, and adds a relevant gimple FE test others can use for experimenting :). Tested on x86-64 and by visual inspection on visium-elf on the regressions Jeff pointed me at. OK? BTW Jeff, this fixes all the regressions you mention except: 1. pr68911.c The path not being threaded here is 7->10->11->12. It crosses loops multiple times, so I believe the restriction code is correct. 7, 10, 12 are in loop1. 11 is in loop 2. So we have a path going from loop1 -> loop2 -> loop1. I can't conceive any scenario where this is ok, but I can attach the entire IL if I'm missing something. 2. 961125-1.c This one is a bit trickier. Here we're trying to thread the following conditional, which I mentioned when I contributed this work, we don't handle (and happens infrequently enough not to matter): + // Loop 4 + <bb 6> [local count: 114863531]: + # ptr_8 = PHI <ptr_2(4), ptr_2(5)> + if (ptr_8 < &MEM <char[4]> [(void *)":ab" + 3B]) + goto <bb 7>; [50.00%] + else + goto <bb 17>; [50.00%] The hybrid threader doesn't handle &MEM in the final conditional. As I mentioned earlier, if this becomes an issue, we can adapt class pointer_equiv_analyzer like we did for evrp. I have a gimple FE test I will contribute as an XFAIL with an associated PR to keep us honest. That being said... in this particular test, this is all irrelevant because the path will be disallowed for two reasons: a) The path crosses loops, and the reason we didn't realize it in the old code was because the ASSERT_EXPR had pulled the SSA outside the loop, so it looks like the entire path is l in the same loop. If you look at the original IL, it's not. b) Now the path actually fits the pattern being discussed in this patch, where there's an early exit out of a loop, so it looks like we should handle it. But...in this case, we would fill a presently empty latch. Interestingly, the old code didn't catch it, because....you guessed it...there was an ASSERT_EXPR in the latch. So I argue that even in the absence of MEM_REF handling, we shouldn't thread this path.
From 5abe65668f602d53b8f3dc515508dc4616beb048 Mon Sep 17 00:00:00 2001 From: Aldy Hernandez <al...@redhat.com> Date: Tue, 5 Oct 2021 15:03:34 +0200 Subject: [PATCH] Loosen loop crossing restriction in threader. Crossing loops is generally discouraged from the threader, but we can make an exception when we don't cross the latch or enter another loop, since this is just an early exit out of the loop. In fact, the whole threaded path is logically outside the loop. This has nice secondary effects. For example, objects on the threaded path will no longer necessarily be live throughout the loop, so we can get register allocation improvements. The threaded path can physically move outside the loop resulting in better icache efficiency, etc. Tested on x86-64 Linux, and on a visium-elf cross making sure that the following tests do not have an abort in the final assembly: gcc.c-torture/execute/960218-1.c gcc.c-torture/execute/visium-pending-4.c gcc.c-torture/execute/pr58209.c gcc/ChangeLog: * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-thread-valid.c: New test. Co-authored-by: Jeff Law <jeffrea...@gmail.com> --- .../gcc.dg/tree-ssa/ssa-thread-valid.c | 39 ++++++++++++++++++ gcc/tree-ssa-threadupdate.c | 40 ++++++++++++++----- 2 files changed, 68 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c new file mode 100644 index 00000000000..7adca97cc2b --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-valid.c @@ -0,0 +1,39 @@ +// { dg-do compile } +// { dg-options "-O2 -fgimple -fdump-statistics" } + +// This is a collection of threadable paths. To simplify maintenance, +// there should only be one threadable path per function. + +int global; + +// The thread from 3->4->5 crosses loops but is allowed because it +// never crosses the latch (BB3) and is just an early exit out of the +// loop. +int __GIMPLE (ssa) +foo1 (int x) +{ + int D_1420; + int a; + + __BB(2): + a_4 = ~x_3(D); + goto __BB4; + + // Latch. + __BB(3): + global = a_1; + goto __BB4; + + __BB(4,loop_header(1)): + a_1 = __PHI (__BB2: a_4, __BB3: 0); + if (a_1 != 0) + goto __BB3; + else + goto __BB5; + + __BB(5): + return; + +} + +// { dg-final { scan-tree-dump "Jumps threaded\" \"foo1\" 1" "statistics" } } diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index dcabfdb30d2..32ce1e3af40 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2766,10 +2766,17 @@ bool jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) { gcc_checking_assert (!path.is_empty ()); - edge taken_edge = path[path.length () - 1]->e; - loop_p loop = taken_edge->src->loop_father; + edge entry = path[0]->e; + edge exit = path[path.length () - 1]->e; bool seen_latch = false; - bool path_crosses_loops = false; + int loops_crossed = 0; + bool crossed_latch = false; + // Use ->dest here instead of ->src to ignore the first block. The + // first block is allowed to be in a different loop, since it'll be + // redirected. See similar comment in profitable_path_p: "we don't + // care about that block...". + loop_p loop = entry->dest->loop_father; + loop_p curr_loop = loop; for (unsigned int i = 0; i < path.length (); i++) { @@ -2784,19 +2791,30 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) } if (loop->latch == e->src || loop->latch == e->dest) - seen_latch = true; + { + seen_latch = true; + // Like seen_latch, but excludes the first block. + if (e->src != entry->src) + crossed_latch = true; + } - // The first entry represents the block with an outgoing edge - // that we will redirect to the jump threading path. Thus we - // don't care about that block's loop father. - if ((i > 0 && e->src->loop_father != loop) - || e->dest->loop_father != loop) - path_crosses_loops = true; + if (e->dest->loop_father != curr_loop) + { + curr_loop = e->dest->loop_father; + ++loops_crossed; + } if (flag_checking && !m_backedge_threads) gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0); } + // If we crossed a loop into an outer loop without crossing the + // latch, this is just an early exit from the loop. + if (loops_crossed == 1 + && !crossed_latch + && flow_loop_nested_p (exit->dest->loop_father, exit->src->loop_father)) + return false; + if (cfun->curr_properties & PROP_loop_opts_done) return false; @@ -2806,7 +2824,7 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) "would create non-empty latch"); return true; } - if (path_crosses_loops) + if (loops_crossed) { cancel_thread (&path, "Path crosses loops"); return true; -- 2.31.1