On 10/5/2021 7:33 AM, Aldy Hernandez wrote:
On Mon, Oct 4, 2021 at 6:29 PM Michael Matz <m...@suse.de> wrote:
Hello,

On Mon, 4 Oct 2021, Jeff Law wrote:

And just in case it got lost.  Here's the analysis on 960218-1 on visium:

We've got this going into ethread:

int f (int x)
{
   int D.1420;
   int a;

;;   basic block 2, loop depth 0, maybe hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
   a_4 = ~x_3(D);
   goto <bb 4>; [INV]
;;    succ:       4 (FALLTHRU,EXECUTABLE)

;;   basic block 3, loop depth 1, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:       4 (TRUE_VALUE,EXECUTABLE)
   gl = a_1;
;;    succ:       4 (FALLTHRU,DFS_BACK,EXECUTABLE)

;;   basic block 4, loop depth 1, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 (FALLTHRU,EXECUTABLE)
;;                3 (FALLTHRU,DFS_BACK,EXECUTABLE)
   # a_1 = PHI <a_4(2), 0(3)>
   if (a_1 != 0)
     goto <bb 3>; [INV]
   else
     goto <bb 5>; [INV]
;;    succ:       3 (TRUE_VALUE,EXECUTABLE)
;;                5 (FALSE_VALUE,EXECUTABLE)

;;   basic block 5, loop depth 0, maybe hot
;;    prev block 4, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:       4 (FALSE_VALUE,EXECUTABLE)
   return;
;;    succ:       EXIT

}
First notice that this doesn't have an empty latch block to start with
(in fact, there is no separate latch block at all), so the loop optimizers
haven't been initialized for simple latches at this point.  Still, I would
say that even then we want to be careful with the latch edge, as putting
code onto it will most probably create a problem downstream once we _do_
want to intialize the loop optimizers for simple latches.  So, that we are
careful here is okay, we are just too careful in this specific case.

There's a pretty obvious jump thread in there.  3->4->5.

We used to allow this, but it's currently being rejected and I'm not
sure it should.

In simplest terms we're threading from inside the loop, through the
latch to a point outside the loop.  At first glance, that seems safe.
Is at least the unrestricted jump threader (the one after loop optimizers)
picking this up?

Independend of that, I agree, that this specific instance of threading
through the latch block, even though followed by to-be-copied
instructions, is okay.  We need to determine precisely when that is okay,
though.  In this case there are several reasons I could see why this is
so:
(a) while the thread is through the latch block, it's _not_ through the
     latch edge (which is 4->3).  That would create the problem, so here
     we're fine.
It seems we all agree Jeff's finding should be allowed, so let's
attack this one first, since it gets almost all of his visium
failures.  I can submit the rest of the cases separately.

The attached patch catches the IL discussed, and adds a relevant
gimple FE test others can use for experimenting :).

Tested on x86-64 and by visual inspection on visium-elf on the
regressions Jeff pointed me at.

OK?

BTW Jeff, this fixes all the regressions you mention except:

1. pr68911.c

The path not being threaded here is 7->10->11->12.  It crosses loops
multiple times, so I believe the restriction code is correct.

7, 10, 12 are in loop1.
11 is in loop 2.

So we have a path going from loop1 -> loop2 -> loop1.  I can't
conceive any scenario where this is ok, but I can attach the entire IL
if I'm missing something.

2. 961125-1.c

This one is a bit trickier.  Here we're trying to thread the following
conditional, which I mentioned when I contributed this work, we don't
handle (and happens infrequently enough not to matter):

+  // Loop 4
+  <bb 6> [local count: 114863531]:
+  # ptr_8 = PHI <ptr_2(4), ptr_2(5)>
+  if (ptr_8 < &MEM <char[4]> [(void *)":ab" + 3B])
+    goto <bb 7>; [50.00%]
+  else
+    goto <bb 17>; [50.00%]

The hybrid threader doesn't handle &MEM in the final conditional.  As
I mentioned earlier, if this becomes an issue, we can adapt class
pointer_equiv_analyzer like we did for evrp.  I have a gimple FE test
I will contribute as an XFAIL with an associated PR to keep us honest.

That being said... in this particular test, this is all irrelevant
because the path will be disallowed for two reasons:

a) The path crosses loops, and the reason we didn't realize it in the
old code was because the ASSERT_EXPR had pulled the SSA outside the
loop, so it looks like the entire path is l in the same loop.  If you
look at the original IL, it's not.

b) Now the path actually fits the pattern being discussed in this
patch, where there's an early exit out of a loop, so it looks like we
should handle it.  But...in this case, we would fill a presently empty
latch.  Interestingly, the old code didn't catch it, because....you
guessed it...there was an ASSERT_EXPR in the latch.

So I argue that even in the absence of MEM_REF handling, we shouldn't
thread this path.

0001-Loosen-loop-crossing-restriction-in-threader.patch

 From 5abe65668f602d53b8f3dc515508dc4616beb048 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <al...@redhat.com>
Date: Tue, 5 Oct 2021 15:03:34 +0200
Subject: [PATCH] Loosen loop crossing restriction in threader.

Crossing loops is generally discouraged from the threader, but we can
make an exception when we don't cross the latch or enter another loop,
since this is just an early exit out of the loop.

In fact, the whole threaded path is logically outside the loop.  This
has nice secondary effects.  For example, objects on the threaded path
will no longer necessarily be live throughout the loop, so we can get
register allocation improvements.  The threaded path can physically
move outside the loop resulting in better icache efficiency, etc.

Tested on x86-64 Linux, and on a visium-elf cross making sure that the
following tests do not have an abort in the final assembly:

gcc.c-torture/execute/960218-1.c
gcc.c-torture/execute/visium-pending-4.c
gcc.c-torture/execute/pr58209.c

gcc/ChangeLog:

        * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/ssa-thread-valid.c: New test.
I'll throw it in and see what we get :-)

jeff

Reply via email to