Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

Aldy Hernandez Fri, 13 Jan 2017 10:49:26 -0800

[Sorry for the delay, I was sick.]

On 01/09/2017 04:30 AM, Richard Biener wrote:

On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez <al...@redhat.com> wrote:

On 01/04/2017 07:11 AM, Richard Biener wrote:


On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <al...@redhat.com> wrote:


On 12/20/2016 09:16 AM, Richard Biener wrote:



On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <al...@redhat.com>
wrote:



Hi folks.

This is a follow-up on Jeff and Richi's interaction on the
aforementioned
PR
here:

        https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html

I decided to explore the idea of analyzing may-undefness on-demand,
which
actually looks rather cheap.

BTW, I don't understand why we don't have auto_bitmap's, as we already
have
auto_sbitmap's.  I've implemented the former based on auto_sbitmap's
code
we
already have.

The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These
were
gathered from a stage1 build with -save-temps.  There is a slight time
degradation of 4 seconds within 27 minutes of user time:

        tainted:        26:52
        orig:           26:48

This was the average aggregate time of two runs compiling all 242 .ii
files.
IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?




+  while (!worklist.is_empty ())
+    {
+      name = worklist.pop ();
+      gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+      if (ssa_undefined_value_p (name, true))
+       return true;
+
+      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

it should be already set as we use visited_ssa as "was it ever on the
worklist",
so maybe renaming it would be a good thing as well.




I don't understand what you would prefer here.



Set the bit when you put name on the worklist (and do not do that if the
bit is set).  Thus simply remove the above and add a bitmap_set_bit
for the initial name you put on the worklist.


+             if (TREE_CODE (name) == SSA_NAME)
+               {
+                 /* If an SSA has already been seen, this may be a
loop.
+                    Fail conservatively.  */
+                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
(name)))
+                   return false;

so to me "conservative" is returning true, not false.

OK


+                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+                 worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And
bitmap_set_bit
returns whether it set a bit, thus

                if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
(name)))
                  worklist.safe_push (name);

should work?




Fixed.


+      /* Check that any SSA names used to define NAME is also fully
+        defined.  */
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+       {
+         name = USE_FROM_PTR (use_p);
+         if (TREE_CODE (name) == SSA_NAME)

always true.

+           {
+             /* If an SSA has already been seen, this may be a loop.
+                Fail conservatively.  */
+             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+               return false;
+             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+             worklist.safe_push (name);

See above.

In principle the thing is sound but I'd like to be able to pass in a
bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).




Done, though bitmaps are now calculated as part of the instantiation.


Also once you hit defs that are in a post-dominated region of the loop
entry
you can treat them as not undefined (as their use invokes undefined
behavior).  This is also how you treat function parameters (well,
ssa_undefined_value_p does), where the call site invokes undefined
behavior
when passing in undefined values.  So we need an extra parameter
specifying
the post-dominance region.




Done.


You do not handle memory or calls conservatively which means the
existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also
fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
      || gimple_assign_single_p (def))
    return true;




As I asked previously, I understand the !is_gimple_assign, which will
skip
over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def)
==
true"??

Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't
we
want to follow up the def chain precisely on these?

The attached implementation uses a cache, and a pre-computed
post-dominance
region.  A previous incantation of this patch (sans the post-dominance
stuff) had performance within the noise of the previous implementation.

I am testing again, and will do some performance comparisons in a bit,
but
for now-- are we on the same page here?  Is this what you wanted?



+      /* DEFs in the post-dominated region of the loop entry invoke
+        undefined behavior.  Adding any use won't make things any
+        worse.  */
+      for (unsigned i = 0; i < postdom.length (); ++i)
+       if (def->bb == postdom[i])

gimple_bb (def)

+         {
+           set_defined (t);
+           return false;
+         }

I think you can't really return false here but need to continue processing
the worklist.  I'd say rather than a linear walk you can as well use
dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
block instead?

Note that the way you compute post-dominators doesn't work -- nothing
keeps them up-to-date when unswitching does CFG manipulations :/
Fortunately we have a way to compute them per region, see how
tree-if-conv.c
does that (build_region, calculate_dominance_info_for_region).



If I understand correctly, we could compute them per region in
tree_unswitch_single_loop() for each individual loop with what
tree-if-conv.c uses.

I could certainly abstract build_region/calculate_dominance_info_for_region
into something new, say calculate_dominance_info_for_loop_region().  Then we
could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you
suggest.

Question... computing the post-dom tree per region as I've just described
will only create post-dom information for the loop region, which means that
any definition outside of the loop will have zero dominance info.


Yes.

What if the definition in question post-dominates the loop entry but is
outside/after of the loop?


How would we ever arrive at such def?  Once we reach a def before the
region/loop
we know it's fine to use (the missing "stop" point I pointed out).

Hmmm, it looks like we can't even build the post-dom regionappropriately in the presence of infinite loops.


First, build_region() fails if it can't find a loop post-header:

  /* The last element is loop post-header.  */
  gcc_assert (exit_bb);

which we won't have in an infinite loop like:

bb2: //loop header
|
V
loop 1:
bb3:            <-------+
        bar();          |
|                       |
V                       |
bb4:                    |
        goto bb3;   >---+

We could just build the region without the post-header, but then we failwhile building the DFS dominance region:


dom_info::calc_dfs_tree_nonrec (basic_block bb)
....
          gcc_assert (bn != m_start_block);

Presumably because we've looped back to the start block (bb4 in a postdominator world). This doesn't happen calculating going forward becausewe always have a start block outside of the region (the function entryblock).

I tried to fake edge my way out of it, but things get really messed upwhile putting/removing fake edges in the presence of loop info. I'massuming all this is by design.

Would you prefer me to continue down this path, trying to build apost-dom region with infinite loops and fixing build_region /calc_dfs_tree_nonrec, or could we do without this dominanceoptimization? Pretty please?

 We will have no post-dom information for this.
In this case, could we just ignore and continue evaluating the SSA_NAME with
the rest of our heuristics, or did you have something else in mind?  Another
option would be to calculate the post-dom information for the entire
function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)), but
that seems rather expensive.

As a side note, at this point I just want to fix/close the PR in an
acceptable manner, not come up with the end-all catch-all most-efficient
patch for an unlikely scenario ;-).


Yeah, which is why I wonder whether the caching is worth the trouble (in it's
current form) for the unswitching use (given it's other restrictions
on conditions
to unswitch).

We could go back to my original, no caching version (with the othersuggestions). That solves the problem quite simply, with no regressions.


Thanks again.
Aldy

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

Reply via email to