Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

Jeff Law Fri, 15 Sep 2017 15:20:06 -0700

On 09/14/2017 06:17 AM, Martin Liška wrote:
> Hello.
> 
> As mentioned at Cauldron 2017, second step in switch lowering should be 
> massive
> simplification in code that does expansion of balanced tree. Basically it 
> includes
> VRP and DCE, which we can for obvious reason do by our own.
> 
> The patch does that, and introduces a separate pass for -O0 that's responsible
> for lowering at the end of tree pass.
> 
> There's a small fallback that I would like to discuss:
> 
> 1) vrp105.c - there's no longer catches threading opportunity in between 
> default cases:
> adding Patrick who can probably help why is the opportunity skipped with 
> expanded tree
Well, VRP knows how to analyze a switch statement and determine when
paths out of one switch imply a particular case will be taken in a later
switch.  In this particular case we're trying to verify that the default
case in the first switch threads directly to the default case in the
second (though it seems we ought to be able to totally thread the cases).


Obviously we don't have switches anymore after your lowering pass.  But
ISTM we still ought to be able to evaluate how your lowering affects
jump threading on this case.  In theory the lowered switch statements
should be easier to thread.

Reality is sadly different.  There's two cases to consider.  One when i
< 0 (should go to first default, then directly to second default).  The
other when i > 1 which should also go to the first default, then
directly to the second default).

When i < 0 we do in fact thread through the second switch and go from
the first default case directly to the second default case.

When i > 1 we still go through the test for the second switch.

These are the key blocks:

;;   basic block 2, loop depth 0, freq 10000, maybe hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;    pred:       ENTRY [always]  (FALLTHRU,EXECUTABLE)
  i.0_1 = i;
  if (i.0_1 > 0)
    goto <bb 4>; [50.01%] [count: INV]
  else
    goto <bb 3>; [49.99%] [count: INV]
;;    succ:       3 [50.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
;;                4 [50.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, freq 10000, maybe hot
;;   Invalid sum of incoming frequencies 4999, should be 10000
;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 [50.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
  if (i.0_1 == 0)
    goto <bb 5>; [40.00%] [count: INV]
  else
    goto <bb 14>; [60.00%] [count: INV]
;;    succ:       14 [60.0% (guessed)]  (FALSE_VALUE,EXECUTABLE)
;;                5 [40.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, freq 10000, maybe hot
;;   Invalid sum of incoming frequencies 5001, should be 10000
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 [50.0% (guessed)]  (TRUE_VALUE,EXECUTABLE)
  _13 = i.0_1 != 1;
  if (_13 != 0)
    goto <bb 13>; [16.68%] [count: INV]
  else
    goto <bb 6>; [83.32%] [count: INV]
;;    succ:       6 [83.3% (guessed)]  (FALSE_VALUE,EXECUTABLE)
;;                13 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)

[ ... ]

;;   basic block 9, loop depth 0, freq 9999, maybe hot
;;   Invalid sum of incoming frequencies 3999, should be 9999
;;    prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
;;    pred:       13 [always]  (FALLTHRU,EXECUTABLE)
;;                7 [always (guessed)]  (TRUE_VALUE,EXECUTABLE)
  # prephitmp_19 = PHI <prephitmp_15(13), prephitmp_12(7)>
  _2 = prephitmp_19 != 1;
  if (_2 != 0)
    goto <bb 12>; [16.68%] [count: INV]
  else
    goto <bb 11>; [83.32%] [count: INV]
;;    succ:       11 [83.3% (guessed)]  (FALSE_VALUE,EXECUTABLE)
;;                12 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)

[ ... ]
;;   basic block 13, loop depth 0, freq 1668, maybe hot
;;    prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)
;;    pred:       4 [16.7% (guessed)]  (TRUE_VALUE,EXECUTABLE)
  # prephitmp_15 = PHI <i.0_1(4)>
  goto <bb 9>; [100.00%] [count: INV]
;;    succ:       9 [always]  (FALLTHRU,EXECUTABLE)



WHen bb9 is reached by bb13 we know by back substitution that the assignemnt

 _2 = prehitmp_19 != 1
 _2 = prehitmp_15 != 1
 _2 = i.0_1 != 1

And we should know enough about the range of i.0_1 to determine that is
true which would allow us to thread the jump.  But a few things get in
the way.

First, the VRP thread doesn't try hard at all to simplify gimple
assignments.  It really just tries to simplify gimple_cond and
gimple_switch.  This could be trivially improved.

So if I do the right things by hand we end up trying to evaluate i.0_1
!= 1.  So that's good.  But we don't get a useful result back from
vrp_evaluate_conditional.  WHy?  Because the recorded range for i.0_1 is
[1,INF] -- that's the global range for i.0_1 not the range on the path.

Now it turns out this is precisely the problem that I've got code to
address in my queue which fixes a regression I was working on earlier
this year in the gcc-7 cycle.  It's backed up behind some improvements
to tree-ssa-uninit.c that Richi and I still need to hash through.

This would likely also be fixed by Andrew's work on ranges.  It's
precisely the kind of query its designed to answer.

In summary, your switch lowering does regress the code we get for this
test.  But the regression is more a failing of the jump threader than
anything.

> 
> 2) uninit-18.c where we currently report:
> 
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/uninit-18.c:13:12: warning: 
> ‘tmp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>      tmp[5] = 7;    /* { dg-bogus "may be used uninitialized" } */
> 
> Am I right that the pass uses VRP?
No, it doesn't use VRP.   If you look at the history of uninit-18 it shows:
commit cac06b024306772e2be2d357064f7ca2aa82bde8
Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Jan 16 13:26:10 2015 +0000

    2015-01-16  Richard Biener  <rguent...@suse.de>

            PR middle-end/64614
            * tree-ssa-uninit.c: Include tree-cfg.h.
            (MAX_SWITCH_CASES): New define.
            (convert_control_dep_chain_into_preds): Handle switch
statements.
            (is_pred_expr_subset_of): Handle x == CST vs. (x & CST) != 0.
            (normalize_one_pred_1): Do not split bit-manipulations.
            Record (x & CST).

            * gcc.dg/uninit-18.c: New testcase.

Essentially tree-ssa-uninit.c has some specialized code to handle switch
statements in predicates with BIT_AND_EXPR and uninit-18.c tests that
capability.

Your new switch lowering breaks that relatively specialized code.

You could further enhance uninit to detect this.   From the .uninit dump:

MEM[(char *)tmp_2 + 5B] = 7;
is guarded by :

bar_5(D) > 0 (.AND.)  (.NOT.) bar_5(D) > 1


[CHECK]: Found unguarded use: MEM[(char *)tmp_2 + 5B] = 7;


So that check is really bar_5 == 1 and in theory if bar_5 == 1, then we
would have assigned a value to tmp.   VRP is capable of making that
determination and I think when VRP does that, things simplify enough
that the general predicate code in tree-ssa-uninit.c will DTRT.

I wonder if we could get the tighter test coming out of switch lowering,
or discover it in DOM, then there's a reasonable change
tree-ssa-uninit.c would just DTRT.

Looking at the DOM dumps we have:

Optimizing block #5

1>>> STMT 1 = bar_5(D) le_expr 1
1>>> STMT 0 = bar_5(D) gt_expr 1
Optimizing statement if (bar_5(D) >= 1)
LKUP STMT bar_5(D) ge_expr 1


DOM doesn't usually try to do any kind of range optimization.  But this
is a case where it might make sense.

So the first two lines indicate that we know bar5 <= 1 is true and that
bar5 > 1 must be false.  Those are expressions we can look up in the
hash table.

So when the lookup bar5 >= 1 fails we could in theory lookup bar5 <= 1
and if that's true, then we know the test could be simplified to bar5 == 1

I don't know how often this would hit in general and when it hits how
often the resulting code is actually better.   I can try to do some
instrumentation on that...  I wonder if that might help the first case
as well.


> 
> Last question is whether the pass is properly moved in optimization pipeline?
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
Well, that's a *real* interesting question.  THe two cases above argue
that it's too early.  However, I could easily make an argument that the
spot you chose was reasonable -- namely it's after the first set of
threading/vrp/dom passes, but before the second set of threading/vrp/dom
passes.

Let me poke a bit with DOM and see if it can reasonably clean up this
stuff and whether or not the transformation is generally useful.

Jeff

Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

Reply via email to