On 09/14/2017 06:17 AM, Martin Liška wrote: > Hello. > > As mentioned at Cauldron 2017, second step in switch lowering should be > massive > simplification in code that does expansion of balanced tree. Basically it > includes > VRP and DCE, which we can for obvious reason do by our own. > > The patch does that, and introduces a separate pass for -O0 that's responsible > for lowering at the end of tree pass. > > There's a small fallback that I would like to discuss: > > 1) vrp105.c - there's no longer catches threading opportunity in between > default cases: > adding Patrick who can probably help why is the opportunity skipped with > expanded tree Well, VRP knows how to analyze a switch statement and determine when paths out of one switch imply a particular case will be taken in a later switch. In this particular case we're trying to verify that the default case in the first switch threads directly to the default case in the second (though it seems we ought to be able to totally thread the cases).
Obviously we don't have switches anymore after your lowering pass. But ISTM we still ought to be able to evaluate how your lowering affects jump threading on this case. In theory the lowered switch statements should be easier to thread. Reality is sadly different. There's two cases to consider. One when i < 0 (should go to first default, then directly to second default). The other when i > 1 which should also go to the first default, then directly to the second default). When i < 0 we do in fact thread through the second switch and go from the first default case directly to the second default case. When i > 1 we still go through the test for the second switch. These are the key blocks: ;; basic block 2, loop depth 0, freq 10000, maybe hot ;; prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) ;; pred: ENTRY [always] (FALLTHRU,EXECUTABLE) i.0_1 = i; if (i.0_1 > 0) goto <bb 4>; [50.01%] [count: INV] else goto <bb 3>; [49.99%] [count: INV] ;; succ: 3 [50.0% (guessed)] (FALSE_VALUE,EXECUTABLE) ;; 4 [50.0% (guessed)] (TRUE_VALUE,EXECUTABLE) ;; basic block 3, loop depth 0, freq 10000, maybe hot ;; Invalid sum of incoming frequencies 4999, should be 10000 ;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) ;; pred: 2 [50.0% (guessed)] (FALSE_VALUE,EXECUTABLE) if (i.0_1 == 0) goto <bb 5>; [40.00%] [count: INV] else goto <bb 14>; [60.00%] [count: INV] ;; succ: 14 [60.0% (guessed)] (FALSE_VALUE,EXECUTABLE) ;; 5 [40.0% (guessed)] (TRUE_VALUE,EXECUTABLE) ;; basic block 4, loop depth 0, freq 10000, maybe hot ;; Invalid sum of incoming frequencies 5001, should be 10000 ;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) ;; pred: 2 [50.0% (guessed)] (TRUE_VALUE,EXECUTABLE) _13 = i.0_1 != 1; if (_13 != 0) goto <bb 13>; [16.68%] [count: INV] else goto <bb 6>; [83.32%] [count: INV] ;; succ: 6 [83.3% (guessed)] (FALSE_VALUE,EXECUTABLE) ;; 13 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE) [ ... ] ;; basic block 9, loop depth 0, freq 9999, maybe hot ;; Invalid sum of incoming frequencies 3999, should be 9999 ;; prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED) ;; pred: 13 [always] (FALLTHRU,EXECUTABLE) ;; 7 [always (guessed)] (TRUE_VALUE,EXECUTABLE) # prephitmp_19 = PHI <prephitmp_15(13), prephitmp_12(7)> _2 = prephitmp_19 != 1; if (_2 != 0) goto <bb 12>; [16.68%] [count: INV] else goto <bb 11>; [83.32%] [count: INV] ;; succ: 11 [83.3% (guessed)] (FALSE_VALUE,EXECUTABLE) ;; 12 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE) [ ... ] ;; basic block 13, loop depth 0, freq 1668, maybe hot ;; prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED) ;; pred: 4 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE) # prephitmp_15 = PHI <i.0_1(4)> goto <bb 9>; [100.00%] [count: INV] ;; succ: 9 [always] (FALLTHRU,EXECUTABLE) WHen bb9 is reached by bb13 we know by back substitution that the assignemnt _2 = prehitmp_19 != 1 _2 = prehitmp_15 != 1 _2 = i.0_1 != 1 And we should know enough about the range of i.0_1 to determine that is true which would allow us to thread the jump. But a few things get in the way. First, the VRP thread doesn't try hard at all to simplify gimple assignments. It really just tries to simplify gimple_cond and gimple_switch. This could be trivially improved. So if I do the right things by hand we end up trying to evaluate i.0_1 != 1. So that's good. But we don't get a useful result back from vrp_evaluate_conditional. WHy? Because the recorded range for i.0_1 is [1,INF] -- that's the global range for i.0_1 not the range on the path. Now it turns out this is precisely the problem that I've got code to address in my queue which fixes a regression I was working on earlier this year in the gcc-7 cycle. It's backed up behind some improvements to tree-ssa-uninit.c that Richi and I still need to hash through. This would likely also be fixed by Andrew's work on ranges. It's precisely the kind of query its designed to answer. In summary, your switch lowering does regress the code we get for this test. But the regression is more a failing of the jump threader than anything. > > 2) uninit-18.c where we currently report: > > /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/uninit-18.c:13:12: warning: > ‘tmp’ may be used uninitialized in this function [-Wmaybe-uninitialized] > tmp[5] = 7; /* { dg-bogus "may be used uninitialized" } */ > > Am I right that the pass uses VRP? No, it doesn't use VRP. If you look at the history of uninit-18 it shows: commit cac06b024306772e2be2d357064f7ca2aa82bde8 Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4> Date: Fri Jan 16 13:26:10 2015 +0000 2015-01-16 Richard Biener <rguent...@suse.de> PR middle-end/64614 * tree-ssa-uninit.c: Include tree-cfg.h. (MAX_SWITCH_CASES): New define. (convert_control_dep_chain_into_preds): Handle switch statements. (is_pred_expr_subset_of): Handle x == CST vs. (x & CST) != 0. (normalize_one_pred_1): Do not split bit-manipulations. Record (x & CST). * gcc.dg/uninit-18.c: New testcase. Essentially tree-ssa-uninit.c has some specialized code to handle switch statements in predicates with BIT_AND_EXPR and uninit-18.c tests that capability. Your new switch lowering breaks that relatively specialized code. You could further enhance uninit to detect this. From the .uninit dump: MEM[(char *)tmp_2 + 5B] = 7; is guarded by : bar_5(D) > 0 (.AND.) (.NOT.) bar_5(D) > 1 [CHECK]: Found unguarded use: MEM[(char *)tmp_2 + 5B] = 7; So that check is really bar_5 == 1 and in theory if bar_5 == 1, then we would have assigned a value to tmp. VRP is capable of making that determination and I think when VRP does that, things simplify enough that the general predicate code in tree-ssa-uninit.c will DTRT. I wonder if we could get the tighter test coming out of switch lowering, or discover it in DOM, then there's a reasonable change tree-ssa-uninit.c would just DTRT. Looking at the DOM dumps we have: Optimizing block #5 1>>> STMT 1 = bar_5(D) le_expr 1 1>>> STMT 0 = bar_5(D) gt_expr 1 Optimizing statement if (bar_5(D) >= 1) LKUP STMT bar_5(D) ge_expr 1 DOM doesn't usually try to do any kind of range optimization. But this is a case where it might make sense. So the first two lines indicate that we know bar5 <= 1 is true and that bar5 > 1 must be false. Those are expressions we can look up in the hash table. So when the lookup bar5 >= 1 fails we could in theory lookup bar5 <= 1 and if that's true, then we know the test could be simplified to bar5 == 1 I don't know how often this would hit in general and when it hits how often the resulting code is actually better. I can try to do some instrumentation on that... I wonder if that might help the first case as well. > > Last question is whether the pass is properly moved in optimization pipeline? > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Well, that's a *real* interesting question. THe two cases above argue that it's too early. However, I could easily make an argument that the spot you chose was reasonable -- namely it's after the first set of threading/vrp/dom passes, but before the second set of threading/vrp/dom passes. Let me poke a bit with DOM and see if it can reasonably clean up this stuff and whether or not the transformation is generally useful. Jeff