On 09/16/2017 12:19 AM, Jeff Law wrote: > On 09/14/2017 06:17 AM, Martin Liška wrote: >> Hello. >> >> As mentioned at Cauldron 2017, second step in switch lowering should be >> massive >> simplification in code that does expansion of balanced tree. Basically it >> includes >> VRP and DCE, which we can for obvious reason do by our own. >> >> The patch does that, and introduces a separate pass for -O0 that's >> responsible >> for lowering at the end of tree pass. >> >> There's a small fallback that I would like to discuss: >> >> 1) vrp105.c - there's no longer catches threading opportunity in between >> default cases: >> adding Patrick who can probably help why is the opportunity skipped with >> expanded tree > Well, VRP knows how to analyze a switch statement and determine when > paths out of one switch imply a particular case will be taken in a later > switch. In this particular case we're trying to verify that the default > case in the first switch threads directly to the default case in the > second (though it seems we ought to be able to totally thread the cases). > > Obviously we don't have switches anymore after your lowering pass. But > ISTM we still ought to be able to evaluate how your lowering affects > jump threading on this case. In theory the lowered switch statements > should be easier to thread. > > Reality is sadly different. There's two cases to consider. One when i > < 0 (should go to first default, then directly to second default). The > other when i > 1 which should also go to the first default, then > directly to the second default). > > When i < 0 we do in fact thread through the second switch and go from > the first default case directly to the second default case. > > When i > 1 we still go through the test for the second switch. > > These are the key blocks: > > ;; basic block 2, loop depth 0, freq 10000, maybe hot > ;; prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) > ;; pred: ENTRY [always] (FALLTHRU,EXECUTABLE) > i.0_1 = i; > if (i.0_1 > 0) > goto <bb 4>; [50.01%] [count: INV] > else > goto <bb 3>; [49.99%] [count: INV] > ;; succ: 3 [50.0% (guessed)] (FALSE_VALUE,EXECUTABLE) > ;; 4 [50.0% (guessed)] (TRUE_VALUE,EXECUTABLE) > > ;; basic block 3, loop depth 0, freq 10000, maybe hot > ;; Invalid sum of incoming frequencies 4999, should be 10000 > ;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) > ;; pred: 2 [50.0% (guessed)] (FALSE_VALUE,EXECUTABLE) > if (i.0_1 == 0) > goto <bb 5>; [40.00%] [count: INV] > else > goto <bb 14>; [60.00%] [count: INV] > ;; succ: 14 [60.0% (guessed)] (FALSE_VALUE,EXECUTABLE) > ;; 5 [40.0% (guessed)] (TRUE_VALUE,EXECUTABLE) > > ;; basic block 4, loop depth 0, freq 10000, maybe hot > ;; Invalid sum of incoming frequencies 5001, should be 10000 > ;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) > ;; pred: 2 [50.0% (guessed)] (TRUE_VALUE,EXECUTABLE) > _13 = i.0_1 != 1; > if (_13 != 0) > goto <bb 13>; [16.68%] [count: INV] > else > goto <bb 6>; [83.32%] [count: INV] > ;; succ: 6 [83.3% (guessed)] (FALSE_VALUE,EXECUTABLE) > ;; 13 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE) > > [ ... ] > > ;; basic block 9, loop depth 0, freq 9999, maybe hot > ;; Invalid sum of incoming frequencies 3999, should be 9999 > ;; prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED) > ;; pred: 13 [always] (FALLTHRU,EXECUTABLE) > ;; 7 [always (guessed)] (TRUE_VALUE,EXECUTABLE) > # prephitmp_19 = PHI <prephitmp_15(13), prephitmp_12(7)> > _2 = prephitmp_19 != 1; > if (_2 != 0) > goto <bb 12>; [16.68%] [count: INV] > else > goto <bb 11>; [83.32%] [count: INV] > ;; succ: 11 [83.3% (guessed)] (FALSE_VALUE,EXECUTABLE) > ;; 12 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE) > > [ ... ] > ;; basic block 13, loop depth 0, freq 1668, maybe hot > ;; prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED) > ;; pred: 4 [16.7% (guessed)] (TRUE_VALUE,EXECUTABLE) > # prephitmp_15 = PHI <i.0_1(4)> > goto <bb 9>; [100.00%] [count: INV] > ;; succ: 9 [always] (FALLTHRU,EXECUTABLE) > > > > WHen bb9 is reached by bb13 we know by back substitution that the assignemnt > > _2 = prehitmp_19 != 1 > _2 = prehitmp_15 != 1 > _2 = i.0_1 != 1 > > And we should know enough about the range of i.0_1 to determine that is > true which would allow us to thread the jump. But a few things get in > the way. > > First, the VRP thread doesn't try hard at all to simplify gimple > assignments. It really just tries to simplify gimple_cond and > gimple_switch. This could be trivially improved. > > So if I do the right things by hand we end up trying to evaluate i.0_1 > != 1. So that's good. But we don't get a useful result back from > vrp_evaluate_conditional. WHy? Because the recorded range for i.0_1 is > [1,INF] -- that's the global range for i.0_1 not the range on the path. > > Now it turns out this is precisely the problem that I've got code to > address in my queue which fixes a regression I was working on earlier > this year in the gcc-7 cycle. It's backed up behind some improvements > to tree-ssa-uninit.c that Richi and I still need to hash through. > > This would likely also be fixed by Andrew's work on ranges. It's > precisely the kind of query its designed to answer. > > In summary, your switch lowering does regress the code we get for this > test. But the regression is more a failing of the jump threader than > anything. > >> >> 2) uninit-18.c where we currently report: >> >> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/uninit-18.c:13:12: >> warning: ‘tmp’ may be used uninitialized in this function >> [-Wmaybe-uninitialized] >> tmp[5] = 7; /* { dg-bogus "may be used uninitialized" } */ >> >> Am I right that the pass uses VRP? > No, it doesn't use VRP. If you look at the history of uninit-18 it shows: > commit cac06b024306772e2be2d357064f7ca2aa82bde8 > Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4> > Date: Fri Jan 16 13:26:10 2015 +0000 > > 2015-01-16 Richard Biener <rguent...@suse.de> > > PR middle-end/64614 > * tree-ssa-uninit.c: Include tree-cfg.h. > (MAX_SWITCH_CASES): New define. > (convert_control_dep_chain_into_preds): Handle switch > statements. > (is_pred_expr_subset_of): Handle x == CST vs. (x & CST) != 0. > (normalize_one_pred_1): Do not split bit-manipulations. > Record (x & CST). > > * gcc.dg/uninit-18.c: New testcase. > > Essentially tree-ssa-uninit.c has some specialized code to handle switch > statements in predicates with BIT_AND_EXPR and uninit-18.c tests that > capability. > > Your new switch lowering breaks that relatively specialized code. > > You could further enhance uninit to detect this. From the .uninit dump: > > MEM[(char *)tmp_2 + 5B] = 7; > is guarded by : > > bar_5(D) > 0 (.AND.) (.NOT.) bar_5(D) > 1 > > > [CHECK]: Found unguarded use: MEM[(char *)tmp_2 + 5B] = 7; > > > So that check is really bar_5 == 1 and in theory if bar_5 == 1, then we > would have assigned a value to tmp. VRP is capable of making that > determination and I think when VRP does that, things simplify enough > that the general predicate code in tree-ssa-uninit.c will DTRT. > > I wonder if we could get the tighter test coming out of switch lowering, > or discover it in DOM, then there's a reasonable change > tree-ssa-uninit.c would just DTRT. > > Looking at the DOM dumps we have: > > Optimizing block #5 > > 1>>> STMT 1 = bar_5(D) le_expr 1 > 1>>> STMT 0 = bar_5(D) gt_expr 1 > Optimizing statement if (bar_5(D) >= 1) > LKUP STMT bar_5(D) ge_expr 1 > > > DOM doesn't usually try to do any kind of range optimization. But this > is a case where it might make sense. > > So the first two lines indicate that we know bar5 <= 1 is true and that > bar5 > 1 must be false. Those are expressions we can look up in the > hash table. > > So when the lookup bar5 >= 1 fails we could in theory lookup bar5 <= 1 > and if that's true, then we know the test could be simplified to bar5 == 1 > > I don't know how often this would hit in general and when it hits how > often the resulting code is actually better. I can try to do some > instrumentation on that... I wonder if that might help the first case > as well. > > >> >> Last question is whether the pass is properly moved in optimization pipeline? >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > Well, that's a *real* interesting question. THe two cases above argue > that it's too early. However, I could easily make an argument that the > spot you chose was reasonable -- namely it's after the first set of > threading/vrp/dom passes, but before the second set of threading/vrp/dom > passes. > > Let me poke a bit with DOM and see if it can reasonably clean up this > stuff and whether or not the transformation is generally useful. > > Jeff >
Hello. Thank you Jeff for very verbose explanation what's happening. I'm planning to do follow-up of this patch that will include clustering for bit-tests and jump tables. Maybe that will make aforementioned issues even more difficult, but we'll see. Martin