On Thu, Apr 28, 2016 at 12:52 PM, Jeff Law <l...@redhat.com> wrote:
> On 04/20/2016 03:02 AM, Richard Biener wrote:
>>
>> On Tue, Apr 19, 2016 at 7:50 PM, Patrick Palka <patr...@parcs.ath.cx>
>> wrote:
>>>
>>> This patch makes the jump threader look through the BIT_AND_EXPRs and
>>> BIT_IOR_EXPRs within a condition so that we can find dominating
>>> ASSERT_EXPRs that could help make the overall condition evaluate to a
>>> constant.  For example, we currently don't perform any jump threading
>>> in the following test case even though it's known that if the code
>>> calls foo() then it can't possibly call bar() afterwards:
>
> I'd always envisioned we'd do more simplifications than we're doing now
> and this fits well within how I expected to exploit ASSERT_EXPRs and
> DOM's available expressions/const/copies tables.
>
> However, I do have some long-term direction plans that may change how we
> do this a bit.  In the meantime I don't see a reason not to go forward
> with your change.
>
>>>
>>> void
>>> baz_1 (int a, int b, int c)
>>> {
>>>   if (a && b)
>>>     foo ();
>>>   if (!b && c)
>>>     bar ();
>>> }
>>>
>>> <bb 2>:
>>> _4 = a_3(D) != 0;
>>> _6 = b_5(D) != 0;
>>> _7 = _4 & _6;
>>> if (_7 != 0)
>>>   goto <bb 3>;
>>> else
>>>   goto <bb 4>;
>>>
>>> <bb 3>:
>>> b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
>>> foo ();
>>>
>>> <bb 4>:
>>> _10 = b_5(D) == 0;
>>> _12 = c_11(D) != 0;
>>> _13 = _10 & _12;
>>> if (_13 != 0)
>>>   goto <bb 5>;
>>> else
>>>   goto <bb 6>;
>>>
>>> <bb 5>:
>>> bar ();
>>>
>>> <bb 6>:
>>> return;
>>>
>>> So here we miss a jump threading opportunity that would have made bb 3
>>> jump straight to bb 6 instead of falling through to bb 4.
>>>
>>> If we inspect the operands of the BIT_AND_EXPR of _13 we'll notice that
>>> there is an ASSERT_EXPR that says its left operand b_5 is non-zero.  We
>>> could use this ASSERT_EXPR to deduce that the condition (_13 != 0) is
>>> always false.  This is what this patch does, basically by making
>>> simplify_control_stmt_condition recurse into BIT_AND_EXPRs and
>>> BIT_IOR_EXPRs.
>>>
>>> Does this seem like a good idea/approach?
>
> So the other approach I've been pondering for a while is backwards
> substitution.
>
> So given _13 != 0, we expand that to
>
>   (_10 & _12) != 0
>
> which further expands into
>
>   ((b_5 == 0) & (c_11 != 0)) != 0
>
> And we follow b_5 back to the ASSERT_EXPR, which allows us to start
> simplifying terms.
>
> The glitch in that plan is that there is no easy linkage between the use
> of b_5 in bb4 and the ASSERT_EXPR in bb3.  That's something Aldy, Andrew
> and myself are looking at independently for some of Aldy's work.
I see.  One other deficiency I noticed in the existing threading code is
that there may have been multiple ASSERT_EXPRs registered for b_5, so bb3
could look like

  <bb 3>:
  b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
  b_16 = ASSERT_EXPR <b_15, b_15 != 1>;
  foo ();

but currently we don't consider the 2nd ASSERT_EXPR because we only look
at the immediate uses of b_5.  This oversight makes us fail to thread the
following test case:

void bar (void);
void baz (void);

void
foo (int a)
{
  if (a != 5 && a != 10)
    bar ();

  if (a == 10)
    baz ();
}

>
> But that's all future work...  Back to your patch...
>
>>>
>>> Notes:
>>>
>>> 1. This patch introduces a "regression" in
>>>    gcc.dg/tree-ssa/ssa-thread-11.c in that we no longer perform FSM
>>>    threading during vrp2 but instead we detect two new jump threading
>>>    opportunities during vrp1.  Not sure if the new code is better but
>>>    it is shorter.  I wonder how this should be resolved...
>>
>> Try adjusting the testcase so that it performs the FSM threading again
>> or adjust the expected outcome...
>
> Right.  We just need to look closely at the before/after dumps and make
> a decision about whether the result is better or worse.  If it's better,
> then we adjust the output to the new better result (and I would claim
> that the same threading, but done earlier, is better).
>
> Shorter isn't generally a good indicator of whether or not something is
> better.  The thing to look at is the number of conditionals executed on
> the various paths through the CFG.
>
> In this specific instance, there's a good chance your analysis is
> catching something earlier and allowing it to be better simplified.  But
> let's do the analysis to make sure.

From what I can tell, the patch does cause fewer conditionals to get
executed in general.  I spot two extra jumps that are threaded in the
final code compared to without the patch.  I wouldn't trust my analysis
though!

By the way, the test case ssa-thread-11.c is somewhat buggy since its two
functions lack return statements.  Also, I would expect abort() to have
the noreturn attribute.

>
>>> 2. I haven't tested the performance impact of this patch.  What would
>>>    be a good way to do this?
>
> I have ways to do that.  Jump threading inherently is about reducing the
> number of conditionals we have to evaluate at runtime.  Valgrind can
> give us that information.  So what I do is...
>
> Build a control compiler.  Use that to build an older version of gcc
> (gcc-4.7.3 in particular).  I then use the just-built gcc-4.7.3 to build
> a bunch of .i files under valgrind control.
>
> I then repeat, but the first compiler has whatever patch I want to
> evaluate installed.
>
> I can then compare the output from valgrind to see which has better
> branching behaviour.

That makes sense!  I will play around with this technique.

> It takes a few hours, but it's not terrible and has provided good data
> through the years.  I'll throw your patch into the tester and see what
> it spits out.
>
> I'll also have a look at the details of the patch.

Thanks!

> jeff
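
P.S. Just to make sure I understand the measurement, below is a rough
sketch of how I might count conditional branches for a single compiler
invocation using valgrind's cachegrind branch simulation.  This is only
my guess at the workflow, not how your tester works: the script name,
the compiler paths and the single .i input are placeholders, and I'd be
comparing the two just-built gcc-4.7.3 binaries (one built by the control
compiler, one built by the patched compiler).

#!/usr/bin/env python3
# Sketch only: count conditional branches ("Bc" event) executed by a
# compiler while building one preprocessed file, so the control and
# patched totals can be compared.  Assumes cachegrind's --branch-sim=yes
# output-file format; compiler paths and the .i input are placeholders.
import subprocess
import sys

def cond_branches(compiler, source_i):
    """Compile SOURCE_I with COMPILER under cachegrind; return the Bc total."""
    outfile = "cachegrind.out.sketch"
    subprocess.run(["valgrind", "--tool=cachegrind", "--branch-sim=yes",
                    "--cachegrind-out-file=" + outfile,
                    compiler, "-O2", "-S", "-o", "/dev/null", source_i],
                   check=True, capture_output=True)
    events, totals = [], []
    with open(outfile) as f:
        for line in f:
            if line.startswith("events:"):
                events = line.split()[1:]   # e.g. Ir ... Bc Bcm Bi Bim
            elif line.startswith("summary:"):
                totals = line.split()[1:]   # totals, same order as events
    return int(totals[events.index("Bc")])  # conditional branches executed

if __name__ == "__main__":
    # Usage: ./branch-count.py /path/to/control/gcc /path/to/patched/gcc foo.i
    control, patched, source = sys.argv[1], sys.argv[2], sys.argv[3]
    print("control:", cond_branches(control, source))
    print("patched:", cond_branches(patched, source))

Summing that over a pile of .i files for both compilers should show
whether the patch reduces the number of conditionals executed, which is
how I read the intent of your setup.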