On Thu, Apr 28, 2016 at 12:52 PM, Jeff Law <l...@redhat.com> wrote:
> On 04/20/2016 03:02 AM, Richard Biener wrote:
>>
>> On Tue, Apr 19, 2016 at 7:50 PM, Patrick Palka <patr...@parcs.ath.cx>
>> wrote:
>>>
>>> This patch makes the jump threader look through the BIT_AND_EXPRs and
>>> BIT_IOR_EXPRs within a condition so that it can find dominating
>>> ASSERT_EXPRs that help make the overall condition evaluate to a
>>> constant.  For example, we currently don't perform any jump threading in
>>> the following test case, even though it's known that if the code calls
>>> foo() then it can't possibly call bar() afterwards:
>
> I'd always envisioned we'd do more simplifications than we're doing now, and
> this fits well within how I expected to exploit ASSERT_EXPRs and DOM's
> available expressions/const/copies tables.
>
> However, I do have some long-term direction plans that may change how we do
> this a bit.  In the meantime I don't see a reason not to go forward
> with your change.
>
>>>
>>> void
>>> baz_1 (int a, int b, int c)
>>> {
>>>   if (a && b)
>>>     foo ();
>>>   if (!b && c)
>>>     bar ();
>>> }
>>>
>>>    <bb 2>:
>>>    _4 = a_3(D) != 0;
>>>    _6 = b_5(D) != 0;
>>>    _7 = _4 & _6;
>>>    if (_7 != 0)
>>>      goto <bb 3>;
>>>    else
>>>      goto <bb 4>;
>>>
>>>    <bb 3>:
>>>    b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
>>>    foo ();
>>>
>>>    <bb 4>:
>>>    _10 = b_5(D) == 0;
>>>    _12 = c_11(D) != 0;
>>>    _13 = _10 & _12;
>>>    if (_13 != 0)
>>>      goto <bb 5>;
>>>    else
>>>      goto <bb 6>;
>>>
>>>    <bb 5>:
>>>    bar ();
>>>
>>>    <bb 6>:
>>>    return;
>>>
>>> So here we miss a jump threading opportunity that would have made bb 3
>>> jump straight to bb 6 instead of falling through to bb 4.
>>>
>>> If we inspect the operands of the BIT_AND_EXPR of _13 we'll notice that
>>> there is an ASSERT_EXPR that says its left operand b_5 is non-zero.  We
>>> could use this ASSERT_EXPR to deduce that the condition (_13 != 0) is
>>> always false.  This is what this patch does, basically by making
>>> simplify_control_stmt_condition recurse into BIT_AND_EXPRs and
>>> BIT_IOR_EXPRs.
>>>
>>> Does this seem like a good idea/approach?
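
For reference, the shape of the recursion I have in mind looks roughly
like the sketch below.  This is only an illustration against GCC's
internal tree/gimple APIs, not the actual patch, and
simplify_operand_via_asserts is a made-up stand-in for whatever hook
looks up the dominating ASSERT_EXPR information for an operand:

    /* Given one SSA operand of a condition, look through BIT_AND_EXPR /
       BIT_IOR_EXPR definitions and try to simplify each operand (e.g. via
       a dominating ASSERT_EXPR).  Return a constant if the value of the
       whole subexpression is known, or NULL_TREE otherwise.  The caller
       then folds the outer comparison (e.g. _13 != 0) using that result.  */

    static tree
    simplify_via_bit_ops (tree op)
    {
      if (TREE_CODE (op) != SSA_NAME)
        return NULL_TREE;

      gimple *def = SSA_NAME_DEF_STMT (op);
      if (!is_gimple_assign (def))
        return simplify_operand_via_asserts (op);

      enum tree_code code = gimple_assign_rhs_code (def);
      if (code != BIT_AND_EXPR && code != BIT_IOR_EXPR)
        return simplify_operand_via_asserts (op);

      tree lhs = simplify_via_bit_ops (gimple_assign_rhs1 (def));
      tree rhs = simplify_via_bit_ops (gimple_assign_rhs2 (def));

      /* For &, one operand known to be zero settles the whole expression;
         for |, one operand known to be nonzero does.  */
      if (code == BIT_AND_EXPR
          && ((lhs && integer_zerop (lhs)) || (rhs && integer_zerop (rhs))))
        return boolean_false_node;
      if (code == BIT_IOR_EXPR
          && ((lhs && integer_nonzerop (lhs)) || (rhs && integer_nonzerop (rhs))))
        return boolean_true_node;

      return NULL_TREE;
    }

In the example above, recursing from _13 reaches _10 = b_5 == 0, the
ASSERT_EXPR in bb 3 tells us that is false, and the BIT_AND_EXPR
therefore collapses to false.
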
>
> So the other approach I've been pondering for a while is backwards
> substitution.
>
> So given _13 != 0, we expand that to
>
> (_10 & _12) != 0
>
> Which further expands into
> ((b_5 == 0) & (c_11 != 0)) != 0
>
> And we follow b_5 back to the ASSERT_EXPR which allows us to start
> simplifying terms.
>
>
> The glitch in that plan is there is no easy linkage between the use of b_5
> in bb4 and the ASSERT_EXPR in bb3.  That's something Aldy, Andrew and myself
> are looking at independently for some of Aldy's work.

I see.  One other deficiency I noticed in the existing threading code
is that there may have been multiple ASSERT_EXPRs registered for b_5,
so bb 3 could look like

    <bb 3>:
    b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
    b_16 = ASSERT_EXPR <b_15, b_15 != 1>;
    foo ();

but currently we don't consider the 2nd ASSERT_EXPR because we only
look at the immediate uses of b_5.  This oversight makes us fail to
thread

    void bar (void);
    void baz (void);

    void
    foo (int a)
    {
      if (a != 5 && a != 10)
        bar ();
      if (a == 10)
        baz ();
    }
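
Hand-writing roughly what vrp1 would produce for that function (made-up
SSA names, not an actual dump), the problem is visible in bb 3:

    <bb 2>:
    _1 = a_2(D) != 5;
    _3 = a_2(D) != 10;
    _4 = _1 & _3;
    if (_4 != 0)
      goto <bb 3>;
    else
      goto <bb 4>;

    <bb 3>:
    a_5 = ASSERT_EXPR <a_2(D), a_2(D) != 5>;
    a_6 = ASSERT_EXPR <a_5, a_5 != 10>;
    bar ();

    <bb 4>:
    if (a_2(D) == 10)
      goto <bb 5>;
    else
      goto <bb 6>;

The fact we would need in order to thread bb 3 -> bb 4 -> bb 6 (a != 10)
is carried by the second ASSERT_EXPR, but walking only the immediate
uses of a_2(D) finds just the first one.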

>
> But that's all future work...  Back to your patch...
>
>>>
>>> Notes:
>>>
>>> 1. This patch introduces a "regression" in
>>> gcc.dg/tree-ssa/ssa-thread-11.c
>>> in that we no longer perform FSM threading during vrp2 but instead we
>>> detect two new jump threading opportunities during vrp1.  Not sure if
>>> the new code is better but it is shorter.  I wonder how this should be
>>> resolved...
>>
>>
>> Try adjusting the testcase so that it performs the FSM threading again
>> or adjust the expected outcome...
>
> Right.  We just need to look closely at the before/after dumps and make a
> decision about whether the result is better or worse.  If it's better, then
> we adjust the expected output to the new, better result (and I would claim
> that the same threading, but done earlier, is better).
>
> Shorter isn't generally a good indicator of whether or not something is
> better.  The thing to look at is the number of conditionals executed on the
> various paths through the CFG.
>
> In this specific instance, there's a good chance your analysis is catching
> something earlier and allowing it to be better simplified.  But let's do the
> analysis to make sure.

From what I can tell, the patch does cause fewer conditionals to get
executed in general.  I spot two extra jumps that are threaded in the
final code compared to without the patch.  I wouldn't trust my
analysis though!

By the way, the test case ssa-thread-11.c is somewhat buggy since its
two functions lack return statements.  Also I would expect abort() to
have the noreturn attribute.
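
Concretely, I mean something like this at the top of the test (untested,
just to illustrate the declaration I have in mind), plus an explicit
return statement at the end of each function:

    extern void abort (void) __attribute__ ((noreturn));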

>
>>
>>> 2. I haven't tested the performance impact of this patch.  What would be
>>> a good way to do this?
>
> I have ways to do that.  Jump threading is inherently about reducing the
> number of conditionals we have to evaluate at runtime.  Valgrind can give
> us that information.  So what I do is...
>
> Build a control compiler.  Use that to build an older version of gcc
> (gcc-4.7.3 in particular).  I then use the just-built gcc-4.7.3 to compile a
> bunch of .i files under valgrind control.
>
> I then repeat, but the first compiler has whatever patch I want to evaluate
> installed.
>
> I can then compare the output from valgrind to see which has better
> branching behaviour.

That makes sense!  I will play around with this technique.
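
For my own experiments I was thinking of something along these lines
(just a guess at the setup, using cachegrind's branch simulation; the
paths and input file are hypothetical):

    valgrind --tool=cachegrind --branch-sim=yes \
        /path/to/gcc-4.7.3-obj/gcc/cc1 -quiet -O2 foo.i -o /dev/null

and then comparing the conditional-branch counts cachegrind reports for
the gcc-4.7.3 built by the patched compiler against the one built by the
unpatched control compiler.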

>
> It takes a few hours, but it's not terrible and has provided good data
> through the years.  I'll throw your patch into the tester and see what it
> spits out.
>
> I'll also have a look at the details of the patch.

Thanks!

>
> jeff
