Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

Richard Guenther Thu, 26 Apr 2012 03:21:13 -0700

On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <tom_devr...@mentor.com> wrote:
> On 25/04/12 11:57, Richard Guenther wrote:
> <SNIP>
>>>> >> Hmm.  I'm not sure we can conclude that they have the same value!
>>>> >>
>>>> >> +int bar (int);
>>>> >> +void baz (int);
>>>> >> +void bla (int);
>>>> >> +
>>>> >> +void
>>>> >> +foo (int y)
>>>> >> +{
>>>> >> +  int a;
>>>> >> +  if (y == 6)
>>>> >> +    {
>>>> >> +      bla (5);
>>>> >> +      a = bar (7);
>>>> >> +    }
>>>> >> +  else
>>>> >> +    {
>>>> >> +      bla (5);
>>>> >> +      a = bar (7);
>>>> >> +    }
>>>> >> +  baz (a);
>>>> >> +}
>>>> >>
>>>> >> I can implement bla as
>>>> >>
>>>> >> void bla(int) { a = random (); }
>>>> >>
>>>> >> thus a would not have the same value on both paths
>>> >
>>> > I think it's hard to decide whether they have the same value or not, 
>>> > since we
>>> > cannot write a test-case that compares the results of calls from 2 
>>> > branches of
>>> > an if statement.
>> void *foo ()
>> {
>>   return __builtin_return_address (0);
>> }
>>
>> void *bar (_Bool b)
>> {
>>   if (b)
>>     return foo ();
>>   else
>>     return foo ();
>> }
>>
>> int main()
>> {
>>   if (bar(true) == bar(false))
>>     abort ();
>> }
>>
>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>> Which would question tail-merging the above at all, of course.
>>
>
> I added noinline to make sure there's no variation from inlining.
> ...
> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
> ((__noreturn__));
>
> __attribute__((noinline))
> void *foo ()
> {
>  return __builtin_return_address (0);
> }
>
> __attribute__((noinline))
> void *bar (int b)
> {
>  if (b)
>    return foo ();
>  else
>    return foo ();
> }
>
> int main()
> {
>  void *a, *b;
>  a = bar (1);
>  b = bar (0);
>  if (a == b)
>    abort ();
>  return 0;
> }
> ...
>
> This test-case passes with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge 
> -fno-crossjumping
> ...
>
> and fails with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge 
> -fcrossjumping
> ...
>
> with the proposed patch, it also fails with:
> ...
> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge 
> -fno-crossjumping
> ...
>
> Tree-tail-merge performs the same transformation as cross-jumping for this
> test-case. I think that the transformation is valid, and that the test-case
> behaves as expected. Subsequent calls to foo may or may not return the same
> value, depending on the optimizations, we can't assert a != b, that's just the
> nature of __builtin_return_address.
>
> Btw, this example is not what I meant when I said 'a test-case that compares 
> the
> results of calls from 2 branches of an if statement'. I tried to say that the
> semantics of C is defined in terms of taking either 1 branch or the other,
> rather than executing both in parallel, after which both results would be
> available. From that point of view, this example compares subsequent calls to
> foo, not parallel.


Ah, I see.  Btw, I was not suggesting that we should not optimize the above.

>>> > I started looking at the problem in terms of subsequent calls. Consider 
>>> > the
>>> > imaginary attribute nosideeffect, similar to const and pure.
>>> >
>>> > const:
>>> > - no effects except the return value
>>> > - the return value depends only on the parameters
>>> >
>>> > pure:
>>> > - no effects except the return value
>>> > - the return value depends only on the parameters and/or global variables
>>> >
>>> > nosideeffect:
>>> > - no effects except the return value
>>> >
>>> > The code fragment:
>>> > ...
>>> > extern int f (int) __attribute ((nosideeffect));
>>> > int g()
>>> > {
>>> >  int a;
>>> >  int b;
>>> >  a = f (5);
>>> >  b = f (5);
>>> >  return a + b;
>>> > }
>>> > ...
>>> >
>>> > would result in a gimple sequence more or less like this:
>>> > ...
>>> >  # VUSE <.MEMD.1712_4(D)>
>>> >  aD.1707_1 = fD.1704 (5);
>>> >  # VUSE <.MEMD.1712_4(D)>
>>> >  bD.1708_2 = fD.1704 (5);
>>> >  D.1710_3 = aD.1707_1 + bD.1708_2;
>>> >  # VUSE <.MEMD.1712_6>
>>> >  return D.1710_3;
>>> > ...
>>> >
>>> > The value numbering modification I'm proposing would say that a and b 
>>> > have the
>>> > same value, which is not true. I updated the patch to ensure in this case 
>>> > that a
>>> > and b would be seen as different.
>>> > Note that things won't break if the attribute is missing on a function, 
>>> > since
>>> > that just means that the function would have a vdef, and there would be 
>>> > no 2
>>> > subsequent function calls with the same vuse.
>>> >
>>>> >> - but that is not what
>>>> >> matters - because obviously we can still merge the blocks.
>>> >
>>> > Right.
>>> >
>>>> >>  That is,
>>>> >> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>>>> >> are the same.  But is that not what your value-numbering changes will 
>>>> >> assert?
>>> >
>>> > In the case of calls in different branches of an control flow statement, 
>>> > we can
>>> > aggressively assume they're the same, since we cannot write a check that 
>>> > would
>>> > start failing.
>> Apart from the above ... which shows that even the returned values can 
>> matter.
>>
>>> > In the case of subsequent calls with side effects, the vuses will never 
>>> > be the same.
>>> >
>>> > In the case of subsequent calls without side effects, but not pure or 
>>> > const, we
>>> > can't assume they have the same result. AFAIK, there's currently no way to
>>> > specify such a function, but the updated patch should be able handle this.
>>> >
>>>> >> (not sure how you achieve this, but it looks like you insert/lookup 
>>>> >> general
>>>> >> calls, thus not pure/const ones
>>> >
>>> > Correct.
>>> >
>>>> >> - that could easily lead to miscompiles(?)).
>>>> >>
>> And to increased compile-time (not sure if it matters).
>>
>>> > I have bootstrapped and reg-tested the updated (and the previous) patch on
>>> > x86_64 (ada inclusive), and found no issues.
>>> >
>>> > Can you think of a test-case that fails with the previous or the updated 
>>> > patch?
>> I see you do not handle
>>
>
> I assume you mean 'not handle' in the sense of 'handle conservatively'.

Yes.

>> struct S { int i; };
>> struct S foo (void);
>> struct S bar (void)
>> {
>>   struct S s1, s2;
>>   if (...)
>>    s = foo ();
>>   else
>>    s = foo ();
>>
>> because the calls have a LHS that is not an SSA name.
>
> Indeed, the gvn patch handles this example conservatively, and tree-tail-merge
> fails to optimize this test-case:
> ...
> struct S { int i; };
> extern struct S foo (void);
> extern int foo2 (void);
> struct S s;
> int bar (int c) {
>  int r;
>  if (c)
>    {
>      s = foo ();
>      r = foo2 ();
>    }
>  else
>    {
>      s = foo ();
>      r = foo2 ();
>    }
>  return r;
> }
> ...
>
> A todo.
>
>>  In general
>> value-numbering virtual operands will change what further consumers lookup.
>> So changing
>>
>>   # MEM_2 = VUSE <MEM_3>
>>   foo ();
>>   # VUSE <MEM_2>
>>   ... = x;
>>
>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
>> MEM_3, not considering a clobber by foo.
>
> AFAIU, the patch doesn't do this type of value numbering.
>
>> This might not actually be an
>> issue (after all we do this for stores, too).
>>
>> I see you do not handle
>>
>> int bar (int);
>> void baz (int);
>> void bla (int);
>> int s;
>> void
>> foo (int y)
>> {
>>   int a;
>>   if (y == 6)
>>     {
>>       s = 1;
>>       bla (5);
>>       a = bar (7);
>>     }
>>   else
>>     {
>>       s = 1;
>>       bla (5);
>>       a = bar (7);
>>     }
>>   baz (a);
>> }
>>
>
> I think a more basic example is this one from PR52009, which is basically 
> about
> stores:
> ...
> int z;
>
> void
> foo (int y)
> {
>  int a;
>  if (y == 6)
>    z = 5;
>  else
>    z = 5;
> }
> ...
>
> I proposed a patch for that separately:
> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the
> patch now, I suppose I could use part of that patch as infrastructure for the
> todo above.

Yes, I think so.

>> and I don't see how you need the new ref->vdef member
>
> I'm regarding the result of the ref as a compound <vdef, result>. Most of the
> time I could (as I did in an earlier version of the patch) deduce vdef as
> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
> result is NULL_TREE.
>
>> - that is also
>> not handled consistently (not in hashing or comparing, etc.).
>
> I handle the vdef as a result.

Ok.  Can you please rename ->vdef to ->result_vdef then?

>> Can't you
>> always use SSA_VAL (vuse)?
>
> I'm interested in storing the vdef, not the vuse.

I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of
the stmt defining vuse).  No?

>>
>> Btw, what's
>>
>> +  if (vdef)
>> +    VN_INFO (vdef)->use_processed = true;
>>
>> for?  We arrive here by visiting the VDEF after all and visit_use does 
>> already
>> set the use processed.  Is it that we end up visiting the call twice because
>> of the lhs SSA name definition and the virtual operand definition?
>
> Yes.
>
>> If so then
>> visit_use should instead do
>>
>> FOR_EACH_SSA_DEF_OPERAND (...)
>>    VN_INFO (def)->use_processed = true;
>>
>> and defs_to_varying then no longer needs to do that.
>>
>
> factored mark_use_processed out of defs_to_varying, calling mark_use_processed
> at start of visit_use.

Thanks.  Which case hits SSA_NAME_DEF_STMT == NULL btw?  The
default-def for the virtual operand?

>>> > Otherwise, ok for trunk?
>> Let's give this one more iteration, but I guess the basic idea should work.
>>
>
> bootstrapped and reg-tested on x86_64 (ada inclusive).
>
> Is this patch ok, or is the todo required?

No, you can followup with that.

Ok with s/vdef/result_vdef/

Thanks,
Richard.

> Thanks,
>  - Tom
>
> 2012-04-25  Tom de Vries  <t...@codesourcery.com>
>
>        PR tree-optimization/51879
>        * tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
>        * tree-ssa-sccvn.c (mark_use_processed): New function, factored out
>        of ...
>        (defs_to_varying): ... here.
>        (visit_reference_op_call): Handle gimple_vdef.
>        Handle case that lhs is NULL_TREE.
>        (visit_use): Use mark_use_processed.  Handle calls with side-effect
>        using visit_reference_op_call.
>
>        gcc.dg/pr51879.c: New test.
>        gcc.dg/pr51879-2.c: Same.
>        gcc.dg/pr51879-3.c: Same.
>        gcc.dg/pr51879-4.c: Same.
>        gcc.dg/pr51879-6.c: Same.

Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

Reply via email to