Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

Richard Guenther Fri, 27 Apr 2012 02:02:16 -0700

On Fri, Apr 27, 2012 at 8:20 AM, Tom de Vries <tom_devr...@mentor.com> wrote:
> On 26/04/12 12:20, Richard Guenther wrote:
>> On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <tom_devr...@mentor.com> 
>> wrote:
>>> On 25/04/12 11:57, Richard Guenther wrote:
>>> <SNIP>
>>>>>>>> Hmm.  I'm not sure we can conclude that they have the same value!
>>>>>>>>
>>>>>>>> +int bar (int);
>>>>>>>> +void baz (int);
>>>>>>>> +void bla (int);
>>>>>>>> +
>>>>>>>> +void
>>>>>>>> +foo (int y)
>>>>>>>> +{
>>>>>>>> +  int a;
>>>>>>>> +  if (y == 6)
>>>>>>>> +    {
>>>>>>>> +      bla (5);
>>>>>>>> +      a = bar (7);
>>>>>>>> +    }
>>>>>>>> +  else
>>>>>>>> +    {
>>>>>>>> +      bla (5);
>>>>>>>> +      a = bar (7);
>>>>>>>> +    }
>>>>>>>> +  baz (a);
>>>>>>>> +}
>>>>>>>>
>>>>>>>> I can implement bla as
>>>>>>>>
>>>>>>>> void bla(int) { a = random (); }
>>>>>>>>
>>>>>>>> thus a would not have the same value on both paths
>>>>>>
>>>>>> I think it's hard to decide whether they have the same value or not, 
>>>>>> since we
>>>>>> cannot write a test-case that compares the results of calls from 2 
>>>>>> branches of
>>>>>> an if statement.
>>>> void *foo ()
>>>> {
>>>>   return __builtin_return_address (0);
>>>> }
>>>>
>>>> void *bar (_Bool b)
>>>> {
>>>>   if (b)
>>>>     return foo ();
>>>>   else
>>>>     return foo ();
>>>> }
>>>>
>>>> int main()
>>>> {
>>>>   if (bar(true) == bar(false))
>>>>     abort ();
>>>> }
>>>>
>>>> ok ... outside of the scope of standard "C", but we certainly _can_ do 
>>>> this.
>>>> Which would question tail-merging the above at all, of course.
>>>>
>>>
>>> I added noinline to make sure there's no variation from inlining.
>>> ...
>>> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
>>> ((__noreturn__));
>>>
>>> __attribute__((noinline))
>>> void *foo ()
>>> {
>>>  return __builtin_return_address (0);
>>> }
>>>
>>> __attribute__((noinline))
>>> void *bar (int b)
>>> {
>>>  if (b)
>>>    return foo ();
>>>  else
>>>    return foo ();
>>> }
>>>
>>> int main()
>>> {
>>>  void *a, *b;
>>>  a = bar (1);
>>>  b = bar (0);
>>>  if (a == b)
>>>    abort ();
>>>  return 0;
>>> }
>>> ...
>>>
>>> This test-case passes with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge 
>>> -fno-crossjumping
>>> ...
>>>
>>> and fails with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge 
>>> -fcrossjumping
>>> ...
>>>
>>> with the proposed patch, it also fails with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge 
>>> -fno-crossjumping
>>> ...
>>>
>>> Tree-tail-merge performs the same transformation as cross-jumping for this
>>> test-case. I think that the transformation is valid, and that the test-case
>>> behaves as expected. Subsequent calls to foo may or may not return the same
>>> value, depending on the optimizations, we can't assert a != b, that's just 
>>> the
>>> nature of __builtin_return_address.
>>>
>>> Btw, this example is not what I meant when I said 'a test-case that 
>>> compares the
>>> results of calls from 2 branches of an if statement'. I tried to say that 
>>> the
>>> semantics of C is defined in terms of taking either 1 branch or the other,
>>> rather than executing both in parallel, after which both results would be
>>> available. From that point of view, this example compares subsequent calls 
>>> to
>>> foo, not parallel.
>>
>> Ah, I see.  Btw, I was not suggesting that we should not optimize the above.
>>
>>>>>> I started looking at the problem in terms of subsequent calls. Consider 
>>>>>> the
>>>>>> imaginary attribute nosideeffect, similar to const and pure.
>>>>>>
>>>>>> const:
>>>>>> - no effects except the return value
>>>>>> - the return value depends only on the parameters
>>>>>>
>>>>>> pure:
>>>>>> - no effects except the return value
>>>>>> - the return value depends only on the parameters and/or global variables
>>>>>>
>>>>>> nosideeffect:
>>>>>> - no effects except the return value
>>>>>>
>>>>>> The code fragment:
>>>>>> ...
>>>>>> extern int f (int) __attribute ((nosideeffect));
>>>>>> int g()
>>>>>> {
>>>>>>  int a;
>>>>>>  int b;
>>>>>>  a = f (5);
>>>>>>  b = f (5);
>>>>>>  return a + b;
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> would result in a gimple sequence more or less like this:
>>>>>> ...
>>>>>>  # VUSE <.MEMD.1712_4(D)>
>>>>>>  aD.1707_1 = fD.1704 (5);
>>>>>>  # VUSE <.MEMD.1712_4(D)>
>>>>>>  bD.1708_2 = fD.1704 (5);
>>>>>>  D.1710_3 = aD.1707_1 + bD.1708_2;
>>>>>>  # VUSE <.MEMD.1712_6>
>>>>>>  return D.1710_3;
>>>>>> ...
>>>>>>
>>>>>> The value numbering modification I'm proposing would say that a and b 
>>>>>> have the
>>>>>> same value, which is not true. I updated the patch to ensure in this 
>>>>>> case that a
>>>>>> and b would be seen as different.
>>>>>> Note that things won't break if the attribute is missing on a function, 
>>>>>> since
>>>>>> that just means that the function would have a vdef, and there would be 
>>>>>> no 2
>>>>>> subsequent function calls with the same vuse.
>>>>>>
>>>>>>>> - but that is not what
>>>>>>>> matters - because obviously we can still merge the blocks.
>>>>>>
>>>>>> Right.
>>>>>>
>>>>>>>>  That is,
>>>>>>>> we cannot be sure that the side-effects on memory of bla (5) and bla 
>>>>>>>> (5)
>>>>>>>> are the same.  But is that not what your value-numbering changes will 
>>>>>>>> assert?
>>>>>>
>>>>>> In the case of calls in different branches of an control flow statement, 
>>>>>> we can
>>>>>> aggressively assume they're the same, since we cannot write a check that 
>>>>>> would
>>>>>> start failing.
>>>> Apart from the above ... which shows that even the returned values can 
>>>> matter.
>>>>
>>>>>> In the case of subsequent calls with side effects, the vuses will never 
>>>>>> be the same.
>>>>>>
>>>>>> In the case of subsequent calls without side effects, but not pure or 
>>>>>> const, we
>>>>>> can't assume they have the same result. AFAIK, there's currently no way 
>>>>>> to
>>>>>> specify such a function, but the updated patch should be able handle 
>>>>>> this.
>>>>>>
>>>>>>>> (not sure how you achieve this, but it looks like you insert/lookup 
>>>>>>>> general
>>>>>>>> calls, thus not pure/const ones
>>>>>>
>>>>>> Correct.
>>>>>>
>>>>>>>> - that could easily lead to miscompiles(?)).
>>>>>>>>
>>>> And to increased compile-time (not sure if it matters).
>>>>
>>>>>> I have bootstrapped and reg-tested the updated (and the previous) patch 
>>>>>> on
>>>>>> x86_64 (ada inclusive), and found no issues.
>>>>>>
>>>>>> Can you think of a test-case that fails with the previous or the updated 
>>>>>> patch?
>>>> I see you do not handle
>>>>
>>>
>>> I assume you mean 'not handle' in the sense of 'handle conservatively'.
>>
>> Yes.
>>
>>>> struct S { int i; };
>>>> struct S foo (void);
>>>> struct S bar (void)
>>>> {
>>>>   struct S s1, s2;
>>>>   if (...)
>>>>    s = foo ();
>>>>   else
>>>>    s = foo ();
>>>>
>>>> because the calls have a LHS that is not an SSA name.
>>>
>>> Indeed, the gvn patch handles this example conservatively, and 
>>> tree-tail-merge
>>> fails to optimize this test-case:
>>> ...
>>> struct S { int i; };
>>> extern struct S foo (void);
>>> extern int foo2 (void);
>>> struct S s;
>>> int bar (int c) {
>>>  int r;
>>>  if (c)
>>>    {
>>>      s = foo ();
>>>      r = foo2 ();
>>>    }
>>>  else
>>>    {
>>>      s = foo ();
>>>      r = foo2 ();
>>>    }
>>>  return r;
>>> }
>>> ...
>>>
>>> A todo.
>>>
>>>>  In general
>>>> value-numbering virtual operands will change what further consumers lookup.
>>>> So changing
>>>>
>>>>   # MEM_2 = VUSE <MEM_3>
>>>>   foo ();
>>>>   # VUSE <MEM_2>
>>>>   ... = x;
>>>>
>>>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
>>>> MEM_3, not considering a clobber by foo.
>>>
>>> AFAIU, the patch doesn't do this type of value numbering.
>>>
>>>> This might not actually be an
>>>> issue (after all we do this for stores, too).
>>>>
>>>> I see you do not handle
>>>>
>>>> int bar (int);
>>>> void baz (int);
>>>> void bla (int);
>>>> int s;
>>>> void
>>>> foo (int y)
>>>> {
>>>>   int a;
>>>>   if (y == 6)
>>>>     {
>>>>       s = 1;
>>>>       bla (5);
>>>>       a = bar (7);
>>>>     }
>>>>   else
>>>>     {
>>>>       s = 1;
>>>>       bla (5);
>>>>       a = bar (7);
>>>>     }
>>>>   baz (a);
>>>> }
>>>>
>>>
>>> I think a more basic example is this one from PR52009, which is basically 
>>> about
>>> stores:
>>> ...
>>> int z;
>>>
>>> void
>>> foo (int y)
>>> {
>>>  int a;
>>>  if (y == 6)
>>>    z = 5;
>>>  else
>>>    z = 5;
>>> }
>>> ...
>>>
>>> I proposed a patch for that separately:
>>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at 
>>> the
>>> patch now, I suppose I could use part of that patch as infrastructure for 
>>> the
>>> todo above.
>>
>> Yes, I think so.
>>
>>>> and I don't see how you need the new ref->vdef member
>>>
>>> I'm regarding the result of the ref as a compound <vdef, result>. Most of 
>>> the
>>> time I could (as I did in an earlier version of the patch) deduce vdef as
>>> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when
>>> result is NULL_TREE.
>>>
>>>> - that is also
>>>> not handled consistently (not in hashing or comparing, etc.).
>>>
>>> I handle the vdef as a result.
>>
>> Ok.  Can you please rename ->vdef to ->result_vdef then?
>>
>
> Done.
>
>>>> Can't you
>>>> always use SSA_VAL (vuse)?
>>>
>>> I'm interested in storing the vdef, not the vuse.
>>
>> I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of
>> the stmt defining vuse).  No?
>>
>
> Right.
>
>>>>
>>>> Btw, what's
>>>>
>>>> +  if (vdef)
>>>> +    VN_INFO (vdef)->use_processed = true;
>>>>
>>>> for?  We arrive here by visiting the VDEF after all and visit_use does 
>>>> already
>>>> set the use processed.  Is it that we end up visiting the call twice 
>>>> because
>>>> of the lhs SSA name definition and the virtual operand definition?
>>>
>>> Yes.
>>>
>>>> If so then
>>>> visit_use should instead do
>>>>
>>>> FOR_EACH_SSA_DEF_OPERAND (...)
>>>>    VN_INFO (def)->use_processed = true;
>>>>
>>>> and defs_to_varying then no longer needs to do that.
>>>>
>>>
>>> factored mark_use_processed out of defs_to_varying, calling 
>>> mark_use_processed
>>> at start of visit_use.
>>
>> Thanks.  Which case hits SSA_NAME_DEF_STMT == NULL btw?  The
>> default-def for the virtual operand?
>>
>
> SSA_NAME_DEF_STMT == NULL is tested by !stmt in mark_use_processed.
>
> I suppose I better use SSA_NAME_IS_DEFAULT_DEF, as is done in visit_use. I'm
> still somewhat confused as to what is the difference. I suppose we might have 
> a
> nop statement that is a default def, while it's unequal to NULL.


All default defs have a GIMPLE_NOP as definition, so I was wondering for
which case you see a NULL SSA_NAME_DEF_STMT (and only came up
with a virtual operand maybe?)

> Updated in committed patch.
>
>>>>>> Otherwise, ok for trunk?
>>>> Let's give this one more iteration, but I guess the basic idea should work.
>>>>
>>>
>>> bootstrapped and reg-tested on x86_64 (ada inclusive).
>>>
>>> Is this patch ok, or is the todo required?
>>
>> No, you can followup with that.
>>
>> Ok with s/vdef/result_vdef/
>>
>
> retested and committed with this change and SSA_NAME_IS_DEFAULT_DEF instead of
> '!stmt'.

Thanks,
Richard.

> Thanks.
> - Tom

Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

Reply via email to