On Fri, Apr 27, 2012 at 8:20 AM, Tom de Vries <tom_devr...@mentor.com> wrote: > On 26/04/12 12:20, Richard Guenther wrote: >> On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <tom_devr...@mentor.com> >> wrote: >>> On 25/04/12 11:57, Richard Guenther wrote: >>> <SNIP> >>>>>>>> Hmm. I'm not sure we can conclude that they have the same value! >>>>>>>> >>>>>>>> +int bar (int); >>>>>>>> +void baz (int); >>>>>>>> +void bla (int); >>>>>>>> + >>>>>>>> +void >>>>>>>> +foo (int y) >>>>>>>> +{ >>>>>>>> + int a; >>>>>>>> + if (y == 6) >>>>>>>> + { >>>>>>>> + bla (5); >>>>>>>> + a = bar (7); >>>>>>>> + } >>>>>>>> + else >>>>>>>> + { >>>>>>>> + bla (5); >>>>>>>> + a = bar (7); >>>>>>>> + } >>>>>>>> + baz (a); >>>>>>>> +} >>>>>>>> >>>>>>>> I can implement bla as >>>>>>>> >>>>>>>> void bla(int) { a = random (); } >>>>>>>> >>>>>>>> thus a would not have the same value on both paths >>>>>> >>>>>> I think it's hard to decide whether they have the same value or not, >>>>>> since we >>>>>> cannot write a test-case that compares the results of calls from 2 >>>>>> branches of >>>>>> an if statement. >>>> void *foo () >>>> { >>>> return __builtin_return_address (0); >>>> } >>>> >>>> void *bar (_Bool b) >>>> { >>>> if (b) >>>> return foo (); >>>> else >>>> return foo (); >>>> } >>>> >>>> int main() >>>> { >>>> if (bar(true) == bar(false)) >>>> abort (); >>>> } >>>> >>>> ok ... outside of the scope of standard "C", but we certainly _can_ do >>>> this. >>>> Which would question tail-merging the above at all, of course. >>>> >>> >>> I added noinline to make sure there's no variation from inlining. >>> ... >>> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__ >>> ((__noreturn__)); >>> >>> __attribute__((noinline)) >>> void *foo () >>> { >>> return __builtin_return_address (0); >>> } >>> >>> __attribute__((noinline)) >>> void *bar (int b) >>> { >>> if (b) >>> return foo (); >>> else >>> return foo (); >>> } >>> >>> int main() >>> { >>> void *a, *b; >>> a = bar (1); >>> b = bar (0); >>> if (a == b) >>> abort (); >>> return 0; >>> } >>> ... >>> >>> This test-case passes with: >>> ... >>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge >>> -fno-crossjumping >>> ... >>> >>> and fails with: >>> ... >>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge >>> -fcrossjumping >>> ... >>> >>> with the proposed patch, it also fails with: >>> ... >>> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge >>> -fno-crossjumping >>> ... >>> >>> Tree-tail-merge performs the same transformation as cross-jumping for this >>> test-case. I think that the transformation is valid, and that the test-case >>> behaves as expected. Subsequent calls to foo may or may not return the same >>> value, depending on the optimizations, we can't assert a != b, that's just >>> the >>> nature of __builtin_return_address. >>> >>> Btw, this example is not what I meant when I said 'a test-case that >>> compares the >>> results of calls from 2 branches of an if statement'. I tried to say that >>> the >>> semantics of C is defined in terms of taking either 1 branch or the other, >>> rather than executing both in parallel, after which both results would be >>> available. From that point of view, this example compares subsequent calls >>> to >>> foo, not parallel. >> >> Ah, I see. Btw, I was not suggesting that we should not optimize the above. >> >>>>>> I started looking at the problem in terms of subsequent calls. Consider >>>>>> the >>>>>> imaginary attribute nosideeffect, similar to const and pure. >>>>>> >>>>>> const: >>>>>> - no effects except the return value >>>>>> - the return value depends only on the parameters >>>>>> >>>>>> pure: >>>>>> - no effects except the return value >>>>>> - the return value depends only on the parameters and/or global variables >>>>>> >>>>>> nosideeffect: >>>>>> - no effects except the return value >>>>>> >>>>>> The code fragment: >>>>>> ... >>>>>> extern int f (int) __attribute ((nosideeffect)); >>>>>> int g() >>>>>> { >>>>>> int a; >>>>>> int b; >>>>>> a = f (5); >>>>>> b = f (5); >>>>>> return a + b; >>>>>> } >>>>>> ... >>>>>> >>>>>> would result in a gimple sequence more or less like this: >>>>>> ... >>>>>> # VUSE <.MEMD.1712_4(D)> >>>>>> aD.1707_1 = fD.1704 (5); >>>>>> # VUSE <.MEMD.1712_4(D)> >>>>>> bD.1708_2 = fD.1704 (5); >>>>>> D.1710_3 = aD.1707_1 + bD.1708_2; >>>>>> # VUSE <.MEMD.1712_6> >>>>>> return D.1710_3; >>>>>> ... >>>>>> >>>>>> The value numbering modification I'm proposing would say that a and b >>>>>> have the >>>>>> same value, which is not true. I updated the patch to ensure in this >>>>>> case that a >>>>>> and b would be seen as different. >>>>>> Note that things won't break if the attribute is missing on a function, >>>>>> since >>>>>> that just means that the function would have a vdef, and there would be >>>>>> no 2 >>>>>> subsequent function calls with the same vuse. >>>>>> >>>>>>>> - but that is not what >>>>>>>> matters - because obviously we can still merge the blocks. >>>>>> >>>>>> Right. >>>>>> >>>>>>>> That is, >>>>>>>> we cannot be sure that the side-effects on memory of bla (5) and bla >>>>>>>> (5) >>>>>>>> are the same. But is that not what your value-numbering changes will >>>>>>>> assert? >>>>>> >>>>>> In the case of calls in different branches of an control flow statement, >>>>>> we can >>>>>> aggressively assume they're the same, since we cannot write a check that >>>>>> would >>>>>> start failing. >>>> Apart from the above ... which shows that even the returned values can >>>> matter. >>>> >>>>>> In the case of subsequent calls with side effects, the vuses will never >>>>>> be the same. >>>>>> >>>>>> In the case of subsequent calls without side effects, but not pure or >>>>>> const, we >>>>>> can't assume they have the same result. AFAIK, there's currently no way >>>>>> to >>>>>> specify such a function, but the updated patch should be able handle >>>>>> this. >>>>>> >>>>>>>> (not sure how you achieve this, but it looks like you insert/lookup >>>>>>>> general >>>>>>>> calls, thus not pure/const ones >>>>>> >>>>>> Correct. >>>>>> >>>>>>>> - that could easily lead to miscompiles(?)). >>>>>>>> >>>> And to increased compile-time (not sure if it matters). >>>> >>>>>> I have bootstrapped and reg-tested the updated (and the previous) patch >>>>>> on >>>>>> x86_64 (ada inclusive), and found no issues. >>>>>> >>>>>> Can you think of a test-case that fails with the previous or the updated >>>>>> patch? >>>> I see you do not handle >>>> >>> >>> I assume you mean 'not handle' in the sense of 'handle conservatively'. >> >> Yes. >> >>>> struct S { int i; }; >>>> struct S foo (void); >>>> struct S bar (void) >>>> { >>>> struct S s1, s2; >>>> if (...) >>>> s = foo (); >>>> else >>>> s = foo (); >>>> >>>> because the calls have a LHS that is not an SSA name. >>> >>> Indeed, the gvn patch handles this example conservatively, and >>> tree-tail-merge >>> fails to optimize this test-case: >>> ... >>> struct S { int i; }; >>> extern struct S foo (void); >>> extern int foo2 (void); >>> struct S s; >>> int bar (int c) { >>> int r; >>> if (c) >>> { >>> s = foo (); >>> r = foo2 (); >>> } >>> else >>> { >>> s = foo (); >>> r = foo2 (); >>> } >>> return r; >>> } >>> ... >>> >>> A todo. >>> >>>> In general >>>> value-numbering virtual operands will change what further consumers lookup. >>>> So changing >>>> >>>> # MEM_2 = VUSE <MEM_3> >>>> foo (); >>>> # VUSE <MEM_2> >>>> ... = x; >>>> >>>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use >>>> MEM_3, not considering a clobber by foo. >>> >>> AFAIU, the patch doesn't do this type of value numbering. >>> >>>> This might not actually be an >>>> issue (after all we do this for stores, too). >>>> >>>> I see you do not handle >>>> >>>> int bar (int); >>>> void baz (int); >>>> void bla (int); >>>> int s; >>>> void >>>> foo (int y) >>>> { >>>> int a; >>>> if (y == 6) >>>> { >>>> s = 1; >>>> bla (5); >>>> a = bar (7); >>>> } >>>> else >>>> { >>>> s = 1; >>>> bla (5); >>>> a = bar (7); >>>> } >>>> baz (a); >>>> } >>>> >>> >>> I think a more basic example is this one from PR52009, which is basically >>> about >>> stores: >>> ... >>> int z; >>> >>> void >>> foo (int y) >>> { >>> int a; >>> if (y == 6) >>> z = 5; >>> else >>> z = 5; >>> } >>> ... >>> >>> I proposed a patch for that separately: >>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at >>> the >>> patch now, I suppose I could use part of that patch as infrastructure for >>> the >>> todo above. >> >> Yes, I think so. >> >>>> and I don't see how you need the new ref->vdef member >>> >>> I'm regarding the result of the ref as a compound <vdef, result>. Most of >>> the >>> time I could (as I did in an earlier version of the patch) deduce vdef as >>> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when >>> result is NULL_TREE. >>> >>>> - that is also >>>> not handled consistently (not in hashing or comparing, etc.). >>> >>> I handle the vdef as a result. >> >> Ok. Can you please rename ->vdef to ->result_vdef then? >> > > Done. > >>>> Can't you >>>> always use SSA_VAL (vuse)? >>> >>> I'm interested in storing the vdef, not the vuse. >> >> I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of >> the stmt defining vuse). No? >> > > Right. > >>>> >>>> Btw, what's >>>> >>>> + if (vdef) >>>> + VN_INFO (vdef)->use_processed = true; >>>> >>>> for? We arrive here by visiting the VDEF after all and visit_use does >>>> already >>>> set the use processed. Is it that we end up visiting the call twice >>>> because >>>> of the lhs SSA name definition and the virtual operand definition? >>> >>> Yes. >>> >>>> If so then >>>> visit_use should instead do >>>> >>>> FOR_EACH_SSA_DEF_OPERAND (...) >>>> VN_INFO (def)->use_processed = true; >>>> >>>> and defs_to_varying then no longer needs to do that. >>>> >>> >>> factored mark_use_processed out of defs_to_varying, calling >>> mark_use_processed >>> at start of visit_use. >> >> Thanks. Which case hits SSA_NAME_DEF_STMT == NULL btw? The >> default-def for the virtual operand? >> > > SSA_NAME_DEF_STMT == NULL is tested by !stmt in mark_use_processed. > > I suppose I better use SSA_NAME_IS_DEFAULT_DEF, as is done in visit_use. I'm > still somewhat confused as to what is the difference. I suppose we might have > a > nop statement that is a default def, while it's unequal to NULL.
All default defs have a GIMPLE_NOP as definition, so I was wondering for which case you see a NULL SSA_NAME_DEF_STMT (and only came up with a virtual operand maybe?) > Updated in committed patch. > >>>>>> Otherwise, ok for trunk? >>>> Let's give this one more iteration, but I guess the basic idea should work. >>>> >>> >>> bootstrapped and reg-tested on x86_64 (ada inclusive). >>> >>> Is this patch ok, or is the todo required? >> >> No, you can followup with that. >> >> Ok with s/vdef/result_vdef/ >> > > retested and committed with this change and SSA_NAME_IS_DEFAULT_DEF instead of > '!stmt'. Thanks, Richard. > Thanks. > - Tom