On 26/04/12 12:20, Richard Guenther wrote: > On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries <tom_devr...@mentor.com> wrote: >> On 25/04/12 11:57, Richard Guenther wrote: >> <SNIP> >>>>>>> Hmm. I'm not sure we can conclude that they have the same value! >>>>>>> >>>>>>> +int bar (int); >>>>>>> +void baz (int); >>>>>>> +void bla (int); >>>>>>> + >>>>>>> +void >>>>>>> +foo (int y) >>>>>>> +{ >>>>>>> + int a; >>>>>>> + if (y == 6) >>>>>>> + { >>>>>>> + bla (5); >>>>>>> + a = bar (7); >>>>>>> + } >>>>>>> + else >>>>>>> + { >>>>>>> + bla (5); >>>>>>> + a = bar (7); >>>>>>> + } >>>>>>> + baz (a); >>>>>>> +} >>>>>>> >>>>>>> I can implement bla as >>>>>>> >>>>>>> void bla(int) { a = random (); } >>>>>>> >>>>>>> thus a would not have the same value on both paths >>>>> >>>>> I think it's hard to decide whether they have the same value or not, >>>>> since we >>>>> cannot write a test-case that compares the results of calls from 2 >>>>> branches of >>>>> an if statement. >>> void *foo () >>> { >>> return __builtin_return_address (0); >>> } >>> >>> void *bar (_Bool b) >>> { >>> if (b) >>> return foo (); >>> else >>> return foo (); >>> } >>> >>> int main() >>> { >>> if (bar(true) == bar(false)) >>> abort (); >>> } >>> >>> ok ... outside of the scope of standard "C", but we certainly _can_ do this. >>> Which would question tail-merging the above at all, of course. >>> >> >> I added noinline to make sure there's no variation from inlining. >> ... >> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__ >> ((__noreturn__)); >> >> __attribute__((noinline)) >> void *foo () >> { >> return __builtin_return_address (0); >> } >> >> __attribute__((noinline)) >> void *bar (int b) >> { >> if (b) >> return foo (); >> else >> return foo (); >> } >> >> int main() >> { >> void *a, *b; >> a = bar (1); >> b = bar (0); >> if (a == b) >> abort (); >> return 0; >> } >> ... >> >> This test-case passes with: >> ... >> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge >> -fno-crossjumping >> ... >> >> and fails with: >> ... >> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge >> -fcrossjumping >> ... >> >> with the proposed patch, it also fails with: >> ... >> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge >> -fno-crossjumping >> ... >> >> Tree-tail-merge performs the same transformation as cross-jumping for this >> test-case. I think that the transformation is valid, and that the test-case >> behaves as expected. Subsequent calls to foo may or may not return the same >> value, depending on the optimizations, we can't assert a != b, that's just >> the >> nature of __builtin_return_address. >> >> Btw, this example is not what I meant when I said 'a test-case that compares >> the >> results of calls from 2 branches of an if statement'. I tried to say that the >> semantics of C is defined in terms of taking either 1 branch or the other, >> rather than executing both in parallel, after which both results would be >> available. From that point of view, this example compares subsequent calls to >> foo, not parallel. > > Ah, I see. Btw, I was not suggesting that we should not optimize the above. > >>>>> I started looking at the problem in terms of subsequent calls. Consider >>>>> the >>>>> imaginary attribute nosideeffect, similar to const and pure. >>>>> >>>>> const: >>>>> - no effects except the return value >>>>> - the return value depends only on the parameters >>>>> >>>>> pure: >>>>> - no effects except the return value >>>>> - the return value depends only on the parameters and/or global variables >>>>> >>>>> nosideeffect: >>>>> - no effects except the return value >>>>> >>>>> The code fragment: >>>>> ... >>>>> extern int f (int) __attribute ((nosideeffect)); >>>>> int g() >>>>> { >>>>> int a; >>>>> int b; >>>>> a = f (5); >>>>> b = f (5); >>>>> return a + b; >>>>> } >>>>> ... >>>>> >>>>> would result in a gimple sequence more or less like this: >>>>> ... >>>>> # VUSE <.MEMD.1712_4(D)> >>>>> aD.1707_1 = fD.1704 (5); >>>>> # VUSE <.MEMD.1712_4(D)> >>>>> bD.1708_2 = fD.1704 (5); >>>>> D.1710_3 = aD.1707_1 + bD.1708_2; >>>>> # VUSE <.MEMD.1712_6> >>>>> return D.1710_3; >>>>> ... >>>>> >>>>> The value numbering modification I'm proposing would say that a and b >>>>> have the >>>>> same value, which is not true. I updated the patch to ensure in this case >>>>> that a >>>>> and b would be seen as different. >>>>> Note that things won't break if the attribute is missing on a function, >>>>> since >>>>> that just means that the function would have a vdef, and there would be >>>>> no 2 >>>>> subsequent function calls with the same vuse. >>>>> >>>>>>> - but that is not what >>>>>>> matters - because obviously we can still merge the blocks. >>>>> >>>>> Right. >>>>> >>>>>>> That is, >>>>>>> we cannot be sure that the side-effects on memory of bla (5) and bla (5) >>>>>>> are the same. But is that not what your value-numbering changes will >>>>>>> assert? >>>>> >>>>> In the case of calls in different branches of an control flow statement, >>>>> we can >>>>> aggressively assume they're the same, since we cannot write a check that >>>>> would >>>>> start failing. >>> Apart from the above ... which shows that even the returned values can >>> matter. >>> >>>>> In the case of subsequent calls with side effects, the vuses will never >>>>> be the same. >>>>> >>>>> In the case of subsequent calls without side effects, but not pure or >>>>> const, we >>>>> can't assume they have the same result. AFAIK, there's currently no way to >>>>> specify such a function, but the updated patch should be able handle this. >>>>> >>>>>>> (not sure how you achieve this, but it looks like you insert/lookup >>>>>>> general >>>>>>> calls, thus not pure/const ones >>>>> >>>>> Correct. >>>>> >>>>>>> - that could easily lead to miscompiles(?)). >>>>>>> >>> And to increased compile-time (not sure if it matters). >>> >>>>> I have bootstrapped and reg-tested the updated (and the previous) patch on >>>>> x86_64 (ada inclusive), and found no issues. >>>>> >>>>> Can you think of a test-case that fails with the previous or the updated >>>>> patch? >>> I see you do not handle >>> >> >> I assume you mean 'not handle' in the sense of 'handle conservatively'. > > Yes. > >>> struct S { int i; }; >>> struct S foo (void); >>> struct S bar (void) >>> { >>> struct S s1, s2; >>> if (...) >>> s = foo (); >>> else >>> s = foo (); >>> >>> because the calls have a LHS that is not an SSA name. >> >> Indeed, the gvn patch handles this example conservatively, and >> tree-tail-merge >> fails to optimize this test-case: >> ... >> struct S { int i; }; >> extern struct S foo (void); >> extern int foo2 (void); >> struct S s; >> int bar (int c) { >> int r; >> if (c) >> { >> s = foo (); >> r = foo2 (); >> } >> else >> { >> s = foo (); >> r = foo2 (); >> } >> return r; >> } >> ... >> >> A todo. >> >>> In general >>> value-numbering virtual operands will change what further consumers lookup. >>> So changing >>> >>> # MEM_2 = VUSE <MEM_3> >>> foo (); >>> # VUSE <MEM_2> >>> ... = x; >>> >>> by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use >>> MEM_3, not considering a clobber by foo. >> >> AFAIU, the patch doesn't do this type of value numbering. >> >>> This might not actually be an >>> issue (after all we do this for stores, too). >>> >>> I see you do not handle >>> >>> int bar (int); >>> void baz (int); >>> void bla (int); >>> int s; >>> void >>> foo (int y) >>> { >>> int a; >>> if (y == 6) >>> { >>> s = 1; >>> bla (5); >>> a = bar (7); >>> } >>> else >>> { >>> s = 1; >>> bla (5); >>> a = bar (7); >>> } >>> baz (a); >>> } >>> >> >> I think a more basic example is this one from PR52009, which is basically >> about >> stores: >> ... >> int z; >> >> void >> foo (int y) >> { >> int a; >> if (y == 6) >> z = 5; >> else >> z = 5; >> } >> ... >> >> I proposed a patch for that separately: >> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html. Mmmm, looking at the >> patch now, I suppose I could use part of that patch as infrastructure for the >> todo above. > > Yes, I think so. > >>> and I don't see how you need the new ref->vdef member >> >> I'm regarding the result of the ref as a compound <vdef, result>. Most of the >> time I could (as I did in an earlier version of the patch) deduce vdef as >> gimple_vdef (SSA_NAME_DEF_STMT (result)), but not always, in particular when >> result is NULL_TREE. >> >>> - that is also >>> not handled consistently (not in hashing or comparing, etc.). >> >> I handle the vdef as a result. > > Ok. Can you please rename ->vdef to ->result_vdef then? >
Done. >>> Can't you >>> always use SSA_VAL (vuse)? >> >> I'm interested in storing the vdef, not the vuse. > > I meant that result_vdef will always be SSA_VAL (vuse) (the vdef of > the stmt defining vuse). No? > Right. >>> >>> Btw, what's >>> >>> + if (vdef) >>> + VN_INFO (vdef)->use_processed = true; >>> >>> for? We arrive here by visiting the VDEF after all and visit_use does >>> already >>> set the use processed. Is it that we end up visiting the call twice because >>> of the lhs SSA name definition and the virtual operand definition? >> >> Yes. >> >>> If so then >>> visit_use should instead do >>> >>> FOR_EACH_SSA_DEF_OPERAND (...) >>> VN_INFO (def)->use_processed = true; >>> >>> and defs_to_varying then no longer needs to do that. >>> >> >> factored mark_use_processed out of defs_to_varying, calling >> mark_use_processed >> at start of visit_use. > > Thanks. Which case hits SSA_NAME_DEF_STMT == NULL btw? The > default-def for the virtual operand? > SSA_NAME_DEF_STMT == NULL is tested by !stmt in mark_use_processed. I suppose I better use SSA_NAME_IS_DEFAULT_DEF, as is done in visit_use. I'm still somewhat confused as to what is the difference. I suppose we might have a nop statement that is a default def, while it's unequal to NULL. Updated in committed patch. >>>>> Otherwise, ok for trunk? >>> Let's give this one more iteration, but I guess the basic idea should work. >>> >> >> bootstrapped and reg-tested on x86_64 (ada inclusive). >> >> Is this patch ok, or is the todo required? > > No, you can followup with that. > > Ok with s/vdef/result_vdef/ > retested and committed with this change and SSA_NAME_IS_DEFAULT_DEF instead of '!stmt'. Thanks. - Tom