On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez <al...@redhat.com> wrote: > On 01/04/2017 07:11 AM, Richard Biener wrote: >> >> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <al...@redhat.com> wrote: >>> >>> On 12/20/2016 09:16 AM, Richard Biener wrote: >>>> >>>> >>>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <al...@redhat.com> >>>> wrote: >>>>> >>>>> >>>>> Hi folks. >>>>> >>>>> This is a follow-up on Jeff and Richi's interaction on the >>>>> aforementioned >>>>> PR >>>>> here: >>>>> >>>>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html >>>>> >>>>> I decided to explore the idea of analyzing may-undefness on-demand, >>>>> which >>>>> actually looks rather cheap. >>>>> >>>>> BTW, I don't understand why we don't have auto_bitmap's, as we already >>>>> have >>>>> auto_sbitmap's. I've implemented the former based on auto_sbitmap's >>>>> code >>>>> we >>>>> already have. >>>>> >>>>> The attached patch fixes the bug without introducing any regressions. >>>>> >>>>> I also tested the patch by compiling 242 .ii files with -O3. These >>>>> were >>>>> gathered from a stage1 build with -save-temps. There is a slight time >>>>> degradation of 4 seconds within 27 minutes of user time: >>>>> >>>>> tainted: 26:52 >>>>> orig: 26:48 >>>>> >>>>> This was the average aggregate time of two runs compiling all 242 .ii >>>>> files. >>>>> IMO, this looks reasonable. It is after all, -O3. Is it acceptable? >>>> >>>> >>>> >>>> + while (!worklist.is_empty ()) >>>> + { >>>> + name = worklist.pop (); >>>> + gcc_assert (TREE_CODE (name) == SSA_NAME); >>>> + >>>> + if (ssa_undefined_value_p (name, true)) >>>> + return true; >>>> + >>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); >>>> >>>> it should be already set as we use visited_ssa as "was it ever on the >>>> worklist", >>>> so maybe renaming it would be a good thing as well. >>> >>> >>> >>> I don't understand what you would prefer here. >> >> >> Set the bit when you put name on the worklist (and do not do that if the >> bit is set). Thus simply remove the above and add a bitmap_set_bit >> for the initial name you put on the worklist. >> >>>> >>>> + if (TREE_CODE (name) == SSA_NAME) >>>> + { >>>> + /* If an SSA has already been seen, this may be a >>>> loop. >>>> + Fail conservatively. */ >>>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION >>>> (name))) >>>> + return false; >>>> >>>> so to me "conservative" is returning true, not false. >>> >>> >>> >>> OK >>> >>>> >>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); >>>> + worklist.safe_push (name); >>>> >>>> but for loops we can just continue and ignore this use. And >>>> bitmap_set_bit >>>> returns whether it set a bit, thus >>>> >>>> if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION >>>> (name))) >>>> worklist.safe_push (name); >>>> >>>> should work? >>> >>> >>> >>> Fixed. >>> >>>> >>>> + /* Check that any SSA names used to define NAME is also fully >>>> + defined. */ >>>> + use_operand_p use_p; >>>> + ssa_op_iter iter; >>>> + FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE) >>>> + { >>>> + name = USE_FROM_PTR (use_p); >>>> + if (TREE_CODE (name) == SSA_NAME) >>>> >>>> always true. >>>> >>>> + { >>>> + /* If an SSA has already been seen, this may be a loop. >>>> + Fail conservatively. */ >>>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) >>>> + return false; >>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); >>>> + worklist.safe_push (name); >>>> >>>> See above. >>>> >>>> In principle the thing is sound but I'd like to be able to pass in a >>>> bitmap of >>>> known maybe-undefined/must-defined SSA names to have a cache for >>>> multiple invocations from within a pass (like this unswitching case). >>> >>> >>> >>> Done, though bitmaps are now calculated as part of the instantiation. >>> >>>> >>>> Also once you hit defs that are in a post-dominated region of the loop >>>> entry >>>> you can treat them as not undefined (as their use invokes undefined >>>> behavior). This is also how you treat function parameters (well, >>>> ssa_undefined_value_p does), where the call site invokes undefined >>>> behavior >>>> when passing in undefined values. So we need an extra parameter >>>> specifying >>>> the post-dominance region. >>> >>> >>> >>> Done. >>> >>>> >>>> You do not handle memory or calls conservatively which means the >>>> existing >>>> testcase only needs some obfuscation to become a problem again. To fix >>>> that before /* Check that any SSA names used to define NAME is also >>>> fully >>>> defined. */ bail out conservatively, like >>>> >>>> if (! is_gimple_assign (def) >>>> || gimple_assign_single_p (def)) >>>> return true; >>> >>> >>> >>> As I asked previously, I understand the !is_gimple_assign, which will >>> skip >>> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) >>> == >>> true"?? >>> >>> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't >>> we >>> want to follow up the def chain precisely on these? >>> >>> The attached implementation uses a cache, and a pre-computed >>> post-dominance >>> region. A previous incantation of this patch (sans the post-dominance >>> stuff) had performance within the noise of the previous implementation. >>> >>> I am testing again, and will do some performance comparisons in a bit, >>> but >>> for now-- are we on the same page here? Is this what you wanted? >> >> >> + /* DEFs in the post-dominated region of the loop entry invoke >> + undefined behavior. Adding any use won't make things any >> + worse. */ >> + for (unsigned i = 0; i < postdom.length (); ++i) >> + if (def->bb == postdom[i]) >> >> gimple_bb (def) >> >> + { >> + set_defined (t); >> + return false; >> + } >> >> I think you can't really return false here but need to continue processing >> the worklist. I'd say rather than a linear walk you can as well use >> dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry >> block instead? >> >> Note that the way you compute post-dominators doesn't work -- nothing >> keeps them up-to-date when unswitching does CFG manipulations :/ >> Fortunately we have a way to compute them per region, see how >> tree-if-conv.c >> does that (build_region, calculate_dominance_info_for_region). > > > If I understand correctly, we could compute them per region in > tree_unswitch_single_loop() for each individual loop with what > tree-if-conv.c uses. > > I could certainly abstract build_region/calculate_dominance_info_for_region > into something new, say calculate_dominance_info_for_loop_region(). Then we > could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you > suggest. > > Question... computing the post-dom tree per region as I've just described > will only create post-dom information for the loop region, which means that > any definition outside of the loop will have zero dominance info.
Yes. > What if the definition in question post-dominates the loop entry but is > outside/after of the loop? How would we ever arrive at such def? Once we reach a def before the region/loop we know it's fine to use (the missing "stop" point I pointed out). > We will have no post-dom information for this. > In this case, could we just ignore and continue evaluating the SSA_NAME with > the rest of our heuristics, or did you have something else in mind? Another > option would be to calculate the post-dom information for the entire > function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)), but > that seems rather expensive. > > As a side note, at this point I just want to fix/close the PR in an > acceptable manner, not come up with the end-all catch-all most-efficient > patch for an unlikely scenario ;-). Yeah, which is why I wonder whether the caching is worth the trouble (in it's current form) for the unswitching use (given it's other restrictions on conditions to unswitch). Richard. > Thanks for taking the time to review and offer suggestions here. > > Aldy