On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez <al...@redhat.com> wrote:
> On 01/04/2017 07:11 AM, Richard Biener wrote:
>>
>> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <al...@redhat.com> wrote:
>>>
>>> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>>>
>>>>
>>>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <al...@redhat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi folks.
>>>>>
>>>>> This is a follow-up on Jeff and Richi's interaction on the
>>>>> aforementioned
>>>>> PR
>>>>> here:
>>>>>
>>>>>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>>>
>>>>> I decided to explore the idea of analyzing may-undefness on-demand,
>>>>> which
>>>>> actually looks rather cheap.
>>>>>
>>>>> BTW, I don't understand why we don't have auto_bitmap's, as we already
>>>>> have
>>>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's
>>>>> code
>>>>> we
>>>>> already have.
>>>>>
>>>>> The attached patch fixes the bug without introducing any regressions.
>>>>>
>>>>> I also tested the patch by compiling 242 .ii files with -O3.  These
>>>>> were
>>>>> gathered from a stage1 build with -save-temps.  There is a slight time
>>>>> degradation of 4 seconds within 27 minutes of user time:
>>>>>
>>>>>         tainted:        26:52
>>>>>         orig:           26:48
>>>>>
>>>>> This was the average aggregate time of two runs compiling all 242 .ii
>>>>> files.
>>>>> IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?
>>>>
>>>>
>>>>
>>>> +  while (!worklist.is_empty ())
>>>> +    {
>>>> +      name = worklist.pop ();
>>>> +      gcc_assert (TREE_CODE (name) == SSA_NAME);
>>>> +
>>>> +      if (ssa_undefined_value_p (name, true))
>>>> +       return true;
>>>> +
>>>> +      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>>
>>>> it should be already set as we use visited_ssa as "was it ever on the
>>>> worklist",
>>>> so maybe renaming it would be a good thing as well.
>>>
>>>
>>>
>>> I don't understand what you would prefer here.
>>
>>
>> Set the bit when you put name on the worklist (and do not do that if the
>> bit is set).  Thus simply remove the above and add a bitmap_set_bit
>> for the initial name you put on the worklist.
>>
>>>>
>>>> +             if (TREE_CODE (name) == SSA_NAME)
>>>> +               {
>>>> +                 /* If an SSA has already been seen, this may be a
>>>> loop.
>>>> +                    Fail conservatively.  */
>>>> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
>>>> (name)))
>>>> +                   return false;
>>>>
>>>> so to me "conservative" is returning true, not false.
>>>
>>>
>>>
>>> OK
>>>
>>>>
>>>> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>> +                 worklist.safe_push (name);
>>>>
>>>> but for loops we can just continue and ignore this use.  And
>>>> bitmap_set_bit
>>>> returns whether it set a bit, thus
>>>>
>>>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>>>> (name)))
>>>>                   worklist.safe_push (name);
>>>>
>>>> should work?
>>>
>>>
>>>
>>> Fixed.
>>>
>>>>
>>>> +      /* Check that any SSA names used to define NAME is also fully
>>>> +        defined.  */
>>>> +      use_operand_p use_p;
>>>> +      ssa_op_iter iter;
>>>> +      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>>>> +       {
>>>> +         name = USE_FROM_PTR (use_p);
>>>> +         if (TREE_CODE (name) == SSA_NAME)
>>>>
>>>> always true.
>>>>
>>>> +           {
>>>> +             /* If an SSA has already been seen, this may be a loop.
>>>> +                Fail conservatively.  */
>>>> +             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>>>> +               return false;
>>>> +             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>>> +             worklist.safe_push (name);
>>>>
>>>> See above.
>>>>
>>>> In principle the thing is sound but I'd like to be able to pass in a
>>>> bitmap of
>>>> known maybe-undefined/must-defined SSA names to have a cache for
>>>> multiple invocations from within a pass (like this unswitching case).
>>>
>>>
>>>
>>> Done, though bitmaps are now calculated as part of the instantiation.
>>>
>>>>
>>>> Also once you hit defs that are in a post-dominated region of the loop
>>>> entry
>>>> you can treat them as not undefined (as their use invokes undefined
>>>> behavior).  This is also how you treat function parameters (well,
>>>> ssa_undefined_value_p does), where the call site invokes undefined
>>>> behavior
>>>> when passing in undefined values.  So we need an extra parameter
>>>> specifying
>>>> the post-dominance region.
>>>
>>>
>>>
>>> Done.
>>>
>>>>
>>>> You do not handle memory or calls conservatively which means the
>>>> existing
>>>> testcase only needs some obfuscation to become a problem again.  To fix
>>>> that before /* Check that any SSA names used to define NAME is also
>>>> fully
>>>> defined.  */ bail out conservatively, like
>>>>
>>>>    if (! is_gimple_assign (def)
>>>>       || gimple_assign_single_p (def))
>>>>     return true;
>>>
>>>
>>>
>>> As I asked previously, I understand the !is_gimple_assign, which will
>>> skip
>>> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def)
>>> ==
>>> true"??
>>>
>>> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't
>>> we
>>> want to follow up the def chain precisely on these?
>>>
>>> The attached implementation uses a cache, and a pre-computed
>>> post-dominance
>>> region.  A previous incantation of this patch (sans the post-dominance
>>> stuff) had performance within the noise of the previous implementation.
>>>
>>> I am testing again, and will do some performance comparisons in a bit,
>>> but
>>> for now-- are we on the same page here?  Is this what you wanted?
>>
>>
>> +      /* DEFs in the post-dominated region of the loop entry invoke
>> +        undefined behavior.  Adding any use won't make things any
>> +        worse.  */
>> +      for (unsigned i = 0; i < postdom.length (); ++i)
>> +       if (def->bb == postdom[i])
>>
>> gimple_bb (def)
>>
>> +         {
>> +           set_defined (t);
>> +           return false;
>> +         }
>>
>> I think you can't really return false here but need to continue processing
>> the worklist.  I'd say rather than a linear walk you can as well use
>> dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
>> block instead?
>>
>> Note that the way you compute post-dominators doesn't work -- nothing
>> keeps them up-to-date when unswitching does CFG manipulations :/
>> Fortunately we have a way to compute them per region, see how
>> tree-if-conv.c
>> does that (build_region, calculate_dominance_info_for_region).
>
>
> If I understand correctly, we could compute them per region in
> tree_unswitch_single_loop() for each individual loop with what
> tree-if-conv.c uses.
>
> I could certainly abstract build_region/calculate_dominance_info_for_region
> into something new, say calculate_dominance_info_for_loop_region().  Then we
> could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you
> suggest.
>
> Question... computing the post-dom tree per region as I've just described
> will only create post-dom information for the loop region, which means that
> any definition outside of the loop will have zero dominance info.

Yes.

> What if the definition in question post-dominates the loop entry but is
> outside/after of the loop?

How would we ever arrive at such def?  Once we reach a def before the
region/loop
we know it's fine to use (the missing "stop" point I pointed out).

>  We will have no post-dom information for this.
> In this case, could we just ignore and continue evaluating the SSA_NAME with
> the rest of our heuristics, or did you have something else in mind?  Another
> option would be to calculate the post-dom information for the entire
> function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)), but
> that seems rather expensive.
>
> As a side note, at this point I just want to fix/close the PR in an
> acceptable manner, not come up with the end-all catch-all most-efficient
> patch for an unlikely scenario ;-).

Yeah, which is why I wonder whether the caching is worth the trouble (in it's
current form) for the unswitching use (given it's other restrictions
on conditions
to unswitch).

Richard.

> Thanks for taking the time to review and offer suggestions here.
>
> Aldy

Reply via email to