Re: [PATCH][middle-end/88784] Middle end is missing some optimizations about unsigned

Martin Liška Fri, 30 Aug 2019 03:18:46 -0700
@Richi: PING^1

On 7/16/19 8:35 AM, Li Jia He wrote:
> 
> 
> On 2019/7/2 4:51 PM, Richard Biener wrote:
>> On Tue, 2 Jul 2019, Richard Biener wrote:
>>
>>> On Tue, 2 Jul 2019, Li Jia He wrote:
>>>
>>>>
>>>>
>>>> On 2019/7/1 3:30 PM, Richard Biener wrote:
>>>>> On Fri, 28 Jun 2019, Andrew Pinski wrote:
>>>>>
>>>>>> On Thu, Jun 27, 2019 at 9:55 PM Li Jia He <heli...@linux.ibm.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2019/6/27 11:48 PM, Jeff Law wrote:
>>>>>>>> On 6/27/19 12:11 AM, Li Jia He wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> According to the optimizable case described by Qi Feng on
>>>>>>>>> issue 88784, we can combine the cases into the following:
>>>>>>>>>
>>>>>>>>> 1. x >  y  &&  x != XXX_MIN  -->   x > y
>>>>>>>>> 2. x >  y  &&  x == XXX_MIN  -->   false
>>>>>>>>> 3. x <= y  &&  x == XXX_MIN  -->   x == XXX_MIN
>>>>>>>>>
>>>>>>>>> 4. x <  y  &&  x != XXX_MAX  -->   x < y
>>>>>>>>> 5. x <  y  &&  x == XXX_MAX  -->   false
>>>>>>>>> 6. x >= y  &&  x == XXX_MAX  -->   x == XXX_MAX
>>>>>>>>>
>>>>>>>>> 7. x >  y  ||  x != XXX_MIN  -->   x != XXX_MIN
>>>>>>>>> 8. x <= y  ||  x != XXX_MIN  -->   true
>>>>>>>>> 9. x <= y  ||  x == XXX_MIN  -->   x <= y
>>>>>>>>>
>>>>>>>>> 10. x <  y  ||  x != XXX_MAX  -->   x != UXXX_MAX
>>>>>>>>> 11. x >= y  ||  x != XXX_MAX  -->   true
>>>>>>>>> 12. x >= y  ||  x == XXX_MAX  -->   x >= y
>>>>>>>>>
>>>>>>>>> Note: XXX_MIN represents the minimum value of type x.
>>>>>>>>>          XXX_MAX represents the maximum value of type x.
>>>>>>>>>
>>>>>>>>> Here we don't need to care about whether the operation is
>>>>>>>>> signed or unsigned.  For example, in the below equation:
>>>>>>>>>
>>>>>>>>> 'x >  y  &&  x != XXX_MIN  -->   x > y'
>>>>>>>>>
>>>>>>>>> If the x type is signed int and XXX_MIN is INT_MIN, we can
>>>>>>>>> optimize it to 'x > y'.  However, if the type of x is unsigned
>>>>>>>>> int and XXX_MIN is 0, we can still optimize it to 'x > y'.
>>>>>>>>>
>>>>>>>>> The regression testing for the patch was done on GCC mainline on
>>>>>>>>>
>>>>>>>>>        powerpc64le-unknown-linux-gnu (Power 9 LE)
>>>>>>>>>
>>>>>>>>> with no regressions.  Is it OK for trunk ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Lijia He
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog
>>>>>>>>>
>>>>>>>>> 2019-06-27  Li Jia He  <heli...@linux.ibm.com>
>>>>>>>>>            Qi Feng  <ffen...@linux.ibm.com>
>>>>>>>>>
>>>>>>>>>        PR middle-end/88784
>>>>>>>>>        * gimple-fold.c (and_comparisons_contain_equal_operands): New
>>>>>>>>> function.
>>>>>>>>>        (and_comparisons_1): Use
>>>>>>>>> and_comparisons_contain_equal_operands.
>>>>>>>>>        (or_comparisons_contain_equal_operands): New function.
>>>>>>>>>        (or_comparisons_1): Use or_comparisons_contain_equal_operands.
>>>>>>>> Would this be better done via match.pd?  ISTM this transformation
>>>>>>>> would
>>>>>>>> be well suited for that framework.
>>>>>>>
>>>>>>> Hi, Jeff
>>>>>>>
>>>>>>> I did this because of the following test case:
>>>>>>> `
>>>>>>> _Bool comp(unsigned x, unsigned y)
>>>>>>> {
>>>>>>>      return x > y && x != 0;
>>>>>>> }
>>>>>>> `
>>>>>>> The gimple file dumped on the power platform is:
>>>>>>> `
>>>>>>> comp (unsigned int x, unsigned int y)
>>>>>>> {
>>>>>>>      _Bool D.2837;
>>>>>>>      int iftmp.0;
>>>>>>>
>>>>>>>      if (x > y) goto <D.2841>; else goto <D.2839>;
>>>>>>>      <D.2841>:
>>>>>>>      if (x != 0) goto <D.2842>; else goto <D.2839>;
>>>>>>>      <D.2842>:
>>>>>>>      iftmp.0 = 1;
>>>>>>>      goto <D.2840>;
>>>>>>>      <D.2839>:
>>>>>>>      iftmp.0 = 0;
>>>>>>>      <D.2840>:
>>>>>>>      D.2837 = (_Bool) iftmp.0;
>>>>>>>      return D.2837;
>>>>>>> }
>>>>>>> `
>>>>>>> However, the gimple file dumped on x86 is
>>>>>>> `
>>>>>>> comp (unsigned int x, unsigned int y)
>>>>>>> {
>>>>>>>      _Bool D.2837;
>>>>>>>
>>>>>>>      _1 = x > y;
>>>>>>>      _2 = x != 0;
>>>>>>>      _3 = _1 & _2;
>>>>>>>      _4 = (int) _3;
>>>>>>>      D.2837 = (_Bool) _4;
>>>>>>>      return D.2837;
>>>>>>> }
>>>>>>> `
>>>>>>>
>>>>>>> The reason for the inconsistency between these two behaviors is param
>>>>>>> logical-op-non-short-circuit.  If we add the pattern to the match.pd
>>>>>>> file, we can only optimize the situation in which the statement is in
>>>>>>> the same basic block (logical-op-non-short-circuit=1, x86).  But for
>>>>>>> a cross-basic block (logical-op-non-short-circuit=0, power), match.pd
>>>>>>> can't handle this situation.
>>>>>>>
>>>>>>> Another reason is that I found out maybe_fold_and_comparisons and
>>>>>>> maybe_fold_or_comparisons are not only called by ifcombine pass but
>>>>>>> also by reassoc pass. Using this method can basically unify param
>>>>>>> logical-op-non-short-circuit=0 or 1.
>>>>>>
>>>>>>
>>>>>> As mentioned before ifcombine pass should be using gimple-match
>>>>>> instead of fold_build.  Try converting ifcombine over to gimple-match
>>>>>> infrastructure and add these to match.pd.
>>>>>
>>>>> Yes, I mentioned that in the PR.  The issue is that at the moment
>>>>> to combine x > y with x <= y you'd have to build GENERIC trees
>>>>> for both or temporary GIMPLE assign with a SSA def (and then feed
>>>>> that into the GENERIC or GIMPLE match.pd path).
>>>>
>>>> Hi,
>>>>
>>>> I did some experimentation using ‘temporary GIMPLE with a SSA def (and then
>>>> feed that into the GIMPLE match.pd path’.  Could we consider the code in 
>>>> the
>>>> attachment(I did a test and the code took effect)?
>>>
>>> No, that's excessive overhead IMHO - the whole point of these functions
>>> originally was to avoid build2 (...) on both conditionals.  If we
>>> now create two SSA names and two GIMPLE statements that's too much.
>>>
>>> As said it shouldn't be rocket science to teach genmatch to
>>> auto-generate those functions from match.pd patterns but it certainly
>>> is some work (less so for somebody with genmatch knowledge).
>>> I'll give it a poke...
>>
>> OK, it's not so easy.  I guess we could lower the cost of building
>> SSA names / gimple stmts significantly if we allocated them on the
>> stack like via
>>
>> gimple *stmt1 = XALLOCAVEC (char, gimple_size (GIMPLE_ASSIGN) + 2 *
>> sizeof (tree));
>> memset (stmt1, 0, ...);
>> gimple_set_code (stmt1, GIMPLE_ASSIGN);
>> gimple_set_num_ops (stmt1, 3);
>> gimple_init_singleton (stmt1);
>>
>> gimple_stmt_iterator gsi = gsi_for_stmt (stmt1);
>> gimple_assign_set_rhs_with_ops (&gsi, actual operation...);
>>
>> tree lhs1 = XALLOCA (tree_ssa_name);
>> memset (lhs1, 0, sizeof (tree_ssa_name));
>> TREE_SET_CODE (lhs1, SSA_NAME);
>> TREE_TYPE (lhs1) = ...;
>>
>> gimple_assing_set_lhs (stmt1, lhs1);
>>
>> it's all a bit ugly and some factoring in the generic gimple machinery
>> might easen this kind of hacks, but well...
>>
>> With the above you could then use
>>
>>    gimple_match_op op (gimple_match_cond::UNCOND, BIT_AND_EXPR /* or OR */,
>>               boolean_type_node, lhs1, lhs2);
>>    if (gimple_resimplify2 (NULL, &op, follow_all_ssa_edges))
>>      ... successfully simplified into 'op' ...
>>
>> with the simplification path then extracting the simplified result
>> from 'op'.  Note this deliberately leaves out passing a gimple_seq
>> as storage for more complex simplification results to match
>> existing behavior.
>>
>> Your patch has the GC allocation and SSA name (and namespace!)
>> allocation overhead and most definitely misses releasing the
>> SSA names you allocated.
>>
>> Note with the above hack the simplified result has to be checked
>> for mentions of lhs1 or lhs2 - a case we have to reject because
>> their definitions are transitional.
>>
> 
> Hi,
> 
>   I made some changes based on the recommendations. Would you like to
>   help me to see it again ? Sorry, it took so long time to provide the
>   patch.
> 
>   Note: 1. I keep the code for and_comparisons_1 and or_comparisons_1.
>     The reason is that I did not found a good way to handle the
>     optimization of '((x CODE1 y) AND (x CODE2 y))' in match.pd.
>     Maybe I missing some important information about match.pd.
>     2. The gimple_resimplify2 function is not used.  Since stmt1,
>     stmt2, lhs1 and lhs2 are allocated on the stack, Is there a
>     question with the value on the stack as the return value ?
>     I may have misunderstood Richard's intention.
> 
> Thanks,
> Lijia He
> 
>> Richard.
>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Lijia He
>>>>
>>>>>
>>>>> maybe_fold_and/or_comparisons handle two exploded binary expressions
>>>>> while the current match.pd entries handle at most one exploded one
>>>>> (the outermost then, either AND or OR).  But it would be definitely
>>>>> doable to auto-generate maybe_fold_and/or_comparisons from match.pd
>>>>> patterns which is what I'd ultimatively suggest to do (in some more
>>>>> generalized form maybe).  Either with a separate genmatch invocation
>>>>> or as part of the --gimple processing (not sure what is more feasible
>>>>> here).
>>>>>
>>>>> I told Li Jia He that I don't expect him to do this work.
>>>>>
>>>>> Note I didn't review the actual patch yet.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>
>>>
>>>
>>
Re: [PATCH][middle-end/88784] Middle end is missing some optimizations about unsigned

Reply via email to