@Richi: PING^1
On 7/16/19 8:35 AM, Li Jia He wrote:
>
>
> On 2019/7/2 4:51 PM, Richard Biener wrote:
>> On Tue, 2 Jul 2019, Richard Biener wrote:
>>
>>> On Tue, 2 Jul 2019, Li Jia He wrote:
>>>
>>>>
>>>>
>>>> On 2019/7/1 3:30 PM, Richard Biener wrote:
>>>>> On Fri, 28 Jun 2019, Andrew Pinski wrote:
>>>>>
>>>>>> On Thu, Jun 27, 2019 at 9:55 PM Li Jia He <heli...@linux.ibm.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2019/6/27 11:48 PM, Jeff Law wrote:
>>>>>>>> On 6/27/19 12:11 AM, Li Jia He wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> According to the optimizable case described by Qi Feng on
>>>>>>>>> issue 88784, we can combine the cases into the following:
>>>>>>>>>
>>>>>>>>> 1. x > y && x != XXX_MIN --> x > y
>>>>>>>>> 2. x > y && x == XXX_MIN --> false
>>>>>>>>> 3. x <= y && x == XXX_MIN --> x == XXX_MIN
>>>>>>>>>
>>>>>>>>> 4. x < y && x != XXX_MAX --> x < y
>>>>>>>>> 5. x < y && x == XXX_MAX --> false
>>>>>>>>> 6. x >= y && x == XXX_MAX --> x == XXX_MAX
>>>>>>>>>
>>>>>>>>> 7. x > y || x != XXX_MIN --> x != XXX_MIN
>>>>>>>>> 8. x <= y || x != XXX_MIN --> true
>>>>>>>>> 9. x <= y || x == XXX_MIN --> x <= y
>>>>>>>>>
>>>>>>>>> 10. x < y || x != XXX_MAX --> x != UXXX_MAX
>>>>>>>>> 11. x >= y || x != XXX_MAX --> true
>>>>>>>>> 12. x >= y || x == XXX_MAX --> x >= y
>>>>>>>>>
>>>>>>>>> Note: XXX_MIN represents the minimum value of type x.
>>>>>>>>> XXX_MAX represents the maximum value of type x.
>>>>>>>>>
>>>>>>>>> Here we don't need to care about whether the operation is
>>>>>>>>> signed or unsigned. For example, in the below equation:
>>>>>>>>>
>>>>>>>>> 'x > y && x != XXX_MIN --> x > y'
>>>>>>>>>
>>>>>>>>> If the x type is signed int and XXX_MIN is INT_MIN, we can
>>>>>>>>> optimize it to 'x > y'. However, if the type of x is unsigned
>>>>>>>>> int and XXX_MIN is 0, we can still optimize it to 'x > y'.
>>>>>>>>>
>>>>>>>>> The regression testing for the patch was done on GCC mainline on
>>>>>>>>>
>>>>>>>>> powerpc64le-unknown-linux-gnu (Power 9 LE)
>>>>>>>>>
>>>>>>>>> with no regressions. Is it OK for trunk ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Lijia He
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog
>>>>>>>>>
>>>>>>>>> 2019-06-27 Li Jia He <heli...@linux.ibm.com>
>>>>>>>>> Qi Feng <ffen...@linux.ibm.com>
>>>>>>>>>
>>>>>>>>> PR middle-end/88784
>>>>>>>>> * gimple-fold.c (and_comparisons_contain_equal_operands): New
>>>>>>>>> function.
>>>>>>>>> (and_comparisons_1): Use
>>>>>>>>> and_comparisons_contain_equal_operands.
>>>>>>>>> (or_comparisons_contain_equal_operands): New function.
>>>>>>>>> (or_comparisons_1): Use or_comparisons_contain_equal_operands.
>>>>>>>> Would this be better done via match.pd? ISTM this transformation
>>>>>>>> would
>>>>>>>> be well suited for that framework.
>>>>>>>
>>>>>>> Hi, Jeff
>>>>>>>
>>>>>>> I did this because of the following test case:
>>>>>>> `
>>>>>>> _Bool comp(unsigned x, unsigned y)
>>>>>>> {
>>>>>>> return x > y && x != 0;
>>>>>>> }
>>>>>>> `
>>>>>>> The gimple file dumped on the power platform is:
>>>>>>> `
>>>>>>> comp (unsigned int x, unsigned int y)
>>>>>>> {
>>>>>>> _Bool D.2837;
>>>>>>> int iftmp.0;
>>>>>>>
>>>>>>> if (x > y) goto <D.2841>; else goto <D.2839>;
>>>>>>> <D.2841>:
>>>>>>> if (x != 0) goto <D.2842>; else goto <D.2839>;
>>>>>>> <D.2842>:
>>>>>>> iftmp.0 = 1;
>>>>>>> goto <D.2840>;
>>>>>>> <D.2839>:
>>>>>>> iftmp.0 = 0;
>>>>>>> <D.2840>:
>>>>>>> D.2837 = (_Bool) iftmp.0;
>>>>>>> return D.2837;
>>>>>>> }
>>>>>>> `
>>>>>>> However, the gimple file dumped on x86 is
>>>>>>> `
>>>>>>> comp (unsigned int x, unsigned int y)
>>>>>>> {
>>>>>>> _Bool D.2837;
>>>>>>>
>>>>>>> _1 = x > y;
>>>>>>> _2 = x != 0;
>>>>>>> _3 = _1 & _2;
>>>>>>> _4 = (int) _3;
>>>>>>> D.2837 = (_Bool) _4;
>>>>>>> return D.2837;
>>>>>>> }
>>>>>>> `
>>>>>>>
>>>>>>> The reason for the inconsistency between these two behaviors is param
>>>>>>> logical-op-non-short-circuit. If we add the pattern to the match.pd
>>>>>>> file, we can only optimize the situation in which the statement is in
>>>>>>> the same basic block (logical-op-non-short-circuit=1, x86). But for
>>>>>>> a cross-basic block (logical-op-non-short-circuit=0, power), match.pd
>>>>>>> can't handle this situation.
>>>>>>>
>>>>>>> Another reason is that I found out maybe_fold_and_comparisons and
>>>>>>> maybe_fold_or_comparisons are not only called by ifcombine pass but
>>>>>>> also by reassoc pass. Using this method can basically unify param
>>>>>>> logical-op-non-short-circuit=0 or 1.
>>>>>>
>>>>>>
>>>>>> As mentioned before ifcombine pass should be using gimple-match
>>>>>> instead of fold_build. Try converting ifcombine over to gimple-match
>>>>>> infrastructure and add these to match.pd.
>>>>>
>>>>> Yes, I mentioned that in the PR. The issue is that at the moment
>>>>> to combine x > y with x <= y you'd have to build GENERIC trees
>>>>> for both or temporary GIMPLE assign with a SSA def (and then feed
>>>>> that into the GENERIC or GIMPLE match.pd path).
>>>>
>>>> Hi,
>>>>
>>>> I did some experimentation using ‘temporary GIMPLE with a SSA def (and then
>>>> feed that into the GIMPLE match.pd path’. Could we consider the code in
>>>> the
>>>> attachment(I did a test and the code took effect)?
>>>
>>> No, that's excessive overhead IMHO - the whole point of these functions
>>> originally was to avoid build2 (...) on both conditionals. If we
>>> now create two SSA names and two GIMPLE statements that's too much.
>>>
>>> As said it shouldn't be rocket science to teach genmatch to
>>> auto-generate those functions from match.pd patterns but it certainly
>>> is some work (less so for somebody with genmatch knowledge).
>>> I'll give it a poke...
>>
>> OK, it's not so easy. I guess we could lower the cost of building
>> SSA names / gimple stmts significantly if we allocated them on the
>> stack like via
>>
>> gimple *stmt1 = XALLOCAVEC (char, gimple_size (GIMPLE_ASSIGN) + 2 *
>> sizeof (tree));
>> memset (stmt1, 0, ...);
>> gimple_set_code (stmt1, GIMPLE_ASSIGN);
>> gimple_set_num_ops (stmt1, 3);
>> gimple_init_singleton (stmt1);
>>
>> gimple_stmt_iterator gsi = gsi_for_stmt (stmt1);
>> gimple_assign_set_rhs_with_ops (&gsi, actual operation...);
>>
>> tree lhs1 = XALLOCA (tree_ssa_name);
>> memset (lhs1, 0, sizeof (tree_ssa_name));
>> TREE_SET_CODE (lhs1, SSA_NAME);
>> TREE_TYPE (lhs1) = ...;
>>
>> gimple_assing_set_lhs (stmt1, lhs1);
>>
>> it's all a bit ugly and some factoring in the generic gimple machinery
>> might easen this kind of hacks, but well...
>>
>> With the above you could then use
>>
>> gimple_match_op op (gimple_match_cond::UNCOND, BIT_AND_EXPR /* or OR */,
>> boolean_type_node, lhs1, lhs2);
>> if (gimple_resimplify2 (NULL, &op, follow_all_ssa_edges))
>> ... successfully simplified into 'op' ...
>>
>> with the simplification path then extracting the simplified result
>> from 'op'. Note this deliberately leaves out passing a gimple_seq
>> as storage for more complex simplification results to match
>> existing behavior.
>>
>> Your patch has the GC allocation and SSA name (and namespace!)
>> allocation overhead and most definitely misses releasing the
>> SSA names you allocated.
>>
>> Note with the above hack the simplified result has to be checked
>> for mentions of lhs1 or lhs2 - a case we have to reject because
>> their definitions are transitional.
>>
>
> Hi,
>
> I made some changes based on the recommendations. Would you like to
> help me to see it again ? Sorry, it took so long time to provide the
> patch.
>
> Note: 1. I keep the code for and_comparisons_1 and or_comparisons_1.
> The reason is that I did not found a good way to handle the
> optimization of '((x CODE1 y) AND (x CODE2 y))' in match.pd.
> Maybe I missing some important information about match.pd.
> 2. The gimple_resimplify2 function is not used. Since stmt1,
> stmt2, lhs1 and lhs2 are allocated on the stack, Is there a
> question with the value on the stack as the return value ?
> I may have misunderstood Richard's intention.
>
> Thanks,
> Lijia He
>
>> Richard.
>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Lijia He
>>>>
>>>>>
>>>>> maybe_fold_and/or_comparisons handle two exploded binary expressions
>>>>> while the current match.pd entries handle at most one exploded one
>>>>> (the outermost then, either AND or OR). But it would be definitely
>>>>> doable to auto-generate maybe_fold_and/or_comparisons from match.pd
>>>>> patterns which is what I'd ultimatively suggest to do (in some more
>>>>> generalized form maybe). Either with a separate genmatch invocation
>>>>> or as part of the --gimple processing (not sure what is more feasible
>>>>> here).
>>>>>
>>>>> I told Li Jia He that I don't expect him to do this work.
>>>>>
>>>>> Note I didn't review the actual patch yet.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>
>>>
>>>
>>