Hi Bin,

On 25 June 2018 at 13:56, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah
> <kugan.vivekanandara...@linaro.org> wrote:
>> Hi Bin,
>>
>> Thanks for your comments.
>>
>> On 25 June 2018 at 11:15, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>> When we set niter with maybe_zero, currently final_value_relacement
>>>> will not happen due to expression_expensive_p not handling. Patch 1
>>>> adds this.
>>>>
>>>> With that we have the following optimized gimple.
>>>>
>>>>   <bb 2> [local count: 118111601]:
>>>>   if (b_4(D) != 0)
>>>>     goto <bb 3>; [89.00%]
>>>>   else
>>>>     goto <bb 4>; [11.00%]
>>>>
>>>>   <bb 3> [local count: 105119324]:
>>>>   _2 = (unsigned long) b_4(D);
>>>>   _9 = __builtin_popcountl (_2);
>>>>   c_3 = b_4(D) != 0 ? _9 : 1;
>>>>
>>>>   <bb 4> [local count: 118111601]:
>>>>   # c_12 = PHI <c_3(3), 0(2)>
>>>>
>>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
>>> No, it doesn't make much sense.  when b_4(D) == 0, the popcount
>>> computed should be 0.  Point is you can never get b_4(D) == 0 with
>>> guard condition in basic block 2.  So the result should simply be:
>>
>> When we do  calculate niter (for the copy header case), we set
>> may_be_zero as (which I think is correct)
>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
>>                   build_zero_cst
>>                   (TREE_TYPE (src)));
>>
>> Then in final_value_replacement_loop (struct loop *loop)
>>
>> for the PHI stmt for which we are going to do the final value replacement,
>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.
>>
>> then we do
>> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)
>>
>> where when we do chrec_apply to the polynomial_chrec with niter from
>> popcount which also has the may_be_zero, we end up with the 1.
>> Looking at this, I am not sure if this is wrong. May be I am missing 
>> something.
> I think it is wrong.  How could you get popcount == 1 when b_4(D) ==
> 0?  Though it never happens in this case.

We dont set popcount = 1. When we set niter for popcount pattern with
niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
                  build_zero_cst (TREE_TYPE (src)));

Because of which, we have an niter in the final_value_replacement, we have
(gdb) p debug_tree (niter)
 <cond_expr 0x7ffff6a76a80
    type <integer_type 0x7ffff694d1f8 public unsigned DI
        size <integer_cst 0x7ffff692dcf0 constant 64>
        unit-size <integer_cst 0x7ffff692dd08 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff694d1f8 precision:64 min <integer_cst
0x7ffff694a120 0> max <integer_cst 0x7ffff692e5c0
18446744073709551615>>

    arg:0 <ne_expr 0x7ffff6a80910
        type <boolean_type 0x7ffff6945b28 _Bool public unsigned QI
            size <integer_cst 0x7ffff692dde0 constant 8>
            unit-size <integer_cst 0x7ffff692ddf8 constant 1>
            align:8 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff6945b28 precision:1 min <integer_cst
0x7ffff694a048 0> max <integer_cst 0x7ffff694a078 1>>

        arg:0 <ssa_name 0x7ffff6937a68 type <integer_type
0x7ffff6945738 long int>
            visited var <parm_decl 0x7ffff6a79000 b>
            def_stmt GIMPLE_NOPvolu
            version:4>
        arg:1 <integer_cst 0x7ffff6a64720 constant 0>>
    arg:1 <plus_expr 0x7ffff6a808c0 type <integer_type 0x7ffff694d1f8>

        arg:0 <nop_expr 0x7ffff6a883c0 type <integer_type 0x7ffff694d1f8>

            arg:0 <call_expr 0x7ffff69396c8 type <integer_type
0x7ffff69455e8 int>

                fn <addr_expr 0x7ffff6a883a0 type <pointer_type 0x7ffff6a55888>
                    readonly constant arg:0 <function_decl
0x7ffff69ff600 __builtin_popcountl>>
                arg:0 <nop_expr 0x7ffff6a88380 type <integer_type
0x7ffff694d1f8>
                    arg:0 <ssa_name 0x7ffff6937a68>>>>
        arg:1 <integer_cst 0x7ffff692e5c0 constant 18446744073709551615>>
    arg:2 <integer_cst 0x7ffff694a120 type <integer_type
0x7ffff694d1f8> constant 0>>

Then from there then we do compute_overall_effect_of_inner_loop for
scalar evolution of PHI with niter we get the 1.

>>
>> In this testcase, before we enter the loop we have a check for (b_4(D)
>>> 0). Thus, setting niter->may_be_zero is not strictly necessary but
>> conservatively correct (?).
> Yes, but not necessarily.  Setting maybe_zero could confuse following
> optimizations and we should avoid doing that whenever possible.  If
> any pass goes wrong because it's not set conservatively, it is that
> pass' responsibility and should be fixed accordingly.  Here IMHO, we
> don't need to set it.

My patch 2 is for not setting this when we know know a_4(D) is not
zero in this path.

Thanks,
Kugan




>
> Thanks,
> bin
>>
>> Thanks,
>> Kugan
>>
>>>
>>>>   <bb 2> [local count: 118111601]:
>>>>   if (b_4(D) != 0)
>>>>     goto <bb 3>; [89.00%]
>>>>   else
>>>>     goto <bb 4>; [11.00%]
>>>>
>>>>   <bb 3> [local count: 105119324]:
>>>>   _2 = (unsigned long) b_4(D);
>>>>   c_3 = __builtin_popcountl (_2);
>>>>
>>>>   <bb 4> [local count: 118111601]:
>>>>   # c_12 = PHI <c_3(3), 0(2)>
>>>
>>> I think this is the code generated if maybe_zero is not set?  which it
>>> should not be set here.
>>> For the same reason, it can be further optimized into:
>>>
>>>>   <bb 2> [local count: 118111601]:
>>>>   _2 = (unsigned long) b_4(D);
>>>>   c_12 = __builtin_popcountl (_2);
>>>>
>>>
>>>> latch execute zero times for b_4 == 0 means that the body will execute
>>>> ones.
>>> You never get zero times latch here with the aforementioned guard condition.
>>>
>>> BTW, I didn't look at following patches which could be wanted optimizations.
>>>
>>> Thanks,
>>> bin
>>>>
>>>> The issue here is, since we are checking if (b_4(D) != 0) before
>>>> entering the loop means we don't need to set maybe_zero. Patch 2
>>>> handles this.
>>>>
>>>> With that we have
>>>>   <bb 2> [local count: 118111601]:
>>>>   if (b_4(D) != 0)
>>>>     goto <bb 3>; [89.00%]
>>>>   else
>>>>     goto <bb 4>; [11.00%]
>>>>
>>>>   <bb 3> [local count: 105119324]:
>>>>   _2 = (unsigned long) b_4(D);
>>>>   _9 = __builtin_popcountl (_2);
>>>>
>>>>   <bb 4> [local count: 118111601]:
>>>>   # c_12 = PHI <0(2), _9(3)>
>>>>
>>>> As advised earlier, patch 3 adds phiopt support to remove this.
>>>>
>>>> Bootstrap and regression testing are ongoing.
>>>>
>>>> Is this OK for trunk.
>>>>
>>>> Thanks,
>>>> Kugan

Reply via email to