Hi Bin, On 25 June 2018 at 13:56, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah > <kugan.vivekanandara...@linaro.org> wrote: >> Hi Bin, >> >> Thanks for your comments. >> >> On 25 June 2018 at 11:15, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah >>> <kugan.vivekanandara...@linaro.org> wrote: >>>> When we set niter with maybe_zero, currently final_value_relacement >>>> will not happen due to expression_expensive_p not handling. Patch 1 >>>> adds this. >>>> >>>> With that we have the following optimized gimple. >>>> >>>> <bb 2> [local count: 118111601]: >>>> if (b_4(D) != 0) >>>> goto <bb 3>; [89.00%] >>>> else >>>> goto <bb 4>; [11.00%] >>>> >>>> <bb 3> [local count: 105119324]: >>>> _2 = (unsigned long) b_4(D); >>>> _9 = __builtin_popcountl (_2); >>>> c_3 = b_4(D) != 0 ? _9 : 1; >>>> >>>> <bb 4> [local count: 118111601]: >>>> # c_12 = PHI <c_3(3), 0(2)> >>>> >>>> I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the >>> No, it doesn't make much sense. when b_4(D) == 0, the popcount >>> computed should be 0. Point is you can never get b_4(D) == 0 with >>> guard condition in basic block 2. So the result should simply be: >> >> When we do calculate niter (for the copy header case), we set >> may_be_zero as (which I think is correct) >> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, >> build_zero_cst >> (TREE_TYPE (src))); >> >> Then in final_value_replacement_loop (struct loop *loop) >> >> for the PHI stmt for which we are going to do the final value replacement, >> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC. >> >> then we do >> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn) >> >> where when we do chrec_apply to the polynomial_chrec with niter from >> popcount which also has the may_be_zero, we end up with the 1. >> Looking at this, I am not sure if this is wrong. May be I am missing >> something. > I think it is wrong. How could you get popcount == 1 when b_4(D) == > 0? Though it never happens in this case.
We dont set popcount = 1. When we set niter for popcount pattern with niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, build_zero_cst (TREE_TYPE (src))); Because of which, we have an niter in the final_value_replacement, we have (gdb) p debug_tree (niter) <cond_expr 0x7ffff6a76a80 type <integer_type 0x7ffff694d1f8 public unsigned DI size <integer_cst 0x7ffff692dcf0 constant 64> unit-size <integer_cst 0x7ffff692dd08 constant 8> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff694d1f8 precision:64 min <integer_cst 0x7ffff694a120 0> max <integer_cst 0x7ffff692e5c0 18446744073709551615>> arg:0 <ne_expr 0x7ffff6a80910 type <boolean_type 0x7ffff6945b28 _Bool public unsigned QI size <integer_cst 0x7ffff692dde0 constant 8> unit-size <integer_cst 0x7ffff692ddf8 constant 1> align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6945b28 precision:1 min <integer_cst 0x7ffff694a048 0> max <integer_cst 0x7ffff694a078 1>> arg:0 <ssa_name 0x7ffff6937a68 type <integer_type 0x7ffff6945738 long int> visited var <parm_decl 0x7ffff6a79000 b> def_stmt GIMPLE_NOPvolu version:4> arg:1 <integer_cst 0x7ffff6a64720 constant 0>> arg:1 <plus_expr 0x7ffff6a808c0 type <integer_type 0x7ffff694d1f8> arg:0 <nop_expr 0x7ffff6a883c0 type <integer_type 0x7ffff694d1f8> arg:0 <call_expr 0x7ffff69396c8 type <integer_type 0x7ffff69455e8 int> fn <addr_expr 0x7ffff6a883a0 type <pointer_type 0x7ffff6a55888> readonly constant arg:0 <function_decl 0x7ffff69ff600 __builtin_popcountl>> arg:0 <nop_expr 0x7ffff6a88380 type <integer_type 0x7ffff694d1f8> arg:0 <ssa_name 0x7ffff6937a68>>>> arg:1 <integer_cst 0x7ffff692e5c0 constant 18446744073709551615>> arg:2 <integer_cst 0x7ffff694a120 type <integer_type 0x7ffff694d1f8> constant 0>> Then from there then we do compute_overall_effect_of_inner_loop for scalar evolution of PHI with niter we get the 1. >> >> In this testcase, before we enter the loop we have a check for (b_4(D) >>> 0). Thus, setting niter->may_be_zero is not strictly necessary but >> conservatively correct (?). > Yes, but not necessarily. Setting maybe_zero could confuse following > optimizations and we should avoid doing that whenever possible. If > any pass goes wrong because it's not set conservatively, it is that > pass' responsibility and should be fixed accordingly. Here IMHO, we > don't need to set it. My patch 2 is for not setting this when we know know a_4(D) is not zero in this path. Thanks, Kugan > > Thanks, > bin >> >> Thanks, >> Kugan >> >>> >>>> <bb 2> [local count: 118111601]: >>>> if (b_4(D) != 0) >>>> goto <bb 3>; [89.00%] >>>> else >>>> goto <bb 4>; [11.00%] >>>> >>>> <bb 3> [local count: 105119324]: >>>> _2 = (unsigned long) b_4(D); >>>> c_3 = __builtin_popcountl (_2); >>>> >>>> <bb 4> [local count: 118111601]: >>>> # c_12 = PHI <c_3(3), 0(2)> >>> >>> I think this is the code generated if maybe_zero is not set? which it >>> should not be set here. >>> For the same reason, it can be further optimized into: >>> >>>> <bb 2> [local count: 118111601]: >>>> _2 = (unsigned long) b_4(D); >>>> c_12 = __builtin_popcountl (_2); >>>> >>> >>>> latch execute zero times for b_4 == 0 means that the body will execute >>>> ones. >>> You never get zero times latch here with the aforementioned guard condition. >>> >>> BTW, I didn't look at following patches which could be wanted optimizations. >>> >>> Thanks, >>> bin >>>> >>>> The issue here is, since we are checking if (b_4(D) != 0) before >>>> entering the loop means we don't need to set maybe_zero. Patch 2 >>>> handles this. >>>> >>>> With that we have >>>> <bb 2> [local count: 118111601]: >>>> if (b_4(D) != 0) >>>> goto <bb 3>; [89.00%] >>>> else >>>> goto <bb 4>; [11.00%] >>>> >>>> <bb 3> [local count: 105119324]: >>>> _2 = (unsigned long) b_4(D); >>>> _9 = __builtin_popcountl (_2); >>>> >>>> <bb 4> [local count: 118111601]: >>>> # c_12 = PHI <0(2), _9(3)> >>>> >>>> As advised earlier, patch 3 adds phiopt support to remove this. >>>> >>>> Bootstrap and regression testing are ongoing. >>>> >>>> Is this OK for trunk. >>>> >>>> Thanks, >>>> Kugan