On Wed, Mar 7, 2018 at 8:26 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah >>> <kugan.vivekanandara...@linaro.org> wrote: >>>> Hi Richard, >>>> >>>> On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> >>>> wrote: >>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah >>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>> Hi Richard, >>>>>> >>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guent...@gmail.com> >>>>>> wrote: >>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah >>>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>>> Hi Richard, >>>>>>>> >>>>>>>> Thanks for the review. >>>>>>>> On 25 January 2018 at 20:04, Richard Biener >>>>>>>> <richard.guent...@gmail.com> wrote: >>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah >>>>>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I >>>>>>>>>> would like to queue this for review for next stage 1. >>>>>>>>>> >>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and >>>>>>>>>> above. >>>>>>>>>> 2. This does not distribute loop to detect popcount (like >>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please >>>>>>>>>> correct >>>>>>>>>> me if I am wrong. >>>>>>>>> >>>>>>>>> But then it has no business inside loop distribution but instead is >>>>>>>>> doing final value >>>>>>>>> replacement, right? You are pattern-matching the whole loop after >>>>>>>>> all. I think >>>>>>>>> final value replacement would already do the correct thing if you >>>>>>>>> teached number of >>>>>>>>> iteration analysis that niter for >>>>>>>>> >>>>>>>>> <bb 3> [local count: 955630224]: >>>>>>>>> # b_11 = PHI <b_5(5), b_8(6)> >>>>>>>>> _1 = b_11 + -1; >>>>>>>>> b_8 = _1 & b_11; >>>>>>>>> if (b_8 != 0) >>>>>>>>> goto <bb 6>; [89.00%] >>>>>>>>> else >>>>>>>>> goto <bb 8>; [11.00%] >>>>>>>>> >>>>>>>>> <bb 6> [local count: 850510900]: >>>>>>>>> goto <bb 3>; [100.00%] >>>>>>>> >>>>>>>> I am looking into this approach. What should be the scalar evolution >>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me >>>>>>>> and can this be represented with the scev? >>>>>>> >>>>>>> No, it's not affine and thus cannot be represented. You only need the >>>>>>> scalar evolution of the counting IV which is already handled and >>>>>>> the number of iteration analysis needs to handle the above IV - this >>>>>>> is the missing part. >>>>>> Thanks for the clarification. I am now matching this loop pattern in >>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions >>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in >>>>>> the loop preheater and setting the loop niter with this. This will be >>>>>> used by the final value replacement. Is this what you wanted? >>>>> >>>>> No, you shouldn't insert a popcount stmt but instead the niter >>>>> GENERIC tree should be a CALL_EXPR to popcount with the >>>>> appropriate argument. >>>> >>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if >>>> niter in tree_niter_desc can take such. >>>> >>>> Attached patch now does this. Also had to add support for CALL_EXPR in >>>> few places to handle niter with CALL_EXPR. Does this look OK? >>> >>> Overall this looks ok - the patch includes changes in places that I don't >>> think >>> need changes such as chrec_convert_1 or extract_ops_from_tree. >>> The expression_expensive_p change should be more specific than making >>> all calls inexpensive as well. >>> >>> The verify_ssa change looks bogus, you do >>> >>> + dest = gimple_phi_result (count_phi); >>> + tree var = make_ssa_name (TREE_TYPE (dest), NULL); >>> + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); >>> + >>> + var = build_call_expr (fn, 1, src); >>> + *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var, >>> + build_int_cst (TREE_TYPE (dest), 1)); >>> >>> why do you allocate a new SSA name here? It seems unused >>> as you overwrive 'var' with the CALL_EXPR immediately. >>> >>> I didn't review the pattern matching thoroughly nor the exact place you >>> call it. But >>> >>> + if (check_popcount_pattern (loop, &count)) >>> + { >>> + niter->assumptions = boolean_false_node; >>> + niter->control.base = NULL_TREE; >>> + niter->control.step = NULL_TREE; >>> + niter->control.no_overflow = false; >>> + niter->niter = count; >>> + niter->assumptions = boolean_true_node; >>> + niter->may_be_zero = boolean_false_node; >>> + niter->max = -1; >>> + niter->bound = NULL_TREE; >>> + niter->cmp = ERROR_MARK; >>> + return true; >>> + } >>> >>> simply setting may_be_zero to false looks fishy. Try >>> with -fno-tree-loop-ch. Also max should not be negative, >>> it should be the number of bits in the IV type? >>> >>> A related testcase could be that we can completely peel >>> a loop like the following which iterates at most 8 times: >>> >>> int a[8]; >>> void foo (unsigned char ctrl) >>> { >>> int c = 0; >>> while (ctrl) >>> { >>> ctrl = ctrl & (ctrl - 1); >>> a[c++] = ctrl; >>> } >>> } >>> >>> This is now stage1 material so please update and re-post. Maybe Bin has >>> further suggestions as well. >> Sorry for being late on this. Actually I thought about popcount in >> distribution before. More like the first patch, but handled as >> another distribution pattern rather than a special case. It's a bit >> strange to compute and store the info in niters. It's also not >> straight forward when/where the transformation is finally done. > > It's done at final value replacement if the counter is used. But with > it in place we should also be able to compute niters for the case > where it isn't (like the above case). So I believe that in the end handling > this in niter analysis is more powerful. Yes, you are right, as distribution pattern, we would only be able to handle standalone popcount loop.
Thanks, bin > >> I haven't looked into the details so not sure how appropriate it will >> be as a distribution pattern (current ones are only about data >> references). So I am okay with this version if it's more appropriate. > > Yeah, adding distribution patterns for scalar reduction forms is certainly > appropriate but then this is the same or at least similar as final > value replacement. > > Richard. > >> Thanks, >> bin >>> >>> Thanks, >>> Richard. >>> >>>> Thanks, >>>> Kugan >>>> >>>> >>>> gcc/ChangeLog: >>>> >>>> 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> >>>> >>>> * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR. >>>> * tree-chrec.c (chrec_convert_1): Likewise. >>>> * tree-scalar-evolution.c (expression_expensive_p): Likewise. >>>> * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise. >>>> * tree-ssa-loop-niter.c (check_popcount_pattern): New. >>>> (number_of_iterations_exit): Record niter for popcount patern. >>>> * tree-ssa.c (verify_ssa): Check stmt to be non NULL. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> >>>> >>>> * gcc.dg/tree-ssa/popcount.c: New test. >>>> >>>> >>>>> >>>>>> In general, there is also a condition gating the loop like >>>>>> >>>>>> if (b_4 != 0) >>>>>> goto loop; >>>>>> else >>>>>> end: >>>>>> >>>>>> This of course will not be removed by final value replacement. Since >>>>>> popcount (0) is defined, this is redundant and could be removed but >>>>>> not removed. >>>>> >>>>> Yeah, that's probably sth for another pass though. I suppose the >>>>> end: case just uses zero in which case you'll have a PHI and you >>>>> can optimize >>>>> >>>>> if (b != 0) >>>>> return popcount (b); >>>>> return 0; >>>>> >>>>> in phiopt. >>>>> >>>>> Richard. >>>>> >>>>>> Thanks, >>>>>> Kugan >>>>>> >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> Thanks, >>>>>>>> Kugan >>>>>>>>> >>>>>>>>> is related to popcount (b_5). >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new >>>>>>>>>> regressions. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Kugan >>>>>>>>>> >>>>>>>>>> gcc/ChangeLog: >>>>>>>>>> >>>>>>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>>>>>> >>>>>>>>>> PR middle-end/82479 >>>>>>>>>> * tree-loop-distribution.c (handle_popcount): New. >>>>>>>>>> (pass_loop_distribution::execute): Use handle_popcount. >>>>>>>>>> >>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>> >>>>>>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>>>>>> >>>>>>>>>> PR middle-end/82479 >>>>>>>>>> * gcc.dg/tree-ssa/popcount.c: New test.