On Fri, Jun 16, 2017 at 5:48 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> On Fri, 16 Jun 2017, Bin.Cheng wrote:
>
>> On Fri, Jun 16, 2017 at 5:16 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>>
>>> On June 16, 2017 3:31:32 PM GMT+02:00, "Bin.Cheng"
>>> <amker.ch...@gmail.com> wrote:
>>>>
>>>> On Fri, Jun 16, 2017 at 2:10 PM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>>
>>>>> On Fri, Jun 16, 2017 at 3:06 PM, Bin.Cheng <amker.ch...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Fri, Jun 16, 2017 at 11:49 AM, Richard Biener
>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>
>>>>>>> On Wed, Jun 14, 2017 at 3:07 PM, Bin Cheng <bin.ch...@arm.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> Loop split forces intermediate computations to gimple operands all
>>>>>>>> the time when computing bound information.  This is not good since
>>>>>>>> folding opportunities are missed.  This patch fixes the issue by
>>>>>>>> feeding all computations to the folder and only forcing to a
>>>>>>>> gimple operand at the end.
>>>>>>>>
>>>>>>>> Bootstrapped and tested on x86_64 and AArch64.  Is it OK?
>>>>>>>
>>>>>>> Hm?  It uses gimple_build (), which should do the same as
>>>>>>> fold_buildN in terms of simplification.
>>>>>>>
>>>>>>> So where does that not work?  It is supposed to be the preferred
>>>>>>> way, and no new code should use force_gimple_operand (unless
>>>>>>> dealing with GENERIC coming from other middle-end infrastructure
>>>>>>> like SCEV or niter analysis).
>>>>>>
>>>>>> Hmm, the current code calls force_gimple_operand several times,
>>>>>> which causes the inefficiency.  The patch avoids that and does one
>>>>>> call at the end.
>>>>>
>>>>> But it forces to the same sequence that is used for extending the
>>>>> expression, so folding should work.  Where do you see that it does
>>>>> not?  Note the code uses gimple_build (), not gimple_build_assign ().
>>>>
>>>> In spec2k6/hmmer, when building fast_algorithms.c with the command
>>>> line below:
>>>>
>>>>   ./gcc -Ofast -S fast_algorithms.c -o fast_algorithms.S
>>>>     -fdump-tree-all -fdump-tree-lsplit
>>>>
>>>> the lsplit dump contains:
>>>>
>>>>   <bb 11> [12.75%]:
>>>>   _124 = _197 + 1;
>>>>   _123 = _124 + -1;
>>>>   _115 = MIN_EXPR <_197, _124>;
>>>>
>>>> which is generated here.
>>>
>>> That means we miss a pattern in match.pd to handle this case.
>>
>> I see.  I will withdraw this patch and look in that direction.
>
> For _123, we have
>
>   /* (A +- CST1) +- CST2 -> A + CST3  */
>
> or
>
>   /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
>
> For _115, we have
>
>   /* min (a, a + CST) -> a where CST is positive.  */
>   /* min (a, a + CST) -> a + CST where CST is negative.  */
>   (simplify
>    (min:c @0 (plus@2 @0 INTEGER_CST@1))
>    (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
>     (if (tree_int_cst_sgn (@1) > 0)
>      @0
>      @2)))
>
> What is the type of all those SSA_NAMEs?

Hi Marc,

Thanks for pointing out the exact patterns.  The variables are of int
type.  The redundant operation disappears in the reduced test case,
though.
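Since int is signed, TYPE_OVERFLOW_UNDEFINED holds here, so if both
quoted patterns fired on the dump above we would roughly expect (a
sketch of the expected folds, not actual dump output):

  _124 = _197 + 1;
  _123 = _124 + -1;              /* (A + 1) + -1 -> A + 0 -> _197 */
  _115 = MIN_EXPR <_197, _124>;  /* min (a, a + 1) -> a, as 1 > 0 */

i.e. both _123 and _115 should simplify to plain copies of _197.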
Thanks,
bin
>
> --
> Marc Glisse
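For completeness, a minimal C sketch (hypothetical; not the actual
hmmer source) of the kind of loop the splitter handles.  Splitting a
loop like this is where bound computations of the _124/_123/_115
flavor above come from:

  /* -fsplit-loops (enabled at -O3/-Ofast) can split the loop at the
     i < n condition into two loops without the inner branch; the pass
     then computes the new bound of the first loop with a MIN and
     +1/-1 adjustments.  */
  void
  f (int *a, int *b, int n, int m)
  {
    for (int i = 0; i < m; i++)
      {
        if (i < n)   /* True for a leading range of i...  */
          a[i] = i;
        else         /* ...false for the rest: splittable.  */
          b[i] = i;
      }
  }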