On Mon, 9 Oct 2023, Andrew Pinski wrote:

> On Mon, Oct 9, 2023 at 12:20 AM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Sat, 7 Oct 2023, Richard Sandiford wrote:
> >
> > > Richard Biener <rguent...@suse.de> writes:
> > > >> On 07.10.2023 at 11:23, Richard Sandiford <richard.sandif...@arm.com> wrote:
> > > >>> Richard Biener <rguent...@suse.de> writes:
> > > >>> On Thu, 5 Oct 2023, Tamar Christina wrote:
> > > >>>
> > > >>>>> I suppose the idea is that -abs(x) might be easier to optimize with 
> > > >>>>> other
> > > >>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> > > >>>>>
> > > >>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) 
> > > >>>>> is less
> > > >>>>> canonical than copysign.
> > > >>>>>
> > > >>>>>> Should I try removing this?
> > > >>>>>
> > > >>>>> I'd say yes (and put the reverse canonicalization next to this 
> > > >>>>> pattern).
> > > >>>>>
> > > >>>>
> > > >>>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is 
> > > >>>> more
> > > >>>> canonical and allows a target to expand this sequence efficiently.  
> > > >>>> Such
> > > >>>> sequences are common in scientific code working with gradients.
> > > >>>>
> > > >>>> Various optimizations in match.pd only happened on COPYSIGN but not
> > > >>>> COPYSIGN_ALL, which means they exclude IFN_COPYSIGN.  COPYSIGN,
> > > >>>> however, is restricted to only
> > > >>>
> > > >>> That's not true:
> > > >>>
> > > >>> (define_operator_list COPYSIGN
> > > >>>    BUILT_IN_COPYSIGNF
> > > >>>    BUILT_IN_COPYSIGN
> > > >>>    BUILT_IN_COPYSIGNL
> > > >>>    IFN_COPYSIGN)
> > > >>>
> > > >>> but they miss the extended float builtin variants like
> > > >>> __builtin_copysignf16.  Also see below.
> > > >>>
> > > >>>> the C99 builtins and so doesn't work for vectors.
> > > >>>>
> > > >>>> The patch expands these optimizations to work on COPYSIGN_ALL.
> > > >>>>
> > > >>>> There is an existing canonicalization of copysign (x, -1) to fneg 
> > > >>>> (fabs (x))
> > > >>>> which I remove since this is a less efficient form.  The testsuite 
> > > >>>> is also
> > > >>>> updated in light of this.
> > > >>>>
> > > >>>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > > >>>>
> > > >>>> Ok for master?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Tamar
> > > >>>>
> > > >>>> gcc/ChangeLog:
> > > >>>>
> > > >>>>    PR tree-optimization/109154
> > > >>>>    * match.pd: Add new neg+abs rule, remove inverse copysign rule and
> > > >>>>    expand existing copysign optimizations.
> > > >>>>
> > > >>>> gcc/testsuite/ChangeLog:
> > > >>>>
> > > >>>>    PR tree-optimization/109154
> > > >>>>    * gcc.dg/fold-copysign-1.c: Updated.
> > > >>>>    * gcc.dg/pr55152-2.c: Updated.
> > > >>>>    * gcc.dg/tree-ssa/abs-4.c: Updated.
> > > >>>>    * gcc.dg/tree-ssa/backprop-6.c: Updated.
> > > >>>>    * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> > > >>>>    * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> > > >>>>    * gcc.target/aarch64/fneg-abs_1.c: New test.
> > > >>>>    * gcc.target/aarch64/fneg-abs_2.c: New test.
> > > >>>>    * gcc.target/aarch64/fneg-abs_3.c: New test.
> > > >>>>    * gcc.target/aarch64/fneg-abs_4.c: New test.
> > > >>>>    * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> > > >>>>    * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> > > >>>>    * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> > > >>>>    * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
> > > >>>>
> > > >>>> --- inline copy of patch ---
> > > >>>>
> > > >>>> diff --git a/gcc/match.pd b/gcc/match.pd
> > > >>>> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
> > > >>>> --- a/gcc/match.pd
> > > >>>> +++ b/gcc/match.pd
> > > >>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >>>>
> > > >>>> /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
> > > >>>> (for coss (COS COSH)
> > > >>>> -     copysigns (COPYSIGN)
> > > >>>> - (simplify
> > > >>>> -  (coss (copysigns @0 @1))
> > > >>>> -   (coss @0)))
> > > >>>> + (for copysigns (COPYSIGN_ALL)
> > > >>>
> > > >>> So this ends up generating for example the match
> > > >>> (cosf (copysignl ...)) which doesn't make much sense.
> > > >>>
> > > >>> The lock-step iteration did
> > > >>> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
> > > >>> which is leaner but misses the case of
> > > >>> (cosf (ifn_copysign ..)) - that's probably what you are
> > > >>> after with this change.
> > > >>>
> > > >>> That said, there isn't a nice solution (without altering the match.pd
> > > >>> IL).  There's the explicit solution, spelling out all combinations.
> > > >>>
> > > >>> So if we want to go with your pragmatic solution, changing this
> > > >>> to use COPYSIGN_ALL isn't necessary; only changing the lock-step
> > > >>> for iteration to a cross-product for iteration is.
> > > >>>
> > > >>> Changing just this pattern to
> > > >>>
> > > >>> (for coss (COS COSH)
> > > >>> (for copysigns (COPYSIGN)
> > > >>>  (simplify
> > > >>>   (coss (copysigns @0 @1))
> > > >>>   (coss @0))))
> > > >>>
> > > >>> increases the total number of gimple-match-x.cc lines from
> > > >>> 234988 to 235324.
> > > >>
> > > >> I guess the difference between this and the later suggestions is that
> > > >> this one allows builtin copysign to be paired with ifn cos, which would
> > > >> be potentially useful in other situations.  (It isn't here because
> > > >> ifn_cos is rarely provided.)  How much of the growth is due to that,
> > > >> and how much of it is from nonsensical combinations like
> > > >> (builtin_cosf (builtin_copysignl ...))?
> > > >>
> > > >> If it's mostly from nonsensical combinations then would it be possible
> > > >> to make genmatch drop them?
> > > >>
> > > >>> The alternative is to do
> > > >>>
> > > >>> (for coss (COS COSH)
> > > >>>     copysigns (COPYSIGN)
> > > >>> (simplify
> > > >>>  (coss (copysigns @0 @1))
> > > >>>   (coss @0))
> > > >>> (simplify
> > > >>>  (coss (IFN_COPYSIGN @0 @1))
> > > >>>   (coss @0)))
> > > >>>
> > > >>> which will properly diagnose a duplicate pattern.  There are
> > > >>> currently no operator lists with just builtins defined (that
> > > >>> could be fixed, see gencfn-macros.cc); supposing we had
> > > >>> COS_C we could do
> > > >>>
> > > >>> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
> > > >>>     copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN
> > > >>> IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
> > > >>> IFN_COPYSIGN)
> > > >>> (simplify
> > > >>>  (coss (copysigns @0 @1))
> > > >>>   (coss @0)))
> > > >>>
> > > >>> which of course still looks ugly ;) (some syntax extension, like
> > > >>> allowing one to specify IFN_COPYSIGN*8, would be nice here and easy
> > > >>> enough to do)
> > > >>>
> > > >>> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
> > > >>> re-do it to only split the fors, keeping COPYSIGN and provide
> > > >>> some statistics on the gimple-match-* size?  I think this might
> > > >>> be the pragmatic solution for now.
> > > >>>
> > > >>> Richard - can you think of a clever way to express the desired
> > > >>> iteration?  How do RTL macro iterations address cases like this?
> > > >>
> > > >> I don't think .md files have an equivalent construct, unfortunately.
> > > >> (I also regret some of the choices I made for .md iterators, but that's
> > > >> another story.)
> > > >>
> > > >> Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
> > > >> with the "..." meaning "fill to match the longest operator list
> > > >> in the loop".
> > > >
> > > > Hm, I'll think about this.  It would be useful to have a function like
> > > >
> > > > internal_fn ifn_for (combined_fn);
> > > >
> > > > So we can indirectly match all builtins with a switch on the ifn code.
> > >
> > > There's:
> > >
> > > extern internal_fn associated_internal_fn (combined_fn, tree);
> > > extern internal_fn associated_internal_fn (tree);
> > > extern internal_fn replacement_internal_fn (gcall *);
> > >
> > > where the first one requires the return type, and the second one
> > > operates on CALL_EXPRs.
> >
> > Hmm, for full generality the way we code-generate would need to change
> > quite a bit.  Instead I've come up with the following quite limited
> > approach.  You can write
> >
> > (for coss (COS COSH)
> >  (simplify
> >   (coss (ANY_COPYSIGN @0 @1))
> >   (coss @0))))
> 
> This optimization is also handled by backprop (gimple-ssa-backprop.cc)
> in a better way than the match code handles it.
> So maybe we don't really need to extend match-and-simplify here.
> Right now backprop is only run once, early after inlining.  Maybe
> running it once more, late, would help?

I think it generally makes sense to clean up simple things during
building/folding and not wait for specialized passes.

The question here is mostly whether we are fine with some bloat
in {generic,gimple}-match-?.cc or not.  The change proposed likely
doesn't make a big impact as it's going to be of limited use.

Any change exposing semantics of the builtins to genmatch so it
can rule out say combining BUILT_IN_SINF and BUILT_IN_COS is going
to be quite difficult if you consider

(for coss (BUILT_IN_COS BUILT_IN_COSL)
     sins (BUILT_IN_SINF BUILT_IN_SIN)
 (simplify
  (coss (convert (sins @0)))
...

or so.  The current operator-list handling treats them as ordered;
we might want to introduce a semantically different operator-set
iteration, treating them unordered.  That's basically how

 (for list1 (...)
  (for list2 (...)

works.  There would be the opportunity to change code generation
for such a case to a catch-all case, but the way we generate the
decision tree makes this difficult, I think.  I've filed PR111732
to track this genmatch optimization opportunity.

Richard.



> Thanks,
> Andrew
> 
> 
> >
> > with it.  For each internal function the following patch adds an
> > ANY_<name> identifier.  The use is somewhat limited - you cannot
> > use it as the outermost operation in the match part and you cannot
> > use it in the replacement part at all.  The nice thing is there's
> > no "iteration" at all; the ANY_COPYSIGN doesn't cause any pattern
> > duplication.  Instead we match it via CASE_CFN_<name>, so it will
> > happily match mismatched (typewise) calls (but those shouldn't
> > be there...).
> >
> > The patch doesn't contain any defensiveness in the parser for the
> > use restriction, but you should get compile failures for misuses
> > at least.
> >
> > It should match quite a few of the copysign cases; I suspect it's
> > of no use for most of the operators, so maybe less general handling
> > and only specifically introducing ANY_COPYSIGN would be better.
> > At least I cannot think of any other functions that are matched
> > but disappear in the resulting replacement?
> >
> > Richard.
> >
> > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> > index 03d325efdf6..f7d3f51c013 100644
> > --- a/gcc/genmatch.cc
> > +++ b/gcc/genmatch.cc
> > @@ -524,10 +524,14 @@ class fn_id : public id_base
> >  {
> >  public:
> >    fn_id (enum built_in_function fn_, const char *id_)
> > -      : id_base (id_base::FN, id_), fn (fn_) {}
> > +      : id_base (id_base::FN, id_), fn (fn_), case_macro (nullptr) {}
> >    fn_id (enum internal_fn fn_, const char *id_)
> > -      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)) {}
> > +      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)),
> > +    case_macro (nullptr) {}
> > +  fn_id (const char *case_macro_, const char *id_)
> > +      : id_base (id_base::FN, id_), fn (-1U), case_macro (case_macro_) {}
> >    unsigned int fn;
> > +  const char *case_macro;
> >  };
> >
> >  class simplify;
> > @@ -3262,6 +3266,10 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
> >               if (user_id *u = dyn_cast <user_id *> (e->operation))
> >                 for (auto id : u->substitutes)
> >                   fprintf_indent (f, indent, "case %s:\n", id->id);
> > +             else if (is_a <fn_id *> (e->operation)
> > +                      && as_a <fn_id *> (e->operation)->case_macro)
> > +               fprintf_indent (f, indent, "%s:\n",
> > +                               as_a <fn_id *> (e->operation)->case_macro);
> >               else
> >                 fprintf_indent (f, indent, "case %s:\n", e->operation->id);
> >               /* We need to be defensive against bogus prototypes allowing
> > @@ -3337,9 +3345,12 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
> >        for (unsigned j = 0; j < generic_fns.length (); ++j)
> >         {
> >           expr *e = as_a <expr *>(generic_fns[j]->op);
> > -         gcc_assert (e->operation->kind == id_base::FN);
> > +         fn_id *oper = as_a <fn_id *> (e->operation);
> >
> > -         fprintf_indent (f, indent, "case %s:\n", e->operation->id);
> > +         if (oper->case_macro)
> > +           fprintf_indent (f, indent, "%s:\n", oper->case_macro);
> > +         else
> > +           fprintf_indent (f, indent, "case %s:\n", e->operation->id);
> >           fprintf_indent (f, indent, "  if (call_expr_nargs (%s) == %d)\n"
> >                                      "    {\n", kid_opname, e->ops.length ());
> >           generic_fns[j]->gen (f, indent + 6, false, depth);
> > @@ -5496,7 +5507,8 @@ main (int argc, char **argv)
> >  #include "builtins.def"
> >
> >  #define DEF_INTERNAL_FN(CODE, NAME, FNSPEC) \
> > -  add_function (IFN_##CODE, "CFN_" #CODE);
> > +  add_function (IFN_##CODE, "CFN_" #CODE); \
> > +  add_function ("CASE_CFN_" # CODE, "ANY_" # CODE);
> >  #include "internal-fn.def"
> >
> >    /* Parse ahead!  */
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
