On Mon, 9 Oct 2023, Andrew Pinski wrote:

> On Mon, Oct 9, 2023 at 12:20 AM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Sat, 7 Oct 2023, Richard Sandiford wrote:
> >
> > > Richard Biener <rguent...@suse.de> writes:
> > > > On 07.10.2023 at 11:23, Richard Sandiford <richard.sandif...@arm.com> wrote:
> > > >> Richard Biener <rguent...@suse.de> writes:
> > > >>> On Thu, 5 Oct 2023, Tamar Christina wrote:
> > > >>>
> > > >>>>> I suppose the idea is that -abs(x) might be easier to optimize with
> > > >>>>> other patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> > > >>>>>
> > > >>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0))
> > > >>>>> is less canonical than copysign.
> > > >>>>>
> > > >>>>>> Should I try removing this?
> > > >>>>>
> > > >>>>> I'd say yes (and put the reverse canonicalization next to this pattern).
> > > >>>>
> > > >>>> This patch transforms fneg (fabs (x)) into copysign (x, -1), which is
> > > >>>> more canonical and allows a target to expand this sequence efficiently.
> > > >>>> Such sequences are common in scientific code working with gradients.
> > > >>>>
> > > >>>> Various optimizations in match.pd only happened on COPYSIGN but not
> > > >>>> COPYSIGN_ALL, which means they exclude IFN_COPYSIGN.  COPYSIGN, however,
> > > >>>> is restricted to only
> > > >>>
> > > >>> That's not true:
> > > >>>
> > > >>> (define_operator_list COPYSIGN
> > > >>>     BUILT_IN_COPYSIGNF
> > > >>>     BUILT_IN_COPYSIGN
> > > >>>     BUILT_IN_COPYSIGNL
> > > >>>     IFN_COPYSIGN)
> > > >>>
> > > >>> but they miss the extended float builtin variants like
> > > >>> __builtin_copysignf16.  Also see below.
> > > >>>
> > > >>>> the C99 builtins and so doesn't work for vectors.
> > > >>>>
> > > >>>> The patch expands these optimizations to work on COPYSIGN_ALL.
> > > >>>>
> > > >>>> There is an existing canonicalization of copysign (x, -1) to
> > > >>>> fneg (fabs (x)), which I remove since it is a less efficient form.
> > > >>>> The testsuite is also updated in light of this.
> > > >>>>
> > > >>>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > > >>>>
> > > >>>> Ok for master?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Tamar
> > > >>>>
> > > >>>> gcc/ChangeLog:
> > > >>>>
> > > >>>> 	PR tree-optimization/109154
> > > >>>> 	* match.pd: Add new neg+abs rule, remove inverse copysign rule and
> > > >>>> 	expand existing copysign optimizations.
> > > >>>>
> > > >>>> gcc/testsuite/ChangeLog:
> > > >>>>
> > > >>>> 	PR tree-optimization/109154
> > > >>>> 	* gcc.dg/fold-copysign-1.c: Updated.
> > > >>>> 	* gcc.dg/pr55152-2.c: Updated.
> > > >>>> 	* gcc.dg/tree-ssa/abs-4.c: Updated.
> > > >>>> 	* gcc.dg/tree-ssa/backprop-6.c: Updated.
> > > >>>> 	* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> > > >>>> 	* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> > > >>>> 	* gcc.target/aarch64/fneg-abs_1.c: New test.
> > > >>>> 	* gcc.target/aarch64/fneg-abs_2.c: New test.
> > > >>>> 	* gcc.target/aarch64/fneg-abs_3.c: New test.
> > > >>>> 	* gcc.target/aarch64/fneg-abs_4.c: New test.
> > > >>>> 	* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> > > >>>> 	* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> > > >>>> 	* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> > > >>>> 	* gcc.target/aarch64/sve/fneg-abs_4.c: New test.
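For context, a minimal C sketch of the kind of code the patch targets (illustrative only; the function names are made up and are not taken from the quoted patch or its testsuite):

/* -fabs (x) currently becomes an ABS followed by a NEGATE in GIMPLE; the
   patch canonicalizes that pair to a copysign (x, -1.0) call, which a
   target such as AArch64 can expand to a single sign-bit-setting
   instruction, including for vectors.  */
#include <math.h>

double neg_abs (double x)
{
  return -fabs (x);
}

/* Roughly the canonical form after the transform.  */
double neg_abs_canonical (double x)
{
  return copysign (x, -1.0);
}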
> > > >>>>
> > > >>>> --- inline copy of patch ---
> > > >>>>
> > > >>>> diff --git a/gcc/match.pd b/gcc/match.pd
> > > >>>> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
> > > >>>> --- a/gcc/match.pd
> > > >>>> +++ b/gcc/match.pd
> > > >>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >>>>
> > > >>>>  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
> > > >>>>  (for coss (COS COSH)
> > > >>>> -      copysigns (COPYSIGN)
> > > >>>> -  (simplify
> > > >>>> -   (coss (copysigns @0 @1))
> > > >>>> -    (coss @0)))
> > > >>>> +  (for copysigns (COPYSIGN_ALL)
> > > >>>
> > > >>> So this ends up generating for example the match (cosf (copysignl ...)),
> > > >>> which doesn't make much sense.
> > > >>>
> > > >>> The lock-step iteration did (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...)),
> > > >>> which is leaner but misses the case of (cosf (ifn_copysign ..)) - that's
> > > >>> probably what you are after with this change.
> > > >>>
> > > >>> That said, there isn't a nice solution (without altering the match.pd IL).
> > > >>> There's the explicit solution, spelling out all combinations.
> > > >>>
> > > >>> So if we want to go with your pragmatic solution, changing this to use
> > > >>> COPYSIGN_ALL isn't necessary; only changing the lock-step for iteration
> > > >>> to a cross-product for iteration is.
> > > >>>
> > > >>> Changing just this pattern to
> > > >>>
> > > >>> (for coss (COS COSH)
> > > >>>  (for copysigns (COPYSIGN)
> > > >>>   (simplify
> > > >>>    (coss (copysigns @0 @1))
> > > >>>    (coss @0))))
> > > >>>
> > > >>> increases the total number of gimple-match-x.cc lines from 234988 to 235324.
> > > >>
> > > >> I guess the difference between this and the later suggestions is that
> > > >> this one allows builtin copysign to be paired with ifn cos, which would
> > > >> be potentially useful in other situations.  (It isn't here because
> > > >> ifn_cos is rarely provided.)  How much of the growth is due to that, and
> > > >> how much of it is from nonsensical combinations like
> > > >> (builtin_cosf (builtin_copysignl ...))?
> > > >>
> > > >> If it's mostly from nonsensical combinations then would it be possible
> > > >> to make genmatch drop them?
> > > >>
> > > >>> The alternative is to do
> > > >>>
> > > >>> (for coss (COS COSH)
> > > >>>      copysigns (COPYSIGN)
> > > >>>  (simplify
> > > >>>   (coss (copysigns @0 @1))
> > > >>>   (coss @0))
> > > >>>  (simplify
> > > >>>   (coss (IFN_COPYSIGN @0 @1))
> > > >>>   (coss @0)))
> > > >>>
> > > >>> which will properly diagnose a duplicate pattern.  There are currently
> > > >>> no operator lists with just builtins defined (that could be fixed, see
> > > >>> gencfn-macros.cc); supposing we had COS_C we could do
> > > >>>
> > > >>> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
> > > >>>      copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
> > > >>>                 IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN)
> > > >>>  (simplify
> > > >>>   (coss (copysigns @0 @1))
> > > >>>   (coss @0)))
> > > >>>
> > > >>> which of course still looks ugly ;)  (some syntax extension like allowing
> > > >>> to specify IFN_COPYSIGN*8 would be nice here and easy enough to do)
> > > >>>
> > > >>> Can you split out the part changing COPYSIGN to COPYSIGN_ALL, re-do it
> > > >>> to only split the fors, keeping COPYSIGN, and provide some statistics
> > > >>> on the gimple-match-* size?
> > > >>> I think this might be the pragmatic solution for now.
> > > >>>
> > > >>> Richard - can you think of a clever way to express the desired
> > > >>> iteration?  How do RTL macro iterations address cases like this?
> > > >>
> > > >> I don't think .md files have an equivalent construct, unfortunately.
> > > >> (I also regret some of the choices I made for .md iterators, but
> > > >> that's another story.)
> > > >>
> > > >> Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
> > > >> with the "..." meaning "fill to match the longest operator list in
> > > >> the loop".
> > > >
> > > > Hm, I'll think about this.  It would be useful to have a function like
> > > >
> > > >   internal_fn ifn_for (combined_fn);
> > > >
> > > > so we can indirectly match all builtins with a switch on the ifn code.
> > >
> > > There's:
> > >
> > >   extern internal_fn associated_internal_fn (combined_fn, tree);
> > >   extern internal_fn associated_internal_fn (tree);
> > >   extern internal_fn replacement_internal_fn (gcall *);
> > >
> > > where the first one requires the return type, and the second one
> > > operates on CALL_EXPRs.
> >
> > Hmm, for full generality the way we code-generate would need to change
> > quite a bit.  Instead I've come up with the following quite limited
> > approach.  You can write
> >
> > (for coss (COS COSH)
> >  (simplify
> >   (coss (ANY_COPYSIGN @0 @1))
> >   (coss @0)))
>
> This optimization is also handled by backprop (gimple-ssa-backprop.cc)
> in a better way than the match code handles it.
> So maybe we don't really need to extend match-and-simplify here.
> Right now backprop is only run once, early after inlining.  Maybe
> running it once more, late, would help?
I think it generally makes sense to clean up simple things during
building/folding and not wait for specialized passes.  The question here
is mostly whether we are fine with some bloat in {generic,gimple}-match-?.cc
or not.  The proposed change likely doesn't have a big impact as it's going
to be of limited use.

Any change exposing semantics of the builtins to genmatch so it can rule
out, say, combining BUILT_IN_SINF and BUILT_IN_COS is going to be quite
difficult if you consider

(for coss (BUILT_IN_COS BUILT_IN_COSL)
     sins (BUILT_IN_SINF BUILT_IN_SIN)
 (simplify
  (coss (convert (sins @0)))
  ...

or so.  The current operator-list handling treats them as ordered; we might
want to introduce a semantically different operator-set iteration that
treats them as unordered.  That's basically how (for list1 (...) (for list2 (...)
works.  There would be the opportunity to change code generation for such
cases to a catch-all case, but the way we generate the decision tree makes
this difficult, I think.

I've filed PR111732 to track this genmatch optimization opportunity.

Richard.

> Thanks,
> Andrew
>
> >
> > with it.  For each internal function the following patch adds an
> > ANY_<name> identifier.  The use is somewhat limited - you cannot use it
> > as the outermost operation in the match part and you cannot use it in
> > the replacement part at all.  The nice thing is there's no "iteration"
> > at all; the ANY_COPYSIGN doesn't cause any pattern duplication, instead
> > we match it via CASE_CFN_<name>, so it will happily match mis-matched
> > (typewise) calls (but those shouldn't be there...).
> >
> > The patch doesn't contain any defensiveness in the parser for the use
> > restriction, but you should get compile failures for misuses at least.
> >
> > It should match quite a few of the copysign cases; I suspect it's of no
> > use for most of the operators, so maybe less general handling and only
> > specifically introducing ANY_COPYSIGN would be better.  At least I
> > cannot think of any other functions that are matched but disappear in
> > the resulting replacement?
> >
> > Richard.
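To make the CASE_CFN_<name> remark above concrete, here is a small
standalone mock in C (not GCC's real headers; the enum and the exact macro
body only approximate what gencfn-macros.cc emits into case-cfn-macros.h):

#include <stdio.h>

/* Mock of GCC's combined_fn codes for one function family.  */
enum combined_fn
{
  CFN_BUILT_IN_COPYSIGN,
  CFN_BUILT_IN_COPYSIGNF,
  CFN_BUILT_IN_COPYSIGNL,
  CFN_COPYSIGN,   /* the internal function */
  CFN_LAST
};

/* Approximate shape of the generated CASE_CFN_COPYSIGN macro: a run of
   case labels, the last one without a colon so users write
   "CASE_CFN_COPYSIGN:" themselves.  */
#define CASE_CFN_COPYSIGN \
  case CFN_BUILT_IN_COPYSIGN: \
  case CFN_BUILT_IN_COPYSIGNF: \
  case CFN_BUILT_IN_COPYSIGNL: \
  case CFN_COPYSIGN

/* One label therefore matches every flavour of the call, which is what
   the ANY_COPYSIGN identifier in the patch below relies on.  */
static int
is_any_copysign (enum combined_fn fn)
{
  switch (fn)
    {
    CASE_CFN_COPYSIGN:
      return 1;
    default:
      return 0;
    }
}

int
main (void)
{
  printf ("%d %d\n", is_any_copysign (CFN_COPYSIGN), is_any_copysign (CFN_LAST));
  return 0;
}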
> >
> > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> > index 03d325efdf6..f7d3f51c013 100644
> > --- a/gcc/genmatch.cc
> > +++ b/gcc/genmatch.cc
> > @@ -524,10 +524,14 @@ class fn_id : public id_base
> >  {
> >  public:
> >    fn_id (enum built_in_function fn_, const char *id_)
> > -      : id_base (id_base::FN, id_), fn (fn_) {}
> > +      : id_base (id_base::FN, id_), fn (fn_), case_macro (nullptr) {}
> >    fn_id (enum internal_fn fn_, const char *id_)
> > -      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)) {}
> > +      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)),
> > +        case_macro (nullptr) {}
> > +  fn_id (const char *case_macro_, const char *id_)
> > +      : id_base (id_base::FN, id_), fn (-1U), case_macro (case_macro_) {}
> >    unsigned int fn;
> > +  const char *case_macro;
> >  };
> >
> >  class simplify;
> > @@ -3262,6 +3266,10 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
> >  	if (user_id *u = dyn_cast <user_id *> (e->operation))
> >  	  for (auto id : u->substitutes)
> >  	    fprintf_indent (f, indent, "case %s:\n", id->id);
> > +	else if (is_a <fn_id *> (e->operation)
> > +		 && as_a <fn_id *> (e->operation)->case_macro)
> > +	  fprintf_indent (f, indent, "%s:\n",
> > +			  as_a <fn_id *> (e->operation)->case_macro);
> >  	else
> >  	  fprintf_indent (f, indent, "case %s:\n", e->operation->id);
> >  	/* We need to be defensive against bogus prototypes allowing
> > @@ -3337,9 +3345,12 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
> >        for (unsigned j = 0; j < generic_fns.length (); ++j)
> >  	{
> >  	  expr *e = as_a <expr *>(generic_fns[j]->op);
> > -	  gcc_assert (e->operation->kind == id_base::FN);
> > +	  fn_id *oper = as_a <fn_id *> (e->operation);
> >
> > -	  fprintf_indent (f, indent, "case %s:\n", e->operation->id);
> > +	  if (oper->case_macro)
> > +	    fprintf_indent (f, indent, "%s:\n", oper->case_macro);
> > +	  else
> > +	    fprintf_indent (f, indent, "case %s:\n", e->operation->id);
> >  	  fprintf_indent (f, indent, "  if (call_expr_nargs (%s) == %d)\n"
> >  			  "    {\n", kid_opname, e->ops.length ());
> >  	  generic_fns[j]->gen (f, indent + 6, false, depth);
> > @@ -5496,7 +5507,8 @@ main (int argc, char **argv)
> >  #include "builtins.def"
> >
> >  #define DEF_INTERNAL_FN(CODE, NAME, FNSPEC) \
> > -  add_function (IFN_##CODE, "CFN_" #CODE);
> > +  add_function (IFN_##CODE, "CFN_" #CODE); \
> > +  add_function ("CASE_CFN_" # CODE, "ANY_" # CODE);
> >  #include "internal-fn.def"
> >
> >  /* Parse ahead! */
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
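A small standalone illustration of what the last hunk boils down to for a
single internal function (mock add_function and a made-up internal-fn.def
style entry; the real genmatch registration also records a fn_id, which is
omitted here):

#include <stdio.h>

/* Mock of genmatch's registration routine: with the patch, every internal
   function additionally registers an ANY_<name> identifier whose case
   labels are later emitted via the corresponding CASE_CFN_<name> macro.  */
static void
add_function (const char *case_macro, const char *id)
{
  printf ("%s is matched via %s:\n", id, case_macro);
}

/* Only the new, second add_function call from the patch is shown.  */
#define DEF_INTERNAL_FN(CODE, NAME, FNSPEC) \
  add_function ("CASE_CFN_" #CODE, "ANY_" #CODE);

int
main (void)
{
  DEF_INTERNAL_FN (COPYSIGN, "copysign", NULL)
  return 0;
}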