On Mon, Oct 9, 2023 at 12:20 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Sat, 7 Oct 2023, Richard Sandiford wrote:
>
> > Richard Biener <rguent...@suse.de> writes:
> > >> Am 07.10.2023 um 11:23 schrieb Richard Sandiford
> > >> <richard.sandif...@arm.com>>> Richard Biener <rguent...@suse.de> writes:
> > >>> On Thu, 5 Oct 2023, Tamar Christina wrote:
> > >>>
> > >>>>> I suppose the idea is that -abs(x) might be easier to optimize with
> > >>>>> other
> > >>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> > >>>>>
> > >>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is
> > >>>>> less
> > >>>>> canonical than copysign.
> > >>>>>
> > >>>>>> Should I try removing this?
> > >>>>>
> > >>>>> I'd say yes (and put the reverse canonicalization next to this
> > >>>>> pattern).
> > >>>>>
> > >>>>
> > >>>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is
> > >>>> more
> > >>>> canonical and allows a target to expand this sequence efficiently.
> > >>>> Such
> > >>>> sequences are common in scientific code working with gradients.
> > >>>>
> > >>>> various optimizations in match.pd only happened on COPYSIGN but not
> > >>>> COPYSIGN_ALL
> > >>>> which means they exclude IFN_COPYSIGN. COPYSIGN however is restricted
> > >>>> to only
> > >>>
> > >>> That's not true:
> > >>>
> > >>> (define_operator_list COPYSIGN
> > >>>     BUILT_IN_COPYSIGNF
> > >>>     BUILT_IN_COPYSIGN
> > >>>     BUILT_IN_COPYSIGNL
> > >>>     IFN_COPYSIGN)
> > >>>
> > >>> but they miss the extended float builtin variants like
> > >>> __builtin_copysignf16. Also see below
> > >>>> the C99 builtins and so doesn't work for vectors.
> > >>>>
> > >>>> The patch expands these optimizations to work on COPYSIGN_ALL.
> > >>>>
> > >>>> There is an existing canonicalization of copysign (x, -1) to fneg
> > >>>> (fabs (x))
> > >>>> which I remove since this is a less efficient form. The testsuite is
> > >>>> also
> > >>>> updated in light of this.
> > >>>>
> > >>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >>>>
> > >>>> Ok for master?
> > >>>>
> > >>>> Thanks,
> > >>>> Tamar
> > >>>>
> > >>>> gcc/ChangeLog:
> > >>>>
> > >>>> PR tree-optimization/109154
> > >>>> * match.pd: Add new neg+abs rule, remove inverse copysign rule and
> > >>>> expand existing copysign optimizations.
> > >>>>
> > >>>> gcc/testsuite/ChangeLog:
> > >>>>
> > >>>> PR tree-optimization/109154
> > >>>> * gcc.dg/fold-copysign-1.c: Updated.
> > >>>> * gcc.dg/pr55152-2.c: Updated.
> > >>>> * gcc.dg/tree-ssa/abs-4.c: Updated.
> > >>>> * gcc.dg/tree-ssa/backprop-6.c: Updated.
> > >>>> * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> > >>>> * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> > >>>> * gcc.target/aarch64/fneg-abs_1.c: New test.
> > >>>> * gcc.target/aarch64/fneg-abs_2.c: New test.
> > >>>> * gcc.target/aarch64/fneg-abs_3.c: New test.
> > >>>> * gcc.target/aarch64/fneg-abs_4.c: New test.
> > >>>> * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> > >>>> * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> > >>>> * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> > >>>> * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
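(For reference, the kind of source code this canonicalization targets looks
roughly like the sketch below. It is illustrative only, not the actual
fneg-abs_*.c testcases from the ChangeLog, and the function names are made
up.)

#include <math.h>

/* Scalar case: -fabs (x) and copysign (x, -1.0) compute the same value
   (both force the sign bit on), so rewriting the first form into the
   second lets a target expand it as a single sign-manipulating
   instruction.  */
double
neg_abs (double x)
{
  return -fabs (x);
}

double
neg_abs_canonical (double x)
{
  return __builtin_copysign (x, -1.0);
}

/* Vector case: after vectorization the copysign form is IFN_COPYSIGN
   rather than one of the C99 builtins, which is why the related
   patterns need to cover the IFN variant as well.  */
void
neg_abs_loop (float *restrict out, const float *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = -fabsf (in[i]);
}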
> > >>>>
> > >>>> --- inline copy of patch ---
> > >>>>
> > >>>> diff --git a/gcc/match.pd b/gcc/match.pd
> > >>>> index
> > >>>> 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2
> > >>>> 100644
> > >>>> --- a/gcc/match.pd
> > >>>> +++ b/gcc/match.pd
> > >>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >>>>
> > >>>> /* cos(copysign(x, y)) -> cos(x). Similarly for cosh. */
> > >>>> (for coss (COS COSH)
> > >>>> - copysigns (COPYSIGN)
> > >>>> - (simplify
> > >>>> -  (coss (copysigns @0 @1))
> > >>>> -   (coss @0)))
> > >>>> + (for copysigns (COPYSIGN_ALL)
> > >>>
> > >>> So this ends up generating for example the match
> > >>> (cosf (copysignl ...)) which doesn't make much sense.
> > >>>
> > >>> The lock-step iteration did
> > >>> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
> > >>> which is leaner but misses the case of
> > >>> (cosf (ifn_copysign ..)) - that's probably what you are
> > >>> after with this change.
> > >>>
> > >>> That said, there isn't a nice solution (without altering the match.pd
> > >>> IL). There's the explicit solution, spelling out all combinations.
> > >>>
> > >>> So if we want to go with your pragmatic solution changing this
> > >>> to use COPYSIGN_ALL isn't necessary, only changing the lock-step
> > >>> for iteration to a cross product for iteration is.
> > >>>
> > >>> Changing just this pattern to
> > >>>
> > >>> (for coss (COS COSH)
> > >>>  (for copysigns (COPYSIGN)
> > >>>   (simplify
> > >>>    (coss (copysigns @0 @1))
> > >>>    (coss @0))))
> > >>>
> > >>> increases the total number of gimple-match-x.cc lines from
> > >>> 234988 to 235324.
> > >>
> > >> I guess the difference between this and the later suggestions is that
> > >> this one allows builtin copysign to be paired with ifn cos, which would
> > >> be potentially useful in other situations. (It isn't here because
> > >> ifn_cos is rarely provided.) How much of the growth is due to that,
> > >> and how much of it is from nonsensical combinations like
> > >> (builtin_cosf (builtin_copysignl ...))?
> > >>
> > >> If it's mostly from nonsensical combinations then would it be possible
> > >> to make genmatch drop them?
> > >>
> > >>> The alternative is to do
> > >>>
> > >>> (for coss (COS COSH)
> > >>>      copysigns (COPYSIGN)
> > >>>  (simplify
> > >>>   (coss (copysigns @0 @1))
> > >>>   (coss @0))
> > >>>  (simplify
> > >>>   (coss (IFN_COPYSIGN @0 @1))
> > >>>   (coss @0)))
> > >>>
> > >>> which properly will diagnose a duplicate pattern. There are
> > >>> currently no operator lists with just builtins defined (that
> > >>> could be fixed, see gencfn-macros.cc), suppose we'd have
> > >>> COS_C we could do
> > >>>
> > >>> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
> > >>>      copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN
> > >>>                 IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
> > >>>                 IFN_COPYSIGN)
> > >>>  (simplify
> > >>>   (coss (copysigns @0 @1))
> > >>>   (coss @0)))
> > >>>
> > >>> which of course still looks ugly ;) (some syntax extension like
> > >>> allowing to specify IFN_COPYSIGN*8 would be nice here and easy
> > >>> enough to do)
> > >>>
> > >>> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
> > >>> re-do it to only split the fors, keeping COPYSIGN and provide
> > >>> some statistics on the gimple-match-* size? I think this might
> > >>> be the pragmatic solution for now.
> > >>>
> > >>> Richard - can you think of a clever way to express the desired
> > >>> iteration? How do RTL macro iterations address cases like this?
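(As a source-level reminder of what the cos/copysign pattern discussed above
buys, here is an illustrative C sketch; the function names are invented and
this is not one of the testcases.)

#include <math.h>

/* The sign of the argument is irrelevant to cos, so the copysign can be
   dropped: cos (copysign (x, y)) simplifies to cos (x).  The lock-step
   float/double/long double pairings correspond to code like this.  */
double
drop_sign (double x, double y)
{
  return cos (copysign (x, y));     /* becomes cos (x) */
}

float
drop_sign_f (float x, float y)
{
  return cosf (copysignf (x, y));   /* becomes cosf (x) */
}

/* A type-mismatched nesting such as cosf (copysignl (x, y)) should not
   occur directly in well-typed GIMPLE: the narrowing conversion to float
   is a separate statement between the two calls, so the cross-product
   combinations largely cover pairings that cannot appear nested anyway.  */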
> > >>
> > >> I don't think .md files have an equivalent construct, unfortunately.
> > >> (I also regret some of the choices I made for .md iterators, but that's
> > >> another story.)
> > >>
> > >> Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
> > >> with the "..." meaning "fill to match the longest operator list
> > >> in the loop".
> > >
> > > Hm, I'll think about this. It would be useful to have a function like
> > >
> > > Internal_fn ifn_for (combined_fn);
> > >
> > > So we can indirectly match all builtins with a switch on the ifn code.
> >
> > There's:
> >
> > extern internal_fn associated_internal_fn (combined_fn, tree);
> > extern internal_fn associated_internal_fn (tree);
> > extern internal_fn replacement_internal_fn (gcall *);
> >
> > where the first one requires the return type, and the second one
> > operates on CALL_EXPRs.
>
> Hmm, for full generality the way we code-generate would need to change
> quite a bit. Instead I've come up with the following quite limited
> approach. You can write
>
> (for coss (COS COSH)
>  (simplify
>   (coss (ANY_COPYSIGN @0 @1))
>   (coss @0))))
This optimization is also handled by backprop (gimple-ssa-backprop.cc) in a
better way than the match code handles it. So maybe we don't really need to
extend match-and-simplify here.
Right now backprop is only run once, early after inlining. Maybe running it
again later would help?

Thanks,
Andrew

>
> with it. For each internal function the following patch adds a
> ANY_<name> identifier. The use is somewhat limited - you cannot
> use it as the outermost operation in the match part and you cannot
> use it in the replacement part at all. The nice thing is there's
> no "iteration" at all, the ANY_COPYSIGN doesn't cause any pattern
> duplication, instead we match it via CASE_CFN_<name> so it will
> happily match mis-matched (typewise) calls (but those shouldn't
> be there...).
>
> The patch doesn't contain any defensiveness in the parser for the
> use restriction, but you should get compile failures for misuses
> at least.
>
> It should match quite some of the copysign cases, I suspect it's
> of no use for most of the operators so maybe less general handling
> and only specifically introducing ANY_COPYSIGN would be better.
> At least I cannot think of any other functions that are matched
> but disappear in the resulting replacement?
>
> Richard.
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index 03d325efdf6..f7d3f51c013 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -524,10 +524,14 @@ class fn_id : public id_base
>  {
>  public:
>    fn_id (enum built_in_function fn_, const char *id_)
> -      : id_base (id_base::FN, id_), fn (fn_) {}
> +      : id_base (id_base::FN, id_), fn (fn_), case_macro (nullptr) {}
>    fn_id (enum internal_fn fn_, const char *id_)
> -      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)) {}
> +      : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)),
> +        case_macro (nullptr) {}
> +  fn_id (const char *case_macro_, const char *id_)
> +      : id_base (id_base::FN, id_), fn (-1U), case_macro (case_macro_) {}
>    unsigned int fn;
> +  const char *case_macro;
>  };
>
>  class simplify;
> @@ -3262,6 +3266,10 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple,
> int depth,
>    if (user_id *u = dyn_cast <user_id *> (e->operation))
>      for (auto id : u->substitutes)
>        fprintf_indent (f, indent, "case %s:\n", id->id);
> +  else if (is_a <fn_id *> (e->operation)
> +           && as_a <fn_id *> (e->operation)->case_macro)
> +    fprintf_indent (f, indent, "%s:\n",
> +                    as_a <fn_id *> (e->operation)->case_macro);
>    else
>      fprintf_indent (f, indent, "case %s:\n", e->operation->id);
>    /* We need to be defensive against bogus prototypes allowing
> @@ -3337,9 +3345,12 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple,
> int depth,
>    for (unsigned j = 0; j < generic_fns.length (); ++j)
>      {
>        expr *e = as_a <expr *>(generic_fns[j]->op);
> -      gcc_assert (e->operation->kind == id_base::FN);
> +      fn_id *oper = as_a <fn_id *> (e->operation);
>
> -      fprintf_indent (f, indent, "case %s:\n", e->operation->id);
> +      if (oper->case_macro)
> +        fprintf_indent (f, indent, "%s:\n", oper->case_macro);
> +      else
> +        fprintf_indent (f, indent, "case %s:\n", e->operation->id);
>        fprintf_indent (f, indent, "  if (call_expr_nargs (%s) == %d)\n"
>                        "    {\n", kid_opname, e->ops.length ());
>        generic_fns[j]->gen (f, indent + 6, false, depth);
> @@ -5496,7 +5507,8 @@ main (int argc, char **argv)
>  #include "builtins.def"
>
>  #define DEF_INTERNAL_FN(CODE, NAME, FNSPEC) \
> -  add_function (IFN_##CODE, "CFN_" #CODE);
> +  add_function (IFN_##CODE, "CFN_" #CODE); \
> +  add_function ("CASE_CFN_" # CODE, "ANY_" # CODE);
>  #include "internal-fn.def"
>
>  /* Parse ahead! */