On Wed, 15 Jan 2025, Jakub Jelinek wrote:

> On Wed, Jan 15, 2025 at 10:42:12AM +0100, Richard Biener wrote:
> > Yes.  I'll note there's a PR (or a bunch of) which are about
> > 
> > x = FOO (y, ..);
> > <use of x>
> > 
> > vs.
> > 
> > FOO (y, ..);
> > <use of y>
> > 
> > for FOO returning an argument (like for example memcpy).  Where neither
> > form is best in all cases.  For your example above consider '42' being
> > a FP constant, we'd have to re-load that from the constant pool after
> > the call rather using the conveniently available copy in the return
> > register.  Or on RISC archs it might be costly to materialize the
> > integer immediate.
> > 
> > So the question is whether such replacement is a good thing - yes,
> > we want the knowledge to simplify followup code, but copy-replacing
> > might not always be good.
> 
> Or we can improve it incrementally, in RA or wherever else, perhaps with
> some helper function.  VRP or ssa propagation unfortunately doesn't know
> if something is a tail call and shouldn't redo tailc pass analysis for it
> each time it encounters it, and not replacing it in places which aren't tail
> call can hamper further optimizations when the lhs is used later on in other
> statements.
> 
> The point of the patch is that it restores previous behavior at least for
> the tail call optimization.
> 
> BTW, I think we don't optimize returns-arg stuff like that at least right
> now, and if we did, it wouldn't be through IPA-VRP, most of the returns-arg
> functions actually return a pointer, not integer and for prange we just
> track constant values, not say address of something.
> 
> > > For non-musttail we shouldn't do that though, we don't know whether it is
> > > tail-callable at all and whether having the constant in there is more
> > > beneficial for optimizations or possibly keeping it as maybe tail 
> > > callable.
> > > And therefore I wrote the patch as is, it then fixes the 14/15 regression
> > > for it and as a benefit can even handle the cases where user wrote somefn
> > > (whatever); return 42; rather than return somefn (whatever);
> > > 
> > > I'm willing to at least try the punting on singleton replacements for
> > > gnu::musttail though.
> > 
> > I think that would make it more robust indeed.  But general optimality
> > concerns remain.  I don't think LRA knows to remat a constant from
> > a return register for example.
> 
> It could be taught that.
> 
> In any case, IPA VRP return value optimizion isn't a new thing on the trunk,
> we've been doing that already in GCC 14.
> 
> So, how about following adjusted patch (same tree-tailcall.cc, I've just
> extended the musttail14.c testcase to really verify the non-tailcallable
> cases weren't tail called) plus added testcase to verify the non-musttail
> cases (both when written as return fncall (args); and as fncall (args);
> return singletoncst;), plus follow up patch I'll post next?
> 
> 2025-01-15  Jakub Jelinek  <ja...@redhat.com>
>           Andrew Pinski  <quic_apin...@quicinc.com>
> 
>       PR tree-optimization/118430
>       * tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h,
>       symbol-summary.h, ipa-cp.h and ipa-prop.h.
>       (find_tail_calls): If ass_var is NULL and ret_var is not, check if
>       IPA-VRP has not found singleton return range for it.  In that case,
>       don't punt if ret_var is the only value in that range.  Adjust the
>       maybe_error_musttail message otherwise to diagnose different value
>       being returned from the caller and callee rather than using return
>       slot.  Formatting fixes.
> 
>       * c-c++-common/musttail14.c: New test.
>       * c-c++-common/pr118430.c: New test.
> 
> --- gcc/tree-tailcall.cc.jj   2025-01-02 11:23:18.407490465 +0100
> +++ gcc/tree-tailcall.cc      2025-01-14 16:03:13.169294430 +0100
> @@ -45,6 +45,12 @@ along with GCC; see the file COPYING3.
>  #include "ipa-utils.h"
>  #include "tree-ssa-live.h"
>  #include "diagnostic-core.h"
> +#include "gimple-range.h"
> +#include "alloc-pool.h"
> +#include "sreal.h"
> +#include "symbol-summary.h"
> +#include "ipa-cp.h"
> +#include "ipa-prop.h"
>  
>  /* The file implements the tail recursion elimination.  It is also used to
>     analyze the tail calls in general, passing the results to the rtl level
> @@ -483,7 +489,7 @@ find_tail_calls (basic_block bb, struct
>       {
>         if (dump_file)
>           fprintf (dump_file, "Basic block %d has extra exit edges\n",
> -                         bb->index);
> +                  bb->index);
>         return;
>       }
>        if (!cfun->has_musttail)
> @@ -517,7 +523,8 @@ find_tail_calls (basic_block bb, struct
>         if (bad_stmt)
>           {
>             maybe_error_musttail (call,
> -                           _("memory reference or volatile after call"));
> +                                 _("memory reference or volatile after "
> +                                   "call"));
>             return;
>           }
>         ass_var = gimple_call_lhs (call);
> @@ -597,10 +604,10 @@ find_tail_calls (basic_block bb, struct
>    {
>      if (stmt == last_stmt)
>        maybe_error_musttail (call,
> -                       _("call may throw exception that does not 
> propagate"));
> +                         _("call may throw exception that does not "
> +                           "propagate"));
>      else
> -      maybe_error_musttail (call,
> -                       _("code between call and return"));
> +      maybe_error_musttail (call, _("code between call and return"));
>      return;
>    }
>  
> @@ -715,7 +722,8 @@ find_tail_calls (basic_block bb, struct
>           {
>             if (local_live_vars)
>               BITMAP_FREE (local_live_vars);
> -           maybe_error_musttail (call, _("call invocation refers to 
> locals"));
> +           maybe_error_musttail (call,
> +                                 _("call invocation refers to locals"));
>             return;
>           }
>         else
> @@ -724,7 +732,8 @@ find_tail_calls (basic_block bb, struct
>             if (bitmap_bit_p (local_live_vars, *v))
>               {
>                 BITMAP_FREE (local_live_vars);
> -               maybe_error_musttail (call, _("call invocation refers to 
> locals"));
> +               maybe_error_musttail (call,
> +                                     _("call invocation refers to locals"));
>                 return;
>               }
>           }
> @@ -833,15 +842,39 @@ find_tail_calls (basic_block bb, struct
>        && (ret_var != ass_var
>         && !(is_empty_type (TREE_TYPE (ret_var)) && !ass_var)))
>      {
> -      maybe_error_musttail (call, _("call uses return slot"));
> -      return;
> +      bool ok = false;
> +      value_range val;
> +      tree valr;
> +      /* If IPA-VRP proves called function always returns a singleton range,
> +      the return value is replaced by the only value in that range.
> +      For tail call purposes, pretend such replacement didn't happen.  */
> +      if (ass_var == NULL_TREE
> +       && !tail_recursion
> +       && TREE_CONSTANT (ret_var))
> +     if (tree type = gimple_range_type (call))
> +       if (tree callee = gimple_call_fndecl (call))
> +         if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
> +             && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
> +                                           type)
> +             && useless_type_conversion_p (TREE_TYPE (ret_var), type)
> +             && ipa_return_value_range (val, callee)
> +             && val.singleton_p (&valr)

I suppose it's good enough to check

 useless_type_conversion_p (TREE_TYPE (ret_var), TREE_TYPE (valr))?

but the bigger question is whether RTL expansion does the right thing
here without changing the IL at this point back to return the LHS
and put that back?  IIRC there are some sanity checks upon tail call
expansion, but does this all work when the call itself doesn't have
its return value used?  Aka some call expansion checks might be
elided in this case.

So I feel a bit nervous marking sth as tail-call that doesn't
actually look like one (unless we make it so again).

Richard.

> +             && operand_equal_p (ret_var, valr, 0))
> +           ok = true;
> +      if (!ok)
> +     {
> +       maybe_error_musttail (call,
> +                             _("call and return value are different"));
> +       return;
> +     }
>      }
>  
>    /* If this is not a tail recursive call, we cannot handle addends or
>       multiplicands.  */
>    if (!tail_recursion && (m || a))
>      {
> -      maybe_error_musttail (call, _("operations after non tail recursive 
> call"));
> +      maybe_error_musttail (call,
> +                         _("operations after non tail recursive call"));
>        return;
>      }
>  
> @@ -849,7 +882,8 @@ find_tail_calls (basic_block bb, struct
>    if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl))))
>      {
>        maybe_error_musttail (call,
> -                   _("tail recursion with pointers can only use additions"));
> +                         _("tail recursion with pointers can only use "
> +                           "additions"));
>        return;
>      }
>  
> --- gcc/testsuite/c-c++-common/musttail14.c.jj        2025-01-14 
> 16:22:06.623276122 +0100
> +++ gcc/testsuite/c-c++-common/musttail14.c   2025-01-15 11:32:58.048016223 
> +0100
> @@ -0,0 +1,90 @@
> +/* PR tree-optimization/118430 */
> +/* { dg-do compile { target musttail } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "  bar \\\(\[^\n\r]\*\\\); \\\[tail 
> call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "  freddy \\\(\[^\n\r]\*\\\); \\\[tail 
> call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "  (?:bar|freddy) \\\(\[^\n\r]\*\\\); 
> \\\[tail call\\\]" 2 "optimized" } } */
> +
> +__attribute__ ((noipa)) void
> +foo (int x)
> +{
> +  (void) x;
> +}
> +
> +__attribute__ ((noinline)) int
> +bar (int x)
> +{
> +  foo (x);
> +  return 1;
> +}
> +
> +__attribute__ ((noinline)) int
> +baz (int *x)
> +{
> +  foo (*x);
> +  return 2;
> +}
> +
> +__attribute__((noipa)) int
> +qux (int x)
> +{
> +  {
> +    int v;
> +    foo (x);
> +    baz (&v);
> +  }
> +  [[gnu::musttail]]
> +  return bar (x);
> +}
> +
> +__attribute__((noipa)) int
> +corge (int x)
> +{
> +  {
> +    int v;
> +    foo (x);
> +    baz (&v);
> +  }
> +  return bar (x) + 1;
> +}
> +
> +__attribute__ ((noinline)) float
> +freddy (int x)
> +{
> +  foo (x);
> +  return 1.75f;
> +}
> +
> +__attribute__((noipa)) float
> +garply (int x)
> +{
> +  {
> +    int v;
> +    foo (x);
> +    baz (&v);
> +  }
> +  [[gnu::musttail]]
> +  return freddy (x);
> +}
> +
> +__attribute__((noipa)) float
> +quux (int x)
> +{
> +  {
> +    int v;
> +    foo (x);
> +    baz (&v);
> +  }
> +  return freddy (x) + 0.25f;
> +}
> +
> +int v;
> +
> +int
> +main ()
> +{
> +  qux (v);
> +  corge (v);
> +  garply (v);
> +  quux (v);
> +}
> --- gcc/testsuite/c-c++-common/pr118430.c.jj  2025-01-15 10:58:09.258897583 
> +0100
> +++ gcc/testsuite/c-c++-common/pr118430.c     2025-01-15 10:59:27.349797780 
> +0100
> @@ -0,0 +1,89 @@
> +/* PR tree-optimization/118430 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "  bar \\\(\[^\n\r]\*\\\); \\\[tail 
> call\\\]" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "  freddy \\\(\[^\n\r]\*\\\); \\\[tail 
> call\\\]" 2 "optimized" } } */
> +
> +__attribute__ ((noipa)) void
> +foo (int x)
> +{
> +  (void) x;
> +}
> +
> +__attribute__ ((noinline)) int
> +bar (int x)
> +{
> +  foo (x);
> +  return 1;
> +}
> +
> +__attribute__ ((noinline)) int
> +baz (int *x)
> +{
> +  foo (*x);
> +  return 2;
> +}
> +
> +__attribute__((noipa)) int
> +qux (int x)
> +{
> +  {
> +    int v;
> +    foo (x);
> +    baz (&v);
> +  }
> +  return bar (x);
> +}
> +
> +__attribute__((noipa)) int
> +corge (int x)
> +{
> +  {
> +    int v;
> +    foo (x);
> +    baz (&v);
> +  }
> +  bar (x);
> +  return 1;
> +}
> +
> +__attribute__ ((noinline)) float
> +freddy (int x)
> +{
> +  foo (x);
> +  return 1.75f;
> +}
> +
> +__attribute__((noipa)) float
> +garply (int x)
> +{
> +  {
> +    int v;
> +    foo (x);
> +    baz (&v);
> +  }
> +  return freddy (x);
> +}
> +
> +__attribute__((noipa)) float
> +quux (int x)
> +{
> +  {
> +    int v;
> +    foo (x);
> +    baz (&v);
> +  }
> +  freddy (x);
> +  return 1.75f;
> +}
> +
> +int v;
> +
> +int
> +main ()
> +{
> +  qux (v);
> +  corge (v);
> +  garply (v);
> +  quux (v);
> +}
> 
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to