On Wed, Jan 15, 2025 at 10:42:12AM +0100, Richard Biener wrote:
> Yes.  I'll note there's a PR (or a bunch of) which are about
> 
> x = FOO (y, ..);
> <use of x>
> 
> vs.
> 
> FOO (y, ..);
> <use of y>
> 
> for FOO returning an argument (like for example memcpy).  Where neither
> form is best in all cases.  For your example above consider '42' being
> a FP constant, we'd have to re-load that from the constant pool after
> the call rather using the conveniently available copy in the return
> register.  Or on RISC archs it might be costly to materialize the
> integer immediate.
> 
> So the question is whether such replacement is a good thing - yes,
> we want the knowledge to simplify followup code, but copy-replacing
> might not always be good.

Or we can improve it incrementally, in RA or wherever else, perhaps with
some helper function.  VRP or ssa propagation unfortunately doesn't know
if something is a tail call and shouldn't redo tailc pass analysis for it
each time it encounters it, and not replacing it in places which aren't tail
call can hamper further optimizations when the lhs is used later on in other
statements.

The point of the patch is that it restores previous behavior at least for
the tail call optimization.

BTW, I think we don't optimize returns-arg stuff like that at least right
now, and if we did, it wouldn't be through IPA-VRP, most of the returns-arg
functions actually return a pointer, not integer and for prange we just
track constant values, not say address of something.

> > For non-musttail we shouldn't do that though, we don't know whether it is
> > tail-callable at all and whether having the constant in there is more
> > beneficial for optimizations or possibly keeping it as maybe tail callable.
> > And therefore I wrote the patch as is, it then fixes the 14/15 regression
> > for it and as a benefit can even handle the cases where user wrote somefn
> > (whatever); return 42; rather than return somefn (whatever);
> > 
> > I'm willing to at least try the punting on singleton replacements for
> > gnu::musttail though.
> 
> I think that would make it more robust indeed.  But general optimality
> concerns remain.  I don't think LRA knows to remat a constant from
> a return register for example.

It could be taught that.

In any case, IPA VRP return value optimizion isn't a new thing on the trunk,
we've been doing that already in GCC 14.

So, how about following adjusted patch (same tree-tailcall.cc, I've just
extended the musttail14.c testcase to really verify the non-tailcallable
cases weren't tail called) plus added testcase to verify the non-musttail
cases (both when written as return fncall (args); and as fncall (args);
return singletoncst;), plus follow up patch I'll post next?

2025-01-15  Jakub Jelinek  <ja...@redhat.com>
            Andrew Pinski  <quic_apin...@quicinc.com>

        PR tree-optimization/118430
        * tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h,
        symbol-summary.h, ipa-cp.h and ipa-prop.h.
        (find_tail_calls): If ass_var is NULL and ret_var is not, check if
        IPA-VRP has not found singleton return range for it.  In that case,
        don't punt if ret_var is the only value in that range.  Adjust the
        maybe_error_musttail message otherwise to diagnose different value
        being returned from the caller and callee rather than using return
        slot.  Formatting fixes.

        * c-c++-common/musttail14.c: New test.
        * c-c++-common/pr118430.c: New test.

--- gcc/tree-tailcall.cc.jj     2025-01-02 11:23:18.407490465 +0100
+++ gcc/tree-tailcall.cc        2025-01-14 16:03:13.169294430 +0100
@@ -45,6 +45,12 @@ along with GCC; see the file COPYING3.
 #include "ipa-utils.h"
 #include "tree-ssa-live.h"
 #include "diagnostic-core.h"
+#include "gimple-range.h"
+#include "alloc-pool.h"
+#include "sreal.h"
+#include "symbol-summary.h"
+#include "ipa-cp.h"
+#include "ipa-prop.h"
 
 /* The file implements the tail recursion elimination.  It is also used to
    analyze the tail calls in general, passing the results to the rtl level
@@ -483,7 +489,7 @@ find_tail_calls (basic_block bb, struct
        {
          if (dump_file)
            fprintf (dump_file, "Basic block %d has extra exit edges\n",
-                           bb->index);
+                    bb->index);
          return;
        }
       if (!cfun->has_musttail)
@@ -517,7 +523,8 @@ find_tail_calls (basic_block bb, struct
          if (bad_stmt)
            {
              maybe_error_musttail (call,
-                             _("memory reference or volatile after call"));
+                                   _("memory reference or volatile after "
+                                     "call"));
              return;
            }
          ass_var = gimple_call_lhs (call);
@@ -597,10 +604,10 @@ find_tail_calls (basic_block bb, struct
   {
     if (stmt == last_stmt)
       maybe_error_musttail (call,
-                         _("call may throw exception that does not 
propagate"));
+                           _("call may throw exception that does not "
+                             "propagate"));
     else
-      maybe_error_musttail (call,
-                         _("code between call and return"));
+      maybe_error_musttail (call, _("code between call and return"));
     return;
   }
 
@@ -715,7 +722,8 @@ find_tail_calls (basic_block bb, struct
            {
              if (local_live_vars)
                BITMAP_FREE (local_live_vars);
-             maybe_error_musttail (call, _("call invocation refers to 
locals"));
+             maybe_error_musttail (call,
+                                   _("call invocation refers to locals"));
              return;
            }
          else
@@ -724,7 +732,8 @@ find_tail_calls (basic_block bb, struct
              if (bitmap_bit_p (local_live_vars, *v))
                {
                  BITMAP_FREE (local_live_vars);
-                 maybe_error_musttail (call, _("call invocation refers to 
locals"));
+                 maybe_error_musttail (call,
+                                       _("call invocation refers to locals"));
                  return;
                }
            }
@@ -833,15 +842,39 @@ find_tail_calls (basic_block bb, struct
       && (ret_var != ass_var
          && !(is_empty_type (TREE_TYPE (ret_var)) && !ass_var)))
     {
-      maybe_error_musttail (call, _("call uses return slot"));
-      return;
+      bool ok = false;
+      value_range val;
+      tree valr;
+      /* If IPA-VRP proves called function always returns a singleton range,
+        the return value is replaced by the only value in that range.
+        For tail call purposes, pretend such replacement didn't happen.  */
+      if (ass_var == NULL_TREE
+         && !tail_recursion
+         && TREE_CONSTANT (ret_var))
+       if (tree type = gimple_range_type (call))
+         if (tree callee = gimple_call_fndecl (call))
+           if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
+               && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
+                                             type)
+               && useless_type_conversion_p (TREE_TYPE (ret_var), type)
+               && ipa_return_value_range (val, callee)
+               && val.singleton_p (&valr)
+               && operand_equal_p (ret_var, valr, 0))
+             ok = true;
+      if (!ok)
+       {
+         maybe_error_musttail (call,
+                               _("call and return value are different"));
+         return;
+       }
     }
 
   /* If this is not a tail recursive call, we cannot handle addends or
      multiplicands.  */
   if (!tail_recursion && (m || a))
     {
-      maybe_error_musttail (call, _("operations after non tail recursive 
call"));
+      maybe_error_musttail (call,
+                           _("operations after non tail recursive call"));
       return;
     }
 
@@ -849,7 +882,8 @@ find_tail_calls (basic_block bb, struct
   if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl))))
     {
       maybe_error_musttail (call,
-                     _("tail recursion with pointers can only use additions"));
+                           _("tail recursion with pointers can only use "
+                             "additions"));
       return;
     }
 
--- gcc/testsuite/c-c++-common/musttail14.c.jj  2025-01-14 16:22:06.623276122 
+0100
+++ gcc/testsuite/c-c++-common/musttail14.c     2025-01-15 11:32:58.048016223 
+0100
@@ -0,0 +1,90 @@
+/* PR tree-optimization/118430 */
+/* { dg-do compile { target musttail } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "  bar \\\(\[^\n\r]\*\\\); \\\[tail 
call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "  freddy \\\(\[^\n\r]\*\\\); \\\[tail 
call\\\] \\\[must tail call\\\]" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "  (?:bar|freddy) \\\(\[^\n\r]\*\\\); 
\\\[tail call\\\]" 2 "optimized" } } */
+
+__attribute__ ((noipa)) void
+foo (int x)
+{
+  (void) x;
+}
+
+__attribute__ ((noinline)) int
+bar (int x)
+{
+  foo (x);
+  return 1;
+}
+
+__attribute__ ((noinline)) int
+baz (int *x)
+{
+  foo (*x);
+  return 2;
+}
+
+__attribute__((noipa)) int
+qux (int x)
+{
+  {
+    int v;
+    foo (x);
+    baz (&v);
+  }
+  [[gnu::musttail]]
+  return bar (x);
+}
+
+__attribute__((noipa)) int
+corge (int x)
+{
+  {
+    int v;
+    foo (x);
+    baz (&v);
+  }
+  return bar (x) + 1;
+}
+
+__attribute__ ((noinline)) float
+freddy (int x)
+{
+  foo (x);
+  return 1.75f;
+}
+
+__attribute__((noipa)) float
+garply (int x)
+{
+  {
+    int v;
+    foo (x);
+    baz (&v);
+  }
+  [[gnu::musttail]]
+  return freddy (x);
+}
+
+__attribute__((noipa)) float
+quux (int x)
+{
+  {
+    int v;
+    foo (x);
+    baz (&v);
+  }
+  return freddy (x) + 0.25f;
+}
+
+int v;
+
+int
+main ()
+{
+  qux (v);
+  corge (v);
+  garply (v);
+  quux (v);
+}
--- gcc/testsuite/c-c++-common/pr118430.c.jj    2025-01-15 10:58:09.258897583 
+0100
+++ gcc/testsuite/c-c++-common/pr118430.c       2025-01-15 10:59:27.349797780 
+0100
@@ -0,0 +1,89 @@
+/* PR tree-optimization/118430 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "  bar \\\(\[^\n\r]\*\\\); \\\[tail 
call\\\]" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "  freddy \\\(\[^\n\r]\*\\\); \\\[tail 
call\\\]" 2 "optimized" } } */
+
+__attribute__ ((noipa)) void
+foo (int x)
+{
+  (void) x;
+}
+
+__attribute__ ((noinline)) int
+bar (int x)
+{
+  foo (x);
+  return 1;
+}
+
+__attribute__ ((noinline)) int
+baz (int *x)
+{
+  foo (*x);
+  return 2;
+}
+
+__attribute__((noipa)) int
+qux (int x)
+{
+  {
+    int v;
+    foo (x);
+    baz (&v);
+  }
+  return bar (x);
+}
+
+__attribute__((noipa)) int
+corge (int x)
+{
+  {
+    int v;
+    foo (x);
+    baz (&v);
+  }
+  bar (x);
+  return 1;
+}
+
+__attribute__ ((noinline)) float
+freddy (int x)
+{
+  foo (x);
+  return 1.75f;
+}
+
+__attribute__((noipa)) float
+garply (int x)
+{
+  {
+    int v;
+    foo (x);
+    baz (&v);
+  }
+  return freddy (x);
+}
+
+__attribute__((noipa)) float
+quux (int x)
+{
+  {
+    int v;
+    foo (x);
+    baz (&v);
+  }
+  freddy (x);
+  return 1.75f;
+}
+
+int v;
+
+int
+main ()
+{
+  qux (v);
+  corge (v);
+  garply (v);
+  quux (v);
+}


        Jakub

Reply via email to