On Wed, Jan 15, 2025 at 10:42:12AM +0100, Richard Biener wrote: > Yes. I'll note there's a PR (or a bunch of) which are about > > x = FOO (y, ..); > <use of x> > > vs. > > FOO (y, ..); > <use of y> > > for FOO returning an argument (like for example memcpy). Where neither > form is best in all cases. For your example above consider '42' being > a FP constant, we'd have to re-load that from the constant pool after > the call rather using the conveniently available copy in the return > register. Or on RISC archs it might be costly to materialize the > integer immediate. > > So the question is whether such replacement is a good thing - yes, > we want the knowledge to simplify followup code, but copy-replacing > might not always be good.
Or we can improve it incrementally, in RA or wherever else, perhaps with some helper function. VRP or ssa propagation unfortunately doesn't know if something is a tail call and shouldn't redo tailc pass analysis for it each time it encounters it, and not replacing it in places which aren't tail call can hamper further optimizations when the lhs is used later on in other statements. The point of the patch is that it restores previous behavior at least for the tail call optimization. BTW, I think we don't optimize returns-arg stuff like that at least right now, and if we did, it wouldn't be through IPA-VRP, most of the returns-arg functions actually return a pointer, not integer and for prange we just track constant values, not say address of something. > > For non-musttail we shouldn't do that though, we don't know whether it is > > tail-callable at all and whether having the constant in there is more > > beneficial for optimizations or possibly keeping it as maybe tail callable. > > And therefore I wrote the patch as is, it then fixes the 14/15 regression > > for it and as a benefit can even handle the cases where user wrote somefn > > (whatever); return 42; rather than return somefn (whatever); > > > > I'm willing to at least try the punting on singleton replacements for > > gnu::musttail though. > > I think that would make it more robust indeed. But general optimality > concerns remain. I don't think LRA knows to remat a constant from > a return register for example. It could be taught that. In any case, IPA VRP return value optimizion isn't a new thing on the trunk, we've been doing that already in GCC 14. So, how about following adjusted patch (same tree-tailcall.cc, I've just extended the musttail14.c testcase to really verify the non-tailcallable cases weren't tail called) plus added testcase to verify the non-musttail cases (both when written as return fncall (args); and as fncall (args); return singletoncst;), plus follow up patch I'll post next? 2025-01-15 Jakub Jelinek <ja...@redhat.com> Andrew Pinski <quic_apin...@quicinc.com> PR tree-optimization/118430 * tree-tailcall.cc: Include gimple-range.h, alloc-pool.h, sreal.h, symbol-summary.h, ipa-cp.h and ipa-prop.h. (find_tail_calls): If ass_var is NULL and ret_var is not, check if IPA-VRP has not found singleton return range for it. In that case, don't punt if ret_var is the only value in that range. Adjust the maybe_error_musttail message otherwise to diagnose different value being returned from the caller and callee rather than using return slot. Formatting fixes. * c-c++-common/musttail14.c: New test. * c-c++-common/pr118430.c: New test. --- gcc/tree-tailcall.cc.jj 2025-01-02 11:23:18.407490465 +0100 +++ gcc/tree-tailcall.cc 2025-01-14 16:03:13.169294430 +0100 @@ -45,6 +45,12 @@ along with GCC; see the file COPYING3. #include "ipa-utils.h" #include "tree-ssa-live.h" #include "diagnostic-core.h" +#include "gimple-range.h" +#include "alloc-pool.h" +#include "sreal.h" +#include "symbol-summary.h" +#include "ipa-cp.h" +#include "ipa-prop.h" /* The file implements the tail recursion elimination. It is also used to analyze the tail calls in general, passing the results to the rtl level @@ -483,7 +489,7 @@ find_tail_calls (basic_block bb, struct { if (dump_file) fprintf (dump_file, "Basic block %d has extra exit edges\n", - bb->index); + bb->index); return; } if (!cfun->has_musttail) @@ -517,7 +523,8 @@ find_tail_calls (basic_block bb, struct if (bad_stmt) { maybe_error_musttail (call, - _("memory reference or volatile after call")); + _("memory reference or volatile after " + "call")); return; } ass_var = gimple_call_lhs (call); @@ -597,10 +604,10 @@ find_tail_calls (basic_block bb, struct { if (stmt == last_stmt) maybe_error_musttail (call, - _("call may throw exception that does not propagate")); + _("call may throw exception that does not " + "propagate")); else - maybe_error_musttail (call, - _("code between call and return")); + maybe_error_musttail (call, _("code between call and return")); return; } @@ -715,7 +722,8 @@ find_tail_calls (basic_block bb, struct { if (local_live_vars) BITMAP_FREE (local_live_vars); - maybe_error_musttail (call, _("call invocation refers to locals")); + maybe_error_musttail (call, + _("call invocation refers to locals")); return; } else @@ -724,7 +732,8 @@ find_tail_calls (basic_block bb, struct if (bitmap_bit_p (local_live_vars, *v)) { BITMAP_FREE (local_live_vars); - maybe_error_musttail (call, _("call invocation refers to locals")); + maybe_error_musttail (call, + _("call invocation refers to locals")); return; } } @@ -833,15 +842,39 @@ find_tail_calls (basic_block bb, struct && (ret_var != ass_var && !(is_empty_type (TREE_TYPE (ret_var)) && !ass_var))) { - maybe_error_musttail (call, _("call uses return slot")); - return; + bool ok = false; + value_range val; + tree valr; + /* If IPA-VRP proves called function always returns a singleton range, + the return value is replaced by the only value in that range. + For tail call purposes, pretend such replacement didn't happen. */ + if (ass_var == NULL_TREE + && !tail_recursion + && TREE_CONSTANT (ret_var)) + if (tree type = gimple_range_type (call)) + if (tree callee = gimple_call_fndecl (call)) + if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type)) + && useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)), + type) + && useless_type_conversion_p (TREE_TYPE (ret_var), type) + && ipa_return_value_range (val, callee) + && val.singleton_p (&valr) + && operand_equal_p (ret_var, valr, 0)) + ok = true; + if (!ok) + { + maybe_error_musttail (call, + _("call and return value are different")); + return; + } } /* If this is not a tail recursive call, we cannot handle addends or multiplicands. */ if (!tail_recursion && (m || a)) { - maybe_error_musttail (call, _("operations after non tail recursive call")); + maybe_error_musttail (call, + _("operations after non tail recursive call")); return; } @@ -849,7 +882,8 @@ find_tail_calls (basic_block bb, struct if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl)))) { maybe_error_musttail (call, - _("tail recursion with pointers can only use additions")); + _("tail recursion with pointers can only use " + "additions")); return; } --- gcc/testsuite/c-c++-common/musttail14.c.jj 2025-01-14 16:22:06.623276122 +0100 +++ gcc/testsuite/c-c++-common/musttail14.c 2025-01-15 11:32:58.048016223 +0100 @@ -0,0 +1,90 @@ +/* PR tree-optimization/118430 */ +/* { dg-do compile { target musttail } } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-times " bar \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " freddy \\\(\[^\n\r]\*\\\); \\\[tail call\\\] \\\[must tail call\\\]" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " (?:bar|freddy) \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */ + +__attribute__ ((noipa)) void +foo (int x) +{ + (void) x; +} + +__attribute__ ((noinline)) int +bar (int x) +{ + foo (x); + return 1; +} + +__attribute__ ((noinline)) int +baz (int *x) +{ + foo (*x); + return 2; +} + +__attribute__((noipa)) int +qux (int x) +{ + { + int v; + foo (x); + baz (&v); + } + [[gnu::musttail]] + return bar (x); +} + +__attribute__((noipa)) int +corge (int x) +{ + { + int v; + foo (x); + baz (&v); + } + return bar (x) + 1; +} + +__attribute__ ((noinline)) float +freddy (int x) +{ + foo (x); + return 1.75f; +} + +__attribute__((noipa)) float +garply (int x) +{ + { + int v; + foo (x); + baz (&v); + } + [[gnu::musttail]] + return freddy (x); +} + +__attribute__((noipa)) float +quux (int x) +{ + { + int v; + foo (x); + baz (&v); + } + return freddy (x) + 0.25f; +} + +int v; + +int +main () +{ + qux (v); + corge (v); + garply (v); + quux (v); +} --- gcc/testsuite/c-c++-common/pr118430.c.jj 2025-01-15 10:58:09.258897583 +0100 +++ gcc/testsuite/c-c++-common/pr118430.c 2025-01-15 10:59:27.349797780 +0100 @@ -0,0 +1,89 @@ +/* PR tree-optimization/118430 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-times " bar \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " freddy \\\(\[^\n\r]\*\\\); \\\[tail call\\\]" 2 "optimized" } } */ + +__attribute__ ((noipa)) void +foo (int x) +{ + (void) x; +} + +__attribute__ ((noinline)) int +bar (int x) +{ + foo (x); + return 1; +} + +__attribute__ ((noinline)) int +baz (int *x) +{ + foo (*x); + return 2; +} + +__attribute__((noipa)) int +qux (int x) +{ + { + int v; + foo (x); + baz (&v); + } + return bar (x); +} + +__attribute__((noipa)) int +corge (int x) +{ + { + int v; + foo (x); + baz (&v); + } + bar (x); + return 1; +} + +__attribute__ ((noinline)) float +freddy (int x) +{ + foo (x); + return 1.75f; +} + +__attribute__((noipa)) float +garply (int x) +{ + { + int v; + foo (x); + baz (&v); + } + return freddy (x); +} + +__attribute__((noipa)) float +quux (int x) +{ + { + int v; + foo (x); + baz (&v); + } + freddy (x); + return 1.75f; +} + +int v; + +int +main () +{ + qux (v); + corge (v); + garply (v); + quux (v); +} Jakub