[PATCH] Add -fopt-builtin optimization option
This option (enabled by default) controls optimizations that convert a
sequence of operations into an equivalent sequence that includes calls to
builtin functions. Typical cases are code sequences that match memcpy,
calloc or sincos.

The -ftree-loop-distribute-patterns flag only covers converting loops into
builtin calls, not the numerous other places where knowledge of builtin
function semantics changes the generated code.

The goal is to allow built-in functions to be declared by the compiler and
used directly by the application, but to disable optimizations which create
new calls to them, and to allow this optimization behavior to be changed for
individual functions by decorating the function definition like this:

    void __attribute__((optimize("no-opt-builtin")))
    sincos(double x, double *s, double *c)
    {
        *s = sin(x);
        *c = cos(x);
    }

This also avoids converting loops into library calls like this:

    void * __attribute__((optimize("no-opt-builtin")))
    memcpy(void *__restrict__ dst, const void *__restrict__ src, size_t n)
    {
        char *d = dst;
        const char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }

It also disables analysis of memory lifetimes around free, as in this
example:

    void __attribute__((optimize("no-opt-builtin")))
    erase_and_free(void *ptr)
    {
        memset(ptr, '\0', malloc_usable_size(ptr));
        free(ptr);
    }

Clang has a more sophisticated version of this mechanism which can disable
all builtins, or disable a specific builtin:

    double __attribute__((no_builtin("exp2")))
    exp2(double x)
    {
        return pow (2.0, x);
    }

Signed-off-by: Keith Packard
---
 gcc/builtins.c               | 6 ++++++
 gcc/common.opt               | 4 ++++
 gcc/gimple.c                 | 3 +++
 gcc/tree-loop-distribution.c | 2 ++
 4 files changed, 15 insertions(+)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 7d0f61fc98b..7aae57deab5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -1922,6 +1922,9 @@ mathfn_built_in_2 (tree type, combined_fn fn)
   built_in_function fcodef64x = END_BUILTINS;
   built_in_function fcodef128x = END_BUILTINS;
 
+  if (flag_no_opt_builtin)
+    return END_BUILTINS;
+
   switch (fn)
     {
 #define SEQ_OF_CASE_MATHFN \
@@ -2125,6 +2128,9 @@ mathfn_built_in_type (combined_fn fn)
   case CFN_BUILT_IN_##MATHFN##L_R: \
     return long_double_type_node;
 
+  if (flag_no_opt_builtin)
+    return NULL_TREE;
+
   switch (fn)
     {
     SEQ_OF_CASE_MATHFN
diff --git a/gcc/common.opt b/gcc/common.opt
index eeba1a727f2..d6111cc776a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2142,6 +2142,10 @@ fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
 
+fopt-builtin
+Common Var(flag_no_opt_builtin, 0) Optimization
+Match code sequences equivalent to builtin functions
+
 fopt-info
 Common Var(flag_opt_info) Optimization
 Enable all optimization info dumps on stderr.
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 22dd6417d19..5b82b9409c0 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2790,6 +2790,9 @@ gimple_builtin_call_types_compatible_p (const gimple *stmt, tree fndecl)
 {
   gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN);
 
+  if (flag_no_opt_builtin)
+    return false;
+
   tree ret = gimple_call_lhs (stmt);
   if (ret
       && !useless_type_conversion_p (TREE_TYPE (ret),
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 583c01a42d8..43f22a3c7ce 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1859,6 +1859,7 @@ loop_distribution::classify_partition (loop_p loop,
 
   /* Perform general partition disqualification for builtins.  */
   if (volatiles_p
+      || flag_no_opt_builtin
       || !flag_tree_loop_distribute_patterns)
     return has_reduction;
 
@@ -3764,6 +3765,7 @@ loop_distribution::execute (function *fun)
           /* Don't distribute multiple exit edges loop, or cold loop when
              not doing pattern detection.  */
           if (!single_exit (loop)
+              || flag_no_opt_builtin
               || (!flag_tree_loop_distribute_patterns
                   && !optimize_loop_for_speed_p (loop)))
             continue;
-- 
2.33.0
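On a compiler without this patch, the closest existing per-function control is the optimize attribute with -fno-tree-loop-distribute-patterns, which, as the description says, only covers turning loops into calls. A minimal sketch, assuming GCC: the macro expands to nothing on other compilers, since Clang does not implement the optimize attribute. byte_copy and NO_LOOP_TO_LIBCALL are hypothetical names; glibc applies the same attribute through its inhibit_loop_to_libcall macro.

```c
#include <stddef.h>

/* Hypothetical macro: on GCC, disable -ftree-loop-distribute-patterns for
   one function so its byte loop is not recognized as a memcpy pattern and
   replaced by a memcpy call. Expands to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
# define NO_LOOP_TO_LIBCALL
#endif

/* Hypothetical name, chosen so it does not clash with the C library's
   memcpy; copies n bytes from src to dst one byte at a time. */
NO_LOOP_TO_LIBCALL
void *byte_copy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

This does nothing for the sincos or free cases above, which is the gap -fopt-builtin is meant to close.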
-Wuninitialized false positives and threading knobs
After Jeff's explanation of the symbiosis between jump threading and the
uninit pass, I'm beginning to see that (almost) every -Wuninitialized
warning is cause for reflection. It usually hides a missing jump thread. I
investigated one such false positive (uninit-pred-7_a.c) and indeed, there's
a missing thread. The question is what to do about it.

This seemingly simple test is now regressing, as can be seen by the xfail I
added.

What happens is that we now thread far more than before, causing the
distance from definition to use to expand. The threading candidate that
would make the -Wuninitialized warning go away is there, and the backward
threader can see it, but it refuses to thread it because the number of
statements would be too large.

This is interesting because it means threading is causing larger IL that in
turn keeps us from threading some unreachable paths later on, because the
paths are too large.

If you look at the *.threadfull2 dump for the attached simplified test, you
can see that the 3->5->6->8->10->13 path would elide the unreachable read,
but alas we can't look past BB5, because it would thread too many
statements:

  Checking profitability of path (backwards):  bb:10 (2 insns) bb:8 (2 insns) bb:6 (6 insns) bb:5
    Control statement insns: 2
    Overall: 8 insns
    FAIL: Did not thread around loop and would copy too many statements.

The "problem" we have is that if there's a path in the IL, the new threader
*will* exploit it (ranger dependent). This in turn opens up opportunities
for other threaders (even DOM), creating a cascading effect.
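As a rough illustration of why the path in the dump fails, here is a toy model of the statement-count check; it is not GCC's code, and the function names are hypothetical. It assumes two things about the backward threader: the overall count is the sum of the per-block counts minus the control-statement insns that threading removes (this reproduces the dump: 2 + 2 + 6 - 2 = 8), and that count is multiplied by --param=fsm-scale-path-stmts (assumed default 2) before being compared with --param=max-jump-thread-duplication-stmts. Under those assumptions 8 * 2 = 16, which fails against the default of 15 but passes against 19.

```c
#include <stddef.h>

/* Toy model, not GCC's implementation. Assumption: the overall count is
   the per-block insn counts summed, minus the control-statement insns
   that threading removes. */
static int path_overall_insns(const int *block_insns, size_t nblocks,
                              int control_insns)
{
    int total = 0;
    for (size_t i = 0; i < nblocks; i++)
        total += block_insns[i];
    return total - control_insns;
}

/* Assumption: the overall count, scaled by a factor, is compared against
   the duplication limit. Returns nonzero when the path would be rejected
   as copying too many statements. */
static int too_many_stmts(int overall_insns, int scale, int max_dup_stmts)
{
    return overall_insns * scale >= max_dup_stmts;
}
```

The model ignores every other profitability check the threader performs, so it only explains this one FAIL line.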
For the attached test, we can squelch the warning with a mere:

  --param=max-jump-thread-duplication-stmts=19

I don't know how we decided on the default of 15 for this param, or whether
it makes sense to tweak it, but our current threading passes, and how they
relate to VRP and the uninit pass, look like this:

  $ ls a.c.* | grep -e thread -e dom[23] -e vrp[12] -e uninit
  a.c.034t.ethread
  a.c.111t.threadfull1
  a.c.112t.vrp1
  a.c.126t.thread1
  a.c.127t.dom2
  a.c.191t.thread2
  a.c.192t.dom3
  a.c.194t.threadfull2
  a.c.195t.vrp2
  a.c.209t.uninit1

Perhaps we could turn down the knobs for thread[12] and increase them for
threadfull[12]? I really don't know. For this particular test, we could even
turn off thread1, increase the duplication statements, and eliminate the
warning. This would leave DOM2 without the threader that runs before it,
though.

I'm out of my depth here, plus I'm a bit hesitant to make performance
decisions to improve warnings. On the other hand, it's sad that improved
threading is causing regressions on a test as simple as this one. That being
said, and although I generally don't mention it, the threading improvements
so far solve more problems than they introduce, so perhaps we should do
nothing?

I'd be curious to hear what others think. Perhaps others could play with the
different knobs and see if there's a better combination that could keep
warnings and optimizers in better equilibrium.

[FWIW Martin, you could revisit some of the uninit regressions and see if
tweaking the above --param would silence the bogus warning. If it does,
that's a hint that the regression may not be due to the uninit code itself.
FTR, I'm not saying we _should_ thread more, just that this could be a tool
to help diagnose.]
Aldy

/* { dg-do compile } */
/* { dg-options "-Wuninitialized -O2" } */

int g;
void blah1(int);
void blah2(int);
void crapola(int);

int foo (int n, int l, int m, int r)
{
  int v;

  if (n || l)
    v = r;

  if (m)
    g++;

  if ( n && l)
    blah1(v); /* { dg-bogus "uninitialized" "bogus warning" } */

  if ( n )
    blah2(v); /* { dg-bogus "uninitialized" "bogus warning" } */

  if ( l )
    crapola(v); /* { dg-bogus "uninitialized" "bogus warning" { xfail *-*-* } } */

  return 0;
}
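The three dg-bogus markers rest on a simple predicate argument: v is assigned exactly when (n || l), and m plays no part in that, so a use is safe whenever its guard implies (n || l). The brute-force sketch below checks that argument over the four truth assignments of n and l. It models only the predicate logic, not how the uninit pass or the threaders reason, and the helper names are hypothetical.

```c
#include <stdbool.h>

/* Logical implication: a -> b. */
static bool implies(bool a, bool b)
{
    return !a || b;
}

/* A use of v is safe if its guard implies the definition's guard (n || l)
   for every truth assignment of n and l. */
static bool use_is_safe(bool (*guard)(bool, bool))
{
    for (int n = 0; n <= 1; n++)
        for (int l = 0; l <= 1; l++)
            if (!implies(guard(n, l), n || l))
                return false;
    return true;
}

/* Guards of the three uses in foo(). */
static bool guard_n_and_l(bool n, bool l) { return n && l; }        /* blah1 (v) */
static bool guard_n(bool n, bool l) { (void) l; return n; }         /* blah2 (v) */
static bool guard_l(bool n, bool l) { (void) n; return l; }         /* crapola (v) */
```

A guard such as !n would fail the check, since n = l = 0 makes it true while leaving v unassigned.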
gcc-12-20211031 is now available
Snapshot gcc-12-20211031 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20211031/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master revision ca84f39399fda80c770306465276ffd66d3766ed

You'll find:

 gcc-12-20211031.tar.xz               Complete GCC

  SHA256=ff00aee21d003bcc542896551a99adf0b0ccd935d85e96886d6d886c0dc3a3cc
  SHA1=c5499a3bee10572c290ead3e140647335ee004cd

Diffs from 12-20211024 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.