[PATCH] Add -fopt-builtin optimization option

2021-10-31 Thread Keith Packard via Gcc
This option (enabled by default) controls optimizations which convert
a sequence of operations into an equivalent sequence that includes
calls to builtin functions. Typical cases here are code which matches
memcpy, calloc, sincos.

The -ftree-loop-distribute-patterns flag only covers converting loops
into builtin calls, not numerous other places where knowledge of
builtin function semantics changes the generated code.
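
To make the distinction concrete, here is a minimal sketch of the two
kinds of code involved. The function names are hypothetical, and whether
a given GCC actually performs either substitution depends on its version
and on the optimization level; the comments describe what may happen,
not what is guaranteed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A byte-clearing loop.  With optimization enabled GCC may replace
   the loop with a call to memset.  This is the loop case that
   -ftree-loop-distribute-patterns already controls.  */
void
zero_fill (char *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    p[i] = 0;
}

/* No loop here: malloc followed by memset.  GCC may fold such a
   pair into a single calloc call, a substitution that the loop
   flag does not cover.  */
void *
alloc_zeroed (size_t n)
{
  void *p = malloc (n);
  if (p)
    memset (p, 0, n);
  return p;
}

/* Usage sketch: the observable behaviour is the same whether or
   not the compiler substitutes builtin calls.  */
void
check_zeroing (void)
{
  char buf[8] = "abcdefg";
  zero_fill (buf, sizeof buf);
  assert (buf[0] == 0 && buf[6] == 0);

  char *p = alloc_zeroed (16);
  assert (p != NULL);
  assert (p[0] == 0 && p[15] == 0);
  free (p);
}
```

Comparing the assembly from -O2 builds with and without
-fno-tree-loop-distribute-patterns is one way to see which of the two
functions that flag affects.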

The goal is to allow built-in functions to be declared by the compiler
and used directly by the application, but to disable optimizations
which create new calls to them, and to allow this optimization
behavior to be changed for individual functions by decorating the
function definition like this:

void
__attribute__((optimize("no-opt-builtin")))
sincos(double x, double *s, double *c)
{
    *s = sin(x);
    *c = cos(x);
}

This also avoids converting loops into library calls like this:

void *
__attribute__((optimize("no-opt-builtin")))
memcpy(void *__restrict__ dst, const void *__restrict__ src, size_t n)
{
    char *d = dst;
    const char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

As well as disabling analysis of memory lifetimes around free as in
this example:

void
__attribute__((optimize("no-opt-builtin")))
erase_and_free(void *ptr)
{
    memset(ptr, '\0', malloc_usable_size(ptr));
    free(ptr);
}

Clang has a more sophisticated version of this mechanism which
can disable all builtins, or disable a specific builtin:

double
__attribute__((no_builtin("exp2")))
exp2(double x)
{
    return pow (2.0, x);
}
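
A source file that has to build with both compilers could hide the
difference behind a macro. The sketch below assumes the compiler
provides __has_attribute; NO_BUILTIN_MEMSET, fill_bytes and check_fill
are hypothetical names used only for illustration. The fallback expands
to nothing, because the GCC spelling is only proposed by this patch and
cannot be assumed to exist.

```c
#include <assert.h>
#include <stddef.h>

/* Use Clang's no_builtin attribute when the compiler reports it.
   NO_BUILTIN_MEMSET is a hypothetical helper macro.  */
#if defined (__has_attribute)
# if __has_attribute (no_builtin)
#  define NO_BUILTIN_MEMSET __attribute__ ((no_builtin ("memset")))
# endif
#endif

/* Fallback: no attribute.  With this patch applied, this is where
   __attribute__ ((optimize ("no-opt-builtin"))) would go.  */
#ifndef NO_BUILTIN_MEMSET
# define NO_BUILTIN_MEMSET
#endif

/* A memset-like loop that should stay a loop where the attribute
   is honoured.  */
NO_BUILTIN_MEMSET
void *
fill_bytes (void *dst, int c, size_t n)
{
  unsigned char *d = dst;

  while (n--)
    *d++ = (unsigned char) c;
  return dst;
}

/* Usage sketch.  */
void
check_fill (void)
{
  char buf[4];
  void *r = fill_bytes (buf, 'x', sizeof buf);
  assert (r == buf);
  assert (buf[0] == 'x' && buf[3] == 'x');
}
```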

Signed-off-by: Keith Packard 
---
 gcc/builtins.c               | 6 ++++++
 gcc/common.opt               | 4 ++++
 gcc/gimple.c                 | 3 +++
 gcc/tree-loop-distribution.c | 2 ++
 4 files changed, 15 insertions(+)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 7d0f61fc98b..7aae57deab5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -1922,6 +1922,9 @@ mathfn_built_in_2 (tree type, combined_fn fn)
   built_in_function fcodef64x = END_BUILTINS;
   built_in_function fcodef128x = END_BUILTINS;
 
+  if (flag_no_opt_builtin)
+    return END_BUILTINS;
+
   switch (fn)
     {
 #define SEQ_OF_CASE_MATHFN \
@@ -2125,6 +2128,9 @@ mathfn_built_in_type (combined_fn fn)
   case CFN_BUILT_IN_##MATHFN##L_R: \
     return long_double_type_node;
 
+  if (flag_no_opt_builtin)
+    return NULL_TREE;
+
   switch (fn)
     {
 SEQ_OF_CASE_MATHFN
diff --git a/gcc/common.opt b/gcc/common.opt
index eeba1a727f2..d6111cc776a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2142,6 +2142,10 @@ fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
 
+fopt-builtin
+Common Var(flag_no_opt_builtin, 0) Optimization
+Match code sequences equivalent to builtin functions
+
 fopt-info
 Common Var(flag_opt_info) Optimization
 Enable all optimization info dumps on stderr.
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 22dd6417d19..5b82b9409c0 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2790,6 +2790,9 @@ gimple_builtin_call_types_compatible_p (const gimple *stmt, tree fndecl)
 {
   gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN);
 
+  if (flag_no_opt_builtin)
+    return false;
+
   tree ret = gimple_call_lhs (stmt);
   if (ret
       && !useless_type_conversion_p (TREE_TYPE (ret),
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 583c01a42d8..43f22a3c7ce 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1859,6 +1859,7 @@ loop_distribution::classify_partition (loop_p loop,
 
   /* Perform general partition disqualification for builtins.  */
   if (volatiles_p
+      || flag_no_opt_builtin
       || !flag_tree_loop_distribute_patterns)
     return has_reduction;
 
@@ -3764,6 +3765,7 @@ loop_distribution::execute (function *fun)
       /* Don't distribute multiple exit edges loop, or cold loop when
          not doing pattern detection.  */
       if (!single_exit (loop)
+          || flag_no_opt_builtin
           || (!flag_tree_loop_distribute_patterns
               && !optimize_loop_for_speed_p (loop)))
         continue;
-- 
2.33.0



-Wuninitialized false positives and threading knobs

2021-10-31 Thread Aldy Hernandez via Gcc
After Jeff's explanation of the symbiosis between jump threading and
the uninit pass, I'm beginning to see that (almost) every
Wuninitialized warning is cause for reflection.  It usually hides a
missing jump thread.  I investigated one such false positive
(uninit-pred-7_a.c) and indeed, there's a missing thread.  The
question is what to do about it.

This seemingly simple test is now regressing as can be seen by the
xfail I added.

What happens is that we now thread far more than before, causing the
distance from definition to use to expand.  The threading candidate
that would make the Wuninitialized go away is there, and the backward
threader can see it, but it refuses to thread it because the number of
statements would be too large.

This is interesting because it means threading is causing larger IL
that in turn keeps us from threading some unreachable paths later on
because the paths are too large.

If you look at the *.threadfull2 dump for the attached simplified
test, you can see that the 3->5->6->8->10->13 path would elide the
unreachable read, but alas we can't look past BB5, because it would
thread too many statements:

Checking profitability of path (backwards):  bb:10 (2 insns) bb:8 (2
insns) bb:6 (6 insns) bb:5
  Control statement insns: 2
  Overall: 8 insns
  FAIL: Did not thread around loop and would copy too many statements.

The "problem" we have is that if there's a path in the IL, the new
threader *will* exploit it (ranger dependent).  This in turn opens up
opportunities for other threaders (even DOM) creating a cascading
effect.

For the attached test, we can squelch the warning with a mere:

--param=max-jump-thread-duplication-stmts=19

I don't know how we decided on the default of 15 for this param, or
whether it makes sense to tweak it, but our current threading passes,
and how they relate to VRP and the uninit pass, look like this:

$ ls a.c.* | grep -e thread -e dom[23] -e vrp[12] -e uninit
a.c.034t.ethread
a.c.111t.threadfull1
a.c.112t.vrp1
a.c.126t.thread1
a.c.127t.dom2
a.c.191t.thread2
a.c.192t.dom3
a.c.194t.threadfull2
a.c.195t.vrp2
a.c.209t.uninit1

Perhaps we could turn down the knobs for thread[12] and increase them
for threadfull[12]?  I really don't know.  For this particular test,
we could even turn off thread1, increase the duplication statements,
and eliminate the warning.  This would leave DOM2 without the threader
that runs before it though.

I'm out of my depth here, plus I'm a bit hesitant to make performance
decisions to improve warnings.  On the other hand, it's sad that
improved threading is causing regressions on a test as simple as this
one.  That being said, I generally don't mention it, but the threading
improvements so far solve more problems than they introduce, so
perhaps we should do nothing??

I'd be curious to hear what others think.  Perhaps others could play
with the different knobs and see if there's a better combination that
could keep warnings and optimizers in better equilibrium.

[FWIW Martin, you could revisit some of the uninit regressions and see
if tweaking the above --param would silence the bogus warning.  In
which case, it's a hint that the regression may not be due to the
uninit code itself.  FTR, I'm not saying we _should_ thread more, just
that this could be a tool to help diagnose].

Aldy
/* { dg-do compile } */
/* { dg-options "-Wuninitialized -O2" } */

int g;
void blah1(int);
void blah2(int);
void crapola(int);

int foo (int n, int l, int m, int r)
{
  int v;

  if (n || l)
    v = r;

  if (m)
    g++;

  if ( n && l)
    blah1(v); /* { dg-bogus "uninitialized" "bogus warning" } */

  if ( n )
    blah2(v); /* { dg-bogus "uninitialized" "bogus warning" } */

  if ( l )
    crapola(v); /* { dg-bogus "uninitialized" "bogus warning" { xfail *-*-* } } */

  return 0;
}
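
Why every warning in this test is bogus can be checked by brute force.
The sketch below is not part of the testcase: the helper names are
hypothetical, and it models only the truth values of n and l, since m
and r do not affect whether v is assigned. It counts input combinations
in which some call reads v without v having been assigned.

```c
#include <assert.h>

/* v is assigned exactly when the first condition, n || l, holds.  */
static int
v_defined (int n, int l)
{
  return n || l;
}

/* v is read when any of the three guarded calls executes:
   (n && l), n, or l.  Together these reduce to n || l.  */
static int
v_read (int n, int l)
{
  return (n && l) || n || l;
}

/* Count truth-value combinations that read v while unassigned.  */
int
count_uninit_reads (void)
{
  int bad = 0;

  for (int n = 0; n <= 1; n++)
    for (int l = 0; l <= 1; l++)
      if (v_read (n, l) && !v_defined (n, l))
        bad++;
  return bad;
}

/* Usage sketch: no combination reads an unassigned v.  */
void
check_no_uninit_reads (void)
{
  assert (count_uninit_reads () == 0);
}
```

The count is zero because each guard implies n || l, which is the
implication the compiler has to prove, along a threaded path, before the
uninit pass can drop the warning.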


gcc-12-20211031 is now available

2021-10-31 Thread GCC Administrator via Gcc
Snapshot gcc-12-20211031 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20211031/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master 
revision ca84f39399fda80c770306465276ffd66d3766ed

You'll find:

 gcc-12-20211031.tar.xz   Complete GCC

  SHA256=ff00aee21d003bcc542896551a99adf0b0ccd935d85e96886d6d886c0dc3a3cc
  SHA1=c5499a3bee10572c290ead3e140647335ee004cd

Diffs from 12-20211024 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.