Re: [PATCH] Add _GLIBCXX_DEBUG backtrace generation

2022-08-09 Thread François Dumont via Gcc-patches

On 08/08/22 15:29, Jonathan Wakely wrote:

On Wed, 13 Jul 2022 at 18:28, François Dumont via Libstdc++
 wrote:

libstdc++: [_GLIBCXX_DEBUG] Add backtrace generation on demand

Add a _GLIBCXX_DEBUG_BACKTRACE macro to activate backtrace generation
on _GLIBCXX_DEBUG assertions. The prerequisite is to have configured the
library with:

--enable-libstdcxx-backtrace=yes

libstdc++-v3/ChangeLog:

* include/debug/formatter.h
[_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_state): Declare.
[_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_create_state): Declare.
[_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_full_callback): Define.
[_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_error_callback): Define.
[_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_full_func): Define.
[_GLIBCXX_HAVE_STACKTRACE](__glibcxx_backtrace_full): Declare.
[_GLIBCXX_HAVE_STACKTRACE](_Error_formatter::_M_backtrace_state): New.
[_GLIBCXX_HAVE_STACKTRACE](_Error_formatter::_M_backtrace_full): New.
* src/c++11/debug.cc (pretty_print): Rename into...
(print_function): ...that.

This does more than just rename it, what are the other changes for?


Nothing; I'm starting to remember what you did there. Reverted.

[_GLIBCXX_HAVE_STACKTRACE](print_backtrace): New.
(_Error_formatter::_M_error()): Adapt.
* src/libbacktrace/Makefile.am: Add backtrace.c.
* src/libbacktrace/Makefile.in: Regenerate.
* src/libbacktrace/backtrace-rename.h (backtrace_full): New.
* testsuite/23_containers/vector/debug/assign4_neg.cc: Add backtrace
  generation.
* doc/xml/manual/debug_mode.xml: Document _GLIBCXX_DEBUG_BACKTRACE.
* doc/xml/manual/using.xml: Likewise.

Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes.

Ok to commit ?



--- a/libstdc++-v3/testsuite/23_containers/vector/debug/assign4_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/debug/assign4_neg.cc
@@ -16,6 +16,7 @@
// .
//
// { dg-do run { xfail *-*-* } }
+// { dg-options "-D_GLIBCXX_DEBUG_BACKTRACE -lstdc++_libbacktrace" }

#include 
#include 

This will fail to link if the static lib isn't available.

Good point! So I am introducing a new test case with the necessary dg
directives.


It is a 'run' test case even though what is really tested is only the
compilation/link part. For the run part, someone has to look at the log file.


François
diff --git a/libstdc++-v3/doc/xml/manual/debug_mode.xml b/libstdc++-v3/doc/xml/manual/debug_mode.xml
index 988c4a93601..dadc0cd1bb4 100644
--- a/libstdc++-v3/doc/xml/manual/debug_mode.xml
+++ b/libstdc++-v3/doc/xml/manual/debug_mode.xml
@@ -161,6 +161,12 @@ which always works correctly.
   GLIBCXX_DEBUG_MESSAGE_LENGTH can be used to request a
   different length.
 
+Note that libstdc++ is able to produce backtraces on error.
+  It requires that you configure the libstdc++ build with
+  --enable-libstdcxx-backtrace=yes.
+  Use -D_GLIBCXX_DEBUG_BACKTRACE to activate it.
+  You'll then have to link with the libstdc++_libbacktrace static library
+  (-lstdc++_libbacktrace) to build your application.
 
 
 Using a Specific Debug Container
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index 36b86702d22..26f14fae194 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -1129,6 +1129,15 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe
 	extensions and libstdc++-specific behavior into errors.
   
 
+_GLIBCXX_DEBUG_BACKTRACE
+
+  
+	Undefined by default. Considered only if libstdc++ has been configured with
+	--enable-libstdcxx-backtrace=yes and if _GLIBCXX_DEBUG
+	is defined. When defined, backtraces are displayed on
+	debug mode assertions.
+  
+
 _GLIBCXX_PARALLEL
 
   Undefined by default. When defined, compiles user code
@@ -1635,6 +1644,7 @@ A quick read of the relevant part of the GCC
   header will remain compatible between different GCC releases.
 
 
+
   
 
   Concurrency
diff --git a/libstdc++-v3/include/debug/formatter.h b/libstdc++-v3/include/debug/formatter.h
index 748d4fbfea4..b4b72383e22 100644
--- a/libstdc++-v3/include/debug/formatter.h
+++ b/libstdc++-v3/include/debug/formatter.h
@@ -31,6 +31,37 @@
 
 #include 
 
+#if _GLIBCXX_HAVE_STACKTRACE
+struct __glibcxx_backtrace_state;
+
+extern "C"
+{
+  __glibcxx_backtrace_state*
+  __glibcxx_backtrace_create_state(const char*, int,
+   void(*)(void*, const char*, int),
+   void*);
+
+  typedef int (*__glibcxx_backtrace_full_callback) (
+    void*, __UINTPTR_TYPE__, const char *, int, const char*);
+
+  typedef void (*__glibcxx_backtrace_error_callback) (
+    void*, const char*, int);
+
+  typedef int (*__glibcxx_backtrace_full_func) (
+    __glibcxx_backtrace_state*, int,
+    __glibcxx_backtrace_full_callback,
+    __glibcxx_backtrace_error_callback,
+    void*);
+
+  int
+  __glibcxx_backtrace_full(
+    __glibcxx_backtrace_state*, int,

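For readers unfamiliar with libbacktrace: the __glibcxx_backtrace_full_callback typedef above has the same parameter list as libbacktrace's full callback (data, pc, filename, lineno, function), and backtrace_full stops walking as soon as the callback returns non-zero. Below is a minimal, self-contained sketch of a frame-printing callback with that shape. It is driven by made-up frames rather than a real __glibcxx_backtrace_state, and struct frame_buf, print_frame and walk_fake_frames are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same parameter list as the __glibcxx_backtrace_full_callback typedef:
   (data, pc, filename, lineno, function).  */
typedef int (*full_callback) (void *, uintptr_t, const char *, int,
                              const char *);

/* Hypothetical sink standing in for the debug-mode error formatter.  */
struct frame_buf
{
  char text[512];
  size_t len;
  int frames;
};

/* Format one frame into DATA.  Returning 0 keeps the walk going and
   non-zero stops it, the convention libbacktrace's backtrace_full uses.  */
static int
print_frame (void *data, uintptr_t pc, const char *filename, int lineno,
             const char *function)
{
  struct frame_buf *buf = data;
  size_t room = sizeof buf->text - buf->len;
  int n = snprintf (buf->text + buf->len, room, "#%d 0x%lx in %s at %s:%d\n",
                    buf->frames, (unsigned long) pc,
                    function ? function : "??",
                    filename ? filename : "??", lineno);
  if (n < 0 || (size_t) n >= room)
    return 1;                   /* out of room: stop the walk */
  buf->len += (size_t) n;
  buf->frames++;
  return 0;
}

/* Hypothetical driver: feeds made-up frames to CB in order and stops
   as soon as CB returns non-zero, returning that value.  */
static int
walk_fake_frames (full_callback cb, void *data)
{
  static const struct
  {
    uintptr_t pc;
    const char *file;
    int line;
    const char *func;
  } frames[] = {
    { 0x401000, "main.cc", 12, "main" },
    { 0x400f00, NULL, 0, NULL },  /* no debug info for this frame */
  };

  for (size_t i = 0; i < sizeof frames / sizeof frames[0]; ++i)
    {
      int r = cb (data, frames[i].pc, frames[i].file, frames[i].line,
                  frames[i].func);
      if (r != 0)
        return r;
    }
  return 0;
}
```

Frames without symbol or line information are printed with "??" placeholders rather than skipped, so the frame numbering stays contiguous.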
[PATCH] analyzer: fix ICE caused by dup2 in sm-fd.cc [PR106551]

2022-08-09 Thread Immad Mir via Gcc-patches
This patch fixes the ICE caused by valid_to_unchecked_state
at analyzer/sm-fd.cc by handling the m_start state in
check_for_dup.
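A minimal sketch of why the unhandled m_start case ends in an ICE, using a hypothetical, heavily simplified enum of fd states (FD_START, FD_VALID_RW and the rest are illustrative names, not the analyzer's). Only "valid" states have an unchecked counterpart, so a dup whose argument is still in the start state falls through to the unreachable case unless it is handled next to the constant-fd case, as the patch does.

```c
#include <assert.h>

/* Hypothetical, heavily simplified subset of the analyzer's fd states.  */
enum fd_state
{
  FD_START,         /* nothing known about the value yet */
  FD_CONSTANT,      /* a known constant such as 0, 1 or 2 */
  FD_UNCHECKED_RW,  /* opened read-write, not yet checked against -1 */
  FD_UNCHECKED_RO,
  FD_VALID_RW,      /* opened read-write and checked */
  FD_VALID_RO,
  FD_UNREACHABLE    /* stands in for gcc_unreachable, i.e. the ICE */
};

/* Plays the role of valid_to_unchecked_state: only "valid" states have
   an unchecked counterpart.  */
static enum fd_state
valid_to_unchecked (enum fd_state s)
{
  switch (s)
    {
    case FD_VALID_RW:
      return FD_UNCHECKED_RW;
    case FD_VALID_RO:
      return FD_UNCHECKED_RO;
    default:
      return FD_UNREACHABLE;
    }
}

/* State given to the LHS of dup (fd).  WITH_FIX selects the patched
   condition, which also treats FD_START like a constant fd.  */
static enum fd_state
dup_lhs_state (enum fd_state arg, int with_fix)
{
  if (arg == FD_CONSTANT || (with_fix && arg == FD_START))
    return FD_UNCHECKED_RW;
  return valid_to_unchecked (arg);
}
```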

Tested lightly on x86_64.

gcc/analyzer/ChangeLog:
PR analyzer/106551
* sm-fd.cc (check_for_dup): Handle the m_start
state when transitioning the state of the LHS
of dup, dup2 and dup3 calls.

Signed-off-by: Immad Mir 
---
 gcc/analyzer/sm-fd.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index 8bb76d72b05..c8b9930a7b6 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -983,7 +983,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
 case DUP_1:
   if (lhs)
{
- if (is_constant_fd_p (state_arg_1))
+ if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
  else
sm_ctxt->set_next_state (stmt, lhs,
@@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
   file descriptor i.e the first argument.  */
   if (lhs)
{
- if (is_constant_fd_p (state_arg_1))
+ if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
  else
sm_ctxt->set_next_state (stmt, lhs,
-- 
2.25.1



[x86_64 PATCH] Use PTEST to perform AND in TImode STV of (A & B) != 0.

2022-08-09 Thread Roger Sayle

This x86_64 backend patch allows TImode STV to take advantage of the
fact that the PTEST instruction performs an AND operation.  Previously
PTEST was (mostly) used for comparison against zero, by using the same
operands.  The benefits are demonstrated by the new test case:

__int128 a,b;
int foo()
{
  return (a & b) != 0;
}

Currently with -O2 -msse4 we generate:

movdqa  a(%rip), %xmm0
pand    b(%rip), %xmm0
xorl    %eax, %eax
ptest   %xmm0, %xmm0
setne   %al
ret

with this patch we now generate:

movdqa  a(%rip), %xmm0
xorl    %eax, %eax
ptest   b(%rip), %xmm0
setne   %al
ret
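Why the PAND disappears: PTEST sets ZF when the AND of its two operands is zero and discards the AND result itself, so "ptest b(%rip), %xmm0" answers (a & b) == 0 in one instruction. The sketch below is a scalar model of just that flag; the two-halves struct v128 is a hypothetical stand-in for an XMM register, and CF and the other flags are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit operand split into two 64-bit halves (hypothetical type).  */
struct v128
{
  uint64_t lo, hi;
};

/* Scalar model of the PTEST flag that matters here: ZF is 1 when the
   bitwise AND of its two operands is all zeros.  The AND result is not
   written anywhere, so no separate PAND is needed.  */
static int
ptest_zf (struct v128 a, struct v128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* (a & b) != 0 is "ZF clear", which setne turns into 0 or 1.  */
static int
and_nonzero (struct v128 a, struct v128 b)
{
  return !ptest_zf (a, b);
}
```

Calling ptest_zf (x, x) reproduces the old usage (comparison of x against zero); passing two different operands is what lets the patch fold the AND into the test.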

Technically, the magic happens using new define_insn_and_split patterns.
Using two patterns allows this transformation to be performed independently
of whether TImode STV is run before or after combine.  The one tricky
case is that immediate constant operands of the AND behave slightly
differently between TImode and V1TImode: All V1TImode immediate operands
become loads, but for TImode only values that are not hilo_operands
need to be loaded.  Hence the new *testti_doubleword accepts any
general_operand, but internally during split calls force_reg whenever
the second operand is not x86_64_hilo_general_operand.  This required
(benefits from) some tweaks to TImode STV to support CONST_WIDE_INT in
more places, using CONST_SCALAR_INT_P instead of just CONST_INT_P.
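The doubleword side can be modelled in the same way. The sketch below assumes, as an approximation of x86_64_hilo_general_operand for constants, that each 64-bit half must be a sign-extended 32-bit immediate; fits_simm32, hilo_immediate_p and testti_doubleword are illustrative names. The arithmetic shows what a split of (a & b) != 0 computes, not the exact RTL it emits, and it relies on GCC's unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation of an x86-64 immediate: a 64-bit value that is the
   sign extension of a 32-bit value.  */
static int
fits_simm32 (uint64_t v)
{
  int64_t s = (int64_t) v;
  return s >= INT32_MIN && s <= INT32_MAX;
}

/* Rough analogue of x86_64_hilo_general_operand for a TImode constant:
   both 64-bit halves must be usable as immediates, otherwise the
   constant has to be forced into a register first.  */
static int
hilo_immediate_p (unsigned __int128 c)
{
  return fits_simm32 ((uint64_t) c) && fits_simm32 ((uint64_t) (c >> 64));
}

/* What a doubleword split of (a & b) != 0 computes: two 64-bit ANDs
   whose results are ORed together and compared against zero.  */
static int
testti_doubleword (unsigned __int128 a, unsigned __int128 b)
{
  uint64_t lo = (uint64_t) a & (uint64_t) b;
  uint64_t hi = (uint64_t) (a >> 64) & (uint64_t) (b >> 64);
  return (lo | hi) != 0;
}
```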

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-09  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Create new pseudos only when/if needed.  Add support for TEST
(i.e. (COMPARE (AND x y) (const_int 0))), using UNSPEC_PTEST.
When broadcasting V2DImode and V4SImode use new pseudo register.
(timode_scalar_chain::convert_op): Do nothing if operand is
already V1TImode.  Avoid generating useless SUBREG conversions,
i.e. (SUBREG:V1TImode (REG:V1TImode) 0).  Handle CONST_WIDE_INT
in addition to CONST_INT by using CONST_SCALAR_INT_P.
(convertible_comparison_p): Use CONST_SCALAR_INT_P to match both
CONST_WIDE_INT and CONST_INT.  Recognize new *testti_doubleword
pattern as an STV candidate.
(timode_scalar_to_vector_candidate_p): Allow CONST_SCALAR_INT_P
operands in binary logic operations.

* config/i386/i386.cc (ix86_rtx_costs) : Add costs
for UNSPEC_PTEST; a PTEST that performs an AND has the same cost
as regular PTEST, i.e. cost->sse_op.

* config/i386/i386.md (*testti_doubleword): New pre-reload
define_insn_and_split that recognizes comparison of TI mode AND
against zero.
* config/i386/sse.md (*ptest_and): New pre-reload
define_insn_and_split that recognizes UNSPEC_PTEST of identical
AND operands.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-8.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 5e3a7ff..effc2f2 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -919,8 +919,7 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 rtx
 scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
 {
-  rtx tmp = gen_reg_rtx (vmode);
-  rtx src;
+  rtx src, tmp;
   /* Comparison against anything other than zero, requires an XOR.  */
   if (op2 != const0_rtx)
 {
@@ -929,6 +928,7 @@ scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
   /* If both operands are MEMs, explicitly load the OP1 into TMP.  */
   if (MEM_P (op1) && MEM_P (op2))
{
+ tmp = gen_reg_rtx (vmode);
  emit_insn_before (gen_rtx_SET (tmp, op1), insn);
  src = tmp;
}
@@ -943,34 +943,56 @@ scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
   rtx op12 = XEXP (op1, 1);
   convert_op (&op11, insn);
   convert_op (&op12, insn);
-  if (MEM_P (op11))
+  if (!REG_P (op11))
{
+ tmp = gen_reg_rtx (vmode);
  emit_insn_before (gen_rtx_SET (tmp, op11), insn);
  op11 = tmp;
}
   src = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, op11), op12);
 }
+  else if (GET_CODE (op1) == AND)
+{
+  rtx op11 = XEXP (op1, 0);
+  rtx op12 = XEXP (op1, 1);
+  convert_op (&op11, insn);
+  convert_op (&op12, insn);
+  if (!REG_P (op11))
+   {
+ tmp = gen_reg_rtx (vmode);
+ emit_insn_before (gen_rtx_SET (tmp, op11), insn);
+ op11 = tmp;
+   }
+  return gen_rtx_UNSPEC (CCmode, gen_rtvec (2, op11, op12),
+UNSPEC_PTEST);
+}
   else
 {
   convert_op (&op1, insn);
   src = op1;
 }
-  emit_insn_before (gen_rtx_SET (tmp, src), i

[PATCH 1/2] tree-optimization/106514 - add --param max-jump-thread-paths

2022-08-09 Thread Richard Biener via Gcc-patches
The following adds a limit for the exponential greedy search of
the backwards jump threader.  The idea is to limit the search
space in a way that the paths considered are the same if the search
were in BFS order rather than DFS.  In particular it stops considering
incoming edges into a block if the product of the in-degrees of
blocks on the path exceeds the specified limit.
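The budget rule can be sketched as a recursive backward walk that multiplies an overall path count by each block's in-degree and stops expanding a block's incoming edges once that product would exceed the limit. This is a simplified illustration on a hypothetical predecessor-list CFG, not the code in tree-ssa-threadbackward.cc; build_diamond_chain builds the degenerate chain of empty diamonds described below.

```c
#include <assert.h>

#define MAX_BLOCKS 16

/* Hypothetical CFG: preds[b][0 .. npreds[b] - 1] are the predecessors
   of block b; block 0 is the entry and has none.  */
struct cfg
{
  int npreds[MAX_BLOCKS];
  int preds[MAX_BLOCKS][MAX_BLOCKS];
};

/* Count backward paths from BB to the entry.  OVERALL_PATHS is the
   product of in-degrees seen so far; BB's incoming edges are only
   explored if OVERALL_PATHS * in-degree stays within LIMIT.  */
static int
count_paths (const struct cfg *g, int bb, int overall_paths, int limit)
{
  int n = g->npreds[bb];
  if (n == 0)
    return 1;                     /* reached the entry: one full path */
  if (overall_paths * n > limit)
    return 0;                     /* budget exhausted: prune here */
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += count_paths (g, g->preds[bb][i], overall_paths * n, limit);
  return total;
}

/* Build K empty diamonds in a row: each join block has two
   predecessors, the previous join and an empty middle block that
   itself comes from the previous join.  Returns the last join.
   Requires 2 * K + 1 <= MAX_BLOCKS.  */
static int
build_diamond_chain (struct cfg *g, int k)
{
  int prev = 0, next = 1;
  g->npreds[0] = 0;
  for (int i = 0; i < k; ++i)
    {
      int middle = next++, join = next++;
      g->npreds[middle] = 1;
      g->preds[middle][0] = prev;
      g->npreds[join] = 2;
      g->preds[join][0] = prev;
      g->preds[join][1] = middle;
      prev = join;
    }
  return prev;
}
```

With four diamonds there are 2^4 == 16 paths, so a limit of 16 still admits all of them, while a limit of 15 prunes every path before it reaches the entry.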

When considering the low stmt copying limit of 7 (or 1 in the size
optimize case) this means the degenerate case with maximum search
space is a sequence of conditions with no actual code:

  B1
   |\
   | empty
   |/
  B2
   |\
   ...
  Bn
   |\

GIMPLE_CONDs are costed 2, an equivalent GIMPLE_SWITCH already 4, so
we reach 7 already with 3 middle conditions (B1 and Bn do not count).
The search space would be 2^4 == 16 to reach this.  The FSM threads
historically allowed for a thread length of 10 but is really looking
for a single multiway branch threaded across the backedge.  I've
chosen 64 as the default of the new parameter, which effectively
limits the outdegree of the switch statement (the cases reaching the
backedge) to that number (divided by 2 until I add some special
pruning for FSM threads due to the loop header indegree).  The
testcase ssa-dom-thread-7.c requires 56 at the moment (as said,
some special FSM thread pruning of considered edges would bring
it down to half of that), but we now get one more threading
and quite some more in later threadfull.  This testcase seems to
be difficult to check for expected transforms.

The new testcases add the degenerate case we currently thread
(without deciding whether that's a good idea ...) plus one with
an appropriate limit that should prevent the threading.

This obsoletes the mentioned --param max-fsm-thread-length but
I am not removing it as part of this patch.  When the search
space is limited the thread stmt size limit effectively provides
max-fsm-thread-length.

The param with its default does not help PR106514 enough to unleash
path searching with the higher FSM stmt count limit.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106514
* params.opt (max-jump-thread-paths): New.
* doc/invoke.texi (max-jump-thread-paths): Document.
* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Honor max-jump-thread-paths, take overall_path argument.
(back_threader::find_paths): Pass 1 as initial overall_path.

* gcc.dg/tree-ssa/ssa-thread-16.c: New testcase.
* gcc.dg/tree-ssa/ssa-thread-17.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
---
 gcc/doc/invoke.texi                           |  7 ++
 gcc/params.opt                                |  4 
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-16.c | 24 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-17.c |  7 ++
 gcc/tree-ssa-threadbackward.cc                | 20 +++-
 6 files changed, 57 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-17.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 92f7aaead74..f01696696bf 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14754,6 +14754,13 @@ optimizing.
 Maximum number of statements allowed in a block that needs to be
 duplicated when threading jumps.
 
+@item max-jump-thread-paths
+The maximum number of paths to consider when searching for jump threading
+opportunities.  When arriving at a block, incoming edges are only considered
+if the number of paths to be searched so far multiplied by the incoming
+edge degree does not exhaust the specified maximum number of paths to
+consider.
+
 @item max-fields-for-field-sensitive
 Maximum number of fields in a structure treated in
 a field sensitive manner during pointer analysis.
diff --git a/gcc/params.opt b/gcc/params.opt
index 2f9c9cf27dd..132987343c6 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -582,6 +582,10 @@ Bound on the number of iterations the brute force # of iterations analysis algor
 Common Joined UInteger Var(param_max_jump_thread_duplication_stmts) Init(15) Param Optimization
 Maximum number of statements allowed in a block that needs to be duplicated when threading jumps.
 
+-param=max-jump-thread-paths=
+Common Joined UInteger Var(param_max_jump_thread_paths) Init(64) IntegerRange(1, 65536) Param Optimization
+Search space limit for the backwards jump threader.
+
 -param=max-last-value-rtl=
 Common Joined UInteger Var(param_max_last_value_rtl) Init(1) Param Optimization
 The maximum number of RTL nodes that can be recorded as combiner's last value.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index aa06db5e223..47b8fdfa29a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg

[PATCH 2/2] Remove --param max-fsm-thread-length

2022-08-09 Thread Richard Biener via Gcc-patches
This removes max-fsm-thread-length which is obsoleted by
max-jump-thread-paths.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* doc/invoke.texi (max-fsm-thread-length): Remove.
* params.opt (max-fsm-thread-length): Likewise.
* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): Do not
check max-fsm-thread-length.
---
 gcc/doc/invoke.texi            | 3 ---
 gcc/params.opt                 | 4 
 gcc/tree-ssa-threadbackward.cc | 9 -
 3 files changed, 16 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f01696696bf..58e422041e4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15262,9 +15262,6 @@ Emit instrumentation calls to __tsan_func_entry() and __tsan_func_exit().
 Maximum number of instructions to copy when duplicating blocks on a
 finite state automaton jump thread path.
 
-@item max-fsm-thread-length
-Maximum number of basic blocks on a jump thread path.
-
 @item threader-debug
 threader-debug=[none|all] Enables verbose dumping of the threader solver.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 132987343c6..201b5c9f56f 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -498,10 +498,6 @@ The maximum number of nested indirect inlining performed by early inliner.
 Common Joined UInteger Var(param_max_fields_for_field_sensitive) Param
 Maximum number of fields in a structure before pointer analysis treats the structure as a single variable.
 
--param=max-fsm-thread-length=
-Common Joined UInteger Var(param_max_fsm_thread_length) Init(10) IntegerRange(1, 99) Param Optimization
-Maximum number of basic blocks on a jump thread path.
-
 -param=max-fsm-thread-path-insns=
 Common Joined UInteger Var(param_max_fsm_thread_path_insns) Init(100) IntegerRange(1, 99) Param Optimization
 Maximum number of instructions to copy when duplicating blocks on a finite state automaton jump thread path.
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index bb1ef514abf..741c923e1a6 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -568,15 +568,6 @@ back_threader_profitability::profitable_path_p (const vec &m_path,
   if (m_path.length () <= 1)
   return false;
 
-  if (m_path.length () > (unsigned) param_max_fsm_thread_length)
-{
-  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
-"the number of basic blocks on the path "
-"exceeds PARAM_MAX_FSM_THREAD_LENGTH.\n");
-  return false;
-}
-
   int n_insns = 0;
   gimple_stmt_iterator gsi;
   loop_p loop = m_path[0]->loop_father;
-- 
2.35.3


Re: [x86_64 PATCH] Use PTEST to perform AND in TImode STV of (A & B) != 0.

2022-08-09 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 9, 2022 at 10:16 AM Roger Sayle  wrote:
>
>
> This x86_64 backend patch allows TImode STV to take advantage of the
> fact that the PTEST instruction performs an AND operation.  Previously
> PTEST was (mostly) used for comparison against zero, by using the same
> operands.  The benefits are demonstrated by the new test case:
>
> __int128 a,b;
> int foo()
> {
>   return (a & b) != 0;
> }
>
> Currently with -O2 -msse4 we generate:
>
> movdqa  a(%rip), %xmm0
> pand    b(%rip), %xmm0
> xorl    %eax, %eax
> ptest   %xmm0, %xmm0
> setne   %al
> ret
>
> with this patch we now generate:
>
> movdqa  a(%rip), %xmm0
> xorl%eax, %eax
> ptest   b(%rip), %xmm0
> setne   %al
> ret
>
> Technically, the magic happens using new define_insn_and_split patterns.
> Using two patterns allows this transformation to be performed independently
> of whether TImode STV is run before or after combine.  The one tricky
> case is that immediate constant operands of the AND behave slightly
> differently between TImode and V1TImode: All V1TImode immediate operands
> become loads, but for TImode only values that are not hilo_operands
> need to be loaded.  Hence the new *testti_doubleword accepts any
> general_operand, but internally during split calls force_reg whenever
> the second operand is not x86_64_hilo_general_operand.  This required
> (benefits from) some tweaks to TImode STV to support CONST_WIDE_INT in
> more places, using CONST_SCALAR_INT_P instead of just CONST_INT_P.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-09  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.cc (scalar_chain::convert_compare):
> Create new pseudos only when/if needed.  Add support for TEST
> (i.e. (COMPARE (AND x y) (const_int 0))), using UNSPEC_PTEST.
> When broadcasting V2DImode and V4SImode use new pseudo register.
> (timode_scalar_chain::convert_op): Do nothing if operand is
> already V1TImode.  Avoid generating useless SUBREG conversions,
> i.e. (SUBREG:V1TImode (REG:V1TImode) 0).  Handle CONST_WIDE_INT
> in addition to CONST_INT by using CONST_SCALAR_INT_P.
> (convertible_comparison_p): Use CONST_SCALAR_INT_P to match both
> CONST_WIDE_INT and CONST_INT.  Recognize new *testti_doubleword
> pattern as an STV candidate.
> (timode_scalar_to_vector_candidate_p): Allow CONST_SCALAR_INT_P
> operands in binary logic operations.
>
> * config/i386/i386.cc (ix86_rtx_costs) : Add costs
> for UNSPEC_PTEST; a PTEST that performs an AND has the same cost
> as regular PTEST, i.e. cost->sse_op.
>
> * config/i386/i386.md (*testti_doubleword): New pre-reload
> define_insn_and_split that recognizes comparison of TI mode AND
> against zero.
> * config/i386/sse.md (*ptest_and): New pre-reload
> define_insn_and_split that recognizes UNSPEC_PTEST of identical
> AND operands.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse4_1-stv-8.c: New test case.

OK.

BTW, does your patch also handle DImode and SImode comparisons? They
can be implemented with PTEST, and perhaps they could benefit from
embedded AND, too.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-08-09 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 8 Aug 2022 at 14:27, Richard Biener  wrote:
>
> On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 21 Jul 2022 at 12:21, Richard Biener  
> > wrote:
> > >
> > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > >  wrote:
> > > > > > >
> > > > > > > Richard Biener  writes:
> > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > >  wrote:
> > > > > > > >>
> > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > > > > >>  wrote:
> > > > > > > >> >
> > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via 
> > > > > > > >> > Gcc-patches
> > > > > > > >> >  wrote:
> > > > > > > >> > >
> > > > > > > >> > > Hi Richard,
> > > > > > > >> > > For the following test:
> > > > > > > >> > >
> > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > >> > > {
> > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > >> > > }
> > > > > > > >> > >
> > > > > > > >> > > The compiler emits following ICE with -O3 
> > > > > > > >> > > -mcpu=generic+sve:
> > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in 
> > > > > > > >> > > ‘view_convert_expr’
> > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > >> > >   |   ^~
> > > > > > > >> > > svint32_t
> > > > > > > >> > > __Int32x4_t
> > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > > > > >> > >
> > > > > > > >> > > This happens because, after folding svld1rq_s32 to 
> > > > > > > >> > > vec_perm_expr, we have:
> > > > > > > >> > >   int32x4_t v;
> > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > >> > >   svint32_t _9;
> > > > > > > >> > >   vector(4) int _11;
> > > > > > > >> > >
> > > > > > > >> > >:
> > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > >> > >   v_12 = _1;
> > > > > > > >> > >   _11 = v_12;
> > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > >> > >   return _9;
> > > > > > > >> > >
> > > > > > > >> > > During forwprop, simplify_permutation simplifies 
> > > > > > > >> > > vec_perm_expr to
> > > > > > > >> > > view_convert_expr,
> > > > > > > >> > > and the end result becomes:
> > > > > > > >> > >   svint32_t _7;
> > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > >> > >
> > > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > > >> > > ;;pred:   ENTRY
> > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > >> > >   return _7;
> > > > > > > >> > > ;;succ:   EXIT
> > > > > > > >> > >
> > > > > > > >> > > which causes the error duing verify_gimple since 
> > > > > > > >> > > VIEW_CONVERT_EXPR
> > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > > >> > >
> > > > > > > >> > > The attached patch disables simplification of VEC_PERM_EXPR
> > > > > > > >> > > in simplify_permutation, if lhs and rhs have non 
> > > > > > > >> > > compatible types,
> > > > > > > >> > > which resolves ICE, but am not sure if it's the correct 
> > > > > > > >> > > approach ?
> > > > > > > >> >
> > > > > > > >> > It for sure papers over the issue.  I think the error 
> > > > > > > >> > happens earlier,
> > > > > > > >> > the V_C_E should have been built with the type of the 
> > > > > > > >> > VEC_PERM_EXPR
> > > > > > > >> > which is the type of the LHS.  But then you probably run 
> > > > > > > >> > into the
> > > > > > > >> > different sizes ICE (VLA vs constant size).  I think for 
> > > > > > > >> > this case you
> > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > > >> Hi Richard,
> > > > > > > >> Sorry I don't quite follow. In this case, we use VEC_PERM_EXPR 
> > > > > > > >> to
> > > > > > > >> represent dup operation
> > > > > > > >> from fixed width to VLA vector. I am not sure how folding it to
> > > > > > > >> BIT_FIELD_REF will work.
> > > > > > > >> Could you please elaborate ?
> > > > > > > >>
> > > > > > > >> A

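The selector printed in the dump, { 0, 1, 2, 3, ... }, describes a replicate-quadword. Assuming it encodes the repeating pattern {0, 1, 2, 3, 0, 1, 2, 3, ...}, lane i of the VLA result takes element i % 4 of the fixed-width input. The scalar model below (vl is a stand-in for the runtime SVE lane count, dup_quadword an illustrative name) shows that only the low four lanes coincide with the input, so the VEC_PERM_EXPR is not a same-size reinterpretation of the 128-bit vector whenever the SVE vector is wider than 128 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the replicate-quadword that svld1rq_s32 performs,
   assuming the VEC_PERM_EXPR selector encodes the repeating pattern
   {0, 1, 2, 3, 0, 1, 2, 3, ...}.  VL (a multiple of 4) stands in for
   the runtime number of 32-bit lanes in an SVE vector.  */
static void
dup_quadword (const int32_t in[4], int32_t *out, size_t vl)
{
  for (size_t i = 0; i < vl; ++i)
    out[i] = in[i % 4];
}
```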
Unify container pretty printers [PR65230]

2022-08-09 Thread Ulrich Drepper via Gcc-patches
In PR65230 Martin raised the point that the pretty printer for the C++
containers is inconsistent across the different types.  It's also
inconsistent when it comes to showing different states (empty vs not) of
the same type.

In addition, IMO some more information should be printed like the template
parameters which otherwise certainly can be retrieved through ptype but
along with a lot more information one doesn't look for.

In the attached patch I've changed the pretty printers of most of the
containers to be mostly consistent, at least as far as it is possible given
the different nature.  I've also fixed some bugs (e.g., printing indices
for set elements).

You can see the side-by-side comparison of the output in the text file I
attached to the bug (https://gcc.gnu.org/bugzilla/attachment.cgi?id=53419).
Beware of very long lines, or use 'less -S'.

Jonathan and I discussed whether such big changes are possible.  Someone
could consider the output guaranteed by some ABI.  We came to the
conclusion that this is a bad idea and shouldn't even start to plant that
idea in people's minds.  The pretty printer output is meant for humans.

Comments?


d-containers-printers
Description: Binary data


Re: [PATCH] rs6000: Rework ELFv2 support for -fpatchable-function-entry* [PR99888]

2022-08-09 Thread Segher Boessenkool
Hi!

> +   /* As ELFv2 ABI shows, the allowable bytes past the global entry
> +  point are 0, 4, 8, 16, 32 and 64.  Considering there are two
> +  non-prefixed instructions for global entry (8 bytes), the count
> +  for patchable NOPs before local entry would be 2, 6 and 14.  */

The other option is to allow other numbers of nops, but in that case not
have a local entry point (so, always use the global entry point).

I don't know if that is useful for any users of this support (if there
even are such users :-P )
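The arithmetic behind the 2, 6 and 14 in the quoted comment: the global entry sequence takes 8 bytes, each NOP takes 4, and the resulting offset of the local entry point must be one of the values the comment lists. A small sketch that checks a NOP count against that list (lep_offsets and nops_before_lep_ok are illustrative names):

```c
#include <assert.h>
#include <stddef.h>

/* Offsets (bytes) from the global to the local entry point that the
   quoted comment lists as allowed by the ELFv2 ABI.  */
static const int lep_offsets[] = { 0, 4, 8, 16, 32, 64 };

#define GEP_SEQUENCE_BYTES 8   /* two non-prefixed instructions */
#define NOP_BYTES 4

/* Return 1 if N NOPs between the global entry sequence and the local
   entry point put the local entry point at an allowed offset.  */
static int
nops_before_lep_ok (int n)
{
  int offset = GEP_SEQUENCE_BYTES + n * NOP_BYTES;
  for (size_t i = 0; i < sizeof lep_offsets / sizeof lep_offsets[0]; ++i)
    if (lep_offsets[i] == offset)
      return 1;
  return 0;
}
```

Counts 2, 6 and 14 give offsets 16, 32 and 64; a count such as 4 gives 24, which is not in the list and is the kind of value the patch's error rejects. A count of 0 is the patch_area_entry == 0 case, which the quoted code does not check.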

> +   if (patch_area_entry > 0)
> + {
> +   if (patch_area_entry != 2
> +   && patch_area_entry != 6
> +   && patch_area_entry != 14)
> + error ("for %<-fpatchable-function-entry=%u,%u%>, patching "
> +"%u NOP(s) before function entry is invalid, it can "
> +"cause assembler error",

I would not say "it can [etc.]" at all.  Oh, and "NOP" (capitals) isn't
a thing, it is not an acronym or such ;-)

> +/* { dg-require-effective-target powerpc_elfv2 } */
> +/* Specify -mcpu=power9 to ensure global entry is needed.  */
> +/* { dg-options "-mdejagnu-cpu=power9" } */

Why would it be needed for p9, and not older, or newer?

Every function always has a GEP, so I'm not sure what you are trying to
say here anyway :-)


Rest looks good to me.


Segher


Re: Re: [PATCH v5] LoongArch: add movable attribute

2022-08-09 Thread Xi Ruoyao via Gcc-patches
Sorry for the late reply, I'm rebuilding my entire Linux system (from
scratch) for Glibc-2.36 and Binutils-2.39 update and I just reached the
mail client.

On Mon, 2022-08-08 at 12:53 +0800, Lulu Cheng wrote:
> I still think it makes a little bit more sense to put attribute(model)
> and -mcmodel together.
> 
> -mcmodel sets the access range of all symbols in a single file and
> attribute (model) sets the access range of a single symbol in a file,
> for example __attribute__((model(normal/large/extreme))).

It might make sense, but then it would not be what we want for per-CPU
symbols.  What we want here is "treat a local symbol as-if it's global",
while each code model may already treat local symbol and global symbol
differently.

Disambiguation: here "local" means "defined in this TU", "global"
otherwise (not "local variable" in C).

I'll send v6 with the name "addr_global" if no objection.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] autopar TLC

2022-08-09 Thread Richard Biener via Gcc-patches
On Tue, 2 Aug 2022, Richard Biener wrote:

> The following removes all excessive update_ssa calls from OMP
> expansion, thereby rewriting the atomic load and store cases to
> GIMPLE code generation.  I don't think autopar ever exercises the
> atomics code though.
> 
> There's not much test coverage overall so I've built SPEC 2k17
> with -floop-parallelize-all -ftree-parallelize-loops=2 with and
> without LTO (and otherwise -Ofast plus -march=haswell) without
> fallout.
> 
> If there's any fallout it's not OK to update SSA form for
> each and every OMP stmt lowered.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Any objections?

I have pushed this now (and will deal with eventual fallout).

Richard.
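What emitting GIMPLE directly amounts to for a type that is not an integer can be pictured at the source level: the load goes through the integer type itype via the __atomic_load_n builtin, and the result is reinterpreted bit for bit as the user's type, which is the VIEW_CONVERT_EXPR branch in the hunk below. This is an analogy under stated assumptions (a 4-byte float, and sequentially consistent ordering in place of whatever omp_memory_order_to_memmodel returns), not the code the pass generates; atomic_load_as_float is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Source-level analogue of an atomic load of a float through an
   integer type: load the bits atomically as uint32_t (the itype), then
   reinterpret them as float (the VIEW_CONVERT_EXPR step).  The shared
   object is stored as uint32_t so no pointer type punning is needed.  */
static float
atomic_load_as_float (const uint32_t *addr)
{
  uint32_t bits = __atomic_load_n (addr, __ATOMIC_SEQ_CST);
  float val;
  _Static_assert (sizeof val == sizeof bits, "float must be 32 bits");
  memcpy (&val, &bits, sizeof val);   /* bit-for-bit reinterpretation */
  return val;
}
```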

> Thanks,
> Richard.
> 
>   * omp-expand.cc (expand_omp_atomic_load): Emit GIMPLE
>   directly.  Avoid update_ssa when in SSA form.
>   (expand_omp_atomic_store): Likewise.
>   (expand_omp_atomic_fetch_op): Avoid update_ssa when in SSA
>   form.
>   (expand_omp_atomic_pipeline): Likewise.
>   (expand_omp_atomic_mutex): Likewise.
>   * tree-parloops.cc (gen_parallel_loop): Use
>   TODO_update_ssa_no_phi after loop_version.
> ---
>  gcc/omp-expand.cc    | 81 +++-
>  gcc/tree-parloops.cc |  2 +-
>  2 files changed, 50 insertions(+), 33 deletions(-)
> 
> diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
> index 64e6308fc7b..48fbd157c6e 100644
> --- a/gcc/omp-expand.cc
> +++ b/gcc/omp-expand.cc
> @@ -8617,7 +8617,7 @@ expand_omp_atomic_load (basic_block load_bb, tree addr,
>basic_block store_bb;
>location_t loc;
>gimple *stmt;
> -  tree decl, call, type, itype;
> +  tree decl, type, itype;
>  
>gsi = gsi_last_nondebug_bb (load_bb);
>stmt = gsi_stmt (gsi);
> @@ -8637,23 +8637,33 @@ expand_omp_atomic_load (basic_block load_bb, tree addr,
>itype = TREE_TYPE (TREE_TYPE (decl));
>  
>enum omp_memory_order omo = gimple_omp_atomic_memory_order (stmt);
> -  tree mo = build_int_cst (NULL, omp_memory_order_to_memmodel (omo));
> -  call = build_call_expr_loc (loc, decl, 2, addr, mo);
> +  tree mo = build_int_cst (integer_type_node,
> +omp_memory_order_to_memmodel (omo));
> +  gcall *call = gimple_build_call (decl, 2, addr, mo);
> +  gimple_set_location (call, loc);
> +  gimple_set_vuse (call, gimple_vuse (stmt));
> +  gimple *repl;
>if (!useless_type_conversion_p (type, itype))
> -call = fold_build1_loc (loc, VIEW_CONVERT_EXPR, type, call);
> -  call = build2_loc (loc, MODIFY_EXPR, void_type_node, loaded_val, call);
> -
> -  force_gimple_operand_gsi (&gsi, call, true, NULL_TREE, true, GSI_SAME_STMT);
> -  gsi_remove (&gsi, true);
> +{
> +  tree lhs = make_ssa_name (itype);
> +  gimple_call_set_lhs (call, lhs);
> +  gsi_insert_before (&gsi, call, GSI_SAME_STMT);
> +  repl = gimple_build_assign (loaded_val,
> +   build1 (VIEW_CONVERT_EXPR, type, lhs));
> +  gimple_set_location (repl, loc);
> +}
> +  else
> +{
> +  gimple_call_set_lhs (call, loaded_val);
> +  repl = call;
> +}
> +  gsi_replace (&gsi, repl, true);
>  
>store_bb = single_succ (load_bb);
>gsi = gsi_last_nondebug_bb (store_bb);
>gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_ATOMIC_STORE);
>gsi_remove (&gsi, true);
>  
> -  if (gimple_in_ssa_p (cfun))
> -update_ssa (TODO_update_ssa_no_phi);
> -
>return true;
>  }
>  
> @@ -8669,7 +8679,7 @@ expand_omp_atomic_store (basic_block load_bb, tree addr,
>basic_block store_bb = single_succ (load_bb);
>location_t loc;
>gimple *stmt;
> -  tree decl, call, type, itype;
> +  tree decl, type, itype;
>machine_mode imode;
>bool exchange;
>  
> @@ -8710,25 +8720,36 @@ expand_omp_atomic_store (basic_block load_bb, tree addr,
>if (!useless_type_conversion_p (itype, type))
>  stored_val = fold_build1_loc (loc, VIEW_CONVERT_EXPR, itype, stored_val);
>enum omp_memory_order omo = gimple_omp_atomic_memory_order (stmt);
> -  tree mo = build_int_cst (NULL, omp_memory_order_to_memmodel (omo));
> -  call = build_call_expr_loc (loc, decl, 3, addr, stored_val, mo);
> +  tree mo = build_int_cst (integer_type_node,
> +omp_memory_order_to_memmodel (omo));
> +  stored_val = force_gimple_operand_gsi (&gsi, stored_val, true, NULL_TREE,
> +  true, GSI_SAME_STMT);
> +  gcall *call = gimple_build_call (decl, 3, addr, stored_val, mo);
> +  gimple_set_location (call, loc);
> +  gimple_set_vuse (call, gimple_vuse (stmt));
> +  gimple_set_vdef (call, gimple_vdef (stmt));
> +
> +  gimple *repl = call;
>if (exchange)
>  {
>if (!useless_type_conversion_p (type, itype))
> - call = build1_loc (loc, VIEW_CONVERT_EXPR, type, call);
> -  call = build2_loc (loc, MODIFY_EXPR, void_type_node, loaded_val, call);
> + {
> +   tree lhs = make_ssa_name (itype);
> +   gimple_call_set_lhs 
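The rewritten expand_omp_atomic_load/store keep the VIEW_CONVERT_EXPR step for types such as float. At source level, that step corresponds roughly to doing the atomic operation on a same-sized integer and then reinterpreting the bits. Below is a minimal C++ sketch of that idea. It assumes a 32-bit float and uses std::atomic<std::uint32_t> in place of GCC's sized __atomic builtins. The helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Load: the atomic operation produces the integer type ("itype"), and the
// memcpy plays the role of VIEW_CONVERT_EXPR to the user-visible type.
float atomic_load_as_float(const std::atomic<std::uint32_t>& cell,
                           std::memory_order mo) {
  std::uint32_t bits = cell.load(mo);
  float value;
  static_assert(sizeof value == sizeof bits, "sketch assumes 32-bit float");
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// Store: the value is reinterpreted as the integer type before the
// atomic store, mirroring the conversion applied to stored_val.
void atomic_store_as_float(std::atomic<std::uint32_t>& cell, float value,
                           std::memory_order mo) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  cell.store(bits, mo);
}
```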

Re: [PATCH] lto: support --jobserver-style=fifo for recent GNU make

2022-08-09 Thread Martin Liška
On 8/5/22 12:58, Richard Biener wrote:
> On Thu, Aug 4, 2022 at 10:57 AM Martin Liška  wrote:
>>
>> After a long time, GNU make has finally implemented named pipes for
>> --jobserver-auth. The traditional approach is to pass already-opened
>> file descriptors, which causes trouble:
>> https://savannah.gnu.org/bugs/index.php?57242
>>
>> GNU make commit:
>> https://git.savannah.gnu.org/cgit/make.git/commit/?id=7ad2593b2d2bb5b9332fd8bf93ac6f958bc6
>>
>> I tested that locally with TOT GNU make and it works:
>>
>> $ cat Makefile
>> all:
>> g++ tramp3d-v4.ii -c -flto -O2
>> g++ tramp3d-v4.o -flto=jobserver
>>
>> $ MAKE=/tmp/bin/bin/make /tmp/bin/bin/make -j16 --jobserver-style=fifo
>> g++ tramp3d-v4.ii -c -flto -O2
>> g++ tramp3d-v4.o -flto=jobserver
>> (ltrans run in parallel)
>>
>> Ready to be installed after tests?
> 
> LGTM.

I've actually got a nicer patch set in which I also support the jobserver
for WPA. I'm going to send it in a separate thread.

Martin

> 
> Thanks,
> Richard.
> 
>> Martin
>>
>> gcc/ChangeLog:
>>
>> * gcc.cc (driver::detect_jobserver): Support --jobserver-style=fifo.
>> * lto-wrapper.cc (jobserver_active_p): Likewise.
>> ---
>>   gcc/gcc.cc | 15 ---
>>   gcc/lto-wrapper.cc | 20 +++-
>>   2 files changed, 27 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
>> index 5cbb38560b2..c98407fe03d 100644
>> --- a/gcc/gcc.cc
>> +++ b/gcc/gcc.cc
>> @@ -9182,15 +9182,24 @@ driver::detect_jobserver () const
>> const char *makeflags = env.get ("MAKEFLAGS");
>> if (makeflags != NULL)
>>   {
>> -  const char *needle = "--jobserver-auth=";
>> -  const char *n = strstr (makeflags, needle);
>> +  /* Traditionally, GNU make uses opened pipes for jobserver-auth,
>> +e.g. --jobserver-auth=3,4.  */
>> +  const char *pipe_needle = "--jobserver-auth=";
>> +
>> +  /* Starting with GNU make 4.4, one can use --jobserver-style=fifo
>> +and then named pipe is used: --jobserver-auth=fifo:/tmp/hcsparta.  */
>> +  const char *fifo_needle = "--jobserver-auth=fifo:";
>> +  if (strstr (makeflags, fifo_needle) != NULL)
>> +   return;
>> +
>> +  const char *n = strstr (makeflags, pipe_needle);
>> if (n != NULL)
>> {
>>   int rfd = -1;
>>   int wfd = -1;
>>
>>   bool jobserver
>> -   = (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
>> +   = (sscanf (n + strlen (pipe_needle), "%d,%d", &rfd, &wfd) == 2
>>&& rfd > 0
>>&& wfd > 0
>>&& is_valid_fd (rfd)
>> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
>> index 795ab74555c..756350d5ace 100644
>> --- a/gcc/lto-wrapper.cc
>> +++ b/gcc/lto-wrapper.cc
>> @@ -1342,27 +1342,37 @@ static const char *
>>   jobserver_active_p (void)
>>   {
>> #define JS_PREFIX "jobserver is not available: "
>> -  #define JS_NEEDLE "--jobserver-auth="
>> +
>> +  /* Traditionally, GNU make uses opened pipes for jobserver-auth,
>> + e.g. --jobserver-auth=3,4.  */
>> +  #define JS_PIPE_NEEDLE "--jobserver-auth="
>> +
>> +  /* Starting with GNU make 4.4, one can use --jobserver-style=fifo
>> + and then named pipe is used: --jobserver-auth=fifo:/tmp/hcsparta.  */
>> +  #define JS_FIFO_NEEDLE "--jobserver-auth=fifo:"
>>
>> const char *makeflags = getenv ("MAKEFLAGS");
>> if (makeflags == NULL)
>>   return JS_PREFIX "%<MAKEFLAGS%> environment variable is unset";
>>
>> -  const char *n = strstr (makeflags, JS_NEEDLE);
>> +  if (strstr (makeflags, JS_FIFO_NEEDLE) != NULL)
>> +return NULL;
>> +
>> +  const char *n = strstr (makeflags, JS_PIPE_NEEDLE);
>> if (n == NULL)
>> -return JS_PREFIX "%<" JS_NEEDLE "%> is not present in %<MAKEFLAGS%>";
>> +return JS_PREFIX "%<" JS_PIPE_NEEDLE "%> is not present in %<MAKEFLAGS%>";
>>
>> int rfd = -1;
>> int wfd = -1;
>>
>> -  if (sscanf (n + strlen (JS_NEEDLE), "%d,%d", &rfd, &wfd) == 2
>> +  if (sscanf (n + strlen (JS_PIPE_NEEDLE), "%d,%d", &rfd, &wfd) == 2
>> && rfd > 0
>> && wfd > 0
>> && is_valid_fd (rfd)
>> && is_valid_fd (wfd))
>>   return NULL;
>> else
>> -return JS_PREFIX "cannot access %<" JS_NEEDLE "%> file descriptors";
>> +return JS_PREFIX "cannot access %<" JS_PIPE_NEEDLE "%> file descriptors";
>>   }
>>
>>   /* Print link to -flto documentation with a hint message.  */
>> --
>> 2.37.1
>>
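Below is a minimal sketch of the MAKEFLAGS check as the patched lto-wrapper performs it, assuming a POSIX host. fd_is_open() is a hypothetical stand-in for is_valid_fd(), and the enum is only illustrative. The FIFO form has to be tested before the descriptor form, because a value such as fifo:/tmp/x never matches "%d,%d".

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

enum class JobserverKind { kUnset, kAbsent, kFifo, kPipe, kBadFds };

// Hypothetical stand-in for is_valid_fd: F_GETFD fails on a closed fd.
static bool fd_is_open(int fd) { return fcntl(fd, F_GETFD) != -1; }

JobserverKind classify_makeflags(const char* makeflags) {
  if (makeflags == nullptr)
    return JobserverKind::kUnset;
  // GNU make 4.4 --jobserver-style=fifo: --jobserver-auth=fifo:PATH
  if (std::strstr(makeflags, "--jobserver-auth=fifo:") != nullptr)
    return JobserverKind::kFifo;
  // Traditional form: --jobserver-auth=R,W with inherited descriptors.
  const char* pipe_needle = "--jobserver-auth=";
  const char* n = std::strstr(makeflags, pipe_needle);
  if (n == nullptr)
    return JobserverKind::kAbsent;
  int rfd = -1, wfd = -1;
  if (std::sscanf(n + std::strlen(pipe_needle), "%d,%d", &rfd, &wfd) == 2
      && rfd > 0 && wfd > 0 && fd_is_open(rfd) && fd_is_open(wfd))
    return JobserverKind::kPipe;
  return JobserverKind::kBadFds;
}
```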



[PATCH 2/3] lto: support --jobserver-style=fifo for recent GNU make

2022-08-09 Thread Martin Liška
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* jobserver.h (jobserver_info::jobserver_info): Parse FIFO
format of --jobserver-auth.
---
 gcc/jobserver.h | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/jobserver.h b/gcc/jobserver.h
index 85453dd3c79..856e326ddfc 100644
--- a/gcc/jobserver.h
+++ b/gcc/jobserver.h
@@ -39,14 +39,22 @@ struct jobserver_info
   int rfd = -1;
   /* File descriptor for writing used for jobserver communication.  */
   int wfd = -1;
+  /* Named pipe path.  */
+  string pipe_path = "";
   /* Return true if jobserver is active.  */
   bool is_active = false;
 };
 
 jobserver_info::jobserver_info ()
 {
+  /* Traditionally, GNU make uses opened pipes for jobserver-auth,
+e.g. --jobserver-auth=3,4.
+Starting with GNU make 4.4, one can use --jobserver-style=fifo
+and then named pipe is used: --jobserver-auth=fifo:/tmp/hcsparta.  */
+
   /* Detect jobserver and drop it if it's not working.  */
   string js_needle = "--jobserver-auth=";
+  string fifo_prefix = "fifo:";
 
   const char *envval = getenv ("MAKEFLAGS");
   if (envval != NULL)
@@ -55,8 +63,15 @@ jobserver_info::jobserver_info ()
   size_t n = makeflags.rfind (js_needle);
   if (n != string::npos)
{
- if (sscanf (makeflags.c_str () + n + js_needle.size (),
- "%d,%d", &rfd, &wfd) == 2
+ string ending = makeflags.substr (n + js_needle.size ());
+ if (ending.find (fifo_prefix) == 0)
+   {
+ ending = ending.substr (fifo_prefix.size ());
+ pipe_path = ending.substr (0, ending.find (' '));
+ is_active = true;
+   }
+ else if (sscanf (makeflags.c_str () + n + js_needle.size (),
+  "%d,%d", &rfd, &wfd) == 2
  && rfd > 0
  && wfd > 0
  && is_valid_fd (rfd)
-- 
2.37.1
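Below is a sketch of the FIFO path extraction in jobserver_info::jobserver_info(), written as a free function. jobserver_fifo_path() is a hypothetical name. Like the patch, it uses rfind(), so only the last --jobserver-auth= word counts. The path ends at the next space.

```cpp
#include <cassert>
#include <optional>
#include <string>

std::optional<std::string> jobserver_fifo_path(const std::string& makeflags) {
  const std::string needle = "--jobserver-auth=";
  const std::string fifo_prefix = "fifo:";
  std::string::size_type n = makeflags.rfind(needle);
  if (n == std::string::npos)
    return std::nullopt;
  std::string ending = makeflags.substr(n + needle.size());
  // Not "fifo:..." means the traditional "R,W" descriptor form.
  if (ending.compare(0, fifo_prefix.size(), fifo_prefix) != 0)
    return std::nullopt;
  ending = ending.substr(fifo_prefix.size());
  // substr(0, npos) keeps the rest when no space follows the path.
  return ending.substr(0, ending.find(' '));
}
```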




[PATCH 3/3] lto: respect jobserver in parallel WPA streaming

2022-08-09 Thread Martin Liška
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR lto/106328

gcc/ChangeLog:

* jobserver.h (struct jobserver_info): Add pipefd.
(jobserver_info::connect): New.
(jobserver_info::disconnect): Likewise.
(jobserver_info::get_token): Likewise.
(jobserver_info::return_token): Likewise.

gcc/lto/ChangeLog:

* lto.cc (wait_for_child): Decrement nruns once a process
finishes.
(stream_out_partitions): Use job server if active.
(do_whole_program_analysis): Likewise.
---
 gcc/jobserver.h | 54 
 gcc/lto/lto.cc  | 55 +
 2 files changed, 96 insertions(+), 13 deletions(-)

diff --git a/gcc/jobserver.h b/gcc/jobserver.h
index 856e326ddfc..2a7dc9f4113 100644
--- a/gcc/jobserver.h
+++ b/gcc/jobserver.h
@@ -31,6 +31,18 @@ struct jobserver_info
   /* Default constructor.  */
   jobserver_info ();
 
+  /* Connect to the server.  */
+  void connect ();
+
+  /* Disconnect from the server.  */
+  void disconnect ();
+
+  /* Get token from the server.  */
+  bool get_token ();
+
+  /* Return token to the server.  */
+  void return_token ();
+
   /* Error message if there is a problem.  */
   string error_msg = "";
   /* Skipped MAKEFLAGS where --jobserver-auth is skipped.  */
@@ -41,6 +53,8 @@ struct jobserver_info
   int wfd = -1;
   /* Named pipe path.  */
   string pipe_path = "";
+  /* Pipe file descriptor.  */
+  int pipefd = -1;
   /* Return true if jobserver is active.  */
   bool is_active = false;
 };
@@ -97,4 +111,44 @@ jobserver_info::jobserver_info ()
 error_msg = "jobserver is not available: " + error_msg;
 }
 
+void
+jobserver_info::connect ()
+{
+  if (!pipe_path.empty ())
+pipefd = open (pipe_path.c_str (), O_RDWR);
+}
+
+void
+jobserver_info::disconnect ()
+{
+  if (!pipe_path.empty ())
+{
+  gcc_assert (close (pipefd) == 0);
+  pipefd = -1;
+}
+}
+
+bool
+jobserver_info::get_token ()
+{
+  int fd = pipe_path.empty () ? rfd : pipefd;
+  char c;
+  unsigned n = read (fd, &c, 1);
+  if (n != 1)
+{
+  gcc_assert (errno == EAGAIN);
+  return false;
+}
+  else
+return true;
+}
+
+void
+jobserver_info::return_token ()
+{
+  int fd = pipe_path.empty () ? wfd : pipefd;
+  char c = 'G';
+  gcc_assert (write (fd, &c, 1) == 1);
+}
+
 #endif /* GCC_JOBSERVER_H */
diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 31b0c1862f7..56266195ead 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -54,11 +54,17 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "builtins.h"
 #include "lto-common.h"
-
+#include "jobserver.h"
 
 /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
 static int lto_parallelism;
 
+/* Number of active WPA streaming processes.  */
+static int nruns = 0;
+
+/* GNU make's jobserver info.  */
+static jobserver_info *jinfo = NULL;
+
 /* Return true when NODE has a clone that is analyzed (i.e. we need
to load its body even if the node itself is not needed).  */
 
@@ -205,6 +211,12 @@ wait_for_child ()
 "streaming subprocess was killed by signal");
 }
   while (!WIFEXITED (status) && !WIFSIGNALED (status));
+
+--nruns;
+
+/* Return token to the jobserver if active.  */
+if (jinfo != NULL && jinfo->is_active)
+  jinfo->return_token ();
 }
 #endif
 
@@ -228,25 +240,35 @@ stream_out_partitions (char *temp_filename, int blen, int min, int max,
   bool ARG_UNUSED (last))
 {
 #ifdef HAVE_WORKING_FORK
-  static int nruns;
-
   if (lto_parallelism <= 1)
 {
   stream_out_partitions_1 (temp_filename, blen, min, max);
   return;
 }
 
-  /* Do not run more than LTO_PARALLELISM streamings
- FIXME: we ignore limits on jobserver.  */
   if (lto_parallelism > 0 && nruns >= lto_parallelism)
-{
-  wait_for_child ();
-  nruns --;
-}
+wait_for_child ();
+
   /* If this is not the last parallel partition, execute new
  streaming process.  */
   if (!last)
 {
+  if (jinfo != NULL && jinfo->is_active)
+   while (true)
+ {
+   if (jinfo->get_token ())
+ break;
+   if (nruns > 0)
+ wait_for_child ();
+   else
+ {
+   /* There are no free tokens, let's do the job ourselves.  */
+   stream_out_partitions_1 (temp_filename, blen, min, max);
+   asm_nodes_output = true;
+   return;
+ }
+ }
+
   pid_t cpid = fork ();
 
   if (!cpid)
@@ -264,10 +286,12 @@ stream_out_partitions (char *temp_filename, int blen, int min, int max,
   /* Last partition; stream it and wait for all children to die.  */
   else
 {
-  int i;
   stream_out_partitions_1 (temp_filename, blen, min, max);
-  for (i = 0; i < nruns; i++)
+  while (nruns > 0

[PATCH 1/3] Factor out jobserver_active_p.

2022-08-09 Thread Martin Liška
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* gcc.cc (driver::detect_jobserver): Remove and move to
jobserver.h.
* lto-wrapper.cc (jobserver_active_p): Likewise.
(run_gcc): Likewise.
* jobserver.h: New file.
---
 gcc/gcc.cc | 36 +++-
 gcc/jobserver.h| 85 ++
 gcc/lto-wrapper.cc | 43 +--
 3 files changed, 97 insertions(+), 67 deletions(-)
 create mode 100644 gcc/jobserver.h

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 5cbb38560b2..69fbd293eaa 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -43,6 +43,7 @@ compilation is specified by a string called a "spec".  */
 #include "opts.h"
 #include "filenames.h"
 #include "spellcheck.h"
+#include "jobserver.h"
 
 
 
@@ -9178,38 +9179,9 @@ driver::final_actions () const
 void
 driver::detect_jobserver () const
 {
-  /* Detect jobserver and drop it if it's not working.  */
-  const char *makeflags = env.get ("MAKEFLAGS");
-  if (makeflags != NULL)
-{
-  const char *needle = "--jobserver-auth=";
-  const char *n = strstr (makeflags, needle);
-  if (n != NULL)
-   {
- int rfd = -1;
- int wfd = -1;
-
- bool jobserver
-   = (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
-  && rfd > 0
-  && wfd > 0
-  && is_valid_fd (rfd)
-  && is_valid_fd (wfd));
-
- /* Drop the jobserver if it's not working now.  */
- if (!jobserver)
-   {
- unsigned offset = n - makeflags;
- char *dup = xstrdup (makeflags);
- dup[offset] = '\0';
-
- const char *space = strchr (makeflags + offset, ' ');
- if (space != NULL)
-   strcpy (dup + offset, space);
- xputenv (concat ("MAKEFLAGS=", dup, NULL));
-   }
-   }
-}
+  jobserver_info jinfo;
+  if (!jinfo.is_active && !jinfo.skipped_makeflags.empty ())
+xputenv (jinfo.skipped_makeflags.c_str ());
 }
 
 /* Determine what the exit code of the driver should be.  */
diff --git a/gcc/jobserver.h b/gcc/jobserver.h
new file mode 100644
index 000..85453dd3c79
--- /dev/null
+++ b/gcc/jobserver.h
@@ -0,0 +1,85 @@
+/* GNU make's jobserver related functionality.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+See dbgcnt.def for usage information.  */
+
+#ifndef GCC_JOBSERVER_H
+#define GCC_JOBSERVER_H
+
+#include <string>
+
+using namespace std;
+
+struct jobserver_info
+{
+  /* Default constructor.  */
+  jobserver_info ();
+
+  /* Error message if there is a problem.  */
+  string error_msg = "";
+  /* Skipped MAKEFLAGS where --jobserver-auth is skipped.  */
+  string skipped_makeflags = "";
+  /* File descriptor for reading used for jobserver communication.  */
+  int rfd = -1;
+  /* File descriptor for writing used for jobserver communication.  */
+  int wfd = -1;
+  /* Return true if jobserver is active.  */
+  bool is_active = false;
+};
+
+jobserver_info::jobserver_info ()
+{
+  /* Detect jobserver and drop it if it's not working.  */
+  string js_needle = "--jobserver-auth=";
+
+  const char *envval = getenv ("MAKEFLAGS");
+  if (envval != NULL)
+{
+  string makeflags = envval;
+  size_t n = makeflags.rfind (js_needle);
+  if (n != string::npos)
+   {
+ if (sscanf (makeflags.c_str () + n + js_needle.size (),
+ "%d,%d", &rfd, &wfd) == 2
+ && rfd > 0
+ && wfd > 0
+ && is_valid_fd (rfd)
+ && is_valid_fd (wfd))
+   is_active = true;
+ else
+   {
+ string dup = makeflags.substr (0, n);
+ size_t pos = makeflags.find (' ', n);
+ if (pos != string::npos)
+   dup += makeflags.substr (pos);
+ skipped_makeflags = "MAKEFLAGS=" + dup;
+ error_msg
+   = "cannot access %<" + js_needle + "%> file descriptors";
+   }
+   }
+  error_msg = "%<" + js_needle + "%> is not present in %<MAKEFLAGS%>";
+}
+  else
+error_msg = "%<MAKEFLAGS%> environment variable is unset";
+
+  if (!error_msg.empty ())
+error_msg = "jobserver is not available: " + error_msg;
+}
+
+#endif /* GCC_JOBSERVER_H */
diff --git a/gcc/
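When the descriptors named in --jobserver-auth= are not usable, jobserver_info builds skipped_makeflags with that word removed, and the driver exports it with xputenv. Below is a standalone C++ sketch of that string surgery. strip_jobserver_auth() is a hypothetical name. Like the patch, it leaves the surrounding spaces in place.

```cpp
#include <cassert>
#include <string>

std::string strip_jobserver_auth(const std::string& makeflags) {
  const std::string needle = "--jobserver-auth=";
  std::string::size_type n = makeflags.rfind(needle);
  if (n == std::string::npos)
    return makeflags;
  // Keep everything before the word...
  std::string dup = makeflags.substr(0, n);
  // ...and everything from the space that ends it, if there is one.
  std::string::size_type pos = makeflags.find(' ', n);
  if (pos != std::string::npos)
    dup += makeflags.substr(pos);
  return dup;
}
```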

[committed] amdgcn: Vector procedure call ABI

2022-08-09 Thread Andrew Stubbs

I've committed this patch for amdgcn.

This changes the procedure calling ABI such that vector arguments are 
passed in vector registers, rather than on the stack as before.


The ABI for scalar functions is the same for arguments, but the return 
value has now moved to a vector register; keeping it the same for all 
types simplifies the compiler implementation. If a significant down-side 
is found then we can move to having multiple return locations, and worry 
about how to fix the "untyped" calls then.


There's no "standard ABI" for this target, and there are no third party 
binaries with which to retain compatibility, so we're free to make 
whatever changes we wish.


Andrew

amdgcn: Vector procedure call ABI

Adjust the (unofficial) procedure calling ABI such that vector arguments are
passed in vector registers, not on the stack.  Scalar arguments continue to
be passed in scalar registers, making a total of 12 argument registers.

The return value is also moved to a vector register (even for scalars; it
would be possible to retain the scalar location, using untyped_call, but
there's no obvious advantage in doing so).

After this change the ABI is as follows:

s0-s13  : Reserved for kernel launch parameters.
s14-s15 : Frame pointer.
s16-s17 : Stack pointer.
s18-s19 : Link register.
s20-s21 : Exec Save.
s22-s23 : CC Save.
s24-s25 : Scalar arguments.  NO LONGER RETURN VALUE.
s26-s29 : Additional scalar arguments (makes 6 total).
s30-s31 : Static Chain.
v0  : Prologue/epilogue scratch.
v1  : Constant 0, 1, 2, 3, 4, ... 63.
v2-v7   : Prologue/epilogue scratch.
v8-v9   : Return value & vector arguments.  NEW.
v10-v13 : Additional vector arguments (makes 6 total).  NEW.
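Below is a rough model of how the new num_arg_regs/gcn_function_arg logic places arguments under this ABI. It is a simplified sketch, not GCC code. Register numbers are indices within each register file (s24 -> 24, v8 -> 8), which assumes alignment behaves the same as on absolute hard register numbers. An argument that does not fit is reported as -1 (stack), and UNITS_PER_WORD is taken as 4.

```cpp
#include <cassert>
#include <vector>

struct Arg {
  bool is_vector;
  int size_bytes;  // promoted size in bytes
  int nunits;      // lanes for vector modes; ignored for scalars
};

constexpr int kUnitsPerWord = 4;
constexpr int kFirstScalarParm = 24;  // s24
constexpr int kFirstVectorParm = 8;   // v8
constexpr int kNumParmRegs = 6;       // per register file

int num_arg_regs(const Arg& a) {
  int regsize = kUnitsPerWord * (a.is_vector ? a.nunits : 1);
  return (a.size_bytes + regsize - 1) / regsize;
}

std::vector<int> assign_arg_regs(const std::vector<Arg>& args) {
  int snum = 0, vnum = 0;  // separate counters, like cum->num and cum->vnum
  std::vector<int> regs;
  for (const Arg& a : args) {
    int first = a.is_vector ? kFirstVectorParm : kFirstScalarParm;
    int& cum = a.is_vector ? vnum : snum;
    int n = num_arg_regs(a);
    int reg = first + cum;
    if (n > 0)
      while (reg % n != 0)  // align multi-register arguments
        reg++;
    regs.push_back(reg + n <= first + kNumParmRegs ? reg : -1);
    if (n > 0)  // advance, as gcn_function_arg_advance does
      while ((first + cum) % n != 0)
        cum++;
    cum += n;
  }
  return regs;
}
```

In this model, an int lands in s24 and a double is aligned to the even pair s26-s27. A V64SI vector (64 x 4 bytes = 256 bytes, one vector register) lands in v8, and a V64DI vector needs the aligned pair v10-v11.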

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_function_value): Allow vector return values.
(num_arg_regs): Allow vector arguments.
(gcn_function_arg): Likewise.
(gcn_function_arg_advance): Likewise.
(gcn_arg_partial_bytes): Likewise.
(gcn_return_in_memory): Likewise.
(gcn_expand_epilogue): Get return value from v8.
* config/gcn/gcn.h (RETURN_VALUE_REG): Set to v8.
(FIRST_PARM_REG): Use FIRST_SGPR_REG for clarity.
(FIRST_VPARM_REG): New.
(FUNCTION_ARG_REGNO_P): Allow vector parameters.
(struct gcn_args): Add vnum field.
(LIBCALL_VALUE): Allow vector return values.
* config/gcn/gcn.md (gcn_call_value): Add vector constraints.
(gcn_call_value_indirect): Likewise.

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 6fc20d3f659..96295e23aad 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -2284,7 +2284,7 @@ gcn_function_value (const_tree valtype, const_tree, bool)
   && GET_MODE_SIZE (mode) < 4)
 mode = SImode;
 
-  return gen_rtx_REG (mode, SGPR_REGNO (RETURN_VALUE_REG));
+  return gen_rtx_REG (mode, RETURN_VALUE_REG);
 }
 
 /* Implement TARGET_FUNCTION_VALUE_REGNO_P.
@@ -2308,7 +2308,9 @@ num_arg_regs (const function_arg_info &arg)
 return 0;
 
   int size = arg.promoted_size_in_bytes ();
-  return (size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+  int regsize = UNITS_PER_WORD * (VECTOR_MODE_P (arg.mode)
+ ? GET_MODE_NUNITS (arg.mode) : 1);
+  return (size + regsize - 1) / regsize;
 }
 
 /* Implement TARGET_STRICT_ARGUMENT_NAMING.
@@ -2358,16 +2360,16 @@ gcn_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
   if (targetm.calls.must_pass_in_stack (arg))
return 0;
 
-  /* Vector parameters are not supported yet.  */
-  if (VECTOR_MODE_P (arg.mode))
-   return 0;
-
-  int reg_num = FIRST_PARM_REG + cum->num;
+  int first_reg = (VECTOR_MODE_P (arg.mode)
+  ? FIRST_VPARM_REG : FIRST_PARM_REG);
+  int cum_num = (VECTOR_MODE_P (arg.mode)
+? cum->vnum : cum->num);
+  int reg_num = first_reg + cum_num;
   int num_regs = num_arg_regs (arg);
   if (num_regs > 0)
while (reg_num % num_regs != 0)
  reg_num++;
-  if (reg_num + num_regs <= FIRST_PARM_REG + NUM_PARM_REGS)
+  if (reg_num + num_regs <= first_reg + NUM_PARM_REGS)
return gen_rtx_REG (arg.mode, reg_num);
 }
   else
@@ -2419,11 +2421,15 @@ gcn_function_arg_advance (cumulative_args_t cum_v,
   if (!arg.named)
return;
 
+  int first_reg = (VECTOR_MODE_P (arg.mode)
+  ? FIRST_VPARM_REG : FIRST_PARM_REG);
+  int *cum_num = (VECTOR_MODE_P (arg.mode)
+ ? &cum->vnum : &cum->num);
   int num_regs = num_arg_regs (arg);
   if (num_regs > 0)
-   while ((FIRST_PARM_REG + cum->num) % num_regs != 0)
- cum->num++;
-  cum->num += num_regs;
+   while ((first_reg + *cum_num) % num_regs != 0)
+ (*cum_num)++;
+  *cum_num += num_regs;
 }
   else
 {
@@ -2454,14 +2460,18 @@ gcn_arg_partial_bytes (cumulative_args_t cum_v, const function_arg_info &arg)
   if (targetm

[committed] d: Fix undefined reference to pragma(inline) symbol (PR106563)

2022-08-09 Thread Iain Buclaw via Gcc-patches
Hi,

This patch changes the emission strategy for inline functions so that
they are given codegen in every referencing module, not just the module
that they are defined in.

Functions that are declared `pragma(inline)' should be treated as if
they are defined in every translation unit they are referenced from,
regardless of visibility protection.  Ensure they always get
DECL_ONE_ONLY linkage, and start emitting them into other modules that
import them.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32,
committed to mainline and backported to releases/gcc-12.

Regards,
Iain.

---
PR d/106563

gcc/d/ChangeLog:

* decl.cc (DeclVisitor::visit (FuncDeclaration *)): Set semanticRun
before generating its symbol.
(function_defined_in_root_p): New function.
(function_needs_inline_definition_p): New function.
(maybe_build_decl_tree): New function.
(get_symbol_decl): Call maybe_build_decl_tree before returning symbol.
(start_function): Use function_defined_in_root_p instead of inline
test for locally defined symbols.
(set_linkage_for_decl): Check for inline functions before private or
protected symbols.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/torture.exp (srcdir): New proc.
* gdc.dg/torture/imports/pr106563math.d: New test.
* gdc.dg/torture/imports/pr106563regex.d: New test.
* gdc.dg/torture/imports/pr106563uni.d: New test.
* gdc.dg/torture/pr106563.d: New test.

---
 gcc/d/decl.cc | 121 +++---
 .../gdc.dg/torture/imports/pr106563math.d |  12 ++
 .../gdc.dg/torture/imports/pr106563regex.d|   7 +
 .../gdc.dg/torture/imports/pr106563uni.d  |  15 +++
 gcc/testsuite/gdc.dg/torture/pr106563.d   |  16 +++
 gcc/testsuite/gdc.dg/torture/torture.exp  |   9 ++
 6 files changed, 161 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/torture/imports/pr106563math.d
 create mode 100644 gcc/testsuite/gdc.dg/torture/imports/pr106563regex.d
 create mode 100644 gcc/testsuite/gdc.dg/torture/imports/pr106563uni.d
 create mode 100644 gcc/testsuite/gdc.dg/torture/pr106563.d

diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 43c3d87cdd1..9119323175e 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -823,6 +823,10 @@ public:
 if (global.errors)
   return;
 
+/* Start generating code for this function.  */
+gcc_assert (d->semanticRun == PASS::semantic3done);
+d->semanticRun = PASS::obj;
+
 /* Duplicated FuncDeclarations map to the same symbol.  Check if this
is the one declaration which will be emitted.  */
 tree fndecl = get_symbol_decl (d);
@@ -839,10 +843,6 @@ public:
 if (global.params.verbose)
   message ("function  %s", d->toPrettyChars ());
 
-/* Start generating code for this function.  */
-gcc_assert (d->semanticRun == PASS::semantic3done);
-d->semanticRun = PASS::obj;
-
 tree old_context = start_function (d);
 
 tree parm_decl = NULL_TREE;
@@ -1015,13 +1015,103 @@ build_decl_tree (Dsymbol *d)
   input_location = saved_location;
 }
 
+/* Returns true if function FD is defined or instantiated in a root module.  */
+
+static bool
+function_defined_in_root_p (FuncDeclaration *fd)
+{
+  Module *md = fd->getModule ();
+  if (md && md->isRoot ())
+return true;
+
+  TemplateInstance *ti = fd->isInstantiated ();
+  if (ti && ti->minst && ti->minst->isRoot ())
+return true;
+
+  return false;
+}
+
+/* Returns true if function FD always needs to be implicitly defined, such as
+   when it was declared `pragma(inline)'.  */
+
+static bool
+function_needs_inline_definition_p (FuncDeclaration *fd)
+{
+  /* Function has already been defined.  */
+  if (!DECL_EXTERNAL (fd->csym))
+return false;
+
+  /* Non-inlineable functions are always external.  */
+  if (DECL_UNINLINABLE (fd->csym))
+return false;
+
+  /* No function body available for inlining.  */
+  if (!fd->fbody)
+return false;
+
+  /* Ignore functions that aren't decorated with `pragma(inline)'.  */
+  if (fd->inlining != PINLINE::always)
+return false;
+
+  /* These functions are tied to the module they are defined in.  */
+  if (fd->isFuncLiteralDeclaration ()
+  || fd->isUnitTestDeclaration ()
+  || fd->isFuncAliasDeclaration ()
+  || fd->isInvariantDeclaration ())
+return false;
+
+  /* Check whether function will be regularly defined later in the current
+ translation unit.  */
+  if (function_defined_in_root_p (fd))
+return false;
+
+  /* Weak functions cannot be inlined.  */
+  if (lookup_attribute ("weak", DECL_ATTRIBUTES (fd->csym)))
+return false;
+
+  /* Naked functions cannot be inlined.  */
+  if (lookup_attribute ("naked", DECL_ATTRIBUTES (fd->csym)))
+return false;
+
+  return true;
+}
+
+/* If the variable or function declaration in DECL needs to be defined, call
+   build_decl_tree on it now before returning its back-end symbol.  
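The early-exit order of function_needs_inline_definition_p can be summarized as a predicate over a few facts. The sketch below uses a hypothetical FuncFacts struct in place of the real FuncDeclaration and tree queries. Each field is named after the check it stands for.

```cpp
#include <cassert>

// Hypothetical summary of what the D front end consults for FD.
struct FuncFacts {
  bool already_defined;  // !DECL_EXTERNAL (fd->csym)
  bool uninlinable;      // DECL_UNINLINABLE (fd->csym)
  bool has_body;         // fd->fbody != NULL
  bool pragma_inline;    // fd->inlining == PINLINE::always
  bool module_bound;     // function literal, unittest, alias or invariant
  bool defined_in_root;  // function_defined_in_root_p (fd)
  bool weak_or_naked;    // "weak" or "naked" attribute present
};

bool needs_inline_definition(const FuncFacts& f) {
  if (f.already_defined || f.uninlinable || !f.has_body)
    return false;
  if (!f.pragma_inline || f.module_bound)
    return false;
  if (f.defined_in_root)  // will be defined normally in this TU
    return false;
  return !f.weak_or_naked;
}
```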

Re: [PATCH] rs6000: Rework ELFv2 support for -fpatchable-function-entry* [PR99888]

2022-08-09 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the review comments!

on 2022/8/9 18:35, Segher Boessenkool wrote:
> Hi!
> 
>> +  /* As ELFv2 ABI shows, the allowable bytes past the global entry
>> + point are 0, 4, 8, 16, 32 and 64.  Considering there are two
>> + non-prefixed instructions for global entry (8 bytes), the count
>> + for patchable NOPs before local entry would be 2, 6 and 14.  */
> 
> The other option is to allow other numbers of nops, but in that case not
> have a local entry point (so, always use the global entry point).
> 

Good point, it's doable, but it means that for the other NOP counts the
patched function would always have to pay the cost of TOC initialization,
which IMHO may not be what we want.

> I don't know if that is useful for any users of this support (if there
> even are such users :-P )

Yeah, as discussed in PR98125, the powerpc Linux kernel doesn't use
this feature.  :-P

> 
>> +  if (patch_area_entry > 0)
>> +{
>> +  if (patch_area_entry != 2
>> +  && patch_area_entry != 6
>> +  && patch_area_entry != 14)
>> +error ("for %<-fpatchable-function-entry=%u,%u%>, patching "
>> +   "%u NOP(s) before function entry is invalid, it can "
>> +   "cause assembler error",
> 
> I would not say "it can [etc.]" at all.  Oh, and "NOP" (capitals) isn't
> a thing, it is not an acronym or such ;-)
> 

I'm poor at wording.  :(  Could you suggest some wording here?

>> +/* { dg-require-effective-target powerpc_elfv2 } */
>> +/* Specify -mcpu=power9 to ensure global entry is needed.  */
>> +/* { dg-options "-mdejagnu-cpu=power9" } */
> 
> Why would it be needed for p9, and not older, or newer?
> 

It can be p8 or p9, but not p10 and later.  

It's meant to exclude the pc-relative feature, which can make the case
not generate a global entry point prologue, so the test point would
become unavailable.  I thought about adding -mno-pcrel, but guessed it's
safer to use a cpu type which doesn't support pcrel at all, since that
excludes any possibility of pcrel getting re-enabled.

Do you think -mno-pcrel is more elegant and safe enough?
Or should I just update the comment to make it more meaningful?

> Every function always has a GEP, so I'm not sure what you are trying to
> say here anyway :-)

Good catch!  :) It's meant to say global entry (point) prologue. 

> 
> Rest looks good to me.
> 

Thanks again!

BR,
Kewen


[PATCH] tree-optimization/106514 - revisit m_import compute in backward threading

2022-08-09 Thread Richard Biener via Gcc-patches
This revisits how we compute the imports later used for the ranger path
query during backwards threading.  The compute_imports function
of the path solver ends up pulling in the SSA def chain of regular
stmts without limit, and since it starts with just the gori imports
of the path exit it misses some interesting names to translate
during path discovery.  In fact, with a still empty path, compute_imports
does not look like the correct tool.

The following instead does this work on the fly during path discovery.
Since we add the exit block first, we seed the initial imports and
interesting names from just the exit conditional.  When we then
process interesting names (aka imports whose definition we have not
yet seen), we prune local defs but add their uses, similar to what
compute_imports would have done.

The patch also properly unwinds m_imports during path discovery
backtracking, and in a debugging session I have verified that the two
sets now evolve as expected, whereas previously they behaved slightly
erratically.

Fortunately the m_imports set is now also significantly smaller for
the PR69592 testcase (aka PR106514), so there's an overall speedup
when increasing --param max-jump-thread-duplication-stmts: for
15 -> 30 -> 60 -> 120 the time goes from 1s -> 2s -> 13s -> 27s
without the patch to 1s -> 2s -> 4s -> 8s with it.

This runs into a latent issue in X which doesn't seem to expect
any PHI nodes with a constant argument on an edge inside the path.
But we now have those as interesting, for example in the ICEing
g++.dg/torture/pr100925.C, which has something like

  if (i)
    x = 1;
  else
    x = 5;
  if (x == 1)
    ...

where we now have the path from if (i) to if (x) and the PHI for x
in the set of imports to consider for resolving x == 1 which IMHO
looks exactly like what we want.  The path_range_query::ssa_range_in_phi
papers over the issue and drops the range to varying instead of
crashing.  I didn't want to mess with this any further in this patch
(but I couldn't resist replacing the loop over PHI args with
PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).

Bootstrapped and tested on x86_64-unknown-linux-gnu w/o the
path_range_query::ssa_range_in_phi fix, now re-running with.

OK?

Thanks,
Richard.

PR tree-optimization/106514
* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Compute and unwind both m_imports and interesting on the fly during
path discovery.
(back_threader::find_paths): Compute the original m_imports
from just the SSA uses of the exit conditional.  Drop
handling single_succ_to_potentially_threadable_block.
* gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.
---
 gcc/gimple-range-path.cc   |  52 -
 gcc/tree-ssa-threadbackward.cc | 104 ++---
 2 files changed, 106 insertions(+), 50 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 43e7526b6fc..b4376011ea8 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -276,8 +276,6 @@ void
 path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
 {
   tree name = gimple_phi_result (phi);
-  basic_block bb = gimple_bb (phi);
-  unsigned nargs = gimple_phi_num_args (phi);
 
   if (at_entry ())
 {
@@ -287,6 +285,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
   // Try to fold the phi exclusively with global or cached values.
   // This will get things like PHI <5(99), 6(88)>.  We do this by
   // calling range_of_expr with no context.
+  unsigned nargs = gimple_phi_num_args (phi);
   Value_Range arg_range (TREE_TYPE (name));
   r.set_undefined ();
   for (size_t i = 0; i < nargs; ++i)
@@ -303,36 +302,31 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
   return;
 }
 
+  basic_block bb = gimple_bb (phi);
   basic_block prev = prev_bb ();
   edge e_in = find_edge (prev, bb);
-
-  for (size_t i = 0; i < nargs; ++i)
-if (e_in == gimple_phi_arg_edge (phi, i))
-  {
-   tree arg = gimple_phi_arg_def (phi, i);
-   // Avoid using the cache for ARGs defined in this block, as
-   // that could create an ordering problem.
-   if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
- {
-   if (m_resolve)
- {
-   Value_Range tmp (TREE_TYPE (name));
-   // Using both the range on entry to the path, and the
-   // range on this edge yields significantly better
-   // results.
-   if (defined_outside_path (arg))
- range_on_path_entry (r, arg);
-   else
- r.set_varying (TREE_TYPE (name));
-   m_ranger->range_on_edge (tmp, e_in, arg);
-   r.intersect (tmp);
-   return;
- }
+  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e_in);
+  // Avoid using the cache for ARGs 

Re: Re: [PATCH v5] LoongArch: add movable attribute

2022-08-09 Thread Lulu Cheng



On 2022/8/9 7:30 PM, Xi Ruoyao wrote:

> Sorry for the late reply, I'm rebuilding my entire Linux system (from
> scratch) for the Glibc-2.36 and Binutils-2.39 update and I just reached the
> mail client.
>
> On Mon, 2022-08-08 at 12:53 +0800, Lulu Cheng wrote:
>
>> I still think it makes a little bit more sense to put attribute(model)
>> and -mcmodel together.
>>
>> -mcmodel sets the access range of all symbols in a single file, and
>> attribute (model) sets the access range of a single symbol in a file.
>> For example __attribute__((model(normal/large/extreme))).
>
> It might make sense, but then it would not be what we want for per-CPU
> symbols.  What we want here is "treat a local symbol as-if it's global",
> while each code model may already treat local symbol and global symbol
> differently.
>
> Disambiguation: here "local" means "defined in this TU", "global"
> otherwise (not "local variable" in C).
>
> I'll send v6 with the name "addr_global" if no objection.

I am implementing the cmodel=extreme mode. In this mode, the relative
offset is a signed 64-bit value, so it can solve the problem of accessing
the kernel's percpu variables.

So I wonder whether it is necessary to add another attribute like addr_global.



Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-08-09 Thread Richard Biener via Gcc-patches
On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni
 wrote:
>
> On Mon, 8 Aug 2022 at 14:27, Richard Biener  
> wrote:
> >
> > On Mon, Aug 1, 2022 at 5:17 AM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Thu, 21 Jul 2022 at 12:21, Richard Biener  
> > > wrote:
> > > >
> > > > On Wed, Jul 20, 2022 at 5:36 PM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Mon, 18 Jul 2022 at 11:57, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Jul 15, 2022 at 3:49 PM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, 14 Jul 2022 at 17:22, Richard Sandiford
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Richard Biener  writes:
> > > > > > > > > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni
> > > > > > > > >  wrote:
> > > > > > > > >>
> > > > > > > > >> On Wed, 13 Jul 2022 at 12:22, Richard Biener 
> > > > > > > > >>  wrote:
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via 
> > > > > > > > >> > Gcc-patches
> > > > > > > > >> >  wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Hi Richard,
> > > > > > > > >> > > For the following test:
> > > > > > > > >> > >
> > > > > > > > >> > > svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > > {
> > > > > > > > >> > >   int32x4_t v = (int32x4_t) {a, b, c, d};
> > > > > > > > >> > >   return svld1rq_s32 (svptrue_b8 (), &v[0]);
> > > > > > > > >> > > }
> > > > > > > > >> > >
> > > > > > > > >> > > The compiler emits the following ICE with -O3
> > > > > > > > >> > > -mcpu=generic+sve:
> > > > > > > > >> > > foo.c: In function ‘f2’:
> > > > > > > > >> > > foo.c:4:11: error: non-trivial conversion in 
> > > > > > > > >> > > ‘view_convert_expr’
> > > > > > > > >> > > 4 | svint32_t f2(int a, int b, int c, int d)
> > > > > > > > >> > >   |   ^~
> > > > > > > > >> > > svint32_t
> > > > > > > > >> > > __Int32x4_t
> > > > > > > > >> > > _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > >> > > during GIMPLE pass: forwprop
> > > > > > > > >> > > dump file: foo.c.109t.forwprop2
> > > > > > > > >> > > foo.c:4:11: internal compiler error: verify_gimple failed
> > > > > > > > >> > > 0xfda04a verify_gimple_in_cfg(function*, bool)
> > > > > > > > >> > > ../../gcc/gcc/tree-cfg.cc:5568
> > > > > > > > >> > > 0xe9371f execute_function_todo
> > > > > > > > >> > > ../../gcc/gcc/passes.cc:2091
> > > > > > > > >> > > 0xe93ccb execute_todo
> > > > > > > > >> > > ../../gcc/gcc/passes.cc:2145
> > > > > > > > >> > >
> > > > > > > > >> > > This happens because, after folding svld1rq_s32 to 
> > > > > > > > >> > > vec_perm_expr, we have:
> > > > > > > > >> > >   int32x4_t v;
> > > > > > > > >> > >   __Int32x4_t _1;
> > > > > > > > >> > >   svint32_t _9;
> > > > > > > > >> > >   vector(4) int _11;
> > > > > > > > >> > >
> > > > > > > > >> > >:
> > > > > > > > >> > >   _1 = {a_3(D), b_4(D), c_5(D), d_6(D)};
> > > > > > > > >> > >   v_12 = _1;
> > > > > > > > >> > >   _11 = v_12;
> > > > > > > > >> > >   _9 = VEC_PERM_EXPR <_11, _11, { 0, 1, 2, 3, ... }>;
> > > > > > > > >> > >   return _9;
> > > > > > > > >> > >
> > > > > > > > >> > > During forwprop, simplify_permutation simplifies 
> > > > > > > > >> > > vec_perm_expr to
> > > > > > > > >> > > view_convert_expr,
> > > > > > > > >> > > and the end result becomes:
> > > > > > > > >> > >   svint32_t _7;
> > > > > > > > >> > >   __Int32x4_t _8;
> > > > > > > > >> > >
> > > > > > > > >> > > ;;   basic block 2, loop depth 0
> > > > > > > > >> > > ;;pred:   ENTRY
> > > > > > > > >> > >   _8 = {a_2(D), b_3(D), c_4(D), d_5(D)};
> > > > > > > > >> > >   _7 = VIEW_CONVERT_EXPR<__Int32x4_t>(_8);
> > > > > > > > >> > >   return _7;
> > > > > > > > >> > > ;;succ:   EXIT
> > > > > > > > >> > >
> > > > > > > > >> > > which causes the error during verify_gimple since
> > > > > > > > >> > > VIEW_CONVERT_EXPR
> > > > > > > > >> > > has incompatible types (svint32_t, int32x4_t).
> > > > > > > > >> > >
> > > > > > > > >> > > The attached patch disables simplification of 
> > > > > > > > >> > > VEC_PERM_EXPR
> > > > > > > > >> > > in simplify_permutation, if lhs and rhs have incompatible types,
> > > > > > > > >> > > which resolves the ICE, but I am not sure if it's the correct
> > > > > > > > >> > > approach?
> > > > > > > > >> >
> > > > > > > > >> > It for sure papers over the issue.  I think the error 
> > > > > > > > >> > happens earlier,
> > > > > > > > >> > the V_C_E should have been built with the type of the 
> > > > > > > > >> > VEC_PERM_EXPR
> > > > > > > > >> > which is the type of the LHS.  But then you probably run 
> > > > > > > > >> > into the
> > > > > > > > >> > different sizes ICE (VLA vs constant size).  I think for 
> > > > > > > > >> > this case you
> > > > > > > > >> > want a BIT_FIELD_REF instead of a VIEW_CONVERT_EXPR,
> > > > > > > > >> > selecting the "low" part of the VLA vector.
> > > > > > > > >> Hi Richard,
> > > > > > > > >> Sorry I don't quite 

[PATCH 0/3] OpenMP SIMD routines

2022-08-09 Thread Andrew Stubbs
This patch series implements OpenMP "simd" routines for amdgcn, and also
adds support for "simd inbranch" routines for amdgcn, x86_64, and
aarch64 (probably, I can't easily test it).

I can approve patch 2 myself, but it depends on patch 1 so I include it
here for context and completeness.

I first tried to use "mask_mode = DImode" for amdgcn, but that does not
produce great results because it ends up generating code to turn the
mask into a vector and then back into the exact same mask, so I have
settled on "mask_mode = VOIDmode" for now (in fact, that uses fewer
argument registers in many cases, so maybe it's better anyway).
Additionally, I found that the x86_64 truth vectors cannot always be
converted to the mask types specified by the backend, so I have pulled
that code out completely.

Therefore, this patch includes only "mask_mode == VOIDmode" support,
but remains a step forward towards full SIMD clone support.

I have not included dump-scans in the testcases for aarch64, but the
testcases will still test correctness.  The aarch64 maintainers can very
easily add those scans if they choose.  No other architecture has
backend support for the clones at this time.

OK for mainline (patches 1 & 3)?

Thanks

Andrew

Andrew Stubbs (3):
  omp-simd-clone: Allow fixed-lane vectors
  amdgcn: OpenMP SIMD routine support
  vect: inbranch SIMD clones

 gcc/config/gcn/gcn.cc |  63 
 gcc/doc/tm.texi   |   3 +
 gcc/omp-simd-clone.cc |  21 ++-
 gcc/target.def|   3 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c |   2 +
 .../gcc.dg/vect/vect-simd-clone-16.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-16b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16f.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-17b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17f.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-18b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18f.c |  16 +++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c |   2 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c |   1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c |   1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c |   1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c |   2 +
 gcc/tree-if-conv.cc   |  39 -
 gcc/tree-vect-stmts.cc| 134 ++
 30 files changed, 734 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c

-- 
2.37.0



[PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-09 Thread Andrew Stubbs

The vecsize_int/vecsize_float fields assume that all arguments use the
same bitsize and that the number of lanes varies with the element size,
but this is inappropriate on targets where the number of lanes is fixed
and the bitsize varies (e.g. amdgcn).

With this change, vecsize can be left as zero, in which case the
vectorization factor (simdlen) is the same for all types.

gcc/ChangeLog:

* doc/tm.texi: Regenerate.
* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
vecsize.
(simd_clone_adjust_argument_types): Likewise.
* target.def (compute_vecsize_and_simdlen): Document the new
vecsize_int and vecsize_float semantics.
---
 gcc/doc/tm.texi   |  3 +++
 gcc/omp-simd-clone.cc | 20 +++-
 gcc/target.def|  3 +++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 92bda1a7e14..c3001c6ded9 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6253,6 +6253,9 @@ stores.
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
 @var{simdlen} field if it was previously 0.
+@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and
+@var{vecsize_float} should be left zero on targets where the number of lanes is
+not determined by the bitsize (in which case @var{simdlen} is always used).
 The hook should return 0 if SIMD clones shouldn't be emitted,
 or number of @var{vecsize_mangle} variants that should be emitted.
 @end deftypefn
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 58bd68b129b..258d3c6377f 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
     veclen = node->simdclone->vecsize_int;
   else
     veclen = node->simdclone->vecsize_float;
-  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
+  if (known_eq (veclen, 0))
+    veclen = node->simdclone->simdlen;
+  else
+    veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
   if (multiple_p (veclen, node->simdclone->simdlen))
     veclen = node->simdclone->simdlen;
   if (POINTER_TYPE_P (t))
@@ -618,8 +621,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	veclen = sc->vecsize_int;
 	  else
 	veclen = sc->vecsize_float;
-	  veclen = exact_div (veclen,
-			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
+	  if (known_eq (veclen, 0))
+	veclen = sc->simdlen;
+	  else
+	veclen = exact_div (veclen,
+GET_MODE_BITSIZE
+(SCALAR_TYPE_MODE (parm_type)));
 	  if (multiple_p (veclen, sc->simdlen))
 	veclen = sc->simdlen;
 	  adj.op = IPA_PARAM_OP_NEW;
@@ -669,8 +676,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	veclen = sc->vecsize_int;
   else
 	veclen = sc->vecsize_float;
-  veclen = exact_div (veclen,
-			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
+  if (known_eq (veclen, 0))
+	veclen = sc->simdlen;
+  else
+	veclen = exact_div (veclen,
+			GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
   if (multiple_p (veclen, sc->simdlen))
 	veclen = sc->simdlen;
   if (sc->mask_mode != VOIDmode)
diff --git a/gcc/target.def b/gcc/target.def
index 2a7fa68f83d..4d49ffc2c88 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1629,6 +1629,9 @@ DEFHOOK
 "This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}\n\
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also\n\
 @var{simdlen} field if it was previously 0.\n\
+@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and\n\
+@var{vecsize_float} should be left zero on targets where the number of lanes is\n\
+not determined by the bitsize (in which case @var{simdlen} is always used).\n\
 The hook should return 0 if SIMD clones shouldn't be emitted,\n\
 or number of @var{vecsize_mangle} variants that should be emitted.",
 int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)


[PATCH 2/3] amdgcn: OpenMP SIMD routine support

2022-08-09 Thread Andrew Stubbs

Enable and configure SIMD clones for amdgcn.  This affects both the __simd__
function attribute, and the OpenMP "declare simd" directive.

Note that the masked SIMD variants are generated, but the middle end doesn't
actually support calling them yet.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): New.
(gcn_simd_clone_adjust): New.
(gcn_simd_clone_usable): New.
(TARGET_SIMD_CLONE_ADJUST): New.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New.
(TARGET_SIMD_CLONE_USABLE): New.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-1.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-2.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-3.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-4.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-5.c: Add dg-warning.
* gcc.dg/vect/vect-simd-clone-8.c: Add dg-warning.
---
 gcc/config/gcn/gcn.cc | 63 +++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c |  2 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c |  2 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c |  1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c |  1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c |  1 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c |  2 +
 7 files changed, 72 insertions(+)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 96295e23aad..ceb69000807 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -52,6 +52,7 @@
 #include "rtl-iter.h"
 #include "dwarf2.h"
 #include "gimple.h"
+#include "cgraph.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -4555,6 +4556,61 @@ gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
   return 1;
 }
 
+/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
+
+static int
+gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node),
+	struct cgraph_simd_clone *clonei,
+	tree base_type,
+	int ARG_UNUSED (num))
+{
+  unsigned int elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
+
+  if (known_eq (clonei->simdlen, 0U))
+    clonei->simdlen = 64;
+  else if (maybe_ne (clonei->simdlen, 64U))
+    {
+      /* Note that x86 has a similar message that is likely to trigger on
+	 sizes that are OK for gcn; the user can't win.  */
+      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %wd (amdgcn)",
+		  clonei->simdlen.to_constant ());
+      return 0;
+    }
+
+  clonei->vecsize_mangle = 'n';
+  clonei->vecsize_int = 0;
+  clonei->vecsize_float = 0;
+
+  /* DImode ought to be more natural here, but VOIDmode produces better code,
+     at present, due to the shift-and-test steps not being optimized away
+     inside the in-branch clones.  */
+  clonei->mask_mode = VOIDmode;
+
+  return 1;
+}
+
+/* Implement TARGET_SIMD_CLONE_ADJUST.  */
+
+static void
+gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node))
+{
+  /* This hook has to be defined when
+     TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN is defined, but we don't
+     need it to do anything yet.  */
+}
+
+/* Implement TARGET_SIMD_CLONE_USABLE.  */
+
+static int
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+{
+  /* We don't need to do anything here because
+     gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
+     possibility.  */
+  return 0;
+}
+
 /* }}}  */
 /* {{{ md_reorg pass.  */
 
@@ -6643,6 +6699,13 @@ gcn_dwarf_register_span (rtx rtl)
 #define TARGET_SECTION_TYPE_FLAGS gcn_section_type_flags
 #undef  TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P gcn_scalar_mode_supported_p
+#undef  TARGET_SIMD_CLONE_ADJUST
+#define TARGET_SIMD_CLONE_ADJUST gcn_simd_clone_adjust
+#undef  TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
+#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \
+  gcn_simd_clone_compute_vecsize_and_simdlen
+#undef  TARGET_SIMD_CLONE_USABLE
+#define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
 #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
   gcn_small_register_classes_for_mode_p
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
index 50429049500..cd65fc343f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
@@ -56,3 +56,5 @@ main ()
   return 0;
 }
 
+/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */
+/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
index f89c73a961b..ffcbf9380d6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
@@ -50,3 +50,5 @@ main ()
   return 0;
 }
 
+

[PATCH 3/3] vect: inbranch SIMD clones

2022-08-09 Thread Andrew Stubbs

There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).

This patch adds support for a subset of the possible cases (those using
mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
so there should be no regressions.

This subset should cover all cases currently needed by amdgcn.

gcc/ChangeLog:

* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD functions.
(predicate_statements): Pass the predicate to SIMD functions.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
to clones with mask arguments, in some cases.
Generate the mask vector arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* gcc.dg/vect/vect-simd-clone-16f.c: New test.
* gcc.dg/vect/vect-simd-clone-17.c: New test.
* gcc.dg/vect/vect-simd-clone-17b.c: New test.
* gcc.dg/vect/vect-simd-clone-17c.c: New test.
* gcc.dg/vect/vect-simd-clone-17d.c: New test.
* gcc.dg/vect/vect-simd-clone-17e.c: New test.
* gcc.dg/vect/vect-simd-clone-17f.c: New test.
* gcc.dg/vect/vect-simd-clone-18.c: New test.
* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test.
* gcc.dg/vect/vect-simd-clone-18d.c: New test.
* gcc.dg/vect/vect-simd-clone-18e.c: New test.
* gcc.dg/vect/vect-simd-clone-18f.c: New test.
---
 gcc/omp-simd-clone.cc |   1 +
 .../gcc.dg/vect/vect-simd-clone-16.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-16b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-16e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-16f.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-17b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-17e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-17f.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18.c  |  89 
 .../gcc.dg/vect/vect-simd-clone-18b.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18c.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18d.c |  16 +++
 .../gcc.dg/vect/vect-simd-clone-18e.c |  14 ++
 .../gcc.dg/vect/vect-simd-clone-18f.c |  16 +++
 gcc/tree-if-conv.cc   |  39 -
 gcc/tree-vect-stmts.cc| 134 ++
 21 files changed, 641 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c

diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 258d3c6377f..58e3dc8b2e9 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -716,6 +716,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
 	}
   sc->args[i].orig_type = base_type;
   sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
+  sc->args[i].vector_type = adj.type;
 }
 
   if (node->definition)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
new file mode 100644
index 000..ffaabb30d1e
--- /dev/null
+

Re: [PATCH v2 02/10] Introduce strub: torture tests for C and C++

2022-08-09 Thread Alexandre Oliva via Gcc-patches
Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/599011.html

Here's an incremental patch for some of these tests that avoids some
relatively rare spurious failures.


diff --git a/gcc/testsuite/c-c++-common/torture/strub-run1.c b/gcc/testsuite/c-c++-common/torture/strub-run1.c
index b24a1c7a345fa..7458b3fb54da5 100644
--- a/gcc/testsuite/c-c++-common/torture/strub-run1.c
+++ b/gcc/testsuite/c-c++-common/torture/strub-run1.c
@@ -80,11 +80,16 @@ internal ()
 
 int main ()
 {
-  if (!look_for_string (callable ()))
-    __builtin_abort ();
-  if (look_for_string (at_calls ()))
-    __builtin_abort ();
-  if (look_for_string (internal ()))
-    __builtin_abort ();
+  /* Since these tests check stack contents above the top of the stack, an
+     unexpected asynchronous signal or interrupt might overwrite the bits we
+     expect to find and cause spurious failures.  Tolerate one such overall
+     spurious failure by retrying.  */
+  int i = 1;
+  while (!look_for_string (callable ()))
+    if (!i--) __builtin_abort ();
+  while (look_for_string (at_calls ()))
+    if (!i--) __builtin_abort ();
+  while (look_for_string (internal ()))
+    if (!i--) __builtin_abort ();
   __builtin_exit (0);
 }
diff --git a/gcc/testsuite/c-c++-common/torture/strub-run2.c b/gcc/testsuite/c-c++-common/torture/strub-run2.c
index 1df2ffe2fe58c..5d60a7775f4bb 100644
--- a/gcc/testsuite/c-c++-common/torture/strub-run2.c
+++ b/gcc/testsuite/c-c++-common/torture/strub-run2.c
@@ -69,11 +69,16 @@ internal ()
 
 int main ()
 {
-  if (!look_for_string (callable ()))
-    __builtin_abort ();
-  if (look_for_string (at_calls ()))
-    __builtin_abort ();
-  if (look_for_string (internal ()))
-    __builtin_abort ();
+  /* Since these tests check stack contents above the top of the stack, an
+     unexpected asynchronous signal or interrupt might overwrite the bits we
+     expect to find and cause spurious failures.  Tolerate one such overall
+     spurious failure by retrying.  */
+  int i = 1;
+  while (!look_for_string (callable ()))
+    if (!i--) __builtin_abort ();
+  while (look_for_string (at_calls ()))
+    if (!i--) __builtin_abort ();
+  while (look_for_string (internal ()))
+    if (!i--) __builtin_abort ();
   __builtin_exit (0);
 }
diff --git a/gcc/testsuite/c-c++-common/torture/strub-run3.c b/gcc/testsuite/c-c++-common/torture/strub-run3.c
index afbc2cc9ab484..c2ad710858e87 100644
--- a/gcc/testsuite/c-c++-common/torture/strub-run3.c
+++ b/gcc/testsuite/c-c++-common/torture/strub-run3.c
@@ -65,11 +65,16 @@ internal ()
 
 int main ()
 {
-  if (!look_for_string (callable ()))
-    __builtin_abort ();
-  if (look_for_string (at_calls ()))
-    __builtin_abort ();
-  if (look_for_string (internal ()))
-    __builtin_abort ();
+  /* Since these tests check stack contents above the top of the stack, an
+     unexpected asynchronous signal or interrupt might overwrite the bits we
+     expect to find and cause spurious failures.  Tolerate one such overall
+     spurious failure by retrying.  */
+  int i = 1;
+  while (!look_for_string (callable ()))
+    if (!i--) __builtin_abort ();
+  while (look_for_string (at_calls ()))
+    if (!i--) __builtin_abort ();
+  while (look_for_string (internal ()))
+    if (!i--) __builtin_abort ();
   __builtin_exit (0);
 }
diff --git a/gcc/testsuite/c-c++-common/torture/strub-run4.c b/gcc/testsuite/c-c++-common/torture/strub-run4.c
index 5300f1d330b87..3b36b8e5d68ef 100644
--- a/gcc/testsuite/c-c++-common/torture/strub-run4.c
+++ b/gcc/testsuite/c-c++-common/torture/strub-run4.c
@@ -95,7 +95,12 @@ internal ()
 int __attribute__ ((__strub__ ("disabled")))
 main ()
 {
-  if (look_for_string (internal ()))
-    __builtin_abort ();
+  /* Since these tests check stack contents above the top of the stack, an
+     unexpected asynchronous signal or interrupt might overwrite the bits we
+     expect to find and cause spurious failures.  Tolerate one such overall
+     spurious failure by retrying.  */
+  int i = 1;
+  while (look_for_string (internal ()))
+    if (!i--) __builtin_abort ();
   __builtin_exit (0);
 }


-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] Introduce hardbool attribute for C

2022-08-09 Thread Alexandre Oliva via Gcc-patches
Ping? (sorry, Joseph, I failed to Cc: you last time)

https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598034.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598084.html

On Jul  7, 2022, Alexandre Oliva  wrote:

> for  gcc/c-family/ChangeLog

>   * c-attribs.cc (c_common_attribute_table): Add hardbool.
>   (handle_hardbool_attribute): New.
>   (type_valid_for_vector_size): Reject hardbool.
>   * c-common.cc (convert_and_check): Skip warnings for convert
>   and check for hardbool.
>   (c_hardbool_type_attr_1): New.
>   * c-common.h (c_hardbool_type_attr): New.

> for  gcc/c/ChangeLog

>   * c-typeck.cc (convert_lvalue_to_rvalue): Decay hardbools.
>   * c-convert.cc (convert): Convert to hardbool through
>   truthvalue.
>   * c-decl.cc (check_bitfield_type_and_width): Skip enumeral
>   truncation warnings for hardbool.
>   (finish_struct): Propagate hardbool attribute to bitfield
>   types.
>   (digest_init): Convert to hardbool.

> for  gcc/ChangeLog

>   * doc/extend.texi (hardbool): New type attribute.

> for  gcc/testsuite/ChangeLog

>   * gcc.dg/hardbool-err.c: New.
>   * gcc.dg/hardbool-trap.c: New.
>   * gcc.dg/hardbool.c: New.
>   * gcc.dg/hardbool-s.c: New.
>   * gcc.dg/hardbool-us.c: New.
>   * gcc.dg/hardbool-i.c: New.
>   * gcc.dg/hardbool-ul.c: New.
>   * gcc.dg/hardbool-ll.c: New.
>   * gcc.dg/hardbool-5a.c: New.
>   * gcc.dg/hardbool-s-5a.c: New.
>   * gcc.dg/hardbool-us-5a.c: New.
>   * gcc.dg/hardbool-i-5a.c: New.
>   * gcc.dg/hardbool-ul-5a.c: New.
>   * gcc.dg/hardbool-ll-5a.c: New.



Re: [PATCH] i386 PIE: accept @GOTOFF in load/store multi base address

2022-08-09 Thread Alexandre Oliva via Gcc-patches
Ping?

https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598872.html

On Jul 27, 2022, Alexandre Oliva  wrote:

> for  gcc/ChangeLog

>   * config/i386/i386.cc (symbolic_base_address_p,
>   base_address_p): New, factored out from...
>   (extract_base_offset_in_addr): ... here and extended to
>   recognize REG+GOTOFF, as in gcc.target/i386/sse2-load-multi.c
>   and sse2-store-multi.c with PIE enabled by default.



Re: [PATCH] i386 testsuite: cope with --enable-default-pie

2022-08-09 Thread Alexandre Oliva via Gcc-patches
Ping?

https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598276.html

On Jul 27, 2022, Alexandre Oliva  wrote:

> for  gcc/testsuite/ChangeLog

>   * g++.dg/abi/anon1.C: Disable pie on ia32.
>   * g++.dg/abi/anon4.C: Likewise.
>   * g++.dg/cpp0x/initlist-const1.C: Likewise.
>   * g++.dg/no-stack-protector-attr-3.C: Likewise.
>   * g++.dg/stackprotectexplicit2.C: Likewise.
>   * g++.dg/pr71694.C: Likewise.
>   * gcc.dg/pr102892-1.c: Likewise.
>   * gcc.dg/sibcall-11.c: Likewise.
>   * gcc.dg/torture/builtin-self.c: Likewise.
>   * gcc.target/i386/avx2-dest-false-dep-for-glc.c: Likewise.
>   * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Likewise.
>   * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
>   * gcc.target/i386/avx512f-broadcast-pr87767-3.c: Likewise.
>   * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
>   * gcc.target/i386/avx512f-broadcast-pr87767-7.c: Likewise.
>   * gcc.target/i386/avx512fp16-broadcast-1.c: Likewise.
>   * gcc.target/i386/avx512fp16-pr101846.c: Likewise.
>   * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
>   * gcc.target/i386/avx512vl-broadcast-pr87767-3.c: Likewise.
>   * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
>   * gcc.target/i386/pr100865-2.c: Likewise.
>   * gcc.target/i386/pr100865-3.c: Likewise.
>   * gcc.target/i386/pr100865-4a.c: Likewise.
>   * gcc.target/i386/pr100865-4b.c: Likewise.
>   * gcc.target/i386/pr100865-5a.c: Likewise.
>   * gcc.target/i386/pr100865-5b.c: Likewise.
>   * gcc.target/i386/pr100865-6a.c: Likewise.
>   * gcc.target/i386/pr100865-6b.c: Likewise.
>   * gcc.target/i386/pr100865-6c.c: Likewise.
>   * gcc.target/i386/pr100865-7b.c: Likewise.
>   * gcc.target/i386/pr101796-1.c: Likewise.
>   * gcc.target/i386/pr101846-2.c: Likewise.
>   * gcc.target/i386/pr101989-broadcast-1.c: Likewise.
>   * gcc.target/i386/pr102021.c: Likewise.
>   * gcc.target/i386/pr90773-17.c: Likewise.
>   * gcc.target/i386/pr54855-3.c: Likewise.
>   * gcc.target/i386/pr54855-7.c: Likewise.
>   * gcc.target/i386/pr15184-1.c: Likewise.
>   * gcc.target/i386/pr15184-2.c: Likewise.
>   * gcc.target/i386/pr27971.c: Likewise.
>   * gcc.target/i386/pr70263-2.c: Likewise.
>   * gcc.target/i386/pr78035.c: Likewise.
>   * gcc.target/i386/pr81736-5.c: Likewise.
>   * gcc.target/i386/pr81736-7.c: Likewise.
>   * gcc.target/i386/pr85620-6.c: Likewise.
>   * gcc.target/i386/pr85667-6.c: Likewise.
>   * gcc.target/i386/pr93492-5.c: Likewise.
>   * gcc.target/i386/pr96539.c: Likewise.
>   PR target/81708 (%gs:my_guard)
>   * gcc.target/i386/stack-prot-sym.c: Likewise.
>   * g++.dg/init/static-cdtor1.C: Add alternate patterns for PIC.
>   * gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: Extend patterns
>   for PIC/PIE register allocation.
>   * gcc.target/i386/pr100704-3.c: Likewise.
>   * gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Likewise.
>   * gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: Likewise.
>   * gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Likewise.
>   * gcc.target/i386/avx512fp16-vmovsh-1a.c: Likewise.
>   * gcc.target/i386/interrupt-11.c: Likewise, allowing for
>   preservation of the PIC register.
>   * gcc.target/i386/interrupt-12.c: Likewise.
>   * gcc.target/i386/interrupt-13.c: Likewise.
>   * gcc.target/i386/interrupt-15.c: Likewise.
>   * gcc.target/i386/interrupt-16.c: Likewise.
>   * gcc.target/i386/interrupt-17.c: Likewise.
>   * gcc.target/i386/interrupt-8.c: Likewise.
>   * gcc.target/i386/cet-sjlj-6a.c: Combine patterns from
>   previous change.
>   * gcc.target/i386/cet-sjlj-6b.c: Likewise.
>   * gcc.target/i386/pad-10.c: Accept insns in get_pc_thunk.
>   * gcc.target/i386/pr70321.c: Likewise.
>   * gcc.target/i386/pr81563.c: Likewise.
>   * gcc.target/i386/pr84278.c: Likewise.
>   * gcc.target/i386/pr90773-2.c: Likewise, plus extra loads from
>   the GOT.
>   * gcc.target/i386/pr90773-3.c: Likewise.
>   * gcc.target/i386/pr94913-2.c: Accept additional PIC insns.
>   * gcc.target/i386/stack-check-17.c: Likewise.
>   * gcc.target/i386/stack-check-12.c: Do not require dummy stack
>   probing obviated with PIC.
>   * gcc.target/i386/pr95126-m32-1.c: Expect missed optimization
>   with PIC.
>   * gcc.target/i386/pr95126-m32-2.c: Likewise.
>   * gcc.target/i386/pr95852-2.c: Accept different optimization
>   with PIC.
>   * gcc.target/i386/pr95852-4.c: Likewise.




Re: Reply: [PATCH v5] LoongArch: add movable attribute

2022-08-09 Thread Xi Ruoyao via Gcc-patches
On Tue, 2022-08-09 at 21:03 +0800, Lulu Cheng wrote:
> 
> On 2022/8/9 7:30 PM, Xi Ruoyao wrote:
> > Sorry for late reply, I'm rebuilding my entire Linux system (from
> > scratch) for Glibc-2.36 and Binutils-2.39 update and I just reached the
> > mail client.
> > 
> > On Mon, 2022-08-08 at 12:53 +0800, Lulu Cheng wrote:
> > > I still think it makes a little bit more sense to put attribute(model)
> > > and -mcmodel together.
> > > 
> > > -mcmodel sets the access range of all symbols in a single file and
> > > attribute (model) sets the access range of a single symbol in a file,
> > > for example __attribute__((model(normal/large/extreme))).
> > It might make sense, but then it would not be what we want for per-CPU
> > symbols.  What we want here is "treat a local symbol as-if it's global",
> > while each code model may already treat local symbol and global symbol
> > differently.
> > 
> > Disambiguation: here "local" means "defined in this TU", "global"
> > otherwise (not "local variable" in C).
> > 
> > I'll send v6 with the name "addr_global" if no objection.
> > 
> I am implementing cmodel=extreme.
> In this mode, the relative offset is a signed 64-bit value,
> so this can solve the problem of accessing the kernel's per-CPU variables.
> So I wonder whether it is necessary to add another attribute like addr_global?

If we use the GOT, I only need to implement the PC_HI20 and PC_LO12 relocs in the
kernel module loader.  If we use extreme I'll need to implement 4 ABS
relocations along with them.

But "the less the better" is not a very strong reason anyway.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)

2022-08-09 Thread Richard Biener via Gcc-patches
On Mon, 8 Aug 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> So I've changed the approach from the RFC as suggested, moving the bitfield
> lowering to the if-convert pass.
> 
> So to reiterate, ifcvt will lower COMPONENT_REFs with DECL_BIT_FIELD fields
> to either BIT_FIELD_REF if they are reads or BIT_INSERT_EXPR if they are
> writes, using loads and writes of 'representatives' that are big enough to
> contain the bitfield value.
> 
> In vect_recog I added two patterns to replace these BIT_FIELD_REF and
> BIT_INSERT_EXPR with shifts and masks as appropriate.
> 
> I'd like to see if it was possible to remove the 'load' part of a
> BIT_INSERT_EXPR if the representative write didn't change any relevant bits. 
> For example:
> 
> struct s{
> int dont_care;
> char a : 3;
> };
> 
> s.a = ;
> 
> Should not require a load & write cycle, in fact it wouldn't even require any
> masking either. Though to achieve this we'd need to make sure the
> representative didn't overlap with any other field. Any suggestions on how to
> do this would be great, though I don't think we need to wait for that, as
> that's merely a nice-to-have optimization I guess?

Hmm.  I'm not sure the middle-end can simply ignore padding.  If
some language standard says that would be OK then I think we should
exploit this during lowering when the frontend is still around to
ask, which means sometime during early optimization.

> I am not sure where I should 'document' this change of behavior to ifcvt,
> and/or we should change the name of the pass, since it's doing more than
> if-conversion now?

It's preparation for vectorization anyway since it will emit
.MASK_LOAD/STORE and friends already.  So I don't think anything
needs to change there.
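As a rough illustration of the lowering Andre describes (reads become a BIT_FIELD_REF of a representative, writes a BIT_INSERT_EXPR, later expanded into shifts and masks), here is a C++ sketch on a 32-bit unsigned representative. It assumes the field occupies bits [pos, pos + width) of the representative and is unsigned. A signed field such as `char a : 3` would also need sign extension, and the real bit layout is target-dependent. Note that the write needs the old representative value, which is the load in the load-and-write cycle discussed above:

```cpp
#include <cassert>
#include <cstdint>

// Mask covering the `width` low bits; assumes 1 <= width <= 32.
inline std::uint32_t low_mask(unsigned width)
{
  return width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

// Read, in the spirit of BIT_FIELD_REF: shift the field down, then mask.
inline std::uint32_t extract_bits(std::uint32_t rep, unsigned pos, unsigned width)
{
  assert(width >= 1 && pos + width <= 32);
  return (rep >> pos) & low_mask(width);
}

// Write, in the spirit of BIT_INSERT_EXPR: clear the field in the old
// representative, then OR in the shifted and masked new value.
inline std::uint32_t insert_bits(std::uint32_t rep, unsigned pos, unsigned width,
                                 std::uint32_t value)
{
  assert(width >= 1 && pos + width <= 32);
  const std::uint32_t mask = low_mask(width) << pos;
  return (rep & ~mask) | ((value << pos) & mask);
}
```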


@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec critical_edges;

   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
 return false;

   body = get_loop_body (loop);

this doesn't appear in the ChangeLog nor is it clear to me why it's
needed?  Likewise

-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+{
+  /* Save BB->aux around loop_version as that uses the same field.  */
+  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+  saved_preds = XALLOCAVEC (void *, save_length);
+  for (unsigned i = 0; i < save_length; i++)
+   saved_preds[i] = ifc_bbs[i]->aux;
+}

is that just premature optimization?

+  /* BITSTART and BITEND describe the region we can safely load from inside the
+ structure.  BITPOS is the bit position of the value inside the
+ representative that we will end up loading OFFSET bytes from the start
+ of the struct.  BEST_MODE is the mode describing the optimal size of the
+ representative chunk we load.  If this is a write we will store the same
+ sized representative back, after we have changed the appropriate bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);

I think you need to give up when get_bit_range sets bitstart = bitend to zero

+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+TYPE_ALIGN (TREE_TYPE (struct_expr)),
+INT_MAX, false, &best_mode))

+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+ NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();

you shouldn't build a new FIELD_DECL.  Either you use
DECL_BIT_FIELD_REPRESENTATIVE directly or you use a
BIT_FIELD_REF accessing the "representative".
DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain
a variable field offset, you can also subset that with an
intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is
too large for your taste.

I'm not sure all the offset calculation you do is correct, but
since you shouldn't invent a new FIELD_DECL it probably needs
to change anyway ...

Note that for optimization it will be important that all
accesses to the bitfield members of the same bitfield use the
same underlying area (CSE and store-forwarding will thank you).

+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+  && !need_to_lower_bitfields)
 goto cleanup;

so we lower bitfields even when we cannot split critical edges?
why?

+  need_to_ifcvt
+= if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
 goto cleanup;

likewise - i

Re: [PATCH v2] c++: Extend -Wredundant-move for const-qual objects [PR90428]

2022-08-09 Thread Marek Polacek via Gcc-patches
On Mon, Aug 08, 2022 at 04:27:10PM -0400, Marek Polacek wrote:
> +  /* Also try to warn about redundant std::move in code such as
> +  T f (const T& t)
> +  {
> + return std::move(t);
> +  }
> +for which EXPR will be something like
> +  *std::move ((const struct T &) (const struct T *) t)
> + and where the std::move does nothing if T does not have a T(const T&&)
> + constructor, because the argument is const.  It will not use T(T&&)
> + because that would mean losing the const.  */
> +  else if (TREE_CODE (TREE_TYPE (arg)) == REFERENCE_TYPE

This is TYPE_REF_P so I'll fix that up.

> +&& CP_TYPE_CONST_P (TREE_TYPE (TREE_TYPE (arg

Marek
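To see why the std::move in Marek's example does nothing, consider a hypothetical type that counts which constructor is selected. `std::move (t)` on a `const T&` yields a `const T&&`, which cannot bind to `T(T&&)`, so overload resolution falls back to `T(const T&)`. This sketch assumes C++17 (for `inline` static data members and guaranteed elision of the returned prvalue):

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that records which constructor overload resolution picks.
struct T {
  static inline int copies = 0;
  static inline int moves = 0;
  T() = default;
  T(const T&) { ++copies; }
  T(T&&) noexcept { ++moves; }
};

// std::move(t) has type const T&&, which cannot bind to T(T&&),
// so T(const T&) is chosen: the std::move is redundant.
T f(const T& t) { return std::move(t); }
```

Writing `return t;` instead selects the same copy constructor, which is why the move can be called redundant.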



Re: [PATCH] analyzer: fix ICE caused by dup2 in sm-fd.cc[PR106551]

2022-08-09 Thread David Malcolm via Gcc-patches
On Tue, 2022-08-09 at 13:16 +0530, Immad Mir wrote:
> This patch fixes the ICE caused by valid_to_unchecked_state,
> at analyzer/sm-fd.cc by handling the m_start state in
> check_for_dup.
> 
> Tested lightly on x86_64.
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/106551
> * sm-fd.cc (check_for_dup): handle the m_start
> state when transitioning the state of LHS
> of dup, dup2 and dup3 call.
> 
> Signed-off-by: Immad Mir 
> ---
>  gcc/analyzer/sm-fd.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> index 8bb76d72b05..c8b9930a7b6 100644
> --- a/gcc/analyzer/sm-fd.cc
> +++ b/gcc/analyzer/sm-fd.cc
> @@ -983,7 +983,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
>  case DUP_1:
>    if (lhs)
> {
> - if (is_constant_fd_p (state_arg_1))
> + if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
>     sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
>   else
>     sm_ctxt->set_next_state (stmt, lhs,
> @@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
>    file descriptor i.e the first argument.  */
>    if (lhs)
> {
> - if (is_constant_fd_p (state_arg_1))
> + if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
>     sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
>   else
>     sm_ctxt->set_next_state (stmt, lhs,

Thanks.  The fix looks reasonable, but please can the patch also add a
reproducer to the test suite, covering each of the three dup/dup2/dup3
entrypoints - presumably the one from the bug can be used/adapted.

Dave
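The shape of this fix can be sketched with a heavily simplified, hypothetical state model in C++. The enumerators and function names below are illustrative only and are not the real sm-fd.cc API. The assumption is that, before the patch, any argument state other than a constant fell through to a helper that only handles "valid" states, so the start state (for example, an uninitialized `int` passed to dup) reached an unexpected case. A thrown exception stands in for the ICE:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified file-descriptor states (not GCC's real ones).
enum class FdState {
  Start, Constant,
  ValidRead, ValidWrite, ValidReadWrite,
  UncheckedRead, UncheckedWrite, UncheckedReadWrite
};

// Stand-in for a helper that is only defined for "valid" states.
FdState valid_to_unchecked(FdState s)
{
  switch (s) {
    case FdState::ValidRead:      return FdState::UncheckedRead;
    case FdState::ValidWrite:     return FdState::UncheckedWrite;
    case FdState::ValidReadWrite: return FdState::UncheckedReadWrite;
    default: throw std::logic_error("unexpected state");  // models the ICE
  }
}

// State assigned to the result of dup (); fixed == false models the
// pre-patch logic, fixed == true adds the "|| state == start" check.
FdState dup_result_state(FdState arg, bool fixed)
{
  if (arg == FdState::Constant || (fixed && arg == FdState::Start))
    return FdState::UncheckedReadWrite;
  return valid_to_unchecked(arg);
}
```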



[committed] docs: add notes on which functions -fanalyzer has hardcoded knowledge of

2022-08-09 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-2003-g16877cc2006ede.

gcc/ChangeLog:
* doc/invoke.texi (Static Analyzer Options): Add notes on which
functions the analyzer has hardcoded knowledge of.

Signed-off-by: David Malcolm 
---
 gcc/doc/invoke.texi | 81 +
 1 file changed, 81 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 92f7aaead74..a17c059d515 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10281,6 +10281,87 @@ See 
@uref{https://cwe.mitre.org/data/definitions/457.html, CWE-457: Use of Unini
 
 @end table
 
+The analyzer has hardcoded knowledge about the behavior of the following
+memory-management functions:
+
+@itemize @bullet
+@item @code{alloca}
+@item The built-in functions @code{__builtin_alloc},
+@code{__builtin_alloc_with_align}, @item @code{__builtin_calloc},
+@code{__builtin_free}, @code{__builtin_malloc}, @code{__builtin_memcpy},
+@code{__builtin_memcpy_chk}, @code{__builtin_memset},
+@code{__builtin_memset_chk}, @code{__builtin_realloc},
+@code{__builtin_stack_restore}, and @code{__builtin_stack_save}
+@item @code{calloc}
+@item @code{free}
+@item @code{malloc}
+@item @code{memset}
+@item @code{operator delete}
+@item @code{operator delete []}
+@item @code{operator new}
+@item @code{operator new []}
+@item @code{realloc}
+@item @code{strdup}
+@item @code{strndup}
+@end itemize
+
+of the following functions for working with file descriptors:
+
+@itemize @bullet
+@item @code{open}
+@item @code{close}
+@item @code{creat}
+@item @code{dup}, @code{dup2} and @code{dup3}
+@item @code{read}
+@item @code{write}
+@end itemize
+
+of the following functions for working with @code{} streams:
+@itemize @bullet
+@item The built-in functions @code{__builtin_fprintf},
+@code{__builtin_fprintf_unlocked}, @code{__builtin_fputc},
+@code{__builtin_fputc_unlocked}, @code{__builtin_fputs},
+@code{__builtin_fputs_unlocked}, @code{__builtin_fwrite},
+@code{__builtin_fwrite_unlocked}, @code{__builtin_printf},
+@code{__builtin_printf_unlocked}, @code{__builtin_putc},
+@code{__builtin_putchar}, @code{__builtin_putchar_unlocked},
+@code{__builtin_putc_unlocked}, @code{__builtin_puts},
+@code{__builtin_puts_unlocked}, @code{__builtin_vfprintf}, and
+@code{__builtin_vprintf}
+@item @code{fopen}
+@item @code{fclose}
+@item @code{fgets}
+@item @code{fgets_unlocked}
+@item @code{fread}
+@item @code{getchar}
+@item @code{fprintf}
+@item @code{printf}
+@item @code{fwrite}
+@end itemize
+
+and of the following functions:
+
+@itemize @bullet
+@item The built-in functions @code{__builtin_expect},
+@code{__builtin_expect_with_probability}, @code{__builtin_strchr},
+@code{__builtin_strcpy}, @code{__builtin_strcpy_chk},
+@code{__builtin_strlen}, @code{__builtin_va_copy}, and
+@code{__builtin_va_start}
+@item The GNU extensions @code{error} and @code{error_at_line}
+@item @code{getpass}
+@item @code{longjmp}
+@item @code{putenv}
+@item @code{setjmp}
+@item @code{siglongjmp}
+@item @code{signal}
+@item @code{sigsetjmp}
+@item @code{strchr}
+@item @code{strlen}
+@end itemize
+
+In addition, various functions with an @code{__analyzer_} prefix have
+special meaning to the analyzer, described in the GCC Internals manual.
+
 Pertinent parameters for controlling the exploration are:
 @option{--param analyzer-bb-explosion-factor=@var{value}},
 @option{--param analyzer-max-enodes-per-program-point=@var{value}},
-- 
2.26.3



Re: [PATCH] libgccjit.h: Make the macro definition for testing gcc_jit_context_new_bitcast correctly available.

2022-08-09 Thread David Malcolm via Gcc-patches
On Sat, 2022-07-30 at 19:18 +0530, Vibhav Pant wrote:
> I don't have push rights to the repo, so this would need to be
> applied manually.

I've gone ahead and pushed your fix to trunk (for GCC 13) as
r13-2004-g9385cd9c74cf66.

I plan to also push it to the gcc 12 branch shortly (for gcc 12.2).


Thanks again for the patch.
Dave
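The bug fixed here is easy to reproduce in miniature. A `#define` that sits inside a `/* ... */` comment is never seen by the preprocessor, so `#ifdef` tests against it always fail. The macro names below are hypothetical stand-ins for LIBGCCJIT_HAVE_gcc_jit_context_new_bitcast:

```cpp
#include <cassert>

/* Documentation comment that (wrongly) contains the feature macro:
   #define DEMO_HAVE_new_bitcast_in_comment
*/

// The fixed layout: the #define lives outside any comment.
#define DEMO_HAVE_new_bitcast_outside

#ifdef DEMO_HAVE_new_bitcast_in_comment
constexpr bool in_comment_visible = true;
#else
constexpr bool in_comment_visible = false;  // taken: the comment hid the #define
#endif

#ifdef DEMO_HAVE_new_bitcast_outside
constexpr bool outside_visible = true;      // taken
#else
constexpr bool outside_visible = false;
#endif
```

With the definition moved out of the comment, client code can guard its calls with `#ifdef LIBGCCJIT_HAVE_gcc_jit_context_new_bitcast`.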


> 
> 
> Thanks,
> Vibhav
> 
> On Tue, Jul 26, 2022 at 4:48 AM David Malcolm 
> wrote:
> > 
> > On Sat, 2022-07-23 at 13:31 +0530, Vibhav Pant via Jit wrote:
> > > The macro definition for LIBGCCJIT_HAVE_gcc_jit_context_new_bitcast
> > > was earlier located in the documentation comment for
> > > gcc_jit_context_new_bitcast, making it unavailable to code that
> > > consumed libgccjit.h. This patch moves the definition out of the
> > > comment, making it effective.
> > 
> > Good catch!
> > 
> > Do you have push rights to the git repo, or should I push this?
> > 
> > Thanks
> > Dave
> > 
> 
> 




[PATCH] analyzer: fix ICE caused by dup2 in sm-fd.cc[PR106551]

2022-08-09 Thread Immad Mir via Gcc-patches
This patch fixes the ICE caused by valid_to_unchecked_state,
at analyzer/sm-fd.cc by handling the m_start state in
check_for_dup.

Tested lightly on x86_64.

gcc/analyzer/ChangeLog:
PR analyzer/106551
* sm-fd.cc (check_for_dup): handle the m_start
state when transitioning the state of LHS
of dup, dup2 and dup3 call.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fd-dup-1.c: New testcases.

Signed-off-by: Immad Mir 
---
 gcc/analyzer/sm-fd.cc|  4 ++--
 gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 28 +++-
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index 8bb76d72b05..c8b9930a7b6 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -983,7 +983,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
 case DUP_1:
   if (lhs)
{
- if (is_constant_fd_p (state_arg_1))
+ if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
  else
sm_ctxt->set_next_state (stmt, lhs,
@@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
   file descriptor i.e the first argument.  */
   if (lhs)
{
- if (is_constant_fd_p (state_arg_1))
+ if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
  else
sm_ctxt->set_next_state (stmt, lhs,
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
index eba2570568f..ed4d6de57db 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
@@ -220,4 +220,30 @@ test_19 (const char *path, void *buf)
 close (fd);
 }
 
-}
\ No newline at end of file
+}
+
+void
+test_20 ()
+{
+int m;
+int fd = dup (m); /* { dg-warning "'dup' on possibly invalid file descriptor 'm'" } */
+close (fd);
+}
+
+void
+test_21 ()
+{
+int m;
+int fd = dup2 (m, 1); /* { dg-warning "'dup2' on possibly invalid file descriptor 'm'" } */
+close (fd);
+}
+
+void
+test_22 (int flags)
+{
+int m;
+int fd = dup3 (m, 1, flags); /* { dg-warning "'dup3' on possibly invalid file descriptor 'm'" } */
+close (fd);
+}
+
+
-- 
2.25.1



Re: [PATCH] analyzer: fix ICE caused by dup2 in sm-fd.cc[PR106551]

2022-08-09 Thread Mir Immad via Gcc-patches
Thanks. I've added a few testcases that use uninitialized ints in dup, dup2,
and dup3.

Immad.



[PATCH] c++: Implement -Wself-move warning [PR81159]

2022-08-09 Thread Marek Polacek via Gcc-patches
About 5 years ago we got a request to implement -Wself-move, which
warns about useless moves like this:

  int x;
  x = std::move (x);

This patch implements that warning.
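maybe_warn_self_move (in the patch below) matches the right-hand side against `*std::move ((T &) &arg)`, or `*std::move ((T &) (T *) r)` when the argument is a reference, and then compares `arg` with the left-hand side. Here is a toy C++ version of that matching on a hypothetical mini expression tree. It does not use GCC's tree API, and comparing variable names is a simplified stand-in for cp_tree_equal:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy expression tree; GCC's real trees are far richer.
enum class Kind { Var, Deref, Call, Cast, AddrOf };

struct Expr {
  Kind kind;
  std::string name;            // variable name (Var) or callee name (Call)
  std::shared_ptr<Expr> op;    // the single operand or argument, if any
};
using ExprP = std::shared_ptr<Expr>;

ExprP var(std::string n)
{
  return std::make_shared<Expr>(Expr{Kind::Var, std::move(n), nullptr});
}

ExprP node(Kind k, ExprP op, std::string n = "")
{
  return std::make_shared<Expr>(Expr{k, std::move(n), std::move(op)});
}

// Recognize *std::move ((T &) &arg) or *std::move ((T &) (T *) ref) and
// report whether the moved-from variable is the assignment target.
bool is_self_move(const ExprP& lhs, const ExprP& rhs)
{
  if (!rhs || rhs->kind != Kind::Deref)
    return false;
  const ExprP& call = rhs->op;
  if (!call || call->kind != Kind::Call || call->name != "std::move")
    return false;
  const ExprP& outer = call->op;              // the (T &) conversion
  if (!outer || outer->kind != Kind::Cast)
    return false;
  const ExprP& inner = outer->op;             // the & or the (T *)
  if (!inner || (inner->kind != Kind::AddrOf && inner->kind != Kind::Cast))
    return false;
  const ExprP& arg = inner->op;
  return arg && lhs && arg->kind == Kind::Var && lhs->kind == Kind::Var
         && arg->name == lhs->name;
}
```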

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/81159

gcc/c-family/ChangeLog:

* c.opt (Wself-move): New option.

gcc/cp/ChangeLog:

* typeck.cc (maybe_warn_self_move): New.
(cp_build_modify_expr): Call maybe_warn_self_move.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wself-move.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wself-move1.C: New test.
---
 gcc/c-family/c.opt  |  4 ++
 gcc/cp/typeck.cc| 48 +-
 gcc/doc/invoke.texi | 23 ++-
 gcc/testsuite/g++.dg/warn/Wself-move1.C | 87 +
 4 files changed, 160 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wself-move1.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 44e1a60ce24..a098ae1830d 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1229,6 +1229,10 @@ Wselector
 ObjC ObjC++ Var(warn_selector) Warning
 Warn if a selector has multiple methods.
 
+Wself-move
+C++ ObjC++ Var(warn_self_move) Warning LangEnabledBy(C++ ObjC++, Wall)
+Warn when a value is moved to itself with std::move.
+
 Wsequence-point
 C ObjC C++ ObjC++ Var(warn_sequence_point) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn about possible violations of sequence point rules.
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 6e4f23af982..f05913c0fac 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8897,7 +8897,51 @@ cp_build_c_cast (location_t loc, tree type, tree expr,
 
   return error_mark_node;
 }
-
+
+/* Warn when a value is moved to itself with std::move.  LHS is the target,
+   RHS may be the std::move call, and LOC is the location of the whole
+   assignment.  */
+
+static void
+maybe_warn_self_move (location_t loc, tree lhs, tree rhs)
+{
+  if (!warn_self_move)
+return;
+
+  /* C++98 doesn't know move.  */
+  if (cxx_dialect < cxx11)
+return;
+
+  if (processing_template_decl)
+return;
+
+  /* We're looking for *std::move ((T &) &arg), or
+ *std::move ((T &) (T *) r) if the argument is a reference.  */
+  if (!REFERENCE_REF_P (rhs)
+  || TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR)
+return;
+  tree fn = TREE_OPERAND (rhs, 0);
+  if (!is_std_move_p (fn))
+return;
+  tree arg = CALL_EXPR_ARG (fn, 0);
+  if (TREE_CODE (arg) != NOP_EXPR)
+return;
+  /* Strip the (T &).  */
+  arg = TREE_OPERAND (arg, 0);
+  /* Strip the (T *) or &.  */
+  arg = TREE_OPERAND (arg, 0);
+  arg = convert_from_reference (arg);
+  /* So that we catch (i) = std::move (i);.  */
+  lhs = maybe_undo_parenthesized_ref (lhs);
+  STRIP_ANY_LOCATION_WRAPPER (lhs);
+  if (cp_tree_equal (lhs, arg))
+{
+  auto_diagnostic_group d;
+  if (warning_at (loc, OPT_Wself_move, "moving a variable to itself"))
+   inform (loc, "remove % call");
+}
+}
+
 /* For use from the C common bits.  */
 tree
 build_modify_expr (location_t location,
@@ -9101,6 +9145,8 @@ cp_build_modify_expr (location_t loc, tree lhs, enum tree_code modifycode,
 
   if (modifycode == NOP_EXPR)
{
+ maybe_warn_self_move (loc, lhs, rhs);
+
  if (c_dialect_objc ())
{
  result = objc_maybe_build_modify_expr (lhs, rhs);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f3e9429b2ca..28cf36b94c6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -264,7 +264,7 @@ in the following sections.
 -Wreorder  -Wregister @gol
 -Wstrict-null-sentinel  -Wno-subobject-linkage  -Wtemplates @gol
 -Wno-non-template-friend  -Wold-style-cast @gol
--Woverloaded-virtual  -Wno-pmf-conversions -Wsign-promo @gol
+-Woverloaded-virtual  -Wno-pmf-conversions -Wself-move -Wsign-promo @gol
 -Wsized-deallocation  -Wsuggest-final-methods @gol
 -Wsuggest-final-types  -Wsuggest-override  @gol
 -Wno-terminate  -Wuseless-cast  -Wno-vexing-parse  @gol
@@ -5841,6 +5841,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}.
 -Wreorder   @gol
 -Wrestrict   @gol
 -Wreturn-type  @gol
+-Wself-move @r{(only for C++)}  @gol
 -Wsequence-point  @gol
 -Wsign-compare @r{(only in C++)}  @gol
 -Wsizeof-array-div @gol
@@ -6826,6 +6827,26 @@ of a declaration:
 
 This warning is enabled by @option{-Wall}.
 
+@item -Wno-self-move @r{(C++ and Objective-C++ only)}
+@opindex Wself-move
+@opindex Wno-self-move
+Warn when a value is moved to itself with @code{std::move}.
+Such a @code{std::move} has no effect.
+
+@smallexample
+struct T @{
+@dots{}
+@};
+void fn()
+@{
+  T t;
+  @dots{}
+  t = std::move (t);
+@}
+@end smallexample
+
+This warning is enabled by @option{-Wall}.
+
 @item -Wsequence-point
 @opindex Wsequence-point
 @opindex Wno-sequence-point
diff --git a/gcc/testsuite/g++.dg/warn/Wself-move1.C b/gcc/testsuite/g++.dg/warn/Wself-move1.C
new file 

Re: [PATCH] i386 testsuite: cope with --enable-default-pie

2022-08-09 Thread Fangrui Song via Gcc-patches
On Tue, Aug 9, 2022 at 7:00 AM Alexandre Oliva via Gcc-patches
 wrote:
>
> Ping?
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598276.html

This is great! And I hope
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103398 can be
reconsidered, at least for some ports :)

> On Jul 27, 2022, Alexandre Oliva  wrote:
>
> > for  gcc/testsuite/ChangeLog
>
> >   * g++.dg/abi/anon1.C: Disable pie on ia32.
> >   * g++.dg/abi/anon4.C: Likewise.
> >   * g++.dg/cpp0x/initlist-const1.C: Likewise.
> >   * g++.dg/no-stack-protector-attr-3.C: Likewise.
> >   * g++.dg/stackprotectexplicit2.C: Likewise.
> >   * g++.dg/pr71694.C: Likewise.
> >   * gcc.dg/pr102892-1.c: Likewise.
> >   * gcc.dg/sibcall-11.c: Likewise.
> >   * gcc.dg/torture/builtin-self.c: Likewise.
> >   * gcc.target/i386/avx2-dest-false-dep-for-glc.c: Likewise.
> >   * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Likewise.
> >   * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
> >   * gcc.target/i386/avx512f-broadcast-pr87767-3.c: Likewise.
> >   * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
> >   * gcc.target/i386/avx512f-broadcast-pr87767-7.c: Likewise.
> >   * gcc.target/i386/avx512fp16-broadcast-1.c: Likewise.
> >   * gcc.target/i386/avx512fp16-pr101846.c: Likewise.
> >   * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
> >   * gcc.target/i386/avx512vl-broadcast-pr87767-3.c: Likewise.
> >   * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
> >   * gcc.target/i386/pr100865-2.c: Likewise.
> >   * gcc.target/i386/pr100865-3.c: Likewise.
> >   * gcc.target/i386/pr100865-4a.c: Likewise.
> >   * gcc.target/i386/pr100865-4b.c: Likewise.
> >   * gcc.target/i386/pr100865-5a.c: Likewise.
> >   * gcc.target/i386/pr100865-5b.c: Likewise.
> >   * gcc.target/i386/pr100865-6a.c: Likewise.
> >   * gcc.target/i386/pr100865-6b.c: Likewise.
> >   * gcc.target/i386/pr100865-6c.c: Likewise.
> >   * gcc.target/i386/pr100865-7b.c: Likewise.
> >   * gcc.target/i386/pr101796-1.c: Likewise.
> >   * gcc.target/i386/pr101846-2.c: Likewise.
> >   * gcc.target/i386/pr101989-broadcast-1.c: Likewise.
> >   * gcc.target/i386/pr102021.c: Likewise.
> >   * gcc.target/i386/pr90773-17.c: Likewise.
> >   * gcc.target/i386/pr54855-3.c: Likewise.
> >   * gcc.target/i386/pr54855-7.c: Likewise.
> >   * gcc.target/i386/pr15184-1.c: Likewise.
> >   * gcc.target/i386/pr15184-2.c: Likewise.
> >   * gcc.target/i386/pr27971.c: Likewise.
> >   * gcc.target/i386/pr70263-2.c: Likewise.
> >   * gcc.target/i386/pr78035.c: Likewise.
> >   * gcc.target/i386/pr81736-5.c: Likewise.
> >   * gcc.target/i386/pr81736-7.c: Likewise.
> >   * gcc.target/i386/pr85620-6.c: Likewise.
> >   * gcc.target/i386/pr85667-6.c: Likewise.
> >   * gcc.target/i386/pr93492-5.c: Likewise.
> >   * gcc.target/i386/pr96539.c: Likewise.
> >   PR target/81708 (%gs:my_guard)
> >   * gcc.target/i386/stack-prot-sym.c: Likewise.
> >   * g++.dg/init/static-cdtor1.C: Add alternate patterns for PIC.
> >   * gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: Extend patterns
> >   for PIC/PIE register allocation.
> >   * gcc.target/i386/pr100704-3.c: Likewise.
> >   * gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Likewise.
> >   * gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: Likewise.
> >   * gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Likewise.
> >   * gcc.target/i386/avx512fp16-vmovsh-1a.c: Likewise.
> >   * gcc.target/i386/interrupt-11.c: Likewise, allowing for
> >   preservation of the PIC register.
> >   * gcc.target/i386/interrupt-12.c: Likewise.
> >   * gcc.target/i386/interrupt-13.c: Likewise.
> >   * gcc.target/i386/interrupt-15.c: Likewise.
> >   * gcc.target/i386/interrupt-16.c: Likewise.
> >   * gcc.target/i386/interrupt-17.c: Likewise.
> >   * gcc.target/i386/interrupt-8.c: Likewise.
> >   * gcc.target/i386/cet-sjlj-6a.c: Combine patterns from
> >   previous change.
> >   * gcc.target/i386/cet-sjlj-6b.c: Likewise.
> >   * gcc.target/i386/pad-10.c: Accept insns in get_pc_thunk.
> >   * gcc.target/i386/pr70321.c: Likewise.
> >   * gcc.target/i386/pr81563.c: Likewise.
> >   * gcc.target/i386/pr84278.c: Likewise.
> >   * gcc.target/i386/pr90773-2.c: Likewise, plus extra loads from
> >   the GOT.
> >   * gcc.target/i386/pr90773-3.c: Likewise.
> >   * gcc.target/i386/pr94913-2.c: Accept additional PIC insns.
> >   * gcc.target/i386/stack-check-17.c: Likewise.
> >   * gcc.target/i386/stack-check-12.c: Do not require dummy stack
> >   probing obviated with PIC.
> >   * gcc.target/i386/pr95126-m32-1.c: Expect missed optimization
> >   with PIC.
> >   * gcc.target/i386/pr95126-m32-2.c: Likewise.
> >   * gcc.target/i386/pr95852-2.c: Accept different optimization
> >   with PIC

Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading

2022-08-09 Thread Andrew MacLeod via Gcc-patches



On 8/9/22 09:01, Richard Biener wrote:

This revisits how we compute imports later used for the ranger path
query during backwards threading.  The compute_imports function
of the path solver ends up pulling the SSA def chain of regular
stmts without limit and since it starts with just the gori imports
of the path exit it misses some interesting names to translate
during path discovery.  In fact, with a still-empty path, this
compute_imports function does not look like the correct tool.


I don't really know how this works in practice.  Aldy's off this week, so
he can comment when he returns.


The original premise was along the lines of recognizing that only changes
to a GORI import name to a block can affect the branch at the end of the
block.  I.e., if the path doesn't change any import to block A, then the
branch at the end of block A will not change either.  Likewise, if it
does change an import, then we look at whether the branch can be
threaded.  Beyond that basic premise, I don't know what all it does.


I presume the unbounded def chain is for local defs within a block that
in turn feed the import to another block.  I'm not sure why we need to
do much with those; again, it's only the import to the def chain that
can affect the outcome at the end of the chain, and if it changes, then
you need to recalculate the entire chain, but that would be part of the
normal path walk.  I suspect there is also some pruning that can be done
there, as GORI reflects "can affect the range", not "will affect the range".


Perhaps what's going on is that all those local elements are being added
up front to the list of interesting names?  That would certainly blow up
the bitmaps and loops and such.


I'm sure Aldy will pitch in when he returns from vacation.





The following instead implements what it does during the path discovery
and since we add the exit block we seed the initial imports and
interesting names from just the exit conditional.  When we then
process interesting names (aka imports we did not yet see the definition
of) we prune local defs but add their uses in a similar way as
compute_imports would have done.

The patch also properly unwinds m_imports during path discovery
backtracking, and in a debugging session I have verified that the two
sets now evolve as expected, whereas previously they behaved slightly erratically.
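The unwinding described above is the usual add-on-entry, undo-on-backtrack pattern. The following C++ sketch is a toy model under stated assumptions: toy_block, walk_back and the string-named imports are hypothetical and do not reproduce back_threader::find_paths_to_names. It only shows why each step must remove exactly the names it added, so that the set reflects the current path rather than everything visited so far.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy CFG node: the SSA names its exit condition depends on ("imports")
// and the blocks that flow into it.  Hypothetical, not GCC's data structures.
struct toy_block
{
  std::vector<std::string> imports;
  std::vector<int> preds;
};

// Walk backwards from BB, at most DEPTH predecessors deep.  Names are added
// to IMPORTS on the way in, and exactly the names this step added are
// removed again on the way out, so IMPORTS always describes the current
// path only.  MAX_SEEN records the largest set observed during the walk.
void
walk_back (const std::vector<toy_block> &cfg, int bb, int depth,
           std::set<std::string> &imports, std::size_t &max_seen)
{
  std::vector<std::string> added;
  for (const std::string &name : cfg[bb].imports)
    if (imports.insert (name).second)
      added.push_back (name);  // only names that are new to this path

  if (imports.size () > max_seen)
    max_seen = imports.size ();

  if (depth > 0)
    for (int pred : cfg[bb].preds)
      walk_back (cfg, pred, depth - 1, imports, max_seen);

  // Backtrack: undo this block's contribution before returning.
  for (const std::string &name : added)
    {
      assert (imports.count (name) == 1);
      imports.erase (name);
    }
}
```

With a block importing x whose two predecessors import i and {j, x} respectively, the peak set size is two and the set is empty again after the walk; without the unwinding loop the second predecessor would see {x, i, j}.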

Fortunately, the m_imports set has also shrunk significantly for
the PR69592 testcase (aka PR106514), so there is an overall speedup
when increasing --param max-jump-thread-duplication-stmts as
15 -> 30 -> 60 -> 120: times go from 1s -> 2s -> 13s -> 27s without
the patch to 1s -> 2s -> 4s -> 8s with it.

This runs into a latent issue in X which doesn't seem to expect
any PHI nodes with a constant argument on an edge inside the path.
But we now have those as interesting, for example for the ICEing
g++.dg/torture/pr100925.C which just has something like

   if (i)
     x = 1;
   else
     x = 5;
   if (x == 1)
     ...

where we now have the path from if (i) to if (x) and the PHI for x
in the set of imports to consider for resolving x == 1 which IMHO
looks exactly like what we want.  The path_range_query::ssa_range_in_phi
papers over the issue and drops the range to varying instead of
crashing.  I didn't want to mess with this any further in this patch
(but I couldn't resist replacing the loop over PHI args with
PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).

Bootstrapped and tested on x86_64-unknown-linux-gnu w/o the
path_range_query::ssa_range_in_phi fix, now re-running with.

OK?

Thanks,
Richard.

PR tree-optimization/106514
* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
Compute and unwind both m_imports and interesting on the fly during
path discovery.
(back_threader::find_paths): Compute the original m_imports
from just the SSA uses of the exit conditional.  Drop
handling single_succ_to_potentially_threadable_block.
* gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.
---
  gcc/gimple-range-path.cc   |  52 -
  gcc/tree-ssa-threadbackward.cc | 104 ++---
  2 files changed, 106 insertions(+), 50 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 43e7526b6fc..b4376011ea8 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -276,8 +276,6 @@ void
  path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
  {
tree name = gimple_phi_result (phi);
-  basic_block bb = gimple_bb (phi);
-  unsigned nargs = gimple_phi_num_args (phi);
  
if (at_entry ())

  {
@@ -287,6 +285,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
// Try to fold the phi exclusively with global or cached values.
// This will get things like PHI <5(99), 6(88)>.  We do this by
// calling range_of_expr with no context.
+   

Re: [PATCH] analyzer: fix ICE caused by dup2 in sm-fd.cc[PR106551]

2022-08-09 Thread David Malcolm via Gcc-patches
On Tue, 2022-08-09 at 21:42 +0530, Immad Mir wrote:
> This patch fixes the ICE caused by valid_to_unchecked_state,
> at analyzer/sm-fd.cc by handling the m_start state in
> check_for_dup.
> 
> Tested lightly on x86_64.
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/106551
> * sm-fd.cc (check_for_dup): handle the m_start
> state when transitioning the state of LHS
> of dup, dup2 and dup3 call.
> 
> gcc/testsuite/ChangeLog:
> * gcc.dg/analyzer/fd-dup-1.c: New testcases.
> 
> Signed-off-by: Immad Mir 
> ---
>  gcc/analyzer/sm-fd.cc    |  4 ++--
>  gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 28 +++-
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> index 8bb76d72b05..c8b9930a7b6 100644
> --- a/gcc/analyzer/sm-fd.cc
> +++ b/gcc/analyzer/sm-fd.cc
> @@ -983,7 +983,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
>  case DUP_1:
>    if (lhs)
> {
> - if (is_constant_fd_p (state_arg_1))
> + if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
>     sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
>   else
>     sm_ctxt->set_next_state (stmt, lhs,
> @@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const supernode *node,
>    file descriptor i.e the first argument.  */
>    if (lhs)
> {
> - if (is_constant_fd_p (state_arg_1))
> + if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
>     sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
>   else
>     sm_ctxt->set_next_state (stmt, lhs,
> diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> index eba2570568f..ed4d6de57db 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> @@ -220,4 +220,30 @@ test_19 (const char *path, void *buf)
>  close (fd);
>  }
>  
> -}
> \ No newline at end of file
> +}
> +
> +void
> +test_20 ()
> +{
> +    int m;
> +    int fd = dup (m); /* { dg-warning "'dup' on possibly invalid file descriptor 'm'" } */
> +    close (fd);
> +}
> +
> +void
> +test_21 ()
> +{
> +    int m;
> +    int fd = dup2 (m, 1); /* { dg-warning "'dup2' on possibly invalid file descriptor 'm'" } */
> +    close (fd);
> +}
> +
> +void
> +test_22 (int flags)
> +{
> +    int m;
> +    int fd = dup3 (m, 1, flags); /* { dg-warning "'dup3' on possibly invalid file descriptor 'm'" } */
> +    close (fd);
> +}

Thanks for the updated patch.

The test cases looked suspicious to me - I was wondering why the
analyzer doesn't complain about the uninitialized values being passed
to the various dup functions as parameters.  So your test cases seem to
have uncovered a hidden pre-existing bug in the analyzer's
uninitialized value detection, which I've filed for myself to deal with
as PR analyzer/106573.

If you convert the "int m;" locals into an extern global, like in
comment #0 of bug 106551, does that still trigger the crash on the
unpatched sm-fd.cc?  If so, then that's greatly preferable as a
regression test, since otherwise I'll have to modify that test case
when I fix bug 106573.

Dave





Re: [PATCH] rs6000: Rework ELFv2 support for -fpatchable-function-entry* [PR99888]

2022-08-09 Thread Segher Boessenkool
Hi!

On Tue, Aug 09, 2022 at 08:51:59PM +0800, Kewen.Lin wrote:
> on 2022/8/9 18:35, Segher Boessenkool wrote:
> >> +/* As ELFv2 ABI shows, the allowable bytes past the global entry
> >> +   point are 0, 4, 8, 16, 32 and 64.  Considering there are two
> >> +   non-prefixed instructions for global entry (8 bytes), the count
> >> +   for patchable NOPs before local entry would be 2, 6 and 14.  */
> > 
> > The other option is to allow other numbers of nops, but in that case not
> > have a local entry point (so, always use the global entry point).
> 
> Good point, it's doable, but it means for the other counts of NOPs, the
> patched function has to pay the cost of TOC initialization all the time,
> IMHO it may not be what we want.

It isn't very expensive: the main benefit of the LEP is not avoiding
those two insns, but having the r2 setter earlier, allowing loads
via the TOC reg to execute earlier.

> > I don't know if that is useful for any users of this support (if there
> > even are such users :-P )
> 
> Yeah, as the discussions in PR98125, powerpc linux kernel doesn't adopt
> this feature.  :-P

Right, -mprofile-kernel is more efficient.

So maybe just say in the comment that it is possible to support those
other nop pad sizes, by not doing a LEP at all?  Instead of saying it
cannot be done :-)
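The allowed counts follow from simple arithmetic on the comment quoted above. A small C++ sketch, assuming (as that comment states) 4-byte instructions and an 8-byte global-entry TOC setup; the helper names are hypothetical and not taken from the rs6000 backend:

```cpp
#include <cassert>

// Assumptions taken from the patch comment: the ELFv2 ABI lets the local
// entry point sit 0, 4, 8, 16, 32 or 64 bytes past the global entry point,
// and the global-entry TOC setup uses two non-prefixed 4-byte instructions.
constexpr int insn_bytes = 4;
constexpr int toc_setup_bytes = 2 * insn_bytes;

// Number of NOPs that fit between the TOC setup and the local entry point.
constexpr int
nops_before_local_entry (int lep_offset)
{
  return (lep_offset - toc_setup_bytes) / insn_bytes;
}

static_assert (nops_before_local_entry (16) == 2, "16-byte offset");
static_assert (nops_before_local_entry (32) == 6, "32-byte offset");
static_assert (nops_before_local_entry (64) == 14, "64-byte offset");

// The same set of counts the patch accepts for NOPs before the entry.
bool
supported_nops_before_entry (unsigned count)
{
  return count == 2 || count == 6 || count == 14;
}
```

Offsets of 8 bytes or less leave no room for NOPs after the TOC setup, which is why only 2, 6 and 14 appear. Under the alternative described above, a count outside this set would mean emitting no local entry point rather than an error.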

> 
> > 
> >> +if (patch_area_entry > 0)
> >> +  {
> >> +if (patch_area_entry != 2
> >> +&& patch_area_entry != 6
> >> +&& patch_area_entry != 14)
> >> +  error ("for %<-fpatchable-function-entry=%u,%u%>, patching "
> >> + "%u NOP(s) before function entry is invalid, it can "
> >> + "cause assembler error",
> > 
> > I would not say "it can [etc.]" at all.  Oh, and "NOP" (capitals) isn't
> > a thing, it is not an acronym or such ;-)
> > 
> 
> Poor at wording.  :(  Could you help to suggest some words here? 

I'll try...

"unsupported number of nops before function entry (%u)"

> >> +/* { dg-require-effective-target powerpc_elfv2 } */
> >> +/* Specify -mcpu=power9 to ensure global entry is needed.  */
> >> +/* { dg-options "-mdejagnu-cpu=power9" } */
> > 
> > Why would it be needed for p9, and not older, or newer?
> > 
> 
> It can be p8 or p9, but not p10 and later.  
> 
> It's meant to exclude pc-relative feature which can make the case not
> generate a global entry point prologue and the test point will become
> unavailable.  I thought about adding -mno-pcrel, but guessed it's safer
> to use one cpu type which doesn't support pcrel at all, since it can
> exclude all possibilities that pcrel gets re-enabled.
> 
> Do you think -mno-pcrel is more elegant and relatively safe?
> Or just update the comments to make it more meaningful?

Just use { ! powerpc_pcrel } ?  I don't think you can put that in a
dg-require-effective-target, but you can do for example
  dg-do compile { target { ! powerpc_pcrel } }
or similar.

Direct things are always much preferred.  There should always be a
comment saying what a non-obvious restriction is for, and then it will be
simple and boring (the code already says that pcrel is not okay; just
add a word or two, "no TOC etc. with pcrel" or whatever :-) )


Segher


[PATCH 1/2] analyzer: consider that realloc could shrink the buffer [PR106539]

2022-08-09 Thread Tim Lange
This patch adds the "shrinks buffer" case to the success_with_move
modelling of realloc.

2022-08-09  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106539
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Add get_copied_size function and pass the result as the size of the
new sized_region.

---
 gcc/analyzer/region-model-impl-calls.cc | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 8c38e9206fa..50a19a52a21 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -737,9 +737,11 @@ region_model::impl_call_realloc (const call_details &cd)
  old_size_sval);
  const svalue *buffer_content_sval
= model->get_store_value (sized_old_reg, cd.get_ctxt ());
+ const svalue *copied_size_sval
+   = get_copied_size (old_size_sval, new_size_sval);
  const region *sized_new_reg
= model->m_mgr->get_sized_region (new_reg, NULL,
- old_size_sval);
+ copied_size_sval);
  model->set_value (sized_new_reg, buffer_content_sval,
cd.get_ctxt ());
}
@@ -774,6 +776,39 @@ region_model::impl_call_realloc (const call_details &cd)
   else
return true;
 }
+
+  private:
+/* Return the size svalue for the new region allocated by realloc.  */
+const svalue *get_copied_size (const svalue *old_size_sval,
+  const svalue *new_size_sval) const
+{
+  tree old_size_cst = old_size_sval->maybe_get_constant ();
+  tree new_size_cst = new_size_sval->maybe_get_constant ();
+
+  if (old_size_cst && new_size_cst)
+   {
+ /* Both are constants and comparable.  */
+ tree cmp = fold_binary (LT_EXPR, boolean_type_node,
+ old_size_cst, new_size_cst);
+
+ if (cmp == boolean_true_node)
+   return old_size_sval;
+ else
+   return new_size_sval;
+   }
+  else if (new_size_cst)
+   {
+ /* OLD_SIZE_SVAL is symbolic, so return that.  */
+ return old_size_sval;
+   }
+  else
+   {
+ /* NEW_SIZE_SVAL is symbolic or both are symbolic.
+Return NEW_SIZE_SVAL, because implementations of realloc
+probably only moves the buffer if the new size is larger.  */
+ return new_size_sval;
+   }
+}
   };
 
   /* Body of region_model::impl_call_realloc.  */
-- 
2.37.1



[PATCH 2/2] analyzer: out-of-bounds checker [PR106000]

2022-08-09 Thread Tim Lange
This patch adds an experimental out-of-bounds checker to the analyzer.

The checker was tested on coreutils, curl, httpd and openssh. It is mostly
accurate but does produce false-positives on yacc-generated files and
sometimes when the analyzer misses an invariant. These cases will be
documented in bugzilla.
(Regtests are still running with the latest changes; I will report back later.)

2022-08-09  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106000
* analyzer.opt: Add Wanalyzer-out-of-bounds.
* region-model.cc (class out_of_bounds): Diagnostics base class
for all out-of-bounds diagnostics.
(class past_the_end): Base class derived from out_of_bounds for
the buffer_overflow and buffer_overread diagnostics.
(class buffer_overflow): Buffer overflow diagnostics.
(class buffer_overread): Buffer overread diagnostics.
(class buffer_underflow): Buffer underflow diagnostics.
(class buffer_underread): Buffer underread diagnostics.
(region_model::check_region_bounds): New function to check region
bounds for out-of-bounds accesses.
(region_model::check_region_access):
Add call to check_region_bounds.
(region_model::get_representative_tree): New function that accepts
a region instead of an svalue.
* region-model.h (class region_model):
Add region_model::check_region_bounds.
* region.cc (region::symbolic_p): New predicate. 
(offset_region::get_byte_size_sval): Only return the remaining
byte size on offset_regions.
* region.h: Add region::symbolic_p.
* store.cc (byte_range::intersects_p):
Add new function equivalent to bit_range::intersects_p.
(byte_range::exceeds_p): New function.
(byte_range::falls_short_of_p): New function.
* store.h (struct byte_range): Add byte_range::intersects_p,
byte_range::exceeds_p and byte_range::falls_short_of_p.

gcc/ChangeLog:

PR analyzer/106000
* doc/invoke.texi: Add Wanalyzer-out-of-bounds.

gcc/testsuite/ChangeLog:

PR analyzer/106000
* gcc.dg/analyzer/allocation-size-3.c:
Disable out-of-bounds warning.
* gcc.dg/analyzer/memcpy-2.c: Disable out-of-bounds warning.
* gcc.dg/analyzer/pr101962.c: Add dg-warning.
* gcc.dg/analyzer/pr97029.c:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/test-setjmp.h:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/zlib-3.c: Add dg-bogus.
* gcc.dg/analyzer/out-of-bounds-1.c: New test.
* gcc.dg/analyzer/out-of-bounds-2.c: New test.
* gcc.dg/analyzer/out-of-bounds-3.c: New test.
* gcc.dg/analyzer/out-of-bounds-container_of.c: New test.
* gcc.dg/analyzer/out-of-bounds-coreutils.c: New test.
* gcc.dg/analyzer/out-of-bounds-curl.c: New test.

---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/region-model.cc  | 410 ++
 gcc/analyzer/region-model.h   |   3 +
 gcc/analyzer/region.cc|  32 ++
 gcc/analyzer/region.h |   4 +
 gcc/analyzer/store.cc |  67 +++
 gcc/analyzer/store.h  |   9 +
 gcc/doc/invoke.texi   |  12 +
 .../gcc.dg/analyzer/allocation-size-3.c   |   2 +
 gcc/testsuite/gcc.dg/analyzer/memcpy-2.c  |   2 +-
 .../gcc.dg/analyzer/out-of-bounds-1.c | 119 +
 .../gcc.dg/analyzer/out-of-bounds-2.c |  83 
 .../gcc.dg/analyzer/out-of-bounds-3.c |  91 
 .../analyzer/out-of-bounds-container_of.c |  51 +++
 .../gcc.dg/analyzer/out-of-bounds-coreutils.c |  29 ++
 .../gcc.dg/analyzer/out-of-bounds-curl.c  |  41 ++
 gcc/testsuite/gcc.dg/analyzer/pr101962.c  |   5 +-
 gcc/testsuite/gcc.dg/analyzer/pr97029.c   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/test-setjmp.h   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/zlib-3.c|   4 +-
 20 files changed, 970 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-container_of.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-coreutils.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-curl.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 5021376b6fb..8e73af60ceb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -158,6 +158,10 @@ Wanalyzer-tainted-size
 Common Var(warn_analyzer_tainted_size) Init(1) Warning
 Warn about code paths in which an unsanitized value is used as a size.
 
+Wanalyzer-out-of-bounds
+Common Var(warn_analyzer_out_of_bounds) Init(1) Wa

Re: [PATCH v2, rs6000] Add multiply-add expand pattern [PR103109]

2022-08-09 Thread Segher Boessenkool
On Tue, Aug 09, 2022 at 11:14:16AM +0800, Kewen.Lin wrote:
> on 2022/8/8 14:04, HAO CHEN GUI wrote:
> > +/* { dg-do run { target { has_arch_ppc64 } } } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
> > +/* { dg-require-effective-target int128 } */
> > +/* { dg-require-effective-target p9modulo_hw } */
> > +/* { dg-final { scan-assembler-times {\mmaddld\M} 2 } } */
> > +/* { dg-final { scan-assembler-times {\mmaddhd\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mmaddhdu\M} 1 } } */
> > +
> 
> Maybe it's good to split this case into two, one for compiling and the
> other for running.  Since the generated asm is a test point here, with a
> separate case for compiling we can still have that part of test coverage
> on hosts which are unable to run this case.  You can move functions
> multiply_add and multiply_addu into one common header file, then include
> it in both source files.

Yeah, good point.  You cannot make dg-do do different things on
different targets.  Fortunately, just duplicating this test and then
removing the things not relevant to run resp. compile testing makes
things even more clear :-)
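A possible shape for the shared header, assuming the test functions compute a 64x64-bit product plus a 64-bit addend in 128-bit arithmetic. The bodies of multiply_add and multiply_addu are not shown in this thread, so these are illustrative guesses; the code sticks to constructs valid in both C and C++ (with GCC's __int128 extension).

```cpp
#include <assert.h>
#include <stdint.h>

/* Hypothetical common header for the split compile and run tests.
   Signed: 128-bit product of two int64_t values plus a sign-extended
   addend.  */
__int128
multiply_add (int64_t a, int64_t b, int64_t c)
{
  return (__int128) a * b + c;
}

/* Unsigned variant: 128-bit product plus a zero-extended addend.  */
unsigned __int128
multiply_addu (uint64_t a, uint64_t b, uint64_t c)
{
  return (unsigned __int128) a * b + c;
}
```

On a power9 target the signed version would need maddld for the low half and maddhd for the high half, and the unsigned one maddld plus maddhdu, which would line up with the scan-assembler-times counts quoted above; that correspondence is an inference, not something verified here.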

> Nit: better to add one explicit "return 0;" to avoid possible warning.

This is in main(), the C standard requires this to work without return
(and it is common).  But, before C99 the implicit return value from
main() was undefined, so yes, it could warn then.  Does it?


Segher


Re: [PATCH v2, rs6000] Add multiply-add expand pattern [PR103109]

2022-08-09 Thread Segher Boessenkool
Hi!

On Mon, Aug 08, 2022 at 02:04:07PM +0800, HAO CHEN GUI wrote:
>   This patch adds an expand and several insns for multiply-add with three
> 64bit operands.

Also for maddld for 32-bit operands.

>"maddld %0,%1,%2,%3"
>[(set_attr "type" "mul")])

I suppose attr "size" isn't relevant for any of the cpus that implement
these instructions?

Okay for trunk.  Thanks!

(The testcase improvements can be done later).


Segher


Re: [PATCH 1/2] analyzer: consider that realloc could shrink the buffer [PR106539]

2022-08-09 Thread David Malcolm via Gcc-patches
On Tue, 2022-08-09 at 23:19 +0200, Tim Lange wrote:
> This patch adds the "shrinks buffer" case to the success_with_move
> modelling of realloc.

Hi Tim, thanks for the patch.

> 
> 2022-08-09  Tim Lange  
> 
> gcc/analyzer/ChangeLog:
> 
> PR analyzer/106539
> * region-model-impl-calls.cc (region_model::impl_call_realloc):
> Add get_copied_size function and pass the result as the size of the
> new sized_region.
> 
> ---
>  gcc/analyzer/region-model-impl-calls.cc | 37 -
>  1 file changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/analyzer/region-model-impl-calls.cc
> b/gcc/analyzer/region-model-impl-calls.cc
> index 8c38e9206fa..50a19a52a21 100644
> --- a/gcc/analyzer/region-model-impl-calls.cc
> +++ b/gcc/analyzer/region-model-impl-calls.cc
> @@ -737,9 +737,11 @@ region_model::impl_call_realloc (const call_details &cd)
>   old_size_sval);
>   const svalue *buffer_content_sval
> = model->get_store_value (sized_old_reg, cd.get_ctxt ());
> + const svalue *copied_size_sval
> +   = get_copied_size (old_size_sval, new_size_sval);
>   const region *sized_new_reg
> = model->m_mgr->get_sized_region (new_reg, NULL,
> - old_size_sval);
> + copied_size_sval);

I think that we need to use the same copied size svalue for both
sized_old_reg and sized_new_reg, so that we're using a consistent size
when getting buffer_content_sval.  (I admit my handling of symbolic
sizes of svalues is a bit sloppy, but I'd prefer not to add extra
inconsistencies here)

So the copied_size_sval determination needs to happen before getting
sized_old_reg.

Also, I think renaming sized_{old,new}_reg to copied_{old,new}_reg
might make things clearer (we're copying from a leading subset of the
old region to a leading subset of the new region).


>   model->set_value (sized_new_reg, buffer_content_sval,
> cd.get_ctxt ());
>     }
> @@ -774,6 +776,39 @@ region_model::impl_call_realloc (const call_details &cd)
>    else
> return true;
>  }
> +
> +  private:
> +    /* Return the size svalue for the new region allocated by realloc.  */

This comment is misleading - isn't it the size of the existing data to
be copied, i.e. the lesser of the old and new sizes? (allowing for the
fact that one or both could be symbolic, of course)
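The reading above (the lesser of the old and new sizes, with symbolic sizes as a complication) can be modelled outside the analyzer. A minimal C++17 sketch, assuming a size is either a known constant or unknown; size_model and copied_size are hypothetical names, and std::optional merely stands in for an svalue and maybe_get_constant:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for a size svalue: a known constant, or
// std::nullopt when the size is symbolic.  Not the analyzer's API.
using size_model = std::optional<std::size_t>;

// Number of bytes realloc copies from the old buffer into the new one:
// the lesser of the two sizes when both are known.  Otherwise, mirror the
// patch's fallbacks, which always end up returning a symbolic size.
size_model
copied_size (size_model old_size, size_model new_size)
{
  if (old_size && new_size)
    {
      size_model result = *old_size < *new_size ? old_size : new_size;
      assert (*result <= *old_size && *result <= *new_size);
      return result;
    }
  if (new_size)
    return old_size;   // old size symbolic, new size constant
  return new_size;     // new size symbolic (possibly both)
}
```

For example, copied_size (16, 4) and copied_size (4, 16) both yield 4, while any call with an unknown size yields an unknown result.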


Please also add a simple explicit test case for the shrinking case
(IIRC from our offlist discussion there's already one that happened to
cover shrinking, but I think that testcase was doing other things too).

Does the patch fix the missing leak warning for the test case in
PR106539?  If so, please have the patch add that to the test suite.
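A minimal sketch of an explicit shrinking case, assuming the test only needs a buffer that is filled, shrunk and then used. shrink_buffer is a hypothetical name, the dg-* directives an analyzer test would carry are omitted, and the code is kept to the C subset so it reads like the existing gcc.dg/analyzer tests:

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shrinking-realloc scenario: fill a 16-byte buffer, shrink
   it to 4 bytes, and hand the result back to the caller.  */
char *
shrink_buffer (void)
{
  char *p = (char *) malloc (16);
  if (!p)
    return NULL;
  memset (p, 'a', 16);

  char *q = (char *) realloc (p, 4);
  if (!q)
    {
      free (p);  /* on failure the original block is still owned by us */
      return NULL;
    }
  /* Only the first 4 bytes carry over from the old buffer.  */
  return q;
}
```

A caller that discards the returned pointer without freeing it is the kind of path on which a leak report would be expected.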


Thanks again
Dave



> +    const svalue *get_copied_size (const svalue *old_size_sval,
> +  const svalue *new_size_sval) const
> +    {
> +  tree old_size_cst = old_size_sval->maybe_get_constant ();
> +  tree new_size_cst = new_size_sval->maybe_get_constant ();
> +
> +  if (old_size_cst && new_size_cst)
> +   {
> + /* Both are constants and comparable.  */
> + tree cmp = fold_binary (LT_EXPR, boolean_type_node,
> + old_size_cst, new_size_cst);
> +
> + if (cmp == boolean_true_node)
> +   return old_size_sval;
> + else
> +   return new_size_sval;
> +   }
> +  else if (new_size_cst)
> +   {
> + /* OLD_SIZE_SVAL is symbolic, so return that.  */
> + return old_size_sval;
> +   }
> +  else
> +   {
> + /* NEW_SIZE_SVAL is symbolic or both are symbolic.
> +    Return NEW_SIZE_SVAL, because implementations of realloc
> +    probably only moves the buffer if the new size is larger.  */
> + return new_size_sval;
> +   }
> +    }
>    };
>  
>    /* Body of region_model::impl_call_realloc.  */




Re: [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2022-08-09 Thread Segher Boessenkool
Hi!

On Tue, Aug 09, 2022 at 11:01:05AM +0800, Kewen.Lin wrote:
> on 2022/8/8 11:42, Xionghu Luo wrote:
> > Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> 
> Sorry, no -m32 for LE testing.

You can use -m32 on powerpc64le-*, but the default configuration
disallows it.  There is also powerpcle-*, which actually was used in the
distant past (string insns like lswi and multiple insns like lmw do not
work there, and unaligned accesses are more problematic as well, but :-) )

It isn't something we support with ELFv2 at all, indeed.

> I have some concern on those changed "altivec_*_direct", IMHO the suffix
> "_direct" is normally to indicate the define_insn is mapped to the
> corresponding hw insn directly.

Exactly.  Let's please keep this intact.

> With this change, for example,
> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
> misleading.  Maybe we can add the corresponding _direct_le and _direct_be
> versions, both are mapped into the same insn but have different RTL
> patterns.

If that is the best we can do, that is the best we can do.  It would be
lovely if there was something nicer we can do though :-)


Segher


Re: [PATCH 2/2] analyzer: out-of-bounds checker [PR106000]

2022-08-09 Thread David Malcolm via Gcc-patches
On Tue, 2022-08-09 at 23:19 +0200, Tim Lange wrote:
> This patch adds an experimental out-of-bounds checker to the analyzer.
> 
> The checker was tested on coreutils, curl, httpd and openssh. It is mostly
> accurate but does produce false-positives on yacc-generated files and
> sometimes when the analyzer misses an invariant. These cases will be
> documented in bugzilla.
> (Regrtests still running with the latest changes, will report back later.)

Hi Tim, thanks for the patch, and for all the testing you've done on
it.

We've already had several rounds of review of this off-list, and this
patch looks very close to ready.

Some nits below...

> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 5021376b6fb..8e73af60ceb 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -158,6 +158,10 @@ Wanalyzer-tainted-size
>  Common Var(warn_analyzer_tainted_size) Init(1) Warning
>  Warn about code paths in which an unsanitized value is used as a size.
>  
> +Wanalyzer-out-of-bounds
> +Common Var(warn_analyzer_out_of_bounds) Init(1) Warning
> +Warn about code paths in which a write or read to a buffer is out-of-bounds.
> +

Please keep the list alphabetized; I think this needs to be between
  Wanalyzer-mismatching-deallocation 
and 
  Wanalyzer-possible-null-argument

>  Wanalyzer-use-after-free
>  Common Var(warn_analyzer_use_after_free) Init(1) Warning
>  Warn about code paths in which a freed value is used.
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index f7df2fca245..2f9382ed96c 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -1268,6 +1268,402 @@ region_model::on_stmt_pre (const gimple *stmt,
>  }
>  }
>  
> +/* Abstract base class for all out-of-bounds warnings.  */
> +
> +class out_of_bounds : public pending_diagnostic_subclass
> +{
> +public:
> +  out_of_bounds (const region *reg, tree diag_arg, byte_range range)
> +  : m_reg (reg), m_diag_arg (diag_arg), m_range (range)
> +  {}
> +
> +  const char *get_kind () const final override
> +  {
> +    return "out_of_bounds_diagnostic";
> +  }
> +
> +  bool operator== (const out_of_bounds &other) const
> +  {
> +    return m_reg == other.m_reg
> +  && m_range == other.m_range
> +  && pending_diagnostic::same_tree_p (m_diag_arg, other.m_diag_arg);
> +  }
> +
> +  int get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_out_of_bounds;
> +  }
> +
> +  void mark_interesting_stuff (interesting_t *interest) final override
> +  {
> +    interest->add_region_creation (m_reg);
> +  }
> +
> +protected:
> +  const region *m_reg;
> +  tree m_diag_arg;
> +  byte_range m_range;

Please add a comment clarifying what the meaning of m_range is here. 
Is it
(a) the range of all bytes that are accessed,
(b) the range of bytes that are accessed out-of-bounds,
(c) etc?

From my reading of the patch I think it's (b).
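If m_range is (b), it is the out-of-bounds part of the access, which is a small piece of interval arithmetic. A C++17 sketch with half-open ranges; byte_span, bytes_past_the_end and bytes_before_start are hypothetical and are not the patch's byte_range::exceeds_p or falls_short_of_p, whose signatures are not shown here:

```cpp
#include <cassert>
#include <optional>

// Hypothetical half-open byte range [start, start + size); a simplified
// stand-in, not the analyzer's byte_range class.
struct byte_span
{
  long start;
  long size;
  long next () const { return start + size; }
};

// Bytes of ACCESS at or beyond CAPACITY (overflow/overread), if any.
std::optional<byte_span>
bytes_past_the_end (const byte_span &access, long capacity)
{
  assert (capacity >= 0 && access.size >= 0);
  if (access.size == 0 || access.next () <= capacity)
    return std::nullopt;
  long first = access.start > capacity ? access.start : capacity;
  return byte_span { first, access.next () - first };
}

// Bytes of ACCESS before offset 0 (underflow/underread), if any.
std::optional<byte_span>
bytes_before_start (const byte_span &access)
{
  assert (access.size >= 0);
  if (access.size == 0 || access.start >= 0)
    return std::nullopt;
  long end = access.next () < 0 ? access.next () : 0;
  return byte_span { access.start, end - access.start };
}
```

For a 4-byte access at offset 8 into a 10-byte buffer, bytes_past_the_end reports start 10 and size 2, i.e. bytes 10 and 11, which is what a comment on m_range under reading (b) would describe.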


> +};
> +
> +/* Abstract subclass to complain about out-of-bounds
> +   past the end of the buffer.  */
> +
> +class past_the_end : public out_of_bounds
> +{
> +public:
> +  past_the_end (const region *reg, tree diag_arg, byte_range range,
> +   tree byte_bound)
> +  : out_of_bounds (reg, diag_arg, range), m_byte_bound (byte_bound)
> +  {}
> +
> +  bool operator== (const past_the_end &other) const
> +  {
> +    return m_reg == other.m_reg
> +  && m_range == other.m_range
> +  && pending_diagnostic::same_tree_p (m_diag_arg, other.m_diag_arg)

Is it possible to call
  out_of_bounds::operator== 
for the first three fields, rather than a copy-and-paste of the logic?
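Delegating to the base class is straightforward in C++. A sketch with simplified, hypothetical stand-in classes (int members in place of const region *, tree and byte_range, and plain == in place of same_tree_p), showing the derived operator== calling the base comparison instead of repeating it:

```cpp
#include <cassert>

// Hypothetical stand-in for out_of_bounds.
class out_of_bounds_model
{
public:
  out_of_bounds_model (int reg, int diag_arg, int range)
  : m_reg (reg), m_diag_arg (diag_arg), m_range (range) {}

  bool operator== (const out_of_bounds_model &other) const
  {
    return m_reg == other.m_reg
           && m_range == other.m_range
           && m_diag_arg == other.m_diag_arg;
  }

protected:
  int m_reg;
  int m_diag_arg;
  int m_range;
};

// Hypothetical stand-in for past_the_end.
class past_the_end_model : public out_of_bounds_model
{
public:
  past_the_end_model (int reg, int diag_arg, int range, int byte_bound)
  : out_of_bounds_model (reg, diag_arg, range), m_byte_bound (byte_bound) {}

  bool operator== (const past_the_end_model &other) const
  {
    // Reuse the base comparison for the shared fields, then compare
    // only what this subclass adds.
    return out_of_bounds_model::operator== (other)
           && m_byte_bound == other.m_byte_bound;
  }

protected:
  int m_byte_bound;
};
```

The base comparison covers m_reg, m_range and m_diag_arg, so the derived class only has to add m_byte_bound.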

> +  && pending_diagnostic::same_tree_p (m_byte_bound,
> +  other.m_byte_bound);
> +  }
> +
> +  label_text
> +  describe_region_creation_event (const evdesc::region_creation &ev) final
> +  override
> +  {
> +    if (m_byte_bound && TREE_CODE (m_byte_bound) == INTEGER_CST)
> +  return ev.formatted_print ("capacity is %E bytes", m_byte_bound);
> +
> +    return label_text ();
> +  }
> +
> +protected:
> +  tree m_byte_bound;
> +};

[...snip the concrete subclasses...]

We went through several rounds of review off-list, and I have lots of
ideas for wording tweaks to the patch, but rather than me be a
"backseat driver" (or bikeshedding), I think that that aspect of the
patch is good enough as-is, and I'll make the wording changes myself
once the patch is in trunk.


[...snip...]

> +
> +    if (warned)
> +  {
> +   char num_bytes_past_buf[WIDE_INT_PRINT_BUFFER_SIZE];
> +   print_dec (m_range.m_size_in_bytes, num_bytes_past_buf, UNSIGNED);

I think we can use %wu for this, but I can fix this up in a followup.


[...snip...]

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index fa23fbe..5ab834af780 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -459,6 +459,7 @@ Objective-C and Objective-C++ 

Re: [PATCH] libgccjit.h: Make the macro definition for testing gcc_jit_context_new_bitcast correctly available.

2022-08-09 Thread David Malcolm via Gcc-patches
On Tue, 2022-08-09 at 11:39 -0400, David Malcolm wrote:
> On Sat, 2022-07-30 at 19:18 +0530, Vibhav Pant wrote:
> > I don't have push rights to the repo, so this would need to be
> > applied manually.
> 
> I've gone ahead and pushed your fix to trunk (for GCC 13) as
> r13-2004-g9385cd9c74cf66.
> 
> I plan to also push it to the gcc 12 branch shortly (for gcc 12.2)

I've now done this (as r12-8674-g92f2582f3ec7b8).

Thanks again
Dave



[committed] analyzer: fix missing -Wanalyzer-use-of-uninitialized-value on special-cased functions [PR106573]

2022-08-09 Thread David Malcolm via Gcc-patches
We were missing checks for uninitialized params on calls to functions
that the analyzer has hardcoded knowledge of - both for those that are
handled just by state machines, and for those that are handled in
region-model-impl-calls.cc (for those arguments for which the svalue
wasn't accessed in handling the call).

Fixed thusly.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-2007-gbddd8d86e3036e.
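The shape of the fix can be illustrated with a toy model. It is only an analogy, not the analyzer's API: `toy_arg`, `toy_context` and `toy_on_call_pre` are hypothetical. Every argument is looked up once up front, so an uninitialized one is reported even if the special-cased handling never touches it, and a set stands in for the deduplication that absorbs repeat lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy argument: a name plus whether it holds an initialized value.
struct toy_arg
{
  std::string name;
  bool initialized;
};

// Hypothetical toy context.  Warnings go into a set, so looking up the same
// argument twice reports it only once (a stand-in for diagnostic dedup).
struct toy_context
{
  std::set<std::string> warned;

  void get_arg_svalue (const toy_arg &arg)
  {
    if (!arg.initialized)
      warned.insert (arg.name);
  }
};

// Mirrors the fix: look up every argument at the call site first...
void
toy_on_call_pre (toy_context &ctxt, const std::vector<toy_arg> &args)
{
  for (std::size_t i = 0; i < args.size (); ++i)
    ctxt.get_arg_svalue (args[i]);

  // ...then do special-cased handling, which here only re-reads the first
  // argument.  The repeat lookup must not produce a second warning.
  if (!args.empty ())
    ctxt.get_arg_svalue (args[0]);
}
```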

gcc/analyzer/ChangeLog:
PR analyzer/106573
* region-model.cc (region_model::on_call_pre): Ensure that we call
get_arg_svalue on all arguments.

gcc/testsuite/ChangeLog:
PR analyzer/106573
* gcc.dg/analyzer/error-uninit.c: New test.
* gcc.dg/analyzer/fd-uninit-1.c: New test.
* gcc.dg/analyzer/file-uninit-1.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc  |  8 +++
 gcc/testsuite/gcc.dg/analyzer/error-uninit.c  | 29 +++
 gcc/testsuite/gcc.dg/analyzer/fd-uninit-1.c   | 21 
 gcc/testsuite/gcc.dg/analyzer/file-uninit-1.c | 52 +++
 4 files changed, 110 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/error-uninit.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-uninit-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/file-uninit-1.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index a140f4d5088..8393c7ddbf7 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1355,6 +1355,14 @@ region_model::on_call_pre (const gcall *call, region_model_context *ctxt,
   && gimple_call_internal_fn (call) == IFN_DEFERRED_INIT)
 return false;
 
+  /* Get svalues for all of the arguments at the callsite, to ensure that we
+     complain about any uninitialized arguments.  This might lead to
+     duplicates if any of the handling below also looks up the svalues,
+     but the deduplication code should deal with that.  */
+  if (ctxt)
+    for (unsigned arg_idx = 0; arg_idx < cd.num_args (); arg_idx++)
+      cd.get_arg_svalue (arg_idx);
+
   /* Some of the cases below update the lhs of the call based on the
  return value, but not all.  Provide a default value, which may
  get overwritten below.  */
diff --git a/gcc/testsuite/gcc.dg/analyzer/error-uninit.c b/gcc/testsuite/gcc.dg/analyzer/error-uninit.c
new file mode 100644
index 000..8d52a177b11
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/error-uninit.c
@@ -0,0 +1,29 @@
+/* Verify that we check for uninitialized values passed to functions
+   that we have special-cased region-model handling for.  */
+
+extern void error (int __status, int __errnum, const char *__format, ...)
+ __attribute__ ((__format__ (__printf__, 3, 4)));
+
+void test_uninit_status (int arg)
+{
+  int st;
+  error (st, 42, "test: %s", arg); /* { dg-warning "use of uninitialized value 'st'" } */
+}
+
+void test_uninit_errnum (int st)
+{
+  int num;
+  error (st, num, "test"); /* { dg-warning "use of uninitialized value 'num'" } */
+}
+
+void test_uninit_fmt (int st)
+{
+  const char *fmt;
+  error (st, 42, fmt); /* { dg-warning "use of uninitialized value 'fmt'" } */
+}
+
+void test_uninit_vargs (int st)
+{
+  int arg;
+  error (st, 42, "test: %s", arg); /* { dg-warning "use of uninitialized value 'arg'" } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-uninit-1.c b/gcc/testsuite/gcc.dg/analyzer/fd-uninit-1.c
new file mode 100644
index 000..b5b189ece98
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-uninit-1.c
@@ -0,0 +1,21 @@
+/* Verify that we check for uninitialized values passed to functions
+   that we have special-cased state-machine handling for.  */
+
+int dup (int old_fd);
+int not_dup (int old_fd);
+
+int
+test_1 ()
+{
+  int m;
+  return dup (m); /* { dg-warning "use of uninitialized value 'm'" "uninit" } */
+  /* { dg-bogus "'dup' on possibly invalid file descriptor 'm'" "invalid fd false +ve" { xfail *-*-* } .-1 } */
+  /* XFAIL: probably covered by fix for PR analyzer/106551.  */
+}
+
+int
+test_2 ()
+{
+  int m;
+  return not_dup (m); /* { dg-warning "use of uninitialized value 'm'" } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/file-uninit-1.c b/gcc/testsuite/gcc.dg/analyzer/file-uninit-1.c
new file mode 100644
index 000..0f8ac5442b1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/file-uninit-1.c
@@ -0,0 +1,52 @@
+/* Verify that we check for uninitialized values passed to functions
+   that we have special-cased state-machine handling for.  */
+
+typedef struct FILE   FILE;
+
+FILE* fopen (const char*, const char*);
+int   fclose (FILE*);
+int fseek (FILE *, long, int);
+
+FILE *
+test_fopen_uninit_path (void)
+{
+  const char *path;
+  FILE *f = fopen (path, "r"); /* { dg-warning "use of uninitialized value 'path'" } */
+  return f;
+}
+
+FILE *
+test_fopen_uninit_mode (const char *path)
+{
+  const char *mode;
+  FILE *f = fopen (path, mode); /* { dg-warning "use of uninitialized 

Re: [PATCH v2, rs6000] Add multiply-add expand pattern [PR103109]

2022-08-09 Thread HAO CHEN GUI via Gcc-patches
Hi Segher,
  Thanks for your comments. I checked the cost table. For P9 and P10, the
cost of all mul* insns is the same, regardless of operand size.

  I will split the test case into one compile-only case and one runnable case.

Thanks.
Gui Haochen

On 10/8/2022 5:43 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Aug 08, 2022 at 02:04:07PM +0800, HAO CHEN GUI wrote:
>>   This patch adds an expand and several insns for multiply-add with three
>> 64bit operands.
> 
> Also for maddld for 32-bit operands.
> 
>>"maddld %0,%1,%2,%3"
>>[(set_attr "type" "mul")])
> 
> I suppose attr "size" isn't relevant for any of the cpus that implement
> these instructions?
> 
> Okay for trunk.  Thanks!
> 
> (The testcase improvements can be done later).
> 
> 
> Segher
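For reference, the source-level shape such an expander targets is a plain 64-bit multiply-add. Power ISA 3.0 `maddld` computes the low doubleword of (RA)*(RB)+(RC), and because the low 64 bits of a product-plus-addend do not depend on signedness, one instruction can serve both signed and unsigned code. This is a sketch for experimenting; whether GCC actually emits `maddld` for it depends on the target (e.g. -mcpu=power9) and on this patch, so check the generated assembly. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// a * b + c on 64-bit operands.  Unsigned arithmetic wraps modulo 2^64,
// i.e. it keeps only the low doubleword of the full 128-bit result.
std::uint64_t
madd_low (std::uint64_t a, std::uint64_t b, std::uint64_t c)
{
  return a * b + c;
}

// The same operation on signed operands, computed through unsigned
// arithmetic to avoid signed-overflow UB.  The low 64 bits are identical,
// and the conversion back to int64_t is modular (guaranteed since C++20,
// GCC's documented behavior before that).
std::int64_t
madd_low_signed (std::int64_t a, std::int64_t b, std::int64_t c)
{
  return static_cast<std::int64_t> (madd_low (static_cast<std::uint64_t> (a),
                                              static_cast<std::uint64_t> (b),
                                              static_cast<std::uint64_t> (c)));
}
```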


Re: [PATCH] i386 testsuite: cope with --enable-default-pie

2022-08-09 Thread Alexandre Oliva via Gcc-patches
On Aug  9, 2022, Alexandre Oliva  wrote:

> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598276.html

Oops, sorry, I linked to the wrong patch.  This is the one I meant to ping:

https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598874.html

> On Jul 27, 2022, Alexandre Oliva  wrote:

>> for  gcc/testsuite/ChangeLog

>> * g++.dg/abi/anon1.C: Disable pie on ia32.
>> * g++.dg/abi/anon4.C: Likewise.
>> * g++.dg/cpp0x/initlist-const1.C: Likewise.
>> * g++.dg/no-stack-protector-attr-3.C: Likewise.
>> * g++.dg/stackprotectexplicit2.C: Likewise.
>> * g++.dg/pr71694.C: Likewise.
>> * gcc.dg/pr102892-1.c: Likewise.
>> * gcc.dg/sibcall-11.c: Likewise.
>> * gcc.dg/torture/builtin-self.c: Likewise.
>> * gcc.target/i386/avx2-dest-false-dep-for-glc.c: Likewise.
>> * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Likewise.
>> * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
>> * gcc.target/i386/avx512f-broadcast-pr87767-3.c: Likewise.
>> * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
>> * gcc.target/i386/avx512f-broadcast-pr87767-7.c: Likewise.
>> * gcc.target/i386/avx512fp16-broadcast-1.c: Likewise.
>> * gcc.target/i386/avx512fp16-pr101846.c: Likewise.
>> * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
>> * gcc.target/i386/avx512vl-broadcast-pr87767-3.c: Likewise.
>> * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
>> * gcc.target/i386/pr100865-2.c: Likewise.
>> * gcc.target/i386/pr100865-3.c: Likewise.
>> * gcc.target/i386/pr100865-4a.c: Likewise.
>> * gcc.target/i386/pr100865-4b.c: Likewise.
>> * gcc.target/i386/pr100865-5a.c: Likewise.
>> * gcc.target/i386/pr100865-5b.c: Likewise.
>> * gcc.target/i386/pr100865-6a.c: Likewise.
>> * gcc.target/i386/pr100865-6b.c: Likewise.
>> * gcc.target/i386/pr100865-6c.c: Likewise.
>> * gcc.target/i386/pr100865-7b.c: Likewise.
>> * gcc.target/i386/pr101796-1.c: Likewise.
>> * gcc.target/i386/pr101846-2.c: Likewise.
>> * gcc.target/i386/pr101989-broadcast-1.c: Likewise.
>> * gcc.target/i386/pr102021.c: Likewise.
>> * gcc.target/i386/pr90773-17.c: Likewise.
>> * gcc.target/i386/pr54855-3.c: Likewise.
>> * gcc.target/i386/pr54855-7.c: Likewise.
>> * gcc.target/i386/pr15184-1.c: Likewise.
>> * gcc.target/i386/pr15184-2.c: Likewise.
>> * gcc.target/i386/pr27971.c: Likewise.
>> * gcc.target/i386/pr70263-2.c: Likewise.
>> * gcc.target/i386/pr78035.c: Likewise.
>> * gcc.target/i386/pr81736-5.c: Likewise.
>> * gcc.target/i386/pr81736-7.c: Likewise.
>> * gcc.target/i386/pr85620-6.c: Likewise.
>> * gcc.target/i386/pr85667-6.c: Likewise.
>> * gcc.target/i386/pr93492-5.c: Likewise.
>> * gcc.target/i386/pr96539.c: Likewise.
>> PR target/81708 (%gs:my_guard)
>> * gcc.target/i386/stack-prot-sym.c: Likewise.
>> * g++.dg/init/static-cdtor1.C: Add alternate patterns for PIC.
>> * gcc.target/i386/avx512fp16-vcvtsh2si-1a.c: Extend patterns
>> for PIC/PIE register allocation.
>> * gcc.target/i386/pr100704-3.c: Likewise.
>> * gcc.target/i386/avx512fp16-vcvtsh2usi-1a.c: Likewise.
>> * gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: Likewise.
>> * gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Likewise.
>> * gcc.target/i386/avx512fp16-vmovsh-1a.c: Likewise.
>> * gcc.target/i386/interrupt-11.c: Likewise, allowing for
>> preservation of the PIC register.
>> * gcc.target/i386/interrupt-12.c: Likewise.
>> * gcc.target/i386/interrupt-13.c: Likewise.
>> * gcc.target/i386/interrupt-15.c: Likewise.
>> * gcc.target/i386/interrupt-16.c: Likewise.
>> * gcc.target/i386/interrupt-17.c: Likewise.
>> * gcc.target/i386/interrupt-8.c: Likewise.
>> * gcc.target/i386/cet-sjlj-6a.c: Combine patterns from
>> previous change.
>> * gcc.target/i386/cet-sjlj-6b.c: Likewise.
>> * gcc.target/i386/pad-10.c: Accept insns in get_pc_thunk.
>> * gcc.target/i386/pr70321.c: Likewise.
>> * gcc.target/i386/pr81563.c: Likewise.
>> * gcc.target/i386/pr84278.c: Likewise.
>> * gcc.target/i386/pr90773-2.c: Likewise, plus extra loads from
>> the GOT.
>> * gcc.target/i386/pr90773-3.c: Likewise.
>> * gcc.target/i386/pr94913-2.c: Accept additional PIC insns.
>> * gcc.target/i386/stack-check-17.c: Likewise.
>> * gcc.target/i386/stack-check-12.c: Do not require dummy stack
>> probing obviated with PIC.
>> * gcc.target/i386/pr95126-m32-1.c: Expect missed optimization
>> with PIC.
>> * gcc.target/i386/pr95126-m32-2.c: Likewise.
>> * gcc.target/i386/pr95852-2.c: Accept different optimization
>> with PIC.
>> * gcc.target/i386/pr95852-4.c: Likewise.
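When checking why a test behaves differently on a --enable-default-pie toolchain, it can help to confirm what the compiler actually defaults to. GCC predefines `__PIE__` (and `__pie__`) as 1 for -fpie and 2 for -fPIE, so a sketch like the following reports the PIE level a translation unit was built with; `pie_level` is a hypothetical helper. On ia32, PIE code needs a PIC register (set up via get_pc_thunk) and GOT accesses, which is the kind of difference the pattern adjustments above account for.

```cpp
#include <cassert>

// Returns the PIE level this translation unit was compiled with: 0 if
// __PIE__ is not predefined, otherwise its value (1 for -fpie, 2 for -fPIE).
constexpr int
pie_level ()
{
#ifdef __PIE__
  return __PIE__;
#else
  return 0;
#endif
}
```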

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[Committed] PR other/106575: Use "signed char" in new fold-eqandshift-4.c

2022-08-09 Thread Roger Sayle

My recently added testcase gcc.dg/fold-eqandshift-4.c incorrectly assumed
that "char" was "signed char", and hence fails on powerpc64, where plain
char is unsigned.  Fixed by making "signed char" explicit where needed in
this test.  Committed as obvious.


2022-08-10  Roger Sayle  

gcc/testsuite/ChangeLog
* gcc.dg/fold-eqandshift-4.c: Use "signed char" explicitly.


Apologies for the inconvenience.
Roger
--

diff --git a/gcc/testsuite/gcc.dg/fold-eqandshift-4.c b/gcc/testsuite/gcc.dg/fold-eqandshift-4.c
index 42d5190703e..fbba438556e 100644
--- a/gcc/testsuite/gcc.dg/fold-eqandshift-4.c
+++ b/gcc/testsuite/gcc.dg/fold-eqandshift-4.c
@@ -1,14 +1,14 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 
-int sr30eq00(char x) { return ((x >> 4) & 0x30) == 0; }
-int sr30ne00(char x) { return ((x >> 4) & 0x30) != 0; }
-int sr30eq20(char z) { return ((z >> 4) & 0x30) == 0x20; }
-int sr30ne20(char z) { return ((z >> 4) & 0x30) != 0x20; }
-int sr30eq30(char x) { return ((x >> 4) & 0x30) == 0x30; }
-int sr30ne30(char x) { return ((x >> 4) & 0x30) != 0x30; }
-int sr33eq33(char x) { return ((x >> 4) & 0x33) == 0x33; }
-int sr33ne33(char x) { return ((x >> 4) & 0x33) != 0x33; }
+int sr30eq00(signed char x) { return ((x >> 4) & 0x30) == 0; }
+int sr30ne00(signed char x) { return ((x >> 4) & 0x30) != 0; }
+int sr30eq20(signed char z) { return ((z >> 4) & 0x30) == 0x20; }
+int sr30ne20(signed char z) { return ((z >> 4) & 0x30) != 0x20; }
+int sr30eq30(signed char x) { return ((x >> 4) & 0x30) == 0x30; }
+int sr30ne30(signed char x) { return ((x >> 4) & 0x30) != 0x30; }
+int sr33eq33(signed char x) { return ((x >> 4) & 0x33) == 0x33; }
+int sr33ne33(signed char x) { return ((x >> 4) & 0x33) != 0x33; }
 
 int ur30eq00(unsigned char z) { return ((z >> 4) & 0x30) == 0; }
 int ur30ne00(unsigned char z) { return ((z >> 4) & 0x30) != 0; }
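Why the signedness matters for these expressions: the operand is promoted to int before the shift, so a negative signed char sign-extends while an unsigned char does not. For the byte 0x80, `(x >> 4) & 0x30` is 0x30 as signed char (GCC shifts negative values arithmetically) but 0 as unsigned char, so the folded result differs. The `_signed`/`_unsigned` names and the CHAR_MIN helper below are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <climits>

// The same expression as sr30eq00 in the test, with explicit signedness.
int sr30eq00_signed (signed char x) { return ((x >> 4) & 0x30) == 0; }
int sr30eq00_unsigned (unsigned char x) { return ((x >> 4) & 0x30) == 0; }

// Plain char is signed on x86 but unsigned on powerpc64 Linux;
// CHAR_MIN tells you which one the current target uses.
constexpr bool plain_char_is_signed () { return CHAR_MIN < 0; }
```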


Re: [PATCH 0/5] IEEE 128-bit built-in overload support.

2022-08-09 Thread Michael Meissner via Gcc-patches
On Fri, Aug 05, 2022 at 01:19:05PM -0500, Segher Boessenkool wrote:
> On Thu, Jul 28, 2022 at 12:43:49AM -0400, Michael Meissner wrote:
> > These patches lay the foundation for a set of follow-on patches that will
> > change the internal handling of 128-bit floating point types in GCC.  In the
> > future patches, I hope to change the compiler to always use KFmode for the
> > explicit _Float128/__float128 types, to always use TFmode for the long 
> > double
> > type, no matter which 128-bit floating point type is used, and IFmode for 
> > the
> > explicit __ibm128 type.
> 
> Making TFmode different from KFmode and IFmode is not an improvement.
> NAK.
> 
> 
> Segher

First of all, it already IS different from KFmode and IFmode, as we've talked
about.  I'm trying to clean this mess up.  Having explicit __float128s
converted to TFmode under -mabi=ieeelongdouble is just as bad, and it means that
_Float128 and __float128 are not the same type.

What I'm trying to eliminate is the code in rs6000-builtin.cc that overrides
the builtin ops (i.e. it does the equivalent of an overloaded function):

  /* TODO: The following commentary and code is inherited from the original
     builtin processing code.  The commentary is a bit confusing, with the
     intent being that KFmode is always IEEE-128, IFmode is always IBM
     double-double, and TFmode is the current long double.  The code is
     confusing in that it converts from KFmode to TFmode pattern names,
     when the other direction is more intuitive.  Try to address this.  */

  /* We have two different modes (KFmode, TFmode) that are the IEEE
     128-bit floating point type, depending on whether long double is the
     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
     It is simpler if we only define one variant of the built-in function,
     and switch the code when defining it, rather than defining two built-
     ins and using the overload table in rs6000-c.cc to switch between the
     two.  If we don't have the proper assembler, don't do this switch
     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
  if (FLOAT128_IEEE_P (TFmode))
    switch (icode)
      {
      case CODE_FOR_sqrtkf2_odd:
        icode = CODE_FOR_sqrttf2_odd;
        break;
      case CODE_FOR_trunckfdf2_odd:
        icode = CODE_FOR_trunctfdf2_odd;
        break;
      case CODE_FOR_addkf3_odd:
        icode = CODE_FOR_addtf3_odd;
        break;
      case CODE_FOR_subkf3_odd:
        icode = CODE_FOR_subtf3_odd;
        break;
      case CODE_FOR_mulkf3_odd:
        icode = CODE_FOR_multf3_odd;
        break;
      case CODE_FOR_divkf3_odd:
        icode = CODE_FOR_divtf3_odd;
        break;
      case CODE_FOR_fmakf4_odd:
        icode = CODE_FOR_fmatf4_odd;
        break;
      case CODE_FOR_xsxexpqp_kf:
        icode = CODE_FOR_xsxexpqp_tf;
        break;
      case CODE_FOR_xsxsigqp_kf:
        icode = CODE_FOR_xsxsigqp_tf;
        break;
      case CODE_FOR_xststdcnegqp_kf:
        icode = CODE_FOR_xststdcnegqp_tf;
        break;
      case CODE_FOR_xsiexpqp_kf:
        icode = CODE_FOR_xsiexpqp_tf;
        break;
      case CODE_FOR_xsiexpqpf_kf:
        icode = CODE_FOR_xsiexpqpf_tf;
        break;
      case CODE_FOR_xststdcqp_kf:
        icode = CODE_FOR_xststdcqp_tf;
        break;
      case CODE_FOR_xscmpexpqp_eq_kf:
        icode = CODE_FOR_xscmpexpqp_eq_tf;
        break;
      case CODE_FOR_xscmpexpqp_lt_kf:
        icode = CODE_FOR_xscmpexpqp_lt_tf;
        break;
      case CODE_FOR_xscmpexpqp_gt_kf:
        icode = CODE_FOR_xscmpexpqp_gt_tf;
        break;
      case CODE_FOR_xscmpexpqp_unordered_kf:
        icode = CODE_FOR_xscmpexpqp_unordered_tf;
        break;
      default:
        break;
      }

// ... other code

  if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
    {
      if (fcode == RS6000_BIF_PACK_IF)
        {
          icode = CODE_FOR_packtf;
          fcode = RS6000_BIF_PACK_TF;
          uns_fcode = (size_t) fcode;
        }
      else if (fcode == RS6000_BIF_UNPACK_IF)
        {
          icode = CODE_FOR_unpacktf;
          fcode = RS6000_BIF_UNPACK_TF;
          uns_fcode = (size_t) fcode;
        }
    }

In particular, without overloaded built-ins, we would likely need something
similar to the above to cover all of the built-ins for both modes.  I tend to
think overloading is more natural in this case.
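As a loose C++ analogy only (GCC's built-in overloading goes through the overload table in rs6000-c.cc, not C++ overload resolution), the argument can be sketched like this: with one entry per format, the argument's type picks the variant, so there is no need to remap pattern codes in a switch. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for the two 128-bit formats (not GCC types).
struct ieee128_fmt {};  // IEEE 128-bit, the KFmode-like format
struct ibm128_fmt {};   // IBM double-double, the IFmode-like format

// One overload per format: the argument type selects the variant, so no
// switch remapping one pattern code to another is needed.
std::string expand_builtin_sketch (ieee128_fmt) { return "IEEE 128-bit variant"; }
std::string expand_builtin_sketch (ibm128_fmt) { return "IBM double-double variant"; }

// "long double" modelled as an alias for whichever format the ABI picks;
// the bool parameter stands in for -mabi=ieeelongdouble.
template <bool ieee_long_double>
using long_double_sketch =
  typename std::conditional<ieee_long_double, ieee128_fmt, ibm128_fmt>::type;
```

Because the long double alias is simply one of the two types, calls through it resolve to the matching overload with no extra mapping code.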

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2022-08-09 Thread Xionghu Luo via Gcc-patches




On 2022/8/9 11:01, Kewen.Lin wrote:

Hi Xionghu,

Thanks for the fix.

on 2022/8/8 11:42, Xionghu Luo wrote:

The native RTL expression for vec_mrghw should be the same for BE and LE, as
it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))

Then the combine pass can do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The generated ASM is better because the nested vec_select is simplified to
a simple scalar load.
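The lane arithmetic behind this simplification can be checked with a small model, assuming RTL's element numbering (which is what makes it endian-independent): `vec_select` over the `vec_concat` of two V4SI values indexes an 8-lane vector, so the selector [0 4 1 5] interleaves the first two lanes of each input, and lane 3 of that result is lane 1 of the second operand, matching the rewritten insn 24. `vec_select_concat` is a hypothetical helper, not a GCC function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using v4si = std::array<int, 4>;
using selector = std::array<std::size_t, 4>;

// Model of (vec_select:V4SI (vec_concat:V8SI a b) (parallel [s0 s1 s2 s3])):
// lanes 0-3 of the concatenation come from A, lanes 4-7 from B.
v4si
vec_select_concat (const v4si &a, const v4si &b, const selector &sel)
{
  std::array<int, 8> cat{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      cat[i] = a[i];
      cat[i + 4] = b[i];
    }
  v4si out{};
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = cat[sel[i]];
  return out;
}
```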

Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}


Sorry, no -m32 for LE testing.  I noticed the attachment in that PR didn't
include the test case (though the ChangeLog has it), so I re-tested it,
and nothing changed.  :)


Linux (thanks to Kewen).  OK for master?  Or should we revert r12-4496 to
restore the UNSPEC implementation?



I have some concerns about those changed "altivec_*_direct" patterns.  IMHO
the suffix "_direct" normally indicates that the define_insn maps to the
corresponding hw insn directly.  With this change, for example,
altivec_vmrghb_direct can be mapped to vmrghb or vmrglb, which looks
misleading.  Maybe we can add corresponding _direct_le and _direct_be
versions, both mapped to the same insn but with different RTL
patterns.  Looking forward to Segher's and David's suggestions.



Thanks!  Do you mean the same RTL patterns with different hw insns?
Updated as:

v2: Split each _direct pattern into BE and LE variants with the same RTL but different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE, as
it is register- and endian-independent.  So both BE and LE need to
generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))

Then the combine pass can do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The generated ASM is better because the nested vec_select is simplified to
a simple scalar load.

Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux (thanks to Kewen).  OK for master?  Or should we revert r12-4496 to
restore the UNSPEC implementation?

gcc/ChangeLog:
PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb): Emit same native
RTL for BE and LE.
(altivec_vmrghh): Likewise.
(altivec_vmrghw): Likewise.
(*altivec_vmrghsf): Adjust.
(altivec_vmrglb): Likewise.
(altivec_vmrglh): Likewise.
(altivec_vmrglw): Likewise.
(*altivec_vmrglsf): Adjust.
(altivec_vmrghb_direct): Emit different ASM for BE and LE.
(altivec_vmrghh_direct): Likewise.
(altivec_vmrghw_direct_): Likewise.
(altivec_vmrglb_direct): Likewise.
(altivec_vmrglh_direct): Likewise.
(altivec_vmrglw_direct_): Likewise.
(vec_widen_smult_hi_v16qi): Adjust.
(vec_widen_smult_lo_v16qi): Adjust.
(vec_widen_umult_hi_v16qi): Adjust.
(vec_widen_umult_lo_v16qi): Adjust.
(vec_widen_smult_hi_v8hi): Adjust.
(vec_widen_smult_lo_v8hi): Adjust.
(vec_widen_umult_hi_v8hi): Adjust.
(vec_widen_umult_lo_v8hi): Adjust.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
native RTL for BE and LE.
* config/rs6000/vsx.md (vsx_xxmrghw_): Likewise.
(vsx_xxmrglw_): Likewise.

gcc/testsuite/ChangeLog:
PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo 
---
 gcc/config/rs6000/altivec.md| 223 ++--
 gcc/config/rs6000/rs6000.cc |  36 ++--
 gcc/config/rs6000/vsx.md|  26 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++
 4 files changed, 303 insertions(+), 102 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C


Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading

2022-08-09 Thread Richard Biener via Gcc-patches
On Tue, 9 Aug 2022, Andrew MacLeod wrote:

> 
> On 8/9/22 09:01, Richard Biener wrote:
> > This revisits how we compute imports later used for the ranger path
> > query during backwards threading.  The compute_imports function
> > of the path solver ends up pulling the SSA def chain of regular
> > stmts without limit and since it starts with just the gori imports
> > of the path exit it misses some interesting names to translate
> > during path discovery.  In fact with a still empty path this
> > compute_imports function looks like not the correct tool.
> 
> I don't really know how this works in practice.  Aldy's off this week, so he
> can comment when he returns.
> 
> The original premise was along the lines of recognizing that only changes to a
> GORI import name to a block can affect the branch at the end of the block,
> i.e., if the path doesn't change any import to block A, then the branch at the
> end of block A will not change either.  Likewise, if it does change an
> import, then we look at whether the branch can be threaded.  Beyond that
> basic premise, I don't know what all it does.

Yep, I also think that's the idea.

> I presume the unbounded def chain is for local defs within a block that in
> turn feed the import to another block.  I'm not sure why we need to do much
> with those.  Again, it's only the import to the def chain that can affect the
> outcome at the end of the chain, and if it changes, then you need to
> recalculate the entire chain, but that would be part of the normal path
> walk.  I suspect there is also some pruning that can be done there, as GORI
> reflects "can affect the range", not "will affect the range".
>
> Perhaps what's going on is that all those local elements are being added up
> front to the list of interesting names?  That would certainly blow up the
> bitmaps and loops and such.

What it does is, if we have

bb:
  _3 = _5 + 1;
  _1 = _3 + _4;
  if (_1 > _2)

it puts _3 and _4 and _5 into the set of interesting names.  That's
OK and desired I think?  The actual problem is that compute_imports
will follow the def of _5 and _4 into dominating blocks recursively,
adding things to the imports even if the definition blocks are not
on the path (the path is empty at the point we call compute_imports).
For the testcase at hand this pulls in some 1000s of names into the
initial set of imports.  Now, the current path discovery code
only adds to imports by means of PHI translating from a PHI
def to a PHI use on the path edge - it doesn't add any further
local names used to define such PHI edge use from blocks not
dominating the path exit (but on the about to be threaded path).
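The difference between the unbounded walk and one constrained to the path can be sketched with a toy model (hypothetical types, not the path solver's API): names used by definitions on the path are looked through, while names defined elsewhere are recorded but not followed further. With the BB above on the path and _5 and _4 defined in other blocks, the bounded walk stops at _3, _4 and _5.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy SSA def: the block that defines a name and the names its definition uses.
struct toy_def
{
  int bb;
  std::vector<std::string> uses;
};

using def_map = std::map<std::string, toy_def>;

// Collect NAME and the names reachable from it through definitions.
// If ON_PATH is null, follow def chains without limit; otherwise only look
// through definitions whose block is on the path (names defined elsewhere
// are still recorded, but their own operands are not).
void
collect (const def_map &defs, const std::string &name,
         const std::set<int> *on_path, std::set<std::string> &out)
{
  if (!out.insert (name).second)
    return;  // already visited
  auto it = defs.find (name);
  if (it == defs.end ())
    return;  // no definition recorded (e.g. a parameter)
  if (on_path && on_path->count (it->second.bb) == 0)
    return;  // defined off the path: do not look through it
  for (const auto &use : it->second.uses)
    collect (defs, use, on_path, out);
}
```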

I'll also note that compute_imports does

  // Exported booleans along the path, may help conditionals.
  if (m_resolve)
    for (i = 0; i < m_path.length (); ++i)
      {
        basic_block bb = m_path[i];
        tree name;
        FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
          if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
            bitmap_set_bit (imports, SSA_NAME_VERSION (name));
      }

but at least for the backwards threader this isn't effective
since the path is empty at the point we call this.  And no
other code in the backwards threader does something like this.
I would also say that for the exit block it doesn't make
much sense to look at GORI exports.  Plus I don't understand
what this tries to capture; it presumably catches stmts
like _1 = _2 == _3 that have uses in downstream blocks but
might not be directly part of the conditional op def chain.

In the end all the threader does is, once the path is complete,
compute ranges for the path and the imports and then fold the
path exit stmt.

I'm not sure which names we need in the import set for this,
but I guess the set of imports constrains the names we
compute ranges for.  So for the BB above, if we want
a range for _1, we need _3 in the imports set even though it
is not in the set of GORI imports?

What I'm seeing with the threader is that we have quite some
cases where we have an "unrelated" CFG diamond on the path but
end up threading a random path through it (because neither the
path query infrastructure nor the copier knows to copy the
whole diamond).  When the def chains of the operands on the
controlling conditions are not in the import set then I
suppose the path range computation doesn't do anything to
simplify things there (we do after all specify a condition
result by means of arbitrarily choosing one of the paths).
Such diamonds are the main source of exponential behavior
of the path discovery and I'm thinking of ways to improve
things here.  The original backwards threader, supposedly
due to a bug, only ever considered one path for continuing
beyond a diamond - but maybe that was on purpose because
the intermediate condition doesn't meaningfully contribute
to resolving the exit condition.

Anyway, the important result of the change is that the imports
set is vastly smaller since it is now constrained to the
actual path

Re: [PATCH] Mips: Enable TSAN for 64-bit ABIs

2022-08-09 Thread Dimitrije Milosevic
Gentle ping. :)

From: Dimitrije Milosevic
Sent: Friday, July 29, 2022 12:38 PM
To: gcc-patches@gcc.gnu.org 
Cc: Djordje Todorovic ; xry...@xry111.site 
; mask...@google.com 
Subject: [PATCH] Mips: Enable TSAN for 64-bit ABIs 
 
The following patch enables TSAN for mips64, on which it is supported.

Signed-off-by: Dimitrije Milosevic .

libsanitizer/ChangeLog:

    * configure.tgt: Enable TSAN for 64-bit ABIs.
---
 libsanitizer/configure.tgt | 4 
 1 file changed, 4 insertions(+)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index fb89df4935c..6855a6ca9e7 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -55,6 +55,10 @@ case "${target}" in
   arm*-*-linux*)
 ;;
   mips*-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x8; then
+   TSAN_SUPPORTED=yes
+   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_mips64.lo
+   fi
 ;;
   aarch64*-*-linux*)
 if test x$ac_cv_sizeof_void_p = x8; then
-- 
2.25.1

Re: [PATCH 1/3] Factor out jobserver_active_p.

2022-08-09 Thread Richard Biener via Gcc-patches
On Tue, Aug 9, 2022 at 2:03 PM Martin Liška  wrote:
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * gcc.cc (driver::detect_jobserver): Remove and move to
> jobserver.h.
> * lto-wrapper.cc (jobserver_active_p): Likewise.
> (run_gcc): Likewise.
> * jobserver.h: New file.
> ---
>  gcc/gcc.cc | 36 +++-
>  gcc/jobserver.h| 85 ++
>  gcc/lto-wrapper.cc | 43 +--
>  3 files changed, 97 insertions(+), 67 deletions(-)
>  create mode 100644 gcc/jobserver.h
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 5cbb38560b2..69fbd293eaa 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -43,6 +43,7 @@ compilation is specified by a string called a "spec".  */
>  #include "opts.h"
>  #include "filenames.h"
>  #include "spellcheck.h"
> +#include "jobserver.h"
>
>
>
> @@ -9178,38 +9179,9 @@ driver::final_actions () const
>  void
>  driver::detect_jobserver () const
>  {
> -  /* Detect jobserver and drop it if it's not working.  */
> -  const char *makeflags = env.get ("MAKEFLAGS");
> -  if (makeflags != NULL)
> -{
> -  const char *needle = "--jobserver-auth=";
> -  const char *n = strstr (makeflags, needle);
> -  if (n != NULL)
> -   {
> - int rfd = -1;
> - int wfd = -1;
> -
> - bool jobserver
> -   = (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
> -  && rfd > 0
> -  && wfd > 0
> -  && is_valid_fd (rfd)
> -  && is_valid_fd (wfd));
> -
> - /* Drop the jobserver if it's not working now.  */
> - if (!jobserver)
> -   {
> - unsigned offset = n - makeflags;
> - char *dup = xstrdup (makeflags);
> - dup[offset] = '\0';
> -
> - const char *space = strchr (makeflags + offset, ' ');
> - if (space != NULL)
> -   strcpy (dup + offset, space);
> - xputenv (concat ("MAKEFLAGS=", dup, NULL));
> -   }
> -   }
> -}
> +  jobserver_info jinfo;
> +  if (!jinfo.is_active && !jinfo.skipped_makeflags.empty ())
> +xputenv (jinfo.skipped_makeflags.c_str ());
>  }
>
>  /* Determine what the exit code of the driver should be.  */
> diff --git a/gcc/jobserver.h b/gcc/jobserver.h
> new file mode 100644
> index 000..85453dd3c79
> --- /dev/null
> +++ b/gcc/jobserver.h
> @@ -0,0 +1,85 @@
> +/* GNU make's jobserver related functionality.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.
> +
> +See dbgcnt.def for usage information.  */
> +
> +#ifndef GCC_JOBSERVER_H
> +#define GCC_JOBSERVER_H
> +
> +#include 

C++ standard library includes have to go through system.h (#define
INCLUDE_STRING).

Does the API really have to use std::string?
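Purely as an illustration of the question, one shape such an interface could take without std::string is sketched below; `jobserver_fds` and `find_jobserver_auth` are hypothetical names, and descriptor validation (is_valid_fd in the patch) is left out. It mirrors the patch's choice of the last --jobserver-auth= occurrence (rfind) and returns a pointer into MAKEFLAGS that a caller could use to cut the token out.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical result type: plain ints, no std::string members.
struct jobserver_fds
{
  int rfd = -1;
  int wfd = -1;
};

// Find the last "--jobserver-auth=R,W" in MAKEFLAGS.  Returns a pointer to
// the token (or nullptr if absent) and fills FDS only when the token parses
// as two positive integers.
const char *
find_jobserver_auth (const char *makeflags, jobserver_fds &fds)
{
  static const char needle[] = "--jobserver-auth=";
  const char *last = nullptr;
  for (const char *p = std::strstr (makeflags, needle); p;
       p = std::strstr (p + 1, needle))
    last = p;
  if (last)
    {
      int r = -1, w = -1;
      if (std::sscanf (last + sizeof needle - 1, "%d,%d", &r, &w) == 2
          && r > 0 && w > 0)
        {
          fds.rfd = r;
          fds.wfd = w;
        }
    }
  return last;
}
```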

> +
> +using namespace std;
> +
> +struct jobserver_info
> +{
> +  /* Default constructor.  */
> +  jobserver_info ();
> +
> +  /* Error message if there is a problem.  */
> +  string error_msg = "";
> +  /* Skipped MAKEFLAGS where --jobserver-auth is skipped.  */
> +  string skipped_makeflags = "";
> +  /* File descriptor for reading used for jobserver communication.  */
> +  int rfd = -1;
> +  /* File descriptor for writing used for jobserver communication.  */
> +  int wfd = -1;
> +  /* Return true if jobserver is active.  */
> +  bool is_active = false;
> +};
> +
> +jobserver_info::jobserver_info ()
> +{
> +  /* Detect jobserver and drop it if it's not working.  */
> +  string js_needle = "--jobserver-auth=";
> +
> +  const char *envval = getenv ("MAKEFLAGS");
> +  if (envval != NULL)
> +{
> +  string makeflags = envval;
> +  size_t n = makeflags.rfind (js_needle);
> +  if (n != string::npos)
> +   {
> + if (sscanf (makeflags.c_str () + n + js_needle.size (),
> + "%d,%d", &rfd, &wfd) == 2
> + && rfd > 0
> + && wfd > 0
> + && is_valid_fd (rfd)
> + && is_valid_fd (wfd))
> +   is_active = true;
> + else
> +   {
> + string dup = makeflags.substr (0, n);
> + size_t pos = makeflags.find (' ', n);
> + if (pos != string::npos)
> +   dup += m