Re: [PATCH] i386: simplify cpu_feature handling

2022-03-31 Thread Martin Liška

@Jakub: May I install it once stage1 opens?

Cheers,
Martin

On 1/3/22 12:43, Martin Liška wrote:

PING: Jakub?

On 12/15/21 10:57, Martin Liška wrote:

On 12/14/21 17:12, Jakub Jelinek wrote:

I'd use INT_TYPE_SIZE - 1 instead of 31.  Otherwise LGTM.


Installed with that change, thanks.

Moreover, I'm suggesting a simplification:

The patch removes unneeded loops for cpu_features2 and CONVERT_EXPR
that can be simplified with NOP_EXPR.

Survives i386.exp tests, may I install the patch after testing or
is it a stage1 material?

Thanks,
Martin






[PATCH] rtl-optimization/105091 - wrong DSE with missed TREE_ADDRESSABLE

2022-03-31 Thread Richard Biener via Gcc-patches
When expanding an aggregate copy into a memcpy call RTL expansion
uses mark_addressable to ensure the base object is addressable but
that function doesn't handle TARGET_MEM_REF bases.  Fixed as follows.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-03-31  Richard Biener  

PR rtl-optimization/105091
* gimple-expr.cc (mark_addressable): Handle TARGET_MEM_REF
bases.
---
 gcc/gimple-expr.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-expr.cc b/gcc/gimple-expr.cc
index f9a650b5daf..5faaf43eaf5 100644
--- a/gcc/gimple-expr.cc
+++ b/gcc/gimple-expr.cc
@@ -910,7 +910,8 @@ mark_addressable (tree x)
 x = TREE_OPERAND (x, 0);
   while (handled_component_p (x))
 x = TREE_OPERAND (x, 0);
-  if (TREE_CODE (x) == MEM_REF
+  if ((TREE_CODE (x) == MEM_REF
+   || TREE_CODE (x) == TARGET_MEM_REF)
   && TREE_CODE (TREE_OPERAND (x, 0)) == ADDR_EXPR)
 x = TREE_OPERAND (TREE_OPERAND (x, 0), 0);
   if (!VAR_P (x)
-- 
2.34.1


C++ modules and AAPCS/ARM EABI clash on inline key methods

2022-03-31 Thread Alexandre Oliva via Gcc-patches


g++.dg/modules/virt-2_a.C fails on arm-eabi and many other arm targets
that use the AAPCS variant.  ARM is the only target that overrides
TARGET_CXX_KEY_METHOD_MAY_BE_INLINE.  It's not clear to me which way the
clash between AAPCS and C++ Modules design should be resolved, but
currently it favors AAPCS and thus the test fails.

Should we skip the test on ARM, XFAIL it, or put in some kludge like
the patchlet below?


diff --git a/gcc/testsuite/g++.dg/modules/virt-2_a.C 
b/gcc/testsuite/g++.dg/modules/virt-2_a.C
index 9115cc19cc286..40f30137f0086 100644
--- a/gcc/testsuite/g++.dg/modules/virt-2_a.C
+++ b/gcc/testsuite/g++.dg/modules/virt-2_a.C
@@ -10,7 +10,9 @@ export struct Visitor
 
 // Key function explicitly inline (regardless of p1779's state)
 // We emit vtables & rtti only in this TU
+#if !__ARM_EABI__
 inline // Yoink!
+#endif
   int Visitor::Visit ()
 {
   return 0;


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[committed][nvptx] Fix ASM_SPEC workaround for sm_30

2022-03-31 Thread Tom de Vries via Gcc-patches
Hi,

Newer versions of CUDA no longer support sm_30, and nvptx-tools as
currently doesn't handle that gracefully when verifying
( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ).

There's a --no-verify work-around in place in ASM_SPEC, but that one doesn't
work when using -Wa,--verify on the command line.

Use a more robust workaround: verify using sm_35 when misa=sm_30 is specified
(either implicitly or explicitly).

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix ASM_SPEC workaround for sm_30

gcc/ChangeLog:

2022-03-30  Tom de Vries  

* config/nvptx/nvptx.h (ASM_SPEC): Use "-m sm_35" for -misa=sm_30.

---
 gcc/config/nvptx/nvptx.h | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 75ac7a666b1..3b06f33032f 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -29,10 +29,24 @@
 
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
-/* Default needs to be in sync with default for misa in nvptx.opt.
-   We add a default here to work around a hard-coded sm_30 default in
-   nvptx-as.  */
-#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}%{misa=sm_30:--no-verify}"
+/* Newer versions of CUDA no longer support sm_30, and nvptx-tools as
+   currently doesn't handle that gracefully when verifying
+   ( https://github.com/MentorEmbedded/nvptx-tools/issues/30 ).  Work around
+   this by verifying with sm_35 when having misa=sm_30 (either implicitly
+   or explicitly).  */
+#define ASM_SPEC   \
+  "%{" \
+  /* Explict misa=sm_30.  */   \
+  "misa=sm_30:-m sm_35"\
+  /* Separator. */ \
+  "; " \
+  /* Catch-all. */ \
+  "misa=*:-m %*"   \
+  /* Separator. */ \
+  "; " \
+  /* Implicit misa=sm_30.  */  \
+  ":-m sm_35"  \
+  "}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 


[PATCH] tree-optimization/105109 - bogus uninit diagnostic with _Complex

2022-03-31 Thread Richard Biener via Gcc-patches
When update_address_taken rewrites a _Complex into SSA it changes
stores to real/imaginary parts to loads of the other component and
a COMPLEX_EXPR.  That matches what gimplification does but it misses
suppression of diagnostics for the load of the other component.
The following patch adds that, syncing up gimplification and
update_address_taken behavior.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-03-31  Richard Biener  

PR tree-optimization/105109
* tree-ssa.cc (execute_update_addresses_taken): Suppress
diagnostics on the load of the other complex component.

* gcc.dg/uninit-pr105109.c: New testcase.
---
 gcc/testsuite/gcc.dg/uninit-pr105109.c | 15 +++
 gcc/tree-ssa.cc|  1 +
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/uninit-pr105109.c

diff --git a/gcc/testsuite/gcc.dg/uninit-pr105109.c 
b/gcc/testsuite/gcc.dg/uninit-pr105109.c
new file mode 100644
index 000..001003ca261
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr105109.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O -Wuninitialized" } */
+
+static void foo(int dim,float _Complex f0[])
+{
+  int d;
+  f0[0] -= 3.14;  /* { dg-bogus "uninitialized" } */
+  for (d = 0; d < dim; ++d) f0[0] += 3.14;
+}
+void bar(int dim, const float _Complex u_t[], float _Complex f0[])
+{
+  float _Complex exp[1] = {0.};
+  foo(dim, exp);
+  f0[0] = u_t[0] - exp[0];
+}
diff --git a/gcc/tree-ssa.cc b/gcc/tree-ssa.cc
index 6dcb3142869..a362a0a9ea6 100644
--- a/gcc/tree-ssa.cc
+++ b/gcc/tree-ssa.cc
@@ -1905,6 +1905,7 @@ execute_update_addresses_taken (void)
? REALPART_EXPR : IMAGPART_EXPR,
TREE_TYPE (other),
TREE_OPERAND (lhs, 0));
+   suppress_warning (lrhs);
gimple *load = gimple_build_assign (other, lrhs);
location_t loc = gimple_location (stmt);
gimple_set_location (load, loc);
-- 
2.34.1


[committed][nvptx, testsuite] Fix typo in gcc.target/nvptx/march.c

2022-03-31 Thread Tom de Vries via Gcc-patches
Hi,

The dg-options line in gcc.target/nvptx/march.c:
...
/* { dg-options "-march=sm_30"} */
...
currently doesn't have any effect because it's missing a space between '"' and
'}'.

Fix this by adding the missing space.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Fix typo in gcc.target/nvptx/march.c

gcc/testsuite/ChangeLog:

2022-03-31  Tom de Vries  

* gcc.target/nvptx/march.c: Add missing space in dg-options line.

---
 gcc/testsuite/gcc.target/nvptx/march.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/march.c 
b/gcc/testsuite/gcc.target/nvptx/march.c
index ec91f21c903..d1dd715798c 100644
--- a/gcc/testsuite/gcc.target/nvptx/march.c
+++ b/gcc/testsuite/gcc.target/nvptx/march.c
@@ -1,4 +1,4 @@
-/* { dg-options "-march=sm_30"} */
+/* { dg-options "-march=sm_30" } */
 
 #include "main.c"
 


Re: [PATCH] Split vector load from parm_del to elemental loads to avoid STLF stalls.

2022-03-31 Thread Richard Biener via Gcc-patches
On Thu, Mar 31, 2022 at 7:51 AM liuhongt  wrote:
>
> Since cfg is freed before machine_reorg, just do a rough calculation
> of the window according to the layout.
> Also according to an experiment on CLX, set window size to 64.
>
> Currently only handle V2DFmode load since it doesn't need any scratch
> registers, and it's sufficient to recover cray performance for -O2
> compared to GCC11.
>
> Bootstrap and regtest on x86_64-pc-linux-gnu{-m32,}.
> No impact for SPEC2017(same binary for both O2 and Ofast).
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/101908
> * config/i386/i386.cc (ix86_split_stlf_stall_load): New
> function
> (ix86_reorg): Call ix86_split_stlf_stall_load.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr101908-1.c: New test.
> * gcc.target/i386/pr101908-2.c: New test.
> ---
>  gcc/config/i386/i386.cc| 47 ++
>  gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 ++
>  gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 ++
>  3 files changed, 71 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 5a561966eb4..f9169b04d43 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -21933,7 +21933,53 @@ ix86_seh_fixup_eh_fallthru (void)
>emit_insn_after (gen_nops (const1_rtx), insn);
>  }
>  }
> +/* Split vector load from parm_decl to elemental loads to avoid STLF
> +   stalls.  */
> +static void
> +ix86_split_stlf_stall_load ()
> +{
> +  basic_block bb;
> +  unsigned window = 0;
> +  FOR_EACH_BB_FN (bb, cfun)
> +{
> +  rtx_insn *insn;
> +  FOR_BB_INSNS (bb, insn)
> +   {
> + if (!NONDEBUG_INSN_P (insn))
> +   continue;
> + window++;
> + /* Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between 
> each
> +other, just emulate for pipeline) before stalled load, stlf stall
> +case is as fast as no stall cases on CLX.
> +Since CFG is freed before machine_reorg, just do a rough
> +calculation of the window according to the layout.  */
> + if (window > 64)
> +   return;

I wonder if we should also return for any_uncondjump_p (insn)
(not sure if that captures returnjump_p), or maybe just explicitely
allow any_condjump_p and reject other PC setters.

Likewise we might want to stop at a LABEL that can be backwards reached.

I suppose people more familiar with cfgrtl can suggest something better.

> + rtx set = single_set (insn);
> + if (!set)
> +   continue;
> + rtx src = SET_SRC (set);
> + if (!MEM_P (src)
> + /* Only handle V2DFmode load since it doesn't need any scratch
> +register.  */
> + || GET_MODE (src) != E_V2DFmode
> + || !MEM_EXPR (src)
> + || TREE_CODE (get_base_address (MEM_EXPR (src))) != PARM_DECL)

I wonder if we have (easy) ways to detect whether XEXP (src, 0) is
frame/stack based
rather than requiring a MEM_EXPR.  There is may_be_sp_based_p ()
exported from alias.c
for example, but I'm not sure whether that works after RA & frame elimination.

> +   continue;
> +
> + rtx zero = CONST0_RTX (V2DFmode);
> + rtx dest = SET_DEST (set);
> + rtx m = adjust_address (src, DFmode, 0);
> + emit_insn_before (gen_sse2_loadlpd (dest, zero, m), insn);

Can SSE1 also do this?

> + m = adjust_address (src, DFmode, 8);
> + PATTERN (insn) = gen_sse2_loadhpd (dest, dest, m);
> + INSN_CODE (insn) = -1;
> + gcc_assert (recog_memoized (insn) != -1);

I think we want to dump something into dump_file when we split an insn here.

> +   }
> +}
> +
> +}
>  /* Implement machine specific optimizations.  We implement padding of returns
> for K8 CPUs and pass to avoid 4 jumps in the single 16 byte window.  */
>  static void
> @@ -21948,6 +21994,7 @@ ix86_reorg (void)
>
>if (optimize && optimize_function_for_speed_p (cfun))
>  {
> +  ix86_split_stlf_stall_load ();
>if (TARGET_PAD_SHORT_FUNCTION)
> ix86_pad_short_function ();

btw. this function suggests we do have edges, so doing something "better"
than FOR_EACH_BB_FN, aka walking blocks in layout order, might be
possible after all.  For example ix86_avoid_jump_mispredicts just walks
the function by looking at get_insns(), that might be more closely what
"as laid out" is.

>else if (TARGET_PAD_RETURNS)
> diff --git a/gcc/testsuite/gcc.target/i386/pr101908-1.c 
> b/gcc/testsuite/gcc.target/i386/pr101908-1.c
> new file mode 100644
> index 000..33d9684f0ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101908-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx" } */
> +/* { dg-final { scan-assem

Re: Test for linking for arm/size-optimization-ieee-[123].c

2022-03-31 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva via Gcc-patches  writes:
> These tests require a target that supports arm soft-float.  The
> problem is that the test checks for compile-time soft-float support,
> but they may hit a problem when the linker complains that it can't
> combine the testcase's object file with hard-float init files and
> target system libraries.
>
> I don't see that the tests actually require linking, and they could be
> simplified to dg-do assemble, but I figured a link test for soft-float
> support could be useful, so I added that, and adjusted the tests to
> require it instead.
>
> Tested on an affected platform.  Ok to install?
>
>
> for  gcc/testsuite/ChangeLog
>
>   * lib/target-supports.exp
>   (check_effective_target_arm_soft_ok_link): New.
>   * gcc.target/arm/size-optimization-ieee-1.c: Use it.
>   * gcc.target/arm/size-optimization-ieee-2.c: Likewise.
>   * gcc.target/arm/size-optimization-ieee-3.c: Likewise.

OK, thanks.

Richard

> ---
>  .../gcc.target/arm/size-optimization-ieee-1.c  |2 +-
>  .../gcc.target/arm/size-optimization-ieee-2.c  |2 +-
>  .../gcc.target/arm/size-optimization-ieee-3.c  |2 +-
>  gcc/testsuite/lib/target-supports.exp  |   12 
>  4 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c 
> b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
> index 61475eb4c679f..9af2c6e102071 100644
> --- a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
> +++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do link { target arm_soft_ok } } */
> +/* { dg-do link { target arm_soft_ok_link } } */
>  /* { dg-skip-if "Feature is -mfloat-abi=soft only" { *-*-* } { 
> "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
>  /* { dg-options "-mfloat-abi=soft" } */
>  
> diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c 
> b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
> index b4699271ceacc..e78a7ada62eb7 100644
> --- a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
> +++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do link { target arm_soft_ok } } */
> +/* { dg-do link { target arm_soft_ok_link } } */
>  /* { dg-skip-if "Feature is -mfloat-abi=soft only" { *-*-* } { 
> "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
>  /* { dg-options "-mfloat-abi=soft" } */
>  
> diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c 
> b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c
> index 34b1ebe7afd60..bb9ccefda5e95 100644
> --- a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c
> +++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c
> @@ -1,4 +1,4 @@
> -/* { dg-do link { target arm_soft_ok } } */
> +/* { dg-do link { target arm_soft_ok_link } } */
>  /* { dg-skip-if "Feature is -mfloat-abi=soft only" { *-*-* } { 
> "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
>  /* { dg-options "-mfloat-abi=soft" } */
>  
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index a1aef0e0a1622..ff8edbd3e1776 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3935,6 +3935,18 @@ proc check_effective_target_arm_soft_ok { } {
>   } "-mfloat-abi=soft"]
>  }
>  
> +# Return 1 if this is an ARM target supporting -mfloat-abi=soft even
> +# for linking.  Some multilibs may be incompatible with this option,
> +# and some linkers may reject incompatible options.
> +
> +proc check_effective_target_arm_soft_ok_link { } {
> +return [check_no_compiler_messages arm_soft_ok_link executable {
> + #include 
> + int dummy;
> + int main (void) { return 0; }
> + } "-mfloat-abi=soft"]
> +}
> +
>  # Return 1 if this is an ARM target supporting -mfpu=vfp with an
>  # appropriate abi.


[RFC/gcov 00/12] Add merge-stream subcommand to gcov-tool

2022-03-31 Thread Sebastian Huber
This patch set is a proof of concept.

The aim is to better support gcov in free-standing environments. For example,
you can run a test executable which dumps all gcov info objects in a serial
data stream using __gcov_info_to_gcda() and the new __gcov_filename_to_gcfn().
It could be encoded as base64. It could be also compressed. On the host you
unpack the encoded data stream and feed it into gcov-tool using the new
"merge-stream" subcommand:

gcov-tool --help
Usage: gcov-tool [OPTION]... SUB_COMMAND [OPTION]...

Offline tool to handle gcda counts

  -h, --helpPrint this help, then exit
  -v, --version Print version number, then exit
  merge-stream [options] [stream-file]  Merge coverage stream file (or stdin)
and coverage file contents
-v, --verbose   Verbose mode
-w, --weight Set weights (float point values)

Example:

base64 -d log.txt | gcov-tool merge-stream

The patch set does not change the format of gcda files.

TODO:

* Documentation
* Tests

Sebastian Huber (12):
  gcov-tool: Allow merging of empty profile lists
  gcov: Add mode to all gcov_open()
  gcov: Add open mode parameter to gcov_do_dump()
  gcov: Make gcov_seek() static
  gcov: Add __gcov_filename_to_gcfn()
  gcov-tool: Support file input from stdin
  gcov: Use xstrdup()
  gcov: Move prepend to list to read_gcda_file()
  gcov: Move gcov_open() to caller of read_gcda_file()
  gcov: Fix integer types in ftw_read_file()
  gcov: Record EOF error during read
  gcov-tool: Add merge-stream subcommand

 gcc/gcov-io.cc |  76 ++-
 gcc/gcov-io.h  |  35 +
 gcc/gcov-tool.cc   | 107 +-
 libgcc/gcov.h  |  17 -
 libgcc/libgcov-driver-system.c |   7 +-
 libgcc/libgcov-driver.c|  42 --
 libgcc/libgcov-util.c  | 135 +
 libgcc/libgcov.h   |   3 -
 8 files changed, 326 insertions(+), 96 deletions(-)

-- 
2.34.1



[RFC/gcov 01/12] gcov-tool: Allow merging of empty profile lists

2022-03-31 Thread Sebastian Huber
The gcov_profile_merge() already had code to deal with profile information
which had no counterpart to merge with.  For profile information from files
with no associated counterpart, the profile information is simply used as is
with the weighting transformation applied.  Make sure that gcov_profile_merge()
works with an empty target profile list.  Return the merged profile list.

gcc/
* gcov-tool.cc (gcov_profile_merge): Adjust return type.
(profile_merge): Allow merging of directories which contain no profile
files.

libgcc/
* libgcov-util.c (gcov_profile_merge): Return the list of merged
profiles.  Accept empty target and source profile lists.
---
 gcc/gcov-tool.cc  | 27 ++-
 libgcc/libgcov-util.c | 19 +++
 2 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/gcc/gcov-tool.cc b/gcc/gcov-tool.cc
index f4e42ae763c..2e4083a664d 100644
--- a/gcc/gcov-tool.cc
+++ b/gcc/gcov-tool.cc
@@ -40,7 +40,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #endif
 #include 
 
-extern int gcov_profile_merge (struct gcov_info*, struct gcov_info*, int, int);
+extern struct gcov_info *gcov_profile_merge (struct gcov_info*,
+struct gcov_info*, int, int);
 extern int gcov_profile_overlap (struct gcov_info*, struct gcov_info*);
 extern int gcov_profile_normalize (struct gcov_info*, gcov_type);
 extern int gcov_profile_scale (struct gcov_info*, float, int, int);
@@ -141,26 +142,18 @@ profile_merge (const char *d1, const char *d2, const char 
*out, int w1, int w2)
 {
   struct gcov_info *d1_profile;
   struct gcov_info *d2_profile;
-  int ret;
+  struct gcov_info *merged_profile;
 
   d1_profile = gcov_read_profile_dir (d1, 0);
-  if (!d1_profile)
-return 1;
-
-  if (d2)
-{
-  d2_profile = gcov_read_profile_dir (d2, 0);
-  if (!d2_profile)
-return 1;
+  d2_profile = gcov_read_profile_dir (d2, 0);
 
-  /* The actual merge: we overwrite to d1_profile.  */
-  ret = gcov_profile_merge (d1_profile, d2_profile, w1, w2);
+  /* The actual merge: we overwrite to d1_profile.  */
+  merged_profile = gcov_profile_merge (d1_profile, d2_profile, w1, w2);
 
-  if (ret)
-return ret;
-}
-
-  gcov_output_files (out, d1_profile);
+  if (merged_profile)
+gcov_output_files (out, merged_profile);
+  else if (verbose)
+fnotice (stdout, "no profile files were merged\n");
 
   return 0;
 }
diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index ba7fb924b53..100f1b19f1a 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -674,16 +674,16 @@ find_match_gcov_info (struct gcov_info **array, int size,
 }
 
 /* Merge the list of gcov_info objects from SRC_PROFILE to TGT_PROFILE.
-   Return 0 on success: without mismatch.
-   Reutrn 1 on error.  */
+   Return the list of merged gcov_info objects.  Return NULL if the list is
+   empty.  */
 
-int
+struct gcov_info *
 gcov_profile_merge (struct gcov_info *tgt_profile, struct gcov_info 
*src_profile,
 int w1, int w2)
 {
   struct gcov_info *gi_ptr;
   struct gcov_info **tgt_infos;
-  struct gcov_info *tgt_tail;
+  struct gcov_info **tgt_tail;
   struct gcov_info **in_src_not_tgt;
   unsigned tgt_cnt = 0, src_cnt = 0;
   unsigned unmatch_info_cnt = 0;
@@ -703,7 +703,10 @@ gcov_profile_merge (struct gcov_info *tgt_profile, struct 
gcov_info *src_profile
   for (gi_ptr = tgt_profile, i = 0; gi_ptr; gi_ptr = gi_ptr->next, i++)
 tgt_infos[i] = gi_ptr;
 
-  tgt_tail = tgt_infos[tgt_cnt - 1];
+  if (tgt_cnt)
+ tgt_tail = &tgt_infos[tgt_cnt - 1]->next;
+  else
+ tgt_tail = &tgt_profile;
 
   /* First pass on tgt_profile, we multiply w1 to all counters.  */
   if (w1 > 1)
@@ -732,14 +735,14 @@ gcov_profile_merge (struct gcov_info *tgt_profile, struct 
gcov_info *src_profile
   gi_ptr = in_src_not_tgt[i];
   gcov_merge (gi_ptr, gi_ptr, w2 - 1);
   gi_ptr->next = NULL;
-  tgt_tail->next = gi_ptr;
-  tgt_tail = gi_ptr;
+  *tgt_tail = gi_ptr;
+  tgt_tail = &gi_ptr->next;
 }
 
   free (in_src_not_tgt);
   free (tgt_infos);
 
-  return 0;
+  return tgt_profile;
 }
 
 typedef gcov_type (*counter_op_fn) (gcov_type, void*, void*);
-- 
2.34.1



[RFC/gcov 02/12] gcov: Add mode to all gcov_open()

2022-03-31 Thread Sebastian Huber
gcc/

* gcov-io.cc (gcov_open): Always use the mode parameter.
* gcov-io.h (gcov_open): Declare it unconditionally.

libgcc/

* libgcov-driver-system.c (gcov_exit_open_gcda_file): Open file for
reading and writing.
* libgcov-util.c (read_gcda_file): Open file for reading.
* libgcov.h (gcov_open): Delete declaration.
---
 gcc/gcov-io.cc | 7 ---
 gcc/gcov-io.h  | 5 +
 libgcc/libgcov-driver-system.c | 4 ++--
 libgcc/libgcov-util.c  | 2 +-
 libgcc/libgcov.h   | 1 -
 5 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/gcc/gcov-io.cc b/gcc/gcov-io.cc
index 72c40f8eaa0..017a6e32a5d 100644
--- a/gcc/gcov-io.cc
+++ b/gcc/gcov-io.cc
@@ -89,15 +89,8 @@ from_file (gcov_unsigned_t value)
Return zero on failure, non-zero on success.  */
 
 GCOV_LINKAGE int
-#if IN_LIBGCOV
-gcov_open (const char *name)
-#else
 gcov_open (const char *name, int mode)
-#endif
 {
-#if IN_LIBGCOV
-  int mode = 0;
-#endif
 #if GCOV_LOCKED
   struct flock s_flock;
   int fd;
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 99ce7dbccc8..afe74b002f1 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -347,15 +347,12 @@ struct gcov_summary
functions for writing.  Your file may become corrupted if you break
these invariants.  */
 
-#if !IN_LIBGCOV
-GCOV_LINKAGE int gcov_open (const char */*name*/, int /*direction*/);
-#endif
-
 #if !IN_LIBGCOV || defined (IN_GCOV_TOOL)
 GCOV_LINKAGE int gcov_magic (gcov_unsigned_t, gcov_unsigned_t);
 #endif
 
 /* Available everywhere.  */
+GCOV_LINKAGE int gcov_open (const char *, int) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE int gcov_close (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE gcov_unsigned_t gcov_read_unsigned (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE gcov_type gcov_read_counter (void) ATTRIBUTE_HIDDEN;
diff --git a/libgcc/libgcov-driver-system.c b/libgcc/libgcov-driver-system.c
index eef6e3cbda1..9abb2fe7f74 100644
--- a/libgcc/libgcov-driver-system.c
+++ b/libgcc/libgcov-driver-system.c
@@ -309,7 +309,7 @@ gcov_exit_open_gcda_file (struct gcov_info *gi_ptr,
 
   gf->filename = replace_filename_variables (gf->filename);
 
-  if (!gcov_open (gf->filename))
+  if (!gcov_open (gf->filename, 0))
 {
   /* Open failed likely due to missed directory.
  Create directory and retry to open file. */
@@ -318,7 +318,7 @@ gcov_exit_open_gcda_file (struct gcov_info *gi_ptr,
   fprintf (stderr, "profiling:%s:Skip\n", gf->filename);
   return -1;
 }
-  if (!gcov_open (gf->filename))
+  if (!gcov_open (gf->filename, 0))
 {
   fprintf (stderr, "profiling:%s:Cannot open\n", gf->filename);
   return -1;
diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index 100f1b19f1a..db157220c9d 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -268,7 +268,7 @@ read_gcda_file (const char *filename)
 k_ctrs_mask[i] = 0;
   k_ctrs_types = 0;
 
-  if (!gcov_open (filename))
+  if (!gcov_open (filename, 1))
 {
   fnotice (stderr, "%s:cannot open\n", filename);
   return NULL;
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 40e845ce3ea..f190547e819 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -343,7 +343,6 @@ extern int __gcov_execve (const char *, char  *const [], 
char *const [])
   ATTRIBUTE_HIDDEN;
 
 /* Functions that only available in libgcov.  */
-GCOV_LINKAGE int gcov_open (const char */*name*/) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE void gcov_write_summary (gcov_unsigned_t /*tag*/,
   const struct gcov_summary *)
 ATTRIBUTE_HIDDEN;
-- 
2.34.1



[RFC/gcov 03/12] gcov: Add open mode parameter to gcov_do_dump()

2022-03-31 Thread Sebastian Huber
gcc/

* gcov-tool.cc (gcov_do_dump): Add mode parameter.
(gcov_output_files): Open files for reading and writing.

libgcc/

* libgcov-driver-system.c (gcov_exit_open_gcda_file): Add mode
parameter.  Pass mode to gcov_open() calls.
* libgcov-driver.c (dump_one_gcov):  Add mode parameter.  Pass mode to
gcov_exit_open_gcda_file() call.
(gcov_do_dump): Add mode parameter.  Pass mode to dump_one_gcov()
calls.
(__gcov_dump_one):  Open file for reading and writing.
---
 gcc/gcov-tool.cc   |  4 ++--
 libgcc/libgcov-driver-system.c |  7 ---
 libgcc/libgcov-driver.c| 12 ++--
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/gcc/gcov-tool.cc b/gcc/gcov-tool.cc
index 2e4083a664d..d712715cf7e 100644
--- a/gcc/gcov-tool.cc
+++ b/gcc/gcov-tool.cc
@@ -46,7 +46,7 @@ extern int gcov_profile_overlap (struct gcov_info*, struct 
gcov_info*);
 extern int gcov_profile_normalize (struct gcov_info*, gcov_type);
 extern int gcov_profile_scale (struct gcov_info*, float, int, int);
 extern struct gcov_info* gcov_read_profile_dir (const char*, int);
-extern void gcov_do_dump (struct gcov_info *, int);
+extern void gcov_do_dump (struct gcov_info *, int, int);
 extern const char *gcov_get_filename (struct gcov_info *list);
 extern void gcov_set_verbose (void);
 
@@ -124,7 +124,7 @@ gcov_output_files (const char *out, struct gcov_info 
*profile)
 fatal_error (input_location, "output file %s already exists in folder %s",
 filename, out);
 
-  gcov_do_dump (profile, 0);
+  gcov_do_dump (profile, 0, 0);
 
   ret = chdir (pwd);
   if (ret)
diff --git a/libgcc/libgcov-driver-system.c b/libgcc/libgcov-driver-system.c
index 9abb2fe7f74..ac405c38e3a 100644
--- a/libgcc/libgcov-driver-system.c
+++ b/libgcc/libgcov-driver-system.c
@@ -261,7 +261,8 @@ allocate_filename_struct (struct gcov_filename *gf)
 
 static int
 gcov_exit_open_gcda_file (struct gcov_info *gi_ptr,
- struct gcov_filename *gf)
+ struct gcov_filename *gf,
+ int mode)
 {
   int append_slash = 0;
   const char *fname = gi_ptr->filename;
@@ -309,7 +310,7 @@ gcov_exit_open_gcda_file (struct gcov_info *gi_ptr,
 
   gf->filename = replace_filename_variables (gf->filename);
 
-  if (!gcov_open (gf->filename, 0))
+  if (!gcov_open (gf->filename, mode))
 {
   /* Open failed likely due to missed directory.
  Create directory and retry to open file. */
@@ -318,7 +319,7 @@ gcov_exit_open_gcda_file (struct gcov_info *gi_ptr,
   fprintf (stderr, "profiling:%s:Skip\n", gf->filename);
   return -1;
 }
-  if (!gcov_open (gf->filename, 0))
+  if (!gcov_open (gf->filename, mode))
 {
   fprintf (stderr, "profiling:%s:Cannot open\n", gf->filename);
   return -1;
diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index 7e52c5676e5..10831e84b61 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -595,14 +595,14 @@ write_one_data (const struct gcov_info *gi_ptr,
 static void
 dump_one_gcov (struct gcov_info *gi_ptr, struct gcov_filename *gf,
   unsigned run_counted ATTRIBUTE_UNUSED,
-  gcov_type run_max ATTRIBUTE_UNUSED)
+  gcov_type run_max ATTRIBUTE_UNUSED, int mode)
 {
   struct gcov_summary summary = {};
   int error;
   gcov_unsigned_t tag;
   fn_buffer = 0;
 
-  error = gcov_exit_open_gcda_file (gi_ptr, gf);
+  error = gcov_exit_open_gcda_file (gi_ptr, gf, mode);
   if (error == -1)
 return;
 
@@ -649,13 +649,13 @@ read_fatal:;
 
 /* Dump all the coverage counts for the program. It first computes program
summary and then traverses gcov_list list and dumps the gcov_info
-   objects one by one.  */
+   objects one by one.  Use MODE to open files.  */
 
 #if !IN_GCOV_TOOL
 static
 #endif
 void
-gcov_do_dump (struct gcov_info *list, int run_counted)
+gcov_do_dump (struct gcov_info *list, int run_counted, int mode)
 {
   struct gcov_info *gi_ptr;
   struct gcov_filename gf;
@@ -678,7 +678,7 @@ gcov_do_dump (struct gcov_info *list, int run_counted)
   /* Now merge each file.  */
   for (gi_ptr = list; gi_ptr; gi_ptr = gi_ptr->next)
 {
-  dump_one_gcov (gi_ptr, &gf, run_counted, run_max);
+  dump_one_gcov (gi_ptr, &gf, run_counted, run_max, mode);
   free (gf.filename);
 }
 
@@ -701,7 +701,7 @@ __gcov_dump_one (struct gcov_root *root)
   if (root->dumped)
 return;
 
-  gcov_do_dump (root->list, root->run_counted);
+  gcov_do_dump (root->list, root->run_counted, 0);
   
   root->dumped = 1;
   root->run_counted = 1;
-- 
2.34.1



[RFC/gcov 06/12] gcov-tool: Support file input from stdin

2022-03-31 Thread Sebastian Huber
gcc/

* gcov-io.cc (GCOV_MODE_STDIN): Define.
(gcov_position): For gcov-tool, return calculated position if file is
stdin.
(gcov_open):  For gcov-tool, use stdin if filename is NULL.
(gcov_close): For gcov-tool, do not close stdin.
(gcov_read_bytes): For gcov-tool, update position if file is stdin.
(gcov_sync): For gcov-tool, discard input if file is stdin.
---
 gcc/gcov-io.cc | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/gcc/gcov-io.cc b/gcc/gcov-io.cc
index fee3130f94a..177f81166a6 100644
--- a/gcc/gcov-io.cc
+++ b/gcc/gcov-io.cc
@@ -35,8 +35,13 @@ struct gcov_var
   int error;   /* < 0 overflow, > 0 disk error.  */
   int mode;/* < 0 writing, > 0 reading.  */
   int endian;  /* Swap endianness.  */
+#ifdef IN_GCOV_TOOL
+  gcov_position_t pos; /* File position for stdin support. */
+#endif
 } gcov_var;
 
+#define GCOV_MODE_STDIN 2
+
 /* Save the current position in the gcov file.  */
 /* We need to expose this function when compiling for gcov-tool.  */
 #ifndef IN_GCOV_TOOL
@@ -45,6 +50,10 @@ static inline
 gcov_position_t
 gcov_position (void)
 {
+#ifdef IN_GCOV_TOOL
+  if (gcov_var.mode == GCOV_MODE_STDIN)
+return gcov_var.pos;
+#endif
   return ftell (gcov_var.file);
 }
 
@@ -108,6 +117,16 @@ gcov_open (const char *name, int mode)
 #if !IN_LIBGCOV || defined (IN_GCOV_TOOL)
   gcov_var.endian = 0;
 #endif
+#ifdef IN_GCOV_TOOL
+  gcov_var.pos = 0;
+  if (!name)
+{
+  gcov_nonruntime_assert (gcov_var.mode > 0);
+  gcov_var.file = stdin;
+  gcov_var.mode = GCOV_MODE_STDIN;
+  return 1;
+}
+#endif
 #if GCOV_LOCKED
   if (mode > 0)
 {
@@ -190,6 +209,11 @@ gcov_open (const char *name, int mode)
 GCOV_LINKAGE int
 gcov_close (void)
 {
+#ifdef IN_GCOV_TOOL
+  if (gcov_var.file == stdin)
+gcov_var.file = 0;
+  else
+#endif
   if (gcov_var.file)
 {
   if (fclose (gcov_var.file))
@@ -363,6 +387,9 @@ gcov_read_bytes (void *buffer, unsigned count)
   if (read != 1)
 return NULL;
 
+#ifdef IN_GCOV_TOOL
+  gcov_var.pos += count;
+#endif
   return buffer;
 }
 
@@ -499,6 +526,17 @@ gcov_sync (gcov_position_t base, gcov_unsigned_t length)
 {
   gcov_nonruntime_assert (gcov_var.mode > 0);
   base += length;
+#ifdef IN_GCOV_TOOL
+  if (gcov_var.mode == GCOV_MODE_STDIN)
+{
+  while (gcov_var.pos < base)
+   {
+ ++gcov_var.pos;
+ (void)fgetc(gcov_var.file);
+   }
+  return;
+}
+#endif
   fseek (gcov_var.file, base, SEEK_SET);
 }
 #endif
-- 
2.34.1



[RFC/gcov 04/12] gcov: Make gcov_seek() static

2022-03-31 Thread Sebastian Huber
This function is only used by gcov_write_length() in the gcov-io.cc file.

gcc/

* gcov-io.cc (gcov_seek): Make it static.
* gcov-io.h (struct gcov_summary): Do not mention gcov_seek().

libgcc/

* libgcov.h (gcov_seek): Remove define and declaration.
---
 gcc/gcov-io.cc   | 4 +---
 gcc/gcov-io.h| 6 +++---
 libgcc/libgcov.h | 2 --
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/gcc/gcov-io.cc b/gcc/gcov-io.cc
index 017a6e32a5d..fee3130f94a 100644
--- a/gcc/gcov-io.cc
+++ b/gcc/gcov-io.cc
@@ -294,17 +294,15 @@ gcov_write_filename (const char *filename)
 
   gcov_write_string (filename);
 }
-#endif
 
 /* Move to a given position in a gcov file.  */
 
-GCOV_LINKAGE void
+static void
 gcov_seek (gcov_position_t base)
 {
   fseek (gcov_var.file, base, SEEK_SET);
 }
 
-#if !IN_LIBGCOV
 /* Write a tag TAG and reserve space for the record length. Return a
value to be used for gcov_write_length.  */
 
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index afe74b002f1..204ae0ccf7f 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -340,9 +340,9 @@ struct gcov_summary
 /* Functions for reading and writing gcov files. In libgcov you can
open the file for reading then writing. Elsewhere you can open the
file either for reading or for writing. When reading a file you may
-   use the gcov_read_* functions, gcov_sync, gcov_position, &
-   gcov_error. When writing a file you may use the gcov_write
-   functions, gcov_seek & gcov_error. When a file is to be rewritten
+   use the gcov_read_* functions, gcov_sync, gcov_position, and
+   gcov_error. When writing a file you may use the gcov_write*
+   functions and gcov_error. When a file is to be rewritten
you use the functions for reading, then gcov_rewrite then the
functions for writing.  Your file may become corrupted if you break
these invariants.  */
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index f190547e819..487bd1464cd 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -115,7 +115,6 @@ typedef unsigned gcov_type_unsigned __attribute__ ((mode 
(QI)));
 #define gcov_open __gcov_open
 #define gcov_close __gcov_close
 #define gcov_position __gcov_position
-#define gcov_seek __gcov_seek
 #define gcov_rewrite __gcov_rewrite
 #define gcov_is_error __gcov_is_error
 #define gcov_write_unsigned __gcov_write_unsigned
@@ -346,7 +345,6 @@ extern int __gcov_execve (const char *, char  *const [], 
char *const [])
 GCOV_LINKAGE void gcov_write_summary (gcov_unsigned_t /*tag*/,
   const struct gcov_summary *)
 ATTRIBUTE_HIDDEN;
-GCOV_LINKAGE void gcov_seek (gcov_position_t /*position*/) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE void gcov_rewrite (void) ATTRIBUTE_HIDDEN;
 
 /* "Counts" stored in gcda files can be a real counter value, or
-- 
2.34.1



[RFC/gcov 07/12] gcov: Use xstrdup()

2022-03-31 Thread Sebastian Huber
Move duplication of filename to caller and use xstrdup() instead of custom
code.  This helps to reuse read_gcda_file() for other purposes.

libgcc/

* libgcov-util.c (read_gcda_file): Do not duplicate filename.
(ftw_read_file): Duplicate filename for read_gcda_file().
---
 libgcc/libgcov-util.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index db157220c9d..ae5712c0138 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -296,16 +296,11 @@ read_gcda_file (const char *filename)
  sizeof (struct gcov_ctr_info) * GCOV_COUNTERS, 1);
 
   obj_info->version = version;
+  obj_info->filename = filename;
   obstack_init (&fn_info);
   num_fn_info = 0;
   curr_fn_info = 0;
-  {
-size_t len = strlen (filename) + 1;
-char *str_dup = (char*) xmalloc (len);
 
-memcpy (str_dup, filename, len);
-obj_info->filename = str_dup;
-  }
 
   /* Read stamp.  */
   obj_info->stamp = gcov_read_unsigned ();
@@ -415,7 +410,7 @@ ftw_read_file (const char *filename,
   if (verbose)
 fnotice (stderr, "reading file: %s\n", filename);
 
-  obj_info = read_gcda_file (filename);
+  obj_info = read_gcda_file (xstrdup (filename));
   if (!obj_info)
 return 0;
 
-- 
2.34.1



[RFC/gcov 08/12] gcov: Move prepend to list to read_gcda_file()

2022-03-31 Thread Sebastian Huber
This helps to reuse read_gcda_file().

libgcc/

* libgcov-util.c (read_gcda_file): Prepend new info object to global
list.
(ftw_read_file): Remove list append here.
---
 libgcc/libgcov-util.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index ae5712c0138..d628c71479e 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -301,6 +301,9 @@ read_gcda_file (const char *filename)
   num_fn_info = 0;
   curr_fn_info = 0;
 
+  /* Prepend to gcov info list */
+  obj_info->next = gcov_info_head;
+  gcov_info_head = obj_info;
 
   /* Read stamp.  */
   obj_info->stamp = gcov_read_unsigned ();
@@ -392,7 +395,6 @@ ftw_read_file (const char *filename,
 {
   int filename_len;
   int suffix_len;
-  struct gcov_info *obj_info;
 
   /* Only read regular files.  */
   if (type != FTW_F)
@@ -410,12 +412,7 @@ ftw_read_file (const char *filename,
   if (verbose)
 fnotice (stderr, "reading file: %s\n", filename);
 
-  obj_info = read_gcda_file (xstrdup (filename));
-  if (!obj_info)
-return 0;
-
-  obj_info->next = gcov_info_head;
-  gcov_info_head = obj_info;
+  (void)read_gcda_file (xstrdup (filename));
 
   return 0;
 }
-- 
2.34.1



[RFC/gcov 09/12] gcov: Move gcov_open() to caller of read_gcda_file()

2022-03-31 Thread Sebastian Huber
This allows to reuse read_gcda_file() to read multiple objects from a single
file.

libgcc/

* libgcov-util.c (read_gcda_file): Do not open file.
(ftw_read_file): Open file here.
---
 libgcc/libgcov-util.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index d628c71479e..03902ed10b1 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -268,17 +268,10 @@ read_gcda_file (const char *filename)
 k_ctrs_mask[i] = 0;
   k_ctrs_types = 0;
 
-  if (!gcov_open (filename, 1))
-{
-  fnotice (stderr, "%s:cannot open\n", filename);
-  return NULL;
-}
-
   /* Read magic.  */
   if (!gcov_magic (gcov_read_unsigned (), GCOV_DATA_MAGIC))
 {
   fnotice (stderr, "%s:not a gcov data file\n", filename);
-  gcov_close ();
   return NULL;
 }
 
@@ -287,7 +280,6 @@ read_gcda_file (const char *filename)
   if (version != GCOV_VERSION)
 {
   fnotice (stderr, "%s:incorrect gcov version %d vs %d \n", filename, 
version, GCOV_VERSION);
-  gcov_close ();
   return NULL;
 }
 
@@ -379,7 +371,6 @@ read_gcda_file (const char *filename)
 }
 
   read_gcda_finalize (obj_info);
-  gcov_close ();
 
   return obj_info;
 }
@@ -412,7 +403,14 @@ ftw_read_file (const char *filename,
   if (verbose)
 fnotice (stderr, "reading file: %s\n", filename);
 
+  if (!gcov_open (filename, 1))
+{
+  fnotice (stderr, "%s:cannot open\n", filename);
+  return 0;
+}
+
   (void)read_gcda_file (xstrdup (filename));
+  gcov_close ();
 
   return 0;
 }
-- 
2.34.1



[RFC/gcov 05/12] gcov: Add __gcov_filename_to_gcfn()

2022-03-31 Thread Sebastian Huber
gcc/

* gcov-io.h (GCOV_FILENAME_MAGIC): Define and document.

libgcc/

* gcov.h (__gcov_info_to_gcda): Mention __gcov_filename_to_gcfn().
(__gcov_filename_to_gcfn): Declare and document.
* libgcov-driver.c (dump_string): New.
(__gcov_filename_to_gcfn): Likewise.
---
 gcc/gcov-io.h   | 24 
 libgcc/gcov.h   | 17 -
 libgcc/libgcov-driver.c | 30 ++
 3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 204ae0ccf7f..30947634d73 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -60,14 +60,21 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 
file : int32:magic int32:version int32:stamp record*
 
-   The magic ident is different for the notes and the data files.  The
-   magic ident is used to determine the endianness of the file, when
-   reading.  The version is the same for both files and is derived
-   from gcc's version number. The stamp value is used to synchronize
-   note and data files and to synchronize merging within a data
-   file. It need not be an absolute time stamp, merely a ticker that
-   increments fast enough and cycles slow enough to distinguish
-   different compile/run/compile cycles.
+   A filename header may be used to provide a filename for the data in
+   a stream of data to support gcov in freestanding environments.  This
+   header is used by the merge-stream subcommand of the gcov-tool.  The
+   format of the filename header is
+
+   filename-header : int32:magic int32:version string
+
+   The magic ident is different for the notes and the data files as
+   well as the filename header.  The magic ident is used to determine
+   the endianness of the file, when reading.  The version is the same
+   for both files and is derived from gcc's version number. The stamp
+   value is used to synchronize note and data files and to synchronize
+   merging within a data file. It need not be an absolute time stamp,
+   merely a ticker that increments fast enough and cycles slow enough
+   to distinguish different compile/run/compile cycles.
 
Although the ident and version are formally 32 bit numbers, they
are derived from 4 character ASCII strings.  The version number
@@ -228,6 +235,7 @@ typedef uint64_t gcov_type_unsigned;
 /* File magic. Must not be palindromes.  */
 #define GCOV_DATA_MAGIC ((gcov_unsigned_t)0x67636461) /* "gcda" */
 #define GCOV_NOTE_MAGIC ((gcov_unsigned_t)0x67636e6f) /* "gcno" */
+#define GCOV_FILENAME_MAGIC ((gcov_unsigned_t)0x6763666e) /* "gcfn" */
 
 #include "version.h"
 
diff --git a/libgcc/gcov.h b/libgcc/gcov.h
index cea93023920..cdd4206f625 100644
--- a/libgcc/gcov.h
+++ b/libgcc/gcov.h
@@ -43,7 +43,8 @@ extern void __gcov_dump (void);
stream.  The ALLOCATE_FN callback shall allocate memory with a size in
characters specified by the first callback parameter.  The ARG parameter is
a user-provided argument passed as the last argument to the callback
-   functions.  */
+   functions.  It is recommended to use the __gcov_filename_to_gcfn()
+   in the filename callback function.  */
 
 extern void
 __gcov_info_to_gcda (const struct gcov_info *__info,
@@ -52,4 +53,18 @@ __gcov_info_to_gcda (const struct gcov_info *__info,
 void *(*__allocate_fn) (unsigned, void *),
 void *__arg);
 
+/* Convert the FILENAME to a gcfn data stream.  The DUMP_FN callback is
+   subsequently called with chunks (the begin and length of the chunk are
+   passed as the first two callback parameters) of the gcfn data stream.
+   The ARG parameter is a user-provided argument passed as the last
+   argument to the DUMP_FN callback function.  This function is intended
+   to be used by the filename callback of __gcov_info_to_gcda().  The gcfn
+   data stream is used by the merge-stream subcommand of the gcov-tool to
+   get the filename associated with a gcda data stream.  */
+
+extern void
+__gcov_filename_to_gcfn (const char *__filename,
+void (*__dump_fn) (const void *, unsigned, void *),
+void *__arg);
+
 #endif /* GCC_GCOV_H */
diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index 10831e84b61..a44054a3cb3 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -410,6 +410,23 @@ dump_counter (gcov_type counter,
 dump_unsigned (0, dump_fn, arg);
 }
 
+/* Dump the STRING using the DUMP handler called with ARG.  */
+
+static inline void
+dump_string (const char *string,
+void (*dump_fn) (const void *, unsigned, void *),
+void *arg)
+{
+  unsigned length = 0;
+
+  if (string)
+length = strlen (string) + 1;
+
+  dump_unsigned (length, dump_fn, arg);
+  if (string)
+(*dump_fn) (string, length, arg);
+}
+
 #define MAX(X,Y) ((X) > (Y) ? (X) : (Y))
 
 /* Store all TOP N counters where each has a dynamic length.  */
@@ -780,4 +797,

[RFC/gcov 11/12] gcov: Record EOF error during read

2022-03-31 Thread Sebastian Huber
Use an enum for file error codes.

gcc/

* gcov-io.cc (gcov_file_error): New enum.
(gcov_var): Use gcov_file_error enum for the error member.
(gcov_open): Use GCOV_FILE_NO_ERROR.
(gcov_close): Use GCOV_FILE_WRITE_ERROR.
(gcov_write): Likewise.
(gcov_write_unsigned): Likewise.
(gcov_write_string): Likewise.
(gcov_read_bytes): Set error code if EOF is reached.
(gcov_read_counter): Use GCOV_FILE_COUNTER_OVERFLOW.
---
 gcc/gcov-io.cc | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/gcc/gcov-io.cc b/gcc/gcov-io.cc
index 177f81166a6..60c762bf3a3 100644
--- a/gcc/gcov-io.cc
+++ b/gcc/gcov-io.cc
@@ -29,10 +29,17 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 
 static gcov_unsigned_t *gcov_read_words (void *buffer, unsigned);
 
+enum gcov_file_error {
+  GCOV_FILE_COUNTER_OVERFLOW = -1,
+  GCOV_FILE_NO_ERROR = 0,
+  GCOV_FILE_WRITE_ERROR = 1,
+  GCOV_FILE_EOF = 2
+};
+
 struct gcov_var
 {
   FILE *file;
-  int error;   /* < 0 overflow, > 0 disk error.  */
+  enum gcov_file_error error;
   int mode;/* < 0 writing, > 0 reading.  */
   int endian;  /* Swap endianness.  */
 #ifdef IN_GCOV_TOOL
@@ -113,7 +120,7 @@ gcov_open (const char *name, int mode)
 #endif
 
   gcov_nonruntime_assert (!gcov_var.file);
-  gcov_var.error = 0;
+  gcov_var.error = GCOV_FILE_NO_ERROR;
 #if !IN_LIBGCOV || defined (IN_GCOV_TOOL)
   gcov_var.endian = 0;
 #endif
@@ -217,7 +224,7 @@ gcov_close (void)
   if (gcov_var.file)
 {
   if (fclose (gcov_var.file))
-   gcov_var.error = 1;
+   gcov_var.error = GCOV_FILE_WRITE_ERROR;
 
   gcov_var.file = 0;
 }
@@ -253,7 +260,7 @@ gcov_write (const void *data, unsigned length)
 {
   gcov_unsigned_t r = fwrite (data, length, 1, gcov_var.file);
   if (r != 1)
-gcov_var.error = 1;
+gcov_var.error = GCOV_FILE_WRITE_ERROR;
 }
 
 /* Write unsigned VALUE to coverage file.  */
@@ -263,7 +270,7 @@ gcov_write_unsigned (gcov_unsigned_t value)
 {
   gcov_unsigned_t r = fwrite (&value, sizeof (value), 1, gcov_var.file);
   if (r != 1)
-gcov_var.error = 1;
+gcov_var.error = GCOV_FILE_WRITE_ERROR;
 }
 
 #if !IN_LIBGCOV
@@ -283,7 +290,7 @@ gcov_write_string (const char *string)
 {
   gcov_unsigned_t r = fwrite (string, length, 1, gcov_var.file);
   if (r != 1)
-   gcov_var.error = 1;
+   gcov_var.error = GCOV_FILE_WRITE_ERROR;
 }
 }
 #endif
@@ -385,7 +392,11 @@ gcov_read_bytes (void *buffer, unsigned count)
 
   unsigned read = fread (buffer, count, 1, gcov_var.file);
   if (read != 1)
-return NULL;
+{
+  if (feof (gcov_var.file))
+   gcov_var.error = GCOV_FILE_EOF;
+  return NULL;
+}
 
 #ifdef IN_GCOV_TOOL
   gcov_var.pos += count;
@@ -434,7 +445,7 @@ gcov_read_counter (void)
   if (sizeof (value) > sizeof (gcov_unsigned_t))
 value |= ((gcov_type) from_file (buffer[1])) << 32;
   else if (buffer[1])
-gcov_var.error = -1;
+gcov_var.error = GCOV_FILE_COUNTER_OVERFLOW;
 
   return value;
 }
-- 
2.34.1



[RFC/gcov 12/12] gcov-tool: Add merge-stream subcommand

2022-03-31 Thread Sebastian Huber
gcc/

* gcov-tool.cc (gcov_profile_merge_stream): Declare.
(print_merge_stream_usage_message): New.
(merge_stream_usage): Likewise.
(do_merge_stream): Likewise.
(print_usage): Call print_merge_stream_usage_message().
(main): Call do_merge_stream() to execute merge-stream subcommand.

libgcc/

* libgcov-util.c (consume_stream): New.
(get_target_profiles_for_merge): Likewise.
(gcov_profile_merge_stream): Likewise.
---
 gcc/gcov-tool.cc  | 76 
 libgcc/libgcov-util.c | 80 +++
 2 files changed, 156 insertions(+)

diff --git a/gcc/gcov-tool.cc b/gcc/gcov-tool.cc
index d712715cf7e..d8572b184e9 100644
--- a/gcc/gcov-tool.cc
+++ b/gcc/gcov-tool.cc
@@ -42,6 +42,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 
 extern struct gcov_info *gcov_profile_merge (struct gcov_info*,
 struct gcov_info*, int, int);
+extern struct gcov_info *gcov_profile_merge_stream (const char *, int, int);
 extern int gcov_profile_overlap (struct gcov_info*, struct gcov_info*);
 extern int gcov_profile_normalize (struct gcov_info*, gcov_type);
 extern int gcov_profile_scale (struct gcov_info*, float, int, int);
@@ -229,6 +230,78 @@ do_merge (int argc, char **argv)
   return profile_merge (argv[optind], argv[optind+1], output_dir, w1, w2);
 }
 
+/* Usage message for profile merge-stream.  */
+
+static void
+print_merge_stream_usage_message (int error_p)
+{
+  FILE *file = error_p ? stderr : stdout;
+
+  fnotice (file, "  merge-stream [options] [stream-file]  Merge coverage 
stream file (or stdin)\n"
+"and coverage file 
contents\n");
+  fnotice (file, "-v, --verbose   Verbose mode\n");
+  fnotice (file, "-w, --weight Set weights (float 
point values)\n");
+}
+
+static const struct option merge_stream_options[] =
+{
+  { "verbose",no_argument,   NULL, 'v' },
+  { "weight", required_argument, NULL, 'w' },
+  { 0, 0, 0, 0 }
+};
+
+/* Print merge-stream usage and exit.  */
+
+static void ATTRIBUTE_NORETURN
+merge_stream_usage (void)
+{
+  fnotice (stderr, "Merge-stream subcomand usage:");
+  print_merge_stream_usage_message (true);
+  exit (FATAL_EXIT_CODE);
+}
+
+/* Driver for profile merge-stream sub-command.  */
+
+static int
+do_merge_stream (int argc, char **argv)
+{
+  int opt;
+  int w1 = 1, w2 = 1;
+  struct gcov_info *merged_profile;
+
+  optind = 0;
+  while ((opt = getopt_long (argc, argv, "vw:",
+merge_stream_options, NULL)) != -1)
+{
+  switch (opt)
+   {
+   case 'v':
+ verbose = true;
+ gcov_set_verbose ();
+ break;
+   case 'w':
+ sscanf (optarg, "%d,%d", &w1, &w2);
+ if (w1 < 0 || w2 < 0)
+   fatal_error (input_location, "weights need to be non-negative");
+ break;
+   default:
+ merge_stream_usage ();
+   }
+}
+
+  if (argc - optind > 1)
+merge_stream_usage ();
+
+  merged_profile = gcov_profile_merge_stream (argv[optind], w1, w2);
+
+  if (merged_profile)
+gcov_do_dump (merged_profile, 0, -1);
+  else if (verbose)
+fnotice (stdout, "no profile files were merged\n");
+
+  return 0;
+}
+
 /* If N_VAL is no-zero, normalize the profile by setting the largest counter
counter value to N_VAL and scale others counters proportionally.
Otherwise, multiply the all counters by SCALE.  */
@@ -505,6 +578,7 @@ print_usage (int error_p)
   fnotice (file, "  -h, --helpPrint this help, 
then exit\n");
   fnotice (file, "  -v, --version Print version 
number, then exit\n");
   print_merge_usage_message (error_p);
+  print_merge_stream_usage_message (error_p);
   print_rewrite_usage_message (error_p);
   print_overlap_usage_message (error_p);
   fnotice (file, "\nFor bug reporting instructions, please see:\n%s.\n",
@@ -594,6 +668,8 @@ main (int argc, char **argv)
 
   if (!strcmp (sub_command, "merge"))
 return do_merge (argc - optind, argv + optind);
+  else if (!strcmp (sub_command, "merge-stream"))
+return do_merge_stream (argc - optind, argv + optind);
   else if (!strcmp (sub_command, "rewrite"))
 return do_rewrite (argc - optind, argv + optind);
   else if (!strcmp (sub_command, "overlap"))
diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index 622d5a9dc71..0fe60528b48 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -735,6 +735,86 @@ gcov_profile_merge (struct gcov_info *tgt_profile, struct 
gcov_info *src_profile
   return tgt_profile;
 }
 
+struct gcov_info *
+consume_stream (const char *filename)
+{
+  read_profile_dir_init ();
+
+  while (true)
+{
+  unsigned version;
+  const char *filename_of_info;
+  struct gcov_info *obj_info;
+

[RFC/gcov 10/12] gcov: Fix integer types in ftw_read_file()

2022-03-31 Thread Sebastian Huber
libgcc/

* libgcov-util.c (ftw_read_file): Use size_t for strlen() variables.
---
 libgcc/libgcov-util.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index 03902ed10b1..622d5a9dc71 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -384,8 +384,8 @@ ftw_read_file (const char *filename,
const struct stat *status ATTRIBUTE_UNUSED,
int type)
 {
-  int filename_len;
-  int suffix_len;
+  size_t filename_len;
+  size_t suffix_len;
 
   /* Only read regular files.  */
   if (type != FTW_F)
-- 
2.34.1



Re: try multi dest registers in default_zero_call_used_regs

2022-03-31 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva via Gcc-patches  writes:
> When the mode of regno_reg_rtx is not hard_regno_mode_ok for the
> target, try grouping the register with subsequent ones.  This enables
> s16 to s31 and their hidden pairs to be zeroed with the default logic
> on some arm variants.
>
> Regstrapped on x86_64-linux-gnu, also tested on an affected arm
> configuration.  Ok to install?
>
>
> for  gcc/ChangeLog
>
>   * targhooks.c (default_zero_call_used_regs): Attempt to group
>   regs that the target refuses to use in their natural modes.

Thanks for doing this.  Some comments below…

> ---
>  gcc/targhooks.cc |   79 
> --
>  1 file changed, 70 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index fc49235eb38ee..bdaab9c63c7ee 100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -1035,16 +1035,45 @@ default_zero_call_used_regs (HARD_REG_SET 
> need_zeroed_hardregs)
>  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>{
>   rtx_insn *last_insn = get_last_insn ();
> - machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
> + rtx regno_rtx = regno_reg_rtx[regno];
> + machine_mode mode = GET_MODE (regno_rtx);
> +
> + /* If the natural mode doesn't work, try some wider mode.  */
> + if (!targetm.hard_regno_mode_ok (regno, mode))
> +   {
> + for (int nregs = 2;
> +  regno + nregs <= FIRST_PSEUDO_REGISTER
> +&& TEST_HARD_REG_BIT (need_zeroed_hardregs,
> +  regno + nregs - 1);
> +  nregs++)
> +   {
> + mode = choose_hard_reg_mode (regno, nregs, 0);

I like the idea, but it would be good to avoid the large:

  FIRST_PSEUDO_REGISTER * FIRST_PSEUDO_REGISTER * NUM_MACHINE_MODES

constant factor.  How about if init_reg_modes_target recorded the
maximum value of x_hard_regno_nregs?

> + if (mode == E_VOIDmode)
> +   continue;
> + gcc_checking_assert (targetm.hard_regno_mode_ok (regno, mode));
> + regno_rtx = gen_rtx_REG (mode, regno);
> + break;
> +   }
> + if (mode != GET_MODE (regno_rtx)
> + || regno_rtx == regno_reg_rtx[regno])
> +   {
> + SET_HARD_REG_BIT (failed, regno);
> + continue;
> +   }
> +   }
> +
>   rtx zero = CONST0_RTX (mode);
> - rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
> + rtx_insn *insn = emit_move_insn (regno_rtx, zero);
>   if (!valid_insn_p (insn))
> {
>   SET_HARD_REG_BIT (failed, regno);
>   delete_insns_since (last_insn);
> }
>   else
> -   progress = true;
> +   {
> + progress = true;
> + regno += hard_regno_nregs (regno, mode) - 1;
> +   }
>}
>  
>/* Now retry with copies from zeroed registers, as long as we've
> @@ -1060,7 +1089,34 @@ default_zero_call_used_regs (HARD_REG_SET 
> need_zeroed_hardregs)
>for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>   if (TEST_HARD_REG_BIT (retrying, regno))
> {
> - machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
> + rtx regno_rtx = regno_reg_rtx[regno];
> + machine_mode mode = GET_MODE (regno_rtx);
> +
> + /* If the natural mode doesn't work, try some wider mode.  */
> + if (!targetm.hard_regno_mode_ok (regno, mode))
> +   {
> + for (int nregs = 2;
> +  regno + nregs <= FIRST_PSEUDO_REGISTER
> +&& TEST_HARD_REG_BIT (need_zeroed_hardregs,
> +  regno + nregs - 1);
> +  nregs++)
> +   {
> + mode = choose_hard_reg_mode (regno, nregs, 0);
> + if (mode == E_VOIDmode)
> +   continue;
> + gcc_checking_assert (targetm.hard_regno_mode_ok (regno,
> +  mode));
> + regno_rtx = gen_rtx_REG (mode, regno);
> + break;
> +   }
> + if (mode != GET_MODE (regno_rtx)
> + || regno_rtx == regno_reg_rtx[regno])
> +   {
> + SET_HARD_REG_BIT (failed, regno);
> + continue;
> +   }
> +   }
> + 

This seems big enough to be worth splitting out into a helper, rather
than repeating.  That should also simplify the failure detection:
the helper can return nonnull on success and null on failure.

>   bool success = false;
>   /* Look for a source.  */
>   for (unsigned int src = 0; src < FIRST_PSEUDO_REGISTER; src++)
> @@ -1086,8 +1142,10 @@ default_zero_call_used_regs (HARD_REG_SET 
> need_zeroed_hardregs)
>  
>   /* SRC is usable, try to copy from it.  */
>   rtx_insn *last_insn = get_last_ins

Re: [PATCH] ipa: Careful processing ANCESTOR jump functions and NULL pointers (PR 103083)

2022-03-31 Thread Jan Hubicka via Gcc-patches
> IPA_JF_ANCESTOR jump functions are constructed also when the formal
> parameter of the caller is first checked whether it is NULL and left
> as it is if it is NULL, to accommodate C++ casts to an ancestor class.
> 
> The jump function type was invented for devirtualization and IPA-CP
> propagation of tree constants is also careful to apply it only to
> existing DECLs(*) but as PR 103083 shows, the part propagating "known
> bits" was not careful about this, which can lead to miscompilations.
> 
> This patch introduces a flag to the ancestor jump functions which
> tells whether a NULL-check was elided when creating it and makes the
> bits propagation behave accordingly, masking any bits otherwise would
> be known to be one.  This should safely preserve alignment info, which
> is the primary ifnormation that we keep in bits for pointers.
> 
> (*) There still may remain problems when a DECL resides on address
> zero (with -fno-delete-null-pointer-checks ...I hope it cannot happen
> otherwise).  I am looking into that now but I think it will be easier
> for everyone if I do so in a follow-up patch.
> 
> gcc/ChangeLog:
> 
> 2022-02-11  Martin Jambor  
> 
>   PR ipa/103083
>   * ipa-prop.h (ipa_ancestor_jf_data): New flag keep_null;
>   (ipa_get_jf_ancestor_keep_null): New function.
>   * ipa-prop.c (ipa_set_ancestor_jf): Initialize keep_null field of the
>   ancestor function.
>   (compute_complex_assign_jump_func): Pass false to keep_null
>   parameter of ipa_set_ancestor_jf.
>   (compute_complex_ancestor_jump_func): Pass true to keep_null
>   parameter of ipa_set_ancestor_jf.
>   (update_jump_functions_after_inlining): Carry over keep_null from the
>   original ancestor jump-function or merge them.
>   (ipa_write_jump_function): Stream keep_null flag.
>   (ipa_read_jump_function): Likewise.
>   (ipa_print_node_jump_functions_for_edge): Print the new flag.
>   * ipa-cp.c (class ipcp_bits_lattice): Make various getters const.  New
>   member function known_nonzero_p.
>   (ipcp_bits_lattice::known_nonzero_p): New.
>   (ipcp_bits_lattice::meet_with_1): New parameter drop_all_ones,
>   observe it.
>   (ipcp_bits_lattice::meet_with): Likewise.
>   (propagate_bits_across_jump_function): Simplify.  Pass true in
>   drop_all_ones when it is necessary.
>   (propagate_aggs_across_jump_function): Take care of keep_null
>   flag.
>   (ipa_get_jf_ancestor_result): Propagate NULL accross keep_null
>   jump functions.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-11-25  Martin Jambor  
> 
>   * gcc.dg/ipa/pr103083-1.c: New test.
>   * gcc.dg/ipa/pr103083-2.c: Likewise.

OK,
thanks!
Honza


Re: [PATCH] ipa: Create LOAD references when necessary during inlining (PR 103171)

2022-03-31 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> in r12-2523-g13586172d0b70c ipa-prop tracking of jump functions during
> inlining got the ability to remove ADDR references when inlining
> discovered that they were not necessary or turn them into LOAD
> references when we know that what was a function call argument passed
> by reference will end up as a load (one or more).
> 
> Unfortunately, the code only creates the LOAD references when
> replacing removed ADDR references and PR 103171 showed that with some
> ordering of inlining, we need to add the LOAD reference before we know
> we can remove the ADDR one - or the reference will be lost, leading to
> link errors or even ICEs.
> 
> Specifically in testcase gcc.dg/lto/pr103171_1.c added in this patch,
> if foo() is inlined to entry(), we need to create the LOAD reference
> so that when later bar() is inlined into foo() and we discover that
> the paameter is unused, we can remove the ADDR reference and still
> keep the varaible around for the load.
> 
> Bootstrapped, LTO bootstrapped and tested on x86_64-linux.  OK for
> trunk?

OK,
Thanks!
Honza
> 
> Thanks,
> 
> Martin
> 
> 
> 
> gcc/ChangeLog:
> 
> 2022-01-28  Martin Jambor  
> 
>   PR ipa/103171
>   * ipa-prop.cc (propagate_controlled_uses): Add a LOAD reference
>   always when an ADDR_EXPR constant is known to reach a load because
>   of inlining, not just when removing an ADDR reference.
> 
> gcc/testsuite/ChangeLog:
> 
> 2022-01-28  Martin Jambor  
> 
>   PR ipa/103171
>   * gcc.dg/ipa/remref-6.c: Adjust dump scan string.
>   * gcc.dg/ipa/remref-7.c: New test.
>   * gcc.dg/lto/pr103171_0.c: New test.
>   * gcc.dg/lto/pr103171_1.c: Likewise.
> ---
>  gcc/ipa-prop.cc   | 30 ---
>  gcc/testsuite/gcc.dg/ipa/remref-6.c   |  2 +-
>  gcc/testsuite/gcc.dg/ipa/remref-7.c   | 33 +
>  gcc/testsuite/gcc.dg/lto/pr103171_0.c | 11 +
>  gcc/testsuite/gcc.dg/lto/pr103171_1.c | 35 +++
>  5 files changed, 96 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr103171_0.c
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr103171_1.c
> 
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index e55fe2776f2..72aa3e2f60d 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -4181,6 +4181,20 @@ propagate_controlled_uses (struct cgraph_edge *cs)
> int d = ipa_get_controlled_uses (old_root_info, i);
> int c = rdesc->refcount;
> rdesc->refcount = combine_controlled_uses_counters (c, d);
> +   if (rdesc->refcount != IPA_UNDESCRIBED_USE
> +   && ipa_get_param_load_dereferenced (old_root_info, i))
> + {
> +   tree cst = ipa_get_jf_constant (jf);
> +   gcc_checking_assert (TREE_CODE (cst) == ADDR_EXPR
> +&& (TREE_CODE (TREE_OPERAND (cst, 0))
> +== VAR_DECL));
> +   symtab_node *n = symtab_node::get (TREE_OPERAND (cst, 0));
> +   new_root->create_reference (n, IPA_REF_LOAD, NULL);
> +   if (dump_file)
> + fprintf (dump_file, "ipa-prop: Address IPA constant will reach "
> +  "a load so adding LOAD reference from %s to %s.\n",
> +  new_root->dump_name (), n->dump_name ());
> + }
> if (rdesc->refcount == 0)
>   {
> tree cst = ipa_get_jf_constant (jf);
> @@ -4193,20 +4207,8 @@ propagate_controlled_uses (struct cgraph_edge *cs)
> symtab_node *n = symtab_node::get (TREE_OPERAND (cst, 0));
> if (n)
>   {
> -   struct cgraph_node *clone;
> -   bool removed = remove_described_reference (n, rdesc);
> -   /* The reference might have been removed by IPA-CP.  */
> -   if (removed
> -   && ipa_get_param_load_dereferenced (old_root_info, i))
> - {
> -   new_root->create_reference (n, IPA_REF_LOAD, NULL);
> -   if (dump_file)
> - fprintf (dump_file, "ipa-prop: ...replaced it with "
> -  "LOAD one from %s to %s.\n",
> -  new_root->dump_name (), n->dump_name ());
> - }
> -
> -   clone = cs->caller;
> +   remove_described_reference (n, rdesc);
> +   cgraph_node *clone = cs->caller;
> while (clone->inlined_to
>&& clone->ipcp_clone
>&& clone != rdesc->cs->caller)
> diff --git a/gcc/testsuite/gcc.dg/ipa/remref-6.c 
> b/gcc/testsuite/gcc.dg/ipa/remref-6.c
> index 7deae3114a4..f31f4c14319 100644
> --- a/gcc/testsuite/gcc.dg/ipa/remref-6.c
> +++ b/gcc/testsuite/gcc.dg/ipa/remref-6.c
> @@ -20,5 +20,5 @@ void entry()
>  }
>  
>  /* { dg-final { scan-ipa-dump "Removed a reference"  "inline" } }  */
> -/* { dg-f

Re: [PATCH] ipa-cp: Do not create clones for values outside known value range (PR 102513)

2022-03-31 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> PR 102513 shows we emit bogus array access warnings when IPA-CP
> creates clones specialized for values which it deduces from arithmetic
> jump functions describing self-recursive calls.  Those can however be
> avoided if we consult the IPA-VR information that the same pass also
> has.
> 
> The patch below does that at the stage when normally values are only
> examined for profitability.  It would be better not to create lattices
> describing such bogus values in the first place, however that presents
> an ordering problem, the pass currently propagates all information,
> and so both constants and VR, in no particular order when processing
> SCCs, and so this approach seemed much simpler.
> 
> I plan to rearrange the pass so that it clones in multiple passes over
> the call graph (or rather the lattice dependence graph) and it feels
> natural to only do propagation for these kinds of recursion in the
> second or later passes, which would fix the issue more elegantly.
> 
> Bootstrapped and tested on x86_64, OK for trunk (and perhaps also for
> GCC 11)?

Looks OK.  I wonder if we can do that more genrally to avoid inlining in
such cases as well?

Honza
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2022-02-14  Martin Jambor  
> 
>   PR ipa/102513
>   * ipa-cp.cc (decide_whether_version_node): Skip scalar values
>   which do not fit the known value_range.
> 
> gcc/testsuite/ChangeLog:
> 
> 2022-02-14  Martin Jambor  
> 
>   PR ipa/102513
>   * gcc.dg/ipa/pr102513.c: New test.
> ---
>  gcc/ipa-cp.cc   | 28 ++--
>  gcc/testsuite/gcc.dg/ipa/pr102513.c | 33 +
>  2 files changed, 59 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr102513.c
> 
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 453e9c93cc3..ec37632d487 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -6154,8 +6154,32 @@ decide_whether_version_node (struct cgraph_node *node)
>   {
> ipcp_value *val;
> for (val = lat->values; val; val = val->next)
> - ret |= decide_about_value (node, i, -1, val, &avals,
> -&self_gen_clones);
> + {
> +   /* If some values generated for self-recursive calls with
> +  arithmetic jump functions fall outside of the known
> +  value_range for the parameter, we can skip them.  VR interface
> +  supports this only for integers now.  */
> +   if (TREE_CODE (val->value) == INTEGER_CST
> +   && !plats->m_value_range.bottom_p ()
> +   && !plats->m_value_range.m_vr.contains_p (val->value))
> + {
> +   /* This can happen also if a constant present in the source
> +  code falls outside of the range of parameter's type, so we
> +  cannot assert.  */
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   fprintf (dump_file, " - skipping%s value ",
> +val->self_recursion_generated_p ()
> +? " self_recursion_generated" : "");
> +   print_ipcp_constant_value (dump_file, val->value);
> +   fprintf (dump_file, " because it is outside known "
> +"value range.\n");
> + }
> +   continue;
> + }
> +   ret |= decide_about_value (node, i, -1, val, &avals,
> +  &self_gen_clones);
> + }
>   }
>  
>if (!plats->aggs_bottom)
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr102513.c 
> b/gcc/testsuite/gcc.dg/ipa/pr102513.c
> new file mode 100644
> index 000..9ee5431b730
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr102513.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -Warray-bounds" } */
> +
> +extern int block2[7][256];
> +
> +static int encode_block(int block2[7][256], unsigned level)
> +{
> +int best_score = 0;
> +
> +for (unsigned x = 0; x < level; x++) {
> +int v = block2[1][x];
> +block2[level][x] = 0;
> +best_score += v * v;
> +}
> +
> +if (level > 0 && best_score > 64) {
> +int score = 0;
> +
> +score += encode_block(block2, level - 1);
> +score += encode_block(block2, level - 1);
> +
> +if (score < best_score) {
> +best_score = score;
> +}
> +}
> +
> +return best_score;
> +}
> +
> +int foo(void)
> +{
> +return encode_block(block2, 5);
> +}
> -- 
> 2.34.1
> 


Re: [pushed] c++: parse trivial DMI immediately [PR96645]

2022-03-31 Thread Patrick Palka via Gcc-patches
On Wed, 30 Mar 2022, Jason Merrill via Gcc-patches wrote:

> The recent change to reject __is_constructible for nested classes with DMI
> is breaking some code loudly that was previously only silently broken.
> Let's allow simple cases by immediately parsing DMI that do no name lookup;
> then being in complete class scope makes no difference.

Not sure if this is a problem in practice but it seems the initializer
processing step of cp_parser_late_parse_one_default_arg may involve name
lookup even if the parse itself didn't:

  struct A {
struct B {
  const A &a = {};
};
  };

We used to accept this (very contrived example), but now reject with

  error: invalid use of incomplete type ‘const struct A’

> 
> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
>   PR c++/96645
> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_parser_early_parsing_nsdmi): New.
>   (cp_parser_member_declaration): Call it.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/nsdmi10.C: Now OK.
>   * g++.dg/ext/is_constructible3.C: Likewise.
>   * g++.dg/ext/is_constructible7.C: Likewise.
> ---
>  gcc/cp/parser.cc | 28 +++-
>  gcc/testsuite/g++.dg/cpp0x/nsdmi10.C |  4 +--
>  gcc/testsuite/g++.dg/ext/is_constructible3.C |  2 +-
>  gcc/testsuite/g++.dg/ext/is_constructible7.C |  3 +--
>  4 files changed, 31 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 7e1c777364e..63c8af1c722 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -2701,6 +2701,8 @@ static tree cp_parser_late_parse_one_default_arg
>(cp_parser *, tree, tree, tree);
>  static void cp_parser_late_parsing_nsdmi
>(cp_parser *, tree);
> +static bool cp_parser_early_parsing_nsdmi
> +  (cp_parser *, tree);
>  static void cp_parser_late_parsing_default_args
>(cp_parser *, tree);
>  static tree cp_parser_sizeof_operand
> @@ -27478,7 +27480,8 @@ cp_parser_member_declaration (cp_parser* parser)
> if (DECL_DECLARES_FUNCTION_P (decl))
>   cp_parser_save_default_args (parser, STRIP_TEMPLATE (decl));
> else if (TREE_CODE (decl) == FIELD_DECL
> -&& DECL_INITIAL (decl))
> +&& DECL_INITIAL (decl)
> +&& !cp_parser_early_parsing_nsdmi (parser, decl))
>   /* Add DECL to the queue of NSDMI to be parsed later.  */
>   vec_safe_push (unparsed_nsdmis, decl);
>   }
> @@ -32292,6 +32295,29 @@ cp_parser_late_parsing_nsdmi (cp_parser *parser, 
> tree field)
>DECL_INITIAL (field) = def;
>  }
>  
> +/* If the DEFERRED_PARSE for FIELD is safe to parse immediately, do so.
> +   Returns true if deferred parsing is no longer needed.  */
> +
> +static bool
> +cp_parser_early_parsing_nsdmi (cp_parser *parser, tree field)
> +{
> +  tree init = DECL_INITIAL (field);
> +  if (TREE_CODE (init) != DEFERRED_PARSE)
> +return true;
> +
> +  cp_token_cache *tokens = DEFPARSE_TOKENS (init);
> +  for (cp_token *p = tokens->first; p != tokens->last; ++p)
> +if (p->type == CPP_NAME
> + || p->keyword == RID_THIS
> + || p->keyword == RID_OPERATOR)
> +  /* There's a name to look up or 'this', give up.  */
> +  return false;
> +
> +  /* It's trivial, parse now.  */
> +  cp_parser_late_parsing_nsdmi (parser, field);
> +  return true;
> +}
> +
>  /* FN is a FUNCTION_DECL which may contains a parameter with an
> unparsed DEFERRED_PARSE.  Parse the default args now.  This function
> assumes that the current scope is the scope in which the default
> diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi10.C 
> b/gcc/testsuite/g++.dg/cpp0x/nsdmi10.C
> index d8588b7f29e..a965f7bc333 100644
> --- a/gcc/testsuite/g++.dg/cpp0x/nsdmi10.C
> +++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi10.C
> @@ -6,7 +6,7 @@ struct A1 {
>  int y1 = 1;
>};
>  
> -  A1(const B1& opts = B1()) {}  // { dg-error "default member initializer" }
> +  A1(const B1& opts = B1()) {}
>  };
>  
>  struct A2 {
> @@ -14,5 +14,5 @@ struct A2 {
>  int x2, y2 = 1;
>};
>  
> -  A2(const B2& opts = B2()) {}  // { dg-error "default member initializer" }
> +  A2(const B2& opts = B2()) {}
>  };
> diff --git a/gcc/testsuite/g++.dg/ext/is_constructible3.C 
> b/gcc/testsuite/g++.dg/ext/is_constructible3.C
> index 305751d28e2..c7c58746cd0 100644
> --- a/gcc/testsuite/g++.dg/ext/is_constructible3.C
> +++ b/gcc/testsuite/g++.dg/ext/is_constructible3.C
> @@ -8,7 +8,7 @@ struct A {
>  B() = default;
>};
>  
> -  static constexpr bool v = __is_constructible (B); // { dg-error "member 
> initializer" }
> +  static constexpr bool v = __is_constructible (B);
>  
>  };
>  
> diff --git a/gcc/testsuite/g++.dg/ext/is_constructible7.C 
> b/gcc/testsuite/g++.dg/ext/is_constructible7.C
> index 76a63bba5d0..013a1df03c6 100644
> --- a/gcc/testsuite/g++.dg/ext/is_constructible7.C
> +++ b/gcc/testsuite/g++.dg/ext/is_constructible7.C
> @@ -12,7 +12,7 @@ using true_type = bool_consta

Re: [PATCH] tree-optimization/104912 - ensure cost model is checked first

2022-03-31 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The following makes sure that when we build the versioning condition
> for vectorization including the cost model check, we check for the
> cost model and branch over other versioning checks.  That is what
> the cost modeling assumes, since the cost model check is the only
> one accounted for in the scalar outside cost.  Currently we emit
> all checks as straight-line code combined with bitwise ops which
> can result in surprising ordering of checks in the final assembly.

Yeah, this had bugged me too, and meant that we made some bad
decisions in some of the local benchmarks we use.  Was just afraid
to poke at it, since it seemed like a deliberate decision. :-)

> Since loop_version accepts only a single versioning condition
> the splitting is done after the fact.
>
> The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
> with -Ofast and tuning for generic or skylake.  That's not enough
> to recover from the slowdown when vectorizing but it now cuts off
> the expensive alias versioning test.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?
>
> For the rest of the regression my plan is to somehow factor in
> the evolution of the number of iterations in the outer loop
> (which is {1, +, 1}) to somehow bump the static profitability
> estimate and together with the "cheap" cost model check never
> execute the vectorized version (well, it is actually never executed,
> but only because the alias check fails).
>
> Thanks,
> Richard.
>
> 2022-03-21  Richard Biener  
>
>   PR tree-optimization/104912
>   * tree-vect-loop-manip.cc (vect_loop_versioning): Split
>   the cost model check to a separate BB to make sure it is
>   checked first and not combined with other version checks.
> ---
>  gcc/tree-vect-loop-manip.cc | 53 ++---
>  1 file changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index a7bbc916bbc..8ef333eb31b 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>   cond_expr = expr;
>  }
>  
> +  tree cost_name = NULL_TREE;
> +  if (cond_expr
> +  && !integer_truep (cond_expr)
> +  && (version_niter
> +   || version_align
> +   || version_alias
> +   || version_simd_if_cond))
> +cost_name = cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> + &cond_expr_stmt_list,
> + is_gimple_val, NULL_TREE);
> +
>if (version_niter)
>  vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
>  
>if (cond_expr)
> -cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> - &cond_expr_stmt_list,
> - is_gimple_condexpr, NULL_TREE);
> +{
> +  gimple_seq tem = NULL;
> +  cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> +   &tem,
> +   is_gimple_condexpr, NULL_TREE);
> +  gimple_seq_add_seq (&cond_expr_stmt_list, tem);
> +}
>  
>if (version_align)
>  vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
> @@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>update_ssa (TODO_update_ssa);
>  }
>  
> +  /* Split the cost model check off to a separate BB.  Costing assumes
> + this is the only thing we perform when we enter the scalar loop.  */

Maybe “…from a failed cost decision” or something?  Might sounds out
of context like it applied more generally.

> +  if (cost_name)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (cost_name);

I realise it should only happen very rarely if at all, but is it
absolutely guaranteed that the cost condition doesn't fold to a
constant?

> +  /* All uses of the cost check are 'true' after the check we
> +  are going to insert.  */
> +  replace_uses_by (cost_name, boolean_true_node);
> +  /* And we're going to build the new single use of it.  */
> +  gcond *cond = gimple_build_cond (NE_EXPR, cost_name, 
> boolean_false_node,
> +NULL_TREE, NULL_TREE);
> +  edge e = split_block (gimple_bb (def), def);
> +  gimple_stmt_iterator gsi = gsi_for_stmt (def);
> +  gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
> +  edge true_e, false_e;
> +  extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> +  e->flags &= ~EDGE_FALLTHRU;
> +  e->flags |= EDGE_TRUE_VALUE;
> +  edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
> +  e->probability = prob;
> +  e2->probability = prob.invert ();

Is reusing prob the right thing to do?  Wouldn't the path to the vector
loop end up with a probability of PROB^2?

> +  set_immediate_domina

Re: [patch]update the documentation for TARGET_ZERO_CALL_USED_REGS hook and add an assertion

2022-03-31 Thread Richard Sandiford via Gcc-patches
Qing Zhao  writes:
> Hi, 
>
> Per our discussion on: 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592002.html
>
> I come up with the following patch to:
>
> 1. Update the documentation for TARGET_ZERO_CALL_USED_REGS hook;
> 2. Add an assertion in function.cc to make sure the actually zeroed_regs is a 
> subset of all call used regs;
>(The reason I didn’t add a new parameter to TARGET_ZERO_CALL_USED_REGS is, 
> I think adding the 
> assertion in the common place function.cc is simpler to be implemented).

Yeah, that's fair.  I guess in theory, passing the parameter would allow
targets to choose between two versions of zeroing the register, one with
a temporary and one without.  But that's a purely hypothetical situation
and we could always add a parameter later if that turns out to be useful.

Perhaps more realistically, there might be other uses of the hook in
future that want to zero registers for different reasons, with their
own rules about which registers can be zeroed.  In other words, the
hook is providing a general facility that happens to be useful for
-fzero-call-used-regs.  But again, we can deal with that if it ever
happens.

So I agree this is the right call, especially for stage 4.

> 3. This new assertion identified a bug in i386 implementation. Fix this bug 
> in i386.
>
> This patch is bootstrapped on both x86 and aarch64, no regression.
>
> Okay for commit?

OK for the non-x86 bits.

Thanks,
Richard

> thanks.
>
> Qing
>
> ===
> From 2e5bc1b25a707c6a17afbf03da2a8bec5b03454d Mon Sep 17 00:00:00 2001
> From: Qing Zhao 
> Date: Fri, 18 Mar 2022 20:49:56 +
> Subject: [PATCH] Add an assertion: the zeroed_hardregs set is a subset of all
>  call used regs.
>
> We should make sure that the hard register set that is actually cleared by
> the target hook zero_call_used_regs should be a subset of all call used
> registers.
>
> At the same time, update documentation for the target hook
> TARGET_ZERO_CALL_USED_REGS.
>
> This new assertion identified a bug in the i386 implemenation, which
> incorrectly set the zeroed_hardregs for stack registers. Fixed this bug
> in i386 implementation.
>
> gcc/ChangeLog:
>
> 2022-03-21  Qing Zhao  
>
>   * config/i386/i386.cc (zero_all_st_registers): Return the value of
>   num_of_st.
>   (ix86_zero_call_used_regs): Update zeroed_hardregs set according to
>   the return value of zero_all_st_registers.
>   * doc/tm.texi: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
>   * function.cc (gen_call_used_regs_seq): Add an assertion.
>   * target.def: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
> ---
>  gcc/config/i386/i386.cc | 27 ++-
>  gcc/doc/tm.texi |  7 +++
>  gcc/function.cc | 22 ++
>  gcc/target.def  |  7 +++
>  4 files changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 5a561966eb44..d84047a4bc1b 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -3753,16 +3753,17 @@ zero_all_vector_registers (HARD_REG_SET 
> need_zeroed_hardregs)
> needs to be cleared, the whole stack should be cleared.  However,
> x87 stack registers that hold the return value should be excluded.
> x87 returns in the top (two for complex values) register, so
> -   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
> +   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.
> +   return the value of num_of_st.  */
>  
>  
> -static bool
> +static int
>  zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>  {
>  
>/* If the FPU is disabled, no need to zero all st registers.  */
>if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
> -return false;
> +return 0;
>  
>unsigned int num_of_st = 0;
>for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> @@ -3774,7 +3775,7 @@ zero_all_st_registers (HARD_REG_SET 
> need_zeroed_hardregs)
>}
>  
>if (num_of_st == 0)
> -return false;
> +return 0;
>  
>bool return_with_x87 = false;
>return_with_x87 = (crtl->return_rtx
> @@ -3802,7 +3803,7 @@ zero_all_st_registers (HARD_REG_SET 
> need_zeroed_hardregs)
>insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
>add_reg_note (insn, REG_DEAD, st_reg);
>  }
> -  return true;
> +  return num_of_st;
>  }
>  
>  
> @@ -3851,7 +3852,7 @@ ix86_zero_call_used_regs (HARD_REG_SET 
> need_zeroed_hardregs)
>  {
>HARD_REG_SET zeroed_hardregs;
>bool all_sse_zeroed = false;
> -  bool all_st_zeroed = false;
> +  int all_st_zeroed_num = 0;
>bool all_mm_zeroed = false;
>  
>CLEAR_HARD_REG_SET (zeroed_hardregs);
> @@ -3881,9 +3882,17 @@ ix86_zero_call_used_regs (HARD_REG_SET 
> need_zeroed_hardregs)
>if (!exit_with_mmx_mode)
>  /* x87 exit mode, we should zero all st registers together.  */
>  {
> -  all_st

options: Clarify 'Init' option property usage for streaming optimization (was: [PATCH] options, lto: Optimize streaming of optimization nodes)

2022-03-31 Thread Thomas Schwinge
Hi!

On 2020-11-18T10:36:35+0100, Jakub Jelinek via Gcc-patches 
 wrote:
> Honza mentioned that especially for the new param machinery, most of
> streamed values are probably going to be the default values.  Perhaps
> somehow we could stream them more effectively.
>
> This patch implements it and brings further savings, the size
> goes down from 574 bytes to 273 bytes, i.e. less than half.

Neat idea about the XOR-encoding!

> Not trying to handle enums because the code doesn't know if (enum ...) 10
> is even valid, similarly non-parameters because those really generally
> don't have large initializers, and params without Init (those are 0
> initialized and thus don't need to be handled).

Given this only looks at 'Init', I understand this may actually
mis-optimize in the case that there is an 'Init' present, but that value
is then "by default" overridden via 'LANG_HOOKS_INIT_OPTIONS_STRUCT', or
'TARGET_OPTION_OVERRIDE', for example.  (I'm however not claiming that's
an actual problem to be worried about -- just for my understanding.)

Either way:

>   * optc-save-gen.awk: Initialize var_opt_init.  In
>   cl_optimization_stream_out for params with default values larger than
>   10, xor the default value with the actual parameter value.  In
>   cl_optimization_stream_in repeat the above xor.

> --- gcc/optc-save-gen.awk.jj  2020-09-14 10:51:54.493740942 +0200
> +++ gcc/optc-save-gen.awk 2020-09-14 11:39:39.441602594 +0200
> @@ -1186,6 +1186,7 @@ for (i = 0; i < n_opts; i++) {
>   var_opt_val_type[n_opt_val] = otype;
>   var_opt_val[n_opt_val] = "x_" name;
>   var_opt_hash[n_opt_val] = flag_set_p("Optimization", flags[i]);
> + var_opt_init[n_opt_val] = opt_args("Init", flags[i]);
>   n_opt_val++;
>   }
>  }

Reading through the options handling, I was confused why 'Init' needs to
be handled in 'gcc/optc-save-gen.awk'.  To help the next person -- like,
me in a few weeks ;-) -- wondering about this, OK to push the attached
"options: Clarify 'Init' option property usage for streaming
optimization"?


Grüße
 Thomas


> @@ -1257,10 +1258,21 @@ for (i = 0; i < n_opt_val; i++) {
>   otype = var_opt_val_type[i];
>   if (otype ~ "^const char \\**$")
>   print "  bp_pack_string (ob, bp, ptr->" name", true);";
> - else if (otype ~ "^unsigned")
> - print "  bp_pack_var_len_unsigned (bp, ptr->" name");";
> - else
> - print "  bp_pack_var_len_int (bp, ptr->" name");";
> + else {
> + if (otype ~ "^unsigned") {
> + sgn = "unsigned";
> + } else {
> + sgn = "int";
> + }
> + if (name ~ "^x_param" && !(otype ~ "^enum ") && 
> var_opt_init[i]) {
> + print "  if (" var_opt_init[i] " > (" 
> var_opt_val_type[i] ") 10)";
> + print "bp_pack_var_len_" sgn " (bp, ptr->" name" ^ 
> " var_opt_init[i] ");";
> + print "  else";
> + print "bp_pack_var_len_" sgn " (bp, ptr->" name");";
> + } else {
> + print "  bp_pack_var_len_" sgn " (bp, ptr->" name");";
> + }
> + }
>  }
>  print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
>  print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
> @@ -1281,10 +1293,18 @@ for (i = 0; i < n_opt_val; i++) {
>   print "  if (ptr->" name")";
>   print "ptr->" name" = xstrdup (ptr->" name");";
>   }
> - else if (otype ~ "^unsigned")
> - print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_unsigned (bp);";
> - else
> - print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_int (bp);";
> + else {
> + if (otype ~ "^unsigned") {
> + sgn = "unsigned";
> + } else {
> + sgn = "int";
> + }
> + print "  ptr->" name" = (" var_opt_val_type[i] ") 
> bp_unpack_var_len_" sgn " (bp);";
> + if (name ~ "^x_param" && !(otype ~ "^enum ") && 
> var_opt_init[i]) {
> + print "  if (" var_opt_init[i] " > (" 
> var_opt_val_type[i] ") 10)";
> + print "ptr->" name" ^= " var_opt_init[i] ";";
> + }
> + }
>  }
>  print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
>  print "ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From c7968fa44b1073dfa7da3a11470a9e76b6faafaf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 31 Mar 2022 12:06:29 +0

Re: [PATCH] tree-optimization/104912 - ensure cost model is checked first

2022-03-31 Thread Richard Biener via Gcc-patches
On Thu, 31 Mar 2022, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following makes sure that when we build the versioning condition
> > for vectorization including the cost model check, we check for the
> > cost model and branch over other versioning checks.  That is what
> > the cost modeling assumes, since the cost model check is the only
> > one accounted for in the scalar outside cost.  Currently we emit
> > all checks as straight-line code combined with bitwise ops which
> > can result in surprising ordering of checks in the final assembly.
> 
> Yeah, this had bugged me too, and meant that we made some bad
> decisions in some of the local benchmarks we use.  Was just afraid
> to poke at it, since it seemed like a deliberate decision. :-)

I guess it was rather giving up because of the way our loop-versioning
infrastructure works ...

> > Since loop_version accepts only a single versioning condition
> > the splitting is done after the fact.
> >
> > The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
> > with -Ofast and tuning for generic or skylake.  That's not enough
> > to recover from the slowdown when vectorizing but it now cuts off
> > the expensive alias versioning test.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK for trunk?
> >
> > For the rest of the regression my plan is to somehow factor in
> > the evolution of the number of iterations in the outer loop
> > (which is {1, +, 1}) to somehow bump the static profitability
> > estimate and together with the "cheap" cost model check never
> > execute the vectorized version (well, it is actually never executed,
> > but only because the alias check fails).
> >
> > Thanks,
> > Richard.
> >
> > 2022-03-21  Richard Biener  
> >
> > PR tree-optimization/104912
> > * tree-vect-loop-manip.cc (vect_loop_versioning): Split
> > the cost model check to a separate BB to make sure it is
> > checked first and not combined with other version checks.
> > ---
> >  gcc/tree-vect-loop-manip.cc | 53 ++---
> >  1 file changed, 50 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index a7bbc916bbc..8ef333eb31b 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -3445,13 +3445,28 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
> > cond_expr = expr;
> >  }
> >  
> > +  tree cost_name = NULL_TREE;
> > +  if (cond_expr
> > +  && !integer_truep (cond_expr)
> > +  && (version_niter
> > + || version_align
> > + || version_alias
> > + || version_simd_if_cond))
> > +cost_name = cond_expr = force_gimple_operand_1 (unshare_expr 
> > (cond_expr),
> > +   &cond_expr_stmt_list,
> > +   is_gimple_val, NULL_TREE);
> > +
> >if (version_niter)
> >  vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
> >  
> >if (cond_expr)
> > -cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> > -   &cond_expr_stmt_list,
> > -   is_gimple_condexpr, NULL_TREE);
> > +{
> > +  gimple_seq tem = NULL;
> > +  cond_expr = force_gimple_operand_1 (unshare_expr (cond_expr),
> > + &tem,
> > + is_gimple_condexpr, NULL_TREE);
> > +  gimple_seq_add_seq (&cond_expr_stmt_list, tem);
> > +}
> >  
> >if (version_align)
> >  vect_create_cond_for_align_checks (loop_vinfo, &cond_expr,
> > @@ -3654,6 +3669,38 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
> >update_ssa (TODO_update_ssa);
> >  }
> >  
> > +  /* Split the cost model check off to a separate BB.  Costing assumes
> > + this is the only thing we perform when we enter the scalar loop.  */
> 
> Maybe “…from a failed cost decision” or something?  Might sounds out
> of context like it applied more generally.

Fixed.

> > +  if (cost_name)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (cost_name);
> 
> I realise it should only happen very rarely if at all, but is it
> absolutely guaranteed that the cost condition doesn't fold to a
> constant?

OK, better be safe than sorry - I added a SSA_NAME check.

> > +  /* All uses of the cost check are 'true' after the check we
> > +are going to insert.  */
> > +  replace_uses_by (cost_name, boolean_true_node);
> > +  /* And we're going to build the new single use of it.  */
> > +  gcond *cond = gimple_build_cond (NE_EXPR, cost_name, 
> > boolean_false_node,
> > +  NULL_TREE, NULL_TREE);
> > +  edge e = split_block (gimple_bb (def), def);
> > +  gimple_stmt_iterator gsi = gsi_for_stmt (def);
> > +  gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
> > +  edge true_e, false_e;
> > +  extract_true_false_edges_from_blo

Re: [patch]update the documentation for TARGET_ZERO_CALL_USED_REGS hook and add an assertion

2022-03-31 Thread Qing Zhao via Gcc-patches
Hi, Richard,

Thanks a lot for your reviewing on the non-x86 part. 

Uros, could you please review the x86 part of the change?

thanks.

Qing

> On Mar 31, 2022, at 8:10 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> Hi, 
>> 
>> Per our discussion on: 
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592002.html
>> 
>> I come up with the following patch to:
>> 
>> 1. Update the documentation for TARGET_ZERO_CALL_USED_REGS hook;
>> 2. Add an assertion in function.cc to make sure the actually zeroed_regs is 
>> a subset of all call used regs;
>>   (The reason I didn’t add a new parameter to TARGET_ZERO_CALL_USED_REGS is, 
>> I think adding the 
>>assertion in the common place function.cc is simpler to be implemented).
> 
> Yeah, that's fair.  I guess in theory, passing the parameter would allow
> targets to choose between two versions of zeroing the register, one with
> a temporary and one without.  But that's a purely hypothetical situation
> and we could always add a parameter later if that turns out to be useful.
> 
> Perhaps more realistically, there might be other uses of the hook in
> future that want to zero registers for different reasons, with their
> own rules about which registers can be zeroed.  In other words, the
> hook is providing a general facility that happens to be useful for
> -fzero-call-used-regs.  But again, we can deal with that if it ever
> happens.
> 
> So I agree this is the right call, especially for stage 4.
> 
>> 3. This new assertion identified a bug in i386 implementation. Fix this bug 
>> in i386.
>> 
>> This patch is bootstrapped on both x86 and aarch64, no regression.
>> 
>> Okay for commit?
> 
> OK for the non-x86 bits.
> 
> Thanks,
> Richard
> 
>> thanks.
>> 
>> Qing
>> 
>> ===
>> From 2e5bc1b25a707c6a17afbf03da2a8bec5b03454d Mon Sep 17 00:00:00 2001
>> From: Qing Zhao 
>> Date: Fri, 18 Mar 2022 20:49:56 +
>> Subject: [PATCH] Add an assertion: the zeroed_hardregs set is a subset of all
>> call used regs.
>> 
>> We should make sure that the hard register set that is actually cleared by
>> the target hook zero_call_used_regs should be a subset of all call used
>> registers.
>> 
>> At the same time, update documentation for the target hook
>> TARGET_ZERO_CALL_USED_REGS.
>> 
>> This new assertion identified a bug in the i386 implemenation, which
>> incorrectly set the zeroed_hardregs for stack registers. Fixed this bug
>> in i386 implementation.
>> 
>> gcc/ChangeLog:
>> 
>> 2022-03-21  Qing Zhao  
>> 
>>  * config/i386/i386.cc (zero_all_st_registers): Return the value of
>>  num_of_st.
>>  (ix86_zero_call_used_regs): Update zeroed_hardregs set according to
>>  the return value of zero_all_st_registers.
>>  * doc/tm.texi: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
>>  * function.cc (gen_call_used_regs_seq): Add an assertion.
>>  * target.def: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
>> ---
>> gcc/config/i386/i386.cc | 27 ++-
>> gcc/doc/tm.texi |  7 +++
>> gcc/function.cc | 22 ++
>> gcc/target.def  |  7 +++
>> 4 files changed, 50 insertions(+), 13 deletions(-)
>> 
>> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
>> index 5a561966eb44..d84047a4bc1b 100644
>> --- a/gcc/config/i386/i386.cc
>> +++ b/gcc/config/i386/i386.cc
>> @@ -3753,16 +3753,17 @@ zero_all_vector_registers (HARD_REG_SET 
>> need_zeroed_hardregs)
>>needs to be cleared, the whole stack should be cleared.  However,
>>x87 stack registers that hold the return value should be excluded.
>>x87 returns in the top (two for complex values) register, so
>> -   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
>> +   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.
>> +   return the value of num_of_st.  */
>> 
>> 
>> -static bool
>> +static int
>> zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>> {
>> 
>>   /* If the FPU is disabled, no need to zero all st registers.  */
>>   if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
>> -return false;
>> +return 0;
>> 
>>   unsigned int num_of_st = 0;
>>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> @@ -3774,7 +3775,7 @@ zero_all_st_registers (HARD_REG_SET 
>> need_zeroed_hardregs)
>>   }
>> 
>>   if (num_of_st == 0)
>> -return false;
>> +return 0;
>> 
>>   bool return_with_x87 = false;
>>   return_with_x87 = (crtl->return_rtx
>> @@ -3802,7 +3803,7 @@ zero_all_st_registers (HARD_REG_SET 
>> need_zeroed_hardregs)
>>   insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
>>   add_reg_note (insn, REG_DEAD, st_reg);
>> }
>> -  return true;
>> +  return num_of_st;
>> }
>> 
>> 
>> @@ -3851,7 +3852,7 @@ ix86_zero_call_used_regs (HARD_REG_SET 
>> need_zeroed_hardregs)
>> {
>>   HARD_REG_SET zeroed_hardregs;
>>   bool all_sse_zeroed = false;
>> -  bool all_st_zeroed = f

[PATCH] testsuite: add missing dg-require-effective-target fpic

2022-03-31 Thread Marc Poulhiès via Gcc-patches
Require effective target fpic for newly added test.

gcc/testsuite/
* g++.dg/ext/visibility/visibility-local-extern1.C: Add missing
dg-require-effective-target fpic.

Tested on x86_64-linux. Ok for master?

---
 gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C 
b/gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C
index 40c20199d0c..6fb1cc7f381 100644
--- a/gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C
+++ b/gcc/testsuite/g++.dg/ext/visibility/visibility-local-extern1.C
@@ -1,6 +1,7 @@
 // PR c++/103291
 // { dg-additional-options -fpic }
 // { dg-final { scan-assembler-not "@GOTPCREL" } }
+// { dg-require-effective-target fpic }
 
 #pragma GCC visibility push(hidden)
 
-- 
2.25.1



[committed] libstdc++: Add comment about memalign requirements

2022-03-31 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk.

-- >8 --

The memalign man page on Solaris and QNX lists an additional requirement
for the alignment value that is not present in all implementation of
that non-standard function. For both those targets we should actually be
using posix_memalign anyway, so it doesn't matter. This just adds a
comment making note of that fact.

libstdc++-v3/ChangeLog:

* libsupc++/new_opa.cc (aligned_alloc): Add comment.
---
 libstdc++-v3/libsupc++/new_opa.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/libsupc++/new_opa.cc 
b/libstdc++-v3/libsupc++/new_opa.cc
index 5f24737de1c..411cd8d98b2 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -87,6 +87,8 @@ aligned_alloc (std::size_t al, std::size_t sz)
 static inline void*
 aligned_alloc (std::size_t al, std::size_t sz)
 {
+  // Solaris requires al >= sizeof a word and QNX requires >= sizeof(void*)
+  // but they both provide posix_memalign, so will use the definition above.
   return memalign (al, sz);
 }
 #else // !HAVE__ALIGNED_MALLOC && !HAVE_POSIX_MEMALIGN && !HAVE_MEMALIGN
-- 
2.34.1



[PATCH] libstdc++: Implement std::unreachable() for C++23 (P0627R6)

2022-03-31 Thread Jonathan Wakely via Gcc-patches
This is a tiny C++23 feature that I plan to add for GCC 12. Does anybody
have any comments on the choices below in terms of when to make reaching
std::unreachable do an assert/trap/unreachable?

My thoughts on avoiding the overhead in the _GLIBCXX_ASSERTIONS case is
that this differs from e.g. assertions in operator[] where we want to
catch accidental bad indices. A std::unreachable() call should not be
reached accidentally. I hope it will only be used for conditions that
really are unreachable, and probably quite often where performance
matters. If using std::unreachable() increased code size significantly
then it would make it much less useful for guiding optimizations.


-- >8 --

This defines std::unreachable as an assertion for debug mode, a trap
when _GLIBCXX_ASSERTIONS is defined, and __builtin_unreachable()
otherwise.

The reason for only using __builtin_trap() in the second case is to
avoid the overhead of setting up a call to __glibcxx_assert_fail that
should never happen.

While thinking about what the debug assertion failure should print, I
noticed that the __glibcxx_assert_fail function doesn't check for null
pointers. This adds a check so we don't try to print them if null.

libstdc++-v3/ChangeLog:

* include/std/utility (unreachable): Define for C++23.
* include/std/version (__cpp_lib_unreachable): Define.
* src/c++11/debug.cc (__glibcxx_assert_fail): Check for valid
arguments.
* testsuite/20_util/unreachable/1.cc: New test.
* testsuite/20_util/unreachable/version.cc: New test.
---
 libstdc++-v3/include/std/utility| 15 +++
 libstdc++-v3/include/std/version|  1 +
 libstdc++-v3/src/c++11/debug.cc |  5 +++--
 libstdc++-v3/testsuite/20_util/unreachable/1.cc | 17 +
 .../testsuite/20_util/unreachable/version.cc| 10 ++
 5 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/unreachable/1.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/unreachable/version.cc

diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 0d7f8954c5a..e5b5212381d 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -186,6 +186,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr underlying_type_t<_Tp>
 to_underlying(_Tp __value) noexcept
 { return static_cast>(__value); }
+
+#define __cpp_lib_unreachable 202202L
+  [[noreturn,__gnu__::__always_inline__]]
+  void
+  unreachable()
+  {
+#ifdef _GLIBCXX_DEBUG
+std::__glibcxx_assert_fail("", 0, "std::unreachable()",
+  "inconceivable!");
+#elif defined _GLIBCXX_ASSERTIONS
+__builtin_trap();
+#else
+__builtin_unreachable();
+#endif
+  }
 #endif // C++23
 #endif // C++20
 #endif // C++17
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 44b8a9f88b5..51f2110b68e 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -326,6 +326,7 @@
 # define __cpp_lib_string_resize_and_overwrite 202110L
 #endif
 #define __cpp_lib_to_underlying 202102L
+#define __cpp_lib_unreachable 202202L
 #endif
 #endif // C++2b
 #endif // C++20
diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 98fe2dcc153..ff3723eb93b 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -52,8 +52,9 @@ namespace std
   __glibcxx_assert_fail(const char* file, int line,
const char* function, const char* condition) noexcept
   {
-fprintf(stderr, "%s:%d: %s: Assertion '%s' failed.\n",
- file, line, function, condition);
+if (file && function && condition)
+  fprintf(stderr, "%s:%d: %s: Assertion '%s' failed.\n",
+ file, line, function, condition);
 abort();
   }
 }
diff --git a/libstdc++-v3/testsuite/20_util/unreachable/1.cc 
b/libstdc++-v3/testsuite/20_util/unreachable/1.cc
new file mode 100644
index 000..0c463d52a48
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/unreachable/1.cc
@@ -0,0 +1,17 @@
+// { dg-options "-std=gnu++23" }
+// { dg-do compile { target c++23 } }
+
+#include 
+
+#ifndef __cpp_lib_unreachable
+# error "Feature-test macro for unreachable missing in "
+#elif __cpp_lib_unreachable != 202202L
+# error "Feature-test macro for unreachable has wrong value in "
+#endif
+
+bool test01(int i)
+{
+  if (i == 4)
+return true;
+  std::unreachable();
+} // { dg-bogus "control reaches end of non-void function" }
diff --git a/libstdc++-v3/testsuite/20_util/unreachable/version.cc 
b/libstdc++-v3/testsuite/20_util/unreachable/version.cc
new file mode 100644
index 000..c7795900c30
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/unreachable/version.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++23" }
+// { dg-do preprocess { target c++23 } }
+
+#include 
+
+#ifndef __cpp_lib_unreachable
+# error "Fe

Re: [PATCH] libstdc++: Implement std::unreachable() for C++23 (P0627R6)

2022-03-31 Thread Matthias Kretz via Gcc-patches
I like it. But I'd like it even more if we could have

#elif defined _UBSAN
__ubsan_invoke_ub("reached std::unreachable()");

But to my knowledge UBSAN has no hooks for the library like this (yet).

and...

On Thursday, 31 March 2022 17:30:29 CEST Jonathan Wakely via Gcc-patches 
wrote:
> diff --git a/libstdc++-v3/include/std/utility
> b/libstdc++-v3/include/std/utility index 0d7f8954c5a..e5b5212381d 100644
> --- a/libstdc++-v3/include/std/utility
> +++ b/libstdc++-v3/include/std/utility
> @@ -186,6 +186,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  constexpr underlying_type_t<_Tp>
>  to_underlying(_Tp __value) noexcept
>  { return static_cast>(__value); }
> +
> +#define __cpp_lib_unreachable 202202L
> +  [[noreturn,__gnu__::__always_inline__]]
> +  void
> +  unreachable()
> +  {
> +#ifdef _GLIBCXX_DEBUG
> +std::__glibcxx_assert_fail("", 0, "std::unreachable()",
> +  "inconceivable!");

Funny message, but it should be more helpful, IMHO. :)

-Matthias

> +#elif defined _GLIBCXX_ASSERTIONS
> +__builtin_trap();
> +#else
> +__builtin_unreachable();
> +#endif
> +  }


-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──


Re: [PATCH] libstdc++: Implement std::unreachable() for C++23 (P0627R6)

2022-03-31 Thread Xi Ruoyao via Gcc-patches
On Thu, 2022-03-31 at 17:50 +0200, Matthias Kretz via Gcc-patches wrote:
> I like it. But I'd like it even more if we could have
> 
> #elif defined _UBSAN
>     __ubsan_invoke_ub("reached std::unreachable()");
> 
> But to my knowledge UBSAN has no hooks for the library like this
> (yet).

UBSAN can catch __builtin_unreachable() and print a message "execution
reached an unreachable program point".
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] libstdc++: Implement std::unreachable() for C++23 (P0627R6)

2022-03-31 Thread Marc Glisse via Gcc-patches

On Thu, 31 Mar 2022, Matthias Kretz via Gcc-patches wrote:


I like it. But I'd like it even more if we could have

#elif defined _UBSAN
   __ubsan_invoke_ub("reached std::unreachable()");

But to my knowledge UBSAN has no hooks for the library like this (yet).


-fsanitize=undefined already replaces __builtin_unreachable with its own 
thing, so I was indeed going to ask if the assertion / trap provide a 
better debugging experience compared to plain __builtin_unreachable, with 
the possibility to get a stack trace (UBSAN_OPTIONS=print_stacktrace=1), 
etc? Detecting if (the right subset of) ubsan is enabled sounds like a 
good idea.


--
Marc Glisse


Re: [PATCH] libstdc++: Implement std::unreachable() for C++23 (P0627R6)

2022-03-31 Thread Jonathan Wakely via Gcc-patches
On Thu, 31 Mar 2022 at 16:51, Matthias Kretz via Libstdc++
 wrote:
>
> I like it. But I'd like it even more if we could have
>
> #elif defined _UBSAN
> __ubsan_invoke_ub("reached std::unreachable()");
>
> But to my knowledge UBSAN has no hooks for the library like this (yet).

As far as I know, that's correct.


> > +#ifdef _GLIBCXX_DEBUG
> > +std::__glibcxx_assert_fail("", 0, "std::unreachable()",
> > +  "inconceivable!");
>
> Funny message, but it should be more helpful, IMHO. :)

We're currently limited to some string that can go inside "Assertion
'...' failed."

I also considered changing __glibcxx_assert_fail like so:

--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -55,6 +55,8 @@ namespace std
if (file && function && condition)
  fprintf(stderr, "%s:%d: %s: Assertion '%s' failed.\n",
 file, line, function, condition);
+else if (function)
+  fprintf(stderr, "%s called.\n", function);
abort();
  }
}

And then making std::unreachable() call __glibcxx_assert_fail(0, 0,
"std::unreachable()", 0).



Re: [PATCH] libstdc++: Implement std::unreachable() for C++23 (P0627R6)

2022-03-31 Thread Jonathan Wakely via Gcc-patches
On Thu, 31 Mar 2022 at 17:03, Marc Glisse via Libstdc++
 wrote:
>
> On Thu, 31 Mar 2022, Matthias Kretz via Gcc-patches wrote:
>
> > I like it. But I'd like it even more if we could have
> >
> > #elif defined _UBSAN
> >__ubsan_invoke_ub("reached std::unreachable()");
> >
> > But to my knowledge UBSAN has no hooks for the library like this (yet).
>
> -fsanitize=undefined already replaces __builtin_unreachable with its own
> thing, so I was indeed going to ask if the assertion / trap provide a
> better debugging experience compared to plain __builtin_unreachable, with
> the possibility to get a stack trace (UBSAN_OPTIONS=print_stacktrace=1),
> etc? Detecting if (the right subset of) ubsan is enabled sounds like a
> good idea.

Does UBsan define a macro that we can use to detect it?



[PATCH] mips: Emit psabi diagnostic for return values affected by C++ zero-width bit-field ABI change [PR 102024]

2022-03-31 Thread Xi Ruoyao via Gcc-patches
Part 1/2 of PR 102024 fix for MIPS.

No change for code generation.  It only adds a -Wpsabi inform for return
values affected by the change in C++ FE.

The ABI is clear that any zero-width bit-field in a return value will
disallow using FPRs for it, so the behavior of GCC trunk is correct
(it's also same as clang).

gcc/
PR target/102024
* mips.cc (mips_fpr_return_fields): Detect C++ zero-width
bit-fields and set up an indicator.
(mips_return_in_msb): Adapt for mips_fpr_return_fields change.
(mips_function_value_1): Diagnose when the presense of a C++
zero-width bit-field changes function returning in GCC 12.

gcc/testsuite/
PR target/102024
* g++.target/mips/mips.exp: New test supporting file.
* g++.target/mips/pr102024-1.C: New test.
---
 gcc/config/mips/mips.cc  | 51 ++--
 gcc/testsuite/g++.target/mips/mips.exp   | 34 
 gcc/testsuite/g++.target/mips/pr102024.C | 20 ++
 3 files changed, 101 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/mips/mips.exp
 create mode 100644 gcc/testsuite/g++.target/mips/pr102024.C

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index a1c4b437cd4..3284cf71f6f 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6274,10 +6274,17 @@ mips_callee_copies (cumulative_args_t, const 
function_arg_info &arg)
 
For n32 & n64, a structure with one or two fields is returned in
floating-point registers as long as every field has a floating-point
-   type.  */
+   type.
+
+   The C++ FE used to remove zero-width bit-fields in GCC 11 and earlier.
+   To make a proper diagnostic, this function will set HAS_ZERO_WIDTH_BF
+   to 1 once a C++ zero-width bit-field shows up, and then ignore it.
+   Then the caller can determine if this zero-width bit-field will make a
+   difference and emit a -Wpsabi inform.  */
 
 static int
-mips_fpr_return_fields (const_tree valtype, tree *fields)
+mips_fpr_return_fields (const_tree valtype, tree *fields,
+   int *has_zero_width_bf)
 {
   tree field;
   int i;
@@ -6294,6 +6301,12 @@ mips_fpr_return_fields (const_tree valtype, tree *fields)
   if (TREE_CODE (field) != FIELD_DECL)
continue;
 
+  if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field))
+   {
+ *has_zero_width_bf = 1;
+ continue;
+   }
+
   if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (field)))
return 0;
 
@@ -6319,11 +6332,14 @@ static bool
 mips_return_in_msb (const_tree valtype)
 {
   tree fields[2];
+  int has_zero_width_bf = 0;
+  int use_fpr = mips_fpr_return_fields (valtype, fields,
+   &has_zero_width_bf);
 
   return (TARGET_NEWABI
  && TARGET_BIG_ENDIAN
  && AGGREGATE_TYPE_P (valtype)
- && mips_fpr_return_fields (valtype, fields) == 0);
+ && (use_fpr == 0 || has_zero_width_bf));
 }
 
 /* Return true if the function return value MODE will get returned in a
@@ -6418,8 +6434,35 @@ mips_function_value_1 (const_tree valtype, const_tree 
fn_decl_or_type,
 return values, promote the mode here too.  */
   mode = promote_function_mode (valtype, mode, &unsigned_p, func, 1);
 
+  int has_zero_width_bf = 0;
+  int use_fpr = mips_fpr_return_fields (valtype, fields,
+   &has_zero_width_bf);
+  if (TARGET_HARD_FLOAT &&
+ warn_psabi &&
+ has_zero_width_bf &&
+ use_fpr != 0)
+   {
+ static unsigned last_reported_type_uid;
+ unsigned uid = TYPE_UID (TYPE_MAIN_VARIANT (valtype));
+ if (uid != last_reported_type_uid)
+   {
+ static const char *url =
+   CHANGES_ROOT_URL
+   "gcc-12/changes.html#zero_width_bitfields";
+ inform (input_location,
+ "the ABI for returning a value containing "
+ "zero-width bit-fields but otherwise an aggregate "
+ "with only one or two floating-point fields was "
+ "retconned in GCC %{12.1%}", url);
+ last_reported_type_uid = uid;
+   }
+   }
+
+  if (has_zero_width_bf)
+   use_fpr = 0;
+
   /* Handle structures whose fields are returned in $f0/$f2.  */
-  switch (mips_fpr_return_fields (valtype, fields))
+  switch (use_fpr)
{
case 1:
  return mips_return_fpr_single (mode,
diff --git a/gcc/testsuite/g++.target/mips/mips.exp 
b/gcc/testsuite/g++.target/mips/mips.exp
new file mode 100644
index 000..9fa7e771b4d
--- /dev/null
+++ b/gcc/testsuite/g++.target/mips/mips.exp
@@ -0,0 +1,34 @@
+# Copyright (C) 2019-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of th

Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-31 Thread Andre Vieira (lists) via Gcc-patches


On 28/03/2022 15:59, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

Hi,

Addressed all of your comments bar the pred ops one.

Is this OK?


gcc/ChangeLog:

      * config/aarch64/aarch64.cc (aarch64_vector_costs): Define
determine_suggested_unroll_factor and m_nosve_pattern.
      (determine_suggested_unroll_factor): New function.
      (aarch64_vector_costs::add_stmt_cost): Check for a qualifying
pattern
      to set m_nosve_pattern.
      (aarch64_vector_costs::finish_costs): Use
determine_suggested_unroll_factor.
      * config/aarch64/aarch64.opt (aarch64-vect-unroll-limit): New.

On 16/03/2022 18:01, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

Hi,

This patch implements the costing function
determine_suggested_unroll_factor for aarch64.
It determines the unrolling factor by dividing the number of X
operations we can do per cycle by the number of X operations in the loop
body, taking this information from the vec_ops analysis during vector
costing and the available issue_info information.
We multiply the dividend by a potential reduction_latency, to improve
our pipeline utilization if we are stalled waiting on a particular
reduction operation.

Right now we also have a work around for vectorization choices where the
main loop uses a NEON mode and predication is available, such that if
the main loop makes use of a NEON pattern that is not directly supported
by SVE we do not unroll, as that might cause performance regressions in
cases where we would enter the original main loop's VF. As an example if
you have a loop where you could use AVG_CEIL with a V8HI mode, you would
originally get 8x NEON using AVG_CEIL followed by a 8x SVE predicated
epilogue, using other instructions. Whereas with the unrolling you would
end up with 16x AVG_CEIL NEON + 8x SVE predicated loop, thus skipping
the original 8x NEON. In the future, we could handle this differently,
by either using a different costing model for epilogues, or potentially
vectorizing more than one single epilogue.

gcc/ChangeLog:

       * config/aarch64/aarch64.cc (aarch64_vector_costs): Define
determine_suggested_unroll_factor.
       (determine_suggested_unroll_factor): New function.
       (aarch64_vector_costs::finish_costs): Use
determine_suggested_unroll_factor.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
b5687aab59f630920e51b742b80a540c3a56c6c8..9d3a607d378d6a2792efa7c6dece2a65c24e4521
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -15680,6 +15680,7 @@ private:
 unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
 unsigned int);
 bool prefer_unrolled_loop () const;
+  unsigned int determine_suggested_unroll_factor ();
   
 /* True if we have performed one-time initialization based on the

vec_info.  */
@@ -16768,6 +16769,105 @@ adjust_body_cost_sve (const aarch64_vec_op_count *ops,
 return sve_cycles_per_iter;
   }
   
+unsigned int

+aarch64_vector_costs::determine_suggested_unroll_factor ()
+{
+  auto *issue_info = aarch64_tune_params.vec_costs->issue_info;
+  if (!issue_info)
+return 1;
+  bool sve = false;
+  if (aarch64_sve_mode_p (m_vinfo->vector_mode))

Other code uses m_vec_flags & VEC_ANY_SVE for this.


+{
+  if (!issue_info->sve)
+   return 1;
+  sve = true;
+}
+  else
+{
+  if (!issue_info->advsimd)
+   return 1;

The issue info should instead be taken from vec_ops.simd_issue_info ()
in the loop below.  It can vary for each entry.


+  /* If we are trying to unroll a NEON main loop that contains patterns

s/a NEON/an Advanced SIMD/


+that we do not support with SVE and we might use a predicated
+epilogue, we need to be conservative and block unrolling as this might
+lead to a less optimal loop for the first and only epilogue using the
+original loop's vectorization factor.
+TODO: Remove this constraint when we add support for multiple epilogue
+vectorization.  */
+  if (partial_vectors_supported_p ()
+ && param_vect_partial_vector_usage != 0
+ && !TARGET_SVE2)
+   {
+ unsigned int i;
+ stmt_vec_info stmt_vinfo;
+ FOR_EACH_VEC_ELT (m_vinfo->stmt_vec_infos, i, stmt_vinfo)
+   {
+ if (is_pattern_stmt_p (stmt_vinfo))
+   {
+ gimple *stmt = stmt_vinfo->stmt;
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt))
+   {
+ enum internal_fn ifn
+   = gimple_call_internal_fn (stmt);
+ switch (ifn)
+   {
+   case IFN_AVG_FLOOR:
+   case IFN_AVG_CEIL:
+ return 1;
+   default:
+ break;
+   

[PATCH][GCC 9] arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]

2022-03-31 Thread Alex Coplan via Gcc-patches
Hi,

This is a backport of the fix for PR99977 to the GCC 9 branch. The only
case where the GCC 10 patch did not apply cleanly was on sync.md, where
some of the context has changed, but the substance of the patch has not
changed, it simply required applying by hand.

Tested as follows:
 - Bootstrap/regtest on arm-linux-gnueabihf.
 - Regression tested a cross compiler configured with
   --with-arch=armv8-m.base.

OK for the GCC 9 branch?

Thanks,
Alex

---

The PR shows two ICEs with __sync_bool_compare_and_swap and
-mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one
later on, after the CAS insn is split.

The LRA ICE occurs because the
@atomic_compare_and_swap_1 pattern attempts to tie
two output operands together (operands 0 and 1 in the third
alternative). LRA can't handle this, since it doesn't make sense for an
insn to assign to the same operand twice.

The later (post-splitting) ICE occurs because the expansion of the
cbranchsi4_scratch insn doesn't quite go according to plan. As it
stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
attempting to pass a register (neg_bval) to use as a scratch register.
However, since the RTL template has a match_scratch here,
gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx.
Since this is all happening after RA, this is doomed to fail (and we get
an ICE about the insn not matching its constraints).

It seems that the motivation for the choice of constraints in the
atomic_compare_and_swap pattern comes from an attempt to satisfy the
constraints of the cbranchsi4_scratch insn. This insn requires the
scratch register to be the same as the input register in the case that
we use a larger negative immediate (one that satisfies J, but not L).

Of course, as noted above, LRA refuses to assign two output operands to
the same register, so this was never going to work.

The solution I'm proposing here is to collapse the alternatives to the
CAS insn (allowing the two output register operands to be matched to
different registers) and to ensure that the constraints for
cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by
inserting a move to ensure the source and destination registers match if
necessary (i.e. in the case of large negative immediates).

Another notable change here is that we only do:

  emit_move_insn (neg_bval, const1_rtx);

for non-negative immediates. This is because the ADDS instruction used in
the negative case suffices to leave a suitable value in neg_bval: if the
operands compare equal, we don't take the branch (so neg_bval will be
set by the load exclusive). Otherwise, the ADDS will leave a nonzero
value in neg_bval, which will correctly signal that the CAS has failed
when it is later negated.

gcc/ChangeLog:

PR target/99977
* config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
with negative immediates: ensure we expand cbranchsi4_scratch
correctly and ensure we satisfy its constraints.
* config/arm/sync.md
(@atomic_compare_and_swap_1): Don't
attempt to tie two output operands together with constraints;
collapse two alternatives.
(@atomic_compare_and_swap_1): Likewise.
* config/arm/thumb1.md (cbranchsi4_neg_late): New.

gcc/testsuite/ChangeLog:

PR target/99977
* gcc.target/arm/pr99977.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 58afc4a2647..8cbdb9a44e5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29053,13 +29053,31 @@ arm_split_compare_and_swap (rtx operands[])
 }
   else
 {
-  emit_move_insn (neg_bval, const1_rtx);
   cond = gen_rtx_NE (VOIDmode, rval, oldval);
   if (thumb1_cmpneg_operand (oldval, SImode))
-   emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval,
-   label2, cond));
+   {
+ rtx src = rval;
+ if (!satisfies_constraint_L (oldval))
+   {
+ gcc_assert (satisfies_constraint_J (oldval));
+
+ /* For such immediates, ADDS needs the source and destination regs
+to be the same.
+
+Normally this would be handled by RA, but this is all happening
+after RA.  */
+ emit_move_insn (neg_bval, rval);
+ src = neg_bval;
+   }
+
+ emit_unlikely_jump (gen_cbranchsi4_neg_late (neg_bval, src, oldval,
+  label2, cond));
+   }
   else
-   emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2));
+   {
+ emit_move_insn (neg_bval, const1_rtx);
+ emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2));
+   }
 }
 
   arm_emit_store_exclusive (mode, neg_bval, mem, newval, use_release);
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 0e777a92bb4..693a802c292 100644
--- a/gcc/config/

[PATCH] mips: Ignore zero width bitfields in arguments and issue -Wpsabi warning about C zero-width bit-field ABI changes [PR102024]

2022-03-31 Thread Xi Ruoyao via Gcc-patches
Part 2/2 of PR 102024 fix for MIPS.

The ABI says:

"Regardless of the struct field structure, it is treated as a sequence
of 64-bit chunks. If a chunk consists solely of a double float field
(but not a double, which is part of a union), it is passed in a floating
point register. Any other chunk is passed in an integer register."

It's not clear that if a zero-width field is a part of any 64-bit chunk,
and which 64-bit chunk it shall belong to when its on the boundary of
two chunks.  In previous GCC releases, the C++ FE removes all zero-width
bit-fields but otherwise a zero-width field is considered a part of the
next 64-bit chunk:

struct A
{
  /* first chunk */
  double a;  /* into FPR */
  /* second chunk */
  int : 0;
  double b;  /* into GPR with current GCC trunk because "the chunk
contains a bit-field" */
};

But clang does not account a zero-width bit-field on the boundary into
any "64-bit chunk", so its behavior is same as the C++ FE of previous
GCC releases.

I think we should not make an arbitary assumption like "a zero-width
bit-field is a part of the next chunk, but not a part of the previous
chunk".  So a consistent behavior is either consider it a part of both
chunks, or consider it not a part of any chunks.  As we are changing
psABI anyway, it seems OK to be compatible with clang.

gcc/
PR target/102024
* mips.cc (mips_function_arg): Ignore zero-width bit-fields, and
inform if it causes a psABI change.

gcc/testsuite/
PR target/102024
* gcc.target/mips/pr102024.c: New test.
---
 gcc/config/mips/mips.cc  | 45 +---
 gcc/testsuite/gcc.target/mips/pr102024.c | 20 +++
 2 files changed, 61 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/pr102024.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 3284cf71f6f..5d1637e7b2f 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6042,11 +6042,26 @@ mips_function_arg (cumulative_args_t cum_v, const 
function_arg_info &arg)
  for (i = 0; i < info.reg_words; i++)
{
  rtx reg;
+ int has_zero_width_bf_abi_change = 0;
 
  for (; field; field = DECL_CHAIN (field))
-   if (TREE_CODE (field) == FIELD_DECL
-   && int_bit_position (field) >= bitpos)
- break;
+   {
+ if (TREE_CODE (field) != FIELD_DECL)
+   continue;
+
+ /* Ignore zero-width bit-fields.  And, if the ignored
+field is not from C++, it may be an ABI change.  */
+ if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field))
+   continue;
+ if (integer_zerop (DECL_SIZE (field)))
+   {
+ has_zero_width_bf_abi_change = 1;
+ continue;
+   }
+
+ if (int_bit_position (field) >= bitpos)
+   break;
+   }
 
  if (field
  && int_bit_position (field) == bitpos
@@ -6054,7 +6069,29 @@ mips_function_arg (cumulative_args_t cum_v, const 
function_arg_info &arg)
  && TYPE_PRECISION (TREE_TYPE (field)) == BITS_PER_WORD)
reg = gen_rtx_REG (DFmode, FP_ARG_FIRST + info.reg_offset + i);
  else
-   reg = gen_rtx_REG (DImode, GP_ARG_FIRST + info.reg_offset + i);
+   {
+ reg = gen_rtx_REG (DImode,
+GP_ARG_FIRST + info.reg_offset + i);
+ has_zero_width_bf_abi_change = 0;
+   }
+
+ if (has_zero_width_bf_abi_change && warn_psabi)
+   {
+ static unsigned last_reported_type_uid;
+ unsigned uid = TYPE_UID (TYPE_MAIN_VARIANT (arg.type));
+ if (uid != last_reported_type_uid)
+   {
+ static const char *url =
+   CHANGES_ROOT_URL
+   "gcc-12/changes.html#zero_width_bitfields";
+ inform (input_location,
+ "the ABI for passing a value containing "
+ "zero-width bit-fields before an adjacent "
+ "64-bit floating-point field was retconned "
+ "in GCC %{12.1%}", url);
+ last_reported_type_uid = uid;
+   }
+   }
 
  XVECEXP (ret, 0, i)
= gen_rtx_EXPR_LIST (VOIDmode, reg,
diff --git a/gcc/testsuite/gcc.target/mips/pr102024.c 
b/gcc/testsuite/gcc.target/mips/pr102024.c
new file mode 100644
index 000..c45d01315d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr102024.c
@@ -0,0 +1,20 @@
+// PR target/102024
+// { dg-do compile }
+// { dg-options "-mabi=64 -mhard-float" }
+// { dg-final { scan-assembler "\\\$f12" } }
+

Re: [PATCH v4] libgo: Don't use pt_regs member in mcontext_t

2022-03-31 Thread Sören Tempel via Gcc-patches
Ping.

Would be nice to get this integrated since this one of the changes needed to
make gccgo work with musl libc. Let me know if the patch needs to be revised
further.

Sören Tempel  wrote:
> The .regs member is primarily intended to be used in conjunction with
> ptrace. Since this code is not using ptrace, using .regs is a bad idea.
> Furthermore, the code currently fails to compile on musl since the
> pt_regs type (used by .regs) is in an incomplete type which has to be
> completed by inclusion of the asm/ptrace.h Kernel header. Contrary to
> glibc, this header is not indirectly included by musl through other
> header files.
> 
> This patch fixes compilation of this code with musl libc by accessing
> the register values via .gp_regs/.gregs (depending on 32-bit or 64-bit
> PowerPC) instead of using .regs. For more details, see
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591261.html
> 
> For the offsets in gp_regs refer to the Kernel asm/ptrace.h header.
> 
> This patch has been tested on Alpine Linux ppc64le (uses musl libc).
> 
> Signed-off-by: Sören Tempel 
> 
> ChangeLog:
> 
>   * libgo/runtime/go-signal.c (defined): Use .gp_regs/.gregs
> to access ppc64/ppc32 registers.
>   (dumpregs): Ditto.
> ---
> Changes since v3: Add special handling for 32-bit PowerPC with glibc,
> also avoid use of gregs_t type since glibc does not seem to define
> it on PowerPC.
> 
> This version of the patch introduces a new macro (PPC_GPREGS) to access
> these registers to special case musl/glibc handling in a central place
> once instead of duplicating it twice.
> 
>  libgo/runtime/go-signal.c | 32 
>  1 file changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
> index d30d1603adc..3255046260d 100644
> --- a/libgo/runtime/go-signal.c
> +++ b/libgo/runtime/go-signal.c
> @@ -16,6 +16,21 @@
>#define SA_RESTART 0
>  #endif
>  
> +// The PowerPC API for accessing gregs/gp_regs differs greatly across
> +// different libc implementations (musl and glibc).  To workaround that,
> +// define the canonical way to access these registers once here.
> +//
> +// See https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591360.html
> +#ifdef __PPC__
> +#if defined(__PPC64__)   /* ppc64 glibc & musl */
> +#define PPC_GPREGS(MCTX) (MCTX)->gp_regs
> +#elif defined(__GLIBC__) /* ppc32 glibc */
> +#define PPC_GPREGS(MCTX) (MCTX)->uc_regs->gregs
> +#else/* ppc32 musl */
> +#define PPC_GPREGS(MCTX) (MCTX)->gregs
> +#endif /* __PPC64__ */
> +#endif /* __PPC__ */
> +
>  #ifdef USING_SPLIT_STACK
>  
>  extern void __splitstack_getcontext(void *context[10]);
> @@ -224,7 +239,8 @@ getSiginfo(siginfo_t *info, void *context 
> __attribute__((unused)))
>  #elif defined(__alpha__) && defined(__linux__)
>   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.sc_pc;
>  #elif defined(__PPC__) && defined(__linux__)
> - ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;
> + mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
> + ret.sigpc = PPC_GPREGS(m)[32];
>  #elif defined(__PPC__) && defined(_AIX)
>   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar;
>  #elif defined(__aarch64__) && defined(__linux__)
> @@ -341,13 +357,13 @@ dumpregs(siginfo_t *info __attribute__((unused)), void 
> *context __attribute__((u
>   int i;
>  
>   for (i = 0; i < 32; i++)
> - runtime_printf("r%d %X\n", i, m->regs->gpr[i]);
> - runtime_printf("pc  %X\n", m->regs->nip);
> - runtime_printf("msr %X\n", m->regs->msr);
> - runtime_printf("cr  %X\n", m->regs->ccr);
> - runtime_printf("lr  %X\n", m->regs->link);
> - runtime_printf("ctr %X\n", m->regs->ctr);
> - runtime_printf("xer %X\n", m->regs->xer);
> + runtime_printf("r%d %X\n", i, PPC_GPREGS(m)[i]);
> + runtime_printf("pc  %X\n", PPC_GPREGS(m)[32]);
> + runtime_printf("msr %X\n", PPC_GPREGS(m)[33]);
> + runtime_printf("cr  %X\n", PPC_GPREGS(m)[38]);
> + runtime_printf("lr  %X\n", PPC_GPREGS(m)[36]);
> + runtime_printf("ctr %X\n", PPC_GPREGS(m)[35]);
> + runtime_printf("xer %X\n", PPC_GPREGS(m)[37]);
> }
>  #elif defined(__PPC__) && defined(_AIX)
> {
> 


Re: options: Fix "Multiple different help strings" error diagnostic (was: make conflicting help text an error)

2022-03-31 Thread Joseph Myers
On Wed, 30 Mar 2022, Thomas Schwinge wrote:

> > --- gcc/optc-gen.awk  (revision 187427)
> > +++ gcc/optc-gen.awk  (working copy)
> 
> >   else if (help[i] != "" && help[i + 1] != help[i])
> > - print "warning: multiple different help strings for " 
> > \
> > - opts[i] ":\n\t" help[i] "\n\t" help[i + 1] \
> > - | "cat 1>&2"
> > + print "#error Multiple different help strings for " \
> > + opts[i] ":\n\t" help[i] "\n\t" help[i + 1]
> > +
> 
> OK to push the attached 'options: Fix "Multiple different help strings"
> error diagnostic'?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: options: Clarifications around option definition records' help texts (was: make conflicting help text an error)

2022-03-31 Thread Joseph Myers
On Wed, 30 Mar 2022, Thomas Schwinge wrote:

> > I don't think we want to support different help strings for
> > different languages; if an option is supported for multiple languages, we
> > should have a generic description of that option that is correct for all
> > of them.
> 
> To not just bury that in the email archives: OK to push the attached
> "options: Clarifications around option definition records' help texts"?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] libstdc++: Implement std::unreachable() for C++23 (P0627R6)

2022-03-31 Thread Marc Glisse via Gcc-patches

On Thu, 31 Mar 2022, Jonathan Wakely wrote:


On Thu, 31 Mar 2022 at 17:03, Marc Glisse via Libstdc++
 wrote:


On Thu, 31 Mar 2022, Matthias Kretz via Gcc-patches wrote:


I like it. But I'd like it even more if we could have

#elif defined _UBSAN
   __ubsan_invoke_ub("reached std::unreachable()");

But to my knowledge UBSAN has no hooks for the library like this (yet).


-fsanitize=undefined already replaces __builtin_unreachable with its own
thing, so I was indeed going to ask if the assertion / trap provide a
better debugging experience compared to plain __builtin_unreachable, with
the possibility to get a stack trace (UBSAN_OPTIONS=print_stacktrace=1),
etc? Detecting if (the right subset of) ubsan is enabled sounds like a
good idea.


Does UBsan define a macro that we can use to detect it?


https://github.com/google/sanitizers/issues/765 seems to say no (it could 
be outdated though), but they were asking for use cases to motivate adding 
one. Apparently there is a macro for clang, although I don't think it is 
fine-grained.


Adding one to cppbuiltin.cc testing SANITIZE_UNREACHABLE looks easy, maybe 
we can do just this one, we don't need to go overboard and define macros 
for all possible suboptions of ubsan right now.


I don't think any of that prevents from pushing your patch as is for 
gcc-12.


--
Marc Glisse


Re: [patch]update the documentation for TARGET_ZERO_CALL_USED_REGS hook and add an assertion

2022-03-31 Thread Uros Bizjak via Gcc-patches
On Mon, Mar 21, 2022 at 10:28 PM Qing Zhao via Gcc-patches
 wrote:
>
> Hi,
>
> Per our discussion on: 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592002.html
>
> I come up with the following patch to:
>
> 1. Update the documentation for TARGET_ZERO_CALL_USED_REGS hook;
> 2. Add an assertion in function.cc to make sure the actually zeroed_regs is a 
> subset of all call used regs;
>(The reason I didn’t add a new parameter to TARGET_ZERO_CALL_USED_REGS is, 
> I think adding the
> assertion in the common place function.cc is simpler to be implemented).
> 3. This new assertion identified a bug in i386 implementation. Fix this bug 
> in i386.
>
> This patch is bootstrapped on both x86 and aarch64, no regression.
>
> Okay for commit?
>
> thanks.
>
> Qing
>
> ===
> From 2e5bc1b25a707c6a17afbf03da2a8bec5b03454d Mon Sep 17 00:00:00 2001
> From: Qing Zhao 
> Date: Fri, 18 Mar 2022 20:49:56 +
> Subject: [PATCH] Add an assertion: the zeroed_hardregs set is a subset of all
>  call used regs.
>
> We should make sure that the hard register set that is actually cleared by
> the target hook zero_call_used_regs should be a subset of all call used
> registers.
>
> At the same time, update documentation for the target hook
> TARGET_ZERO_CALL_USED_REGS.
>
> This new assertion identified a bug in the i386 implemenation, which
> incorrectly set the zeroed_hardregs for stack registers. Fixed this bug
> in i386 implementation.
>
> gcc/ChangeLog:
>
> 2022-03-21  Qing Zhao  
>
> * config/i386/i386.cc (zero_all_st_registers): Return the value of
> num_of_st.
> (ix86_zero_call_used_regs): Update zeroed_hardregs set according to
> the return value of zero_all_st_registers.
> * doc/tm.texi: Update the documentation of TARGET_ZERO_CALL_USED_REGS.
> * function.cc (gen_call_used_regs_seq): Add an assertion.
> * target.def: Update the documentation of TARGET_ZERO_CALL_USED_REGS.

OK for the i386 part.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 27 ++-
>  gcc/doc/tm.texi |  7 +++
>  gcc/function.cc | 22 ++
>  gcc/target.def  |  7 +++
>  4 files changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 5a561966eb44..d84047a4bc1b 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -3753,16 +3753,17 @@ zero_all_vector_registers (HARD_REG_SET 
> need_zeroed_hardregs)
> needs to be cleared, the whole stack should be cleared.  However,
> x87 stack registers that hold the return value should be excluded.
> x87 returns in the top (two for complex values) register, so
> -   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
> +   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.
> +   return the value of num_of_st.  */
>
>
> -static bool
> +static int
>  zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>  {
>
>/* If the FPU is disabled, no need to zero all st registers.  */
>if (! (TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387))
> -return false;
> +return 0;
>
>unsigned int num_of_st = 0;
>for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> @@ -3774,7 +3775,7 @@ zero_all_st_registers (HARD_REG_SET 
> need_zeroed_hardregs)
>}
>
>if (num_of_st == 0)
> -return false;
> +return 0;
>
>bool return_with_x87 = false;
>return_with_x87 = (crtl->return_rtx
> @@ -3802,7 +3803,7 @@ zero_all_st_registers (HARD_REG_SET 
> need_zeroed_hardregs)
>insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
>add_reg_note (insn, REG_DEAD, st_reg);
>  }
> -  return true;
> +  return num_of_st;
>  }
>
>
> @@ -3851,7 +3852,7 @@ ix86_zero_call_used_regs (HARD_REG_SET 
> need_zeroed_hardregs)
>  {
>HARD_REG_SET zeroed_hardregs;
>bool all_sse_zeroed = false;
> -  bool all_st_zeroed = false;
> +  int all_st_zeroed_num = 0;
>bool all_mm_zeroed = false;
>
>CLEAR_HARD_REG_SET (zeroed_hardregs);
> @@ -3881,9 +3882,17 @@ ix86_zero_call_used_regs (HARD_REG_SET 
> need_zeroed_hardregs)
>if (!exit_with_mmx_mode)
>  /* x87 exit mode, we should zero all st registers together.  */
>  {
> -  all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
> -  if (all_st_zeroed)
> -   SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
> +  all_st_zeroed_num = zero_all_st_registers (need_zeroed_hardregs);
> +
> +  if (all_st_zeroed_num > 0)
> +   for (unsigned int regno = FIRST_STACK_REG; regno <= LAST_STACK_REG; 
> regno++)
> + /* x87 stack registers that hold the return value should be 
> excluded.
> +x87 returns in the top (two for complex values) register.  */
> + if (all_st_zeroed_num == 8
> + || !((all_st_zeroed_num >= 6 && regno == REGNO 
> (crtl->return_rtx))
> +  || (a

Re: [PATCH v4] libgo: Don't use pt_regs member in mcontext_t

2022-03-31 Thread Ian Lance Taylor via Gcc-patches
On Thu, Mar 31, 2022 at 9:41 AM Sören Tempel  wrote:
>
> Ping.
>
> Would be nice to get this integrated since this one of the changes needed to
> make gccgo work with musl libc. Let me know if the patch needs to be revised
> further.

I went with a simpler solution, more verbose but easier to read.  Now
committed to mainline.  Please let me know if you have any problems
with this.  Thanks.

Ian
fad0ecb68c08512ac24852b6d5264cdb9809dc6d
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index afaccb0e9e6..f93eaf48e28 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-7f33baa09a8172bb2c5f1ca0435d9efe3e194c9b
+45108f37070afb696b069768700e39a269f1fecb
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
index 0cb90304730..9c919e1568a 100644
--- a/libgo/runtime/go-signal.c
+++ b/libgo/runtime/go-signal.c
@@ -231,7 +231,14 @@ getSiginfo(siginfo_t *info, void *context 
__attribute__((unused)))
 #elif defined(__alpha__) && defined(__linux__)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.sc_pc;
 #elif defined(__PPC__) && defined(__linux__)
+   // For some reason different libc implementations use
+   // different names.
+#if defined(__PPC64__) || defined(__GLIBC__)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;
+#else
+   // Assumed to be ppc32 musl.
+   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
+#endif
 #elif defined(__PPC__) && defined(_AIX)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar;
 #elif defined(__aarch64__) && defined(__linux__)
@@ -347,6 +354,7 @@ dumpregs(siginfo_t *info __attribute__((unused)), void 
*context __attribute__((u
mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
int i;
 
+#if defined(__PPC64__) || defined(__GLIBC__)
for (i = 0; i < 32; i++)
runtime_printf("r%d %X\n", i, m->regs->gpr[i]);
runtime_printf("pc  %X\n", m->regs->nip);
@@ -355,6 +363,16 @@ dumpregs(siginfo_t *info __attribute__((unused)), void 
*context __attribute__((u
runtime_printf("lr  %X\n", m->regs->link);
runtime_printf("ctr %X\n", m->regs->ctr);
runtime_printf("xer %X\n", m->regs->xer);
+#else
+   for (i = 0; i < 32; i++)
+   runtime_printf("r%d %X\n", i, m->gregs[i]);
+   runtime_printf("pc  %X\n", m->gregs[32]);
+   runtime_printf("msr %X\n", m->gregs[33]);
+   runtime_printf("cr  %X\n", m->gregs[38]);
+   runtime_printf("lr  %X\n", m->gregs[36]);
+   runtime_printf("ctr %X\n", m->gregs[35]);
+   runtime_printf("xer %X\n", m->gregs[37]);
+#endif
  }
 #elif defined(__PPC__) && defined(_AIX)
  {


Re: rs6000 patch ping: [PATCH 8/8] rs6000: Fix some missing built-in attributes [PR104004]

2022-03-31 Thread Segher Boessenkool
On Wed, Mar 30, 2022 at 06:07:26PM -0500, Segher Boessenkool wrote:
> On Tue, Mar 15, 2022 at 02:18:00PM +0100, Jakub Jelinek wrote:
> > On Fri, Jan 28, 2022 at 11:50:26AM -0600, Bill Schmidt via Gcc-patches 
> > wrote:
> > > PR104004 caught some misses on my part in converting to the new built-in
> > > function infrastructure.  In particular, I forgot to mark all of the 
> > > "nosoft"
> > > built-ins, and one of those should also have been marked "no32bit".
> 
> > This patch fixes a P1 regression and from my (limited) understanding
> > doesn't depend on any other patch in the series.
> 
> It depends on 3/8 which was only partially applied (or not at all even?)
> It is a mess :-(
> 
> I'll look into it tomorrow.

3/8 wasn't applied at all.  I did some surgery to apply this 8/8 though.


Segher


New Croatian PO file for 'gcc' (version 12.1-b20220213)

2022-03-31 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Croatian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/hr.po

(This file, 'gcc-12.1-b20220213.hr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




fixed-point/composite-type: add -Wno-array-parameter

2022-03-31 Thread Alexandre Oliva via Gcc-patches


On machines that support fixed-point and the test runs, it's failing
because of warnings issued by -Warray-parameter=[12], enabled by
-Wall.

The warnings state "mismatch in bound 1 of argument 1 declared as...",
referring to the redeclaration of f2_##NAME.  The purpose of the
redeclaration is not clear to me.

It doesn't look like the test intends to catch mismatches between
parameter's array lengths, despite the explicit array bound and the
incompatible calls, so I'm adding -Wno-array-parameter to avoid this
distraction and enable the test to pass.

Tested on arm-eabi, where the patch removes the excess errors fail.  Ok
to install?


for gcc/testsuite/ChangeLog

* gcc.dg/fixed-point/composite-type.c: Add -Wno-array-parameter.
---
 gcc/testsuite/gcc.dg/fixed-point/composite-type.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/fixed-point/composite-type.c 
b/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
index 026bdaf564420..59351ff09b390 100644
--- a/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
+++ b/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-std=gnu99 -O -Wall -Wno-unused -ftrack-macro-expansion=0" } 
*/
+/* { dg-options "-std=gnu99 -O -Wall -Wno-unused -ftrack-macro-expansion=0 
-Wno-array-parameter" } */
 
 /* C99 6.2.7: Compatible type and composite type.  */
 


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] libgcc, riscv: Add restore libcalls to be used by tail calling functions

2022-03-31 Thread Palmer Dabbelt

On Tue, 29 Mar 2022 07:08:35 PDT (-0700), lewis.rev...@embecosm.com wrote:

Currently the existing libcalls for restoring registers have the
requirement that they must be tail called by the parent function, so
that they can safely return through the restored return address
register. This does impose the restriction that the libcalls cannot be
used if there already exists a tail call at the end of the parent
function in question, and as such this patch forms part of an effort to
rectify this situation.

There already exists patches to LLVM and Compiler-RT to add the libcalls
and the capability for the compiler to generate them
(https://reviews.llvm.org/D91720 and https://reviews.llvm.org/D91719),
and the behaviour that we want to standardize across the compilers is
documented in the following pull request to the RISC-V toolchain
conventions repository:
https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/10


This generally looks good to me, but the timing is awkward: we're in 
stage 4 (so features need an exception), but my bigger worry is that 
taking support for a draft spec so late in the cycle puts us at serious 
risk of shipping the draft and being stuck with it (which is bad for 
everyone).  It looks like the spec is just waiting on GCC, though, so 
maybe we're in that chicken-and-egg stage -- a bit of a headache for 
that to show up in stage 4 as there's no room for error, but this one 
seems manageable.


If this is aimed at GCC-13, then I think it's best to make sure we also 
have the GCC support for emitting calls to those routines -- otherwise 
it'll be very hard to test this.  The good news is that in that case 
there's time, it's just a chunk of extra work to do.  That should also 
make alignment with the spec timeline easy, as we'll have many months of 
slack.


Regardless, it seems like this is mostly Jim's code so I'll defer to him 
here.


Thanks!


The libcalls added in this patch follow that documented behaviour and
are based off a suggested implementation provided by Jim Wilson in the
thread of that pull request. Similar to the existing restore libcalls,
restores are grouped according to the expected stack alignment, and the
'upper' libcalls fall through to the lower libcalls, finally ending in
return through the temporary register t1.

libgcc/

* config/riscv/restore-tail.S: Add restore libcalls compatible
with use from functions ending in tail calls.
* config/riscv/t-elf: Add file restore-tail.S.
---
 libgcc/config/riscv/restore-tail.S | 279 +
 libgcc/config/riscv/t-elf  |   1 +
 2 files changed, 280 insertions(+)
 create mode 100644 libgcc/config/riscv/restore-tail.S

diff --git a/libgcc/config/riscv/restore-tail.S
b/libgcc/config/riscv/restore-tail.S
new file mode 100644
index 000..54116beff17
--- /dev/null
+++ b/libgcc/config/riscv/restore-tail.S
@@ -0,0 +1,279 @@
+/* Tail-call compatible callee-saved register restore routines for RISC-V.
+
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#include "riscv-asm.h"
+
+  .text
+
+#if __riscv_xlen == 64
+
+FUNC_BEGIN (__riscv_restore_tailcall_12)
+  .cfi_startproc
+  .cfi_def_cfa_offset 112
+  .cfi_offset 27, -104
+  .cfi_offset 26, -96
+  .cfi_offset 25, -88
+  .cfi_offset 24, -80
+  .cfi_offset 23, -72
+  .cfi_offset 22, -64
+  .cfi_offset 21, -56
+  .cfi_offset 20, -48
+  .cfi_offset 19, -40
+  .cfi_offset 18, -32
+  .cfi_offset 9, -24
+  .cfi_offset 8, -16
+  .cfi_offset 1, -8
+  ld s11, 8(sp)
+  .cfi_restore 27
+  addi sp, sp, 16
+
+FUNC_BEGIN (__riscv_restore_tailcall_11)
+FUNC_BEGIN (__riscv_restore_tailcall_10)
+  .cfi_restore 27
+  .cfi_def_cfa_offset 96
+  ld s10, 0(sp)
+  .cfi_restore 26
+  ld s9, 8(sp)
+  .cfi_restore 25
+  addi sp, sp, 16
+
+FUNC_BEGIN (__riscv_restore_tailcall_9)
+FUNC_BEGIN (__riscv_restore_tailcall_8)
+  .cfi_restore 25
+  .cfi_restore 26
+  .cfi_restore 27
+  .cfi_def_cfa_offset 80
+  ld s8, 0(sp)
+  .cfi_restore 24
+  ld s7, 8(sp)
+  .cfi_restore 23
+  addi sp, sp, 16
+
+FUNC_BEGIN (__riscv_restore_tailcall_7)
+FUNC_BEGIN (__riscv_restore

[COMMITTED] MAINTAINERS: Update my email address

2022-03-31 Thread Qian Jianhua via Gcc-patches
Update my email address in the MAINTAINERS file.

2022-04-01  Qian Jianhua  

ChangeLog:

* MAINTAINERS: Update my email address.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f388bdaf4f1..30f81b3dd52 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -463,7 +463,7 @@ Daniel Jacobowitz   
 Andreas Jaeger 
 Harsha Jagasia 
 Fariborz Jahanian  
-Qian Jianhua   
+Qian Jianhua   
 Janis Johnson  
 Teresa Johnson 
 Kean Johnston  
-- 
2.18.1





Re: try multi dest registers in default_zero_call_used_regs

2022-03-31 Thread Alexandre Oliva via Gcc-patches
Hello, Richard,

Thanks for the review!

On Mar 31, 2022, Richard Sandiford  wrote:

>> +/* If the natural mode doesn't work, try some wider mode.  */
>> +if (!targetm.hard_regno_mode_ok (regno, mode))
>> +  {
>> +for (int nregs = 2;
>> + regno + nregs <= FIRST_PSEUDO_REGISTER
>> +   && TEST_HARD_REG_BIT (need_zeroed_hardregs,
>> + regno + nregs - 1);
>> + nregs++)
>> +  {
>> +mode = choose_hard_reg_mode (regno, nregs, 0);

> I like the idea, but it would be good to avoid the large:

>   FIRST_PSEUDO_REGISTER * FIRST_PSEUDO_REGISTER * NUM_MACHINE_MODES

> constant factor.

Enteringf the nregs loop, because the register can't be used in its
natural mode, is supposed to be an unusual case, not worth optimizing
much under Amdahl's law.  I gather the aggregate trip counts are
unlikely to hit the theoretical O(n^2) because registers that would take
the loop are rare and expected to be paired/grouped up.  If that
assumption doesn't hold, then a cap would indeed be desirable.

> How about if init_reg_modes_target recorded the maximum value of
> x_hard_regno_nregs?

I had thought of a cap but couldn't find one I was happy with, and in
the end I thought we didn't need one.  But this is indeed a good one to
use.  Thanks, I'm implementing it.

> This seems big enough to be worth splitting out into a helper, rather
> than repeating.

I had considered that, but it didn't seem to me it would bring an
improvement.  As it turns out, it does.  Thanks.

>> -rtx zsrc = gen_rtx_REG (mode, src);
>> +rtx src_rtx = (mode == GET_MODE (regno_reg_rtx[src])
>> +   ? regno_reg_rtx[src]
>> +   : gen_rtx_REG (mode, src));

> Is this needed?  The original gen_rtx_REG (mode, src) seems OK.

No, it's not needed, it's just an attempt to avoid allocating RTL that
we have handy.  This function could in theory make several attempts at
allocating rtl for each register in the shrinking pending set.  I
thought every saved bit could help.


Here's what I'm regstrapping on x86_64-linux-gnu, after verifying that
it does the job on the affected arm variant.  Ok to install, assuming no
surprises in the testing?


try multi-reg dest in default_zero_call_used_regs

From: Alexandre Oliva 

When the mode of regno_reg_rtx is not hard_regno_mode_ok for the
target, try grouping the register with subsequent ones.  This enables
s16 to s31 and their hidden pairs to be zeroed with the default logic
on some arm variants.


for  gcc/ChangeLog

* targhooks.c (default_zero_call_used_regs): Attempt to group
regs that the target refuses to use in their natural modes.
(zcur_select_mode_rtx): New.
* regs.h (struct target_regs): Add x_hard_regno_max_nregs.
(hard_regno_max_nregs): Define.
* reginfo.c (init_reg_modes_target): Set hard_regno_max_nregs.
---
 gcc/reginfo.cc   |9 --
 gcc/regs.h   |5 +++
 gcc/targhooks.cc |   86 --
 3 files changed, 89 insertions(+), 11 deletions(-)

diff --git a/gcc/reginfo.cc b/gcc/reginfo.cc
index 234f72eceeb25..67e30cab42855 100644
--- a/gcc/reginfo.cc
+++ b/gcc/reginfo.cc
@@ -441,10 +441,15 @@ init_reg_modes_target (void)
 {
   int i, j;
 
+  this_target_regs->x_hard_regno_max_nregs = 1;
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 for (j = 0; j < MAX_MACHINE_MODE; j++)
-  this_target_regs->x_hard_regno_nregs[i][j]
-   = targetm.hard_regno_nregs (i, (machine_mode) j);
+  {
+   unsigned char nregs = targetm.hard_regno_nregs (i, (machine_mode) j);
+   this_target_regs->x_hard_regno_nregs[i][j] = nregs;
+   if (nregs > this_target_regs->x_hard_regno_max_nregs)
+ this_target_regs->x_hard_regno_max_nregs = nregs;
+  }
 
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 {
diff --git a/gcc/regs.h b/gcc/regs.h
index 74f1f63770322..f72b06fb56508 100644
--- a/gcc/regs.h
+++ b/gcc/regs.h
@@ -202,6 +202,9 @@ struct target_regs {
  registers that a given machine mode occupies.  */
   unsigned char x_hard_regno_nregs[FIRST_PSEUDO_REGISTER][MAX_MACHINE_MODE];
 
+  /* The max value found in x_hard_regno_nregs.  */
+  unsigned char x_hard_regno_max_nregs;
+
   /* For each hard register, the widest mode object that it can contain.
  This will be a MODE_INT mode if the register can hold integers.  Otherwise
  it will be a MODE_FLOAT or a MODE_CC mode, whichever is valid for the
@@ -235,6 +238,8 @@ extern struct target_regs *this_target_regs;
 #else
 #define this_target_regs (&default_target_regs)
 #endif
+#define hard_regno_max_nregs \
+  (this_target_regs->x_hard_regno_max_nregs)
 #define reg_raw_mode \
   (this_target_regs->x_reg_raw_mode)
 #define have_regs_of_mode \
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index fc49235eb38ee..2681833e2ce79 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@

Re: fixed-point/composite-type: add -Wno-array-parameter

2022-03-31 Thread Richard Biener via Gcc-patches
On Fri, Apr 1, 2022 at 4:40 AM Alexandre Oliva via Gcc-patches
 wrote:
>
>
> On machines that support fixed-point and the test runs, it's failing
> because of warnings issued by -Warray-parameter=[12], enabled by
> -Wall.
>
> The warnings state "mismatch in bound 1 of argument 1 declared as...",
> referring to the redeclaration of f2_##NAME.  The purpose of the
> redeclaration is not clear to me.
>
> It doesn't look like the test intends to catch mismatches between
> parameter's array lengths, despite the explicit array bound and the
> incompatible calls, so I'm adding -Wno-array-parameter to avoid this
> distraction and enable the test to pass.
>
> Tested on arm-eabi, where the patch removes the excess errors fail.  Ok
> to install?

Sounds reasonable.  OK.

Richard.

>
> for gcc/testsuite/ChangeLog
>
> * gcc.dg/fixed-point/composite-type.c: Add -Wno-array-parameter.
> ---
>  gcc/testsuite/gcc.dg/fixed-point/composite-type.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/fixed-point/composite-type.c 
> b/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
> index 026bdaf564420..59351ff09b390 100644
> --- a/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
> +++ b/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-std=gnu99 -O -Wall -Wno-unused -ftrack-macro-expansion=0" 
> } */
> +/* { dg-options "-std=gnu99 -O -Wall -Wno-unused -ftrack-macro-expansion=0 
> -Wno-array-parameter" } */
>
>  /* C99 6.2.7: Compatible type and composite type.  */
>
>
>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


[PATCH v2] rs6000: Support UN[GL][ET] in rs6000_maybe_emit_maxc_minc [PR105002]

2022-03-31 Thread Kewen.Lin via Gcc-patches
Hi,

Commit r12-7687 exposed one miss optimization chance in function
rs6000_maybe_emit_maxc_minc, for now it only considers comparison
codes GE/GT/LE/LT, but it can support more variants with codes
UNLT/UNLE/UNGT/UNGE by reversing them into the equivalent ones
with GE/GT/LE/LT.

Bootstrapped and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592204.html

v2: Add more comments to provide better readability as Segher suggested.

BR,
Kewen
-
PR target/105002

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_maybe_emit_maxc_minc): Support more
comparison codes UNLT/UNLE/UNGT/UNGE.

---
 gcc/config/rs6000/rs6000.cc | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index f41b8f740ba..40db13e6498 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15838,11 +15838,26 @@ rs6000_maybe_emit_maxc_minc (rtx dest, rtx op, rtx 
true_cond, rtx false_cond)
   rtx op1 = XEXP (op, 1);
   machine_mode compare_mode = GET_MODE (op0);
   machine_mode result_mode = GET_MODE (dest);
-  bool max_p = false;
+  bool max_p;

   if (result_mode != compare_mode)
 return false;

+  /* See the comments of this function, it simply expects GE/GT/LE/LT in
+ the checks, but for the reversible equivalent UNLT/UNLE/UNGT/UNGE,
+ we need to do the reversions first to make the following checks
+ support fewer cases, like:
+
+   (a UNLT b) ? op1 : op2 =>  (a >= b) ? op2 : op1;
+   (a UNLE b) ? op1 : op2 =>  (a >  b) ? op2 : op1;
+   (a UNGT b) ? op1 : op2 =>  (a <= b) ? op2 : op1;
+   (a UNGE b) ? op1 : op2 =>  (a <  b) ? op2 : op1;  */
+  if (code == UNGE || code == UNGT || code == UNLE || code == UNLT)
+{
+  code = reverse_condition_maybe_unordered (code);
+  std::swap (true_cond, false_cond);
+}
+
   if (code == GE || code == GT)
 max_p = true;
   else if (code == LE || code == LT)
--
2.27.0



Re: [PATCH] Split vector load from parm_del to elemental loads to avoid STLF stalls.

2022-03-31 Thread Hongtao Liu via Gcc-patches
On Thu, Mar 31, 2022 at 6:45 PM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, Mar 31, 2022 at 7:51 AM liuhongt  wrote:
> >
> > Since cfg is freed before machine_reorg, just do a rough calculation
> > of the window according to the layout.
> > Also according to an experiment on CLX, set window size to 64.
> >
> > Currently only handle V2DFmode load since it doesn't need any scratch
> > registers, and it's sufficient to recover cray performance for -O2
> > compared to GCC11.
> >
> > Bootstrap and regtest on x86_64-pc-linux-gnu{-m32,}.
> > No impact for SPEC2017(same binary for both O2 and Ofast).
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/101908
> > * config/i386/i386.cc (ix86_split_stlf_stall_load): New
> > function
> > (ix86_reorg): Call ix86_split_stlf_stall_load.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr101908-1.c: New test.
> > * gcc.target/i386/pr101908-2.c: New test.
> > ---
> >  gcc/config/i386/i386.cc| 47 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 ++
> >  3 files changed, 71 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 5a561966eb4..f9169b04d43 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -21933,7 +21933,53 @@ ix86_seh_fixup_eh_fallthru (void)
> >emit_insn_after (gen_nops (const1_rtx), insn);
> >  }
> >  }
> > +/* Split vector load from parm_decl to elemental loads to avoid STLF
> > +   stalls.  */
> > +static void
> > +ix86_split_stlf_stall_load ()
> > +{
> > +  basic_block bb;
> > +  unsigned window = 0;
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +{
> > +  rtx_insn *insn;
> > +  FOR_BB_INSNS (bb, insn)
> > +   {
> > + if (!NONDEBUG_INSN_P (insn))
> > +   continue;
> > + window++;
> > + /* Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between 
> > each
> > +other, just emulate for pipeline) before stalled load, stlf 
> > stall
> > +case is as fast as no stall cases on CLX.
> > +Since CFG is freed before machine_reorg, just do a rough
> > +calculation of the window according to the layout.  */
> > + if (window > 64)
> > +   return;
>
> I wonder if we should also return for any_uncondjump_p (insn)
> (not sure if that captures returnjump_p), or maybe just explicitely
> allow any_condjump_p and reject other PC setters.
>
I guess it doesn't include call.
> Likewise we might want to stop at a LABEL that can be backwards reached.
>
I think checking load from parm_decl can somehow avoid split load in a
loop(assume optimizer will hoist that out).
> I suppose people more familiar with cfgrtl can suggest something better.
>
> > + rtx set = single_set (insn);
> > + if (!set)
> > +   continue;
> > + rtx src = SET_SRC (set);
> > + if (!MEM_P (src)
> > + /* Only handle V2DFmode load since it doesn't need any scratch
> > +register.  */
> > + || GET_MODE (src) != E_V2DFmode
> > + || !MEM_EXPR (src)
> > + || TREE_CODE (get_base_address (MEM_EXPR (src))) != PARM_DECL)
>
> I wonder if we have (easy) ways to detect whether XEXP (src, 0) is
> frame/stack based
> rather than requiring a MEM_EXPR.  There is may_be_sp_based_p ()
may_be_sp_based_p just checks stack pointer which is not suitable after RA.
> exported from alias.c
> for example, but I'm not sure whether that works after RA & frame elimination.
>
> > +   continue;
> > +
> > + rtx zero = CONST0_RTX (V2DFmode);
> > + rtx dest = SET_DEST (set);
> > + rtx m = adjust_address (src, DFmode, 0);
> > + emit_insn_before (gen_sse2_loadlpd (dest, zero, m), insn);
>
> Can SSE1 also do this?
X86 does have movlps and movhps, but the problem is movlps load 64bit
memory to xmm w/o change upper bits which may cause partial register
dependence.
>
> > + m = adjust_address (src, DFmode, 8);
> > + PATTERN (insn) = gen_sse2_loadhpd (dest, dest, m);
> > + INSN_CODE (insn) = -1;
> > + gcc_assert (recog_memoized (insn) != -1);
>
> I think we want to dump something into dump_file when we split an insn here.
Good idea.
>
> > +   }
> > +}
> > +
> > +}
> >  /* Implement machine specific optimizations.  We implement padding of 
> > returns
> > for K8 CPUs and pass to avoid 4 jumps in the single 16 byte window.  */
> >  static void
> > @@ -21948,6 +21994,7 @@ ix86_reorg (void)
> >
> >if (optimize && optimize_function_for_speed_p (cfun))
> >  {
> > +  ix86_split_stlf_stall_load ();
> >if (TARGET_PAD_SHORT_FUNCTION)
> > ix86_pad_short_functi

Re: [PATCH] rs6000: Support UN[GL][ET] in rs6000_maybe_emit_maxc_minc [PR105002]

2022-03-31 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/3/27 11:02 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Mar 24, 2022 at 10:00:43AM +0800, Kewen.Lin wrote:
>> Commit r12-7687 exposed one miss optimization chance in function
>> rs6000_maybe_emit_maxc_minc, for now it only considers comparison
>> codes GE/GT/LE/LT, but it can support more variants with codes
>> UNLT/UNLE/UNGT/UNGE by reversing them into the equivalent ones
>> with GE/GT/LE/LT.
> 
>> +  /* Canonicalize UN[GL][ET] to [LG][TE].  */
>> +  if (code == UNGE || code == UNGT || code == UNLE || code == UNLT)
>> +{
>> +  code = reverse_condition_maybe_unordered (code);
>> +  std::swap (true_cond, false_cond);
>> +}
> 
> Typically you would have to generate code to compensate for the reversed
> comparison.  It works out fine here, but could you please restructure
> the code to make that less tricky / more obvious?  Or at least add a
> comment?
> 
> I wouldn't call it "canonicalisation" btw, LT is by no means more
> standard than UNGE is.  You can say you are folding things so you later
> have to support fewer cases, or similar?
> 

Thanks for the review!  Sorry for the late reply (I'm just back from
vacation).  I just posted v2 by adding more comments to describe the
reversions and changing the bad "canonicalisation" word, hope it looks
better to you.  :)

v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-April/592624.html

BR,
Kewen


[PATCH] Split vector load from parm_del to elemental loads to avoid STLF stalls.

2022-03-31 Thread liuhongt via Gcc-patches
Update in V2:
1. Use get_insns instead of FOR_EACH_BB_CFUN and FOR_BB_INSNS.
2. Return for any_uncondjump_p and ANY_RETURN_P.
3. Add dump info for spliting instruction.
4. Restrict ix86_split_stlf_stall_load under TARGET_SSE2.

Since cfg is freed before machine_reorg, just do a rough calculation
of the window according to the layout.
Also according to an experiment on CLX, set window size to 64.

Currently only handle V2DFmode load since it doesn't need any scratch
registers, and it's sufficient to recover cray performance for -O2
compared to GCC11.

gcc/ChangeLog:

PR target/101908
* config/i386/i386.cc (ix86_split_stlf_stall_load): New
function
(ix86_reorg): Call ix86_split_stlf_stall_load.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr101908-1.c: New test.
* gcc.target/i386/pr101908-2.c: New test.
---
 gcc/config/i386/i386.cc| 60 ++
 gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 +
 gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 +
 3 files changed, 84 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5a561966eb4..c88a689f32b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21933,6 +21933,64 @@ ix86_seh_fixup_eh_fallthru (void)
   emit_insn_after (gen_nops (const1_rtx), insn);
 }
 }
+/* Split vector load from parm_decl to elemental loads to avoid STLF
+   stalls.  */
+static void
+ix86_split_stlf_stall_load ()
+{
+  rtx_insn* insn, *start = get_insns ();
+  unsigned window = 0;
+
+  for (insn = start; insn; insn = NEXT_INSN (insn))
+{
+  if (!NONDEBUG_INSN_P (insn))
+   continue;
+  window++;
+  /* Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between each
+other, just emulate for pipeline) before stalled load, stlf stall
+case is as fast as no stall cases on CLX.
+Since CFG is freed before machine_reorg, just do a rough
+calculation of the window according to the layout.  */
+  if (window > 64)
+   return;
+
+  if (any_uncondjump_p (insn)
+ || ANY_RETURN_P (PATTERN (insn)))
+   return;
+
+  rtx set = single_set (insn);
+  if (!set)
+   continue;
+  rtx src = SET_SRC (set);
+  if (!MEM_P (src)
+ /* Only handle V2DFmode load since it doesn't need any scratch
+register.  */
+ || GET_MODE (src) != E_V2DFmode
+ || !MEM_EXPR (src)
+ || TREE_CODE (get_base_address (MEM_EXPR (src))) != PARM_DECL)
+   continue;
+
+  rtx zero = CONST0_RTX (V2DFmode);
+  rtx dest = SET_DEST (set);
+  rtx m = adjust_address (src, DFmode, 0);
+  rtx loadlpd = gen_sse2_loadlpd (dest, zero, m);
+  emit_insn_before (loadlpd, insn);
+  m = adjust_address (src, DFmode, 8);
+  rtx loadhpd = gen_sse2_loadhpd (dest, dest, m);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fputs ("Due to potential STLF stall, split instruction:\n",
+dump_file);
+ print_rtl_single (dump_file, insn);
+ fputs ("To:\n", dump_file);
+ print_rtl_single (dump_file, loadlpd);
+ print_rtl_single (dump_file, loadhpd);
+   }
+  PATTERN (insn) = loadhpd;
+  INSN_CODE (insn) = -1;
+  gcc_assert (recog_memoized (insn) != -1);
+}
+}
 
 /* Implement machine specific optimizations.  We implement padding of returns
for K8 CPUs and pass to avoid 4 jumps in the single 16 byte window.  */
@@ -21948,6 +22006,8 @@ ix86_reorg (void)
 
   if (optimize && optimize_function_for_speed_p (cfun))
 {
+  if (TARGET_SSE2)
+   ix86_split_stlf_stall_load ();
   if (TARGET_PAD_SHORT_FUNCTION)
ix86_pad_short_function ();
   else if (TARGET_PAD_RETURNS)
diff --git a/gcc/testsuite/gcc.target/i386/pr101908-1.c 
b/gcc/testsuite/gcc.target/i386/pr101908-1.c
new file mode 100644
index 000..33d9684f0ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101908-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+/* { dg-final { scan-assembler-not {(?n)movhpd[ \t]} } } */
+
+struct X { double x[2]; };
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df __attribute__((noipa))
+foo (struct X* x, struct X* y)
+{
+  return (v2df) {x->x[1], x->x[0] } + (v2df) { y->x[1], y->x[0] };
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr101908-2.c 
b/gcc/testsuite/gcc.target/i386/pr101908-2.c
new file mode 100644
index 000..45060b73c06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101908-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+/* { dg-final { scan-assembler-times {(?n)movhpd[ \t]+} "2" } }  */
+
+struct X { double x[4]; };
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df __attribute__((noipa))
+foo (

Re: [PATCH] Split vector load from parm_del to elemental loads to avoid STLF stalls.

2022-03-31 Thread Richard Biener via Gcc-patches
On Fri, Apr 1, 2022 at 8:29 AM Hongtao Liu  wrote:
>
> On Thu, Mar 31, 2022 at 6:45 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, Mar 31, 2022 at 7:51 AM liuhongt  wrote:
> > >
> > > Since cfg is freed before machine_reorg, just do a rough calculation
> > > of the window according to the layout.
> > > Also according to an experiment on CLX, set window size to 64.
> > >
> > > Currently only handle V2DFmode load since it doesn't need any scratch
> > > registers, and it's sufficient to recover cray performance for -O2
> > > compared to GCC11.
> > >
> > > Bootstrap and regtest on x86_64-pc-linux-gnu{-m32,}.
> > > No impact for SPEC2017(same binary for both O2 and Ofast).
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/101908
> > > * config/i386/i386.cc (ix86_split_stlf_stall_load): New
> > > function
> > > (ix86_reorg): Call ix86_split_stlf_stall_load.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr101908-1.c: New test.
> > > * gcc.target/i386/pr101908-2.c: New test.
> > > ---
> > >  gcc/config/i386/i386.cc| 47 ++
> > >  gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 ++
> > >  gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 ++
> > >  3 files changed, 71 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 5a561966eb4..f9169b04d43 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -21933,7 +21933,53 @@ ix86_seh_fixup_eh_fallthru (void)
> > >emit_insn_after (gen_nops (const1_rtx), insn);
> > >  }
> > >  }
> > > +/* Split vector load from parm_decl to elemental loads to avoid STLF
> > > +   stalls.  */
> > > +static void
> > > +ix86_split_stlf_stall_load ()
> > > +{
> > > +  basic_block bb;
> > > +  unsigned window = 0;
> > > +  FOR_EACH_BB_FN (bb, cfun)
> > > +{
> > > +  rtx_insn *insn;
> > > +  FOR_BB_INSNS (bb, insn)
> > > +   {
> > > + if (!NONDEBUG_INSN_P (insn))
> > > +   continue;
> > > + window++;
> > > + /* Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence 
> > > between each
> > > +other, just emulate for pipeline) before stalled load, stlf 
> > > stall
> > > +case is as fast as no stall cases on CLX.
> > > +Since CFG is freed before machine_reorg, just do a rough
> > > +calculation of the window according to the layout.  */
> > > + if (window > 64)
> > > +   return;
> >
> > I wonder if we should also return for any_uncondjump_p (insn)
> > (not sure if that captures returnjump_p), or maybe just explicitely
> > allow any_condjump_p and reject other PC setters.
> >
> I guess it doesn't include call.
> > Likewise we might want to stop at a LABEL that can be backwards reached.
> >
> I think checking load from parm_decl can somehow avoid split load in a
> loop(assume optimizer will hoist that out).
> > I suppose people more familiar with cfgrtl can suggest something better.
> >
> > > + rtx set = single_set (insn);
> > > + if (!set)
> > > +   continue;
> > > + rtx src = SET_SRC (set);
> > > + if (!MEM_P (src)
> > > + /* Only handle V2DFmode load since it doesn't need any 
> > > scratch
> > > +register.  */
> > > + || GET_MODE (src) != E_V2DFmode
> > > + || !MEM_EXPR (src)
> > > + || TREE_CODE (get_base_address (MEM_EXPR (src))) != 
> > > PARM_DECL)
> >
> > I wonder if we have (easy) ways to detect whether XEXP (src, 0) is
> > frame/stack based
> > rather than requiring a MEM_EXPR.  There is may_be_sp_based_p ()
> may_be_sp_based_p just checks stack pointer which is not suitable after RA.
> > exported from alias.c
> > for example, but I'm not sure whether that works after RA & frame 
> > elimination.
> >
> > > +   continue;
> > > +
> > > + rtx zero = CONST0_RTX (V2DFmode);
> > > + rtx dest = SET_DEST (set);
> > > + rtx m = adjust_address (src, DFmode, 0);
> > > + emit_insn_before (gen_sse2_loadlpd (dest, zero, m), insn);
> >
> > Can SSE1 also do this?
> X86 does have movlps and movhps, but the problem is movlps load 64bit
> memory to xmm w/o change upper bits which may cause partial register
> dependence.

So do we need to guard this transform on SSE2 availability then?

> >
> > > + m = adjust_address (src, DFmode, 8);
> > > + PATTERN (insn) = gen_sse2_loadhpd (dest, dest, m);
> > > + INSN_CODE (insn) = -1;
> > > + gcc_assert (recog_memoized (insn) != -1);
> >
> > I think we want to dump something into dump_file when we split an insn here.
> Good idea.
> >
> > > +   }
> > > +}
> > > +
> > > +}
> > >  /* Implement machine specific opti

Re: [PATCH] Split vector load from parm_del to elemental loads to avoid STLF stalls.

2022-03-31 Thread Richard Biener via Gcc-patches
On Fri, Apr 1, 2022 at 8:47 AM liuhongt via Gcc-patches
 wrote:
>
> Update in V2:
> 1. Use get_insns instead of FOR_EACH_BB_CFUN and FOR_BB_INSNS.
> 2. Return for any_uncondjump_p and ANY_RETURN_P.
> 3. Add dump info for spliting instruction.
> 4. Restrict ix86_split_stlf_stall_load under TARGET_SSE2.
>
> Since cfg is freed before machine_reorg, just do a rough calculation
> of the window according to the layout.
> Also according to an experiment on CLX, set window size to 64.
>
> Currently only handle V2DFmode load since it doesn't need any scratch
> registers, and it's sufficient to recover cray performance for -O2
> compared to GCC11.
>
> gcc/ChangeLog:
>
> PR target/101908
> * config/i386/i386.cc (ix86_split_stlf_stall_load): New
> function
> (ix86_reorg): Call ix86_split_stlf_stall_load.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr101908-1.c: New test.
> * gcc.target/i386/pr101908-2.c: New test.
> ---
>  gcc/config/i386/i386.cc| 60 ++
>  gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 +
>  gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 +
>  3 files changed, 84 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 5a561966eb4..c88a689f32b 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -21933,6 +21933,64 @@ ix86_seh_fixup_eh_fallthru (void)
>emit_insn_after (gen_nops (const1_rtx), insn);
>  }
>  }
> +/* Split vector load from parm_decl to elemental loads to avoid STLF
> +   stalls.  */
> +static void
> +ix86_split_stlf_stall_load ()
> +{
> +  rtx_insn* insn, *start = get_insns ();
> +  unsigned window = 0;
> +
> +  for (insn = start; insn; insn = NEXT_INSN (insn))
> +{
> +  if (!NONDEBUG_INSN_P (insn))
> +   continue;
> +  window++;
> +  /* Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between each
> +other, just emulate for pipeline) before stalled load, stlf stall
> +case is as fast as no stall cases on CLX.
> +Since CFG is freed before machine_reorg, just do a rough
> +calculation of the window according to the layout.  */
> +  if (window > 64)

I think we want to turn the '64' into a --param at least.  You can add

-param=x86-stlf-window-ninsns=

into i386.opt (see -param= examples in aarch64/ for example).

> +   return;
> +
> +  if (any_uncondjump_p (insn)
> + || ANY_RETURN_P (PATTERN (insn)))

You made a point about calls - does any_uncondjump_p cover them?

otherwise I think this is fine, Honza, do you agree?

Thanks,
Richard.

> +   return;
> +
> +  rtx set = single_set (insn);
> +  if (!set)
> +   continue;
> +  rtx src = SET_SRC (set);
> +  if (!MEM_P (src)
> + /* Only handle V2DFmode load since it doesn't need any scratch
> +register.  */
> + || GET_MODE (src) != E_V2DFmode
> + || !MEM_EXPR (src)
> + || TREE_CODE (get_base_address (MEM_EXPR (src))) != PARM_DECL
> +   continue;
> +
> +  rtx zero = CONST0_RTX (V2DFmode);
> +  rtx dest = SET_DEST (set);
> +  rtx m = adjust_address (src, DFmode, 0);
> +  rtx loadlpd = gen_sse2_loadlpd (dest, zero, m);
> +  emit_insn_before (loadlpd, insn);
> +  m = adjust_address (src, DFmode, 8);
> +  rtx loadhpd = gen_sse2_loadhpd (dest, dest, m);
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> + fputs ("Due to potential STLF stall, split instruction:\n",
> +dump_file);
> + print_rtl_single (dump_file, insn);
> + fputs ("To:\n", dump_file);
> + print_rtl_single (dump_file, loadlpd);
> + print_rtl_single (dump_file, loadhpd);
> +   }
> +  PATTERN (insn) = loadhpd;
> +  INSN_CODE (insn) = -1;
> +  gcc_assert (recog_memoized (insn) != -1);
> +}
> +}
>
>  /* Implement machine specific optimizations.  We implement padding of returns
> for K8 CPUs and pass to avoid 4 jumps in the single 16 byte window.  */
> @@ -21948,6 +22006,8 @@ ix86_reorg (void)
>
>if (optimize && optimize_function_for_speed_p (cfun))
>  {
> +  if (TARGET_SSE2)
> +   ix86_split_stlf_stall_load ();
>if (TARGET_PAD_SHORT_FUNCTION)
> ix86_pad_short_function ();
>else if (TARGET_PAD_RETURNS)
> diff --git a/gcc/testsuite/gcc.target/i386/pr101908-1.c 
> b/gcc/testsuite/gcc.target/i386/pr101908-1.c
> new file mode 100644
> index 000..33d9684f0ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101908-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx" } */
> +/* { dg-final { scan-assembler-not {(?n)movhpd[ \t]} } } */
> +
> +struct X { double x[2]; };
> +typedef double v2df __attribute__((vector_size(16)));
> +
> +v2df __attribut