Re: [1/3] Add support for target_version attribute

2023-10-19 Thread Richard Biener
On Wed, 18 Oct 2023, Andrew Carlotti wrote:

> This patch adds support for the "target_version" attribute to the middle
> end and the C++ frontend, which will be used to implement function
> multiversioning in the aarch64 backend.
> 
> Note that C++ is currently the only frontend which supports
> multiversioning using the "target" attribute, whereas the
> "target_clones" attribute is additionally supported in C, D and Ada.
> Support for the target_version attribute will be extended to C at a
> later date.
> 
> Targets that currently use the "target" attribute for function
> multiversioning (i.e. i386 and rs6000) are not affected by this patch.
> 
> 
> I could have implemented the target hooks slightly differently, by reusing the
> valid_attribute_p hook and adding attribute name checks to each backend
> implementation (cf. the aarch64 implementation in patch 2/3).  Would this be
> preferable?
> 
> Otherwise, is this ok for master?

This lacks user-level documentation in doc/extend.texi (where
target_clones is documented).

Was there any discussion/description of why target_clones cannot
be made to work for aarch64?

Richard.
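The patch description says target_version will be used to implement function multiversioning (FMV) in the aarch64 backend. As a rough, portable illustration of what FMV provides (a default version, a feature-specific version, and a dispatcher that picks one at run time), here is a C++ sketch. The feature flag and every function name are hypothetical; GCC generates the real dispatcher itself (for example via an ifunc resolver), and the attribute syntax is not modeled here.

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime CPU feature query.
static bool g_cpu_has_feature = false;

// Default version: must work on every CPU.
static int dot_default(const int* a, const int* b, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i] * b[i];
  return s;
}

// Feature-specific version: same semantics; in real multiversioning this
// body would be compiled with extra ISA features enabled.
static int dot_feature(const int* a, const int* b, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i] * b[i];
  return s;
}

using dot_fn = int (*)(const int*, const int*, int);

// Resolver: choose one version based on the available features.
static dot_fn resolve_dot()
{
  return g_cpu_has_feature ? dot_feature : dot_default;
}

// Dispatcher: the entry point callers use.  It resolves once, on the first
// call, and then always forwards to the selected version.
int dot(const int* a, const int* b, int n)
{
  static const dot_fn impl = resolve_dot();
  return impl(a, b, n);
}
```

Caching the resolved pointer mirrors the idea that selection happens once rather than on every call.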

> 
> gcc/c-family/ChangeLog:
> 
>   * c-attribs.cc (handle_target_version_attribute): New.
>   (c_common_attribute_table): Add target_version.
>   (handle_target_clones_attribute): Add conflict with
>   target_version attribute.
> 
> gcc/ChangeLog:
> 
>   * attribs.cc (is_function_default_version): Update comment to
>   specify incompatibility with target_version attributes.
>   * cgraphclones.cc (cgraph_node::create_version_clone_with_body):
>   Call valid_version_attribute_p for target_version attributes.
>   * target.def (valid_version_attribute_p): New hook.
>   (expanded_clones_attribute): New hook.
>   * doc/tm.texi.in: Add new hooks.
>   * doc/tm.texi: Regenerate.
>   * multiple_target.cc (create_dispatcher_calls): Remove redundant
>   is_function_default_version check.
>   (expand_target_clones): Use target hook for attribute name.
>   * targhooks.cc (default_target_option_valid_version_attribute_p):
>   New.
>   * targhooks.h (default_target_option_valid_version_attribute_p):
>   New.
>   * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
>   target_version attributes.
> 
> gcc/cp/ChangeLog:
> 
>   * decl2.cc (check_classfn): Update comment to include
>   target_version attributes.
> 
> 
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index b1300018d1e8ed8e02ded1ea721dc192a6d32a49..a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6 100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -1233,8 +1233,9 @@ make_dispatcher_decl (const tree decl)
>    return func_decl;
>  }
>  
> -/* Returns true if decl is multi-versioned and DECL is the default function,
> -   that is it is not tagged with target specific optimization.  */
> +/* Returns true if DECL is multi-versioned using the target attribute, and this
> +   is the default version.  This function can only be used for targets that do
> +   not support the "target_version" attribute.  */
>  
>  bool
>  is_function_default_version (const tree decl)
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index 072cfb69147bd6b314459c0bd48a0c1fb92d3e4d..1a224c036277d51ab4dc0d33a403177bd226e48a 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -148,6 +148,7 @@ static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_target_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_target_version_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
>  static tree ignore_attribute (tree *, tree, tree, int, bool *);
> @@ -480,6 +481,8 @@ const struct attribute_spec c_common_attribute_table[] =
>                               handle_error_attribute, NULL },
>    { "target", 1, -1, true, false, false, false,
>                               handle_target_attribute, NULL },
> +  { "target_version", 1, -1, true, false, false, false,
> +                             handle_target_version_attribute, NULL },
>    { "target_clones",  1, -1, true, false, false, false,
>                               handle_target_clones_attribute, NULL },
>    { "optimize",   1, -1, true, false, false, false,
> @@ -5569,6 +5572,45 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
>    return NULL_TREE;
>  }
>  
> +/* Handle a "target_version" attribute.  */
> +
> +static tree
> +handle_target_version_attribute (tree *node, tree name, tree args, int flags,
> +   

Re: [PATCH 0/8] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS

2023-10-19 Thread Richard Biener
On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> 
> Refactor simd clone handling code ahead of support for poly simdlen.

OK.

Richard.

> gcc/ChangeLog:
> 
>   * omp-simd-clone.cc (simd_clone_subparts): Remove.
>   (simd_clone_init_simd_arrays): Replace simd_clone_subparts with
>   TYPE_VECTOR_SUBPARTS.
>   (ipa_simd_modify_function_body): Likewise.
>   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Likewise.
>   (simd_clone_subparts): Remove.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big.

2023-10-19 Thread Richard Biener
On Thu, Oct 19, 2023 at 8:16 AM liuhongt  wrote:
>
> >So the bugs were not fixed without this hunk?  IIRC in the audit
> >trail we concluded the value is always positive ... (but of course
> >a large unsigned value can appear negative if you test it this way?)
> No, I added this in case there's a negative skip_niters in the future, as
> you mentioned in the PR; it's just defensive programming.
>
> >I think you can use one of the mpz_pow* functions and
> >wi::to_mpz/from_mpz for this.  See tree-ssa-loop-niter.cc for the
> >most heavy user of mpz (but not pow I think).
> Changed.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> There's a loop in vect_peel_nonlinear_iv_init that computes init_expr *
> pow (step_expr, skip_niters). When skip_niters is too big, compile time
> explodes. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to
> init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is a
> power of 2; otherwise, give up vectorization when skip_niters >=
> TYPE_PRECISION (TREE_TYPE (init_expr)).
>
> Also give up vectorization when niters_skip is negative, since negative
> values are used for fully masked loops.
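The identity behind the fast path can be checked with a small sketch. It models a 32-bit unsigned induction variable with uint32_t arithmetic, which wraps modulo 2^32; the function names are hypothetical, and the patch itself works on GCC trees and wide integers rather than on plain integers like this.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;
using std::uint64_t;

// Naive form: one multiplication per skipped iteration, so the work grows
// linearly with n (the compile-time hog for huge skip counts).
static uint32_t mul_pow_naive(uint32_t init, uint32_t step, uint64_t n)
{
  for (uint64_t i = 0; i < n; i++)
    init *= step;
  return init;
}

// Returns log2 (x) if x is a power of two, else -1.
// (Local stand-in for GCC's exact_log2, written for this example.)
static int exact_log2_u32(uint32_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int lg = 0;
  while (x >>= 1)
    lg++;
  return lg;
}

// Shift form for a power-of-two step:
//   init * step^n == init << (log2 (step) * n)   (mod 2^32).
// Shifting a 32-bit value by 32 or more is undefined in C++, but the
// modular result is 0 once every bit has been shifted out, so return 0.
static uint32_t mul_pow_shift(uint32_t init, uint32_t step, uint64_t n)
{
  int lg = exact_log2_u32(step);
  assert(lg >= 0 && "step must be a power of two");
  if (lg == 0)
    return init;                 // step == 1
  if (n >= 32)
    return 0;                    // lg >= 1, so lg * n >= 32
  uint64_t amount = static_cast<uint64_t>(lg) * n;
  return amount >= 32 ? 0u : init << amount;
}
```

The 65536 multiplier in the adjusted pr103144-mul-2.c test follows from the same identity: each vector iteration advances eight lanes, and 4^8 = 65536 (previously 3^8 = 6561).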
>
> gcc/ChangeLog:
>
> PR tree-optimization/111820
> PR tree-optimization/111833
> * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give
> up vectorization for nonlinear iv vect_step_op_mul when
> step_expr is not exact_log2 and niters is greater than
> TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize
> for negative niters_skip which will be used by fully masked
> loop.
> (vect_can_advance_ivs_p): Pass whole phi_info to
> vect_can_peel_nonlinear_iv_p.
> * tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize
> init_expr * pow (step_expr, skipn) to init_expr
> << (log2 (step_expr) * skipn) when step_expr is exact_log2.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr111820-1.c: New test.
> * gcc.target/i386/pr111820-2.c: New test.
> * gcc.target/i386/pr111820-3.c: New test.
> * gcc.target/i386/pr103144-mul-1.c: Adjust testcase.
> * gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
> ---
>  .../gcc.target/i386/pr103144-mul-1.c  |  8 ++---
>  .../gcc.target/i386/pr103144-mul-2.c  |  8 ++---
>  gcc/testsuite/gcc.target/i386/pr111820-1.c| 16 +
>  gcc/testsuite/gcc.target/i386/pr111820-2.c| 16 +
>  gcc/testsuite/gcc.target/i386/pr111820-3.c| 16 +
>  gcc/tree-vect-loop-manip.cc   | 28 +--
>  gcc/tree-vect-loop.cc | 34 ---
>  7 files changed, 110 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-3.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> index 640c34fd959..913d7737dcd 100644
> --- a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> @@ -11,7 +11,7 @@ foo_mul (int* a, int b)
>    for (int i = 0; i != N; i++)
>      {
>        a[i] = b;
> -      b *= 3;
> +      b *= 4;
>      }
>  }
>
> @@ -23,7 +23,7 @@ foo_mul_const (int* a)
>    for (int i = 0; i != N; i++)
>      {
>        a[i] = b;
> -      b *= 3;
> +      b *= 4;
>      }
>  }
>
> @@ -34,7 +34,7 @@ foo_mul_peel (int* a, int b)
>    for (int i = 0; i != 39; i++)
>      {
>        a[i] = b;
> -      b *= 3;
> +      b *= 4;
>      }
>  }
>
> @@ -46,6 +46,6 @@ foo_mul_peel_const (int* a)
>    for (int i = 0; i != 39; i++)
>      {
>        a[i] = b;
> -      b *= 3;
> +      b *= 4;
>      }
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> index 39fdea3a69d..b2ff186e335 100644
> --- a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> @@ -16,12 +16,12 @@ avx2_test (void)
>
>    __builtin_memset (epi32_exp, 0, N * sizeof (int));
>    int b = 8;
> -  v8si init = __extension__(v8si) { b, b * 3, b * 9, b * 27, b * 81, b * 243, b * 729, b * 2187 };
> +  v8si init = __extension__(v8si) { b, b * 4, b * 16, b * 64, b * 256, b * 1024, b * 4096, b * 16384 };
>
>    for (int i = 0; i != N / 8; i++)
>      {
>        memcpy (epi32_exp + i * 8, &init, 32);
> -      init *= 6561;
> +      init *= 65536;
>      }
>
>    foo_mul (epi32_dst, b);
> @@ -32,11 +32,11 @@ avx2_test (void)
>    if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0)
>      __builtin_abort ();
>
> -  init = __extension__(v8si) { 1, 3, 9, 27, 81, 243, 729, 2187 };
> +  init = __extension__(v8si) { 1, 4, 16, 64, 256, 1024, 4096, 16384 };
>    for (int i = 0; i != N / 8; i++)
>      {
>        memcpy (epi32_exp + i * 8, &init, 32);
> -  init *= 6

[PATCH] return edge in make_eh_edges

2023-10-19 Thread Alexandre Oliva


The need to initialize edge probabilities has made make_eh_edges
undesirably hard to use.  I suppose we don't want make_eh_edges to
initialize the probability of the newly-added edge itself, so that the
caller takes care of it, but identifying the added edge in need of
adjustments is inefficient and cumbersome.  Change make_eh_edges so
that it returns the added edge.

Regstrapped on x86_64-linux-gnu, and (along with various hardening
patches) on ppc64el-linux-gnu.  Also tested on multiple other targets,
on older versions of GCC.  The returned value is unused in code already
in the compiler.  This is a preparatory patch for uses to be introduced
along with stack scrubbing and control flow redundancy.  Ok to install?
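To illustrate why returning the edge helps, here is a toy C++ model contrasting the two shapes. The types toy_bb and toy_edge, the flag EDGE_EH_FLAG and all function names are hypothetical stand-ins, not GCC's basic_block, edge or make_edge; the landing pad is modeled as a pointer that is null when there is none (where lookup_stmt_eh_lp would return a value <= 0).

```cpp
#include <cassert>
#include <memory>
#include <vector>

constexpr int EDGE_EH_FLAG = 1;

struct toy_bb;

struct toy_edge
{
  toy_bb *src;
  toy_bb *dest;
  int flags;
  double probability;   // left for the caller to initialize
};

struct toy_bb
{
  std::vector<std::unique_ptr<toy_edge>> succs;   // owns outgoing edges
};

static toy_edge *toy_make_edge(toy_bb *src, toy_bb *dest, int flags)
{
  src->succs.push_back(std::make_unique<toy_edge>(toy_edge{src, dest, flags, 0.0}));
  return src->succs.back().get();
}

// Old shape: returns nothing, so a caller that wants to set the probability
// has to search the successor list for the edge that was just added.
static void toy_make_eh_edges_void(toy_bb *src, toy_bb *landing_pad)
{
  if (!landing_pad)
    return;
  toy_make_edge(src, landing_pad, EDGE_EH_FLAG);
}

static toy_edge *find_eh_edge(toy_bb *src)
{
  for (auto &e : src->succs)
    if (e->flags & EDGE_EH_FLAG)
      return e.get();
  return nullptr;
}

// New shape: hand back the new edge, or nullptr when no landing pad exists.
static toy_edge *toy_make_eh_edges(toy_bb *src, toy_bb *landing_pad)
{
  if (!landing_pad)
    return nullptr;
  return toy_make_edge(src, landing_pad, EDGE_EH_FLAG);
}
```

With the new shape the caller can write `if (toy_edge *e = toy_make_eh_edges (src, pad)) e->probability = ...;` instead of calling a search helper like find_eh_edge.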


for  gcc/ChangeLog

* tree-eh.cc (make_eh_edges): Return the new edge.
* tree-eh.h (make_eh_edges): Likewise.
---
 gcc/tree-eh.cc |6 +++---
 gcc/tree-eh.h  |2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
index e8ceff36cc6e7..1cb8e08652909 100644
--- a/gcc/tree-eh.cc
+++ b/gcc/tree-eh.cc
@@ -2274,7 +2274,7 @@ make_eh_dispatch_edges (geh_dispatch *stmt)
 /* Create the single EH edge from STMT to its nearest landing pad,
if there is such a landing pad within the current function.  */
 
-void
+edge
 make_eh_edges (gimple *stmt)
 {
   basic_block src, dst;
@@ -2283,14 +2283,14 @@ make_eh_edges (gimple *stmt)
 
   lp_nr = lookup_stmt_eh_lp (stmt);
   if (lp_nr <= 0)
-    return;
+    return NULL;
 
   lp = get_eh_landing_pad_from_number (lp_nr);
   gcc_assert (lp != NULL);
 
   src = gimple_bb (stmt);
   dst = label_to_block (cfun, lp->post_landing_pad);
-  make_edge (src, dst, EDGE_EH);
+  return make_edge (src, dst, EDGE_EH);
 }
 
 /* Do the work in redirecting EDGE_IN to NEW_BB within the EH region tree;
diff --git a/gcc/tree-eh.h b/gcc/tree-eh.h
index 771be50fe9a1d..1382568b7c919 100644
--- a/gcc/tree-eh.h
+++ b/gcc/tree-eh.h
@@ -30,7 +30,7 @@ extern bool remove_stmt_from_eh_lp (gimple *);
 extern int lookup_stmt_eh_lp_fn (struct function *, const gimple *);
 extern int lookup_stmt_eh_lp (const gimple *);
 extern bool make_eh_dispatch_edges (geh_dispatch *);
-extern void make_eh_edges (gimple *);
+extern edge make_eh_edges (gimple *);
 extern edge redirect_eh_edge (edge, basic_block);
 extern void redirect_eh_dispatch_edge (geh_dispatch *, edge, basic_block);
 extern bool operation_could_trap_helper_p (enum tree_code, bool, bool, bool,


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-10-19 Thread chenglulu



On 2023/8/20 at 4:25 PM, Xi Ruoyao wrote:

On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote:

It seems ARMv8-A only guarantees to preserve the low 64 bits of each
NEON/floating-point register. I'm not sure I modified the testcase in the
right way, and maybe we need more investigation. Any ideas or suggestions?


Hi, Ruoyao:

The implementation of the loongarch_hard_regno_call_part_clobbered hook
makes all vector registers caller-saved registers.


So no data will be lost during the function call.


Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my
test case is not valid:

"As with global register variables, it is recommended that you choose a
register that is normally saved and restored by function calls on your
machine, so that calls to library routines will not clobber it."

So when I use asm(name), the compiler has no obligation to guarantee
that it will ever work like a normal variable after a function call.

But I still need to verify that the compiler correctly understands that only
the low 64 bits of the vector register are saved.  I'll try to make
another test case...





Re: [PATCH][_Hashtable] Fix merge

2023-10-19 Thread Jonathan Wakely
On Thursday, 19 October 2023, François Dumont  wrote:
> libstdc++: [_Hashtable] Do not reuse untrusted cached hash code
>
> On merge reuse merged node cached hash code only if we are on the same type of
> hash and this hash is stateless. Usage of function pointers or std::function as
> hash functor will prevent this optimization.

I found this first sentence a little hard to parse. How about:

On merge, reuse a merged node's cached hash code only if we are on the same
type of hash and this hash is stateless.


And for the second sentence, would it be clearer to say "will prevent
reusing cached hash codes" instead of "will prevent this optimization"?


And for the comment on the new function, I think this reads better:

"Only use the node's (possibly cached) hash code if its hash function _H2
matches _Hash. Otherwise recompute it using _Hash."

The code and tests look good, so if you're happy with the comment+changelog
suggestions, this is ok for trunk.

This seems like a bug fix that should be backported too, after some time on
trunk.


>
> libstdc++-v3/ChangeLog
>
> * include/bits/hashtable_policy.h
> (_Hash_code_base::_M_hash_code(const _Hash&, const _Hash_node_value<>&)): Remove.
> (_Hash_code_base::_M_hash_code<_H2>(const _H2&, const _Hash_node_value<>&)): Remove.
> * include/bits/hashtable.h
> (_M_src_hash_code<_H2>(const _H2&, const key_type&, const __node_value_type&)): New.
> (_M_merge_unique<>, _M_merge_multi<>): Use latter.
> * testsuite/23_containers/unordered_map/modifiers/merge.cc
> (test04, test05, test06): New test cases.
>
> Tested under Linux x86_64, ok to commit?
>
> François
>
>
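A minimal C++17 sketch of the rule discussed in this thread: reuse a node's cached hash code only when the source hasher type matches the destination's and is stateless, otherwise recompute it with the destination hasher. The node type, the function name, and the use of std::is_empty as the test for "stateless" are assumptions made for illustration; this is not the libstdc++ implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical node type caching the hash computed by the source container.
struct cached_node
{
  std::string key;
  std::size_t cached_hash;
};

// Hash is the destination container's hasher, H2 the source container's.
template<typename Hash, typename H2>
std::size_t
src_hash_code(const Hash& hash, const H2&, const cached_node& n)
{
  // Same hasher type with no state: both containers compute the same value
  // for the same key, so the cached code can be trusted.
  if constexpr (std::is_same_v<Hash, H2> && std::is_empty_v<Hash>)
    return n.cached_hash;
  else
    return hash(n.key);   // otherwise recompute with the destination hasher
}
```

A function pointer or std::function hasher is not an empty class, so it always takes the recompute path, matching the commit message's note about such hash functors.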


[PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding
This patch refactors and cleans up the vsetvl pass to make the code easier to
modify and understand. It does several things:

1. Introduce a virtual CFG for vsetvl infos. Phases 1, 2 and 3 only maintain
   and modify this virtual CFG; Phase 4 inserts, modifies and deletes vsetvl
   insns based on it. A basic block in the virtual CFG is called
   vsetvl_block_info, and the vsetvl information inside it is called
   vsetvl_info.
2. Combine Phases 1 and 2 into a single Phase 1 and unify the demand system.
   This phase only fuses local vsetvl info in the forward direction.
3. Refactor Phase 3: the logic that decides whether to uplift vsetvl info to
   a predecessor basic block is replaced by a more unified check, namely
   whether a compatible vsetvl info exists among the vsetvl definitions
   reaching in.
4. Place all RTL modification operations in Phases 4 and 5. Phase 4 inserts,
   modifies and deletes vsetvl instructions based on the fully optimized
   vsetvl infos. Phase 5 removes the avl operand from RVV instructions and
   removes the unused dest operand register from vsetvl insns.
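As a rough illustration of item 2, forward local fusion can be modeled as walking a block in program order and emitting a vsetvl only when the state left by the previous one cannot satisfy the next instruction. The toy_demand fields and the compatibility rule below are hypothetical simplifications, not the pass's actual demand system.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified demand of one RVV instruction.  The real
// vsetvl_info tracks far more (tail/mask policy, AVL definitions, ...).
struct toy_demand
{
  int avl;          // requested application vector length
  int sew;          // element width in bits
  int lmul;         // register group multiplier (integral 1, 2, 4, 8 only)
  bool ratio_only;  // insn only needs the SEW/LMUL ratio to match
};

// Can the vl/vtype state set by a previous vsetvl serve this demand?
static bool satisfies(const toy_demand& state, const toy_demand& d)
{
  if (state.avl != d.avl)
    return false;
  if (d.ratio_only)
    return state.sew / state.lmul == d.sew / d.lmul;
  return state.sew == d.sew && state.lmul == d.lmul;
}

// Walk one block forward and record the indices of the instructions that
// still need their own vsetvl.
static std::vector<std::size_t> local_forward_fuse(const std::vector<toy_demand>& insns)
{
  std::vector<std::size_t> emit_at;
  bool have_state = false;
  toy_demand state{};
  for (std::size_t i = 0; i < insns.size(); i++)
    {
      if (have_state && satisfies(state, insns[i]))
        continue;                 // reuse the vsetvl already in effect
      state = insns[i];
      have_state = true;
      emit_at.push_back(i);
    }
  return emit_at;
}
```

For example, a block demanding (SEW 32, LMUL 1), the same again, then a ratio-only (SEW 64, LMUL 2) and finally (SEW 16, LMUL 1) needs vsetvls only before the first and last instructions in this model.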

These modifications resulted in some testcases needing to be updated. The reasons
for updating are summarized below:

1. more optimized
   vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
   vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
   avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c/
4. add some bugfix testcases.
   pr111037-3.c/pr111037-4.c
   avl_single-89.c

PR target/111037
PR target/111234
PR target/111725

Lehua Ding (11):
  RISC-V: P1: Refactor avl_info/vl_vtype_info/vector_insn_info/vector_block_info
  RISC-V: P2: Refactor and cleanup demand system
  RISC-V: P3: Refactor vector_infos_manager
  RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
  RISC-V: P5: combine phase 1 and 2
  RISC-V: P6: Add computing reaching definition data flow
  RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
  RISC-V: P8: Refactor emit-vsetvl phase and delete post optimization
  RISC-V: P9: Cleanup and reorganize helper functions
  RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def
  RISC-V: P11: Adjust and add testcases
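Patch P6 adds a reaching-definition data flow computation, which the Phase 3 check in item 3 of the cover letter relies on. The textbook iterative formulation is sketched below on a toy CFG using bitmask sets; the block representation and the bit encoding of vsetvl definitions are assumptions, and the patch's own implementation is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One block of a toy CFG.  Each bit of a mask stands for one vsetvl
// definition (hypothetical encoding chosen for this example).
struct toy_block
{
  std::vector<std::size_t> preds;
  std::uint32_t gen = 0;    // definitions made here that reach the block end
  std::uint32_t kill = 0;   // definitions this block overrides
};

struct rd_sets
{
  std::vector<std::uint32_t> in, out;
};

// Classic iterative reaching-definitions analysis:
//   IN[b]  = union of OUT[p] over predecessors p
//   OUT[b] = GEN[b] | (IN[b] & ~KILL[b])
// repeated until no set changes.
static rd_sets reaching_definitions(const std::vector<toy_block>& cfg)
{
  std::size_t n = cfg.size();
  rd_sets r{std::vector<std::uint32_t>(n, 0), std::vector<std::uint32_t>(n, 0)};
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (std::size_t b = 0; b < n; b++)
        {
          std::uint32_t in = 0;
          for (std::size_t p : cfg[b].preds)
            in |= r.out[p];
          std::uint32_t out = cfg[b].gen | (in & ~cfg[b].kill);
          if (in != r.in[b] || out != r.out[b])
            {
              r.in[b] = in;
              r.out[b] = out;
              changed = true;
            }
        }
    }
  return r;
}
```

In a diamond-like CFG where block 0 defines bit 0 and block 1 redefines it as bit 1, a join block fed by both sees both definitions reaching in, which is the situation where a compatibility check across all reaching definitions matters.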

 gcc/config/riscv/riscv-vsetvl.cc  | 6502 +++--
 gcc/config/riscv/riscv-vsetvl.def |  641 +-
 gcc/config/riscv/riscv-vsetvl.h   |  488 --
 gcc/config/riscv/t-riscv  |2 +-
 .../gcc.target/riscv/rvv/base/scalar_move-1.c |2 +-
 .../riscv/rvv/vsetvl/avl_single-104.c |   35 +
 .../riscv/rvv/vsetvl/avl_single-105.c |   23 +
 .../riscv/rvv/vsetvl/avl_single-106.c |   34 +
 .../riscv/rvv/vsetvl/avl_single-107.c |   41 +
 .../riscv/rvv/vsetvl/avl_single-108.c |   41 +
 .../riscv/rvv/vsetvl/avl_single-109.c |   45 +
 .../riscv/rvv/vsetvl/avl_single-23.c  |7 +-
 .../riscv/rvv/vsetvl/avl_single-46.c  |3 +-
 .../riscv/rvv/vsetvl/avl_single-84.c  |5 +-
 .../riscv/rvv/vsetvl/avl_single-89.c  |8 +-
 .../riscv/rvv/vsetvl/avl_single-95.c  |2 +-
 .../riscv/rvv/vsetvl/imm_bb_prop-1.c  |7 +-
 .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  |2 +-
 .../gcc.target/riscv/rvv/vsetvl/pr109773-1.c  |2 +-
 .../riscv/rvv/{base => vsetvl}/pr111037-1.c   |0
 .../riscv/rvv/{base => vsetvl}/pr111037-2.c   |0
 .../gcc.target/riscv/rvv/vsetvl/pr111037-3.c  |   16 +
 .../gcc.target/riscv/rvv/vsetvl/pr111037-4.c  |   16 +
 .../riscv/rvv/vsetvl/vlmax_back_prop-25.c |   10 +-
 .../riscv/rvv/vsetvl/vlmax_back_prop-26.c |   10 +-
 .../riscv/rvv/vsetvl/vlmax_conflict-12.c  |1 -
 .../riscv/rvv/vsetvl/vlmax_conflict-3.c   |2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-13.c   |4 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-18.c   |4 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |2 +-
 30 files changed, 3263 insertions(+), 4692 deletions(-)
 delete mode 100644 gcc/config/riscv/riscv-vsetvl.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-105.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-106.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-107.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-108.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-109.c
 rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-1.c (100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-2.c (100%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-3.c
 create mode 100644 gcc/testsuite/

[PATCH V3 02/11] RISC-V: P2: Refactor and cleanup demand system

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (incompatible_avl_p): Removed.
(different_sew_p): Removed.
(different_lmul_p): Removed.
(different_ratio_p): Removed.
(different_tail_policy_p): Removed.
(different_mask_policy_p): Removed.
(possible_zero_avl_p): Removed.
(second_ratio_invalid_for_first_sew_p): Removed.
(second_ratio_invalid_for_first_lmul_p): Removed.
(float_insn_valid_sew_p): Removed.
(second_sew_less_than_first_sew_p): Removed.
(first_sew_less_than_second_sew_p): Removed.
(compare_lmul): Removed.
(second_lmul_less_than_first_lmul_p): Removed.
(second_ratio_less_than_first_ratio_p): Removed.
(DEF_INCOMPATIBLE_COND): Removed.
(greatest_sew): Removed.
(first_sew): Removed.
(second_sew): Removed.
(first_vlmul): Removed.
(second_vlmul): Removed.
(first_ratio): Removed.
(second_ratio): Removed.
(vlmul_for_first_sew_second_ratio): Removed.
(vlmul_for_greatest_sew_second_ratio): Removed.
(ratio_for_second_sew_first_vlmul): Removed.
(DEF_SEW_LMUL_FUSE_RULE): Removed.
(always_unavailable): Removed.
(avl_unavailable_p): Removed.
(sew_unavailable_p): Removed.
(lmul_unavailable_p): Removed.
(ge_sew_unavailable_p): Removed.
(ge_sew_lmul_unavailable_p): Removed.
(ge_sew_ratio_unavailable_p): Removed.
(DEF_UNAVAILABLE_COND): Removed.
(same_sew_lmul_demand_p): Removed.
(propagate_avl_across_demands_p): Removed.
(reg_available_p): Removed.
(support_relaxed_compatible_p): Removed.
(count_regno_occurrences): Removed.
(demands_can_be_fused_p): Removed.
(earliest_pred_can_be_fused_p): Removed.
(vsetvl_dominated_by_p): Removed.
(class demand_system): New.
(DEF_SEW_LMUL_RULE): New.
(DEF_POLICY_RULE): New.
(DEF_AVL_RULE): New.

---
 gcc/config/riscv/riscv-vsetvl.cc | 1158 +-
 1 file changed, 668 insertions(+), 490 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 8908071dc0d..c9f2f653247 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1091,496 +1091,6 @@ calculate_vlmul (unsigned int sew, unsigned int ratio)
   return LMUL_RESERVED;
 }
 
-static bool
-incompatible_avl_p (const vector_insn_info &info1,
-   const vector_insn_info &info2)
-{
-  return !info1.compatible_avl_p (info2) && !info2.compatible_avl_p (info1);
-}
-
-static bool
-different_sew_p (const vector_insn_info &info1, const vector_insn_info &info2)
-{
-  return info1.get_sew () != info2.get_sew ();
-}
-
-static bool
-different_lmul_p (const vector_insn_info &info1, const vector_insn_info &info2)
-{
-  return info1.get_vlmul () != info2.get_vlmul ();
-}
-
-static bool
-different_ratio_p (const vector_insn_info &info1, const vector_insn_info &info2)
-{
-  return info1.get_ratio () != info2.get_ratio ();
-}
-
-static bool
-different_tail_policy_p (const vector_insn_info &info1,
-                         const vector_insn_info &info2)
-{
-  return info1.get_ta () != info2.get_ta ();
-}
-
-static bool
-different_mask_policy_p (const vector_insn_info &info1,
-                         const vector_insn_info &info2)
-{
-  return info1.get_ma () != info2.get_ma ();
-}
-
-static bool
-possible_zero_avl_p (const vector_insn_info &info1,
-                     const vector_insn_info &info2)
-{
-  return !info1.has_non_zero_avl () || !info2.has_non_zero_avl ();
-}
-
-static bool
-second_ratio_invalid_for_first_sew_p (const vector_insn_info &info1,
-                                      const vector_insn_info &info2)
-{
-  return calculate_vlmul (info1.get_sew (), info2.get_ratio ())
-         == LMUL_RESERVED;
-}
-
-static bool
-second_ratio_invalid_for_first_lmul_p (const vector_insn_info &info1,
-                                       const vector_insn_info &info2)
-{
-  return calculate_sew (info1.get_vlmul (), info2.get_ratio ()) == 0;
-}
-
-static bool
-float_insn_valid_sew_p (const vector_insn_info &info, unsigned int sew)
-{
-  if (info.get_insn () && info.get_insn ()->is_real ()
-      && get_attr_type (info.get_insn ()->rtl ()) == TYPE_VFMOVFV)
-    {
-      if (sew == 16)
-        return TARGET_VECTOR_ELEN_FP_16;
-      else if (sew == 32)
-        return TARGET_VECTOR_ELEN_FP_32;
-      else if (sew == 64)
-        return TARGET_VECTOR_ELEN_FP_64;
-    }
-  return true;
-}
-
-static bool
-second_sew_less_than_first_sew_p (const vector_insn_info &info1,
-                                  const vector_insn_info &info2)
-{
-  return info2.get_sew () < info1.get_sew ()
-         || !float_insn_valid_sew_p (info1, info2.get_sew ());
-}
-
-static bool
-first_sew_less_than_second_sew_p (const vector_insn_info &info1,
- const vector_insn_i

[PATCH V3 04/11] RISC-V: P4: move method from pass_vsetvl to pre_vsetvl

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info): Removed.
(pass_vsetvl::get_block_info): Removed.
(pass_vsetvl::update_vector_info): Removed.
(pass_vsetvl::update_block_info): Removed.
(pass_vsetvl::simple_vsetvl): Removed.
(pass_vsetvl::lazy_vsetvl): Removed.
(pass_vsetvl::execute): Removed.
(make_pass_vsetvl): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 207 ++-
 1 file changed, 96 insertions(+), 111 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index c73a84cb6bd..f8b708c248a 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2721,6 +2721,7 @@ public:
   }
 };
 
+
 const pass_data pass_data_vsetvl = {
   RTL_PASS, /* type */
   "vsetvl", /* name */
@@ -2736,54 +2737,8 @@ const pass_data pass_data_vsetvl = {
 class pass_vsetvl : public rtl_opt_pass
 {
 private:
-  vector_infos_manager *m_vector_manager;
-
-  const vector_insn_info &get_vector_info (const rtx_insn *) const;
-  const vector_insn_info &get_vector_info (const insn_info *) const;
-  const vector_block_info &get_block_info (const basic_block) const;
-  const vector_block_info &get_block_info (const bb_info *) const;
-  vector_block_info &get_block_info (const basic_block);
-  vector_block_info &get_block_info (const bb_info *);
-  void update_vector_info (const insn_info *, const vector_insn_info &);
-  void update_block_info (int, profile_probability, const vector_insn_info &);
-
-  void simple_vsetvl (void) const;
-  void lazy_vsetvl (void);
-
-  /* Phase 1.  */
-  void compute_local_backward_infos (const bb_info *);
-
-  /* Phase 2.  */
-  bool need_vsetvl (const vector_insn_info &, const vector_insn_info &) const;
-  void transfer_before (vector_insn_info &, insn_info *) const;
-  void transfer_after (vector_insn_info &, insn_info *) const;
-  void emit_local_forward_vsetvls (const bb_info *);
-
-  /* Phase 3.  */
-  bool earliest_fusion (void);
-  void vsetvl_fusion (void);
-
-  /* Phase 4.  */
-  void prune_expressions (void);
-  void compute_local_properties (void);
-  bool can_refine_vsetvl_p (const basic_block, const vector_insn_info &) const;
-  void refine_vsetvls (void) const;
-  void cleanup_vsetvls (void);
-  bool commit_vsetvls (void);
-  void pre_vsetvl (void);
-
-  /* Phase 5.  */
-  rtx_insn *get_vsetvl_at_end (const bb_info *, vector_insn_info *) const;
-  void local_eliminate_vsetvl_insn (const bb_info *) const;
-  bool global_eliminate_vsetvl_insn (const bb_info *) const;
-  void ssa_post_optimization (void) const;
-
-  /* Phase 6.  */
-  void df_post_optimization (void) const;
-
-  void init (void);
-  void done (void);
-  void compute_probabilities (void);
+  void simple_vsetvl ();
+  void lazy_vsetvl ();
 
 public:
   pass_vsetvl (gcc::context *ctxt) : rtl_opt_pass (pass_data_vsetvl, ctxt) {}
@@ -2793,69 +2748,11 @@ public:
   virtual unsigned int execute (function *) final override;
 }; // class pass_vsetvl
 
-const vector_insn_info &
-pass_vsetvl::get_vector_info (const rtx_insn *i) const
-{
-  return m_vector_manager->vector_insn_infos[INSN_UID (i)];
-}
-
-const vector_insn_info &
-pass_vsetvl::get_vector_info (const insn_info *i) const
-{
-  return m_vector_manager->vector_insn_infos[i->uid ()];
-}
-
-const vector_block_info &
-pass_vsetvl::get_block_info (const basic_block bb) const
-{
-  return m_vector_manager->vector_block_infos[bb->index];
-}
-
-const vector_block_info &
-pass_vsetvl::get_block_info (const bb_info *bb) const
-{
-  return m_vector_manager->vector_block_infos[bb->index ()];
-}
-
-vector_block_info &
-pass_vsetvl::get_block_info (const basic_block bb)
-{
-  return m_vector_manager->vector_block_infos[bb->index];
-}
-
-vector_block_info &
-pass_vsetvl::get_block_info (const bb_info *bb)
-{
-  return m_vector_manager->vector_block_infos[bb->index ()];
-}
-
-void
-pass_vsetvl::update_vector_info (const insn_info *i,
-                                 const vector_insn_info &new_info)
-{
-  m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
-}
-
-void
-pass_vsetvl::update_block_info (int index, profile_probability prob,
-                                const vector_insn_info &new_info)
-{
-  m_vector_manager->vector_block_infos[index].probability = prob;
-  if (m_vector_manager->vector_block_infos[index].local_dem
-      == m_vector_manager->vector_block_infos[index].reaching_out)
-    m_vector_manager->vector_block_infos[index].local_dem = new_info;
-  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
-}
-
-/* Simple m_vsetvl_insert vsetvl for optimize == 0.  */
 void
-pass_vsetvl::simple_vsetvl (void) const
+pass_vsetvl::simple_vsetvl ()
 {
   if (dump_file)
-    fprintf (dump_file,
-             "\nEntering Simple VSETVL PASS and Handling %d basic blocks for "
-             "function:%s\n",
-             n_basic_blocks_for_fn (cfun), function_name (cfun));
+   

[PATCH V3 01/11] RISC-V: P1: Refactor avl_info/vl_vtype_info/vector_insn_info/vector_block_info

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (avl_info::avl_info): Removed.
(avl_info::single_source_equal_p): Removed.
(avl_info::multiple_source_equal_p): Removed.
(avl_info::operator=): Removed.
(avl_info::operator==): Removed.
(avl_info::operator!=): Removed.
(avl_info::has_non_zero_avl): Removed.
(vl_vtype_info::vl_vtype_info): Removed.
(vl_vtype_info::operator==): Removed.
(vl_vtype_info::operator!=): Removed.
(vl_vtype_info::same_avl_p): Removed.
(vl_vtype_info::same_vtype_p): Removed.
(enum demand_flags): New enum.
(vl_vtype_info::same_vlmax_p): Removed.
(vector_insn_info::operator>=): Removed.
(enum class): New demand_type.
(vector_insn_info::operator==): Removed.
(vector_insn_info::parse_insn): Removed.
(class vsetvl_info): New class.
(vector_insn_info::compatible_p): Removed.
(vector_insn_info::skip_avl_compatible_p): Removed.
(vector_insn_info::compatible_avl_p): Removed.
(vector_insn_info::compatible_vtype_p): Removed.
(vector_insn_info::available_p): Removed.
(vector_insn_info::fuse_avl): Removed.
(vector_insn_info::fuse_sew_lmul): Removed.
(vector_insn_info::fuse_tail_policy): Removed.
(vector_insn_info::fuse_mask_policy): Removed.
(vector_insn_info::local_merge): Removed.
(vector_insn_info::global_merge): Removed.
(vector_insn_info::get_avl_or_vl_reg): Removed.
(vector_insn_info::update_fault_first_load_avl): Removed.
(vlmul_to_str): Removed.
(policy_to_str): Removed.
(vector_insn_info::dump): Removed.
(class vsetvl_block_info): New class.

---
 gcc/config/riscv/riscv-vsetvl.cc | 1401 +-
 1 file changed, 602 insertions(+), 799 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 4b06d93e7f9..8908071dc0d 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1581,827 +1581,630 @@ vsetvl_dominated_by_p (const basic_block cfg_bb,
   return true;
 }
 
-avl_info::avl_info (const avl_info &other)
-{
-  m_value = other.get_value ();
-  m_source = other.get_source ();
-}
-
-avl_info::avl_info (rtx value_in, set_info *source_in)
-  : m_value (value_in), m_source (source_in)
-{}
-
-bool
-avl_info::single_source_equal_p (const avl_info &other) const
-{
-  set_info *set1 = m_source;
-  set_info *set2 = other.get_source ();
-  insn_info *insn1 = extract_single_source (set1);
-  insn_info *insn2 = extract_single_source (set2);
-  if (!insn1 || !insn2)
-return false;
-  return source_equal_p (insn1, insn2);
-}
-
-bool
-avl_info::multiple_source_equal_p (const avl_info &other) const
-{
-  /* When the def info is same in RTL_SSA namespace, it's safe
- to consider they are avl compatible.  */
-  if (m_source == other.get_source ())
-return true;
-
-  /* We only consider handle PHI node.  */
-  if (!m_source->insn ()->is_phi () || !other.get_source ()->insn ()->is_phi ())
-return false;
-
-  phi_info *phi1 = as_a (m_source);
-  phi_info *phi2 = as_a (other.get_source ());
-
-  if (phi1->is_degenerate () && phi2->is_degenerate ())
-{
-  /* Degenerate PHI means the PHI node only have one input.  */
-
-  /* If both PHI nodes have the same single input in use list.
-We consider they are AVL compatible.  */
-  if (phi1->input_value (0) == phi2->input_value (0))
-   return true;
-}
-  /* TODO: We can support more optimization cases in the future.  */
-  return false;
-}
-
-avl_info &
-avl_info::operator= (const avl_info &other)
-{
-  m_value = other.get_value ();
-  m_source = other.get_source ();
-  return *this;
-}
-
-bool
-avl_info::operator== (const avl_info &other) const
-{
-  if (!m_value)
-return !other.get_value ();
-  if (!other.get_value ())
-return false;
-
-  if (GET_CODE (m_value) != GET_CODE (other.get_value ()))
-return false;
-
-  /* Handle CONST_INT AVL.  */
-  if (CONST_INT_P (m_value))
-return INTVAL (m_value) == INTVAL (other.get_value ());
-
-  /* Handle VLMAX AVL.  */
-  if (vlmax_avl_p (m_value))
-return vlmax_avl_p (other.get_value ());
-  if (vlmax_avl_p (other.get_value ()))
-return false;
-
-  /* If any source is undef value, we think they are not equal.  */
-  if (!m_source || !other.get_source ())
-return false;
-
-  /* If both sources are single source (defined by a single real RTL)
- and their definitions are same.  */
-  if (single_source_equal_p (other))
-return true;
-
-  return multiple_source_equal_p (other);
-}
-
-bool
-avl_info::operator!= (const avl_info &other) const
-{
-  return !(*this == other);
-}
-
-bool
-avl_info::has_non_zero_avl () const
-{
-  if (has_avl_imm ())
-return INTVAL (get_value ()) > 0;
-  if (has_avl_reg ())
-return vlmax_avl_p (get_value ());
-  return false;
-}
-
-/* Ini
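
For readers following the removed avl_info::operator== above, the order of its equality checks can be summarised in a small standalone model. This is a sketch only. The Avl struct, its Kind enum, the def_id field and the make_* factories are hypothetical stand-ins for the RTX value and the RTL-SSA set_info. The single-source and degenerate-PHI comparisons are collapsed into one def_id check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of an AVL operand (not GCC's avl_info).
// def_id stands in for the RTL-SSA definition of a register AVL;
// -1 means the definition is unknown/undefined.
struct Avl
{
  enum class Kind { None, Imm, Vlmax, Reg } kind = Kind::None;
  int64_t imm = 0;
  int def_id = -1;
};

Avl make_imm (int64_t v) { Avl a; a.kind = Avl::Kind::Imm; a.imm = v; return a; }
Avl make_vlmax () { Avl a; a.kind = Avl::Kind::Vlmax; return a; }
Avl make_reg (int def_id) { Avl a; a.kind = Avl::Kind::Reg; a.def_id = def_id; return a; }

// Same order of checks as the removed operator==, simplified.
bool avl_equal_p (const Avl &a, const Avl &b)
{
  // No value on either side: equal only if both are empty.
  if (a.kind == Avl::Kind::None || b.kind == Avl::Kind::None)
    return a.kind == b.kind;
  // Different kinds of AVL never match.
  if (a.kind != b.kind)
    return false;
  // CONST_INT AVL: compare the immediates.
  if (a.kind == Avl::Kind::Imm)
    return a.imm == b.imm;
  // VLMAX AVL matches only VLMAX.
  if (a.kind == Avl::Kind::Vlmax)
    return true;
  // Register AVL: an undefined source is never considered equal.
  if (a.def_id < 0 || b.def_id < 0)
    return false;
  return a.def_id == b.def_id;
}
```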

[PATCH V3 03/11] RISC-V: P3: Refactor vector_infos_manager

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(vector_infos_manager::vector_infos_manager): Removed.
(vector_infos_manager::create_expr): Removed.
(class pre_vsetvl): New class.
(vector_infos_manager::get_expr_id): Removed.
(vector_infos_manager::all_same_ratio_p): Removed.
(vector_infos_manager::all_avail_in_compatible_p): Removed.
(vector_infos_manager::all_same_avl_p): Removed.
(vector_infos_manager::expr_set_num): Removed.
(vector_infos_manager::release): Removed.
(vector_infos_manager::create_bitmap_vectors): Removed.
(vector_infos_manager::free_bitmap_vectors): Removed.
(vector_infos_manager::dump): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 674 ++-
 1 file changed, 307 insertions(+), 367 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index c9f2f653247..c73a84cb6bd 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2384,402 +2384,342 @@ public:
   }
 };
 
-vector_infos_manager::vector_infos_manager ()
-{
-  vector_edge_list = nullptr;
-  vector_kill = nullptr;
-  vector_del = nullptr;
-  vector_insert = nullptr;
-  vector_antic = nullptr;
-  vector_transp = nullptr;
-  vector_comp = nullptr;
-  vector_avin = nullptr;
-  vector_avout = nullptr;
-  vector_antin = nullptr;
-  vector_antout = nullptr;
-  vector_earliest = nullptr;
-  vector_insn_infos.safe_grow_cleared (get_max_uid ());
-  vector_block_infos.safe_grow_cleared (last_basic_block_for_fn (cfun));
-  if (!optimize)
-{
-  basic_block cfg_bb;
-  rtx_insn *rinsn;
-  FOR_ALL_BB_FN (cfg_bb, cfun)
-   {
- vector_block_infos[cfg_bb->index].local_dem = vector_insn_info ();
- vector_block_infos[cfg_bb->index].reaching_out = vector_insn_info ();
- FOR_BB_INSNS (cfg_bb, rinsn)
-   vector_insn_infos[INSN_UID (rinsn)].parse_insn (rinsn);
-   }
-}
-  else
-{
-  for (const bb_info *bb : crtl->ssa->bbs ())
-   {
- vector_block_infos[bb->index ()].local_dem = vector_insn_info ();
- vector_block_infos[bb->index ()].reaching_out = vector_insn_info ();
- for (insn_info *insn : bb->real_insns ())
-   vector_insn_infos[insn->uid ()].parse_insn (insn);
- vector_block_infos[bb->index ()].probability = profile_probability ();
-   }
-}
-}
 
-void
-vector_infos_manager::create_expr (vector_insn_info &info)
+class pre_vsetvl
 {
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (*vector_exprs[i] == info)
-  return;
-  vector_exprs.safe_push (&info);
-}
-
-size_t
-vector_infos_manager::get_expr_id (const vector_insn_info &info) const
-{
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (*vector_exprs[i] == info)
-  return i;
-  gcc_unreachable ();
-}
-
-auto_vec
-vector_infos_manager::get_all_available_exprs (
-  const vector_insn_info &info) const
-{
-  auto_vec available_list;
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (info.available_p (*vector_exprs[i]))
-  available_list.safe_push (i);
-  return available_list;
-}
-
-bool
-vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
-{
-  if (bitmap_empty_p (bitdata))
-return false;
+private:
+  demand_system m_dem;
+  auto_vec m_vector_block_infos;
 
-  int ratio = -1;
-  unsigned int bb_index;
-  sbitmap_iterator sbi;
+  /* data for avl reaching definition.  */
+  sbitmap m_avl_regs;
+  sbitmap *m_avl_def_in;
+  sbitmap *m_avl_def_out;
+  sbitmap *m_reg_def_loc;
+
+  /* data for vsetvl info reaching definition.  */
+  vsetvl_info m_unknow_info;
+  auto_vec m_vsetvl_def_exprs;
+  sbitmap *m_vsetvl_def_in;
+  sbitmap *m_vsetvl_def_out;
+
+  /* data for lcm */
+  auto_vec m_exprs;
+  sbitmap *m_avloc;
+  sbitmap *m_avin;
+  sbitmap *m_avout;
+  sbitmap *m_kill;
+  sbitmap *m_antloc;
+  sbitmap *m_transp;
+  sbitmap *m_insert;
+  sbitmap *m_del;
+  struct edge_list *m_edges;
+
+  auto_vec m_delete_list;
+
+  vsetvl_block_info &get_block_info (const bb_info *bb)
+  {
+return m_vector_block_infos[bb->index ()];
+  }
+  const vsetvl_block_info &get_block_info (const basic_block bb) const
+  {
+return m_vector_block_infos[bb->index];
+  }
 
-  EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
-{
-  if (ratio == -1)
-   ratio = vector_exprs[bb_index]->get_ratio ();
-  else if (vector_exprs[bb_index]->get_ratio () != ratio)
-   return false;
-}
-  return true;
-}
+  vsetvl_block_info &get_block_info (const basic_block bb)
+  {
+return m_vector_block_infos[bb->index];
+  }
 
-/* Return TRUE if the incoming vector configuration state
-   to CFG_BB is compatible with the vector configuration
-   state in CFG_BB, FALSE otherwise.  */
-bool
-vector_infos_manager::all_avail_in_compatible_p (const basic_block cfg_bb) const
-{
-  const auto &info = vector_block_infos[cfg_bb->index].local_dem;
-  sbit
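
The removed vector_infos_manager::create_expr and get_expr_id kept a de-duplicated list of demand expressions. The position of each expression in that list is its bit index in the LCM bitmaps. pre_vsetvl keeps an m_exprs vector for the same role.

A minimal sketch of that bookkeeping follows. It assumes a hypothetical Config type that only needs equality, plus a hypothetical ExprTable class. Unlike create_expr, add() here returns the id, and values are stored rather than pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vsetvl demand; only equality matters here.
struct Config
{
  int sew;   // element width in bits
  int lmul;  // integer LMUL, for simplicity
  int avl;   // immediate AVL, for simplicity
  bool operator== (const Config &o) const
  {
    return sew == o.sew && lmul == o.lmul && avl == o.avl;
  }
};

// De-duplicating expression table: each distinct configuration gets one
// index, used as its bit position in per-block bitmaps.  Lookup is a
// linear scan, as in the removed code.
class ExprTable
{
public:
  // Add INFO unless an equal expression is already present; return its id.
  std::size_t add (const Config &info)
  {
    for (std::size_t i = 0; i < m_exprs.size (); i++)
      if (m_exprs[i] == info)
        return i;
    m_exprs.push_back (info);
    return m_exprs.size () - 1;
  }

  // Id of an expression that must already be in the table.
  std::size_t id (const Config &info) const
  {
    for (std::size_t i = 0; i < m_exprs.size (); i++)
      if (m_exprs[i] == info)
        return i;
    assert (false && "expression not in table");
    return m_exprs.size ();
  }

  std::size_t size () const { return m_exprs.size (); }

private:
  std::vector<Config> m_exprs;
};
```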

[PATCH V3 05/11] RISC-V: P5: Combine phase 1 and 2

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): 
New.
(pass_vsetvl::compute_local_backward_infos): Removed.
(pass_vsetvl::need_vsetvl): Removed.
(pass_vsetvl::transfer_before): Removed.
(pass_vsetvl::transfer_after): Removed.
(pass_vsetvl::emit_local_forward_vsetvls): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 270 ++-
 1 file changed, 124 insertions(+), 146 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index f8b708c248a..dad3d7c941e 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2722,6 +2722,130 @@ public:
 };
 
 
+void
+pre_vsetvl::fuse_local_vsetvl_info ()
+{
+  m_reg_def_loc
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), GP_REG_LAST + 1);
+  bitmap_vector_clear (m_reg_def_loc, last_basic_block_for_fn (cfun));
+  bitmap_ones (m_reg_def_loc[ENTRY_BLOCK_PTR_FOR_FN (cfun)->index]);
+
+  for (bb_info *bb : crtl->ssa->bbs ())
+{
+  auto &block_info = get_block_info (bb);
+  block_info.m_bb = bb;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "  Try fuse basic block %d\n", bb->index ());
+   }
+  auto_vec infos;
+  for (insn_info *insn : bb->real_nondebug_insns ())
+   {
+ vsetvl_info curr_info = vsetvl_info (insn);
+ if (curr_info.valid_p () || curr_info.unknown_p ())
+   infos.safe_push (curr_info);
+
+ /* Collecting GP registers modified by the current bb.  */
+ if (insn->is_real ())
+   for (def_info *def : insn->defs ())
+ if (def->is_reg () && GP_REG_P (def->regno ()))
+   bitmap_set_bit (m_reg_def_loc[bb->index ()], def->regno ());
+   }
+
+  vsetvl_info prev_info = vsetvl_info ();
+  prev_info.set_empty ();
+  for (auto &curr_info : infos)
+   {
+ if (prev_info.empty_p ())
+   prev_info = curr_info;
+ else if ((curr_info.unknown_p () && prev_info.valid_p ())
+  || (curr_info.valid_p () && prev_info.unknown_p ()))
+   {
+ block_info.infos.safe_push (prev_info);
+ prev_info = curr_info;
+   }
+ else if (curr_info.valid_p () && prev_info.valid_p ())
+   {
+ if (m_dem.available_p (prev_info, curr_info))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file,
+  "Ignore curr info since prev info "
+  "available with it:\n");
+ fprintf (dump_file, "  prev_info: ");
+ prev_info.dump (dump_file, "");
+ fprintf (dump_file, "  curr_info: ");
+ curr_info.dump (dump_file, "");
+ fprintf (dump_file, "\n");
+   }
+ if (!curr_info.vl_use_by_non_rvv_insn_p ()
+ && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+   m_delete_list.safe_push (curr_info);
+
+ if (curr_info.get_read_vl_insn ())
+   prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
+   }
+ else if (m_dem.compatible_p (prev_info, curr_info))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Fuse curr info since prev info "
+ "compatible with it:\n");
+ fprintf (dump_file, "  prev_info: ");
+ prev_info.dump (dump_file, "");
+ fprintf (dump_file, "  curr_info: ");
+ curr_info.dump (dump_file, "");
+   }
+ m_dem.merge (prev_info, curr_info);
+ if (curr_info.get_read_vl_insn ())
+   prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "  prev_info after fused: ");
+ prev_info.dump (dump_file, "");
+ fprintf (dump_file, "\n");
+   }
+   }
+ else
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file,
+  "Cannot fuse uncompatible infos:\n");
+ fprintf (dump_file, "  prev_info: ");
+ prev_info.dump (dump_file, "   ");
+ fprintf (dump_file, "  curr_info: ");
+ curr_info.dump (dump_file, "   ");
+   }
+ block_inf
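
The front-to-back fusion loop in pre_vsetvl::fuse_local_vsetvl_info above can be modelled on a toy demand record. This sketch rests on several assumptions:

- The hypothetical Info type demands an AVL and, optionally, a SEW.
- The available_p, compatible_p and merge rules are illustrative only and much simpler than the real demand_system.
- The excerpt is truncated inside the final else branch and after the loop. Pushing prev in that branch and flushing prev at the end are therefore assumed.
- The excerpt has no branch for two consecutive unknown infos. This sketch keeps only the first.

```cpp
#include <cassert>
#include <vector>

// Hypothetical demand record (not GCC's vsetvl_info).  A valid info
// demands an AVL and optionally a SEW (sew == 0 means "any SEW").
struct Info
{
  bool unknown = false;
  int avl = 0;
  int sew = 0;
};

Info make_valid (int avl, int sew) { Info i; i.avl = avl; i.sew = sew; return i; }
Info make_unknown () { Info i; i.unknown = true; return i; }

// PREV already provides everything CURR demands.
bool available_p (const Info &prev, const Info &curr)
{
  return prev.avl == curr.avl && (curr.sew == 0 || curr.sew == prev.sew);
}

// PREV can be widened to satisfy CURR too (here: PREV has no SEW demand).
bool compatible_p (const Info &prev, const Info &curr)
{
  return prev.avl == curr.avl && prev.sew == 0;
}

void merge (Info &prev, const Info &curr)
{
  if (prev.sew == 0)
    prev.sew = curr.sew;
}

// Fuse one block's infos from front to back.
std::vector<Info> fuse_local (const std::vector<Info> &infos)
{
  std::vector<Info> fused;
  bool have_prev = false;
  Info prev;
  for (const Info &curr : infos)
    {
      if (!have_prev)
        {
          prev = curr;
          have_prev = true;
        }
      else if (curr.unknown != prev.unknown)
        {
          // Valid/unknown boundary: close PREV and start a new group.
          fused.push_back (prev);
          prev = curr;
        }
      else if (!curr.unknown)
        {
          if (available_p (prev, curr))
            continue;  // CURR is redundant (the patch queues its vsetvl for deletion).
          if (compatible_p (prev, curr))
            merge (prev, curr);
          else
            {
              fused.push_back (prev);
              prev = curr;
            }
        }
      // Two unknown infos in a row: keep only the first.
    }
  if (have_prev)
    fused.push_back (prev);
  return fused;
}
```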

[PATCH V3 08/11] RISC-V: P8: Refactor emit-vsetvl phase and delete post optimization

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): New.
(pre_vsetvl::cleaup): New.
(pre_vsetvl::remove_avl_operand): New.
(pre_vsetvl::remove_unused_dest_operand): New.
(pass_vsetvl::get_vsetvl_at_end): Removed.
(local_avl_compatible_p): Removed.
(pass_vsetvl::local_eliminate_vsetvl_insn): Removed.
(get_first_vsetvl_before_rvv_insns): Removed.
(pass_vsetvl::global_eliminate_vsetvl_insn): Removed.
(pass_vsetvl::ssa_post_optimization): Removed.
(has_no_uses): Removed.
(pass_vsetvl::df_post_optimization): Removed.
(pass_vsetvl::init): Removed.
(pass_vsetvl::done): Removed.
(pass_vsetvl::compute_probabilities): Removed.
(pass_vsetvl::lazy_vsetvl): Removed.
(pass_vsetvl::execute): Removed.
(make_pass_vsetvl): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 878 +++
 1 file changed, 203 insertions(+), 675 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 855edd6d0f5..06d02d25cb3 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3601,6 +3601,209 @@ pre_vsetvl::pre_global_vsetvl_info ()
 }
 }
 
+void
+pre_vsetvl::emit_vsetvl ()
+{
+  bool need_commit = false;
+
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  for (const auto &curr_info : get_block_info (bb).infos)
+   {
+ insn_info *insn = curr_info.get_insn ();
+ if (curr_info.delete_p ())
+   {
+ if (vsetvl_insn_p (insn->rtl ()))
+   remove_vsetvl_insn (curr_info);
+ continue;
+   }
+ else if (curr_info.valid_p ())
+   {
+ if (vsetvl_insn_p (insn->rtl ()))
+   {
+ const vsetvl_info temp = vsetvl_info (insn);
+ if (!(curr_info == temp))
+   {
+ if (dump_file)
+   {
+ fprintf (dump_file, "\n  Change vsetvl info from: ");
+ temp.dump (dump_file, "");
+ fprintf (dump_file, "  to: ");
+ curr_info.dump (dump_file, "");
+   }
+ change_vsetvl_insn (curr_info);
+   }
+   }
+ else
+   {
+ if (dump_file)
+   {
+ fprintf (dump_file,
+  "\n  Insert vsetvl info before insn %d: ",
+  insn->uid ());
+ curr_info.dump (dump_file, "");
+   }
+ insert_vsetvl_insn (EMIT_BEFORE, curr_info);
+   }
+   }
+   }
+}
+
+  for (const vsetvl_info &item : m_delete_list)
+{
+  gcc_assert (vsetvl_insn_p (item.get_insn ()->rtl ()));
+  remove_vsetvl_insn (item);
+}
+
+  /* Insert vsetvl insns on the edges where LCM suggests (m_insert).  */
+  for (int ed = 0; ed < NUM_EDGES (m_edges); ed++)
+{
+  edge eg = INDEX_EDGE (m_edges, ed);
+  sbitmap i = m_insert[ed];
+  if (bitmap_count_bits (i) < 1)
+   continue;
+
+  if (bitmap_count_bits (i) > 1)
+   /* For code with an infinite loop (e.g. pr61634.c), the data flow is
+  completely wrong.  */
+   continue;
+
+  gcc_assert (bitmap_count_bits (i) == 1);
+  unsigned expr_index = bitmap_first_set_bit (i);
+  const vsetvl_info &info = *m_exprs[expr_index];
+  gcc_assert (info.valid_p ());
+  if (dump_file)
+   {
+ fprintf (dump_file,
+  "\n  Insert vsetvl info at edge(bb %u -> bb %u): ",
+  eg->src->index, eg->dest->index);
+ info.dump (dump_file, "");
+   }
+  rtl_profile_for_edge (eg);
+  start_sequence ();
+
+  insert_vsetvl_insn (EMIT_DIRECT, info);
+  rtx_insn *rinsn = get_insns ();
+  end_sequence ();
+  default_rtl_profile ();
+
+  /* We should not get an abnormal edge here.  */
+  gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+  need_commit = true;
+  insert_insn_on_edge (rinsn, eg);
+}
+
+  /* Insert vsetvl info that was not deleted after lift up.  */
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  const vsetvl_block_info &block_info = get_block_info (bb);
+  if (!block_info.has_info ())
+   continue;
+
+  const vsetvl_info &footer_info = block_info.get_exit_info ();
+
+  if (footer_info.delete_p ())
+   continue;
+
+  edge eg;
+  edge_iterator eg_iterator;
+  FOR_EACH_EDGE (eg, eg_iterator, bb->cfg_bb ()->succs)
+   {
+ gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+ if (dump_file)
+   {
+ fprintf (
+   dump_file,
+   "\n  Insert missed vsetvl info at edge(bb %u -> bb %u): ",
+   eg->src->index, eg->dest->index);
+   
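
The per-edge decision in pre_vsetvl::emit_vsetvl above can be isolated as follows. This is a sketch, not the patch's code. The helper plan_edge_insertions, the ExprSet alias and the fixed 64-expression bound are hypothetical. It follows the same checks as the patch:

- An empty insert set is skipped.
- A set with more than one bit is also skipped. The patch's comment attributes that case to infinite loops such as pr61634.c.
- Otherwise the single set bit names the expression to insert.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxExprs = 64;  // hypothetical expression-table size
using ExprSet = std::bitset<kMaxExprs>;

// For each edge's LCM insert set, return the index of the expression to
// insert on that edge, or -1 if nothing is inserted.
std::vector<int> plan_edge_insertions (const std::vector<ExprSet> &insert)
{
  std::vector<int> plan;
  for (const ExprSet &bits : insert)
    {
      if (bits.count () != 1)
        {
          plan.push_back (-1);
          continue;
        }
      // Exactly one bit is set: find it (bitmap_first_set_bit in the patch).
      int idx = 0;
      while (!bits.test (idx))
        idx++;
      plan.push_back (idx);
    }
  return plan;
}
```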

[PATCH V3 09/11] RISC-V: P9: Cleanup and reorganize helper functions

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (debug): Removed.
(bitmap_union_of_preds_with_entry): New.
(compute_reaching_defintion): New.
(vlmax_avl_p): New.
(enum vsetvl_type): Moved.
(enum emit_type): Moved.
(vlmul_to_str): Moved.
(vlmax_avl_insn_p): Moved.
(policy_to_str): Moved.
(loop_basic_block_p): Removed.
(valid_sew_p): Removed.
(vsetvl_insn_p): Moved.
(vsetvl_vtype_change_only_p): Removed.
(after_or_same_p): Removed.
(before_p): Removed.
(anticipatable_occurrence_p): Removed.
(available_occurrence_p): Removed.
(insn_should_be_added_p): Moved.
(get_all_sets): Moved.
(get_same_bb_set): Moved.
(gen_vsetvl_pat): Removed.
(emit_vsetvl_insn): Removed.
(eliminate_insn): Removed.
(calculate_vlmul): Moved.
(insert_vsetvl): Removed.
(get_max_int_sew): New.
(get_vl_vtype_info): Removed.
(get_max_float_sew): New.
(count_regno_occurrences): Moved.
(enum def_type): Moved.
(validate_change_or_fail): Moved.
(change_insn): Removed.
(get_all_real_uses): New.
(get_forward_read_vl_insn): Removed.
(get_backward_fault_first_load_insn): Removed.
(change_vsetvl_insn): Removed.
(avl_source_has_vsetvl_p): Moved.
(source_equal_p): Moved.
(same_equiv_note_p): Moved.
(calculate_sew): Moved.
(get_expr_id): New.
(get_regno): New.
(get_bb_index): New.
(has_no_uses): Moved.

---
 gcc/config/riscv/riscv-vsetvl.cc | 1153 ++
 1 file changed, 383 insertions(+), 770 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 06d02d25cb3..e136351aee5 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -18,60 +18,47 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-/*  This pass is to Set VL/VTYPE global status for RVV instructions
-that depend on VL and VTYPE registers by Lazy code motion (LCM).
-
-Strategy:
-
--  Backward demanded info fusion within block.
-
--  Lazy code motion (LCM) based demanded info backward propagation.
-
--  RTL_SSA framework for def-use, PHI analysis.
-
--  Lazy code motion (LCM) for global VL/VTYPE optimization.
-
-Assumption:
-
--  Each avl operand is either an immediate (must be in range 0 ~ 31) or reg.
-
-This pass consists of 5 phases:
-
--  Phase 1 - compute VL/VTYPE demanded information within each block
-   by backward data-flow analysis.
-
--  Phase 2 - Emit vsetvl instructions within each basic block according to
-   demand, compute and save ANTLOC && AVLOC of each block.
-
--  Phase 3 - LCM Earliest-edge baseed VSETVL demand fusion.
-
--  Phase 4 - Lazy code motion including: compute local properties,
-   pre_edge_lcm and vsetvl insertion && delete edges for LCM results.
-
--  Phase 5 - Cleanup AVL operand of RVV instruction since it will not be
-   used any more and VL operand of VSETVL instruction if it is not used by
-   any non-debug instructions.
-
--  Phase 6 - DF based post VSETVL optimizations.
-
-Implementation:
-
--  The subroutine of optimize == 0 is simple_vsetvl.
-   This function simplily vsetvl insertion for each RVV
-   instruction. No optimization.
-
--  The subroutine of optimize > 0 is lazy_vsetvl.
-   This function optimize vsetvl insertion process by
-   lazy code motion (LCM) layering on RTL_SSA.
-
--  get_avl (), get_insn (), get_avl_source ():
-
-   1. get_insn () is the current instruction, find_access (get_insn
-   ())->def is the same as get_avl_source () if get_insn () demand VL.
-   2. If get_avl () is non-VLMAX REG, get_avl () == get_avl_source
-   ()->regno ().
-   3. get_avl_source ()->regno () is the REGNO that we backward propagate.
- */
+/* The values of the vl and vtype registers affect the behavior of RVV
+   insns.  That is, before executing an RVV instruction, we need to set
+   the correct vl and vtype values by executing a vsetvl instruction.
+   Executing the fewest vsetvl instructions while keeping the behavior
+   the same is the problem this pass tries to solve.  This vsetvl pass is
+   divided into 5 phases:
+
+ - Phase 1 (fuse local vsetvl infos): traverses each Basic Block, parses
+   each instruction in it that affects vl and vtype state and generates an
+   array of vsetvl_info objects. It then traverses the vsetvl_info array from
+   front to back and performs fusion according to the fusion rules. The fused
+   vsetvl infos are stored in the vsetvl_block_info object's `infos` field.
+
+ - Phase 2 (earliest fuse g
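
This patch adds a compute_reaching_defintion helper. Patch 06 calls it with avl_def_loc as the generated set and m_kill as the killed set. Its body is not shown in these excerpts, so what follows is a textbook iterative reaching-definitions solver, not the patch's code. The fixed-size Bits type, the predecessor-list CFG, the function name solve_reaching_defs and the convention that block 0 is the entry block are all assumptions of the sketch.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kDefs = 32;  // hypothetical size of the definition universe
using Bits = std::bitset<kDefs>;

// Iterative reaching definitions over a CFG given as predecessor lists:
//   IN[b]  = union of OUT[p] for every predecessor p of b
//   OUT[b] = GEN[b] | (IN[b] & ~KILL[b])
// Block 0 is the entry block: its OUT set is seeded by the caller and left
// untouched, much as compute_avl_def_data seeds m_avl_def_out for the entry.
void solve_reaching_defs (const std::vector<std::vector<std::size_t>> &preds,
                          const std::vector<Bits> &gen,
                          const std::vector<Bits> &kill,
                          std::vector<Bits> &in, std::vector<Bits> &out)
{
  const std::size_t n = preds.size ();
  assert (gen.size () == n && kill.size () == n && out.size () == n);
  in.assign (n, Bits ());
  bool changed = true;
  // Starting from empty OUT sets for non-entry blocks, the sets only grow,
  // so the iteration reaches a fixed point.
  while (changed)
    {
      changed = false;
      for (std::size_t b = 1; b < n; b++)
        {
          Bits new_in;
          for (std::size_t p : preds[b])
            new_in |= out[p];
          in[b] = new_in;
          Bits new_out = gen[b] | (new_in & ~kill[b]);
          if (new_out != out[b])
            {
              out[b] = new_out;
              changed = true;
            }
        }
    }
}
```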

[PATCH V3 07/11] RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): 
New.
(pre_vsetvl::pre_global_vsetvl_info): New.
(pass_vsetvl::prune_expressions): Removed.
(pass_vsetvl::compute_local_properties): Removed.
(pass_vsetvl::earliest_fusion): Removed.
(pass_vsetvl::vsetvl_fusion): Removed.
(pass_vsetvl::can_refine_vsetvl_p): Removed.
(pass_vsetvl::refine_vsetvls): Removed.
(pass_vsetvl::cleanup_vsetvls): Removed.
(pass_vsetvl::commit_vsetvls): Removed.
(pass_vsetvl::pre_vsetvl): Removed.

---
 gcc/config/riscv/riscv-vsetvl.cc | 1004 +++---
 1 file changed, 361 insertions(+), 643 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 27d47d7c039..855edd6d0f5 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2721,7 +2721,6 @@ public:
   }
 };
 
-
 void
 pre_vsetvl::compute_avl_def_data ()
 {
@@ -3241,6 +3240,367 @@ pre_vsetvl::fuse_local_vsetvl_info ()
 }
 
 
+bool
+pre_vsetvl::earliest_fuse_vsetvl_info ()
+{
+  compute_avl_def_data ();
+  compute_vsetvl_def_data ();
+  compute_lcm_local_properties ();
+
+  unsigned num_exprs = m_exprs.length ();
+  struct edge_list *m_edges = create_edge_list ();
+  unsigned num_edges = NUM_EDGES (m_edges);
+  sbitmap *antin
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+  sbitmap *antout
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+
+  sbitmap *earliest = sbitmap_vector_alloc (num_edges, num_exprs);
+
+  compute_available (m_avloc, m_kill, m_avout, m_avin);
+  compute_antinout_edge (m_antloc, m_transp, antin, antout);
+  compute_earliest (m_edges, num_exprs, antin, antout, m_avout, m_kill,
+   earliest);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "\n  Compute LCM earliest insert data:\n\n");
+  fprintf (dump_file, "Expression List (%u):\n", num_exprs);
+  for (unsigned i = 0; i < num_exprs; i++)
+   {
+ const auto &info = *m_exprs[i];
+ fprintf (dump_file, "  Expr[%u]: ", i);
+ info.dump (dump_file, "");
+   }
+  fprintf (dump_file, "\nbitmap data:\n");
+  for (const bb_info *bb : crtl->ssa->bbs ())
+   {
+ unsigned int i = bb->index ();
+ fprintf (dump_file, "  BB %u:\n", i);
+ fprintf (dump_file, "avloc: ");
+ dump_bitmap_file (dump_file, m_avloc[i]);
+ fprintf (dump_file, "kill: ");
+ dump_bitmap_file (dump_file, m_kill[i]);
+ fprintf (dump_file, "antloc: ");
+ dump_bitmap_file (dump_file, m_antloc[i]);
+ fprintf (dump_file, "transp: ");
+ dump_bitmap_file (dump_file, m_transp[i]);
+
+ fprintf (dump_file, "avin: ");
+ dump_bitmap_file (dump_file, m_avin[i]);
+ fprintf (dump_file, "avout: ");
+ dump_bitmap_file (dump_file, m_avout[i]);
+ fprintf (dump_file, "antin: ");
+ dump_bitmap_file (dump_file, antin[i]);
+ fprintf (dump_file, "antout: ");
+ dump_bitmap_file (dump_file, antout[i]);
+   }
+  fprintf (dump_file, "\n");
+  fprintf (dump_file, "  earliest:\n");
+  for (unsigned ed = 0; ed < num_edges; ed++)
+   {
+ edge eg = INDEX_EDGE (m_edges, ed);
+
+ if (bitmap_empty_p (earliest[ed]))
+   continue;
+ fprintf (dump_file, "Edge(bb %u -> bb %u): ", eg->src->index,
+  eg->dest->index);
+ dump_bitmap_file (dump_file, earliest[ed]);
+   }
+  fprintf (dump_file, "\n");
+}
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Fused global info result:\n");
+}
+
+  bool changed = false;
+  for (unsigned ed = 0; ed < num_edges; ed++)
+{
+  sbitmap e = earliest[ed];
+  if (bitmap_empty_p (e))
+   continue;
+
+  unsigned int expr_index;
+  sbitmap_iterator sbi;
+  EXECUTE_IF_SET_IN_BITMAP (e, 0, expr_index, sbi)
+   {
+ vsetvl_info &curr_info = *m_exprs[expr_index];
+ if (!curr_info.valid_p ())
+   continue;
+
+ edge eg = INDEX_EDGE (m_edges, ed);
+ if (eg->probability == profile_probability::never ())
+   continue;
+ if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
+ || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
+   continue;
+
+ vsetvl_block_info &src_block_info = get_block_info (eg->src);
+ vsetvl_block_info &dest_block_info = get_block_info (eg->dest);
+
+ if (src_block_info.probability
+ == profile_probability::uninitialized ())
+   continue;
+
+ if (src_block_info.empty_p ())
+   {
+ vsetvl_info new_curr_info = curr_info;
+ new_curr_info.set_bb (crtl-
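
pre_vsetvl::earliest_fuse_vsetvl_info relies on the LCM "earliest" sets produced by compute_earliest. As far as we understand lcm.cc, the per-edge formula is:

- Entry edges take ANTIN of the successor.
- Exit edges take the empty set.
- Every other edge (p -> s) takes ANTIN[s] & ~AVOUT[p] & (KILL[p] | ~ANTOUT[p]).

The sketch below writes this against std::bitset instead of sbitmap. The Edge struct and the explicit entry/exit block numbers are assumptions of the sketch.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kExprs = 16;  // hypothetical number of expressions
using Bits = std::bitset<kExprs>;

struct Edge { std::size_t src, dest; };

// LCM "earliest" placement for each edge (src -> dest).
std::vector<Bits> earliest (const std::vector<Edge> &edges,
                            std::size_t entry, std::size_t exit,
                            const std::vector<Bits> &antin,
                            const std::vector<Bits> &antout,
                            const std::vector<Bits> &avout,
                            const std::vector<Bits> &kill)
{
  std::vector<Bits> result;
  for (const Edge &e : edges)
    {
      if (e.src == entry)
        result.push_back (antin[e.dest]);
      else if (e.dest == exit)
        result.push_back (Bits ());
      else
        // Anticipated at the successor, not already available out of the
        // predecessor, and either killed in or not anticipated out of it.
        result.push_back (antin[e.dest] & ~avout[e.src]
                          & (kill[e.src] | ~antout[e.src]));
    }
  return result;
}
```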

[PATCH V3 06/11] RISC-V: P6: Add computing reaching definition data flow

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_avl_def_data): New.
(pre_vsetvl::compute_vsetvl_def_data): New.
(pre_vsetvl::compute_lcm_local_properties): New.

---
 gcc/config/riscv/riscv-vsetvl.cc | 395 +++
 1 file changed, 395 insertions(+)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index dad3d7c941e..27d47d7c039 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2722,6 +2722,401 @@ public:
 };
 
 
+void
+pre_vsetvl::compute_avl_def_data ()
+{
+  if (bitmap_empty_p (m_avl_regs))
+return;
+
+  unsigned num_regs = GP_REG_LAST + 1;
+  unsigned num_bbs = last_basic_block_for_fn (cfun);
+
+  sbitmap *avl_def_loc_temp = sbitmap_vector_alloc (num_bbs, num_regs);
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  bitmap_and (avl_def_loc_temp[bb->index ()], m_avl_regs,
+ m_reg_def_loc[bb->index ()]);
+
+  vsetvl_block_info &block_info = get_block_info (bb);
+  if (block_info.has_info ())
+   {
+ vsetvl_info &footer_info = block_info.get_exit_info ();
+ gcc_assert (footer_info.valid_p ());
+ if (footer_info.has_vl ())
+   bitmap_set_bit (avl_def_loc_temp[bb->index ()],
+   REGNO (footer_info.get_vl ()));
+   }
+}
+
+  if (m_avl_def_in)
+sbitmap_vector_free (m_avl_def_in);
+  if (m_avl_def_out)
+sbitmap_vector_free (m_avl_def_out);
+
+  unsigned num_exprs = num_bbs * num_regs;
+  sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
+  sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
+  m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
+  m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);
+
+  bitmap_vector_clear (avl_def_loc, num_bbs);
+  bitmap_vector_clear (m_kill, num_bbs);
+  bitmap_vector_clear (m_avl_def_out, num_bbs);
+
+  unsigned regno;
+  sbitmap_iterator sbi;
+  for (const bb_info *bb : crtl->ssa->bbs ())
+EXECUTE_IF_SET_IN_BITMAP (avl_def_loc_temp[bb->index ()], 0, regno, sbi)
+  {
+   bitmap_set_bit (avl_def_loc[bb->index ()],
+   get_expr_id (bb->index (), regno, num_bbs));
+   bitmap_set_range (m_kill[bb->index ()], regno * num_bbs, num_bbs);
+  }
+
+  basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  EXECUTE_IF_SET_IN_BITMAP (m_avl_regs, 0, regno, sbi)
+bitmap_set_bit (m_avl_def_out[entry->index],
+   get_expr_id (entry->index, regno, num_bbs));
+
+  compute_reaching_defintion (avl_def_loc, m_kill, m_avl_def_in, m_avl_def_out);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file,
+  "  Compute avl reaching defition data (num_bbs %d, num_regs "
+  "%d):\n\n",
+  num_bbs, num_regs);
+  fprintf (dump_file, "avl_regs: ");
+  dump_bitmap_file (dump_file, m_avl_regs);
+  fprintf (dump_file, "\nbitmap data:\n");
+  for (const bb_info *bb : crtl->ssa->bbs ())
+   {
+ unsigned int i = bb->index ();
+ fprintf (dump_file, "  BB %u:\n", i);
+ fprintf (dump_file, "avl_def_loc:");
+ unsigned expr_id;
+ sbitmap_iterator sbi;
+ EXECUTE_IF_SET_IN_BITMAP (avl_def_loc[i], 0, expr_id, sbi)
+   {
+ fprintf (dump_file, " (r%u,bb%u)", get_regno (expr_id, num_bbs),
+  get_bb_index (expr_id, num_bbs));
+   }
+ fprintf (dump_file, "\nkill:");
+ EXECUTE_IF_SET_IN_BITMAP (m_kill[i], 0, expr_id, sbi)
+   {
+ fprintf (dump_file, " (r%u,bb%u)", get_regno (expr_id, num_bbs),
+  get_bb_index (expr_id, num_bbs));
+   }
+ fprintf (dump_file, "\navl_def_in:");
+ EXECUTE_IF_SET_IN_BITMAP (m_avl_def_in[i], 0, expr_id, sbi)
+   {
+ fprintf (dump_file, " (r%u,bb%u)", get_regno (expr_id, num_bbs),
+  get_bb_index (expr_id, num_bbs));
+   }
+ fprintf (dump_file, "\navl_def_out:");
+ EXECUTE_IF_SET_IN_BITMAP (m_avl_def_out[i], 0, expr_id, sbi)
+   {
+ fprintf (dump_file, " (r%u,bb%u)", get_regno (expr_id, num_bbs),
+  get_bb_index (expr_id, num_bbs));
+   }
+ fprintf (dump_file, "\n");
+   }
+}
+
+  sbitmap_vector_free (avl_def_loc);
+  sbitmap_vector_free (m_kill);
+  sbitmap_vector_free (avl_def_loc_temp);
+
+  m_dem.set_avl_in_out_data (m_avl_def_in, m_avl_def_out);
+}
+
+void
+pre_vsetvl::compute_vsetvl_def_data ()
+{
+  m_vsetvl_def_exprs.truncate (0);
+  add_expr (m_vsetvl_def_exprs, m_unknow_info);
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  vsetvl_block_info &block_info = get_block_info (bb);
+  if (block_info.empty_p ())
+   continue;
+  vsetvl_info &footer_info = block_info.get_exit_info ();
+  gcc_assert (fo
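
In compute_avl_def_data above, each reaching-definition bit stands for a (register, defining block) pair. Two call sites constrain the layout:

- The kill set for a register is filled with bitmap_set_range (m_kill[...], regno * num_bbs, num_bbs).
- num_exprs is num_bbs * num_regs.

Together they imply id = regno * num_bbs + bb_index. The bodies of get_expr_id, get_regno and get_bb_index are not shown in these excerpts. The functions below, with hypothetical names, only illustrate the layout those call sites imply.

```cpp
#include <cassert>

// (basic block, register) -> bit index; argument order follows the
// get_expr_id (bb->index (), regno, num_bbs) call above.
unsigned expr_id (unsigned bb_index, unsigned regno, unsigned num_bbs)
{
  assert (bb_index < num_bbs);
  return regno * num_bbs + bb_index;
}

// Inverse mappings, used when dumping "(rN,bbM)" pairs.
unsigned regno_of (unsigned id, unsigned num_bbs) { return id / num_bbs; }
unsigned bb_index_of (unsigned id, unsigned num_bbs) { return id % num_bbs; }
```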

[PATCH V3 11/11] RISC-V: P11: Adjust and add testcases

2023-10-19 Thread Lehua Ding
PR target/111037
PR target/111234
PR target/111725

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-84.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109773-1.c: Adjust.
* gcc.target/riscv/rvv/base/pr111037-1.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-1.c: ...here.
* gcc.target/riscv/rvv/base/pr111037-2.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-2.c: ...here.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-12.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-13.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-18.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-104.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-105.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-106.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-107.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-108.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-109.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-4.c: New test.

---
 .../gcc.target/riscv/rvv/base/scalar_move-1.c |  2 +-
 .../riscv/rvv/vsetvl/avl_single-104.c | 35 +++
 .../riscv/rvv/vsetvl/avl_single-105.c | 23 ++
 .../riscv/rvv/vsetvl/avl_single-106.c | 34 ++
 .../riscv/rvv/vsetvl/avl_single-107.c | 41 +
 .../riscv/rvv/vsetvl/avl_single-108.c | 41 +
 .../riscv/rvv/vsetvl/avl_single-109.c | 45 +++
 .../riscv/rvv/vsetvl/avl_single-23.c  |  7 +--
 .../riscv/rvv/vsetvl/avl_single-46.c  |  3 +-
 .../riscv/rvv/vsetvl/avl_single-84.c  |  5 +--
 .../riscv/rvv/vsetvl/avl_single-89.c  |  8 ++--
 .../riscv/rvv/vsetvl/avl_single-95.c  |  2 +-
 .../riscv/rvv/vsetvl/imm_bb_prop-1.c  |  7 +--
 .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/pr109773-1.c  |  2 +-
 .../riscv/rvv/{base => vsetvl}/pr111037-1.c   |  0
 .../riscv/rvv/{base => vsetvl}/pr111037-2.c   |  0
 .../gcc.target/riscv/rvv/vsetvl/pr111037-3.c  | 16 +++
 .../gcc.target/riscv/rvv/vsetvl/pr111037-4.c  | 16 +++
 .../riscv/rvv/vsetvl/vlmax_back_prop-25.c | 10 ++---
 .../riscv/rvv/vsetvl/vlmax_back_prop-26.c | 10 ++---
 .../riscv/rvv/vsetvl/vlmax_conflict-12.c  |  1 -
 .../riscv/rvv/vsetvl/vlmax_conflict-3.c   |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-13.c   |  4 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-18.c   |  4 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |  2 +-
 26 files changed, 288 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-105.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-106.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-107.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-108.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-109.c
 rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-1.c (100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-2.c (100%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
index 18349132a88..c833d8989e9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
@@ -46,8 +46,8 @@ int32_t foo3 (int32_t *base, size_t vl)
 ** vl1re32\.v\tv[0-9]+,0\([a-x0-9]+\)
 ** vsetvli\tzero,[a-x0-9]+,e32,m1,t[au],m[au]
 ** vadd.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
-** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au]
 ** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+
+** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au]
 ** vmv.v.x\tv[0-9]+,\s*[a-x0-9]+
 ** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+
 ** ret
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c b/gc

Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Robin Dapp
Hi Lehua,

thanks for the extensive rework.  I'm going to let Juzhe handle the review
since it's his pass and he knows it best.  Delegated it to him in patchwork.

Regards
 Robin


[PATCH V3 10/11] RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def

2023-10-19 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.def (DEF_INCOMPATIBLE_COND): Removed.
(DEF_SEW_LMUL_RULE): New.
(DEF_SEW_LMUL_FUSE_RULE): Removed.
(DEF_POLICY_RULE): New.
(DEF_UNAVAILABLE_COND): Removed.
(DEF_AVL_RULE): New.
(sew_lmul): New.
(ratio_only): New.
(sew_only): New.
(ge_sew): New.
(ratio_and_ge_sew): New.
(tail_mask_policy): New.
(tail_policy_only): New.
(mask_policy_only): New.
(ignore_policy): New.
(avl): New.
(non_zero_avl): New.
(ignore_avl): New.
* config/riscv/t-riscv: Adjust.
* config/riscv/riscv-vsetvl.h: Removed.

---
 gcc/config/riscv/riscv-vsetvl.def | 641 +++---
 gcc/config/riscv/riscv-vsetvl.h   | 488 ---
 gcc/config/riscv/t-riscv  |   2 +-
 3 files changed, 155 insertions(+), 976 deletions(-)
 delete mode 100644 gcc/config/riscv/riscv-vsetvl.h

diff --git a/gcc/config/riscv/riscv-vsetvl.def b/gcc/config/riscv/riscv-vsetvl.def
index 709cc4ee0df..401d2c6f421 100644
--- a/gcc/config/riscv/riscv-vsetvl.def
+++ b/gcc/config/riscv/riscv-vsetvl.def
@@ -18,496 +18,163 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#ifndef DEF_INCOMPATIBLE_COND
-#define DEF_INCOMPATIBLE_COND(AVL1, SEW1, LMUL1, RATIO1, NONZERO_AVL1,   \
- GE_SEW1, TAIL_POLICTY1, MASK_POLICY1, AVL2,  \
- SEW2, LMUL2, RATIO2, NONZERO_AVL2, GE_SEW2,  \
- TAIL_POLICTY2, MASK_POLICY2, COND)
+/* DEF_XXX_RULE (prev_demand, next_demand, fused_demand, compatible_p,
+   available_p, fuse)
+   prev_demand: the prev vector insn's sew_lmul_type
+   next_demand: the next vector insn's sew_lmul_type
+   fused_demand: if they are compatible, change prev_info demand to the
+                 fused_demand after fusing prev_info and next_info
+   compatible_p: check if prev_demand and next_demand are compatible
+   available_p: check if prev_demand is available for next_demand
+   fuse: if they are compatible, how to modify prev_info  */
+
+#ifndef DEF_SEW_LMUL_RULE
+#define DEF_SEW_LMUL_RULE(prev_demand, next_demand, fused_demand,           \
+                          compatible_p, available_p, fuse)
 #endif
 
-#ifndef DEF_SEW_LMUL_FUSE_RULE
-#define DEF_SEW_LMUL_FUSE_RULE(DEMAND_SEW1, DEMAND_LMUL1, DEMAND_RATIO1,   \
-  DEMAND_GE_SEW1, DEMAND_SEW2, DEMAND_LMUL2,  \
-  DEMAND_RATIO2, DEMAND_GE_SEW2, NEW_DEMAND_SEW,  \
-  NEW_DEMAND_LMUL, NEW_DEMAND_RATIO,  \
-  NEW_DEMAND_GE_SEW, NEW_SEW, NEW_VLMUL,  \
-  NEW_RATIO)
+#ifndef DEF_POLICY_RULE
+#define DEF_POLICY_RULE(prev_demand, next_demand, fused_demand, compatible_p, \
+                        available_p, fuse)
 #endif
 
-#ifndef DEF_UNAVAILABLE_COND
-#define DEF_UNAVAILABLE_COND(AVL1, SEW1, LMUL1, RATIO1, NONZERO_AVL1, GE_SEW1, \
-TAIL_POLICTY1, MASK_POLICY1, AVL2, SEW2, LMUL2,   \
-RATIO2, NONZERO_AVL2, GE_SEW2, TAIL_POLICTY2, \
-MASK_POLICY2, COND)
+#ifndef DEF_AVL_RULE
+#define DEF_AVL_RULE(prev_demand, next_demand, fused_demand, compatible_p,    \
+                     available_p, fuse)
 #endif
 
-/* Case 1: Demand compatible AVL.  */
-DEF_INCOMPATIBLE_COND (/*AVL*/ DEMAND_TRUE, /*SEW*/ DEMAND_ANY,
-  /*LMUL*/ DEMAND_ANY, /*RATIO*/ DEMAND_ANY,
-  /*NONZERO_AVL*/ DEMAND_FALSE, /*GE_SEW*/ DEMAND_ANY,
-  /*TAIL_POLICTY*/ DEMAND_ANY, /*MASK_POLICY*/ DEMAND_ANY,
-  /*AVL*/ DEMAND_TRUE, /*SEW*/ DEMAND_ANY,
-  /*LMUL*/ DEMAND_ANY, /*RATIO*/ DEMAND_ANY,
-  /*NONZERO_AVL*/ DEMAND_FALSE, /*GE_SEW*/ DEMAND_ANY,
-  /*TAIL_POLICTY*/ DEMAND_ANY, /*MASK_POLICY*/ DEMAND_ANY,
-  /*COND*/ incompatible_avl_p)
-
-/* Case 2: Demand same SEW.  */
-DEF_INCOMPATIBLE_COND (/*AVL*/ DEMAND_ANY, /*SEW*/ DEMAND_TRUE,
-  /*LMUL*/ DEMAND_ANY, /*RATIO*/ DEMAND_ANY,
-  /*NONZERO_AVL*/ DEMAND_ANY, /*GE_SEW*/ DEMAND_FALSE,
-  /*TAIL_POLICTY*/ DEMAND_ANY, /*MASK_POLICY*/ DEMAND_ANY,
-  /*AVL*/ DEMAND_ANY, /*SEW*/ DEMAND_TRUE,
-  /*LMUL*/ DEMAND_ANY, /*RATIO*/ DEMAND_ANY,
-  /*NONZERO_AVL*/ DEMAND_ANY, /*GE_SEW*/ DEMAND_FALSE,
-  /*TAIL_POLICTY*/ DEMAND_ANY, /*MASK_POLICY*/ DEMAND_ANY,
-  /*COND*/ different_sew_p)
-
-/* Case 3: Demand same LMUL.  */
-DEF_INCOMPATIBLE_COND (/*AVL*/ DEMAND_ANY, /*SEW*/ DEMAND_A

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-19 Thread HAO CHEN GUI
Kewen & David,
  Thanks for your comments.

On 2023/10/17 10:19, Kewen.Lin wrote:
> I think David raised a good question, it sounds to me that the current
> handling simply considers that if MOVE_MAX_PIECES is set to 16, the
> required operations for this optimization on TImode are always available,
> but unfortunately on rs6000 the assumption doesn't hold, so could we
> teach generic code instead?

I finally found that the generic code doesn't check whether the scalar mode
used in by-pieces operations is enabled by the target. TImode is not enabled
on ppc, so this should be checked before using TImode for by-pieces
operations. I made a patch for the generic code and am testing it. With the
patch, 16-byte comparison can be enabled on both ppc64 and ppc.

Thanks
Gui Haochen


Re: [PATCH V2 00/14] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding

Hi Patrick,

Thank you so much for the details. I have sent a V3 patch series here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633529.html

Could you use the new patches to run the GCC testsuite again? I have fixed
some failures and also fixed the last patch of V2 ([PATCH V2 14/14]), which
could not be applied because my editor automatically trimmed trailing
whitespace in the diff content. Could you please check whether everything
works with the V3 patches applied? Thank you very much. I applied them on
top of trunk commit 0308461d9d44ca9db45fb72ca080c14e6fc68739 and saw no new
failures.


On 2023/10/18 4:25, Patrick O'Neill wrote:

Hi Lehua!

I ran the gcc testsuite on qemu before/after applying your patches to 
305034e3 rv32/64gcv [1].


Baseline
    = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
    rv32gcv/ ilp32d/ medlow |  208 /    78 |   29 /    17 |   71 /    24 |
    rv64gcv/  lp64d/ medlow |  101 /    54 |   13 /     4 |   33 /    13 |

After applying patch series:
    = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
    rv32gcv/ ilp32d/ medlow |  256 /    96 |   29 /    17 |   69 /    23 |
    rv64gcv/  lp64d/ medlow |  152 /    74 |   13 /     4 |   31 /    12 |

I'm seeing:
20 new unique gcc failures on rv64gcv [2]
18 new unique gcc failures on rv32gcv [3]

Thanks,
Patrick

[1] Build commands:
git clone https://github.com/patrick-rivos/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain
git submodule update --init gcc
cd gcc
git checkout 305034e3
cd ..
mkdir build
cd build
../configure --prefix=$(pwd) --with-multilib-generator="rv64gcv-lp64d--;rv32gcv-ilp32d--"

make report-linux -j32

Note: If you'd prefer to use upstream riscv-gnu-toolchain, I'm pretty 
sure you can do

mkdir build-64
cd build-64
../configure --prefix=$(pwd) --with-arch=rv64gcv --with-abi=lp64d
cd ..
mkdir build-32
cd build-32
../configure --prefix=$(pwd) --with-arch=rv32gcv --with-abi=ilp32d
This'll make 2 folders, so run make report-linux in each of them.

[2] rv64gcv New failures:
FAIL: gcc.dg/vect/slp-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-7.c execution test
FAIL: gcc.target/riscv/zero-scratch-regs-2.c   -O3 -g scan-assembler-not \\mvsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O1 scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -Os scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O1 scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -Os scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O1 scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O1 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -Os scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O1 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 -flto -fuse-l

Re: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-19 Thread Robin Dapp
Hi Juzhe,

as discussed off-list this approach generally makes sense to me so
the patch LGTM once the vsetvl rework is upstream and settled.

Independently, we still need to understand why the more complex
broadcast pattern is not hoisted out of the loop.

Regards
 Robin


[PATCH] c-family: Enable -fpermissive for C and ObjC

2023-10-19 Thread Florian Weimer
Future changes will treat some C front end warnings similarly to
-Wnarrowing.

There are no new tests because there are no such C warnings yet.  The
existing test suite covers the -std=gnu89 -pedantic-errors corner cases
(which should not turn on -fpermissive).

gcc/

* doc/invoke.texi (Warning Options): Mention C diagnostics
for -fpermissive.

gcc/c-family/

* c.opt (fpermissive): Enable for C and ObjC.
* c-opts.cc (set_std_c89): Enable -fpermissive.

---
 gcc/c-family/c-opts.cc | 6 ++
 gcc/c-family/c.opt | 2 +-
 gcc/doc/invoke.texi| 8 ++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index ce2e021e69d..9a4beb4b024 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1711,6 +1711,12 @@ set_std_c89 (int c94, int iso)
   flag_isoc99 = 0;
   flag_isoc11 = 0;
   flag_isoc2x = 0;
+  /* -std=gnu89 etc. should not override -pedantic-errors.  */
+  if (!global_dc->pedantic_errors)
+    {
+      flag_permissive = 1;
+      global_dc->permissive = 1;
+    }
   lang_hooks.name = "GNU C89";
 }
 
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 44b9c862c14..e6ea5e52d24 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2108,7 +2108,7 @@ C ObjC C++ ObjC++
 Look for and use PCH files even when preprocessing.
 
 fpermissive
-C++ ObjC++ Var(flag_permissive)
+C ObjC C++ ObjC++ Var(flag_permissive)
 Downgrade conformance errors to warnings.
 
 fplan9-extensions
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9b5ff457027..cfebab78f83 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6166,13 +6166,17 @@ errors by @option{-pedantic-errors}.  For instance:
 Downgrade some required diagnostics about nonconformant code from
 errors to warnings.  Thus, using @option{-fpermissive} allows some
 nonconforming code to compile.  Some C++ diagnostics are controlled
-only by this flag, but it also downgrades some diagnostics that have
-their own flag:
+only by this flag, but it also downgrades some C and C++ diagnostics
+that have their own flag:
 
 @gccoptlist{
 -Wnarrowing @r{(C++)}
 }
 
+The @option{-fpermissive} option is the default for historic C language
+modes (@option{-std=c89}, @option{-std=gnu89}, @option{-std=c90},
+@option{-std=gnu90}).
+
 @opindex Wall
 @opindex Wno-all
 @item -Wall

base-commit: 8f4bbdc28df6e87a7ad5ec5ca191a7a836a4f016



Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding

Okay, thanks anyway.

On 2023/10/19 16:38, Robin Dapp wrote:

Hi Lehua,

thanks for the extensive rework.  I'm going to let Juzhe handle the review
since it's his pass and he knows it best.  Delegated it to him in patchwork.

Regards
  Robin



--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH] x86: Correct ISA enabled for clients since Arrow Lake

2023-10-19 Thread Hongtao Liu
On Wed, Oct 18, 2023 at 4:10 PM Haochen Jiang  wrote:
>
> Hi all,
>
> I just found that since the ISAs enabled on Sierra Forest changed, clients
> since Arrow Lake wrongly enable ENQCMD with the current code.
>
> To avoid messing this up again in the future, I changed how the ISA
> dependencies are set up, making clients depend on clients and Atom servers
> depend on Atom servers.  This makes no functional difference for
> Clearwater Forest.
>
> Also, revise the out-of-date documentation in the texi file.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
> * config/i386/i386.h: Correct the ISA enabled for Arrow Lake.
> Also make Clearwater Forest depend on Sierra Forest.
> * doc/invoke.texi: Correct documentation.
> ---
>  gcc/config/i386/i386.h |  7 ---
>  gcc/doc/invoke.texi| 15 ---
>  2 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index abfe1672c41..92a7982c87f 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2401,11 +2401,12 @@ constexpr wide_int_bitmask PTA_GRANITERAPIDS = PTA_SAPPHIRERAPIDS | PTA_AMX_FP16
>  constexpr wide_int_bitmask PTA_GRANITERAPIDS_D = PTA_GRANITERAPIDS
>| PTA_AMX_COMPLEX;
>  constexpr wide_int_bitmask PTA_GRANDRIDGE = PTA_SIERRAFOREST | PTA_RAOINT;
> -constexpr wide_int_bitmask PTA_ARROWLAKE = PTA_SIERRAFOREST;
> +constexpr wide_int_bitmask PTA_ARROWLAKE = PTA_ALDERLAKE | PTA_AVXIFMA
> +  | PTA_AVXVNNIINT8 | PTA_AVXNECONVERT | PTA_CMPCCXADD | PTA_UINTR;
>  constexpr wide_int_bitmask PTA_ARROWLAKE_S = PTA_ARROWLAKE | PTA_AVXVNNIINT16
>| PTA_SHA512 | PTA_SM3 | PTA_SM4;
> -constexpr wide_int_bitmask PTA_CLEARWATERFOREST = PTA_ARROWLAKE_S | PTA_PREFETCHI
> -  | PTA_USER_MSR;
> +constexpr wide_int_bitmask PTA_CLEARWATERFOREST = PTA_SIERRAFOREST | PTA_AVXVNNIINT16
> +  | PTA_SHA512 | PTA_SM3 | PTA_SM4 | PTA_USER_MSR | PTA_PREFETCHI;
>  constexpr wide_int_bitmask PTA_PANTHERLAKE = PTA_ARROWLAKE_S | PTA_PREFETCHI;
>  constexpr wide_int_bitmask PTA_KNM = PTA_KNL | PTA_AVX5124VNNIW
>| PTA_AVX5124FMAPS | PTA_AVX512VPOPCNTDQ;
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a0da7f9d5ac..69809db9f1b 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -32845,7 +32845,8 @@ SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PREFETCHW, PCLMUL, RDRND, XSAVE, XSAVEC,
>  XSAVES, XSAVEOPT, FSGSBASE, PTWRITE, RDPID, SGX, GFNI-SSE, CLWB, MOVDIRI,
>  MOVDIR64B, CLDEMOTE, WAITPKG, ADCX, AVX, AVX2, BMI, BMI2, F16C, FMA, LZCNT,
>  PCONFIG, PKU, VAES, VPCLMULQDQ, SERIALIZE, HRESET, KL, WIDEKL, AVX-VNNI,
> -AVXIFMA, AVXVNNIINT8, AVXNECONVERT and CMPCCXADD instruction set support.
> +UINTR, AVXIFMA, AVXVNNIINT8, AVXNECONVERT and CMPCCXADD instruction set
> +support.
>
>  @item arrowlake-s
>  Intel Arrow Lake S CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
> @@ -32853,8 +32854,8 @@ SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PREFETCHW, PCLMUL, RDRND, XSAVE, XSAVEC,
>  XSAVES, XSAVEOPT, FSGSBASE, PTWRITE, RDPID, SGX, GFNI-SSE, CLWB, MOVDIRI,
>  MOVDIR64B, CLDEMOTE, WAITPKG, ADCX, AVX, AVX2, BMI, BMI2, F16C, FMA, LZCNT,
>  PCONFIG, PKU, VAES, VPCLMULQDQ, SERIALIZE, HRESET, KL, WIDEKL, AVX-VNNI,
> -AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AVXVNNIINT16, SHA512, SM3
> -and SM4 instruction set support.
> +UINTR, AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AVXVNNIINT16, SHA512,
> +SM3 and SM4 instruction set support.
>
>  @item clearwaterforest
>  Intel Clearwater Forest CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2,
> @@ -32862,8 +32863,8 @@ SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PREFETCHW, PCLMUL, RDRND, XSAVE,
>  XSAVEC, XSAVES, XSAVEOPT, FSGSBASE, PTWRITE, RDPID, SGX, GFNI-SSE, CLWB,
>  MOVDIRI, MOVDIR64B, CLDEMOTE, WAITPKG, ADCX, AVX, AVX2, BMI, BMI2, F16C, FMA,
>  LZCNT, PCONFIG, PKU, VAES, VPCLMULQDQ, SERIALIZE, HRESET, KL, WIDEKL, AVX-VNNI,
> -AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AVXVNNIINT16, SHA512, SM3, SM4,
> -USER_MSR and PREFETCHI instruction set support.
> +ENQCMD, UINTR, AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AVXVNNIINT16,
> +SHA512, SM3, SM4, USER_MSR and PREFETCHI instruction set support.
>
>  @item pantherlake
>  Intel Panther Lake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
> @@ -32871,8 +32872,8 @@ SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PREFETCHW, PCLMUL, RDRND, XSAVE, XSAVEC,
>  XSAVES, XSAVEOPT, FSGSBASE, PTWRITE, RDPID, SGX, GFNI-SSE, CLWB, MOVDIRI,
>  MOVDIR64B, CLDEMOTE, WAITPKG, ADCX, AVX, AVX2, BMI, BMI2, F16C, FMA, LZCNT,
>  PCONFIG, PKU, VAES, VPCLMULQDQ, SERIALIZE, HRESET, KL, WIDEKL, AVX-VNNI,
> -AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AVXVNNIINT16, SHA512, SM3, SM4
> -and PREFETCHI instruction set support.
> +UINTR, AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AVXVNNIINT16, SHA512,
> +SM3, SM4 and PREFETCHI instruction set support.
>
>  @item knl
>  Intel Kni

Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread 钟居哲
LGTM now, but let's wait for Patrick's CI testing.

Hi @Patrick, could you apply this patch series and trigger CI in your GitHub
repository so that we can see the full results?

Issues · patrick-rivos/riscv-gnu-toolchain · GitHub



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-19 16:33
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V3 00/11] Refactor and cleanup vsetvl pass
This patch refactors and cleans up the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:
 
1. Introduce a virtual CFG for vsetvl infos; Phases 1, 2 and 3 only maintain
   and modify this virtual CFG. Phase 4 performs insertion, modification and
   deletion of vsetvl insns based on the virtual CFG. A basic block in the
   virtual CFG is called vsetvl_block_info and the vsetvl information inside
   it is called vsetvl_info.
2. Combine Phases 1 and 2 into a single Phase 1 and unify the demand system;
   this phase only fuses local vsetvl infos in the forward direction.
3. Refactor Phase 3: the logic that decides whether to lift a vsetvl info
   into a predecessor basic block now uses a more unified method, which
   checks whether a compatible vsetvl info exists among the reaching vsetvl
   definitions.
4. Place all RTL modification operations in Phases 4 and 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on fully optimized vsetvl infos. Phase 5 removes the AVL
   operand from RVV instructions and removes the unused destination register
   operand from vsetvl insns.
 
These modifications resulted in some testcases needing to be updated. The
reasons for updating are summarized below:
 
1. more optimized
   vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
   vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
   avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c/
4. add some bugfix testcases.
   pr111037-3.c/pr111037-4.c
   avl_single-89.c
 
PR target/111037
PR target/111234
PR target/111725
 
Lehua Ding (11):
  RISC-V: P1: Refactor avl_info/vl_vtype_info/vector_insn_info/vector_block_info
  RISC-V: P2: Refactor and cleanup demand system
  RISC-V: P3: Refactor vector_infos_manager
  RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
  RISC-V: P5: combine phase 1 and 2
  RISC-V: P6: Add computing reaching definition data flow
  RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
  RISC-V: P8: Refactor emit-vsetvl phase and delete post optimization
  RISC-V: P9: Cleanup and reorganize helper functions
  RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def
  RISC-V: P11: Adjust and add testcases
 
gcc/config/riscv/riscv-vsetvl.cc  | 6502 +++--
gcc/config/riscv/riscv-vsetvl.def |  641 +-
gcc/config/riscv/riscv-vsetvl.h   |  488 --
gcc/config/riscv/t-riscv  |2 +-
.../gcc.target/riscv/rvv/base/scalar_move-1.c |2 +-
.../riscv/rvv/vsetvl/avl_single-104.c |   35 +
.../riscv/rvv/vsetvl/avl_single-105.c |   23 +
.../riscv/rvv/vsetvl/avl_single-106.c |   34 +
.../riscv/rvv/vsetvl/avl_single-107.c |   41 +
.../riscv/rvv/vsetvl/avl_single-108.c |   41 +
.../riscv/rvv/vsetvl/avl_single-109.c |   45 +
.../riscv/rvv/vsetvl/avl_single-23.c  |7 +-
.../riscv/rvv/vsetvl/avl_single-46.c  |3 +-
.../riscv/rvv/vsetvl/avl_single-84.c  |5 +-
.../riscv/rvv/vsetvl/avl_single-89.c  |8 +-
.../riscv/rvv/vsetvl/avl_single-95.c  |2 +-
.../riscv/rvv/vsetvl/imm_bb_prop-1.c  |7 +-
.../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  |2 +-
.../gcc.target/riscv/rvv/vsetvl/pr109773-1.c  |2 +-
.../riscv/rvv/{base => vsetvl}/pr111037-1.c   |0
.../riscv/rvv/{base => vsetvl}/pr111037-2.c   |0
.../gcc.target/riscv/rvv/vsetvl/pr111037-3.c  |   16 +
.../gcc.target/riscv/rvv/vsetvl/pr111037-4.c  |   16 +
.../riscv/rvv/vsetvl/vlmax_back_prop-25.c |   10 +-
.../riscv/rvv/vsetvl/vlmax_back_prop-26.c |   10 +-
.../riscv/rvv/vsetvl/vlmax_conflict-12.c  |1 -
.../riscv/rvv/vsetvl/vlmax_conflict-3.c   |2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-13.c   |4 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-18.c   |4 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |2 +-
30 files changed, 3263 insertions(+), 4692 deletions(-)
delete mode 100644 gcc/config/riscv/riscv-vsetvl.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-105.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-106.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-107.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/

Re: Re: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-19 Thread 钟居哲
Maybe it is a cost issue with RVV instructions?

  /* TODO: We set RVV instruction cost as 1 by default.
     Cost Model need to be well analyzed and supported in the future. */
  if (riscv_v_ext_mode_p (mode))
    {
      *total = COSTS_N_INSNS (1);
      return true;
    }

All RVV instructions are considered very cheap (cost = 1).


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-19 16:43
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction
Hi Juzhe,
 
as discussed off-list this approach generally makes sense to me so
the patch LGTM once the vsetvl rework is upstream and settled.
 
Independently, we still need to understand why the more complex
broadcast pattern is not hoisted out of the loop.
 
Regards
Robin
 


[committed] amdgcn: deprecate Fiji device and multilib

2023-10-19 Thread Andrew Stubbs
The build has been failing for the last few days because LLVM removed 
support for the HSACOv3 binary metadata format, which we were still 
using for the Fiji multilib.


The LLVM commit has now been reverted (thank you Pierre van Houtryve),
but it's only a temporary reprieve.


This patch removes Fiji from the default configuration, and updates the 
documentation accordingly, but no more.  Those that still use Fiji 
devices can re-enable it by configuring using --with-arch=fiji.


Why not remove Fiji support entirely? This is simply because about one
third of our test farm consists of Fiji devices and we can't replace them
quickly.


Andrew

amdgcn: deprecate Fiji device and multilib

LLVM wants to remove it, which breaks our build.  This patch means that
most users won't notice that change, when it comes, and those that do will
have chosen to enable Fiji explicitly.

I'm selecting gfx900 as the new default as that's the least likely for users
to want, which means most users will specify -march explicitly, which means
we'll be free to change the default again, when we need to, without breaking
anybody's makefiles.

gcc/ChangeLog:

* config.gcc (amdgcn): Switch default to --with-arch=gfx900.
Implement support for --with-multilib-list.
* config/gcn/t-gcn-hsa: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Mark Fiji deprecated.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 37311fcd075..9c397156868 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4538,7 +4538,19 @@ case "${target}" in
;;
esac
done
-   [ "x$with_arch" = x ] && with_arch=fiji
+   [ "x$with_arch" = x ] && with_arch=gfx900
+
+   case "x${with_multilib_list}" in
+   x | xno)
+   TM_MULTILIB_CONFIG=
+   ;;
+   xdefault | xyes)
+   TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a" | sed "s/${with_arch},\?//;s/,$//"`
+   ;;
+   *)
+   TM_MULTILIB_CONFIG="${with_multilib_list}"
+   ;;
+   esac
;;
 
hppa*-*-*)
diff --git a/gcc/config/gcn/t-gcn-hsa b/gcc/config/gcn/t-gcn-hsa
index ea27122e484..18db7075356 100644
--- a/gcc/config/gcn/t-gcn-hsa
+++ b/gcc/config/gcn/t-gcn-hsa
@@ -42,8 +42,12 @@ ALL_HOST_OBJS += gcn-run.o
 gcn-run$(exeext): gcn-run.o
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ $< -ldl
 
-MULTILIB_OPTIONS = march=gfx900/march=gfx906/march=gfx908/march=gfx90a
-MULTILIB_DIRNAMES = gfx900 gfx906 gfx908 gfx90a
+empty :=
+space := $(empty) $(empty)
+comma := ,
+multilib_list := $(subst $(comma),$(space),$(TM_MULTILIB_CONFIG)) 
+MULTILIB_OPTIONS = $(subst $(space),/,$(addprefix march=,$(multilib_list)))
+MULTILIB_DIRNAMES = $(multilib_list)
 
 gcn-tree.o: $(srcdir)/config/gcn/gcn-tree.cc
$(COMPILE) $<
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 31f2234640f..4035e8020b2 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1236,8 +1236,8 @@ sysv, aix.
 @itemx --without-multilib-list
 Specify what multilibs to build.  @var{list} is a comma separated list of
 values, possibly consisting of a single value.  Currently only implemented
-for aarch64*-*-*, arm*-*-*, loongarch*-*-*, riscv*-*-*, sh*-*-* and
-x86-64-*-linux*.  The accepted values and meaning for each target is given
+for aarch64*-*-*, amdgcn*-*-*, arm*-*-*, loongarch*-*-*, riscv*-*-*, sh*-*-*
+and x86-64-*-linux*.  The accepted values and meaning for each target is given
 below.
 
 @table @code
@@ -1250,6 +1250,15 @@ default run-time library will be built.  If @var{list} is
 default set of libraries is selected based on the value of
 @option{--target}.
 
+@item amdgcn*-*-*
+@var{list} is a comma separated list of ISA names (allowed values: @code{fiji},
+@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
+include the name of the default ISA, specified via @option{--with-arch}.  If
+@var{list} is empty, then there will be no multilibs and only the default
+run-time library will be built.  If @var{list} is @code{default} or
+@option{--with-multilib-list=} is not specified, then the default set of
+libraries is selected.
+
 @item arm*-*-*
 @var{list} is a comma separated list of @code{aprofile} and
 @code{rmprofile} to build multilibs for A or R and M architecture
@@ -3922,6 +3931,12 @@ To run the binaries, install the HSA Runtime from the
 @file{libexec/gcc/amdhsa-amdhsa/@var{version}/gcn-run} to launch them
 on the GPU.
 
+To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured with
+@option{--with-arch=@code{fiji}} or
+@option{--with-multilib-list=@code{fiji},...}.  Note that support for Fiji
+devices has been removed in ROCm 4.0 and support in LLVM is deprecated and will
+be removed in the future.
+
 @html
 
 @end html
diff --git a/gcc/doc/i

Re: [PATCH] return edge in make_eh_edges

2023-10-19 Thread Richard Biener
On Thu, Oct 19, 2023 at 9:59 AM Alexandre Oliva  wrote:
>
>
> The need to initialize edge probabilities has made make_eh_edges
> undesirably hard to use.  I suppose we don't want make_eh_edges to
> initialize the probability of the newly-added edge itself, so that the
> caller takes care of it, but identifying the added edge in need of
> adjustments is inefficient and cumbersome.  Change make_eh_edges so
> that it returns the added edge.
>
> Regstrapped on x86_64-linux-gnu, and (along with various hardening
> patches) on ppc64el-linux-gnu.  Also tested on multiple other targets,
> on older versions of GCC.  The returned value is unused in code already
> in the compiler.  This is a preparatory patch for uses to be introduced
> along with stack scrubbing and control flow redundancy.  Ok to install?

OK.  Maybe time to do s/make_eh_edges/make_eh_edge/ though.

Richard.

>
> for  gcc/ChangeLog
>
> * tree-eh.cc (make_eh_edges): Return the new edge.
> * tree-eh.h (make_eh_edges): Likewise.
> ---
>  gcc/tree-eh.cc |6 +++---
>  gcc/tree-eh.h  |2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
> index e8ceff36cc6e7..1cb8e08652909 100644
> --- a/gcc/tree-eh.cc
> +++ b/gcc/tree-eh.cc
> @@ -2274,7 +2274,7 @@ make_eh_dispatch_edges (geh_dispatch *stmt)
>  /* Create the single EH edge from STMT to its nearest landing pad,
> if there is such a landing pad within the current function.  */
>
> -void
> +edge
>  make_eh_edges (gimple *stmt)
>  {
>basic_block src, dst;
> @@ -2283,14 +2283,14 @@ make_eh_edges (gimple *stmt)
>
>lp_nr = lookup_stmt_eh_lp (stmt);
>if (lp_nr <= 0)
> -return;
> +return NULL;
>
>lp = get_eh_landing_pad_from_number (lp_nr);
>gcc_assert (lp != NULL);
>
>src = gimple_bb (stmt);
>dst = label_to_block (cfun, lp->post_landing_pad);
> -  make_edge (src, dst, EDGE_EH);
> +  return make_edge (src, dst, EDGE_EH);
>  }
>
>  /* Do the work in redirecting EDGE_IN to NEW_BB within the EH region tree;
> diff --git a/gcc/tree-eh.h b/gcc/tree-eh.h
> index 771be50fe9a1d..1382568b7c919 100644
> --- a/gcc/tree-eh.h
> +++ b/gcc/tree-eh.h
> @@ -30,7 +30,7 @@ extern bool remove_stmt_from_eh_lp (gimple *);
>  extern int lookup_stmt_eh_lp_fn (struct function *, const gimple *);
>  extern int lookup_stmt_eh_lp (const gimple *);
>  extern bool make_eh_dispatch_edges (geh_dispatch *);
> -extern void make_eh_edges (gimple *);
> +extern edge make_eh_edges (gimple *);
>  extern edge redirect_eh_edge (edge, basic_block);
>  extern void redirect_eh_dispatch_edge (geh_dispatch *, edge, basic_block);
>  extern bool operation_could_trap_helper_p (enum tree_code, bool, bool, bool,
>
>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
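To illustrate the new contract of make_eh_edges (return the added EH edge, or NULL when the statement has no landing pad), here is a self-contained toy model. The types and the toy_* functions are hypothetical stand-ins, not GCC's real CFG API. In GCC the edge probability is a profile_probability, not the plain int used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GCC's CFG types; not GCC's definitions.  */
struct basic_block_def { int index; };
typedef struct basic_block_def *basic_block;

struct edge_def
{
  basic_block src, dest;
  int flags;
  int probability;   /* Left for the caller to initialize.  */
};
typedef struct edge_def *edge;

#define EDGE_EH 1
#define PROB_UNINITIALIZED (-1)

/* Toy make_edge: allocate an edge and leave its probability unset.  */
static edge
toy_make_edge (basic_block src, basic_block dest, int flags)
{
  edge e = malloc (sizeof *e);
  assert (e != NULL);
  e->src = src;
  e->dest = dest;
  e->flags = flags;
  e->probability = PROB_UNINITIALIZED;
  return e;
}

/* Mirrors the patched contract: return the new EH edge, or NULL when
   the statement has no landing pad (lp_nr <= 0).  */
static edge
toy_make_eh_edges (basic_block src, basic_block landing_pad, int lp_nr)
{
  if (lp_nr <= 0)
    return NULL;
  assert (landing_pad != NULL);
  return toy_make_edge (src, landing_pad, EDGE_EH);
}

/* The caller can now set the probability directly instead of searching
   SRC's successor edges for the one that was just added.  */
static edge
add_eh_edge_with_prob (basic_block src, basic_block lp, int lp_nr, int prob)
{
  edge e = toy_make_eh_edges (src, lp, lp_nr);
  if (e)
    e->probability = prob;
  return e;
}
```

A NULL return simply means no EH edge was created, so callers only need a null check before adjusting the probability.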


Enable top-level recursive 'autoreconf' (was: Hints on reconfiguring GCC)

2023-10-19 Thread Thomas Schwinge
Hi!

On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote:
> I guess I can ask, why there is not a recursive approach for configuring
> GCC. e.g. AC_SUBDIRS in the top level?

('AC_CONFIG_SUBDIRS' you mean.)  You know, often it just takes someone to
ask the right questions...  ;-)

What do people think about the attached
"Enable top-level recursive 'autoreconf'"?  Only lightly tested, so far.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 43127e5643337ca407071ad93bccbc716024352e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 19 Oct 2023 10:28:30 +0200
Subject: [PATCH] Enable top-level recursive 'autoreconf'

	* configure.ac: At end of file, instantiate 'AC_CONFIG_SUBDIRS'
	for all relevant directories.
	* configure: Regenerate.
---
 configure| 102 ++-
 configure.ac |  36 ++
 2 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 8fc163d36bd..fcb4d591334 100755
--- a/configure
+++ b/configure
@@ -584,7 +584,8 @@ PACKAGE_URL=
 
 ac_unique_file="move-if-change"
 enable_option_checking=no
-ac_subst_vars='LTLIBOBJS
+ac_subst_vars='subdirs
+LTLIBOBJS
 LIBOBJS
 compare_exclusions
 stage2_werror_flag
@@ -909,7 +910,37 @@ READELF_FOR_TARGET
 STRIP_FOR_TARGET
 WINDRES_FOR_TARGET
 WINDMC_FOR_TARGET'
-
+ac_subdirs_all='c++tools
+fixincludes
+gcc
+gcc/m2
+gnattools
+gotools
+intl
+libada
+libatomic
+libbacktrace
+libcc1
+libcody
+libcpp
+libdecnumber
+libffi
+libgcc
+libgfortran
+libgm2
+libgo
+libgomp
+libiberty
+libitm
+libobjc
+libphobos
+libquadmath
+libsanitizer
+libssp
+libstdc++-v3
+libvtv
+lto-plugin
+zlib'
 
 # Initialize some variables set by options.
 ac_init_help=
@@ -20081,3 +20112,70 @@ if test -n "$ac_unrecognized_opts" && test "$enable_option_checking" != no; then
 $as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2;}
 fi
 
+
+# Enable top-level recursive 'autoreconf' by enumerating all relevant
+# directories here.  This is intentionally done at end of 'configure.ac',
+# *after* 'AC_OUTPUT', so that we don't attempt to prematurely 'configure'
+# these directories when the top-level 'configure' is invoked.
+subdirs="$subdirs c++tools"
+
+subdirs="$subdirs fixincludes"
+
+subdirs="$subdirs gcc"
+
+subdirs="$subdirs gcc/m2"
+
+subdirs="$subdirs gnattools"
+
+subdirs="$subdirs gotools"
+
+subdirs="$subdirs intl"
+
+subdirs="$subdirs libada"
+
+subdirs="$subdirs libatomic"
+
+subdirs="$subdirs libbacktrace"
+
+subdirs="$subdirs libcc1"
+
+subdirs="$subdirs libcody"
+
+subdirs="$subdirs libcpp"
+
+subdirs="$subdirs libdecnumber"
+
+subdirs="$subdirs libffi"
+
+subdirs="$subdirs libgcc"
+
+subdirs="$subdirs libgfortran"
+
+subdirs="$subdirs libgm2"
+
+subdirs="$subdirs libgo"
+
+subdirs="$subdirs libgomp"
+
+subdirs="$subdirs libiberty"
+
+subdirs="$subdirs libitm"
+
+subdirs="$subdirs libobjc"
+
+subdirs="$subdirs libphobos"
+
+subdirs="$subdirs libquadmath"
+
+subdirs="$subdirs libsanitizer"
+
+subdirs="$subdirs libssp"
+
+subdirs="$subdirs libstdc++-v3"
+
+subdirs="$subdirs libvtv"
+
+subdirs="$subdirs lto-plugin"
+
+subdirs="$subdirs zlib"
+
diff --git a/configure.ac b/configure.ac
index 1d16530140a..0d37d30196e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3944,3 +3944,39 @@ AC_CONFIG_FILES([Makefile],
extrasub_host="$extrasub_host"
extrasub_target="$extrasub_target"])
 AC_OUTPUT
+
+# Enable top-level recursive 'autoreconf' by enumerating all relevant
+# directories here.  This is intentionally done at end of 'configure.ac',
+# *after* 'AC_OUTPUT', so that we don't attempt to prematurely 'configure'
+# these directories when the top-level 'configure' is invoked.
+AC_CONFIG_SUBDIRS([c++tools])
+AC_CONFIG_SUBDIRS([fixincludes])
+AC_CONFIG_SUBDIRS([gcc])
+AC_CONFIG_SUBDIRS([gcc/m2])
+AC_CONFIG_SUBDIRS([gnattools])
+AC_CONFIG_SUBDIRS([gotools])
+AC_CONFIG_SUBDIRS([intl])
+AC_CONFIG_SUBDIRS([libada])
+AC_CONFIG_SUBDIRS([libatomic])
+AC_CONFIG_SUBDIRS([libbacktrace])
+AC_CONFIG_SUBDIRS([libcc1])
+AC_CONFIG_SUBDIRS([libcody])
+AC_CONFIG_SUBDIRS([libcpp])
+AC_CONFIG_SUBDIRS([libdecnumber])
+AC_CONFIG_SUBDIRS([libffi])
+AC_CONFIG_SUBDIRS([libgcc])
+AC_CONFIG_SUBDIRS([libgfortran])
+AC_CONFIG_SUBDIRS([libgm2])
+AC_CONFIG_SUBDIRS([libgo])
+AC_CONFIG_SUBDIRS([libgomp])
+AC_CONFIG_SUBDIRS([libiberty])
+AC_CONFIG_SUBDIRS([libitm])
+AC_CONFIG_SUBDIRS([libobjc])
+AC_CONFIG_SUBDIRS([libphobos])
+AC_CONFIG_SUBDIRS([libquadmath])
+AC_CONFIG_SUBDIRS([libsanitizer])
+AC_CONFIG_SUBDIRS([libssp])
+AC_CONFIG_SUBDIRS([libstdc++-v3])
+AC_CONFIG_SUBDIRS([libvtv])
+AC_CONFIG_SUBDIRS([lto-plugin])
+AC_CONFIG_SUBDIRS([zlib])
-- 
2.34.1



[PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-19 Thread Andrew Stubbs

OK to commit?

Andrew

gcc-14: mark amdgcn fiji deprecated


diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index c817dde4..91ab8132 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -178,6 +178,16 @@ a work-in-progress.
 
 
 
+AMD Radeon (GCN)
+
+
+  The Fiji device support is now deprecated and will be removed from a
+  future release.  The default compiler configuration no longer uses Fiji
+  as the default device, and no longer includes the Fiji libraries.  Both
+  can be restored by configuring with --with-arch=fiji.
+  The default device architecture is now gfx900 (Vega).
+
+
 
 
 


Re: Enable top-level recursive 'autoreconf'

2023-10-19 Thread Andreas Schwab
On Okt 19 2023, Thomas Schwinge wrote:

> Hi!
>
> On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote:
>> I guess I can ask, why there is not a recursive approach for configuring
>> GCC. e.g. AC_SUBDIRS in the top level?
>
> ('AC_CONFIG_SUBDIRS' you mean.)  You know, often it just takes someone to
> ask the right questions...  ;-)
>
> What do people think about the attached
> "Enable top-level recursive 'autoreconf'"?  Only lightly tested, so far.

The top-level files are shared with binutils-gdb, which has a different
set of subdirs.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] wwwdocs: gcc-14: mark amdgcn fiji deprecated

2023-10-19 Thread Tobias Burnus

On 19.10.23 11:49, Andrew Stubbs wrote:

OK to commit?


(I think as maintainer you don't need approval - but of course comments
by others can be helpful; I hope mine are. Additionally, Gerald (CCed)
helps with keeping the webpages in good shape (thanks!).)



gcc-14: mark amdgcn fiji deprecated
diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index c817dde4..91ab8132 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -178,6 +178,16 @@ a work-in-progress.

  

+AMD Radeon (GCN)
+
+
+  The Fiji device support is now deprecated and will be removed from a
+  future release.  The default compiler configuration no longer uses Fiji
+  as the default device, and no longer includes the Fiji libraries.  Both
+  can be restored by configuring with --with-arch=fiji.
+  The default device architecture is now gfx900 (Vega).
+


Can you add ... around the "--with-arch=fiji"? Linking to

https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa

I think that page is helpful (once the cron job has updated that page).

Additionally, I wonder whether "Fiji" should be changed to "Fiji
(gfx803)" in the first line and whether the "," should be removed in
"The ... configuration ... , and no longer includes".

Thanks,

Tobias



+
  

  



Re: Enable top-level recursive 'autoreconf'

2023-10-19 Thread Thomas Schwinge
Hi!

On 2023-10-19T11:57:33+0200, Andreas Schwab  wrote:
> On Okt 19 2023, Thomas Schwinge wrote:
>> On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote:
>>> I guess I can ask, why there is not a recursive approach for configuring
>>> GCC. e.g. AC_SUBDIRS in the top level?
>>
>> ('AC_CONFIG_SUBDIRS' you mean.)  You know, often it just takes someone to
>> ask the right questions...  ;-)
>>
>> What do people think about the attached
>> "Enable top-level recursive 'autoreconf'"?  Only lightly tested, so far.
>
> The top-level files are shared with binutils-gdb, which has a different
> set of subdirs.

Good point, thanks!  Fortunately, the failure mode for non-existing
directories is non-fatal (skipped with 'subdirectory [...] not present'
diagnostic); with my original "Enable top-level recursive 'autoreconf'"
(also re-attached) applied to Binutils/GDB Git master branch, we get:

$ PATH=[...] autoreconf -v
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal
autoreconf: configure.ac: tracing
autoreconf: configure.ac: subdirectory c++tools not present
autoreconf: configure.ac: subdirectory fixincludes not present
autoreconf: configure.ac: subdirectory gcc not present
autoreconf: configure.ac: subdirectory gcc/m2 not present
autoreconf: configure.ac: subdirectory gnattools not present
autoreconf: configure.ac: subdirectory gotools not present
autoreconf: configure.ac: adding subdirectory intl to autoreconf
autoreconf: Entering directory `intl'
[...]
autoreconf: Leaving directory `intl'
autoreconf: configure.ac: subdirectory libada not present
autoreconf: configure.ac: subdirectory libatomic not present
autoreconf: configure.ac: adding subdirectory libbacktrace to autoreconf
autoreconf: Entering directory `libbacktrace'
[...]

So we could (a) simply list *all* directories in the shared top-level
'configure.ac', or (b) configure GCC vs. other projects via a non-shared
file ('m4_include([config/AC_CONFIG_SUBDIRS.m4])' or similar -- is there
an established procedure for non-shared top-level files)?  (I don't have
a strong preference either way.)

Is it just GCC and Binutils/GDB, or are the top-level files also shared
with additional projects?


Grüße
 Thomas



Re: [PATCH 1/2] arm: Use deltas for Arm switch tables

2023-10-19 Thread Richard Earnshaw




On 28/09/2023 14:26, Richard Ball wrote:

With normal optimization in Arm state, GCC emits an uncompressed
table of jump targets. This table sits in the middle of the text segment
and is far larger than necessary, especially at -Os.
This patch compresses the table to use deltas in a similar manner to
Thumb code generation.
Similar code is also used for -fpic where we currently generate a jump
to a jump. In this format the jumps are too dense for the hardware branch
predictor to handle accurately, so execution is likely to be very expensive.

Changes to switch statements for arm include a new function to handle the
assembly generation for different machine modes. This allows for more
optimisation to be performed in aout.h where arm has switched from using
ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT.
In ASM_OUTPUT_ADDR_DIFF_ELT new assembly generation options have been
added to utilise the different machine modes. Additional changes
made to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER are all
to accommodate this new approach to switch statement generation.

New tests have been added and no regressions on arm-none-eabi.
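As a rough illustration of why deltas shrink the table, the sketch below picks the narrowest entry width (1, 2 or 4 bytes) that can hold every target's offset from a base address, then encodes and decodes the entries. The encoding is an assumption for illustration only: offsets are stored unsigned and divided by 4 (the Arm instruction size), and targets are assumed to lie at or after the base. It does not reproduce the patch's actual table format or GCC's CASE_VECTOR_SHORTEN_MODE logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick the narrowest entry width (1, 2 or 4 bytes) for a delta table
   whose entries hold (target - base) / 4.  Returns 0 if some target is
   below BASE or not 4-byte aligned relative to it.  */
static unsigned
delta_entry_size (uint32_t base, const uint32_t *targets, size_t n)
{
  uint32_t max_delta = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (targets[i] < base || (targets[i] - base) % 4 != 0)
        return 0;
      uint32_t d = (targets[i] - base) / 4;
      if (d > max_delta)
        max_delta = d;
    }
  if (max_delta <= UINT8_MAX)
    return 1;
  if (max_delta <= UINT16_MAX)
    return 2;
  return 4;
}

/* Write each scaled delta as SIZE little-endian bytes into OUT.  */
static void
encode_delta_table (uint32_t base, const uint32_t *targets, size_t n,
                    unsigned size, uint8_t *out)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (targets[i] >= base && (targets[i] - base) % 4 == 0);
      uint32_t d = (targets[i] - base) / 4;
      for (unsigned b = 0; b < size; b++)
        out[i * size + b] = (uint8_t) (d >> (8 * b));
    }
}

/* Decode entry I back into an absolute target address.  */
static uint32_t
delta_table_target (uint32_t base, const uint8_t *table, unsigned size,
                    size_t i)
{
  uint32_t d = 0;
  for (unsigned b = 0; b < size; b++)
    d |= (uint32_t) table[i * size + b] << (8 * b);
  return base + d * 4;
}
```

With targets at 0x1000, 0x1004 and 0x13fc relative to a base of 0x1000, one byte per entry suffices: 3 bytes instead of 12 for a table of absolute 32-bit addresses.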

gcc/ChangeLog:

* config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output
for different machine modes for arm.
* config/arm/arm-protos.h (arm_output_casesi): New prototype.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use
ASM_OUTPUT_ADDR_DIFF_ELT.
(CASE_VECTOR_SHORTEN_MODE): Change table size calculation for
TARGET_ARM.
(LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2
for TARGET_ARM.
* config/arm/arm.cc (arm_output_casesi): New function.
* config/arm/arm.md (arm_casesi_internal): Change casesi expand
and insn for arm to use new function arm_output_casesi.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: New test.


#define CASE_VECTOR_PC_RELATIVE ((TARGET_ARM || TARGET_THUMB2   \
  || (TARGET_THUMB1 \
  && (optimize_size || flag_pic)))  \
 && (!target_pure_code))

A minor nit for future reference: (TARGET_ARM || TARGET_THUMB2) is 
normally written as TARGET_32BIT.  No need to fix this as the next patch 
will rewrite this macro again anyway.


This is OK.

Reviewed-by: rearn...@arm.com

R.


Re: [PATCH 2/2] arm: move the switch tables for Arm to the RO data section.

2023-10-19 Thread Richard Earnshaw




On 28/09/2023 14:29, Richard Ball wrote:

Follow-up patch to "arm: Use deltas for Arm switch tables".
This patch moves the switch tables for Arm from the .text section
into the .rodata section.

gcc/ChangeLog:

* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
 from (!target_pure_code) condition.
 (ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
 .Lrtx label and remove adr instructions.
* config/arm/arm.md
 (arm_casesi_internal): Use force_reg to generate ldr instructions that
 would otherwise be out of range, and change rtl to accommodate force_reg.
 Additionally remove unnecessary register temp.
 (casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
 targets from JUMP_TABLES_IN_TEXT_SECTION definition.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: Alter the tests to
 change adr instruction to ldr.


This all looks pretty good, but there are some minor niggles to sort out 
before it can go in...


arm.cc:

 arm_output_casesi (rtx *operands)
 {
+  char buf[100];

buf is unused, so this breaks a native bootstrap.

  output_asm_insn ("add\t%|pc, %|pc, %4, lsl #2", operands);;

Two semicolons at the end of the line.

+  else
+   {
+ output_asm_insn ("ldr\t%|pc, [%5, %0, lsl #2]", operands);
+   }

Our normal coding style is to omit the braces for a single statement in 
an 'if/else' clause, even if the other arm of the clause uses braces, so:


  else
    output_asm_insn ("ldr\t%|pc, [%5, %0, lsl #2]", operands);

+output_asm_insn ("nop;", operands);

Stray semicolon after the "nop".

#define CASE_VECTOR_PC_RELATIVE (TARGET_ARM || ((TARGET_THUMB2  \
  || (TARGET_THUMB1 \
  && (optimize_size || flag_pic)))  \
 && (!target_pure_code)))

The indentation here is incorrect, which makes it very hard to 
understand the logic.  But I think a bit of reordering would help 
clarify things as well.


#define CASE_VECTOR_PC_RELATIVE
  (TARGET_ARM
   || (!target_pure_code
       && (TARGET_THUMB2
           || (TARGET_THUMB1 && (optimize_size || flag_pic)))))

(obviously with the line escapes added back in)

arm.md (casesi):

  "TARGET_ARM || ((TARGET_THUMB2 || optimize_size || flag_pic) &&
   ^^
operators should be at the start of the following line, not the end of 
the previous one.

  (!target_pure_code))"

So:

  "TARGET_ARM || ((TARGET_THUMB2 || optimize_size || flag_pic)
  && (!target_pure_code))"

But I think this could be laid out better as well:

  "(TARGET_ARM
|| (!target_pure_code
&& (TARGET_THUMB2 || optimize_size || flag_pic)))"
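Both suggested layouts are pure reorderings of the && and || operands, so they should not change behaviour. One way to confirm this is to check every input combination. The sketch below models each target macro and flag as an independent bool. That is a superset of what GCC can actually produce, since TARGET_ARM, TARGET_THUMB1 and TARGET_THUMB2 are mutually exclusive, so agreement here implies agreement in practice.

```c
#include <assert.h>
#include <stdbool.h>

/* CASE_VECTOR_PC_RELATIVE as posted in patch 2/2 (line escapes dropped).  */
static bool
pc_relative_as_posted (bool arm, bool thumb2, bool thumb1, bool opt_size,
                       bool pic, bool pure_code)
{
  return arm || ((thumb2 || (thumb1 && (opt_size || pic))) && !pure_code);
}

/* The reordered layout suggested in review.  */
static bool
pc_relative_reordered (bool arm, bool thumb2, bool thumb1, bool opt_size,
                       bool pic, bool pure_code)
{
  return arm || (!pure_code && (thumb2 || (thumb1 && (opt_size || pic))));
}

/* The casesi condition as posted, and as suggested.  */
static bool
casesi_as_posted (bool arm, bool thumb2, bool opt_size, bool pic,
                  bool pure_code)
{
  return arm || ((thumb2 || opt_size || pic) && !pure_code);
}

static bool
casesi_reordered (bool arm, bool thumb2, bool opt_size, bool pic,
                  bool pure_code)
{
  return arm || (!pure_code && (thumb2 || opt_size || pic));
}

/* True if each pair agrees on all 2^6 input combinations.  */
static bool
reorderings_are_equivalent (void)
{
  for (unsigned m = 0; m < 64; m++)
    {
      bool arm = m & 1, thumb2 = m & 2, thumb1 = m & 4;
      bool opt_size = m & 8, pic = m & 16, pure_code = m & 32;
      if (pc_relative_as_posted (arm, thumb2, thumb1, opt_size, pic, pure_code)
          != pc_relative_reordered (arm, thumb2, thumb1, opt_size, pic,
                                    pure_code))
        return false;
      if (casesi_as_posted (arm, thumb2, opt_size, pic, pure_code)
          != casesi_reordered (arm, thumb2, opt_size, pic, pure_code))
        return false;
    }
  return true;
}
```

Note that TARGET_ARM alone yields true even with pure code enabled, consistent with the ChangeLog entry removing Arm targets from the (!target_pure_code) condition.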

arm_casesi_internal:

  rtx tmp = force_reg (SImode, gen_rtx_LABEL_REF (SImode, operands[2]));

Tmp is not generally a good choice of name, even for short fragments 
like this.  Use something more descriptive to the object it holds, like 
"lref"; or, better still, a name that describes what the label points to 
(vec_table_ref?).


elf.h:

/* We put Thumb-2 jump tables in the text section, because it makes
   the code more efficient, but for Thumb-1 and ARM it's better to put
   them out of band unless we are generating compressed tables.  */

This comment is misleading now, as it implies that compressed tables for 
arm are still sometimes placed in the text segment (the unless clause) 
and that's not true.


R.


Re: [PATCH] Fix PR ada/111813 (Inconsistent limit in Ada.Calendar.Formatting)

2023-10-19 Thread Simon Wright
Pierre-Marie, I’ve CC’d you hoping you’re an appropriate person to ping on
this one. If not, who would be the right person for this sort of change?

I should have said, tested by
- add test case, make -C gcc check-gnat: error reported
- make -C gcc gnatlib_and_tools; make install
- make -C gcc check-gnat: no error reported

FSF copyright assignment RT:1016382

—S

> On 16 Oct 2023, at 14:32, Simon Wright  wrote:
> 
> The description of the second Value function (returning Duration) (ARM
> 9.6.1(87)) doesn't place any limitation on the Elapsed_Time parameter's
> value, beyond "Constraint_Error is raised if the string is not formatted
> as described for Image, or the function cannot interpret the given string
> as a Duration value".
> 
> It would seem reasonable that Value and Image should be consistent, in
> that any string produced by Image should be accepted by Value. Since
> Image must produce a two-digit representation of the Hours, there's an
> implication that its Elapsed_Time parameter should be less than 100.0
> hours (the ARM merely says that in that case the result is
> implementation-defined).
> 
> The current implementation of Value raises Constraint_Error if the
> Elapsed_Time parameter is greater than or equal to 24 hours.
> 
> This patch removes the restriction, so that the Elapsed_Time parameter
> must only be less than 100.0 hours.
> 
> gcc/ada/Changelog:
> 
>  2023-10-15 Simon Wright 
> 
>  PR ada/111813
> 
>  * gcc/ada/libgnat/a-calfor.adb (Value (2)): Allow values of parameter
>  Elapsed_Time greater than or equal to 24 hours, by doing the
>  hour calculations in Natural rather than Hour_Number (0 .. 23).
>  Calculate the result directly rather than by using Seconds_Of
>  (whose Hour parameter is of type Hour_Number).
> 
>  If an exception occurs of type Constraint_Error, re-raise it
>  rather than raising a new CE.
> 
> gcc/testsuite/Changelog:
> 
>  2023-10-15 Simon Wright 
> 
>  PR ada/111813
> 
>  * gcc/testsuite/gnat.dg/calendar_format_value.adb: New test.
> 
> ---
> gcc/ada/libgnat/a-calfor.adb  | 11 +---
> .../gnat.dg/calendar_format_value.adb | 26 +++
> 2 files changed, 34 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gnat.dg/calendar_format_value.adb
> 
> diff --git a/gcc/ada/libgnat/a-calfor.adb b/gcc/ada/libgnat/a-calfor.adb
> index 18f4e7388df..493728b490e 100644
> --- a/gcc/ada/libgnat/a-calfor.adb
> +++ b/gcc/ada/libgnat/a-calfor.adb
> @@ -777,7 +777,7 @@ package body Ada.Calendar.Formatting is
> 
>function Value (Elapsed_Time : String) return Duration is
>   D  : String (1 .. 11);
> -  Hour   : Hour_Number;
> +  Hour   : Natural;
>   Minute : Minute_Number;
>   Second : Second_Number;
>   Sub_Second : Second_Duration := 0.0;
> @@ -817,7 +817,7 @@ package body Ada.Calendar.Formatting is
> 
>   --  Value extraction
> 
> -  Hour   := Hour_Number   (Hour_Number'Value   (D (1 .. 2)));
> +  Hour   := Natural   (Natural'Value   (D (1 .. 2)));
>   Minute := Minute_Number (Minute_Number'Value (D (4 .. 5)));
>   Second := Second_Number (Second_Number'Value (D (7 .. 8)));
> 
> @@ -837,9 +837,14 @@ package body Ada.Calendar.Formatting is
>  raise Constraint_Error;
>   end if;
> 
> -  return Seconds_Of (Hour, Minute, Second, Sub_Second);
> +  return Duration (Hour * 3600)
> ++ Duration (Minute * 60)
> ++ Duration (Second)
> ++ Sub_Second;
> 
>exception
> +  --  CE is mandated, but preserve trace if CE already.
> +  when Constraint_Error => raise;
>   when others => raise Constraint_Error;
>end Value;
> 
> diff --git a/gcc/testsuite/gnat.dg/calendar_format_value.adb 
> b/gcc/testsuite/gnat.dg/calendar_format_value.adb
> new file mode 100644
> index 000..e98e496fd3b
> --- /dev/null
> +++ b/gcc/testsuite/gnat.dg/calendar_format_value.adb
> @@ -0,0 +1,26 @@
> +-- { dg-do run }
> +-- { dg-options "-O2" }
> +
> +with Ada.Calendar.Formatting;
> +
> +procedure Calendar_Format_Value is
> +   Limit : constant Duration
> + := 99 * 3600.0 + 59 * 60.0 + 59.0;
> +begin
> +   declare
> +  Image : constant String := Ada.Calendar.Formatting .Image (Limit);
> +  Image_Error : exception;
> +   begin
> +  if Image /= "99:59:59" then
> + raise Image_Error with "image: " & Image;
> +  end if;
> +  declare
> + Value : constant Duration := Ada.Calendar.Formatting.Value (Image);
> + Value_Error : exception;
> +  begin
> + if Value /= Limit then
> +raise Value_Error with "duration: " & Value'Image;
> + end if;
> +  end;
> +   end;
> +end Calendar_Format_Value;
> -- 
> 2.39.3 (Apple Git-145)
> 
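For comparison, here is a C sketch of the behaviour the patch aims for: accept any two-digit hour field (so anything Image can produce, up to 99:59:59), still reject out-of-range minutes or seconds, and compute the result by summing the parts directly rather than through a helper limited to hours 0 .. 23. This is an illustrative analogue, not the GNAT implementation; the function names and the centisecond result are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Parse two decimal digits at S into *OUT.  */
static bool
two_digits (const char *s, int *out)
{
  if (!isdigit ((unsigned char) s[0]) || !isdigit ((unsigned char) s[1]))
    return false;
  *out = (s[0] - '0') * 10 + (s[1] - '0');
  return true;
}

/* Parse "HH:MM:SS" or "HH:MM:SS.ss" into centiseconds.  Hours may be any
   two digits (0 .. 99); minutes and seconds must be 0 .. 59.  Returns
   false where the Ada code would raise Constraint_Error.  */
static bool
parse_elapsed (const char *s, long *centis)
{
  assert (s != NULL && centis != NULL);
  size_t len = strlen (s);
  int h, m, sec, frac = 0;

  if (len != 8 && len != 11)
    return false;
  if (s[2] != ':' || s[5] != ':')
    return false;
  if (!two_digits (s, &h) || !two_digits (s + 3, &m)
      || !two_digits (s + 6, &sec))
    return false;
  if (m > 59 || sec > 59)
    return false;
  if (len == 11 && (s[8] != '.' || !two_digits (s + 9, &frac)))
    return false;

  /* Sum the parts directly, rather than via a helper whose hour
     parameter is restricted to 0 .. 23.  */
  *centis = ((long) h * 3600 + m * 60 + sec) * 100 + frac;
  return true;
}
```

With this shape, "24:00:00" and "99:59:59" both parse, matching the new test's Limit of 99 * 3600 + 59 * 60 + 59 seconds.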



[PATCH 1/2] Refactor x86 vectorized gather path

2023-10-19 Thread Richard Biener
The following moves the builtin decl gather vectorization path along
the internal function and emulated gather vectorization paths,
simplifying the existing function down to generating the call and
required conversions to the actual argument types.  This makes the
builtin path's unique support for offset or data vectors with twice
as many lanes explicit.  It also makes the code path handle SLP
in principle (but SLP build needs adjustments for this, patch coming).
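The widening case handled in the removed lines (an offset vector with twice as many lanes as the data vector) can be modelled in scalar C. In this sketch, which is an illustration and not GCC code, each "gather" consumes only the low lanes of its offset vector, so the second call first permutes the offsets with the selector built as sel[i] = i | (count / 2), moving the high half into the low lanes. The lane counts and helper names are hypothetical, and the selector formula only has this effect when count is a power of two.

```c
#include <assert.h>
#include <stddef.h>

#define DATA_LANES 4                /* e.g. four 64-bit data elements   */
#define OFF_LANES (2 * DATA_LANES)  /* offset vector is twice as wide   */

/* Emulated gather: one scalar load per data lane, reading only the low
   DATA_LANES lanes of OFF.  Offsets are assumed non-negative here.  */
static void
emulated_gather (double *dst, const double *base, const int *off)
{
  for (int i = 0; i < DATA_LANES; i++)
    {
      assert (off[i] >= 0);
      dst[i] = base[off[i]];
    }
}

/* Selector that copies the high half of a COUNT-lane vector into both
   halves: sel[i] = i | (count / 2).  */
static void
high_half_selector (int *sel, int count)
{
  for (int i = 0; i < count; i++)
    sel[i] = i | (count / 2);
}

/* Single-input permutation: out[i] = in[sel[i]].  */
static void
permute (int *out, const int *in, const int *sel, int count)
{
  for (int i = 0; i < count; i++)
    out[i] = in[sel[i]];
}

/* Gather 2 * DATA_LANES elements using one OFF_LANES-wide offset vector:
   the first call uses offset lanes 0..3 directly, the second uses the
   permuted vector whose low lanes hold the original lanes 4..7.  */
static void
widening_gather (double *dst, const double *base, const int *off)
{
  int sel[OFF_LANES], off_hi[OFF_LANES];
  high_half_selector (sel, OFF_LANES);
  permute (off_hi, off, sel, OFF_LANES);
  emulated_gather (dst, base, off);
  emulated_gather (dst + DATA_LANES, base, off_hi);
}
```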

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

* tree-vect-stmts.cc (vect_build_gather_load_calls): Rename
to ...
(vect_build_one_gather_load_call): ... this.  Refactor,
inline widening/narrowing support ...
(vectorizable_load): ... here, do gather vectorization
with builtin decls along other gather vectorization.
---
 gcc/tree-vect-stmts.cc | 406 ++---
 1 file changed, 179 insertions(+), 227 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e5ff44c25f1..ee5f56bbbda 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2595,268 +2595,99 @@ vect_build_zero_merge_argument (vec_info *vinfo,
 /* Build a gather load call while vectorizing STMT_INFO.  Insert new
instructions before GSI and add them to VEC_STMT.  GS_INFO describes
the gather load operation.  If the load is conditional, MASK is the
-   unvectorized condition and MASK_DT is its definition type, otherwise
-   MASK is null.  */
+   vectorized condition, otherwise MASK is null.  PTR is the base
+   pointer and OFFSET is the vectorized offset.  */
 
-static void
-vect_build_gather_load_calls (vec_info *vinfo, stmt_vec_info stmt_info,
- gimple_stmt_iterator *gsi,
- gimple **vec_stmt,
- gather_scatter_info *gs_info,
- tree mask,
- stmt_vector_for_cost *cost_vec)
+static gimple *
+vect_build_one_gather_load_call (vec_info *vinfo, stmt_vec_info stmt_info,
+gimple_stmt_iterator *gsi,
+gather_scatter_info *gs_info,
+tree ptr, tree offset, tree mask)
 {
-  loop_vec_info loop_vinfo = dyn_cast  (vinfo);
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  int ncopies = vect_get_num_copies (loop_vinfo, vectype);
-  edge pe = loop_preheader_edge (loop);
-  enum { NARROW, NONE, WIDEN } modifier;
-  poly_uint64 gather_off_nunits
-= TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
-
-  /* FIXME: Keep the previous costing way in vect_model_load_cost by costing
- N scalar loads, but it should be tweaked to use target specific costs
- on related gather load calls.  */
-  if (cost_vec)
-{
-  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  unsigned int inside_cost;
-  inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
- scalar_load, stmt_info, 0, vect_body);
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"vect_model_load_cost: inside_cost = %d, "
-"prologue_cost = 0 .\n",
-inside_cost);
-  return;
-}
-
   tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl));
   tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl));
   tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
-  tree ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+  /* ptrtype */ arglist = TREE_CHAIN (arglist);
   tree idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
   tree masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
   tree scaletype = TREE_VALUE (arglist);
-  tree real_masktype = masktype;
+  tree var;
   gcc_checking_assert (types_compatible_p (srctype, rettype)
   && (!mask
   || TREE_CODE (masktype) == INTEGER_TYPE
   || types_compatible_p (srctype, masktype)));
-  if (mask)
-masktype = truth_type_for (srctype);
-
-  tree mask_halftype = masktype;
-  tree perm_mask = NULL_TREE;
-  tree mask_perm_mask = NULL_TREE;
-  if (known_eq (nunits, gather_off_nunits))
-modifier = NONE;
-  else if (known_eq (nunits * 2, gather_off_nunits))
-{
-  modifier = WIDEN;
 
-  /* Currently widening gathers and scatters are only supported for
-fixed-length vectors.  */
-  int count = gather_off_nunits.to_constant ();
-  vec_perm_builder sel (count, count, 1);
-  for (int i = 0; i < count; ++i)
-   sel.quick_push (i | (count / 2));
-
-  vec_perm_indices indices (sel, 1, count);
-  perm_mask = vect_gen_perm_mask_checked (gs_info->offset_vectype,
-

[PATCH 2/2] tree-optimization/111131 - SLP for non-IFN gathers

2023-10-19 Thread Richard Biener
The following implements SLP vectorization support for gathers
without relying on IFNs being pattern detected (and supported by
the target).  That includes support for emulated gathers but also
the legacy x86 builtin path.
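To show what SLP of a gather means at the source level, the sketch below takes a loop shaped like f1 in vect-gather-2.c but with a single base array, so both lanes of the group share base and scale. Two scalar iterations then collapse into one 4-lane (emulated) gather over consecutive offsets plus a vector add of {1, 2, 1, 2}. When the lanes differ in base or scale, as the updated vect-gather-2.c tests arrange, no single gather covers the group. The lane count and function names are assumptions for illustration, not GCC code.

```c
#include <assert.h>

#define VF 4  /* lanes per vector in this sketch */

/* Scalar reference: two interleaved lanes that load from the same base
   X with the same scale.  */
static void
scalar_ref (int *restrict y, const int *restrict x,
            const int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

/* SLP form: since both lanes share base and scale, VF consecutive
   offsets feed one (emulated) gather, and the per-lane constants become
   the vector {1, 2, 1, 2}.  Assumes 2 * n is a multiple of VF.  */
static void
slp_gather (int *restrict y, const int *restrict x,
            const int *restrict indices, int n)
{
  static const int addend[VF] = {1, 2, 1, 2};
  assert ((2 * n) % VF == 0);
  for (int j = 0; j < 2 * n; j += VF)
    {
      int v[VF];
      for (int l = 0; l < VF; l++)   /* emulated gather */
        v[l] = x[indices[j + l]];
      for (int l = 0; l < VF; l++)   /* vector add */
        y[j + l] = v[l] + addend[l];
    }
}
```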

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

PR tree-optimization/31
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
sure to update all gather/scatter stmt DRs, not only those
that eventually got VMAT_GATHER_SCATTER set.
* tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
(vect_get_and_check_slp_defs): Handle gathers/scatters,
adding the offset as SLP operand and comparing base and scale.
(vect_build_slp_tree_1): Handle gathers.
(vect_build_slp_tree_2): Likewise.

* gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
everywhere.
* gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
Massage the scale case to more reliably produce a different
one.  Scan for the specific messages.
* gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
for AVX2, but not emulated.
* gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
Massage to more properly ensure this.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
everywhere.
---
 .../gcc.dg/vect/tsvc/vect-tsvc-s353.c |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-1.c |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-2.c | 13 --
 gcc/testsuite/gcc.dg/vect/vect-gather-3.c |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-4.c |  6 +--
 gcc/tree-vect-loop.cc |  6 ++-
 gcc/tree-vect-slp.cc  | 45 +--
 7 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
index 98ba7522471..2c4fa3f5991 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
@@ -44,4 +44,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! riscv_v 
} } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
index e3bbf5c0bf8..5f6640d9ab6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
@@ -58,4 +58,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target 
vect_gather_load_ifn } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
index a1f6ba458a9..4c23b808333 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
@@ -8,6 +8,7 @@ f1 (int *restrict y, int *restrict x1, int *restrict x2,
 {
   for (int i = 0; i < N; ++i)
 {
+  /* Different base.  */
   y[i * 2] = x1[indices[i * 2]] + 1;
   y[i * 2 + 1] = x2[indices[i * 2 + 1]] + 2;
 }
@@ -18,8 +19,9 @@ f2 (int *restrict y, int *restrict x, int *restrict indices)
 {
   for (int i = 0; i < N; ++i)
 {
-  y[i * 2] = x[indices[i * 2]] + 1;
-  y[i * 2 + 1] = x[indices[i * 2 + 1] * 2] + 2;
+  /* Different scale.  */
+  y[i * 2] = *(int *)((char *)x + (__UINTPTR_TYPE__)indices[i * 2] * 4) + 1;
+  y[i * 2 + 1] = *(int *)((char *)x + (__UINTPTR_TYPE__)indices[i * 2 + 1] * 2) + 2;
 }
 }
 
@@ -28,9 +30,12 @@ f3 (int *restrict y, int *restrict x, int *restrict indices)
 {
   for (int i = 0; i < N; ++i)
 {
+  /* Different type.  */
   y[i * 2] = x[indices[i * 2]] + 1;
-  y[i * 2 + 1] = x[(unsigned int) indices[i * 2 + 1]] + 2;
+  y[i * 2 + 1] = x[((unsigned int *) indices)[i * 2 + 1]] + 2;
 }
 }
 
-/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
+/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */
+/* { dg-final { scan-tree-dump "different gather base" vect { target { ! vect_gather_load_ifn } } } } */
+/* { dg-final { scan-tree-dump "different gather scale" vect { target { ! vect_gather_load_ifn } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
index adfef3bf407..30ba6789e03 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
@@ -62,4 +62,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target { vect_gather_load_ifn && vect_masked_load } } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target { { vect_gather_load_ifn || avx2 } && vect_masked_load

Re: gcc 13.2 is missing warnings?

2023-10-19 Thread Jakub Jelinek
On Thu, Oct 19, 2023 at 07:39:43AM -0400, Eric Sokolowsky via Gcc wrote:
> I am using gcc 13.2 on Fedora 38. Consider the following program.
> 
> #include 
> int main(int argc, char **argv)
> {
> printf("Enter a number: ");
> int num = 0;
> scanf("%d", &num);
> 
> switch (num)
> {
> case 1:
> int a = num + 3;
> printf("The new number is %d.\n", a);
> break;
> case 2:
> int b = num - 4;
> printf("The new number is %d.\n", b);
> break;
> default:
> int c = num * 3;
> printf("The new number is %d.\n", c);
> break;
> }
> }
> 
> I would expect that gcc would complain about the declaration of
> variables (a, b, and c) within the case statements. When I run "gcc
> -Wall t.c" I get no warnings. When I run "g++ -Wall t.c" I get
> warnings and errors as expected. I do get warnings when using MinGW on
> Windows (gcc version 6.3 specifically). Did something change in 13.2?

C isn't C++.

In particular, the above is valid C23, which is why it is accepted as an
extension in older C language versions starting with GCC 11.
It is warned about with -pedantic/-Wpedantic and errored on with
-pedantic-errors/-Werror=pedantic unless -std=c2x or -std=gnu2x is used.

The C++ case is completely different.  There, labels have been allowed before
declarations since C++98, but it is invalid to jump past the initialization
of a variable, which the jumps to the case 2 and default labels above do.
If you rewrite it as:
  case 1:
    int a;
    a = num + 3;
    printf("The new number is %d.\n", a);
    break;
  case 2:
    int b;
    b = num - 4;
    printf("The new number is %d.\n", b);
    break;
  default:
    int c;
    c = num * 3;
    printf("The new number is %d.\n", c);
    break;
it is valid C++ and it won't be diagnosed.
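A third option, not mentioned above, is to give each case body its own block
with braces.  The sketch below (the function name transform is ours) keeps the
original arithmetic; because each declaration's scope ends with its block, no
jump crosses an initialization, so it should be accepted by pre-C23 C, C23 and
C++ alike.

```c
#include <stdio.h>

/* Each case body is wrapped in braces, so every declaration lives in its
   own block scope and no case label jumps into the scope of an
   initialized variable.  */
int transform (int num)
{
  int result;

  switch (num)
    {
    case 1:
      {
        int a = num + 3;
        result = a;
        break;
      }
    case 2:
      {
        int b = num - 4;
        result = b;
        break;
      }
    default:
      {
        int c = num * 3;
        result = c;
        break;
      }
    }

  printf ("The new number is %d.\n", result);
  return result;
}
```

Calling transform (1) prints "The new number is 4." and returns 4.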

Note, this should have been posted to gcc-help instead.

Jakub



Re: [PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]

2023-10-19 Thread Richard Biener
On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> Rebased on top of trunk, minor change to check if loop_vinfo since we now do
> some slp vectorization for simd_clones.
> 
> I assume the previous OK still holds.

Ack.

> On 30/08/2023 13:54, Richard Biener wrote:
> > On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> > 
> >> When analyzing a loop and choosing a simdclone to use it is possible to choose
> >> a simdclone that cannot be used 'inbranch' for a loop that can use partial
> >> vectors.  This may lead to the vectorizer deciding to use partial vectors
> >> which are not supported for notinbranch simd clones. This patch fixes that by
> >> disabling the use of partial vectors once a notinbranch simd clone has been
> >> selected.
> > 
> > OK.
> > 
> >> gcc/ChangeLog:
> >>
> >>  PR tree-optimization/110485
> >>  * tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable partial
> >>  vectors usage if a notinbranch simdclone has been selected.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.dg/gomp/pr110485.c: New test.
> >>
> > 
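For illustration only, here is a hypothetical loop of the kind this patch is
about (it is not the contents of gcc.dg/gomp/pr110485.c).  With -fopenmp-simd,
the notinbranch clause promises the function is never called conditionally, so
only unmasked SIMD clones are generated.  On a target that can use partial
vectors (SVE, for example), a partial final iteration would need a masked call
that does not exist, which is why partial vectors must be disabled once such a
clone is selected.  Without -fopenmp-simd the pragma is ignored and this is
plain scalar C.

```c
/* Hypothetical example: a notinbranch SIMD function called in a loop whose
   trip count is unknown at compile time.  */
#pragma omp declare simd notinbranch
int triple (int x)
{
  return x * 3;
}

void apply_triple (int *restrict out, const int *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = triple (in[i]);
}
```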


Re: [Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones

2023-10-19 Thread Richard Biener
On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> Made it a local function and changed prototype according to comments.
> 
> Is this OK?

OK.

>  gcc/ChangeLog:
>   * tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
>   case simd clone calls and only use types that are mapped to vectors.
>   (simd_clone_call_p): New helper function.
>   
> On 30/08/2023 13:54, Richard Biener wrote:
> > On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> > 
> >> The vect_get_smallest_scalar_type helper function was using any argument to a
> >> simd clone call when trying to determine the smallest scalar type that would
> >> be vectorized.  This included the function pointer type in a MASK_CALL for
> >> instance, and would result in the wrong type being selected.  Instead this
> >> patch special cases simd_clone_call's and uses only scalar types of the
> >> original function that get transformed into vector types.
> > 
> > Looks sensible.
> > 
> > +bool
> > +simd_clone_call_p (gimple *stmt, cgraph_node **out_node)
> > 
> > you could return the cgraph_node * or NULL here.  Are you going to
> > use the function elsewhere?  Otherwise put it in the same TU as
> > the only use please and avoid exporting it.
> > 
> > Richard.
> > 
> >> gcc/ChangeLog:
> >>
> >>  * tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Special
> >>  case simd clone calls and only use types that are mapped to vectors.
> >>  * tree-vect-stmts.cc (simd_clone_call_p): New helper function.
> >>  * tree-vectorizer.h (simd_clone_call_p): Declare new function.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.dg/vect/vect-simd-clone-16f.c: Remove unnecessary differentiation
> >>  between targets with different pointer sizes.
> >>  * gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
> >>  * gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
> >>
> > 
> 
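A hypothetical loop that, as we read the description above, goes through the
fixed path: with -fopenmp-simd, a conditional call to an inbranch clone becomes
a MASK_CALL whose arguments include the function pointer.  Only the double
argument and return value are mapped to vectors, so with this patch the pointer
type no longer takes part in choosing the smallest scalar type.  We have not
checked this against the vect-simd-clone-16f.c to 18f.c tests; treat it as a
sketch.

```c
/* Hypothetical example: a conditional call to an inbranch SIMD function.
   Only X and the result are vectorized; the function's address that the
   masked call carries is not a vectorized scalar type.  */
#pragma omp declare simd inbranch
double halve (double x)
{
  return x * 0.5;
}

void halve_positive (double *restrict a, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] > 0.0)
      a[i] = halve (a[i]);
}
```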

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] libcpp: testsuite: Add test for fixed _Pragma bug [PR82335]

2023-10-19 Thread Marek Polacek
On Wed, Oct 18, 2023 at 05:03:57PM -0400, Lewis Hyatt wrote:
> May I please ping this one, and/or, is it something straightforward
> enough I can just commit it as obvious? Thanks!
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631814.html

Please go ahead and apply the patch, thanks.

Sorry about the wait.
 
> -Lewis
> 
> On Mon, Oct 2, 2023 at 6:23 PM Lewis Hyatt  wrote:
> >
> > Hello-
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82335 is another
> > _Pragma-related bug that got fixed in GCC 12 but is still open. Before
> > closing it out, I thought it would be good to add the testcase from that
> > PR, which we don't have exactly in the testsuite already. Is it OK please?
> > Thanks!
> >
> > -Lewis
> >
> > -- >8 --
> >
> > This PR was fixed by r12-4797 and r12-5454. Add test coverage from the PR
> > that is not represented elsewhere.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR preprocessor/82335
> > * c-c++-common/cpp/diagnostic-pragma-3.c: New test.
> > ---
> >  .../c-c++-common/cpp/diagnostic-pragma-3.c| 37 +++
> >  1 file changed, 37 insertions(+)
> >  create mode 100644 gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c
> >
> > diff --git a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c
> > new file mode 100644
> > index 000..459dcec73b3
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c
> > @@ -0,0 +1,37 @@
> > +/* This is like diagnostic-pragma-2.c, but handles the case where everything
> > +   is wrapped inside a macro, which previously caused additional issues tracked
> > +   in PR preprocessor/82335.  */
> > +
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-save-temps -Wattributes -Wtype-limits" } */
> > +
> > +#define B _Pragma("GCC diagnostic push") \
> > +  _Pragma("GCC diagnostic ignored \"-Wattributes\"")
> > +#define E _Pragma("GCC diagnostic pop")
> > +
> > +#define X() B int __attribute((unknown_attr)) x; E
> > +#define Y   B int __attribute((unknown_attr)) y; E
> > +#define WRAP(x) x
> > +
> > +void test1(void)
> > +{
> > +  WRAP(X())
> > +  WRAP(Y)
> > +}
> > +
> > +/* Additional test provided on the PR.  */
> > +#define PRAGMA(...) _Pragma(#__VA_ARGS__)
> > +#define PUSH_IGN(X) PRAGMA(GCC diagnostic push) PRAGMA(GCC diagnostic ignored X)
> > +#define POP() PRAGMA(GCC diagnostic pop)
> > +#define TEST(X, Y) \
> > +  PUSH_IGN("-Wtype-limits") \
> > +  int Y = (__typeof(X))-1 < 0; \
> > +  POP()
> > +
> > +int test2()
> > +{
> > +  unsigned x;
> > +  TEST(x, i1);
> > +  WRAP(TEST(x, i2))
> > +  return i1 + i2;
> > +}
> 

Marek



Re: [PATCH] Fix PR ada/111813 (Inconsistent limit in Ada.Calendar.Formatting)

2023-10-19 Thread Arnaud Charlet
Hi Simon,

> Pierre-Marie, I’ve CC’d you hoping you’re an appropriate person to ping on 
> this one.
> If not, who would be for this sort of change?
> 
> I should have said, tested by
> - add test case, make -C gcc check-gnat: error reported
> - make -C gcc gnatlib_and_tools; make install
> - make -C gcc check-gnat: no error reported
> 
> FSF copyright assignment RT:1016382

Your message hasn't been forgotten. We're currently reviewing it at AdaCore,
and this is taking some time.

I'll get back to you when I have some feedback.

Regards,

Arno


Re: [PATCH 5/8] vect: Use inbranch simdclones in masked loops

2023-10-19 Thread Richard Biener
On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> Rebased, needs review.

+  tree parm_type = NULL_TREE;
+  if(i < args.length())
+   {

space before (

+/* Return SSA name of the result of the conversion of OPERAND into type TYPE.
+   The conversion statement is inserted at GSI.  */
+ 
+static tree
+vect_convert (vec_info *vinfo, stmt_vec_info stmt_info, tree type, tree operand,
+ gimple_stmt_iterator *gsi) 
+{  
+  operand = build1 (VIEW_CONVERT_EXPR, type, operand);
+  gassign *new_stmt = gimple_build_assign (make_ssa_name (type),
+  operand);

I don't like this much, it's got one use in your patch only.  Please
leave this abstraction out.

OK with the above two changes.

Thanks,
Richard.

> On 30/08/2023 10:13, Andre Vieira (lists) via Gcc-patches wrote:
> > This patch enables the compiler to use inbranch simdclones when generating
> > masked loops in autovectorization.
> > 
> > gcc/ChangeLog:
> > 
> >  * omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
> >  compatible with mask parameters in clone.
> >  * tree-vect-stmts.cc (vect_convert): New helper function.
> >  (vect_build_all_ones_mask): Allow vector boolean typed masks.
> >  (vectorizable_simd_clone_call): Enable the use of masked clones in
> >  fully masked loops.
> 
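A minimal sketch of what the patch enables, assuming -fopenmp-simd and a target
with fully masked loops (such as SVE); the function names are hypothetical.
The call is unconditional, but the only clone declared is inbranch, which takes
a mask argument.  Per the ChangeLog above, vectorizable_simd_clone_call can now
use such masked clones in fully masked loops.

```c
/* Hypothetical example: an unconditional call to a function that only has
   an inbranch (masked) SIMD clone.  In a fully masked loop, the loop mask
   can be passed as the clone's mask argument.  */
#pragma omp declare simd inbranch
int add_offset (int x)
{
  return x + 10;
}

void apply_offset (int *restrict out, const int *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = add_offset (in[i]);
}
```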


Re: [PATCH v2] gcc: Introduce -fhardened

2023-10-19 Thread Richard Biener
On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
>
> On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> > > On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
> > >  wrote:
> > > >
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, powerpc64le-unknown-linux-gnu,
> > > > and aarch64-unknown-linux-gnu; ok for trunk?
> > > >
> > > > -- >8 --
> > > > In 
> > > > I proposed -fhardened, a new umbrella option that enables a reasonable set
> > > > of hardening flags.  The read of the room seems to be that the option
> > > > would be useful.  So here's a patch implementing that option.
> > > >
> > > > Currently, -fhardened enables:
> > > >
> > > >   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
> > > >   -D_GLIBCXX_ASSERTIONS
> > > >   -ftrivial-auto-var-init=pattern

I think =zero is much better here given the overhead is way
cheaper and pointers get a more reliable behavior.

> > > >   -fPIE  -pie  -Wl,-z,relro,-z,now
> > > >   -fstack-protector-strong
> > > >   -fstack-clash-protection
> > > >   -fcf-protection=full (x86 GNU/Linux only)
> > > >
> > > > -fhardened will not override options that were specified on the command line
> > > > (before or after -fhardened).  For example,
> > > >
> > > >  -D_FORTIFY_SOURCE=1 -fhardened
> > > >
> > > > means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> > > >
> > > >   -fhardened -fstack-protector
> > > >
> > > > will not enable -fstack-protector-strong.
> > > >
> > > > In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
> > > > to anything.  I think we need a better way to show what it actually
> > > > enables.
> > >
> > > I do think we need to find a solution here to solve asserting compliance.
> >
> > Fair enough.
> >
> > > Maybe we can have -Whardened that will diagnose any altering of
> > > -fhardened by other options on the command-line or by missed target
> > > implementations?  People might for example use -fstack-protector
> > > but don't really want to make protection lower than requested with -fhardened.
> > >
> > > Any such conflict is much less apparent than when you use the
> > > flags -fhardened composes.
> >
> > How about: --help=hardened says which options -fhardened attempts to
> > enable, and -Whardened warns when it didn't enable an option?  E.g.,
> >
> >   -fstack-protector -fhardened -Whardened
> >
> > would say that it didn't enable -fstack-protector-strong because
> > -fstack-protector was specified on the command line?
> >
> > If !HAVE_LD_NOW_SUPPORT, --help=hardened probably doesn't even have to
> > list -z now, likewise for -z relro.
> >
> > Unclear if -Whardened should be enabled by default, but probably yes?
>
> Here's v2 which adds -Whardened (enabled by default).
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

I think it's OK but I'd like to see a second ACK here.  Can you see how our
primary and secondary targets (+ host OS) behave here?  I think the
documentation should elaborate a bit on expectations for non-Linux/GNU
targets; specifically, I think the default configuration for a target should
_not_ produce any -Whardened diagnostics with -fhardened.  Maybe we can
have a testcase for this?
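Until something like --help=hardened exists, one rough way to see which parts
of the -fhardened set reached a translation unit is to test predefined macros:
_FORTIFY_SOURCE and _GLIBCXX_ASSERTIONS (which the patch defines via
cpp_define), plus GCC's documented __SSP_STRONG__ (-fstack-protector-strong)
and __PIE__ (-fPIE).  This is our own sketch, not part of the patch.  It cannot
see the linker options, and as far as we know options such as
-fstack-clash-protection leave no macro behind.  Per the patch description,
with optimization disabled the fortify bit should stay clear, which is the case
-Whardened reports.

```c
/* Bit flags for hardening options that are visible as predefined macros.  */
enum
{
  HARD_FORTIFY = 1 << 0,        /* _FORTIFY_SOURCE > 0 */
  HARD_GLIBCXX_ASSERT = 1 << 1, /* _GLIBCXX_ASSERTIONS */
  HARD_SSP_STRONG = 1 << 2,     /* -fstack-protector-strong */
  HARD_PIE = 1 << 3             /* -fPIE / -fpie */
};

/* Return which of the above were in effect when this file was compiled.  */
int hardening_flags (void)
{
  int flags = 0;
#if defined _FORTIFY_SOURCE && _FORTIFY_SOURCE > 0
  flags |= HARD_FORTIFY;
#endif
#ifdef _GLIBCXX_ASSERTIONS
  flags |= HARD_GLIBCXX_ASSERT;
#endif
#ifdef __SSP_STRONG__
  flags |= HARD_SSP_STRONG;
#endif
#ifdef __PIE__
  flags |= HARD_PIE;
#endif
  return flags;
}
```

The result depends entirely on the command line, so comparing the value from,
say, -O2 -fhardened with that from -O0 -fhardened is the interesting part.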

Thanks,
Richard.

>
> -- >8 --
> In 
> I proposed -fhardened, a new umbrella option that enables a reasonable set
> of hardening flags.  The read of the room seems to be that the option
> would be useful.  So here's a patch implementing that option.
>
> Currently, -fhardened enables:
>
>   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
>   -D_GLIBCXX_ASSERTIONS
>   -ftrivial-auto-var-init=pattern
>   -fPIE  -pie  -Wl,-z,relro,-z,now
>   -fstack-protector-strong
>   -fstack-clash-protection
>   -fcf-protection=full (x86 GNU/Linux only)
>
> -fhardened will not override options that were specified on the command line
> (before or after -fhardened).  For example,
>
>  -D_FORTIFY_SOURCE=1 -fhardened
>
> means that _FORTIFY_SOURCE=1 will be used.  Similarly,
>
>   -fhardened -fstack-protector
>
> will not enable -fstack-protector-strong.
>
> In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
> to anything.  This patch provides -Whardened, enabled by default, which
> warns when -fhardened couldn't enable a particular option.  I think most
> often it will say that _FORTIFY_SOURCE wasn't enabled because optimization
> was not enabled.
>
> gcc/c-family/ChangeLog:
>
> * c-opts.cc (c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
> and _GLIBCXX_ASSERTIONS.
>
> gcc/ChangeLog:
>
> * common.opt (Whardened, fhardened): New options.
> * config.in: Regenerate.
> * config/bpf/bpf.cc: Include "opts.h".
> (bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
> 

[PATCH]middle-end: don't create LC-SSA PHI variables for PHI nodes who dominate loop

2023-10-19 Thread Tamar Christina
Hi All,

As the testcase shows, when a PHI node dominates the loop there is no new
definition inside the loop.  As such there would be no PHI nodes to update.

When we maintain LCSSA form we create an intermediate node in between the two
loops to thread along the value.  However, later on when we update the second
loop we don't have any PHI nodes to update, and so adjust_phi_and_debug_stmts
does nothing.  This leaves us with an incorrect PHI node.  Normally this does
nothing and just gets ignored, but in the case of the VUSE chain we end up
corrupting the chain.

As such whenever a PHI node's argument dominates the loop, we should remove
the newly created PHI node after edge redirection.

The one exception to this is when the loop has been versioned.  In such cases
the versioned loop may not use the value but the second loop can.

When this happens and we add the loop guard, the guard block cannot find the
original value unless the join block has the PHI.

The next refactoring in the series moves the formation of the guard block
inside peeling itself.  Here we have all the information and wouldn't
need to re-create it later.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu
with no issues; this also fixes the issues in libgomp.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/111860
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Remove PHI nodes that dominate loop.

gcc/testsuite/ChangeLog:

PR tree-optimization/111860
* gcc.dg/vect/pr111860.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/pr111860.c b/gcc/testsuite/gcc.dg/vect/pr111860.c
new file mode 100644
index ..36f0774601040418bc6b7f27c9425b2bf93b18cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111860.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+int optimize_path_n, optimize_path_d;
+int *optimize_path_d_0;
+extern void path_threeOpt( long);
+void optimize_path() {
+  int i;
+  long length;
+  i = 0;
+  for (; i <= optimize_path_n; i++)
+optimize_path_d = 0;
+  i = 0;
+  for (; i < optimize_path_n; i++)
+length += optimize_path_d_0[i];
+  path_threeOpt(length);
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 1f7779b9834c3aef3c6a993fab916224fab03147..db1d4f867ead5c6079cda3ff0d0870234d11e39d 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1633,6 +1633,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
{
  tree new_arg = gimple_phi_arg (phi, 0)->def;
  new_phi_args.put (new_arg, gimple_phi_result (phi));
+
+ if (TREE_CODE (new_arg) != SSA_NAME)
+   continue;
+ /* If the PHI node dominates the loop then we shouldn't create
+ a new LC-SSA PHI for it in the intermediate block, unless the
+ loop has been versioned.  If it has then we need the PHI
+ node such that later when the loop guard is added the original
+ dominating PHI can be found.  */
+ basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (new_arg));
+ if (loop == scalar_loop
+ && (!def_bb || !flow_bb_inside_loop_p (loop, def_bb)))
+   {
+ auto gsi = gsi_for_stmt (phi);
+ remove_phi_node (&gsi, true);
+   }
}
 
   /* Copy the current loop LC PHI nodes between the original loop exit





Re: [PATCH]middle-end: don't create LC-SSA PHI variables for PHI nodes who dominate loop

2023-10-19 Thread Richard Biener
On Thu, 19 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> As the testcase shows, when a PHI node dominates the loop there is no new
> definition inside the loop.  As such there would be no PHI nodes to update.
> 
> When we maintain LCSSA form we create an intermediate node in between the two
> loops to thread along the value.  However later on when we update the second
> loop we don't have any PHI nodes to update and so adjust_phi_and_debug_stmts
> does nothing.   This leaves us with an incorrect phi node.  Normally this does
> nothing and just gets ignored.  But in the case of the vUSE chain we end up
> corrupting the chain.
> 
> As such whenever a PHI node's argument dominates the loop, we should remove
> the newly created PHI node after edge redirection.
> 
> The one exception to this is when the loop has been versioned.  In such cases
> the versioned loop may not use the value but the second loop can.
> 
> When this happens and we add the loop guard unless the join block has the PHI
> it can't find the original value for use inside the guard block.
> 
> The next refactoring in the series moves the formation of the guard block
> inside peeling itself.  Here we have all the information and wouldn't
> need to re-create it later.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu
> and no issues and issues in libgomp fixed.
> 
> Ok for master?

OK.

Thanks,
Richard.




Re: [PATCH v2] gcc: Introduce -fhardened

2023-10-19 Thread Sam James


Richard Biener  writes:

> On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
>>
>> On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
>> > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
>> > > On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
>> > >  wrote:
>> > > >
>> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, powerpc64le-unknown-linux-gnu,
>> > > > and aarch64-unknown-linux-gnu; ok for trunk?
>> > > >
>> > > > -- >8 --
>> > > > In 
>> > > > I proposed -fhardened, a new umbrella option that enables a reasonable set
>> > > > of hardening flags.  The read of the room seems to be that the option
>> > > > would be useful.  So here's a patch implementing that option.
>> > > >
>> > > > Currently, -fhardened enables:
>> > > >
>> > > >   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
>> > > >   -D_GLIBCXX_ASSERTIONS
>> > > >   -ftrivial-auto-var-init=pattern
>
> I think =zero is much better here given the overhead is way
> cheaper and pointers get a more reliable behavior.

Yes please, as I wouldn't want us to use =pattern distro-wide.


Re: [PATCH] c++: Make -Wunknown-pragmas controllable by #pragma GCC diagnostic [PR89038]

2023-10-19 Thread Marek Polacek
On Wed, Oct 18, 2023 at 05:15:42PM -0400, Lewis Hyatt wrote:
> Hello-
> 
> The PR points out that my fix for PR53431 was incomplete and did not handle
> -Wunknown-pragmas. This is a one-line fix to correct that, is it OK for
> trunk and for GCC 13 backport please? bootstrap + regtest all languages on
> x86-64 Linux. Thanks!

I think I can approve this, so, OK.  Thanks.
 
> -Lewis
> 
> -- >8 --
> 
> As noted on the PR, commit r13-1544, the fix for PR53431, did not handle
> the specific case of -Wunknown-pragmas, because that warning is issued
> during preprocessing, but not by libcpp directly (it comes from the
> cb_def_pragma callback).  Address that by handling this pragma in
> addition to libcpp pragmas during the early pragma handler.
> 
> gcc/c-family/ChangeLog:
> 
>   PR c++/89038
>   * c-pragma.cc (handle_pragma_diagnostic_impl):  Handle
>   -Wunknown-pragmas during early processing.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/89038
>   * c-c++-common/cpp/Wunknown-pragmas-1.c: New test.
> ---
>  gcc/c-family/c-pragma.cc|  3 ++-
>  gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c | 13 +
>  2 files changed, 15 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c
> 
> diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
> index 293311dd4ce..98dfb0f108b 100644
> --- a/gcc/c-family/c-pragma.cc
> +++ b/gcc/c-family/c-pragma.cc
> @@ -963,7 +963,8 @@ handle_pragma_diagnostic_impl ()
>/* option_string + 1 to skip the initial '-' */
>unsigned int option_index = find_opt (data.option_str + 1, lang_mask);
>  
> -  if (early && !c_option_is_from_cpp_diagnostics (option_index))
> +  if (early && !(c_option_is_from_cpp_diagnostics (option_index)
> +  || option_index == OPT_Wunknown_pragmas))
>  return;
>  
>if (option_index == OPT_SPECIAL_unknown)
> diff --git a/gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c b/gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c
> new file mode 100644
> index 000..fb58739e2bc
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/cpp/Wunknown-pragmas-1.c
> @@ -0,0 +1,13 @@
> +/* PR c++/89038 */
> +/* { dg-additional-options "-Wunknown-pragmas" } */
> +
> +#pragma oops /* { dg-warning "-:-Wunknown-pragmas" } */
> +#pragma GGC diagnostic push /* { dg-warning "-:-Wunknown-pragmas" } */
> +#pragma GCC diagnostics push /* { dg-warning "-:-Wunknown-pragmas" } */
> +
> +/* Test we can disable the warnings.  */
> +#pragma GCC diagnostic ignored "-Wunknown-pragmas"
> +
> +#pragma oops /* { dg-bogus "-:-Wunknown-pragmas" } */
> +#pragma GGC diagnostic push /* { dg-bogus "-:-Wunknown-pragmas" } */
> +#pragma GCC diagnostics push /* { dg-bogus "-:-Wunknown-pragmas" } */
> 

Marek



[PATCH] AArch64: Improve immediate generation

2023-10-19 Thread Wilco Dijkstra
Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates.  This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.
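
A minimal C sketch of the search the new loop performs. It assumes `mask` is 0xffff (its definition is outside the hunk) and swaps `aarch64_bitmask_imm` for a hypothetical stand-in, `is_bitmask_imm`. The stand-in accepts any 64-bit value built by replicating a 2-, 4-, 8-, 16-, 32- or 64-bit element that is a rotated run of ones. MOV with such an immediate is an alias of ORR, and EOR accepts the same immediates, so a value that splits into two of them needs only two instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for aarch64_bitmask_imm: true if V is an AArch64
   logical immediate, i.e. a replicated element that is a rotated run of
   ones (all-zeros and all-ones are not encodable).  */
static bool is_bitmask_imm (uint64_t v)
{
  if (v == 0 || v == UINT64_MAX)
    return false;

  /* Smallest element size whose replication reproduces V.  */
  unsigned size = 64;
  for (unsigned s = 2; s < 64; s *= 2)
    if (v == ((v >> s) | (v << (64 - s))))   /* V == rotate_right (V, s) */
      {
        size = s;
        break;
      }

  /* Treat the element as circular: a single rotated run of ones has
     exactly two 0/1 transitions.  */
  unsigned transitions = 0;
  for (unsigned i = 0; i < size; i++)
    transitions += ((v >> i) & 1) != ((v >> ((i + 1) % size)) & 1);
  return transitions == 2;
}

/* Mirror of the patch's loop: replicate each 16-bit chunk of VAL across 64
   bits and accept it if both the chunk pattern (MOV operand) and
   VAL ^ pattern (EOR operand) are bitmask immediates.  */
static bool split_mov_eor (uint64_t val, uint64_t *mov_imm, uint64_t *eor_imm)
{
  assert (mov_imm && eor_imm);
  for (unsigned i = 0; i < 64; i += 16)
    {
      uint64_t val2 = (val >> i) & 0xffff;   /* assumed mask */
      val2 |= val2 << 16;
      val2 |= val2 << 32;
      if (is_bitmask_imm (val2) && is_bitmask_imm (val ^ val2))
        {
          *mov_imm = val2;
          *eor_imm = val ^ val2;
          return true;
        }
    }
  return false;
}
```

For example, 0x1999999999999998 has no usable chunk at bit 0, but the chunk at bit 16 gives MOV #0x9999999999999999 followed by EOR #0x8000000000000001.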

Passes regress, OK for commit?

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate):
Add support for immediates using MOV/EOR bitmask.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/imm_choice_comparison.c: Fix test.
* gcc.target/aarch64/moveor_imm.c: Add new test.
* gcc.target/aarch64/pr106583.c: Fix test.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 578a253d6e0e133e19592553fc873b3e73f9f218..ed5be2b64c9a767d74e9d78415da964c669001aa 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5748,6 +5748,26 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
}
  return 2;
}
+
+  /* Try 2 bitmask immediates which are xor'd together. */
+  for (i = 0; i < 64; i += 16)
+   {
+ val2 = (val >> i) & mask;
+ val2 |= val2 << 16;
+ val2 |= val2 << 32;
+ if (aarch64_bitmask_imm (val2) && aarch64_bitmask_imm (val ^ val2))
+   break;
+   }
+
+  if (i != 64)
+   {
+ if (generate)
+   {
+ emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
+ emit_insn (gen_xordi3 (dest, dest, GEN_INT (val ^ val2)));
+   }
+ return 2;
+   }
 }
 
   /* Try a bitmask plus 2 movk to generate the immediate in 3 instructions.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
index ebc44d6dbc7287d907603d77d7b54496de177c4b..2434ca380ca2cad3e1e4181deeaad680f518b866 100644
--- a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -6,7 +6,7 @@
 int
 foo (long long x)
 {
-  return x <= 0x1998;
+  return x <= 0x9998;
 }
 
 int
diff --git a/gcc/testsuite/gcc.target/aarch64/moveor_imm.c b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
new file mode 100644
index ..5f4997b50398fdda5924610959e0c54967ad0735
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
@@ -0,0 +1,31 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 --save-temps" } */
+
+long f1 (void)
+{
+  return 0x2aab;
+}
+
+long f2 (void)
+{
+  return 0x10f0f0f0f0f0f0f1;
+}
+
+long f3 (void)
+{
+  return 0xccd;
+}
+
+long f4 (void)
+{
+  return 0x1998;
+}
+
+long f5 (void)
+{
+  return 0x3f333f33;
+}
+
+/* { dg-final { scan-assembler-not {\tmovk\t} } } */
+/* { dg-final { scan-assembler-times {\tmov\t} 5 } } */
+/* { dg-final { scan-assembler-times {\teor\t} 5 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106583.c b/gcc/testsuite/gcc.target/aarch64/pr106583.c
index 0f931580817d78dc1cc58f03b251bd21bec71f59..79ada5160ce059d66eeaee407ca02488b2a1f114 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr106583.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr106583.c
@@ -3,7 +3,7 @@
 
 long f1 (void)
 {
-  return 0x7efefefefefefeff;
+  return 0x75fefefefefefeff;
 }
 
 long f2 (void)



[PATCH] AArch64: Cleanup memset expansion

2023-10-19 Thread Wilco Dijkstra
Clean up the memset implementation.  As with memcpy/memmove, use a byte offset
and byte sizes throughout.  Simplify the complex size calculation used when
optimizing for size
by using a fixed limit.
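
To illustrate what tracking a byte offset (rather than advancing a pointer after every block) looks like, here is a hypothetical planning routine in C. It greedily splits a constant length into 32-byte (a Q-register store pair), 16-, 8-, 4-, 2- and 1-byte stores, and refuses lengths above `max_set_size` (256 by default, 96 with -Os in this patch). The greedy, non-overlapping block choice is an assumption for illustration only; the expander in the patch may choose blocks differently.

```c
#include <assert.h>
#include <stddef.h>

/* One planned store: SIZE bytes written at byte OFFSET from the start of
   the destination (hypothetical type).  */
struct store
{
  unsigned offset, size;
};

/* Fill PLAN with stores covering LEN bytes.  Returns the number of stores,
   or -1 if LEN exceeds MAX_SET_SIZE (where the patch falls back to MOPS or
   a library call).  */
static int plan_memset (unsigned len, unsigned max_set_size,
                        struct store *plan, size_t plan_cap)
{
  static const unsigned sizes[] = { 32, 16, 8, 4, 2, 1 };
  unsigned offset = 0;
  int n = 0;

  if (len > max_set_size)
    return -1;

  for (size_t k = 0; k < sizeof sizes / sizeof sizes[0]; k++)
    while (len - offset >= sizes[k])
      {
        assert ((size_t) n < plan_cap);
        plan[n].offset = offset;
        plan[n].size = sizes[k];
        n++;
        offset += sizes[k];   /* only the offset moves, never a pointer */
      }
  return n;
}
```

With `len = 45` the plan is stores of 32, 8, 4 and 1 bytes at offsets 0, 32, 40 and 44.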

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_progress_pointer): Remove function.
(aarch64_set_one_block_and_progress_pointer): Simplify and clean up.
(aarch64_expand_setmem): Clean up implementation, use byte offsets,
simplify size calculation.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e19e2d1de2e5b30eca672df05d9dcc1bc106ecc8..578a253d6e0e133e19592553fc873b3e73f9f218 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25229,15 +25229,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
next, amount);
 }
 
-/* Return a new RTX holding the result of moving POINTER forward by the
-   size of the mode it points to.  */
-
-static rtx
-aarch64_progress_pointer (rtx pointer)
-{
-  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
-}
-
 /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
 
 static void
@@ -25393,46 +25384,22 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   return true;
 }
 
-/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
-   SRC is a register we have created with the duplicated value to be set.  */
+/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
 static void
-aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
-   machine_mode mode)
-{
-  /* If we are copying 128bits or 256bits, we can do that straight from
- the SIMD register we prepared.  */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
-{
-  mode = GET_MODE (src);
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_insn (aarch64_gen_store_pair (mode, *dst, src,
-aarch64_progress_pointer (*dst), src));
-
-  /* Move the pointers forward.  */
-  *dst = aarch64_move_pointer (*dst, 32);
-  return;
-}
-  if (known_eq (GET_MODE_BITSIZE (mode), 128))
+aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
+{
+  /* Emit explicit store pair instructions for 32-byte writes.  */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
 {
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, GET_MODE (src), 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, src);
-  /* Move the pointers forward.  */
-  *dst = aarch64_move_pointer (*dst, 16);
+  mode = V16QImode;
+  rtx dst1 = adjust_address (dst, mode, offset);
+  rtx dst2 = adjust_address (dst, mode, offset + 16);
+  emit_insn (aarch64_gen_store_pair (mode, dst1, src, dst2, src));
   return;
 }
-  /* For copying less, we have to extract the right amount from src.  */
-  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
-
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, reg);
-  /* Move the pointer forward.  */
-  *dst = aarch64_progress_pointer (*dst);
+  if (known_lt (GET_MODE_SIZE (mode), 16))
+src = lowpart_subreg (mode, src, GET_MODE (src));
+  emit_move_insn (adjust_address (dst, mode, offset), src);
 }
 
 /* Expand a setmem using the MOPS instructions.  OPERANDS are the same
@@ -25461,7 +25428,7 @@ aarch64_expand_setmem_mops (rtx *operands)
 bool
 aarch64_expand_setmem (rtx *operands)
 {
-  int n, mode_bits;
+  int mode_bytes;
   unsigned HOST_WIDE_INT len;
   rtx dst = operands[0];
   rtx val = operands[2], src;
@@ -25474,104 +25441,70 @@ aarch64_expand_setmem (rtx *operands)
   || (STRICT_ALIGNMENT && align < 16))
 return aarch64_expand_setmem_mops (operands);
 
-  bool size_p = optimize_function_for_size_p (cfun);
-
   /* Default the maximum to 256-bytes when considering only libcall vs
  SIMD broadcast sequence.  */
   unsigned max_set_size = 256;
   unsigned mops_threshold = aarch64_mops_memset_size_threshold;
 
+  /* Reduce the maximum size with -Os.  */
+  if (optimize_function_for_size_p (cfun))
+max_set_size = 96;
+
   len = UINTVAL (operands[1]);
 
   /* Large memset uses MOPS when available or a library call.  */
   if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
 return aarch64_expand_setmem_mops (operands);
 
-  int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
-  /* The MOPS sequence takes:
- 3 instructions for the memory storing
- + 1 to move the constant size into a reg
- + 1 if VAL is a non-zero constant to move into a reg
-(zero constants can use XZR directly).  */
-  unsigned mops_cost = 3 + 1 + cst_val;
-  /* A libcall to memset in the worst ca

Re: [PATCH] c++: Make -Wunknown-pragmas controllable by #pragma GCC diagnostic [PR89038]

2023-10-19 Thread Lewis Hyatt
On Thu, Oct 19, 2023 at 8:43 AM Marek Polacek  wrote:
>
> On Wed, Oct 18, 2023 at 05:15:42PM -0400, Lewis Hyatt wrote:
> > Hello-
> >
> > The PR points out that my fix for PR53431 was incomplete and did not handle
> > -Wunknown-pragmas. This is a one-line fix to correct that, is it OK for
> > trunk and for GCC 13 backport please? bootstrap + regtest all languages on
> > x86-64 Linux. Thanks!
>
> I think I can approve this, so, OK.  Thanks.
>

Great, thank you very much. Just to be safe, was that OK for the
backport as well?

-Lewis


Re: [PATCH] c++: Make -Wunknown-pragmas controllable by #pragma GCC diagnostic [PR89038]

2023-10-19 Thread Marek Polacek
On Thu, Oct 19, 2023 at 09:07:36AM -0400, Lewis Hyatt wrote:
> On Thu, Oct 19, 2023 at 8:43 AM Marek Polacek  wrote:
> >
> > On Wed, Oct 18, 2023 at 05:15:42PM -0400, Lewis Hyatt wrote:
> > > Hello-
> > >
> > > The PR points out that my fix for PR53431 was incomplete and did not handle
> > > -Wunknown-pragmas. This is a one-line fix to correct that, is it OK for
> > > trunk and for GCC 13 backport please? bootstrap + regtest all languages on
> > > x86-64 Linux. Thanks!
> >
> > I think I can approve this, so, OK.  Thanks.
> >
> 
> Great, thank you very much. Just to be safe, was that OK for the
> backport as well?

It's a safe bugfix, it doesn't enable an extra warning, so yes, I don't
see why not.  Thanks,

Marek



Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-19 Thread Patrick Palka
On Tue, 17 Oct 2023, Marek Polacek wrote:

> On Tue, Oct 17, 2023 at 04:49:52PM -0400, Jason Merrill wrote:
> > On 10/16/23 20:39, Marek Polacek wrote:
> > > On Sat, Oct 14, 2023 at 01:13:22AM -0400, Jason Merrill wrote:
> > > > On 10/13/23 14:53, Marek Polacek wrote:
> > > > > On Thu, Oct 12, 2023 at 09:41:43PM -0400, Jason Merrill wrote:
> > > > > > On 10/12/23 17:04, Marek Polacek wrote:
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > > 
> > > > > > > -- >8 --
> > > > > > > My recent patch introducing cp_fold_immediate_r caused exponential
> > > > > > > compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
> > > > > > > case recursively walks the arms of a COND_EXPR, but after processing
> > > > > > > both arms it doesn't end the walk; it proceeds to walk the
> > > > > > > sub-expressions of the outermost COND_EXPR, triggering again walking
> > > > > > > the arms of the nested COND_EXPR, and so on.  This patch brings the
> > > > > > > compile time down to about 0m0.033s.
> > > > > > > 
> > > > > > > I've added some debug prints to make sure that the rest of cp_fold_r
> > > > > > > is still performed as before.
> > > > > > > 
> > > > > > >PR c++/111660
> > > > > > > 
> > > > > > > gcc/cp/ChangeLog:
> > > > > > > 
> > > > > > >* cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Return
> > > > > > >integer_zero_node instead of break;.
> > > > > > >(cp_fold_immediate): Return true if cp_fold_immediate_r returned
> > > > > > >error_mark_node.
> > > > > > > 
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > 
> > > > > > >* g++.dg/cpp0x/hog1.C: New test.
> > > > > > > ---
> > > > > > > gcc/cp/cp-gimplify.cc |  9 ++--
> > > > > > > gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 +++
> > > > > > > 2 files changed, 82 insertions(+), 4 deletions(-)
> > > > > > > create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C
> > > > > > > 
> > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > index bdf6e5f98ff..ca622ca169a 100644
> > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > @@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
> > > > > > >   break;
> > > > > > >   if (TREE_OPERAND (stmt, 1)
> > > > > > > && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
> > > > > > > -nullptr))
> > > > > > > +nullptr) == error_mark_node)
> > > > > > >   return error_mark_node;
> > > > > > >   if (TREE_OPERAND (stmt, 2)
> > > > > > > && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
> > > > > > > -nullptr))
> > > > > > > +nullptr) == error_mark_node)
> > > > > > >   return error_mark_node;
> > > > > > >   /* We're done here.  Don't clear *walk_subtrees here though: we're called
> > > > > > >from cp_fold_r and we must let it recurse on the expression with
> > > > > > >cp_fold.  */
> > > > > > > -  break;
> > > > > > > +  return integer_zero_node;
> > > > > > 
> > > > > > I'm concerned this will end up missing something like
> > > > > > 
> > > > > > 1 ? 1 : ((1 ? 1 : 1), immediate())
> > > > > > 
> > > > > > as the integer_zero_node from the inner ?: will prevent walk_tree from
> > > > > > looking any farther.
> > > > > 
> > > > > You are right.  The line above works as expected, but
> > > > > 
> > > > > 1 ? 1 : ((1 ? 1 : id (42)), id (i));
> > > > > 
> > > > > shows the problem (when the expression isn't used as an initializer).
> > > > > 
> > > > > > Maybe we want to handle COND_EXPR in cp_fold_r instead of here?
> > > > > 
> > > > > I hope this version is better.
> > > > > 
> > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > 
> > > > > -- >8 --
> > > > > My recent patch introducing cp_fold_immediate_r caused exponential
> > > > > compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
> > > > > case recursively walks the arms of a COND_EXPR, but after processing
> > > > > both arms it doesn't end the walk; it proceeds to walk the
> > > > > sub-expressions of the outermost COND_EXPR, triggering again walking
> > > > > the arms of the nested COND_EXPR, and so on.  This patch brings the
> > > > > compile time down to about 0m0.033s.
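
A toy model in C of the blow-up described above. It assumes a simplified three-operand node (condition, then-arm, else-arm) rather than GCC's `tree` and `cp_walk_tree`, and all names are hypothetical. The buggy walker visits both arms explicitly and then lets the generic walk visit them again, so each nesting level doubles the work; the fixed walker visits each operand once.

```c
#include <assert.h>

/* Toy expression node: a leaf, or a conditional with three operands.  */
typedef struct node
{
  int is_cond;
  struct node *op[3];   /* condition, then-arm, else-arm */
} node;

/* Buggy walk: handle the arms, then fall through to the generic walk of
   every operand, which walks the arms (and anything nested) again.  */
static unsigned long walk_buggy (const node *n)
{
  unsigned long visits = 1;
  if (n->is_cond)
    {
      visits += walk_buggy (n->op[1]);
      visits += walk_buggy (n->op[2]);
      for (int i = 0; i < 3; i++)   /* generic subtree walk */
        visits += walk_buggy (n->op[i]);
    }
  return visits;
}

/* Fixed walk: each operand is visited exactly once.  */
static unsigned long walk_fixed (const node *n)
{
  unsigned long visits = 1;
  if (n->is_cond)
    for (int i = 0; i < 3; i++)
      visits += walk_fixed (n->op[i]);
  return visits;
}

/* Build DEPTH conditionals nested in the else-arm, using POOL (at least
   DEPTH entries) and a shared LEAF.  */
static node *build_chain (node *pool, node *leaf, int depth)
{
  assert (depth >= 0);
  node *n = leaf;
  for (int i = 0; i < depth; i++)
    {
      pool[i].is_cond = 1;
      pool[i].op[0] = leaf;
      pool[i].op[1] = leaf;
      pool[i].op[2] = n;
      n = &pool[i];
    }
  return n;
}
```

For ten conditionals nested in the else-arm, the buggy walker makes 5116 visits (5 * 2^10 - 4) against 31 for the fixed one.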
> > > > 
> > > > Is this number still accurate for this version?
> > > 
> > > It is.  I ran time(1) a few more times and the results were 0m0.033s - 0m0.035s.
> > > That said, ...
> > > 
> > > > This change seems algorithmically bett

[PATCH 0/3] [GCC] arm: vld1_types_xN ACLE intrinsics

2023-10-19 Thread Ezra.Sitorus
Add the _xN variants of the vld1_types intrinsics for AArch32.




[PATCH 3/3] [GCC] arm: vld1_types_x4 ACLE intrinsics

2023-10-19 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series implementing the _xN variants of the vld1
intrinsic for AArch32.  It adds the _x4 variants.  The previous vld1_x4 has
been renamed to vld1q_x4 because it works with 4-word (128-bit) types;
vld1_x4 is now used only for 2-word (64-bit) types.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/
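
A small worked example of the 2-word/4-word distinction, as C arithmetic: on AArch32 a D register holds 2 words (8 bytes) and a Q register 4 words (16 bytes), and an _xN load fills N consecutive registers from memory. The helper and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Register width in bytes (hypothetical enum for illustration).  */
enum reg_kind { D_REG = 8, Q_REG = 16 };

/* Number of ELEM_SIZE-byte elements read by a vld1{,q}_<type>_xN load.  */
static size_t vld1_xn_elements (enum reg_kind kind, unsigned n,
                                size_t elem_size)
{
  assert (n >= 2 && n <= 4);
  assert (elem_size == 1 || elem_size == 2 || elem_size == 4
          || elem_size == 8);
  return (size_t) kind * n / elem_size;
}
```

So vld1_s16_x4 reads 16 int16_t values (32 bytes), while vld1q_s16_x4 reads 32 of them (64 bytes).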

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x4, vld1_u16_x4, vld1_u32_x4, vld1_u64_x4): New.
(vld1_s8_x4, vld1_s16_x4, vld1_s32_x4, vld1_s64_x4): New.
(vld1_f16_x4, vld1_f32_x4): New.
(vld1_p8_x4, vld1_p16_x4, vld1_p64_x4): New.
(vld1_bf16_x4): New.
(vld1q_types_x4): Updated to use vld1q_x4 from arm_neon_builtins.def.
* config/arm/arm_neon_builtins.def
(vld1_x4): Updated entries.
(vld1q_x4): New entries, taken from the old vld1_x4.
* config/arm/neon.md (neon_vld1q_x4): Updated from 
neon_vld1_x4.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 156 --
 gcc/config/arm/arm_neon_builtins.def  |   3 +-
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vld1_base_xN_1.c  |  63 ++-
 .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vld1_p64_xN_1.c   |   7 +-
 7 files changed, 231 insertions(+), 22 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 31f5be8322d..c797787f468 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10325,6 +10325,15 @@ vld1_p64_x3 (const poly64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline poly64x1x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_p64_x4 (const poly64_t * __a)
+{
+  union { poly64x1x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10426,6 +10435,42 @@ vld1_s64_x3 (const int64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline int8x8x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s8_x4 (const int8_t * __a)
+{
+  union { int8x8x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x4x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s16_x4 (const int16_t * __a)
+{
+  union { int16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x2x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s32_x4 (const int32_t * __a)
+{
+  union { int32x2x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v2si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x1x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s64_x4 (const int64_t * __a)
+{
+  union { int64x1x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10482,6 +10527,26 @@ vld1_f32_x3 (const float32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x4x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f16_x4 (const float16_t * __a)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x2x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f32_x4 (const float32_t * __a)
+{
+  union { float32x2x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v2sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __inline uint8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_u8 (const uint8_t * __a)
@@ -10582,6 +106

[PATCH 1/3] [GCC] arm: vld1_types_x2 ACLE intrinsics

2023-10-19 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series implementing the _xN variants of the vld1
intrinsic for AArch32.  It adds the _x2 variants.  The tests are named with
an xN suffix so that the later variants (_x3, _x4) can be added to the same
files.  The previous vld1_x2 has been renamed to vld1q_x2 because it works
with 4-word (128-bit) types; vld1_x2 is now used only for 2-word (64-bit)
types.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/
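
A portable C model of the data movement (not `arm_neon.h`; the `model_*` names are hypothetical). It contrasts vld1_s16_x2, which fills two 4-lane vectors from eight consecutive int16_t values in order, with vld2_s16, which de-interleaves the same eight values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { int16_t lane[4]; } model_int16x4_t;
typedef struct { model_int16x4_t val[2]; } model_int16x4x2_t;

/* vld1_s16_x2-style load: val[0] = a[0..3], val[1] = a[4..7].  */
static model_int16x4x2_t model_vld1_s16_x2 (const int16_t *a)
{
  model_int16x4x2_t r;
  memcpy (r.val[0].lane, a, sizeof r.val[0].lane);
  memcpy (r.val[1].lane, a + 4, sizeof r.val[1].lane);
  return r;
}

/* vld2_s16-style load for comparison: even elements go to val[0],
   odd elements to val[1].  */
static model_int16x4x2_t model_vld2_s16 (const int16_t *a)
{
  model_int16x4x2_t r;
  for (int i = 0; i < 4; i++)
    {
      r.val[0].lane[i] = a[2 * i];
      r.val[1].lane[i] = a[2 * i + 1];
    }
  return r;
}
```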

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x2, vld1_u16_x2, vld1_u32_x2, vld1_u64_x2): New.
(vld1_s8_x2, vld1_s16_x2, vld1_s32_x2, vld1_s64_x2): New.
(vld1_f16_x2, vld1_f32_x2): New.
(vld1_p8_x2, vld1_p16_x2, vld1_p64_x2): New.
(vld1_bf16_x2): New.
(vld1q_types_x2): Updated to use vld1q_x2 from arm_neon_builtins.def.
* config/arm/arm_neon_builtins.def
(vld1_x2): Updated entries.
(vld1q_x2): New entries, taken from the old vld1_x2.
* config/arm/neon.md (neon_vld1_x2): Updated from 
neon_vld1_x2.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 156 --
 gcc/config/arm/arm_neon_builtins.def  |   3 +-
 gcc/config/arm/neon.md|  10 +-
 .../gcc.target/arm/simd/vld1_base_xN_1.c  |  66 
 .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |  13 ++
 .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |  13 ++
 .../gcc.target/arm/simd/vld1_p64_xN_1.c   |  13 ++
 7 files changed, 254 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index df3e23b6e95..7650c066e20 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10307,6 +10307,15 @@ vld1_p64 (const poly64_t * __a)
   return (poly64x1_t) { *__a };
 }
 
+__extension__ extern __inline poly64x1x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_p64_x2 (const poly64_t * __a)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10336,6 +10345,42 @@ vld1_s64 (const int64_t * __a)
   return (int64x1_t) { *__a };
 }
 
+__extension__ extern __inline int8x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s8_x2 (const int8_t * __a)
+{
+  union { int8x8x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s16_x2 (const int16_t * __a)
+{
+  union { int16x4x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s32_x2 (const int32_t * __a)
+{
+  union { int32x2x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x1x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s64_x2 (const int64_t * __a)
+{
+  union { int64x1x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10352,6 +10397,26 @@ vld1_f32 (const float32_t * __a)
  return (float32x2_t)__builtin_neon_vld1v2sf ((const __builtin_neon_sf *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f16_x2 (const float16_t * __a)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x2x2_t

[PATCH 2/3] [GCC] arm: vld1_types_x3 ACLE intrinsics

2023-10-19 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series implementing the _xN variants of the vld1
intrinsic for AArch32.  It adds the _x3 variants.  The previous vld1_x3 has
been renamed to vld1q_x3 because it works with 4-word (128-bit) types;
vld1_x3 is now used only for 2-word (64-bit) types.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x3, vld1_u16_x3, vld1_u32_x3, vld1_u64_x3): New.
(vld1_s8_x3, vld1_s16_x3, vld1_s32_x3, vld1_s64_x3): New.
(vld1_f16_x3, vld1_f32_x3): New.
(vld1_p8_x3, vld1_p16_x3, vld1_p64_x3): New.
(vld1_bf16_x3): New.
(vld1q_types_x3): Updated to use vld1q_x3 from arm_neon_builtins.def.
* config/arm/arm_neon_builtins.def
(vld1_x3): Updated entries.
(vld1q_x3): New entries, taken from the old vld1_x3.
* config/arm/neon.md (neon_vld1q_x3): Updated from 
neon_vld1_x3.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 156 --
 gcc/config/arm/arm_neon_builtins.def  |   3 +-
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vld1_base_xN_1.c  |  63 ++-
 .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vld1_p64_xN_1.c   |   7 +-
 7 files changed, 231 insertions(+), 22 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 7650c066e20..31f5be8322d 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10316,6 +10316,15 @@ vld1_p64_x2 (const poly64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline poly64x1x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_p64_x3 (const poly64_t * __a)
+{
+  union { poly64x1x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10381,6 +10390,42 @@ vld1_s64_x2 (const int64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline int8x8x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s8_x3 (const int8_t * __a)
+{
+  union { int8x8x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x4x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s16_x3 (const int16_t * __a)
+{
+  union { int16x4x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x2x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s32_x3 (const int32_t * __a)
+{
+  union { int32x2x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v2si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x1x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s64_x3 (const int64_t * __a)
+{
+  union { int64x1x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10417,6 +10462,26 @@ vld1_f32_x2 (const float32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x4x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f16_x3 (const float16_t * __a)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x2x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f32_x3 (const float32_t * __a)
+{
+  union { float32x2x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v2sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __inline uint8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_u8 (const uint8_t * __a)
@@ -10481,6 +105

[RFC] Add function attribute: null_terminated_string_arg(PARAM_IDX)

2023-10-19 Thread David Malcolm
This patch adds a new function attribute to GCC for marking that an
argument is expected to be a null-terminated string.

For example, consider:

  void test_a (const char *p)
    __attribute__((null_terminated_string_arg (1)));

which would indicate to humans and compilers that argument 1 of "test_a"
is expected to be a null-terminated string, with the idea:

- we should complain if it's not valid to read from *p up to the first
  '\0' character in the buffer

- we should complain if *p is not terminated, or if it's uninitialized
  before the first '\0' character
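
As a rough runtime analogue of the first check, the following hypothetical helper asks whether a '\0' occurs within the bytes that are valid to read at `p`. This is only an illustration: the analyzer reasons about this statically, and the second check (initializedness) cannot be tested this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if a terminator lies within the BUF_SIZE readable bytes at P, so
   reading up to and including the first '\0' stays in bounds.  */
static bool terminated_within (const char *p, size_t buf_size)
{
  assert (p != NULL);
  return memchr (p, '\0', buf_size) != NULL;
}
```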

This is independent of the nonnull-ness of the pointer: if you also want
to express that the argument must be non-null, we already have
__attribute__((nonnull (N))), so the user can write e.g.:

  void test_b (const char *p)
    __attribute__((null_terminated_string_arg (1)))
    __attribute__((nonnull (1)));

which can also be spelled as:

  void test_b (const char *p)
    __attribute__((null_terminated_string_arg (1),
                   nonnull (1)));

For a function similar to strncpy, we can use the "access" attribute to
express a maximum size of the read:

  void test_c (const char *p, size_t sz)
    __attribute__((null_terminated_string_arg (1),
                   nonnull (1),
                   access (read_only, 1, 2)));

The patch implements:
(a) C/C++ frontends: recognition of this attribute
(b) analyzer: usage of this attribute

The name is rather long; a shorter name might be "c_string_arg".

Does anything like this already exist in GCC, or in any other
compilers or analysis tools?

Thoughts?

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

gcc/analyzer/ChangeLog:
* region-model.cc
(region_model::check_external_function_for_access_attr): Split
out, replacing with...
(region_model::check_function_attr_access): ...this new function
and...
(region_model::check_function_attrs): ...this new function.
(region_model::check_one_function_attr_null_terminated_string_arg):
New.
(region_model::check_function_attr_null_terminated_string_arg):
New.
(region_model::handle_unrecognized_call): Update for renaming of
check_external_function_for_access_attr to check_function_attrs.
(region_model::check_for_null_terminated_string_arg): Add return
value to one overload.  Make both overloads const.
* region-model.h: Include "stringpool.h" and "attribs.h".
(region_model::check_for_null_terminated_string_arg): Add return
value to one overload.  Make both overloads const.
(region_model::check_external_function_for_access_attr): Delete
decl.
(region_model::check_function_attr_access): New decl.
(region_model::check_function_attr_null_terminated_string_arg):
New decl.
(region_model::check_one_function_attr_null_terminated_string_arg):
New decl.
(region_model::check_function_attrs): New decl.

gcc/c-family/ChangeLog:
* c-attribs.cc (c_common_attribute_table): Add
"null_terminated_string_arg".
(handle_null_terminated_string_arg_attribute): New.

gcc/ChangeLog:
* doc/extend.texi (Common Function Attributes): Add
null_terminated_string_arg.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/attr-null_terminated_string_arg-access-read_write.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-access-without-size.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-multiple.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-2.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-sized.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable-sized.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable.c:
New test.
* c-c++-common/attr-null_terminated_string_arg.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc  | 180 +++---
 gcc/analyzer/region-model.h   |  27 ++-
 gcc/c-family/c-attribs.cc |  17 ++
 gcc/doc/extend.texi   |  57 ++
 ..._terminated_string_arg-access-read_write.c |  15 ++
 ...erminated_string_arg-access-without-size.c |  54 ++
 ...attr-null_terminated_string_arg-multiple.c |  52 +
 ...ttr-null_terminated_string_arg-nonnull-2.c |  33 
 ...null_terminated_string_arg-nonnull-sized.c |  69 +++
 .../attr-null_terminated_string_arg-nonnull.c |  34 
 ...ull_terminated_string_arg-nullable-sized.c |  69 +++
 ...attr-null_terminated_string_arg-nullable.c |  34 
 .../attr-null_terminated_string_arg.c |  16 ++
 13 files chang

[PATCH 1/5] LoongArch: Add enum-style -mexplicit-relocs= option

2023-10-19 Thread Xi Ruoyao
To strike a better balance between scheduling and relaxation when -flto
is enabled, add a three-way -mexplicit-relocs={auto,none,always} option.
The old -mexplicit-relocs and -mno-explicit-relocs options are still
supported; they are mapped to -mexplicit-relocs=always and
-mexplicit-relocs=none respectively.

The default choice is determined by probing assembler capabilities at
build time.  If the assembler does not support explicit relocs at all,
the default is none; if it supports explicit relocs but not
relaxation, the default is always; if it supports both explicit relocs
and relaxation, the default is auto.

Currently auto is the same as none.  Later changes in this series will
make auto smarter.
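The option handling described above can be modelled as a small pure function. This is a sketch, not the code in loongarch.cc. `resolve_explicit_relocs`, `OPT_UNSET` and `OPT_CONFLICT` are hypothetical. The enum only reuses the names of the new EXPLICIT_RELOCS_* macros. The two booleans stand in for the configure-time assembler probes (HAVE_AS_EXPLICIT_RELOCS and HAVE_AS_MRELAX_OPTION).

```c
#include <assert.h>
#include <stdbool.h>

/* Names mirror the EXPLICIT_RELOCS_* macros in loongarch-def.h.  */
enum
{
  EXPLICIT_RELOCS_AUTO,
  EXPLICIT_RELOCS_NONE,
  EXPLICIT_RELOCS_ALWAYS
};

#define OPT_UNSET (-1)    /* hypothetical stand-in for M_OPT_UNSET */
#define OPT_CONFLICT (-2) /* hypothetical: old and new spellings both given */

/* NEW_STYLE: value of -mexplicit-relocs=, or OPT_UNSET.
   OLD_STYLE: 1 for -mexplicit-relocs, 0 for -mno-explicit-relocs, or
   OPT_UNSET.  AS_EXPLICIT_RELOCS / AS_MRELAX: assembler capabilities.  */
static int
resolve_explicit_relocs (int new_style, int old_style,
                         bool as_explicit_relocs, bool as_mrelax)
{
  /* Mixing the old and new spellings is an error.  */
  if (new_style != OPT_UNSET && old_style != OPT_UNSET)
    return OPT_CONFLICT;

  /* Backward compatibility: map the boolean spellings.  */
  if (old_style != OPT_UNSET)
    return old_style ? EXPLICIT_RELOCS_ALWAYS : EXPLICIT_RELOCS_NONE;

  if (new_style != OPT_UNSET)
    return new_style;

  /* Build-time default, chosen from the assembler probes.  */
  if (!as_explicit_relocs)
    return EXPLICIT_RELOCS_NONE;
  return as_mrelax ? EXPLICIT_RELOCS_AUTO : EXPLICIT_RELOCS_ALWAYS;
}
```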

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Add strings for
-mexplicit-relocs={auto,none,always}.
* config/loongarch/genopts/loongarch.opt.in: Add options for
-mexplicit-relocs={auto,none,always}.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-def.h
(EXPLICIT_RELOCS_AUTO): Define.
(EXPLICIT_RELOCS_NONE): Define.
(EXPLICIT_RELOCS_ALWAYS): Define.
(N_EXPLICIT_RELOCS_TYPES): Define.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Error out if the old-style
-m[no-]explicit-relocs option is used with
-mexplicit-relocs={auto,none,always} together.  Map
-mno-explicit-relocs to -mexplicit-relocs=none and
-mexplicit-relocs to -mexplicit-relocs=always for backward
compatibility.  Set a proper default for -mexplicit-relocs=
based on configure-time probed assembler capability.  Update a
diagnostic message to mention -mexplicit-relocs=always instead
of the old-style -mexplicit-relocs.
(loongarch_handle_model_attribute): Update a diagnostic message
to mention -mexplicit-relocs=always instead of the old-style
-mexplicit-relocs.
* config/loongarch/loongarch.h (TARGET_EXPLICIT_RELOCS): Define.
---
 .../loongarch/genopts/loongarch-strings   |  6 +
 gcc/config/loongarch/genopts/loongarch.opt.in | 21 ++--
 gcc/config/loongarch/loongarch-def.h  |  6 +
 gcc/config/loongarch/loongarch-str.h  |  5 
 gcc/config/loongarch/loongarch.cc | 24 +--
 gcc/config/loongarch/loongarch.h  |  3 +++
 gcc/config/loongarch/loongarch.opt| 21 ++--
 7 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings b/gcc/config/loongarch/genopts/loongarch-strings
index adecaec3eda..8e412f7536e 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -63,3 +63,9 @@ STR_CMODEL_TS   tiny-static
 STR_CMODEL_MEDIUM medium
 STR_CMODEL_LARGE  large
 STR_CMODEL_EXTREMEextreme
+
+# -mexplicit-relocs
+OPTSTR_EXPLICIT_RELOCS explicit-relocs
+STR_EXPLICIT_RELOCS_AUTO   auto
+STR_EXPLICIT_RELOCS_NONE   none
+STR_EXPLICIT_RELOCS_ALWAYS always
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 4a2d7438f1b..e1fe0c7086e 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -170,10 +170,27 @@ mmax-inline-memcpy-size=
 Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init(1024)
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default is 1024.
 
-mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
+Enum
+Name(explicit_relocs) Type(int)
+The code model option names for -mexplicit-relocs:
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_AUTO@@) Value(EXPLICIT_RELOCS_AUTO)
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_NONE@@) Value(EXPLICIT_RELOCS_NONE)
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_ALWAYS@@) Value(EXPLICIT_RELOCS_ALWAYS)
+
+mexplicit-relocs=
+Target RejectNegative Joined Enum(explicit_relocs) Var(la_opt_explicit_relocs) Init(M_OPT_UNSET)
 Use %reloc() assembly operators.
 
+mexplicit-relocs
+Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
+Use %reloc() assembly operators (for backward compatibility).
+
 ; The code model option names for -mcmodel.
 Enum
 Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h
index 769efcb70fb..6e2a6987910 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -99,6 +99,12 @@ extern const char* loongarch_cmodel_strings[];
 #define CMODEL_EXTREME   5
 #define N_CMODEL_TYPES   6
 
+/* enum explicit_relocs */
+#define EXPLICIT_RELOCS_AUTO   0
+#define EXPLICIT_RELOCS_NONE   1
+#de

[PATCH 3/5] LoongArch: Use explicit relocs for TLS access with -mexplicit-relocs=auto

2023-10-19 Thread Xi Ruoyao
The linker does not know how to relax TLS access for LoongArch, so let's
emit machine instructions with explicit relocs for TLS.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
Return true for TLS symbol types if -mexplicit-relocs=auto.
(loongarch_call_tls_get_addr): Replace TARGET_EXPLICIT_RELOCS
with la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@tls_low): Remove
TARGET_EXPLICIT_RELOCS from insn condition.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: New
test.
* gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c: New
test.
---
 gcc/config/loongarch/loongarch.cc | 37 ---
 gcc/config/loongarch/loongarch.md |  2 +-
 .../explicit-relocs-auto-tls-ld-gd.c  |  9 +
 .../explicit-relocs-auto-tls-le-ie.c  |  6 +++
 4 files changed, 40 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index c12d77ea144..c782f571abc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1936,16 +1936,27 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
   if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO)
 return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS;
 
-  /* If we are performing LTO for a final link, and we have the linker
- plugin so we know the resolution of the symbols, then all GOT
- references are binding to external symbols or preemptable symbols.
- So the linker cannot relax them.  */
-  return (in_lto_p
- && !flag_incremental_link
- && HAVE_LTO_PLUGIN == 2
- && (!global_options_set.x_flag_use_linker_plugin
- || global_options.x_flag_use_linker_plugin)
- && type == SYMBOL_GOT_DISP);
+  switch (type)
+{
+  case SYMBOL_TLS_IE:
+  case SYMBOL_TLS_LE:
+  case SYMBOL_TLSGD:
+  case SYMBOL_TLSLDM:
+   /* The linker doesn't know how to relax TLS accesses.  */
+   return true;
+  case SYMBOL_GOT_DISP:
+   /* If we are performing LTO for a final link, and we have the
+  linker plugin so we know the resolution of the symbols, then
+  all GOT references are binding to external symbols or
+  preemptable symbols.  So the linker cannot relax them.  */
+   return (in_lto_p
+   && !flag_incremental_link
+   && HAVE_LTO_PLUGIN == 2
+   && (!global_options_set.x_flag_use_linker_plugin
+   || global_options.x_flag_use_linker_plugin));
+  default:
+   return false;
+}
 }
 
 /* Returns the number of instructions necessary to reference a symbol.  */
@@ -2753,7 +2764,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   start_sequence ();
 
-  if (TARGET_EXPLICIT_RELOCS)
+  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
 {
   /* Split tls symbol to high and low.  */
   rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
@@ -2918,7 +2929,7 @@ loongarch_legitimize_tls_address (rtx loc)
  tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM);
  tmp1 = gen_reg_rtx (Pmode);
  dest = gen_reg_rtx (Pmode);
- if (TARGET_EXPLICIT_RELOCS)
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
{
  tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_IE);
  tmp3 = gen_reg_rtx (Pmode);
@@ -2955,7 +2966,7 @@ loongarch_legitimize_tls_address (rtx loc)
  tmp1 = gen_reg_rtx (Pmode);
  dest = gen_reg_rtx (Pmode);
 
- if (TARGET_EXPLICIT_RELOCS)
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
{
  tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_LE);
  tmp3 = gen_reg_rtx (Pmode);
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index bec73f1bc91..695c8eb9a6f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2257,7 +2257,7 @@ (define_insn "@tls_low"
(unspec:P [(mem:P (lo_sum:P (match_operand:P 1 "register_operand" "r")
(match_operand:P 2 "symbolic_operand" "")))]
UNSPEC_TLS_LOW))]
-  "TARGET_EXPLICIT_RELOCS"
+  ""
   "addi.\t%0,%1,%L2"
   [(set_attr "type" "arith")
(set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
new file mode 100644
index 000..957ff98df62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd

[PATCH 5/5] LoongArch: Document -mexplicit-relocs={auto,none,always}

2023-10-19 Thread Xi Ruoyao
gcc/ChangeLog:

* doc/invoke.texi (-mexplicit-relocs=style): Document.
(-mexplicit-relocs): Document as an alias of
-mexplicit-relocs=always.
(-mno-explicit-relocs): Document as an alias of
-mexplicit-relocs=none.
(-mcmodel=extreme): Mention -mexplicit-relocs=always instead of
-mexplicit-relocs.
---
 gcc/doc/invoke.texi | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16c45843123..f4633715e2b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1038,7 +1038,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcond-move-float  -mno-cond-move-float
 -memcpy  -mno-memcpy -mstrict-align -mno-strict-align
 -mmax-inline-memcpy-size=@var{n}
--mexplicit-relocs -mno-explicit-relocs
+-mexplicit-relocs=@var{style} -mexplicit-relocs -mno-explicit-relocs
 -mdirect-extern-access -mno-direct-extern-access
 -mcmodel=@var{code-model}}
 
@@ -26194,26 +26194,39 @@ The text segment and data segment must be within 2GB addressing space.
 
 @item extreme
 This mode does not limit the size of the code segment and data segment.
-The @option{-mcmodel=extreme} option is incompatible with @option{-fplt} and
-@option{-mno-explicit-relocs}.
+The @option{-mcmodel=extreme} option is incompatible with @option{-fplt},
+and it requires @option{-mexplicit-relocs=always}.
 @end table
 The default code model is @code{normal}.
 
-@opindex mexplicit-relocs
-@opindex mno-explicit-relocs
-@item -mexplicit-relocs
-@itemx -mno-explicit-relocs
-Use or do not use assembler relocation operators when dealing with symbolic
+@item -mexplicit-relocs=@var{style}
+Set when to use assembler relocation operators when dealing with symbolic
 addresses.  The alternative is to use assembler macros instead, which may
-limit instruction scheduling but allow linker relaxation.  The default
+limit instruction scheduling but allow linker relaxation.
+With @option{-mexplicit-relocs=none} the assembler macros are always used;
+with @option{-mexplicit-relocs=always} the assembler relocation operators
+are always used; with @option{-mexplicit-relocs=auto} the compiler uses
+the relocation operators where linker relaxation cannot improve code
+quality, and assembler macros elsewhere.  The default
 value for the option is determined during GCC build-time by detecting
 corresponding assembler support:
-@code{-mno-explicit-relocs} if the assembler supports relaxation or it
-does not support relocation operators at all,
-@code{-mexplicit-relocs} otherwise.  This option is mostly useful for
+@option{-mexplicit-relocs=none} if the assembler does not support
+relocation operators at all,
+@option{-mexplicit-relocs=always} if the assembler supports relocation
+operators but does not support relaxation,
+@option{-mexplicit-relocs=auto} if the assembler supports both relocation
+operators and relaxation.  This option is mostly useful for
 debugging, or interoperation with assemblers different from the build-time
 one.
 
+@opindex mexplicit-relocs
+@item -mexplicit-relocs
+An alias of @option{-mexplicit-relocs=always} for backward compatibility.
+
+@opindex mno-explicit-relocs
+@item -mno-explicit-relocs
+An alias of @option{-mexplicit-relocs=none} for backward compatibility.
+
 @opindex mdirect-extern-access
 @item -mdirect-extern-access
 @itemx -mno-direct-extern-access
-- 
2.42.0



[PATCH 0/5] LoongArch: Better balance between relaxation and scheduling

2023-10-19 Thread Xi Ruoyao
For relaxation we now generate assembler macros for symbolic addresses
everywhere, but this limits scheduling, and there are known situations
where relaxation cannot improve the code.

1. When we are performing LTO during a final link and the linker plugin
is used, la.global won't be relaxed because it references an
external or preemptable symbol.
2. The linker currently does not relax la.tls.*.
3. For la.local + ld/st pairs, if the address is only used once,
emitting pcalau12i + ld/st is never worse than relying on linker
relaxation.

Add -mexplicit-relocs=auto to allow the compiler to use explicit relocs
for these cases, but assembler macros for other cases.  Use it as the
default if the assembler supports both explicit relocs and relaxation.
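The three cases can be summarised as a per-symbol decision for -mexplicit-relocs=auto. The sketch below is a simplification that uses hypothetical names (`enum sym_kind`, `auto_uses_explicit_relocs`). In the series, cases 1 and 2 live in loongarch_explicit_relocs_p, while case 3 is implemented by peephole2 patterns. The single-use check is therefore really made on the insn stream rather than passed in as a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified symbol classes; GCC's enum
   loongarch_symbol_type has more members.  */
enum sym_kind
{
  SYM_PCREL,     /* la.local-style PC-relative address */
  SYM_GOT_DISP,  /* la.global */
  SYM_TLS_IE,
  SYM_TLS_LE,
  SYM_TLSGD,
  SYM_TLSLDM
};

/* LTO_FINAL_LINK_WITH_PLUGIN: case 1 holds.  SINGLE_MEM_USE: the address
   feeds exactly one load or store (case 3), which the series handles only
   for -mcmodel=normal or medium (NORMAL_OR_MEDIUM).  */
static bool
auto_uses_explicit_relocs (enum sym_kind kind, bool lto_final_link_with_plugin,
                           bool single_mem_use, bool normal_or_medium)
{
  switch (kind)
    {
    case SYM_TLS_IE:
    case SYM_TLS_LE:
    case SYM_TLSGD:
    case SYM_TLSLDM:
      return true;  /* case 2: la.tls.* is not relaxed anyway */
    case SYM_GOT_DISP:
      return lto_final_link_with_plugin;  /* case 1 */
    case SYM_PCREL:
      return single_mem_use && normal_or_medium;  /* case 3 */
    }
  return false;
}
```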

LTO-bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (5):
  LoongArch: Add enum-style -mexplicit-relocs= option
  LoongArch: Use explicit relocs for GOT access when
-mexplicit-relocs=auto and LTO during a final link with linker
plugin
  LoongArch: Use explicit relocs for TLS access with
-mexplicit-relocs=auto
  LoongArch: Use explicit relocs for addresses only used for one load or
store with -mexplicit-relocs=auto and -mcmodel={normal,medium}
  LoongArch: Document -mexplicit-relocs={auto,none,always}

 .../loongarch/genopts/loongarch-strings   |   6 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  21 ++-
 gcc/config/loongarch/loongarch-def.h  |   6 +
 gcc/config/loongarch/loongarch-protos.h   |   1 +
 gcc/config/loongarch/loongarch-str.h  |   5 +
 gcc/config/loongarch/loongarch.cc |  75 --
 gcc/config/loongarch/loongarch.h  |   3 +
 gcc/config/loongarch/loongarch.md | 128 +-
 gcc/config/loongarch/loongarch.opt|  21 ++-
 gcc/config/loongarch/predicates.md|  15 +-
 gcc/doc/invoke.texi   |  37 +++--
 .../loongarch/explicit-relocs-auto-lto.c  |  26 
 ...-relocs-auto-single-load-store-no-anchor.c |   6 +
 .../explicit-relocs-auto-single-load-store.c  |  14 ++
 .../explicit-relocs-auto-tls-ld-gd.c  |   9 ++
 .../explicit-relocs-auto-tls-le-ie.c  |   6 +
 16 files changed, 343 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c

-- 
2.42.0



[PATCH 2/5] LoongArch: Use explicit relocs for GOT access when -mexplicit-relocs=auto and LTO during a final link with linker plugin

2023-10-19 Thread Xi Ruoyao
If we are performing LTO for a final link and the linker plugin is
enabled, then any GOT access that remains must resolve to a symbol
outside the link unit (otherwise the linker plugin would have told us
that the symbol resolves locally, and we would use PC-relative access
instead).

Produce machine instructions with explicit relocs instead of la.global
for better scheduling.
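The LTO condition above reduces to a predicate over a few pieces of compiler state. In this sketch the struct and function names are hypothetical. The fields mirror the globals tested in loongarch_explicit_relocs_p, and HAVE_LTO_PLUGIN == 2 is assumed to mean that the plugin is enabled by default. The key detail is that the plugin counts as in use unless the user explicitly passed -fno-use-linker-plugin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the patch consults.  */
struct lto_state
{
  bool in_lto;                 /* in_lto_p */
  bool incremental_link;       /* flag_incremental_link */
  int have_lto_plugin;         /* HAVE_LTO_PLUGIN (2 assumed: on by default) */
  bool use_linker_plugin_set;  /* global_options_set.x_flag_use_linker_plugin */
  bool use_linker_plugin;      /* global_options.x_flag_use_linker_plugin */
};

/* True when every remaining GOT reference must bind outside the link
   unit, so the linker could not relax it anyway.  */
static bool
got_refs_unrelaxable (const struct lto_state *s)
{
  /* Not mentioned on the command line means "use the default plugin".  */
  bool plugin_in_use = !s->use_linker_plugin_set || s->use_linker_plugin;

  return s->in_lto && !s->incremental_link
         && s->have_lto_plugin == 2 && plugin_in_use;
}
```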

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_explicit_relocs_p): Declare new function.
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
Implement.
(loongarch_symbol_insns): Call loongarch_explicit_relocs_p for
SYMBOL_GOT_DISP, instead of using TARGET_EXPLICIT_RELOCS.
(loongarch_split_symbol): Call loongarch_explicit_relocs_p for
deciding if return early, instead of using
TARGET_EXPLICIT_RELOCS.
(loongarch_output_move): Call loongarch_explicit_relocs_p
instead of using TARGET_EXPLICIT_RELOCS.
* config/loongarch/loongarch.md (*low): Remove
TARGET_EXPLICIT_RELOCS from insn condition.
(@ld_from_got): Likewise.
* config/loongarch/predicates.md (move_operand): Call
loongarch_explicit_relocs_p instead of using
TARGET_EXPLICIT_RELOCS.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-lto.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h   |  1 +
 gcc/config/loongarch/loongarch.cc | 34 +++
 gcc/config/loongarch/loongarch.md |  4 +--
 gcc/config/loongarch/predicates.md|  8 ++---
 .../loongarch/explicit-relocs-auto-lto.c  | 26 ++
 5 files changed, 59 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index 72ae9918b09..cb8fc36b086 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -220,4 +220,5 @@ extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int);
 extern tree loongarch_build_builtin_va_list (void);
 
 extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool);
+extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type);
 #endif /* ! GCC_LOONGARCH_PROTOS_H */
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 5df8b12ed92..c12d77ea144 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1925,6 +1925,29 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type)
   gcc_unreachable ();
 }
 
+/* If -mexplicit-relocs=auto, we use machine operations with reloc hints
+   for cases where the linker is unable to relax so we can schedule the
+   machine operations, otherwise use an assembler pseudo-op so the
+   assembler will generate R_LARCH_RELAX.  */
+
+bool
+loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
+{
+  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO)
+return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS;
+
+  /* If we are performing LTO for a final link, and we have the linker
+ plugin so we know the resolution of the symbols, then all GOT
+ references are binding to external symbols or preemptable symbols.
+ So the linker cannot relax them.  */
+  return (in_lto_p
+ && !flag_incremental_link
+ && HAVE_LTO_PLUGIN == 2
+ && (!global_options_set.x_flag_use_linker_plugin
+ || global_options.x_flag_use_linker_plugin)
+ && type == SYMBOL_GOT_DISP);
+}
+
 /* Returns the number of instructions necessary to reference a symbol.  */
 
 static int
@@ -1940,7 +1963,7 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, machine_mode mode)
 case SYMBOL_GOT_DISP:
   /* The constant will have to be loaded from the GOT before it
 is used in an address.  */
-  if (!TARGET_EXPLICIT_RELOCS && mode != MAX_MACHINE_MODE)
+  if (!loongarch_explicit_relocs_p (type) && mode != MAX_MACHINE_MODE)
return 0;
 
   return 3;
@@ -3038,7 +3061,7 @@ loongarch_symbol_extreme_p (enum loongarch_symbol_type type)
If so, and if LOW_OUT is nonnull, emit the high part and store the
low part in *LOW_OUT.  Leave *LOW_OUT unchanged otherwise.
 
-   Return false if build with '-mno-explicit-relocs'.
+   Return false if built with '-mexplicit-relocs=none'.
 
TEMP is as for loongarch_force_temporary and is used to load the high
part into a register.
@@ -3052,12 +3075,9 @@ loongarch_split_symbol (rtx temp, rtx addr, machine_mode mode, rtx *low_out)
 {
   enum loongarch_symbol_type symbol_type;
 
-  /* If build with '-mno-explicit-relocs', don't split symbol.  */
-  if (!TARGET_EXPLICIT_RELOCS)
-return false;
-
   if ((GET_CODE (addr) == HIGH && mode == MAX_MACHINE_MODE)
   || !loongarch_symbolic_constant_p (addr, &symbol_type)
+  ||

[PATCH 4/5] LoongArch: Use explicit relocs for addresses only used for one load or store with -mexplicit-relocs=auto and -mcmodel={normal, medium}

2023-10-19 Thread Xi Ruoyao
In these cases, if we use explicit relocs, we end up with 2
instructions:

pcalau12i t0, %pc_hi20(x)
ld.d t0, t0, %pc_lo12(x)

If we use the la.local pseudo-op, in the best case (x is within
+/- 2 MiB) we still have 2 instructions:

pcaddi   t0, %pcrel_20(x)
ld.d t0, t0, 0

If x is out of the range we'll have 3 instructions.  So for these cases
just emit machine instructions with explicit relocs.
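The instruction counts above follow from the relocation arithmetic. The sketch below assumes the LoongArch psABI definitions as we understand them. pcalau12i yields the 4 KiB page of PC plus si20 pages. %pc_hi20 rounds the target by 0x800 so that the sign-extended 12-bit %pc_lo12 offset in ld.d lands exactly on x. pcaddi adds si20 << 2 to PC, which is why it only reaches +/- 2 MiB. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* %pc_hi20(x) as seen from PC (assumed psABI formula):
   (((x + 0x800) & ~0xfff) - (PC & ~0xfff)) >> 12.  */
static int64_t
pc_hi20 (uint64_t pc, uint64_t x)
{
  uint64_t target_page = (x + 0x800) & ~(uint64_t) 0xfff;
  uint64_t pc_page = pc & ~(uint64_t) 0xfff;
  /* The difference is a multiple of 4096, so the division is exact.  */
  return (int64_t) (target_page - pc_page) / 4096;
}

/* %pc_lo12(x) as the signed 12-bit offset ld.d will sign-extend.  */
static int64_t
pc_lo12 (uint64_t x)
{
  int64_t lo = (int64_t) (x & 0xfff);
  return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* pcalau12i rd, si20: rd = (PC & ~0xfff) + (si20 << 12).  */
static uint64_t
pcalau12i (uint64_t pc, int64_t si20)
{
  return (pc & ~(uint64_t) 0xfff) + (uint64_t) (si20 * 4096);
}

/* Effective address of "ld.d rd, rj, si12".  */
static uint64_t
ld_address (uint64_t base, int64_t si12)
{
  return base + (uint64_t) si12;
}

/* pcaddi rd, si20: rd = PC + (si20 << 2), so a 20-bit signed si20 only
   covers [-2 MiB, 2 MiB - 4] in steps of 4 bytes.  */
static bool
pcaddi_reaches (uint64_t pc, uint64_t x)
{
  int64_t delta = (int64_t) (x - pc);
  return delta % 4 == 0 && delta >= -(1 << 21) && delta < (1 << 21);
}
```

So pcalau12i + ld.d always takes two instructions, while la.local needs pcaddi + ld.d only when pcaddi_reaches holds and a third instruction otherwise.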

gcc/ChangeLog:

* config/loongarch/predicates.md (symbolic_pcrel_operand): New
predicate.
* config/loongarch/loongarch.md (define_peephole2): Optimize
la.local + ld/st to pcalau12i + ld/st if the address is only used
once if -mexplicit-relocs=auto and -mcmodel=normal or medium.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-single-load-store.c:
New test.
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c:
New test.
---
 gcc/config/loongarch/loongarch.md | 122 ++
 gcc/config/loongarch/predicates.md|   7 +
 ...-relocs-auto-single-load-store-no-anchor.c |   6 +
 .../explicit-relocs-auto-single-load-store.c  |  14 ++
 4 files changed, 149 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 695c8eb9a6f..13473472171 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -65,6 +65,7 @@ (define_c_enum "unspec" [
 
   UNSPEC_LOAD_FROM_GOT
   UNSPEC_PCALAU12I
+  UNSPEC_PCALAU12I_GR
   UNSPEC_ORI_L_LO12
   UNSPEC_LUI_L_HI20
   UNSPEC_LUI_H_LO20
@@ -2297,6 +2298,16 @@ (define_insn "@pcalau12i"
   "pcalau12i\t%0,%%pc_hi20(%1)"
   [(set_attr "type" "move")])
 
+;; @pcalau12i may be used for sibcall so it has a strict constraint.  This
+;; allows any general register as the operand.
+(define_insn "@pcalau12i_gr"
+  [(set (match_operand:P 0 "register_operand" "=r")
+   (unspec:P [(match_operand:P 1 "symbolic_operand" "")]
+   UNSPEC_PCALAU12I_GR))]
+  ""
+  "pcalau12i\t%0,%%pc_hi20(%1)"
+  [(set_attr "type" "move")])
+
 (define_insn "@ori_l_lo12"
   [(set (match_operand:P 0 "register_operand" "=r")
(unspec:P [(match_operand:P 1 "register_operand" "r")
@@ -3748,6 +3759,117 @@ (define_insn "loongarch_crcc_w__w"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
+;; With normal or medium code models, if the only use of a pc-relative
+;; address is for loading or storing a value, then relying on linker
+;; relaxation is not better than emitting the machine instruction directly.
+;; Even if the la.local pseudo op can be relaxed, we get:
+;;
+;; pcaddi $t0, %pcrel_20(x)
+;; ld.d   $t0, $t0, 0
+;;
+;; There are still two instructions, same as using the machine instructions
+;; and explicit relocs:
+;;
+;; pcalau12i  $t0, %pc_hi20(x)
+;; ld.d   $t0, $t0, %pc_lo12(x)
+;;
+;; And if the pseudo op cannot be relaxed, we'll get a worse result (with
+;; 3 instructions).
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+   (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+   (mem:GPR (match_dup 0)))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+   || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1))))]
+  {
+emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+  })
+
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+   (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+   (mem:GPR (plus (match_dup 0)
+  (match_operand 3 "const_int_operand"))))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+   || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1))))]
+  {
+operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
+emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+  })
+
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+   (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+   (any_extend:GPR (mem:SUBDI (match_dup 0))))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+   || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2)
+   (any_extend:GPR (mem:SUBDI (lo_sum:P

Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-19 Thread Jason Merrill

On 10/19/23 09:39, Patrick Palka wrote:

On Tue, 17 Oct 2023, Marek Polacek wrote:


On Tue, Oct 17, 2023 at 04:49:52PM -0400, Jason Merrill wrote:

On 10/16/23 20:39, Marek Polacek wrote:

On Sat, Oct 14, 2023 at 01:13:22AM -0400, Jason Merrill wrote:

On 10/13/23 14:53, Marek Polacek wrote:

On Thu, Oct 12, 2023 at 09:41:43PM -0400, Jason Merrill wrote:

On 10/12/23 17:04, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.033s.

I've added some debug prints to make sure that the rest of cp_fold_r
is still performed as before.
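A toy model makes the blow-up concrete. It is plain C, not GCC's walk_tree or cp_fold_r. The callback for a conditional walks the arms itself, and if it then lets the walker descend as well, every nested conditional is walked twice per level. For a chain of k conditionals nested in the else arms, that gives 5 * 2^k - 4 visits. Walking each operand once and clearing `*walk_subtrees` keeps it at 3k + 1. The names `node`, `walk`, `visit_naive`, `visit_once` and `count_visits` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A leaf, or a conditional with operands (cond, then-arm, else-arm).  */
struct node
{
  int is_cond;
  struct node *op[3];
};

typedef void (*visit_fn) (struct node *, int *walk_subtrees, long *count);

/* Pre-order walker: call FN on N, then recurse into the operands unless
   FN cleared *walk_subtrees (loosely modelled on walk_tree).  */
static void
walk (struct node *n, visit_fn fn, long *count)
{
  int walk_subtrees = 1;
  fn (n, &walk_subtrees, count);
  if (walk_subtrees && n->is_cond)
    for (int i = 0; i < 3; i++)
      walk (n->op[i], fn, count);
}

/* Buggy shape: walk both arms by hand, then let the walker descend too.  */
static void
visit_naive (struct node *n, int *walk_subtrees, long *count)
{
  (void) walk_subtrees;  /* never cleared, so the arms are walked twice */
  ++*count;
  if (n->is_cond)
    {
      walk (n->op[1], visit_naive, count);
      walk (n->op[2], visit_naive, count);
    }
}

/* Fixed shape: walk every operand once, then stop the walker.  */
static void
visit_once (struct node *n, int *walk_subtrees, long *count)
{
  ++*count;
  if (n->is_cond)
    {
      for (int i = 0; i < 3; i++)
        walk (n->op[i], visit_once, count);
      *walk_subtrees = 0;
    }
}

/* Count visits for DEPTH conditionals nested in the else arms.  */
static long
count_visits (int depth, visit_fn fn)
{
  struct node leaf = { 0, { NULL, NULL, NULL } };
  struct node conds[64];
  struct node *cur = &leaf;
  long count = 0;

  assert (depth >= 0 && depth <= 64);
  for (int i = 0; i < depth; i++)
    {
      conds[i].is_cond = 1;
      conds[i].op[0] = &leaf;
      conds[i].op[1] = &leaf;
      conds[i].op[2] = cur;
      cur = &conds[i];
    }
  walk (cur, fn, &count);
  return count;
}
```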

PR c++/111660

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r) : Return
integer_zero_node instead of break;.
(cp_fold_immediate): Return true if cp_fold_immediate_r returned
error_mark_node.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/hog1.C: New test.
---
 gcc/cp/cp-gimplify.cc |  9 ++--
 gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 +++
 2 files changed, 82 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index bdf6e5f98ff..ca622ca169a 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
break;
   if (TREE_OPERAND (stmt, 1)
  && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
   if (TREE_OPERAND (stmt, 2)
  && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
   /* We're done here.  Don't clear *walk_subtrees here though: we're called
 from cp_fold_r and we must let it recurse on the expression with
 cp_fold.  */
-  break;
+  return integer_zero_node;


I'm concerned this will end up missing something like

1 ? 1 : ((1 ? 1 : 1), immediate())

as the integer_zero_node from the inner ?: will prevent walk_tree from
looking any farther.


You are right.  The line above works as expected, but

 1 ? 1 : ((1 ? 1 : id (42)), id (i));

shows the problem (when the expression isn't used as an initializer).


Maybe we want to handle COND_EXPR in cp_fold_r instead of here?


I hope this version is better.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.033s.


Is this number still accurate for this version?


It is.  I ran time(1) a few more times and the results were 0m0.033s - 0m0.035s.
That said, ...


This change seems algorithmically better than the current code, but still
problematic: if we have nested COND_EXPR A/B/C/D/E, it looks like we will
end up cp_fold_immediate_r walking the arms of E five times, once for each
COND_EXPR.


...this is accurate.  I should have addressed the redundant folding in v2
even though the compilation is pretty much immediate.

What I was thinking by handling COND_EXPR in cp_fold_r was to cp_fold_r walk
its subtrees (or cp_fold_immediate_r if it's clear from op0 that the branch
isn't taken) so we can clear *walk_subtrees and we don't fold_imm walk a
node more than once.


I agree I should do better here.  How's this, then?  I've added
debug_generic_expr to cp_fold_immediate_r to see if it gets the same
expr multiple times and it doesn't seem to.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.030s.

The ff_fold_i

Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-19 Thread Marek Polacek
On Thu, Oct 19, 2023 at 09:39:27AM -0400, Patrick Palka wrote:
> On Tue, 17 Oct 2023, Marek Polacek wrote:
> 
> > On Tue, Oct 17, 2023 at 04:49:52PM -0400, Jason Merrill wrote:
> > > On 10/16/23 20:39, Marek Polacek wrote:
> > > > -  if (cxx_dialect > cxx17)
> > > > -cp_fold_immediate_r (stmt_p, walk_subtrees, data);
> > > > +  if (cxx_dialect >= cxx20)
> > > > +{
> > > > +  /* Unfortunately we must handle code like
> > > > +  false ? bar () : 42
> > > > +where we have to check bar too.  The cp_fold call below could
> > > > +fold the ?: into a constant before we've checked it.  */
> > > > +  if (code == COND_EXPR)
> > > > +   {
> > > > + auto then_fn = cp_fold_r, else_fn = cp_fold_r;
> > > > + /* See if we can figure out if either of the branches is 
> > > > dead.  If it
> > > > +is, we don't need to do everything that cp_fold_r does.  */
> > > > + tree cond = maybe_constant_value (TREE_OPERAND (stmt, 0));
> > > > + if (integer_zerop (cond))
> > > > +   then_fn = cp_fold_immediate_r;
> > > > + else if (TREE_CODE (cond) == INTEGER_CST)
> > > > +   else_fn = cp_fold_immediate_r;
> > > > +
> > > > + cp_walk_tree (&TREE_OPERAND (stmt, 0), cp_fold_r, data, 
> > > > nullptr);
> > > 
> > > I wonder about doing this before maybe_constant_value, to hopefully reduce
> > > the redundant calculations?  OK either way.
> > 
> > Yeah, I was toying with that, I had
> > 
> >   foo() ? x : y
> > 
> > where foo was a constexpr function but the cp_fold_r on op0 didn't eval it
> > to a constant :(.
> 
> IIUC that's because cp_fold evaluates constexpr calls only with -O, so
> doing cp_fold_r before maybe_constant_value on the condition should
> still be desirable in that case?

Ah yeah, it depends on whether -fno-inline is on or off as described below.

OK if testing passes?  Thanks,

-- >8 --
This patch is an optimization tweak for cp_fold_r.  If we cp_fold_r the
COND_EXPR's op0 first, we may be able to evaluate it to a constant with -O,
saving some work.  cp_fold has:

3143 if (callee && DECL_DECLARED_CONSTEXPR_P (callee)
3144 && !flag_no_inline)
...
3151 r = maybe_constant_value (x, /*decl=*/NULL_TREE,

flag_no_inline is 1 for -O0:

1124   if (opts->x_optimize == 0)
1125 {
1126   /* Inlining does not work if not optimizing,
1127  so force it not to be done.  */
1128   opts->x_warn_inline = 0;
1129   opts->x_flag_no_inline = 1;
1130 }

but otherwise it's 0, and cp_fold will call maybe_constant_value on calls
to constexpr functions.
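The dead-arm selection in the COND_EXPR case can be modelled outside GCC as follows. `ArmWalk` and `choose_arm_walkers` are hypothetical names, and `std::nullopt` stands for a condition that did not fold to an INTEGER_CST. A dead arm is still walked, but only for the immediate-function check, because `false ? bar () : 42` must still check bar. The patch describes folding op0 first as an optimization that reduces the work left to maybe_constant_value. It is not meant to change which arm gets selected.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// How an arm of the conditional is walked (hypothetical labels).
enum class ArmWalk { FullFold, ImmediateOnly };

// Toy version of the dead-arm selection: `cond` is the value the condition
// folded to, or std::nullopt if it did not fold to an integer constant.
std::pair<ArmWalk, ArmWalk> choose_arm_walkers(std::optional<long> cond) {
  ArmWalk then_fn = ArmWalk::FullFold;
  ArmWalk else_fn = ArmWalk::FullFold;
  if (cond && *cond == 0)
    then_fn = ArmWalk::ImmediateOnly;   // condition is false: then-arm is dead
  else if (cond)
    else_fn = ArmWalk::ImmediateOnly;   // condition is nonzero: else-arm is dead
  return {then_fn, else_fn};
}
```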

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_r): cp_fold_r the first operand of a COND_EXPR
before calling maybe_constant_value.
---
 gcc/cp/cp-gimplify.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index a282c3930a3..746de86dfa6 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1151,14 +1151,16 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
{
  auto then_fn = cp_fold_r, else_fn = cp_fold_r;
  /* See if we can figure out if either of the branches is dead.  If it
-is, we don't need to do everything that cp_fold_r does.  */
+is, we don't need to do everything that cp_fold_r does.  If we
+cp_fold_r first, there's a chance it will evaluate the condition to
+a constant so maybe_constant_value won't have a lot of work to do.  */
+ cp_walk_tree (&TREE_OPERAND (stmt, 0), cp_fold_r, data, nullptr);
  tree cond = maybe_constant_value (TREE_OPERAND (stmt, 0));
  if (integer_zerop (cond))
then_fn = cp_fold_immediate_r;
  else if (TREE_CODE (cond) == INTEGER_CST)
else_fn = cp_fold_immediate_r;
 
- cp_walk_tree (&TREE_OPERAND (stmt, 0), cp_fold_r, data, nullptr);
  if (TREE_OPERAND (stmt, 1))
cp_walk_tree (&TREE_OPERAND (stmt, 1), then_fn, data,
  nullptr);

base-commit: 19cc4b9d74940f29c961e2a5a8b1fa84992d3d30
-- 
2.41.0



Re: [RFC] Add function attribute: null_terminated_string_arg(PARAM_IDX)

2023-10-19 Thread Andreas Schwab
On Okt 19 2023, David Malcolm wrote:

> +void
> +region_model::
> +check_one_function_attr_null_terminated_string_arg (const gcall *call,
> + tree callee_fndecl,
> + region_model_context *ctxt,
> + rdwr_map &rdwr_idx,
> + tree attr)
> +{
> +  gcc_assert (call);
> +  gcc_assert (callee_fndecl);
> +  gcc_assert (ctxt);
> +  gcc_assert (attr);
> +
> +  tree arg = TREE_VALUE (attr);
> +  if (!arg)
> +return;
> +
> +  /* Convert from 1-based to 0-based index.  */
> +  unsigned int arg_idx = TREE_INT_CST_LOW (TREE_VALUE (arg)) - 1;
> +
> +  /* If there's also an "access" attribute on the ptr param
> + for reading with a size param specified, then that size
> + limits the size of the possible read from the pointer.  */
> +  if (const attr_access* access = rdwr_idx.get (arg_idx))
> +if ((access->mode == access_read_only
> +  || access->mode == access_read_write)
> + && access->sizarg != UINT_MAX)
> +  {
> + /* First, check for a null-terminated string *without*
> +emitting emitting warnings (via a null context), to

-emitting

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-19 Thread Marek Polacek
On Thu, Oct 19, 2023 at 10:06:01AM -0400, Jason Merrill wrote:
> On 10/19/23 09:39, Patrick Palka wrote:
> > On Tue, 17 Oct 2023, Marek Polacek wrote:
> > 
> > > On Tue, Oct 17, 2023 at 04:49:52PM -0400, Jason Merrill wrote:
> > > > On 10/16/23 20:39, Marek Polacek wrote:
> > > > > On Sat, Oct 14, 2023 at 01:13:22AM -0400, Jason Merrill wrote:
> > > > > > On 10/13/23 14:53, Marek Polacek wrote:
> > > > > > > On Thu, Oct 12, 2023 at 09:41:43PM -0400, Jason Merrill wrote:
> > > > > > > > On 10/12/23 17:04, Marek Polacek wrote:
> > > > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > > > > 
> > > > > > > > > -- >8 --
> > > > > > > > > My recent patch introducing cp_fold_immediate_r caused 
> > > > > > > > > exponential
> > > > > > > > > compile time with nested COND_EXPRs.  The problem is that the 
> > > > > > > > > COND_EXPR
> > > > > > > > > case recursively walks the arms of a COND_EXPR, but after 
> > > > > > > > > processing
> > > > > > > > > both arms it doesn't end the walk; it proceeds to walk the
> > > > > > > > > sub-expressions of the outermost COND_EXPR, triggering again 
> > > > > > > > > walking
> > > > > > > > > the arms of the nested COND_EXPR, and so on.  This patch 
> > > > > > > > > brings the
> > > > > > > > > compile time down to about 0m0.033s.
> > > > > > > > > 
> > > > > > > > > I've added some debug prints to make sure that the rest of 
> > > > > > > > > cp_fold_r
> > > > > > > > > is still performed as before.
> > > > > > > > > 
> > > > > > > > > PR c++/111660
> > > > > > > > > 
> > > > > > > > > gcc/cp/ChangeLog:
> > > > > > > > > 
> > > > > > > > > * cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Return
> > > > > > > > > integer_zero_node instead of break;.
> > > > > > > > > (cp_fold_immediate): Return true if 
> > > > > > > > > cp_fold_immediate_r returned
> > > > > > > > > error_mark_node.
> > > > > > > > > 
> > > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > > > 
> > > > > > > > > * g++.dg/cpp0x/hog1.C: New test.
> > > > > > > > > ---
> > > > > > > > >  gcc/cp/cp-gimplify.cc |  9 ++--
> > > > > > > > >  gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 
> > > > > > > > > +++
> > > > > > > > >  2 files changed, 82 insertions(+), 4 deletions(-)
> > > > > > > > >  create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C
> > > > > > > > > 
> > > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > > > index bdf6e5f98ff..ca622ca169a 100644
> > > > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > > > @@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, 
> > > > > > > > > int *walk_subtrees, void *data_)
> > > > > > > > >   break;
> > > > > > > > >if (TREE_OPERAND (stmt, 1)
> > > > > > > > > && cp_walk_tree (&TREE_OPERAND (stmt, 1), 
> > > > > > > > > cp_fold_immediate_r, data,
> > > > > > > > > -nullptr))
> > > > > > > > > +nullptr) == error_mark_node)
> > > > > > > > >   return error_mark_node;
> > > > > > > > >if (TREE_OPERAND (stmt, 2)
> > > > > > > > > && cp_walk_tree (&TREE_OPERAND (stmt, 2), 
> > > > > > > > > cp_fold_immediate_r, data,
> > > > > > > > > -nullptr))
> > > > > > > > > +nullptr) == error_mark_node)
> > > > > > > > >   return error_mark_node;
> > > > > > > > >/* We're done here.  Don't clear *walk_subtrees 
> > > > > > > > > here though: we're called
> > > > > > > > >from cp_fold_r and we must let it recurse on the 
> > > > > > > > > expression with
> > > > > > > > >cp_fold.  */
> > > > > > > > > -  break;
> > > > > > > > > +  return integer_zero_node;
> > > > > > > > 
> > > > > > > > I'm concerned this will end up missing something like
> > > > > > > > 
> > > > > > > > 1 ? 1 : ((1 ? 1 : 1), immediate())
> > > > > > > > 
> > > > > > > > as the integer_zero_node from the inner ?: will prevent 
> > > > > > > > walk_tree from
> > > > > > > > looking any farther.
> > > > > > > 
> > > > > > > You are right.  The line above works as expected, but
> > > > > > > 
> > > > > > >  1 ? 1 : ((1 ? 1 : id (42)), id (i));
> > > > > > > 
> > > > > > > shows the problem (when the expression isn't used as an 
> > > > > > > initializer).
> > > > > > > 
> > > > > > > > Maybe we want to handle COND_EXPR in cp_fold_r instead of here?
> > > > > > > 
> > > > > > > I hope this version is better.
> > > > > > > 
> > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > > 
> > > > > > > -- >8 --
> > > > > > > My recent patch introducing cp_fold_immediate_r caused exponential
> > > > > > > compile time with nested COND_EXPRs.  The problem is that the 
> > > > > > > COND_EXPR
> > > > > > > case recursively walks the arms of a COND

Re: [PATCH 2/1] c++: more non-static memfn call dependence cleanup [PR106086]

2023-10-19 Thread Jason Merrill

On 10/12/23 14:49, Patrick Palka wrote:

On Tue, 26 Sep 2023, Patrick Palka wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?

-- >8 --

This follow-up patch removes some more repetition of the type-dependent


On second thought there's no good reason to split these patches into a two
part series, so here's a single squashed patch:


OK.


-- >8 --

Subject: [PATCH] c++: non-static memfn call dependence cleanup [PR106086]

In cp_parser_postfix_expression and in the CALL_EXPR case of
tsubst_copy_and_build, we essentially repeat the type-dependent and
COMPONENT_REF callee cases of finish_call_expr.  This patch deduplicates
this logic by making both spots consistently go through finish_call_expr.

This allows us to easily fix PR106086 -- which is about us neglecting to
capture 'this' when we resolve a use of a non-static member function of
the current instantiation only at lambda regeneration time -- by moving
the call to maybe_generic_this_capture from the parser to finish_call_expr
so that we consider capturing 'this' at regeneration time as well.
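The following shows the general shape of code where 'this' capture is decided late. It is an illustration written for this summary, not the PR's testcase or lambda-generic-this5.C. Inside a member function of a class template, whether the generic lambda's call `f(x)` needs `this` depends on which overload of `f` is selected. That is only known once the lambda is instantiated with a concrete type for `x`.

```cpp
#include <cassert>

template <class T>
struct A {
  static int f(int) { return 1; }   // static overload: no `this` needed
  int f(double) { return 2; }       // non-static overload: needs `this`

  int g() {
    // Whether f(x) uses `this` is known only once x's type is, i.e. when
    // the generic lambda is instantiated; [&] allows `this` to be captured.
    auto l = [&](auto x) { return f(x); };
    return l(0) + l(0.0);           // picks f(int), then f(double)
  }
};
```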

PR c++/106086

gcc/cp/ChangeLog:

* parser.cc (cp_parser_postfix_expression): Consolidate three
calls to finish_call_expr, one to build_new_method_call and
one to build_min_nt_call_vec into one call to finish_call_expr.
Don't call maybe_generic_this_capture here.
* pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Remove
COMPONENT_REF callee handling.
(type_dependent_expression_p): Use t_d_object_e_p instead of
t_d_e_p for COMPONENT_REF and OFFSET_REF.
* semantics.cc (finish_call_expr): In the type-dependent case,
call maybe_generic_this_capture here instead.

gcc/testsuite/ChangeLog:

* g++.dg/template/crash127.C: Expect additional error due to
being able to check the member access expression ahead of time.
Strengthen the test by not instantiating the class template.
* g++.dg/cpp1y/lambda-generic-this5.C: New test.
---
  gcc/cp/parser.cc  | 52 +++
  gcc/cp/pt.cc  | 27 +-
  gcc/cp/semantics.cc   | 12 +++--
  .../g++.dg/cpp1y/lambda-generic-this5.C   | 22 
  gcc/testsuite/g++.dg/template/crash127.C  |  3 +-
  5 files changed, 38 insertions(+), 78 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-this5.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f3abae716fe..b00ef36b831 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -8047,54 +8047,12 @@ cp_parser_postfix_expression (cp_parser *parser, bool address_p, bool cast_p,
close_paren_loc);
iloc_sentinel ils (combined_loc);
  
-	if (TREE_CODE (postfix_expression) == COMPONENT_REF)

- {
-   tree instance = TREE_OPERAND (postfix_expression, 0);
-   tree fn = TREE_OPERAND (postfix_expression, 1);
-
-   if (processing_template_decl
-   && (type_dependent_object_expression_p (instance)
-   || (!BASELINK_P (fn)
-   && TREE_CODE (fn) != FIELD_DECL)
-   || type_dependent_expression_p (fn)
-   || any_type_dependent_arguments_p (args)))
- {
-   maybe_generic_this_capture (instance, fn);
-   postfix_expression
- = build_min_nt_call_vec (postfix_expression, args);
- }
-   else if (BASELINK_P (fn))
- {
- postfix_expression
-   = (build_new_method_call
-  (instance, fn, &args, NULL_TREE,
-   (idk == CP_ID_KIND_QUALIFIED
-? LOOKUP_NORMAL|LOOKUP_NONVIRTUAL
-: LOOKUP_NORMAL),
-   /*fn_p=*/NULL,
-   complain));
- }
-   else
- postfix_expression
-   = finish_call_expr (postfix_expression, &args,
-   /*disallow_virtual=*/false,
-   /*koenig_p=*/false,
-   complain);
- }
-   else if (TREE_CODE (postfix_expression) == OFFSET_REF
-|| TREE_CODE (postfix_expression) == MEMBER_REF
-|| TREE_CODE (postfix_expression) == DOTSTAR_EXPR)
+   if (TREE_CODE (postfix_expression) == OFFSET_REF
+   || TREE_CODE (postfix_expression) == MEMBER_REF
+   || TREE_CODE (postfix_expression) == DOTSTAR_EXPR)
  postfix_expression = (build_offset_ref_call_from_tree
(postfix_expression, &args,
 complain));
-   else if (idk == CP_ID_KIND_QUALIFIED)

[COMMITTED] ada: Simplify "not Present" with "No"

2023-10-19 Thread Marc Poulhiès
From: Piotr Trojanek 

gcc/ada/

* exp_aggr.adb (Expand_Container_Aggregate): Simplify with "No".

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index e5f36326600..340c8c68465 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -7288,7 +7288,7 @@ package body Exp_Aggr is
  --  Iterated component association. Discard
  --  positional insertion procedure.
 
- if not Present (Iterator_Specification (Comp)) then
+ if No (Iterator_Specification (Comp)) then
 Add_Named_Subp := Assign_Indexed_Subp;
 Add_Unnamed_Subp := Empty;
  end if;
-- 
2.42.0



[COMMITTED] ada: Seize opportunity to reuse List_Length

2023-10-19 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch is intended as a readability improvement. It doesn't
change the behavior of the compiler.

gcc/ada/

* sem_ch3.adb (Constrain_Array): Replace manual list length
computation by call to List_Length.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch3.adb | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index c79d323395f..e92b46fa6f6 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -13809,7 +13809,7 @@ package body Sem_Ch3 is
   Suffix  : Character)
is
   C : constant Node_Id := Constraint (SI);
-  Number_Of_Constraints : Nat := 0;
+  Number_Of_Constraints : constant Nat := List_Length (Constraints (C));
   Index : Node_Id;
   S, T  : Entity_Id;
   Constraint_OK : Boolean := True;
@@ -13835,12 +13835,6 @@ package body Sem_Ch3 is
  Constraint_OK := False;
 
   else
- S := First (Constraints (C));
- while Present (S) loop
-Number_Of_Constraints := Number_Of_Constraints + 1;
-Next (S);
- end loop;
-
  --  In either case, the index constraint must provide a discrete
  --  range for each index of the array type and the type of each
  --  discrete range must be the same as that of the corresponding
-- 
2.42.0



[COMMITTED] ada: Document gnatbind -Q switch

2023-10-19 Thread Marc Poulhiès
From: Patrick Bernardi 

Add documentation for the -Q gnatbind switch in GNAT User's Guide and
improve gnatbind's help output for the switch to emphasize that it adds the
requested number of stacks to the secondary stack pool generated by the
binder.

gcc/ada/

* bindusg.adb (Display): Make it clear -Q adds to the number of
secondary stacks generated by the binder.
* doc/gnat_ugn/building_executable_programs_with_gnat.rst:
Document the -Q gnatbind switch and fix references to old
runtimes.
* gnat-style.texi: Regenerate.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/bindusg.adb   |  2 +-
 ...building_executable_programs_with_gnat.rst | 29 ++---
 gcc/ada/gnat-style.texi   |  4 +-
 gcc/ada/gnat_rm.texi  |  4 +-
 gcc/ada/gnat_ugn.texi | 41 ++-
 5 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/gcc/ada/bindusg.adb b/gcc/ada/bindusg.adb
index fca425b2244..89a6caedf31 100644
--- a/gcc/ada/bindusg.adb
+++ b/gcc/ada/bindusg.adb
@@ -234,7 +234,7 @@ package body Bindusg is
   --  Line for Q switch
 
   Write_Line
-("  -Qnnn Generate nnn default-sized secondary stacks");
+("  -Qnnn Generate nnn additional default-sized secondary stacks");
 
   --  Line for -r switch
 
diff --git a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
index 6e80163d6d4..a708ef4b995 100644
--- a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
+++ b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
@@ -6524,12 +6524,12 @@ be presented in subsequent sections.
   determines the initial size of the secondary stack for each task and the
   smallest amount the secondary stack can grow by.
 
-  For Ravenscar, ZFP, and Cert run-times the size of the secondary stack is
-  fixed. This switch can be used to change the default size of these stacks.
-  The default secondary stack size can be overridden on a per-task basis if
-  individual tasks have different secondary stack requirements. This is
-  achieved through the Secondary_Stack_Size aspect that takes the size of the
-  secondary stack in bytes.
+  For Light, Light-Tasking, and Embedded run-times the size of the secondary
+  stack is fixed. This switch can be used to change the default size of these
+  stacks. The default secondary stack size can be overridden on a per-task
+  basis if individual tasks have different secondary stack requirements. This
+  is achieved through the Secondary_Stack_Size aspect, which takes the size of
+  the secondary stack in bytes.
 
 .. index:: -e  (gnatbind)
 
@@ -6739,6 +6739,23 @@ be presented in subsequent sections.
   Generate binder file suitable for CodePeer.
 
 
+.. index:: -Q  (gnatbind)
+
+:switch:`-Q{nnn}`
+  Generate ``nnn`` additional default-sized secondary stacks.
+
+  Tasks declared at the library level that use default-size secondary stacks
+  have their secondary stacks allocated from a pool of stacks generated by
+  gnatbind. This allows the default secondary stack size to be quickly changed
+  by rebinding the application.
+
+  While the binder sizes this pool to match the number of such tasks defined in
+  the application, the pool size may need to be increased with the :switch:`-Q`
+  switch to accommodate foreign threads registered with the Light run-time. For
+  more information, please see the *The Primary and Secondary Stack* chapter in
+  the *GNAT User’s Guide Supplement for Cross Platforms*.
+
+
   .. index:: -R  (gnatbind)
 
 :switch:`-R`
diff --git a/gcc/ada/gnat-style.texi b/gcc/ada/gnat-style.texi
index bcdc160..33bb1886985 100644
--- a/gcc/ada/gnat-style.texi
+++ b/gcc/ada/gnat-style.texi
@@ -3,7 +3,7 @@
 @setfilename gnat-style.info
 @documentencoding UTF-8
 @ifinfo
-@*Generated by Sphinx 5.2.3.@*
+@*Generated by Sphinx 7.2.6.@*
 @end ifinfo
 @settitle GNAT Coding Style A Guide for GNAT Developers
 @defindex ge
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-GNAT Coding Style: A Guide for GNAT Developers , May 09, 2023
+GNAT Coding Style: A Guide for GNAT Developers , Oct 16, 2023
 
 AdaCore
 
diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
index b7e098331e9..9a6a0170ae8 100644
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -3,7 +3,7 @@
 @setfilename gnat_rm.info
 @documentencoding UTF-8
 @ifinfo
-@*Generated by Sphinx 5.2.3.@*
+@*Generated by Sphinx 7.2.6.@*
 @end ifinfo
 @settitle GNAT Reference Manual
 @defindex ge
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-GNAT Reference Manual , Jul 17, 2023
+GNAT Reference Manual , Oct 16, 2023
 
 AdaCore
 
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 1562bee1f64..897153bcfc7 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -

[COMMITTED] ada: Add pragma Annotate for GNATcheck exemptions

2023-10-19 Thread Marc Poulhiès
From: Sheri Bernstein 

Exempt the GNATcheck rule "Unassigned_OUT_Parameters"
with the rationale "the OUT parameter is assigned by component".

gcc/ada/

* libgnat/s-imguti.adb (Set_Decimal_Digits): Add pragma to exempt
Unassigned_OUT_Parameters.
(Set_Floating_Invalid_Value): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-imguti.adb | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/ada/libgnat/s-imguti.adb b/gcc/ada/libgnat/s-imguti.adb
index 4b9e27a7d8f..cb081108950 100644
--- a/gcc/ada/libgnat/s-imguti.adb
+++ b/gcc/ada/libgnat/s-imguti.adb
@@ -37,6 +37,8 @@ package body System.Img_Util is
-- Set_Decimal_Digits --

 
+   pragma Annotate (Gnatcheck, Exempt_On, "Unassigned_OUT_Parameters",
+"the OUT parameter is assigned by component");
procedure Set_Decimal_Digits
  (Digs  : in out String;
   NDigs : Natural;
@@ -47,6 +49,8 @@ package body System.Img_Util is
   Aft   : Natural;
   Exp   : Natural)
is
+  pragma Annotate (Gnatcheck, Exempt_Off, "Unassigned_OUT_Parameters");
+
   pragma Assert (NDigs >= 1);
   pragma Assert (Digs'First = 1);
   pragma Assert (Digs'First < Digs'Last);
@@ -413,6 +417,8 @@ package body System.Img_Util is
-- Set_Floating_Invalid_Value --

 
+   pragma Annotate (Gnatcheck, Exempt_On, "Unassigned_OUT_Parameters",
+"the OUT parameter is assigned by component");
procedure Set_Floating_Invalid_Value
  (V: Floating_Invalid_Value;
   S: out String;
@@ -421,6 +427,8 @@ package body System.Img_Util is
   Aft  : Natural;
   Exp  : Natural)
is
+  pragma Annotate (Gnatcheck, Exempt_Off, "Unassigned_OUT_Parameters");
+
   procedure Set (C : Character);
   --  Sets character C in output buffer
 
-- 
2.42.0



[COMMITTED] ada: Refactor code to remove GNATcheck violation

2023-10-19 Thread Marc Poulhiès
From: Sheri Bernstein 

Rewrite a for loop containing an exit (which violates the GNATcheck
rule Exits_From_Conditional_Loops) to use a while loop whose
condition carries the exit criterion.
Also, move the special case for the first iteration so that it comes
before the loop.
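The refactoring has two parts: peeling the first iteration out of the loop and moving the exit test into a while condition. It is sketched here in C++ for consistency with the other sketches in this digest. The division rounds are a hypothetical stand-in for Scaled_Divide, not the s-imagef.adb logic. The point is only that both loop forms produce the same sequence.

```cpp
#include <cassert>
#include <vector>

// Original shape: a for loop with an early exit and a J = 1 special case.
std::vector<long> rounds_for_exit(long x, long first_div, long div, int n) {
  std::vector<long> q;
  for (int j = 1; j <= n; ++j) {
    if (x == 0)
      break;                                // "exit when XX = 0"
    long d = (j == 1) ? first_div : div;    // first round is special
    q.push_back(x / d);
    x %= d;
  }
  return q;
}

// Refactored shape: the first round is peeled out of the loop and the exit
// criterion moves into the while condition.
std::vector<long> rounds_while(long x, long first_div, long div, int n) {
  std::vector<long> q;
  if (n >= 1 && x != 0) {                   // peeled first round
    q.push_back(x / first_div);
    x %= first_div;
  }
  int j = 2;
  while (j <= n && x != 0) {                // exit criterion in the condition
    q.push_back(x / div);
    x %= div;
    ++j;
  }
  return q;
}
```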

gcc/ada/

* libgnat/s-imagef.adb (Set_Image_Fixed): Refactor loop.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-imagef.adb | 75 +++-
 1 file changed, 40 insertions(+), 35 deletions(-)

diff --git a/gcc/ada/libgnat/s-imagef.adb b/gcc/ada/libgnat/s-imagef.adb
index 3f6bfa20cb2..6194a3163de 100644
--- a/gcc/ada/libgnat/s-imagef.adb
+++ b/gcc/ada/libgnat/s-imagef.adb
@@ -307,6 +307,9 @@ package body System.Image_F is
   YY : Int := Y;
   --  First two operands of the scaled divide
 
+  J : Natural;
+  --  Loop index
+
begin
   --  Set the first character like Image
 
@@ -317,59 +320,61 @@ package body System.Image_F is
  Ndigs := 0;
   end if;
 
-  for J in 1 .. N loop
- exit when XX = 0;
+  --  First round of scaled divide
 
+  if XX /= 0 then
  Scaled_Divide (XX, YY, Z, Q, R => XX, Round => False);
+ if Q /= 0 then
+Set_Image_Integer (Q, Digs, Ndigs);
+ end if;
 
- if J = 1 then
-if Q /= 0 then
-   Set_Image_Integer (Q, Digs, Ndigs);
-end if;
-
-Scale := Scale + D;
+ Scale := Scale + D;
 
---  Prepare for next round, if any
+ --  Prepare for next round, if any
 
-YY := 10**Maxdigs;
+ YY := 10**Maxdigs;
+  end if;
 
- else
-pragma Assert (-10**Maxdigs < Q and then Q < 10**Maxdigs);
+  J := 2;
+  while J <= N and then XX /= 0 loop
+ Scaled_Divide (XX, YY, Z, Q, R => XX, Round => False);
 
-Len := 0;
-Set_Image_Integer (abs Q, Buf, Len);
+ pragma Assert (-10**Maxdigs < Q and then Q < 10**Maxdigs);
 
-pragma Assert (1 <= Len and then Len <= Maxdigs);
+ Len := 0;
+ Set_Image_Integer (abs Q, Buf, Len);
 
---  If no character but the space has been written, write the
---  minus if need be, since Set_Image_Integer did not do it.
+ pragma Assert (1 <= Len and then Len <= Maxdigs);
 
-if Ndigs <= 1 then
-   if Q /= 0 then
-  if Ndigs = 0 then
- Digs (1) := '-';
-  end if;
+ --  If no character but the space has been written, write the
+ --  minus if need be, since Set_Image_Integer did not do it.
 
-  Digs (2 .. Len + 1) := Buf (1 .. Len);
-  Ndigs := Len + 1;
+ if Ndigs <= 1 then
+if Q /= 0 then
+   if Ndigs = 0 then
+  Digs (1) := '-';
end if;
 
---  Or else pad the output with zeroes up to Maxdigs
+   Digs (2 .. Len + 1) := Buf (1 .. Len);
+   Ndigs := Len + 1;
+end if;
 
-else
-   for K in 1 .. Maxdigs - Len loop
-  Digs (Ndigs + K) := '0';
-   end loop;
+ --  Or else pad the output with zeroes up to Maxdigs
 
-   for K in 1 .. Len loop
-  Digs (Ndigs + Maxdigs - Len + K) := Buf (K);
-   end loop;
+ else
+for K in 1 .. Maxdigs - Len loop
+   Digs (Ndigs + K) := '0';
+end loop;
 
-   Ndigs := Ndigs + Maxdigs;
-end if;
+for K in 1 .. Len loop
+   Digs (Ndigs + Maxdigs - Len + K) := Buf (K);
+end loop;
 
-Scale := Scale + Maxdigs;
+Ndigs := Ndigs + Maxdigs;
  end if;
+
+ Scale := Scale + Maxdigs;
+ J := J + 1;
   end loop;
 
   --  If no digit was output, this is zero
-- 
2.42.0



Re: [PATCH 06/11] haifa-sched: Allow for NOTE_INSN_DELETED at start of epilogue

2023-10-19 Thread Jeff Law




On 10/17/23 14:48, Alex Coplan wrote:

haifa-sched.cc:remove_notes asserts that it lands on a real (non-note)
insn after advancing past NOTE_INSN_EPILOGUE_BEG, but with the upcoming
post-RA aarch64 load pair pass enabled, we can land on
NOTE_INSN_DELETED.

This patch adjusts remove_notes to remove these if they occur at the
start of the epilogue instead of asserting.
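Below is a toy model of the adjusted behaviour. `Kind` and the function name are hypothetical, not haifa-sched.cc code. Deleted-insn notes directly after the epilogue-begin note are removed, after which the invariant that the old assertion checked holds again.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-ins for an insn stream, not GCC's rtx_insn.
enum class Kind { Insn, NoteEpilogueBeg, NoteDeleted };

// After each epilogue-begin note, drop any deleted-insn notes that follow
// it instead of assuming the next element is already a real insn.
void remove_deleted_after_epilogue_beg(std::list<Kind> &insns) {
  for (auto it = insns.begin(); it != insns.end(); ++it) {
    if (*it != Kind::NoteEpilogueBeg)
      continue;
    auto next = std::next(it);
    while (next != insns.end() && *next == Kind::NoteDeleted)
      next = insns.erase(next);
    // The condition the original assertion expected now holds.
    assert(next != insns.end() && *next == Kind::Insn);
  }
}
```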

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* haifa-sched.cc (remove_notes): Allow for NOTE_INSN_DELETED at
the start of the epilogue, remove these.

One could argue that the pass should have actually deleted the insn
rather than just turning it into a NOTE_INSN_DELETED.  Is there some
reason that's not done?  A NOTE_INSN_DELETED carries no useful information.




+ /* Skip over any NOTE_INSN_DELETED at the start of the epilogue.
+  */


Don't bring the close comment down to a new line.  If it fits, put it on
the last line of the actual comment.  Otherwise bring down part of the
comment so that we don't have the close comment on a line by itself.


Jeff


[PATCH v2 11/11] aarch64: Add new load/store pair fusion pass

2023-10-19 Thread Alex Coplan
Hi,

This v2 fixes a significant compile-time hog in the original patch
for the pass posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633355.html

After seeing a large compile-time regression when compiling 502.gcc_r
from SPEC CPU 2017 with the early ldp pass turned on, I found that the
benchmark's insn-attrtab.c dominated the compile time, and moreover that
compile time for that file increased by 3.79x when enabling the early
ldp fusion pass at -O.

Running cc1 under a profiler revealed that ~44% of the entire profile
was in rtx_equal_p (inlined via cleanup_tombstones into ldp_fusion_bb).
I missed that we can skip running cleanup_tombstones entirely in the
(common) case that we never emitted a tombstone insn for a given bb.

This patch implements that optimization.  It reduces the overhead of
the early ldp pass on that file to around 1%, which seems a more
acceptable overhead for an additional pass.
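The change threads a per-block flag through the fusion helpers by reference. The flag is OR-ed in only after a change group is committed, so the costly scan can be skipped whenever it stays false. The sketch below has the same shape, but its names are illustrative and do not match aarch64-ldp-fusion.cc exactly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-basic-block state.
struct BlockState {
  bool emitted_tombstone = false;
  int cleanup_scans = 0;

  void cleanup_tombstones() {
    if (!emitted_tombstone)
      return;                   // common case: nothing to clean up
    ++cleanup_scans;            // stands in for the expensive scan
  }
};

struct Attempt {
  bool needs_tombstone;
  bool succeeds;
};

// Stand-in for a pair-fusion attempt that reports tombstones via a bool&.
bool toy_fuse_pair(const Attempt &a, bool &emitted_tombstone_p) {
  bool have_tombstone_p = a.needs_tombstone;
  if (!a.succeeds)
    return false;               // abandoned changes leave the flag alone
  emitted_tombstone_p |= have_tombstone_p;
  return true;
}

int run_block(const std::vector<Attempt> &attempts) {
  BlockState bb;
  for (const Attempt &a : attempts)
    toy_fuse_pair(a, bb.emitted_tombstone);
  bb.cleanup_tombstones();
  return bb.cleanup_scans;
}
```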

Incrementally, the change is as follows:

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index e5de4bbb3f5..f1621c9a384 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -148,7 +148,7 @@ struct ldp_bb_info
   static const size_t obstack_alignment = sizeof (void *);
   bb_info *m_bb;
 
-  ldp_bb_info (bb_info *bb) : m_bb (bb)
+  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
   {
 obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
obstack_alignment, obstack_chunk_alloc,
@@ -164,7 +164,10 @@ struct ldp_bb_info
   inline void cleanup_tombstones ();
 
 private:
+  // Did we emit a tombstone insn for this bb?
+  bool m_emitted_tombstone;
   obstack m_obstack;
+
   inline splay_tree_node *node_alloc (access_record *);
 
   template
@@ -1006,7 +1009,8 @@ fuse_pair (bool load_p,
   insn_info *i1,
   insn_info *i2,
   base_cand &base,
-  const insn_range_info &move_range)
+  const insn_range_info &move_range,
+  bool &emitted_tombstone_p)
 {
   auto attempt = crtl->ssa->new_change_attempt ();
 
@@ -1021,6 +1025,9 @@ fuse_pair (bool load_p,
insn_change::DELETE);
 };
 
+  // Are we using a tombstone insn for this pair?
+  bool have_tombstone_p = false;
+
   insn_info *first = (*i1 < *i2) ? i1 : i2;
   insn_info *second = (first == i1) ? i2 : i1;
 
@@ -1217,6 +1224,7 @@ fuse_pair (bool load_p,
 gcc_assert (validate_change (rti, &REG_NOTES (rti),
   NULL_RTX, true));
  change->new_uses = use_array (nullptr, 0);
+ have_tombstone_p = true;
}
  gcc_assert (change->new_defs.is_valid ());
  gcc_assert (change->new_uses.is_valid ());
@@ -1283,6 +1291,7 @@ fuse_pair (bool load_p,
 
   confirm_change_group ();
   crtl->ssa->change_insns (changes);
+  emitted_tombstone_p |= have_tombstone_p;
   return true;
 }
 
@@ -1702,7 +1711,8 @@ try_fuse_pair (bool load_p,
   unsigned access_size,
   insn_info *i1,
   insn_info *i2,
-  base_info binfo)
+  base_info binfo,
+  bool &emitted_tombstone_p)
 {
   if (dump_file)
 fprintf (dump_file, "analyzing pair (load=%d): (%d,%d)\n",
@@ -1991,7 +2001,7 @@ try_fuse_pair (bool load_p,
   range.first->uid (), range.last->uid ());
 }
 
-  return fuse_pair (load_p, i1, i2, *base, range);
+  return fuse_pair (load_p, i1, i2, *base, range, emitted_tombstone_p);
 }
 
 // Erase [l.begin (), i] inclusive, respecting iterator order.
@@ -2047,7 +2057,8 @@ merge_pairs (insn_iter_t l_begin,
 hash_set  &to_delete,
 bool load_p,
 unsigned access_size,
-base_info binfo)
+base_info binfo,
+bool &emitted_tombstone_p)
 {
   auto iter_l = l_begin;
   auto iter_r = r_begin;
@@ -2076,7 +2087,8 @@ merge_pairs (insn_iter_t l_begin,
   bool update_r = false;
 
   result = try_fuse_pair (load_p, access_size,
- *iter_l, *iter_r, binfo);
+ *iter_l, *iter_r, binfo,
+ emitted_tombstone_p);
   if (result)
{
  update_l = update_r = true;
@@ -2153,7 +2165,8 @@ ldp_bb_info::try_form_pairs (insn_list_t *left_orig,
   merge_pairs (left_orig->begin (), left_orig->end (),
   right_copy.begin (), right_copy.end (),
   *left_orig, right_copy,
-  to_delete, load_p, access_size, binfo);
+  to_delete, load_p, access_size, binfo,
+  m_emitted_tombstone);
 
   // If we formed all right candidates into pairs,
   // then we can skip the next iteration.
@@ -2206,6 +2219,10 @@ ldp_bb_info::transform_for_base (base_info binfo,
 void
 ldp_bb_info::cleanup_tombstones ()
 {
+  // No need to do anyth

Re: [PATCH 2/1] c++: more non-static memfn call dependence cleanup [PR106086]

2023-10-19 Thread Patrick Palka
On Thu, 19 Oct 2023, Jason Merrill wrote:

> On 10/12/23 14:49, Patrick Palka wrote:
> > On Tue, 26 Sep 2023, Patrick Palka wrote:
> > 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > > for trunk?
> > > 
> > > -- >8 --
> > > 
> > > This follow-up patch removes some more repetition of the type-dependent
> > 
> > On second thought there's no good reason to split these patches into a two
> > part series, so here's a single squashed patch:
> 
> OK.

Thanks.  It turns out this patch slightly depends on the
NON_DEPENDENT_EXPR removal patches, since without them finish_call_expr
in a template context will undesirably do build_non_dependent_expr on
the fn/args before its COMPONENT_REF branch that dispatches to
build_new_method_call, but this latter function expects to be called
with unwrapped fn/args.  This (seemingly latent bug) can trivially be
fixed by moving finish_call_expr's build_non_dependent_expr calls to
happen after the COMPONENT_REF branch, but I reckon I'll just wait until
the NON_DEPENDENT_EXPR removal patches are in before pushing this one.

> 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: non-static memfn call dependence cleanup [PR106086]
> > 
> > In cp_parser_postfix_expression and in the CALL_EXPR case of
> > tsubst_copy_and_build, we essentially repeat the type-dependent and
> > COMPONENT_REF callee cases of finish_call_expr.  This patch deduplicates
> > this logic by making both spots consistently go through finish_call_expr.
> > 
> > This allows us to easily fix PR106086 -- which is about us neglecting to
> > capture 'this' when we resolve a use of a non-static member function of
> > the current instantiation only at lambda regeneration time -- by moving
> > the call to maybe_generic_this_capture from the parser to finish_call_expr
> > so that we consider capturing 'this' at regeneration time as well.
> > 
> > PR c++/106086
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.cc (cp_parser_postfix_expression): Consolidate three
> > calls to finish_call_expr, one to build_new_method_call and
> > one to build_min_nt_call_vec into one call to finish_call_expr.
> > Don't call maybe_generic_this_capture here.
> > * pt.cc (tsubst_copy_and_build) : Remove
> > COMPONENT_REF callee handling.
> > (type_dependent_expression_p): Use t_d_object_e_p instead of
> > t_d_e_p for COMPONENT_REF and OFFSET_REF.
> > * semantics.cc (finish_call_expr): In the type-dependent case,
> > call maybe_generic_this_capture here instead.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/template/crash127.C: Expect additional error due to
> > being able to check the member access expression ahead of time.
> > Strengthen the test by not instantiating the class template.
> > * g++.dg/cpp1y/lambda-generic-this5.C: New test.
> > ---
> >   gcc/cp/parser.cc  | 52 +++
> >   gcc/cp/pt.cc  | 27 +-
> >   gcc/cp/semantics.cc   | 12 +++--
> >   .../g++.dg/cpp1y/lambda-generic-this5.C   | 22 
> >   gcc/testsuite/g++.dg/template/crash127.C  |  3 +-
> >   5 files changed, 38 insertions(+), 78 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-this5.C
> > 
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index f3abae716fe..b00ef36b831 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -8047,54 +8047,12 @@ cp_parser_postfix_expression (cp_parser *parser,
> > bool address_p, bool cast_p,
> > close_paren_loc);
> > iloc_sentinel ils (combined_loc);
> >   - if (TREE_CODE (postfix_expression) == COMPONENT_REF)
> > - {
> > -   tree instance = TREE_OPERAND (postfix_expression, 0);
> > -   tree fn = TREE_OPERAND (postfix_expression, 1);
> > -
> > -   if (processing_template_decl
> > -   && (type_dependent_object_expression_p (instance)
> > -   || (!BASELINK_P (fn)
> > -   && TREE_CODE (fn) != FIELD_DECL)
> > -   || type_dependent_expression_p (fn)
> > -   || any_type_dependent_arguments_p (args)))
> > - {
> > -   maybe_generic_this_capture (instance, fn);
> > -   postfix_expression
> > - = build_min_nt_call_vec (postfix_expression, args);
> > - }
> > -   else if (BASELINK_P (fn))
> > - {
> > - postfix_expression
> > -   = (build_new_method_call
> > -  (instance, fn, &args, NULL_TREE,
> > -   (idk == CP_ID_KIND_QUALIFIED
> > -? LOOKUP_NORMAL|LOOKUP_NONVIRTUAL
> > -: LOOKUP_NORMAL),
> > -   /*fn_p=*/NULL,
> > -   complain));
> > - }
> > -   else
> > - postfix_expression
> > -   = fin

[PATCH] ABOUT-GCC-NLS: add usage guidance

2023-10-19 Thread Jason Merrill
A recent question led me to look at this file again, and it occurred to me that
it could offer more guidance.  OK for trunk?

-- 8< --

gcc/ChangeLog:

* ABOUT-GCC-NLS: Add usage guidance.
---
 gcc/ABOUT-GCC-NLS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/ABOUT-GCC-NLS b/gcc/ABOUT-GCC-NLS
index e90a67144e3..4c8b94d0811 100644
--- a/gcc/ABOUT-GCC-NLS
+++ b/gcc/ABOUT-GCC-NLS
@@ -23,6 +23,19 @@ For example, GCC source code should not contain calls like `error
 ("unterminated comment")' instead, as it is the `error' function's
 responsibility to translate the message before the user sees it.
 
+In general, use no markup for strings that are the immediate format string
+argument of a diagnostic function.  Use G_("str") for strings that will be
+used as the format string for a diagnostic but are e.g. assigned to a
+variable first.  Use N_("str") for other strings, particularly in a
+statically allocated array, that will be translated later by
+e.g. _(msgs[idx]).  Use _("str") for strings that will not be translated
+elsewhere.
+
+Avoid using %s to compose a diagnostic message from multiple translatable
+strings; instead, write out the full diagnostic message for each variant.
+Only use %s for message components that do not need translation, such as
+keywords.
+
 By convention, any function parameter in the GCC sources whose name
 ends in `msgid' is expected to be a message requiring translation.
 If the parameter name ends with `gmsgid', it is assumed to be a GCC

base-commit: faa0e82b409362ba022f6872cea9677e9dd42f0c
-- 
2.39.3
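The last new paragraph of the patch (avoid composing a diagnostic from translatable pieces with %s) can be illustrated with a minimal C sketch. `G_` is redefined here as a hypothetical identity macro and `snprintf` stands in for a GCC diagnostic routine, so plain `%s` replaces GCC's `%qD`. Both functions produce the same English text; the difference lies only in what a translator gets to see.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for GCC's G_ marker: the identity at run time;
   it only marks the string for extraction by xgettext.  */
#define G_(msgid) msgid

/* Discouraged: the noun is spliced in with %s, so translators see only
   "%s %s may not appear in this context", and the noun itself would have
   to be translated separately, out of context, with no chance to adjust
   word order or agreement.  */
static void
format_spliced (char *buf, size_t len, int is_parm, const char *name)
{
  const char *what = is_parm ? "parameter" : "local variable";
  snprintf (buf, len, "%s %s may not appear in this context", what, name);
}

/* Preferred: each variant is a complete message and is translated as a
   whole, one msgid per variant.  */
static void
format_full (char *buf, size_t len, int is_parm, const char *name)
{
  const char *msg = (is_parm
                     ? G_("parameter %s may not appear in this context")
                     : G_("local variable %s may not appear in this context"));
  snprintf (buf, len, msg, name);
}
```

The two-message form is the same pattern that cp_parser_primary_expression uses for its PARM_DECL / local-variable diagnostic.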



Re: [PATCH 02/11] Handle epilogues that contain jumps

2023-10-19 Thread Jeff Law




On 10/17/23 03:19, Richard Biener wrote:

On Thu, Oct 12, 2023 at 10:15 AM Richard Sandiford
 wrote:


Richard Biener  writes:

On Tue, Aug 22, 2023 at 12:42 PM Szabolcs Nagy via Gcc-patches
 wrote:


From: Richard Sandiford 

The prologue/epilogue pass allows the prologue sequence
to contain jumps.  The sequence is then partitioned into
basic blocks using find_many_sub_basic_blocks.

This patch treats epilogues in the same way.  It's needed for
a follow-on aarch64 patch that adds conditional code to both
the prologue and the epilogue.

Tested on aarch64-linux-gnu (including with a follow-on patch)
and x86_64-linux-gnu.  OK to install?

Richard

gcc/
 * function.cc (thread_prologue_and_epilogue_insns): Handle
 epilogues that contain jumps.
---

This is a previously approved patch that was not committed
because it was not needed at the time, but I'd like to commit
it as it is needed for the followup aarch64 eh_return changes:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605769.html

---
  gcc/function.cc | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/gcc/function.cc b/gcc/function.cc
index dd2c1136e07..70d1cd65303 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -6120,6 +6120,11 @@ thread_prologue_and_epilogue_insns (void)
   && returnjump_p (BB_END (e->src)))
 e->flags &= ~EDGE_FALLTHRU;
 }
+
+ auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+ bitmap_clear (blocks);
+   bitmap_set_bit (blocks, BLOCK_FOR_INSN (epilogue_seq)->index);
+ find_many_sub_basic_blocks (blocks);
 }
else if (next_active_insn (BB_END (exit_fallthru_edge->src)))
 {
@@ -6218,6 +6223,11 @@ thread_prologue_and_epilogue_insns (void)
   set_insn_locations (seq, epilogue_location);

   emit_insn_before (seq, insn);
+
+ auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+ bitmap_clear (blocks);
+ bitmap_set_bit (blocks, BLOCK_FOR_INSN (insn)->index);
+ find_many_sub_basic_blocks (blocks);


I'll note that clearing a full sbitmap to pass down a single basic block
to find_many_sub_basic_blocks is quite an expensive operation.  May I suggest
to add an overload operating on a single basic block?  It's only

   FOR_EACH_BB_FN (bb, cfun)
 SET_STATE (bb,
bitmap_bit_p (blocks, bb->index) ? BLOCK_TO_SPLIT :
BLOCK_ORIGINAL);

using the bitmap, so factoring the rest of the function and customizing this
walk would do the trick.  Note that the whole function could be refactored to
handle single blocks more efficiently.
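A minimal C sketch of the refactoring suggested above, under simplifying assumptions: a plain `bool` array stands in for `sbitmap`, an `enum` array indexed by block number stands in for the per-block state that `SET_STATE` records, and the names `split_blocks_bitmap`, `split_one_block` and `process_marked` are hypothetical. C has no overloading, so the single-block variant gets its own name; in GCC's C++ it could be an overload of `find_many_sub_basic_blocks`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model: one state per basic block, indexed by block number.  */
enum block_state { BLOCK_ORIGINAL, BLOCK_TO_SPLIT };

/* Stand-in for the factored-out remainder of the function, which only
   looks at the states.  Here it just counts the blocks to split.  */
static int
process_marked (const enum block_state *state, int n_blocks)
{
  int n = 0;
  for (int i = 0; i < n_blocks; i++)
    if (state[i] == BLOCK_TO_SPLIT)
      n++;
  return n;
}

/* Bitmap interface: to request a single block, the caller must first
   allocate and clear an N_BLOCKS-entry bitmap.  */
static int
split_blocks_bitmap (enum block_state *state, int n_blocks,
                     const bool *blocks)
{
  for (int i = 0; i < n_blocks; i++)
    state[i] = blocks[i] ? BLOCK_TO_SPLIT : BLOCK_ORIGINAL;
  return process_marked (state, n_blocks);
}

/* Single-block variant: the same walk compares indices directly, so no
   bitmap has to be allocated or cleared.  */
static int
split_one_block (enum block_state *state, int n_blocks, int index)
{
  for (int i = 0; i < n_blocks; i++)
    state[i] = i == index ? BLOCK_TO_SPLIT : BLOCK_ORIGINAL;
  return process_marked (state, n_blocks);
}
```

The walk over all blocks remains in both versions; what the single-block form avoids is allocating and clearing a bitmap sized to last_basic_block_for_fn on every call.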


Sorry for the late reply, but does this look OK?  Tested on
aarch64-linux-gnu and x86_64-linux-gnu.


LGTM, not sure if I'm qualified enough to approve though (I think you
are more qualified here, so ..)

It looks quite sensible to me.

Jeff


Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

2023-10-19 Thread Jeff Law




On 10/14/23 16:14, Roger Sayle wrote:


This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.
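The canonicalization relies on a simple identity. Zero-extending a value that has already been zero-extended gives the same result as a single zero extension from the narrowest mode. As in the dump below, sign-extending a value that was zero-extended from a narrower mode gives that same result too, since its sign bit is known to be clear. The small C check below covers every 8-bit value, with C fixed-width types standing in for QImode, HImode and a 32-bit mode (the msp430 PSImode is actually 20 bits wide, which is not modelled here).

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width C types stand in for modes: uint8_t ~ QImode,
   16-bit ~ HImode, 32-bit ~ the wider target mode.  */
static uint16_t zext_qi_to_hi (uint8_t x) { return x; }
static uint32_t zext_hi_to_wide (uint16_t x) { return x; }
static uint32_t zext_qi_to_wide (uint8_t x) { return x; }

/* Sign extension from 16 bits; only called here with values that fit
   in int16_t, so the conversion is well defined.  */
static int32_t sext_hi_to_wide (uint16_t x) { return (int16_t) x; }

/* Return 1 if, for every QImode value, both
     zero_extend (zero_extend (x)) and sign_extend (zero_extend (x))
   equal a single zero_extend (x).  */
static int
nested_extensions_fold (void)
{
  for (unsigned v = 0; v <= 0xff; v++)
    {
      uint8_t x = (uint8_t) v;
      uint32_t single = zext_qi_to_wide (x);
      if (zext_hi_to_wide (zext_qi_to_hi (x)) != single)
        return 0;
      /* Bit 15 of the HImode value is clear, so sign extension adds zeros.  */
      if ((uint32_t) sext_hi_to_wide (zext_qi_to_hi (x)) != single)
        return 0;
    }
  return 1;
}
```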

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
 2: r25:HI=zero_extend(R12:QI)
   REG_DEAD R12:QI
 7: r28:PSI=sign_extend(r25:HI)#0
   REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
 (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:AND #0xff, R12
 RLAM.A #4, R12 { RRAM.A #4, R12
 RLAM.A  #1, R12
 MOVX.W  table(R12), R12
 RETA

With this patch, we now see:

Trying 2 -> 7:
 2: r25:HI=zero_extend(R12:QI)
   REG_DEAD R12:QI
 7: r28:PSI=sign_extend(r25:HI)#0
   REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
 (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:MOV.B   R12, R12
 RLAM.A  #1, R12
 MOVX.W  table(R12), R12
 RETA


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2023-10-14  Roger Sayle  

gcc/ChangeLog
 PR rtl-optimization/91865
 * combine.cc (make_compound_operation): Avoid creating a
 ZERO_EXTEND of a ZERO_EXTEND.
Final question.  Is there a reasonable expectation that we could get a 
similar situation with sign extensions?   If so we probably ought to try 
and handle both.


OK with the obvious change to handle nested sign extensions if you think 
it's useful to do so.  And OK as-is if you don't think handling nested 
sign extensions is useful.
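On the algebra behind the sign-extension question: two nested sign extensions do collapse to one, just like nested zero extensions, so the same canonical form would apply. A zero extension of a sign-extended value does not collapse, though, because the copied sign bits are kept. The C check below only verifies these identities over all 8-bit values; it says nothing about whether make_compound_operation can actually produce nested sign extensions.

```c
#include <assert.h>
#include <stdint.h>

/* sign_extend to 16 bits, then sign_extend to 32 bits.  */
static int32_t
sext_twice (int8_t x)
{
  int16_t hi = x;
  return hi;
}

/* A single sign extension from 8 to 32 bits.  */
static int32_t
sext_once (int8_t x)
{
  return x;
}

/* zero_extend to 32 bits of a 16-bit sign extension of X: this does NOT
   fold into one extension of X, since the 16-bit sign bits are kept.  */
static uint32_t
zext_of_sext (int8_t x)
{
  int16_t hi = x;
  return (uint16_t) hi;
}

/* Return 1 if nested sign extensions equal one sign extension for
   every 8-bit value.  */
static int
nested_sign_extensions_fold (void)
{
  for (int v = INT8_MIN; v <= INT8_MAX; v++)
    if (sext_twice ((int8_t) v) != sext_once ((int8_t) v))
      return 0;
  return 1;
}
```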


jeff


Re: [PATCH] ABOUT-GCC-NLS: add usage guidance

2023-10-19 Thread Jakub Jelinek
On Thu, Oct 19, 2023 at 11:11:30AM -0400, Jason Merrill wrote:
> A recent question led me to look at this file again, and it occurred to me that
> it could offer more guidance.  OK for trunk?
> 
> -- 8< --
> 
> gcc/ChangeLog:
> 
>   * ABOUT-GCC-NLS: Add usage guidance.
> ---
>  gcc/ABOUT-GCC-NLS | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/gcc/ABOUT-GCC-NLS b/gcc/ABOUT-GCC-NLS
> index e90a67144e3..4c8b94d0811 100644
> --- a/gcc/ABOUT-GCC-NLS
> +++ b/gcc/ABOUT-GCC-NLS
> @@ -23,6 +23,19 @@ For example, GCC source code should not contain calls like `error
>  ("unterminated comment")' instead, as it is the `error' function's
>  responsibility to translate the message before the user sees it.
>  
> +In general, use no markup for strings that are the immediate format string
> +argument of a diagnostic function.  Use G_("str") for strings that will be
> +used as the format string for a diagnostic but are e.g. assigned to a
> +variable first.  Use N_("str") for other strings, particularly in a
> +statically allocated array, that will be translated later by
> +e.g. _(msgs[idx]).  Use _("str") for strings that will not be translated

The difference between G_ and N_ is only whether the strings are GCC internal
diagnostic format strings or printf-family format strings (gcc-internal-format
vs. c-format in po/gcc.pot), nothing else.  So G_ can be used in a statically
allocated array as well.  For both diagnostic routines and the printf*
family, when translation is desirable, the string is eventually translated
through _() or another gettext invocation, either inside diagnostic.cc or
elsewhere.
So the note about statically allocated arrays should move to G_ as well, and
N_ should be described similarly, but for strings which after translation
are passed to printf-family functions or similar.  And the "translated later
by e.g." note should apply to both, and should say that this is done
automatically for GCC diagnostic routines.

Jakub
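A minimal C sketch of the distinction described above. The `_`, `N_` and `G_` definitions and `diag_error` are hypothetical stand-ins (in GCC `_` calls gettext, while `N_` and `G_` only mark strings for extraction), and plain printf directives replace GCC's `%qD`-style ones. Diagnostic formats are marked with `G_`, even in a static array, and are translated by the diagnostic routine itself; printf-family formats are marked with `N_` and translated explicitly with `_()` at the point of use.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins, not GCC's definitions: in GCC, _() calls
   gettext, while N_() and G_() expand to the bare string and only mark
   it for extraction (as c-format or gcc-internal-format respectively).
   Here all three are the identity.  */
#define _(msgid) (msgid)
#define N_(msgid) msgid
#define G_(msgid) msgid

static char last_diag[256];

/* Hypothetical diagnostic routine: like GCC's, it translates GMSGID
   itself, so callers must not wrap the format in _().  */
static void
diag_error (const char *gmsgid, ...)
{
  va_list ap;
  va_start (ap, gmsgid);
  vsnprintf (last_diag, sizeof last_diag, _(gmsgid), ap);
  va_end (ap);
}

/* Diagnostic formats: G_, also in a statically allocated array.  */
static const char *const goto_msgs[] = {
  G_("jump to label %s"),
  G_("jump to case label"),
};

/* printf-family formats: N_, translated with _() where they are used.  */
static const char *const usage_msgs[] = {
  N_("Usage: %s [options] file...\n"),
};
```

Typical uses would be `diag_error (goto_msgs[0], name)` for the first array and `printf (_(usage_msgs[0]), progname)` for the second.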



Re: [PATCH] genemit: Split insn-emit.cc into ten files.

2023-10-19 Thread Jeff Law




On 10/17/23 01:04, Robin Dapp wrote:

Natively, things seem fine, but for cross, I get failures on a few
targets (hppa2.0-unknown-linux-gnu, hppa64-unknown-linux-gnu).

With ./configure --host=x86_64-pc-linux-gnu
--target=hppa2.0-unknown-linux-gnu --build=x86_64-pc-linux-gnu && make
-j$(nproc), I get a bunch of stuff like:

mv: cannot stat 'tmp-emit-9.cc': No such file or directory
echo timestamp > s-insn-emit-8
mv: cannot stat 'tmp-emit-10.cc': No such file or directory
make[2]: *** [Makefile:2598: s-insn-emit-9] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: *** [Makefile:2598: s-insn-emit-10] Error 1


Thanks.  I presume this is not a native vs cross problem (as I have
been building crosses with it for some days now) but rather a "race"
i.e. missing dependency on the individual output files.  Need to
re-check this.

Yea, that's almost certainly a missed dependency.

I'm ready to throw this into the tester once you've got the dependency 
issue resolved.


It's been running behind the last several weeks until I did some 
revamping of how jobs are fired off to make it more efficient.  It seems 
to be keeping up now.


Jeff


[pushed] c++: use G_ instead of _

2023-10-19 Thread Jason Merrill
Tested with make po/gcc.pot to see that the strings are still there (though in
a different place, now with the gcc-internal-format tag).

Applying to trunk.

-- 8< --

Since these strings are passed to error_at, they should be marked for
translation with G_, like other diagnostic messages, rather than _, which
forces immediate (redundant) translation.  The use of N_ is less
problematic, but also imprecise.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_primary_expression): Use G_.
(cp_parser_using_enum): Likewise.
* decl.cc (identify_goto): Likewise.
---
 gcc/cp/decl.cc   |  4 ++--
 gcc/cp/parser.cc | 16 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 255c4026bdb..ce4c89dea70 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -3607,8 +3607,8 @@ identify_goto (tree decl, location_t loc, const location_t *locus,
 {
   bool complained
 = emit_diagnostic (diag_kind, loc, 0,
-  decl ? N_("jump to label %qD")
-  : N_("jump to case label"), decl);
+  decl ? G_("jump to label %qD")
+  : G_("jump to case label"), decl);
   if (complained && locus)
 inform (*locus, "  from here");
   return complained;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 57b62fb7363..c77e93ef104 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -6206,8 +6206,8 @@ cp_parser_primary_expression (cp_parser *parser,
  {
const char *msg
  = (TREE_CODE (decl) == PARM_DECL
-? _("parameter %qD may not appear in this context")
-: _("local variable %qD may not appear in this context"));
+? G_("parameter %qD may not appear in this context")
+: G_("local variable %qD may not appear in this context"));
error_at (id_expression.get_location (), msg,
  decl.get_value ());
return error_mark_node;
@@ -22145,16 +22145,16 @@ cp_parser_using_enum (cp_parser *parser)
  shall have a reachable enum-specifier.  */
   const char *msg = nullptr;
   if (cxx_dialect < cxx20)
-msg = _("% "
-   "only available with %<-std=c++20%> or %<-std=gnu++20%>");
+msg = G_("% "
+"only available with %<-std=c++20%> or %<-std=gnu++20%>");
   else if (dependent_type_p (type))
-msg = _("% of dependent type %qT");
+msg = G_("% of dependent type %qT");
   else if (TREE_CODE (type) != ENUMERAL_TYPE)
-msg = _("% of non-enumeration type %q#T");
+msg = G_("% of non-enumeration type %q#T");
   else if (!COMPLETE_TYPE_P (type))
-msg = _("% of incomplete type %qT");
+msg = G_("% of incomplete type %qT");
   else if (OPAQUE_ENUM_P (type))
-msg = _("% of %qT before its enum-specifier");
+msg = G_("% of %qT before its enum-specifier");
   if (msg)
 {
   location_t loc = make_location (start, start, end);

base-commit: 04d6c74564b7eb51660a00b35353aeab706b5a50
-- 
2.39.3



[PATCH] c: [PR104822] Don't warn about converting NULL to different sso endian

2023-10-19 Thread Andrew Pinski
In the same way that we don't warn about converting a NULL pointer constant
to a pointer to a different named address space, we should not warn about
converting it to a pointer to a type with a different scalar storage order
(SSO) either.
This adds the simple check.
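The patched condition can be modelled as a boolean predicate. In this hypothetical C sketch each parameter stands for one tree-level test in convert_for_assignment: `rhs_is_null_constant` for `null_pointer_constant_p (rhs)`, and `lhs_reversed`/`rhs_reversed` for `AGGREGATE_TYPE_P (t) && TYPE_REVERSE_STORAGE_ORDER (t)` on the pointed-to types. It only illustrates the logic and is not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the -Wscalar-storage-order check in
   convert_for_assignment after the patch.  */
static bool
warn_sso_mismatch (bool warn_enabled, bool rhs_is_null_constant,
                   bool lhs_reversed, bool rhs_reversed)
{
  return (warn_enabled
          && !rhs_is_null_constant   /* the test added by the patch */
          && lhs_reversed != rhs_reversed);
}
```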

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/104822

gcc/c/ChangeLog:

* c-typeck.cc (convert_for_assignment): Check for null pointer
before warning about an incompatible scalar storage order.

gcc/testsuite/ChangeLog:

* gcc.dg/sso-18.c: New test.
---
 gcc/c/c-typeck.cc |  1 +
 gcc/testsuite/gcc.dg/sso-18.c | 16 
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/sso-18.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 6e044b4afbc..f39dc71d593 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -7449,6 +7449,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 
   /* See if the pointers point to incompatible scalar storage orders.  */
   if (warn_scalar_storage_order
+ && !null_pointer_constant_p (rhs)
  && (AGGREGATE_TYPE_P (ttl) && TYPE_REVERSE_STORAGE_ORDER (ttl))
 != (AGGREGATE_TYPE_P (ttr) && TYPE_REVERSE_STORAGE_ORDER (ttr)))
{
diff --git a/gcc/testsuite/gcc.dg/sso-18.c b/gcc/testsuite/gcc.dg/sso-18.c
new file mode 100644
index 000..799a0c858f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sso-18.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* PR c/104822 */
+
+#include 
+
+struct Sb {
+  int i;
+} __attribute__((scalar_storage_order("big-endian")));
+struct Sl {
+  int i;
+} __attribute__((scalar_storage_order("little-endian")));
+
+/* Neither of these should warn about incompatible scalar storage order
+   as NULL pointers are compatible with both endiannesses. */
+struct Sb *pb = NULL; /* { dg-bogus "" } */
+struct Sl *pl = NULL; /* { dg-bogus "" } */
-- 
2.39.3



[PATCH] c: [PR100532] Fix ICE when an argument was an error mark

2023-10-19 Thread Andrew Pinski
In convert_argument, after emitting the error about converting to an
incomplete type, we returned the original expression rather than
error_mark_node.  This causes issues in the gimplifier when it tries to
see whether another conversion is needed.

This code predates the revision history, so it may simply never have
been noticed that we should return error_mark_node here.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.
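The fix follows the usual convention that once an error has been reported, the expression is replaced by error_mark_node so that later stages skip it instead of processing a half-converted tree. Below is a hypothetical C miniature of that convention; the `node` type, `error_mark`, `convert_arg` and `lower_call` are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the error_mark_node convention.  */
struct node { int value; };

/* One shared sentinel meaning "an error has already been reported".  */
static struct node error_node;
#define error_mark (&error_node)

static int errors_reported;

/* Convert ARG to a parameter type.  If that type is incomplete, report
   an error and return the sentinel; returning ARG itself (the old
   behavior) would let later stages keep working on it.  */
static struct node *
convert_arg (struct node *arg, bool parm_type_complete)
{
  if (!parm_type_complete)
    {
      errors_reported++;   /* stands in for error_at () */
      return error_mark;
    }
  return arg;
}

/* A later stage: skip the whole call if any argument is the sentinel,
   since the problem has already been diagnosed.  Returns the number of
   arguments processed, or -1.  */
static int
lower_call (struct node *const *args, size_t nargs)
{
  for (size_t i = 0; i < nargs; i++)
    if (args[i] == error_mark)
      return -1;
  return (int) nargs;
}
```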

PR c/100532

gcc/c/ChangeLog:

* c-typeck.cc (convert_argument): After erroring out
about an incomplete type return error_mark_node.

gcc/testsuite/ChangeLog:

* gcc.dg/pr100532-1.c: New test.
---
 gcc/c/c-typeck.cc | 2 +-
 gcc/testsuite/gcc.dg/pr100532-1.c | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr100532-1.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 6e044b4afbc..8f8562936dc 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -3367,7 +3367,7 @@ convert_argument (location_t ploc, tree function, tree fundecl,
 {
   error_at (ploc, "type of formal parameter %d is incomplete",
parmnum + 1);
-  return val;
+  return error_mark_node;
 }
 
   /* Optionally warn about conversions that differ from the default
diff --git a/gcc/testsuite/gcc.dg/pr100532-1.c b/gcc/testsuite/gcc.dg/pr100532-1.c
new file mode 100644
index 000..81e37c60415
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100532-1.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* PR c/100532 */
+
+typedef __SIZE_TYPE__ size_t;
+void *memcpy(void[], const void *, size_t); /* { dg-error "declaration of type name" } */
+void c(void) { memcpy(c, "a", 2); } /* { dg-error "type of formal parameter" } */
+
-- 
2.34.1



Re: [pushed] c++: use G_ instead of _

2023-10-19 Thread Jakub Jelinek
On Thu, Oct 19, 2023 at 11:31:58AM -0400, Jason Merrill wrote:
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -3607,8 +3607,8 @@ identify_goto (tree decl, location_t loc, const location_t *locus,
>  {
>bool complained
>  = emit_diagnostic (diag_kind, loc, 0,
> -decl ? N_("jump to label %qD")
> -: N_("jump to case label"), decl);
> +decl ? G_("jump to label %qD")

N_ for this is wrong because gettext will then not properly verify
translators didn't screw things up by using some incompatible format
string in the translation.
I believe some translations e.g. changed %s to %S.  And that seems to be
still the case:
grep -B3 '[^%]%S' po/*.po
po/sr.po-#, fuzzy, gcc-internal-format
po/sr.po-#| msgid "Duplicate %s attribute specified at %L"
po/sr.po-msgid "Multiple %qs modifiers specified at %C"
po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L"
--
po/sr.po-#, fuzzy, gcc-internal-format, gfc-internal-format
po/sr.po-#| msgid "Duplicate %s attribute specified at %L"
po/sr.po-msgid "Duplicate %s attribute specified at %L"
po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L"
--
po/sr.po-#, fuzzy, gcc-internal-format, gfc-internal-format
po/sr.po-#| msgid "Duplicate %s attribute specified at %L"
po/sr.po-msgid "Duplicate BIND attribute specified at %L"
po/sr.po:msgstr "Удвостручени атрибут %S наведен код %L"
--
po/tr.po-#, fuzzy, gcc-internal-format, gfc-internal-format
po/tr.po-#| msgid "%s statement must appear in a MODULE"
po/tr.po-msgid "%s statement must appear in a MODULE"
po/tr.po:msgstr "%S deyimi bir MODULE'de görünmemeli"

> +: G_("jump to case label"), decl);

While in this case G_ is better just for consistency, N_ would work
exactly the same given that there are no format strings.
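The verification referred to above compares the format directives of each msgid with those of its translation. The C sketch below is a much simplified, hypothetical version of such a check. It only compares the sequence of conversion letters, skipping `%%` and the `q`, `+` and `#` flags, and it ignores positional arguments. msgfmt's real checks for strings flagged c-format or gcc-internal-format are more thorough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Store the conversion letters of FMT in OUT (e.g. "%s at %L" gives "sL"),
   skipping "%%" and the q, + and # flags.  OUTLEN must be at least 1.  */
static void
conversions (const char *fmt, char *out, size_t outlen)
{
  size_t n = 0;
  for (const char *p = fmt; *p; p++)
    {
      if (*p != '%')
        continue;
      p++;
      if (*p == '%')
        continue;
      while (*p == 'q' || *p == '+' || *p == '#')
        p++;
      if (*p == '\0')
        break;
      if (n + 1 < outlen)
        out[n++] = *p;
    }
  out[n] = '\0';
}

/* True if MSGSTR uses the same conversions, in the same order, as MSGID.  */
static bool
same_conversions (const char *msgid, const char *msgstr)
{
  char a[64], b[64];
  conversions (msgid, a, sizeof a);
  conversions (msgstr, b, sizeof b);
  return strcmp (a, b) == 0;
}
```

With such a check, a translation that turns `%s` into `%S`, as in the po/sr.po and po/tr.po entries quoted above, is rejected.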

Jakub



Re: [PATCH] c: [PR104822] Don't warn about converting NULL to different sso endian

2023-10-19 Thread Marek Polacek
On Thu, Oct 19, 2023 at 08:37:31AM -0700, Andrew Pinski wrote:
> In the same way that we don't warn about converting a NULL pointer
> constant to a pointer to a different named address space, we should not
> warn about converting it to a pointer to a type with a different scalar
> storage order (SSO) either.
> This adds the simple check.
> 
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Looks good, but please put "[PR104822]" at the end of the subject line.
 
>   PR c/104822
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (convert_for_assignment): Check for null pointer
>   before warning about an incompatible scalar storage order.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/sso-18.c: New test.
> ---
>  gcc/c/c-typeck.cc |  1 +
>  gcc/testsuite/gcc.dg/sso-18.c | 16 
>  2 files changed, 17 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/sso-18.c
> 
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 6e044b4afbc..f39dc71d593 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -7449,6 +7449,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
>  
>/* See if the pointers point to incompatible scalar storage orders.  */
>if (warn_scalar_storage_order
> +   && !null_pointer_constant_p (rhs)
> && (AGGREGATE_TYPE_P (ttl) && TYPE_REVERSE_STORAGE_ORDER (ttl))
>!= (AGGREGATE_TYPE_P (ttr) && TYPE_REVERSE_STORAGE_ORDER (ttr)))
>   {
> diff --git a/gcc/testsuite/gcc.dg/sso-18.c b/gcc/testsuite/gcc.dg/sso-18.c
> new file mode 100644
> index 000..799a0c858f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sso-18.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* PR c/104822 */
> +
> +#include 
> +
> +struct Sb {
> +  int i;
> +} __attribute__((scalar_storage_order("big-endian")));
> +struct Sl {
> +  int i;
> +} __attribute__((scalar_storage_order("little-endian")));
> +
> +/* Neither of these should warn about incompatible scalar storage order
> +   as NULL pointers are compatible with both endiannesses. */
> +struct Sb *pb = NULL; /* { dg-bogus "" } */
> +struct Sl *pl = NULL; /* { dg-bogus "" } */

Maybe test nullptr as well?

Marek



[PATCH RFA] diagnostic: rename new permerror overloads

2023-10-19 Thread Jason Merrill
OK for trunk?

-- 8< --

While checking another change, I noticed that the new permerror overloads
break gettext with "permerror used incompatibly as both
 --keyword=permerror:2 --flag=permerror:2:gcc-internal-format and
 --keyword=permerror:3 --flag=permerror:3:gcc-internal-format".  So let's
change the name.
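The underlying problem is that xgettext's `--keyword=NAME:N` identifies the message argument by position, once per function name. Two overloads that keep the format string at different positions therefore cannot both be described. Below is a hypothetical C sketch with two distinctly named wrappers around vsnprintf, each with its own fixed format position, also declared to the compiler with GCC's `format` attribute. The names `report` and `report_opt` are invented for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic entry points.  With distinct names, each has one
   unambiguous extraction rule, e.g.
     xgettext --keyword=report:2 --keyword=report_opt:3
   whereas two overloads sharing one name would need both ":2" and ":3".  */
static char out[256];

static void report (int loc, const char *gmsgid, ...)
  __attribute__ ((format (printf, 2, 3)));
static void report_opt (int loc, int opt, const char *gmsgid, ...)
  __attribute__ ((format (printf, 3, 4)));

/* Format "LOC: error: MESSAGE" into OUT.  */
static void
report (int loc, const char *gmsgid, ...)
{
  va_list ap;
  size_t n = (size_t) snprintf (out, sizeof out, "%d: error: ", loc);
  va_start (ap, gmsgid);
  vsnprintf (out + n, sizeof out - n, gmsgid, ap);
  va_end (ap);
}

/* Same, but the message comes third, after an option number that is
   appended to the output (GCC would use it to control the diagnostic).  */
static void
report_opt (int loc, int opt, const char *gmsgid, ...)
{
  va_list ap;
  size_t n = (size_t) snprintf (out, sizeof out, "%d: error: ", loc);
  va_start (ap, gmsgid);
  n += (size_t) vsnprintf (out + n, sizeof out - n, gmsgid, ap);
  va_end (ap);
  if (n < sizeof out)
    snprintf (out + n, sizeof out - n, " [option %d]", opt);
}
```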

gcc/ChangeLog:

* diagnostic-core.h (permerror): Rename new overloads...
(permerror_opt): To this.
* diagnostic.cc: Likewise.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Adjust.
---
 gcc/diagnostic-core.h | 7 ---
 gcc/cp/typeck2.cc | 6 +++---
 gcc/diagnostic.cc | 4 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/diagnostic-core.h b/gcc/diagnostic-core.h
index 2d9909f18bd..04eba3d140e 100644
--- a/gcc/diagnostic-core.h
+++ b/gcc/diagnostic-core.h
@@ -105,9 +105,10 @@ extern bool pedwarn (rich_location *, int, const char *, ...)
 extern bool permerror (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
 extern bool permerror (rich_location *, const char *,
   ...) ATTRIBUTE_GCC_DIAG(2,3);
-extern bool permerror (location_t, int, const char *, ...) ATTRIBUTE_GCC_DIAG(3,4);
-extern bool permerror (rich_location *, int, const char *,
-  ...) ATTRIBUTE_GCC_DIAG(3,4);
+extern bool permerror_opt (location_t, int, const char *, ...)
+  ATTRIBUTE_GCC_DIAG(3,4);
+extern bool permerror_opt (rich_location *, int, const char *,
+  ...) ATTRIBUTE_GCC_DIAG(3,4);
 extern void sorry (const char *, ...) ATTRIBUTE_GCC_DIAG(1,2);
 extern void sorry_at (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
 extern void inform (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index ab819d4e49d..8e63fb1dc0e 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -1109,9 +1109,9 @@ check_narrowing (tree type, tree init, tsubst_flags_t complain,
   else if (complain & tf_error)
{
  int savederrorcount = errorcount;
- permerror (loc, OPT_Wnarrowing,
-"narrowing conversion of %qE from %qH to %qI",
-init, ftype, type);
+ permerror_opt (loc, OPT_Wnarrowing,
+"narrowing conversion of %qE from %qH to %qI",
+init, ftype, type);
  if (errorcount == savederrorcount)
ok = true;
}
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index ecf1097bd2c..0f392358aef 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -2027,7 +2027,7 @@ permerror (rich_location *richloc, const char *gmsgid, ...)
diagnostic can also be downgraded by -Wno-error=opt.  */
 
 bool
-permerror (location_t location, int opt, const char *gmsgid, ...)
+permerror_opt (location_t location, int opt, const char *gmsgid, ...)
 {
   auto_diagnostic_group d;
   va_list ap;
@@ -2041,7 +2041,7 @@ permerror (location_t location, int opt, const char *gmsgid, ...)
 /* Same as "permerror" above, but at RICHLOC.  */
 
 bool
-permerror (rich_location *richloc, int opt, const char *gmsgid, ...)
+permerror_opt (rich_location *richloc, int opt, const char *gmsgid, ...)
 {
   gcc_assert (richloc);
 

base-commit: f53de2baae5a6992d93d58951c4c0a25ee678091
-- 
2.39.3



Re: [PATCH RFA] diagnostic: rename new permerror overloads

2023-10-19 Thread Jakub Jelinek
On Thu, Oct 19, 2023 at 11:45:01AM -0400, Jason Merrill wrote:
> OK for trunk?
> 
> -- 8< --
> 
> While checking another change, I noticed that the new permerror overloads
> break gettext with "permerror used incompatibly as both
>  --keyword=permerror:2 --flag=permerror:2:gcc-internal-format and
>  --keyword=permerror:3 --flag=permerror:3:gcc-internal-format".  So let's
> change the name.
> 
> gcc/ChangeLog:
> 
>   * diagnostic-core.h (permerror): Rename new overloads...
>   (permerror_opt): To this.
>   * diagnostic.cc: Likewise.
> 
> gcc/cp/ChangeLog:
> 
>   * typeck2.cc (check_narrowing): Adjust.

Ok.

Jakub



Re: [PATCH] c: [PR100532] Fix ICE when an argument was an error mark

2023-10-19 Thread Marek Polacek
On Thu, Oct 19, 2023 at 03:38:57PM +, Andrew Pinski wrote:
> In convert_argument, after emitting the error about converting to an
> incomplete type, we returned the original expression rather than
> error_mark_node.  This causes issues in the gimplifier when it tries to
> see whether another conversion is needed.
> 
> This code predates the revision history, so it may simply never have
> been noticed that we should return error_mark_node here.
> 
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Looks OK but please move [PR100532] to the end of the subject.

>   PR c/100532
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (convert_argument): After erroring out
>   about an incomplete type return error_mark_node.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr100532-1.c: New test.
> ---
>  gcc/c/c-typeck.cc | 2 +-
>  gcc/testsuite/gcc.dg/pr100532-1.c | 7 +++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr100532-1.c
> 
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 6e044b4afbc..8f8562936dc 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -3367,7 +3367,7 @@ convert_argument (location_t ploc, tree function, tree fundecl,
>  {
>error_at (ploc, "type of formal parameter %d is incomplete",
>   parmnum + 1);
> -  return val;
> +  return error_mark_node;
>  }
>  
>/* Optionally warn about conversions that differ from the default
> diff --git a/gcc/testsuite/gcc.dg/pr100532-1.c b/gcc/testsuite/gcc.dg/pr100532-1.c
> new file mode 100644
> index 000..81e37c60415
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr100532-1.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* PR c/100532 */
> +
> +typedef __SIZE_TYPE__ size_t;
> +void *memcpy(void[], const void *, size_t); /* { dg-error "declaration of type name" } */
> +void c(void) { memcpy(c, "a", 2); } /* { dg-error "type of formal parameter" } */
> +

Extra newline.

Marek



[patch,libgcc,contrib]: Add some auto-generated files deps to gcc_update.

2023-10-19 Thread Georg-Johann Lay

This patch adds two deps to gcc_update files_and_dependencies for
two auto-generated headers from avr libgcc.

Ok for master?

Johann

--

Add dependencies for some auto-generated files from avr-libgcc.

/
* contrib/gcc_update (files_and_dependencies): Add dependencies for:
libgcc/config/avr/libf7/f7-renames.h,
libgcc/config/avr/libf7/f7-wraps.h.


diff --git a/contrib/gcc_update b/contrib/gcc_update
index cda2bdb0df9..f9f9aed743e 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -183,6 +183,8 @@ libphobos/configure: libphobos/configure.ac libphobos/aclocal.m4

 libphobos/src/Makefile.in: libphobos/src/Makefile.am libphobos/aclocal.m4
 libphobos/testsuite/Makefile.in: libphobos/testsuite/Makefile.am libphobos/aclocal.m4
 libstdc++-v3/include/bits/version.h: libstdc++-v3/include/bits/version.def libstdc++-v3/include/bits/version.tpl
+libgcc/config/avr/libf7/f7-renames.h: libgcc/config/avr/libf7/f7renames.sh libgcc/config/avr/libf7/libf7-common.mk
+libgcc/config/avr/libf7/f7-wraps.h: libgcc/config/avr/libf7/f7wraps.sh libgcc/config/avr/libf7/libf7-common.mk libgcc/config/avr/libf7/t-libf7-math

 # Top level
 Makefile.in: Makefile.tpl Makefile.def
 configure: configure.ac config/acx.m4
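Each entry added to files_and_dependencies reads like a make rule: a generated header such as f7-renames.h counts as stale when any script or makefile fragment listed after the colon is newer than it. The C fragment below is only a rough model of that freshness check. It assumes the entries are used to bump the generated file's timestamp rather than to regenerate it. gcc_update itself is a shell script, and is_stale and touch_if_stale are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* The rule itself: a generated file is stale if any dependency has a
   newer modification time.  */
static bool
is_stale (time_t target_mtime, const time_t *dep_mtimes, size_t ndeps)
{
  for (size_t i = 0; i < ndeps; i++)
    if (dep_mtimes[i] > target_mtime)
      return true;
  return false;
}

/* File-system wrapper (hypothetical): if TARGET exists and some existing
   file in DEPS is newer, set TARGET's times to "now", as touch(1) would.
   Returns 1 if touched, 0 if up to date or TARGET is missing, and -1 if
   utime fails.  Dependencies that cannot be stat'ed are ignored.  */
static int
touch_if_stale (const char *target, const char *const *deps, size_t ndeps)
{
  struct stat st;
  if (stat (target, &st) != 0)
    return 0;
  for (size_t i = 0; i < ndeps; i++)
    {
      struct stat ds;
      if (stat (deps[i], &ds) != 0)
        continue;
      time_t dep_mtime = ds.st_mtime;
      if (is_stale (st.st_mtime, &dep_mtime, 1))
        return utime (target, NULL) == 0 ? 1 : -1;  /* NULL: current time.  */
    }
  return 0;
}
```

Touching rather than regenerating means a checkout does not need to run the generator scripts. That is presumably why the two avr headers are listed here.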


Re: [1/3] Add support for target_version attribute

2023-10-19 Thread Andrew Carlotti
On Thu, Oct 19, 2023 at 07:04:09AM +, Richard Biener wrote:
> On Wed, 18 Oct 2023, Andrew Carlotti wrote:
> 
> > This patch adds support for the "target_version" attribute to the middle
> > end and the C++ frontend, which will be used to implement function
> > multiversioning in the aarch64 backend.
> > 
> > Note that C++ is currently the only frontend which supports
> > multiversioning using the "target" attribute, whereas the
> > "target_clones" attribute is additionally supported in C, D and Ada.
> > Support for the target_version attribute will be extended to C at a
> > later date.
> > 
> > Targets that currently use the "target" attribute for function
> > multiversioning (i.e. i386 and rs6000) are not affected by this patch.
> > 
> > 
> > I could have implemented the target hooks slightly differently, by reusing the
> > valid_attribute_p hook and adding attribute name checks to each backend
> > implementation (c.f. the aarch64 implementation in patch 2/3).  Would this be
> > preferable?
> > 
> > Otherwise, is this ok for master?
> 
> This lacks user-level documentation in doc/extend.texi (where
> target_clones is documented).

Good point.  I'll add documentation updates as a separate patch in the series
(rather than documenting the state after this patch, in which the attribute is
supported on zero targets).  I think the existing documentation for target and
target_clones needs some improvement as well.

> Was there any discussion/description of why target_clones cannot
> be made work for aarch64?
> 
> Richard.

The second patch in this series does include support for target_clones on
aarch64.  However, the support in that patch is not fully compliant with our
ACLE specification.  I also have some unresolved questions about the
correctness of current function multiversioning implementations using ifuncs
across translation units, which could affect how we want to implement it for
aarch64.

Andrew

> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c-attribs.cc (handle_target_version_attribute): New.
> > (c_common_attribute_table): Add target_version.
> > (handle_target_clones_attribute): Add conflict with
> > target_version attribute.
> > 
> > gcc/ChangeLog:
> > 
> > * attribs.cc (is_function_default_version): Update comment to
> > specify incompatibility with target_version attributes.
> > * cgraphclones.cc (cgraph_node::create_version_clone_with_body):
> > Call valid_version_attribute_p for target_version attributes.
> > * target.def (valid_version_attribute_p): New hook.
> > (expanded_clones_attribute): New hook.
> > * doc/tm.texi.in: Add new hooks.
> > * doc/tm.texi: Regenerate.
> > * multiple_target.cc (create_dispatcher_calls): Remove redundant
> > is_function_default_version check.
> > (expand_target_clones): Use target hook for attribute name.
> > * targhooks.cc (default_target_option_valid_version_attribute_p):
> > New.
> > * targhooks.h (default_target_option_valid_version_attribute_p):
> > New.
> > * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
> > target_version attributes.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl2.cc (check_classfn): Update comment to include
> > target_version attributes.
> > 
> > 
> > diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> > index b1300018d1e8ed8e02ded1ea721dc192a6d32a49..a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6 100644
> > --- a/gcc/attribs.cc
> > +++ b/gcc/attribs.cc
> > @@ -1233,8 +1233,9 @@ make_dispatcher_decl (const tree decl)
> >return func_decl;  
> >  }
> >  
> > -/* Returns true if decl is multi-versioned and DECL is the default function,
> > -   that is it is not tagged with target specific optimization.  */
> > +/* Returns true if DECL is multi-versioned using the target attribute, and this
> > +   is the default version.  This function can only be used for targets that do
> > +   not support the "target_version" attribute.  */
> >  
> >  bool
> >  is_function_default_version (const tree decl)
> > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> > index 072cfb69147bd6b314459c0bd48a0c1fb92d3e4d..1a224c036277d51ab4dc0d33a403177bd226e48a 100644
> > --- a/gcc/c-family/c-attribs.cc
> > +++ b/gcc/c-family/c-attribs.cc
> > @@ -148,6 +148,7 @@ static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_target_attribute (tree *, tree, tree, int, bool *);
> > +static tree handle_target_version_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> >  static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > @@ -480

Re: [PATCH] ABOUT-GCC-NLS: add usage guidance

2023-10-19 Thread Jason Merrill

On 10/19/23 11:21, Jakub Jelinek wrote:

On Thu, Oct 19, 2023 at 11:11:30AM -0400, Jason Merrill wrote:

A recent question led me to look at this file again, and it occurred to me that
it could offer more guidance.  OK for trunk?

-- 8< --

gcc/ChangeLog:

* ABOUT-GCC-NLS: Add usage guidance.
---
  gcc/ABOUT-GCC-NLS | 13 +
  1 file changed, 13 insertions(+)

diff --git a/gcc/ABOUT-GCC-NLS b/gcc/ABOUT-GCC-NLS
index e90a67144e3..4c8b94d0811 100644
--- a/gcc/ABOUT-GCC-NLS
+++ b/gcc/ABOUT-GCC-NLS
@@ -23,6 +23,19 @@ For example, GCC source code should not contain calls like `error
  ("unterminated comment")' instead, as it is the `error' function's
  responsibility to translate the message before the user sees it.
  
+In general, use no markup for strings that are the immediate format string
+argument of a diagnostic function.  Use G_("str") for strings that will be
+used as the format string for a diagnostic but are e.g. assigned to a
+variable first.  Use N_("str") for other strings, particularly in a
+statically allocated array, that will be translated later by
+e.g. _(msgs[idx]).  Use _("str") for strings that will not be translated


The difference between G_ and N_ is whether they are GCC internal diagnostic
format strings or printf family format strings (gcc-internal-format vs.
c-format in po/gcc.pot), not anything else, so G_ can be used in statically
allocated array as well and both for diagnostic routines and printf* family
when translation is desirable it is eventually translated through _() or
other gettext invocations, either inside of diagnostics.cc or elsewhere.
So, the note about statically allocated array should move to G_ as well, and
N_ should be described as similarly, but for strings which after translation
are passed to printf family functions or so.  And the translated later by
e.g. note should be again for both and said that it is done automatically
for GCC diagnostic routines.


How about this?

 In general, use no markup for strings that are the immediate format string
 argument of a diagnostic function.  Use G_("str") for strings that will be
 used as the format string for a diagnostic but are e.g. assigned to a
 variable first.  Use N_("str") for strings that are not diagnostic format
 strings, but will still be translated later.  Use _("str") for strings that
 will not be translated elsewhere.  It's important not to use _("str") in
 the initializer of a statically allocated variable; use one of the others
 instead and make sure that uses of that variable translate the string,
 whether directly with _(msg) or by passing it to a diagnostic or other
 function that performs the translation.

Jason
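Here is a minimal sketch of the proposed guidance. The macro definitions are stand-ins assumed for illustration. N_ and G_ only mark a string for extraction and expand to it unchanged. _ is the identity here, not a gettext call. fake_warning, describe and kind_names are hypothetical. fake_warning plays the role of a diagnostic function that translates its own format string.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in marking macros (assumed, simplified): in a real NLS build
   _ would call gettext at run time.  */
#define N_(msgid) msgid
#define G_(gmsgid) gmsgid
#define _(msgid) (msgid)

/* Statically allocated, so no _() here: translation must happen at run
   time, after the locale is set.  Not format strings, hence N_.  */
static const char *const kind_names[] = { N_("function"), N_("variable") };

/* Hypothetical diagnostic routine: like GCC's diagnostics, it translates
   the format string it is given.  */
static int
fake_warning (char *buf, size_t n, const char *gmsgid, const char *arg)
{
  return snprintf (buf, n, _(gmsgid), arg);
}

static int
describe (char *buf, size_t n, int idx, int is_unused)
{
  /* Diagnostic format strings chosen before the call: mark with G_.  */
  const char *fmt = is_unused ? G_("unused %s") : G_("shadowed %s");
  /* The N_-marked table entry is translated explicitly where it is used.  */
  return fake_warning (buf, n, fmt, _(kind_names[idx]));
}
```

Both markers leave the string untouched at compile time. The difference lies in how the string is catalogued (gcc-internal-format versus c-format) and in who calls _() on it later.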



Re: [PATCH] ABOUT-GCC-NLS: add usage guidance

2023-10-19 Thread Jakub Jelinek
On Thu, Oct 19, 2023 at 12:13:55PM -0400, Jason Merrill wrote:
> How about this?
> 
>  In general, use no markup for strings that are the immediate format string
>  argument of a diagnostic function.  Use G_("str") for strings that will be
>  used as the format string for a diagnostic but are e.g. assigned to a
>  variable first.  Use N_("str") for strings that are not diagnostic format
>  strings, but will still be translated later.  Use _("str") for strings that
>  will not be translated elsewhere.  It's important not to use _("str") in
>  the initializer of a statically allocated variable; use one of the others
>  instead and make sure that uses of that variable translate the string,
>  whether directly with _(msg) or by passing it to a diagnostic or other
>  function that performs the translation.

LGTM.

Jakub



Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-19 Thread Jason Merrill

On 10/19/23 10:14, Marek Polacek wrote:

On Thu, Oct 19, 2023 at 10:06:01AM -0400, Jason Merrill wrote:

On 10/19/23 09:39, Patrick Palka wrote:

On Tue, 17 Oct 2023, Marek Polacek wrote:


On Tue, Oct 17, 2023 at 04:49:52PM -0400, Jason Merrill wrote:

On 10/16/23 20:39, Marek Polacek wrote:

On Sat, Oct 14, 2023 at 01:13:22AM -0400, Jason Merrill wrote:

On 10/13/23 14:53, Marek Polacek wrote:

On Thu, Oct 12, 2023 at 09:41:43PM -0400, Jason Merrill wrote:

On 10/12/23 17:04, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.033s.

I've added some debug prints to make sure that the rest of cp_fold_r
is still performed as before.

 PR c++/111660

gcc/cp/ChangeLog:

 * cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Return
 integer_zero_node instead of break;.
 (cp_fold_immediate): Return true if cp_fold_immediate_r returned
 error_mark_node.

gcc/testsuite/ChangeLog:

 * g++.dg/cpp0x/hog1.C: New test.
---
  gcc/cp/cp-gimplify.cc |  9 ++--
  gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 +++
  2 files changed, 82 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index bdf6e5f98ff..ca622ca169a 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
break;
if (TREE_OPERAND (stmt, 1)
  && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
if (TREE_OPERAND (stmt, 2)
  && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
/* We're done here.  Don't clear *walk_subtrees here though: we're called
 from cp_fold_r and we must let it recurse on the expression with
 cp_fold.  */
-  break;
+  return integer_zero_node;


I'm concerned this will end up missing something like

1 ? 1 : ((1 ? 1 : 1), immediate())

as the integer_zero_node from the inner ?: will prevent walk_tree from
looking any farther.


You are right.  The line above works as expected, but

  1 ? 1 : ((1 ? 1 : id (42)), id (i));

shows the problem (when the expression isn't used as an initializer).
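The concern above follows from walk_tree's contract. A non-NULL return from the callback ends the entire walk. Clearing *walk_subtrees only skips the current node's operands. The hypothetical miniature below models 1 ? 1 : ((1 ? 1 : 1), immediate()) to show the difference. It is not GCC's tree API: the node kinds, callbacks and stop_token are made up, and stop_token stands in for integer_zero_node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds: COND has three operands, COMMA has two.  */
enum kind { LEAF, IMM_CALL, COND, COMMA };
struct node { enum kind kind; struct node *op[3]; };

typedef void *(*walk_fn) (struct node *, int *walk_subtrees, void *data);

static int
num_ops (const struct node *n)
{
  return n->kind == COND ? 3 : n->kind == COMMA ? 2 : 0;
}

/* Pre-order walk modelled on walk_tree: a non-NULL callback result
   aborts the whole walk and is handed back to the caller.  */
static void *
walk (struct node *n, walk_fn fn, void *data)
{
  int walk_subtrees = 1;
  void *r = fn (n, &walk_subtrees, data);
  if (r != NULL || !walk_subtrees)
    return r;
  for (int i = 0; i < num_ops (n); i++)
    if ((r = walk (n->op[i], fn, data)) != NULL)
      return r;
  return NULL;
}

static int stop_token;  /* Plays the role of integer_zero_node.  */

/* v1 shape: walk a COND's arms, then return non-NULL to end the walk.  */
static void *
find_imm_stop (struct node *n, int *walk_subtrees, void *data)
{
  (void) walk_subtrees;
  if (n->kind == IMM_CALL)
    *(bool *) data = true;
  if (n->kind == COND)
    {
      walk (n->op[1], find_imm_stop, data);
      walk (n->op[2], find_imm_stop, data);
      return &stop_token;
    }
  return NULL;
}

/* Alternative: walk the arms, then prune only this node's operands.  */
static void *
find_imm_prune (struct node *n, int *walk_subtrees, void *data)
{
  if (n->kind == IMM_CALL)
    *(bool *) data = true;
  if (n->kind == COND)
    {
      walk (n->op[1], find_imm_prune, data);
      walk (n->op[2], find_imm_prune, data);
      *walk_subtrees = 0;
    }
  return NULL;
}

/* Build 1 ? 1 : ((1 ? 1 : 1), immediate()) and report whether FN ever
   reached the immediate call.  */
static bool
sees_immediate (walk_fn fn)
{
  struct node one = { LEAF, { NULL } };
  struct node imm = { IMM_CALL, { NULL } };
  struct node inner = { COND, { &one, &one, &one } };
  struct node comma = { COMMA, { &inner, &imm, NULL } };
  struct node outer = { COND, { &one, &one, &comma } };
  bool seen = false;
  walk (&outer, fn, &seen);
  return seen;
}
```

With find_imm_stop, the token returned for the inner ?: unwinds out of the walk of the comma expression before its second operand is reached, so the immediate call is never seen. find_imm_prune does reach it.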


Maybe we want to handle COND_EXPR in cp_fold_r instead of here?


I hope this version is better.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.033s.


Is this number still accurate for this version?


It is.  I ran time(1) a few more times and the results were 0m0.033s - 0m0.035s.
That said, ...


This change seems algorithmically better than the current code, but still
problematic: if we have nested COND_EXPR A/B/C/D/E, it looks like we will
end up cp_fold_immediate_r walking the arms of E five times, once for each
COND_EXPR.


...this is accurate.  I should have addressed the redundant folding in v2
even though the compilation is pretty much immediate.

What I meant by handling COND_EXPR in cp_fold_r was to have cp_fold_r walk
its subtrees (or cp_fold_immediate_r, if it's clear from op0 that the branch
isn't taken) so that we can clear *walk_subtrees and never walk a node with
cp_fold_immediate_r more than once.
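The toy model below shows why the original COND_EXPR handling was exponential. It also shows why walking the operands by hand and then clearing *walk_subtrees visits each node only once. The walker imitates walk_tree's pre-order traversal. The node layout and callback names are hypothetical, and the number of callback invocations stands in for compile time. The sketch collapses cp_fold_r and cp_fold_immediate_r into a single callback.

```c
#include <stddef.h>

/* Hypothetical node layout: a COND has operands {cond, then, else}.  */
enum kind { LEAF, COND };
struct node { enum kind kind; struct node *op[3]; };

typedef void *(*walk_fn) (struct node *, int *walk_subtrees, void *data);

/* Pre-order walk modelled on walk_tree.  */
static void *
walk (struct node *n, walk_fn fn, void *data)
{
  int walk_subtrees = 1;
  void *r = fn (n, &walk_subtrees, data);
  if (r != NULL || !walk_subtrees || n->kind == LEAF)
    return r;
  for (int i = 0; i < 3; i++)
    if ((r = walk (n->op[i], fn, data)) != NULL)
      return r;
  return NULL;
}

/* Original shape: walk the arms by hand, then "break", leaving
   *walk_subtrees set so walk () descends into the same arms again.  */
static void *
count_buggy (struct node *n, int *walk_subtrees, void *data)
{
  (void) walk_subtrees;
  ++*(long *) data;
  if (n->kind == COND)
    {
      walk (n->op[1], count_buggy, data);
      walk (n->op[2], count_buggy, data);
    }
  return NULL;
}

/* Suggested shape: walk the operands by hand, then clear
   *walk_subtrees so walk () does not visit them a second time.  */
static void *
count_fixed (struct node *n, int *walk_subtrees, void *data)
{
  ++*(long *) data;
  if (n->kind == COND)
    {
      for (int i = 0; i < 3; i++)
        walk (n->op[i], count_fixed, data);
      *walk_subtrees = 0;
    }
  return NULL;
}

/* Build c ? t : (c ? t : (... leaf)) with DEPTH levels, using
   1 + 3 * DEPTH entries of POOL.  */
static struct node *
build_nested (struct node *pool, int depth)
{
  int used = 0;
  struct node *inner = &pool[used++];
  inner->kind = LEAF;
  for (int d = 0; d < depth; d++)
    {
      struct node *c = &pool[used++];
      struct node *t = &pool[used++];
      struct node *cond = &pool[used++];
      c->kind = LEAF;
      t->kind = LEAF;
      cond->kind = COND;
      cond->op[0] = c;
      cond->op[1] = t;
      cond->op[2] = inner;
      inner = cond;
    }
  return inner;
}
```

For d nested conditionals, count_buggy runs 5 * 2^d - 4 times because each level walks its nested arm twice. count_fixed runs 3d + 1 times, once per node.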


I agree I should do better here.  How's this, then?  I've added
debug_generic_expr to cp_fold_immediate_r to see if it gets the same
expr multiple times and it doesn't seem to.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking

Re: [PING] [PATCH] Harmonize headers between both dg-extract-results scripts

2023-10-19 Thread Jeff Law




On 10/18/23 03:35, Thomas Schwinge wrote:


Is this (case variants) maybe something that has changed in DejaGnu at
some point in time?  (I have not checked.)

No idea :-)



I suggest that we adapt all remaining upper-case instances in GCC,
similar to your change.  And/or, as applicable, recognize both variants
(or ignore case distinctions generally)?
Yea, we should try to get this commonized.  Probably wise to recognize 
both variants as well -- especially if there are instances of these 
strings which aren't under GCC's control.
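Recognizing both spellings can be as simple as a case-insensitive prefix match. This is a minimal sketch. header_matches is a hypothetical helper, and the header text used to exercise it is illustrative only, since the thread does not quote the strings that actually differ. The real dg-extract-results scripts are shell and Python, not C.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if LINE starts with PREFIX, ignoring ASCII case.  Stops at the
   first mismatch, so a LINE shorter than PREFIX is never over-read.  */
static bool
header_matches (const char *line, const char *prefix)
{
  for (; *prefix != '\0'; line++, prefix++)
    if (tolower ((unsigned char) *line) != tolower ((unsigned char) *prefix))
      return false;
  return true;
}
```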





Given Paul's (and colleagues'?) ongoing work on GCC (Kalray KVX back end,
complex numbers support), is it maybe now time to enable Git write access
for him (them?)?

, "write after approval".

Sure.  I'd sponsor them.

jeff


Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-19 Thread Marek Polacek
On Thu, Oct 19, 2023 at 12:32:49PM -0400, Jason Merrill wrote:
> On 10/19/23 10:14, Marek Polacek wrote:
> > On Thu, Oct 19, 2023 at 10:06:01AM -0400, Jason Merrill wrote:
> > > On 10/19/23 09:39, Patrick Palka wrote:
> > > > On Tue, 17 Oct 2023, Marek Polacek wrote:
> > > > 
> > > > > On Tue, Oct 17, 2023 at 04:49:52PM -0400, Jason Merrill wrote:
> > > > > > On 10/16/23 20:39, Marek Polacek wrote:
> > > > > > > On Sat, Oct 14, 2023 at 01:13:22AM -0400, Jason Merrill wrote:
> > > > > > > > On 10/13/23 14:53, Marek Polacek wrote:
> > > > > > > > > On Thu, Oct 12, 2023 at 09:41:43PM -0400, Jason Merrill wrote:
> > > > > > > > > > On 10/12/23 17:04, Marek Polacek wrote:
> > > > > > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > > > > > > 
> > > > > > > > > > > -- >8 --
> > > > > > > > > > > My recent patch introducing cp_fold_immediate_r caused exponential
> > > > > > > > > > > compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
> > > > > > > > > > > case recursively walks the arms of a COND_EXPR, but after processing
> > > > > > > > > > > both arms it doesn't end the walk; it proceeds to walk the
> > > > > > > > > > > sub-expressions of the outermost COND_EXPR, triggering again walking
> > > > > > > > > > > the arms of the nested COND_EXPR, and so on.  This patch brings the
> > > > > > > > > > > compile time down to about 0m0.033s.
> > > > > > > > > > > 
> > > > > > > > > > > I've added some debug prints to make sure that the rest of cp_fold_r
> > > > > > > > > > > is still performed as before.
> > > > > > > > > > > 
> > > > > > > > > > >  PR c++/111660
> > > > > > > > > > > 
> > > > > > > > > > > gcc/cp/ChangeLog:
> > > > > > > > > > > 
> > > > > > > > > > >  * cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Return
> > > > > > > > > > >  integer_zero_node instead of break;.
> > > > > > > > > > >  (cp_fold_immediate): Return true if cp_fold_immediate_r returned
> > > > > > > > > > >  error_mark_node.
> > > > > > > > > > > 
> > > > > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > > > > > 
> > > > > > > > > > >  * g++.dg/cpp0x/hog1.C: New test.
> > > > > > > > > > > ---
> > > > > > > > > > >   gcc/cp/cp-gimplify.cc |  9 ++--
> > > > > > > > > > >   gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 +++
> > > > > > > > > > >   2 files changed, 82 insertions(+), 4 deletions(-)
> > > > > > > > > > >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > > > > > index bdf6e5f98ff..ca622ca169a 100644
> > > > > > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > > > > > @@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
> > > > > > > > > > >   break;
> > > > > > > > > > > if (TREE_OPERAND (stmt, 1)
> > > > > > > > > > > && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
> > > > > > > > > > > -nullptr))
> > > > > > > > > > > +nullptr) == error_mark_node)
> > > > > > > > > > >   return error_mark_node;
> > > > > > > > > > > if (TREE_OPERAND (stmt, 2)
> > > > > > > > > > > && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
> > > > > > > > > > > -nullptr))
> > > > > > > > > > > +nullptr) == error_mark_node)
> > > > > > > > > > >   return error_mark_node;
> > > > > > > > > > > /* We're done here.  Don't clear *walk_subtrees here though: we're called
> > > > > > > > > > >from cp_fold_r and we must let it recurse on the expression with
> > > > > > > > > > >cp_fold.  */
> > > > > > > > > > > -  break;
> > > > > > > > > > > +  return integer_zero_node;
> > > > > > > > > > 
> > > > > > > > > > I'm concerned this will end up missing something like
> > > > > > > > > > 
> > > > > > > > > > 1 ? 1 : ((1 ? 1 : 1), immediate())
> > > > > > > > > > 
> > > > > > > > > > as the integer_zero_node from the inner ?: will prevent walk_tree from
> > > > > > > > > > looking any farther.
> > > > > > > > > 
> > > > > > > > > You are right.  The line above works as expected, but
> > > > > > > > > 
> > > > > > > > >   1 ? 1 : ((1 ? 1 : id (42)), id (i));
> > > > > > > > > 
> > > > > > > > > shows the problem (when the expression isn't used as an initializer)

Re: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-19 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 9:05 PM Roger Sayle  wrote:
>
>
> This patch contains clean-ups of the widening multiplication patterns in
> i386.md, and provides variants of the existing highpart multiplication
> peephole2 transformations (that tidy up register allocation after
> reload), and thereby fixes PR target/110511, which is a superfluous
> move instruction.
>
> For the new test case, compiled on x86_64 with -O2.
>
> Before:
> mulx64: movabsq $-7046029254386353131, %rcx
>         movq    %rcx, %rax
>         mulq    %rdi
>         xorq    %rdx, %rax
>         ret
>
> After:
> mulx64: movabsq $-7046029254386353131, %rax
>         mulq    %rdi
>         xorq    %rdx, %rax
>         ret
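The test case itself (pr110511.c) is not quoted in the thread, so what follows is a guess at the kind of source involved. Read as an unsigned 64-bit value, the immediate printed as $-7046029254386353131 is 0x9E3779B97F4A7C15. The sequence computes the full 128-bit product with one mulq and then XORs the high half (%rdx) into the low half (%rax). Below is a hypothetical C function with that shape. It uses GCC's unsigned __int128 extension, which is available on 64-bit targets such as x86_64.

```c
#include <stdint.h>

/* Hypothetical reconstruction, not the committed test case: widen X to a
   128-bit product with a 64-bit constant, then fold the high half into
   the low half with XOR.  */
uint64_t
mulx64 (uint64_t x)
{
  /* -7046029254386353131 reinterpreted as an unsigned 64-bit value.  */
  const uint64_t k = 0x9E3779B97F4A7C15ull;
  unsigned __int128 prod = (unsigned __int128) x * k;
  return (uint64_t) prod ^ (uint64_t) (prod >> 64);
}
```

A function like this, compiled at -O2, is the sort of code where the extra movq %rcx, %rax appeared. Whether it reproduces the exact "Before" output has not been checked here.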
>
> The clean-ups are (i) that operand 1 is consistently made register_operand
> and operand 2 becomes nonimmediate_operand, so that predicates match the
> constraints, (ii) the representation of the BMI2 mulx instruction is
> updated to use the new umul_highpart RTX, and (iii) because operands
> 0 and 1 have different modes in widening multiplications, "a" is a more
> appropriate constraint than "0" (which avoids spills/reloads containing
> SUBREGs).  The new peephole2 transformations are based upon those at
> around line 9951 of i386.md, that begins with the comment
> ;; Highpart multiplication peephole2s to tweak register allocation.
> ;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx  ->  mov imm,%rax; imulq %rdi
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-10-17  Roger Sayle  
>
> gcc/ChangeLog
> PR target/110511
> * config/i386/i386.md (mul3): Make operands 1 and
> 2 take "register_operand" and "nonimmediate_operand" respectively.
> (mulqihi3): Likewise.
> (*bmi2_umul3_1): Operand 2 needs to be register_operand
> matching the %d constraint.  Use umul_highpart RTX to represent
> the highpart multiplication.
> (*umul3_1):  Operand 2 should use register_operand
> predicate, and "a" rather than "0" as operands 0 and 2 have
> different modes.
> (define_split): For mul to mulx conversion, use the new
> umul_highpart RTX representation.
> (*mul3_1):  Operand 1 should be register_operand
> and the constraint %a as operands 0 and 1 have different modes.
> (*mulqihi3_1): Operand 1 should be register_operand matching
> the constraint %0.
> (define_peephole2): Provide widening multiplication variants
> of the peephole2s that tweak highpart multiplication register
> allocation.
>
> gcc/testsuite/ChangeLog
> PR target/110511
> * gcc.target/i386/pr110511.c: New test case.
>

 (define_insn "*bmi2_umul3_1"
   [(set (match_operand:DWIH 0 "register_operand" "=r")
 (mult:DWIH
-  (match_operand:DWIH 2 "nonimmediate_operand" "%d")
+  (match_operand:DWIH 2 "register_operand" "%d")
   (match_operand:DWIH 3 "nonimmediate_operand" "rm")))
(set (match_operand:DWIH 1 "register_operand" "=r")

This will fail. Because of %, both predicates must allow
nonimmediate_operand, since RA can swap operand constraints due to %.

@@ -9747,7 +9743,7 @@
   [(set (match_operand: 0 "register_operand" "=r,A")
 (mult:
   (zero_extend:
-(match_operand:DWIH 1 "nonimmediate_operand" "%d,0"))
+(match_operand:DWIH 1 "register_operand" "%d,a"))
   (zero_extend:
 (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"
(clobber (reg:CC FLAGS_REG))]

The same problem here.  Changing "0" to "a" is correct, but the original
oversight was benign: "A" reads as "ad", and the first alternative
already takes "d".

+;; Widening multiplication peephole2s to tweak register allocation.
+;; mov imm,%rdx; mov %rdi,%rax; mulq %rdx  ->  mov imm,%rax; mulq %rdi

Maybe, instead of peephole2s, we could allow general_operand in the
instruction (for both inputs) and rely on the RA to fulfill the
constraints? Would this work?

Uros.

