[PATCH] RISC-V: Add norelax function attribute

2024-11-07 Thread shiyulong
From: yulong 

This patch adds the norelax function attribute that was discussed in
riscv-c-api-doc PR#94.
URL: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/94
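
For illustration, a minimal sketch of the intended use (the function
name is made up; the bracketing directives follow from the hunks below):

  /* Linker relaxation is disabled for this function only: its assembly
     output is bracketed by ".option push" / ".option norelax" at the
     start and ".option pop" at the end.  */
  __attribute__((norelax))
  void write_mmio_sequence (void)  /* hypothetical */
  {
  }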

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_declare_function_name): Add new 
attribute.

---
 gcc/config/riscv/riscv.cc | 18 +---
 .../gcc.target/riscv/target-attr-norelax.c| 21 +++
 2 files changed, 36 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/target-attr-norelax.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2e9ac280c8f2..42525ff6faa3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -654,6 +654,10 @@ static const attribute_spec riscv_gnu_attributes[] =
  types.  */
   {"riscv_rvv_vector_bits", 1, 1, false, true, false, true,
riscv_handle_rvv_vector_bits_attribute, NULL},
+  /* This attribute is used to declare a function, forcing it to
+     disable linker relaxation.  Syntax:
+     __attribute__((norelax)).  */
+  {"norelax", 0, 0, true, false, false, false, NULL, NULL},
 };
 
 static const scoped_attribute_specs riscv_gnu_attribute_table  =
@@ -10051,10 +10055,17 @@ riscv_declare_function_name (FILE *stream, const char 
*name, tree fndecl)
   riscv_asm_output_variant_cc (stream, fndecl, name);
   ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
   ASM_OUTPUT_FUNCTION_LABEL (stream, name, fndecl);
-  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
+  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
+  || lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
 {
   fprintf (stream, "\t.option push\n");
-
+  if (lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
+   {
+ fprintf (stream, "\t.option norelax\n");
+   }
+}
+  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
+{
   struct cl_target_option *local_cl_target =
TREE_TARGET_OPTION (DECL_FUNCTION_SPECIFIC_TARGET (fndecl));
   struct cl_target_option *global_cl_target =
@@ -10078,7 +10089,8 @@ riscv_declare_function_size (FILE *stream, const char 
*name, tree fndecl)
   if (!flag_inhibit_size_directive)
 ASM_OUTPUT_MEASURED_SIZE (stream, name);
 
-  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
+  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
+  || lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
 {
   fprintf (stream, "\t.option pop\n");
 }
diff --git a/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c 
b/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c
new file mode 100644
index ..77de6195ad1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc" { target { rv64 } } } */
+
+__attribute__((norelax))
+void foo1()
+{}
+
+void foo2(void)
+{}
+
+int main()
+{
+  foo1();
+  foo2();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times ".option push\t" 1 } } */
+/* { dg-final { scan-assembler-times ".option norelax\t" 1 } } */
+/* { dg-final { scan-assembler-times ".option pop\t" 1 } } */
-- 
2.34.1



[pushed] Darwin: Fix a narrowing warning.

2024-11-07 Thread Iain Sandoe
Tested on x86_64-darwin, pushed to trunk, thanks
Iain

--- 8< ---

cdtor_record needs to have an unsigned entry for the position in order to
match with vec_safe_length.

gcc/ChangeLog:

* config/darwin.cc (cdtor_record): Make position unsigned.

Signed-off-by: Iain Sandoe 
---
 gcc/config/darwin.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
index ae821e32012..4e495fce82b 100644
--- a/gcc/config/darwin.cc
+++ b/gcc/config/darwin.cc
@@ -90,7 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 typedef struct GTY(()) cdtor_record {
   rtx symbol;
   int priority;/* [con/de]structor priority */
-  int position;/* original position */
+  unsigned position;   /* original position */
 } cdtor_record;
 
 static GTY(()) vec *ctors = NULL;
-- 
2.39.2 (Apple Git-143)



[PATCH v2] arm: Don't ICE on arm_mve.h pragma without MVE types [PR117408]

2024-11-07 Thread Torbjörn SVENSSON
Changes since v1:

- Updated the error message to mention that arm_mve_types.h needs to be
  included.
- Corrected some spelling errors in commit message.

As the warning for pure functions returning void is not related to this
patch, I'll leave it for you, Christophe, to look into. :)

Ok for trunk and releases/gcc-14?

--

Starting with r14-435-g00d97bf3b5a, doing `#pragma GCC arm "arm_mve.h"
false` or `#pragma GCC arm "arm_mve.h" true` without first doing
`#pragma GCC arm "arm_mve_types.h"` causes GCC to ICE.
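
A minimal reproducer (mirroring the new tests below; assumes MVE is
enabled, e.g. via -march=armv8.1-m.main+mve):

  #pragma GCC arm "arm_mve.h" false  /* ICEd before this patch; now
                                        rejected with an error asking
                                        for arm_mve_types.h.  */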

gcc/ChangeLog:

  PR target/117408
	* config/arm/arm-mve-builtins.cc (handle_arm_mve_h): Detect if MVE
	types are missing and, if so, emit an error.

gcc/testsuite/ChangeLog:

  PR target/117408
  * gcc.target/arm/mve/pr117408-1.c: New test.
  * gcc.target/arm/mve/pr117408-2.c: Likewise.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/config/arm/arm-mve-builtins.cc| 7 +++
 gcc/testsuite/gcc.target/arm/mve/pr117408-1.c | 7 +++
 gcc/testsuite/gcc.target/arm/mve/pr117408-2.c | 7 +++
 3 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr117408-2.c

diff --git a/gcc/config/arm/arm-mve-builtins.cc 
b/gcc/config/arm/arm-mve-builtins.cc
index af1908691b6..ed3d6000641 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -535,6 +535,13 @@ handle_arm_mve_h (bool preserve_user_namespace)
   return;
 }
 
+  if (!handle_arm_mve_types_p)
+{
+  error ("this definition requires MVE types, please include %qs",
+"arm_mve_types.h");
+  return;
+}
+
   /* Define MVE functions.  */
   function_table = new hash_table (1023);
   function_builder builder;
diff --git a/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c 
b/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
new file mode 100644
index 000..25eaf67e297
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+
+/* It doesn't really matter if this produces errors about missing types,
+   but it mustn't trigger an ICE.  */
+#pragma GCC arm "arm_mve.h" false /* { dg-error "this definition requires MVE 
types, please include 'arm_mve_types.h'" } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c 
b/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c
new file mode 100644
index 000..c3a0af25f77
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+
+/* It doesn't really matter if this produces errors about missing types,
+   but it mustn't trigger an ICE.  */
+#pragma GCC arm "arm_mve.h" true /* { dg-error "this definition requires MVE 
types, please include 'arm_mve_types.h'" } */
-- 
2.25.1



Re: [PATCH 04/10] gimple: Disallow sizeless types in BIT_FIELD_REFs.

2024-11-07 Thread Richard Biener
On Thu, Nov 7, 2024 at 11:13 AM Tejas Belagod  wrote:
>
> On 11/7/24 2:36 PM, Richard Biener wrote:
> > On Thu, Nov 7, 2024 at 8:25 AM Tejas Belagod  wrote:
> >>
> >> On 11/6/24 6:02 PM, Richard Biener wrote:
> >>> On Wed, Nov 6, 2024 at 12:49 PM Tejas Belagod  
> >>> wrote:
> 
>  Ensure sizeless types don't end up trying to be canonicalised to 
>  BIT_FIELD_REFs.
> >>>
> >>> You mean variable-sized?  But don't we know, when there's a constant
> >>> array index,
> >>> that the size is at least so this indexing is OK?  So what's wrong with a
> >>> fixed position, fixed size BIT_FIELD_REF extraction of a VLA object?
> >>>
> >>> Richard.
> >>>
> >>
> >> Ah! The code and comment/description don't match, sorry. This change
> >> started out as gating out all canonicalizations of VLA vectors when I
> >> had limited understanding of how this worked, but eventually was
> >> simplified to gate in only those offsets that were known_le, but missed
> >> out fixing the comment/description. So, for example:
> >>
> >> int foo (svint32_t v) { return v[3]; }
> >>
> >> canonicalises to a BIT_FIELD_REF 
> >>
> >> but something like:
> >>
> >> int foo (svint32_t v) { return v[4]; }
> >
> > So this is possibly out-of-bounds?
> >
> >> reduces to a VEC_EXTRACT <>
> >
> > But if out-of-bounds a VEC_EXTRACT isn't any better than a BIT_FIELD_REF, 
> > no?
>
> Someone may have code protecting accesses like so:
>
>   /* svcntw () returns num of 32-bit elements in a vec */
>   if (svcntw () >= 8)
> return v[4];
>
> So I didn't error or warn (-Warray-bounds) for this or for that matter
> make it UB as it will be spurious. So technically, it may not be OOB access.
>
> Therefore BIT_FIELD_REFs are generated for anything within the range of
> a Adv SIMD register and anything beyond is left to be vec_extracted with
> SVE instructions.

You still didn't state the technical reason why BIT_FIELD_REF is worse than
.VEC_EXTRACT (which is introduced quite late only btw).

I'm mostly questioning that we have two different canonicalizations that oddly
depend on the constant index.  I'd rather always go .VEC_EXTRACT or
always BIT_FIELD_REF (prefer that one) instead of having a mix for VLA vectors.

Richard.

>
> Thanks,
> Tejas.
>
>
> >
> >> I'll fix the comment/description.
> >>
> >> Thanks,
> >> Tejas.
> >>
>  gcc/ChangeLog:
> 
>    * gimple-fold.cc (maybe_canonicalize_mem_ref_addr): Disallow 
>  sizeless
>    types in BIT_FIELD_REFs.
>  ---
> gcc/gimple-fold.cc | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
>  diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>  index c19dac0dbfd..dd45d9f7348 100644
>  --- a/gcc/gimple-fold.cc
>  +++ b/gcc/gimple-fold.cc
>  @@ -6281,6 +6281,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool 
>  is_debug = false)
>   && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (*t, 
>  0), 0
> {
>   tree vtype = TREE_TYPE (TREE_OPERAND (TREE_OPERAND (*t, 0), 0));
>  +  /* BIT_FIELD_REF can only happen on constant-size vectors.  */
>   if (VECTOR_TYPE_P (vtype))
>    {
>  tree low = array_ref_low_bound (*t);
>  @@ -6294,7 +6295,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool 
>  is_debug = false)
> (TYPE_SIZE (TREE_TYPE (*t;
>  widest_int ext
>    = wi::add (idx, wi::to_widest (TYPE_SIZE 
>  (TREE_TYPE (*t;
>  - if (wi::les_p (ext, wi::to_widest (TYPE_SIZE (vtype
>  + if (known_le (ext, wi::to_poly_widest (TYPE_SIZE 
>  (vtype
>    {
>  *t = build3_loc (EXPR_LOCATION (*t), 
>  BIT_FIELD_REF,
>   TREE_TYPE (*t),
>  --
>  2.25.1
> 
> >>
>


Re: [PATCH] RISC-V: Add norelax function attribute

2024-11-07 Thread Yangyu Chen
Thanks for doing this!

> On Nov 8, 2024, at 00:19, shiyul...@iscas.ac.cn wrote:
> 
> From: yulong 
> 
> This patch adds the norelax function attribute that was discussed in 
> riscv-c-api-doc PR#94.
> URL: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/94
> 
> gcc/ChangeLog:
> 
>* config/riscv/riscv.cc (riscv_declare_function_name): Add new 
> attribute.
> 
> ---
> gcc/config/riscv/riscv.cc | 18 +---
> .../gcc.target/riscv/target-attr-norelax.c| 21 +++
> 2 files changed, 36 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/target-attr-norelax.c
> 
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 2e9ac280c8f2..42525ff6faa3 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -654,6 +654,10 @@ static const attribute_spec riscv_gnu_attributes[] =
>  types.  */
>   {"riscv_rvv_vector_bits", 1, 1, false, true, false, true,
>riscv_handle_rvv_vector_bits_attribute, NULL},
> +  /* This attribute is used to declare a function, forcing it to
> +     disable linker relaxation.  Syntax:
> +     __attribute__((norelax)).  */
> +  {"norelax", 0, 0, true, false, false, false, NULL, NULL},
> };
> 
> static const scoped_attribute_specs riscv_gnu_attribute_table  =
> @@ -10051,10 +10055,17 @@ riscv_declare_function_name (FILE *stream, const 
> char *name, tree fndecl)
>   riscv_asm_output_variant_cc (stream, fndecl, name);
>   ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
>   ASM_OUTPUT_FUNCTION_LABEL (stream, name, fndecl);
> -  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
> +  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
> +  || lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
> {
>   fprintf (stream, "\t.option push\n");
> -
> +  if (lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
> + {
> +  fprintf (stream, "\t.option norelax\n");
> + }
> +}
> +  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
> +{

It's better to include the above 2 lines in the first block.

So the whole block `if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))`
will be in the true block of `if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
|| lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))`.

Don't forget to adjust the indentation.
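
I.e., something like this (a sketch only; the target-option handling
body is elided):

  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
      || lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
    {
      fprintf (stream, "\t.option push\n");
      if (lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
        fprintf (stream, "\t.option norelax\n");
      if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
        {
          /* ... emit the function-specific target options as before ...  */
        }
    }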

Otherwise, LGTM.

>   struct cl_target_option *local_cl_target =
> TREE_TARGET_OPTION (DECL_FUNCTION_SPECIFIC_TARGET (fndecl));
>   struct cl_target_option *global_cl_target =
> @@ -10078,7 +10089,8 @@ riscv_declare_function_size (FILE *stream, const char 
> *name, tree fndecl)
>   if (!flag_inhibit_size_directive)
> ASM_OUTPUT_MEASURED_SIZE (stream, name);
> 
> -  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
> +  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
> +  || lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
> {
>   fprintf (stream, "\t.option pop\n");
> }
> diff --git a/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c 
> b/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c
> new file mode 100644
> index ..77de6195ad1e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc" { target { rv32 } } } */
> +/* { dg-options "-march=rv64gc" { target { rv64 } } } */
> +
> +__attribute__((norelax))
> +void foo1()
> +{}
> +
> +void foo2(void)
> +{}
> +
> +int main()
> +{
> +  foo1();
> +  foo2();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times ".option push\t" 1 } } */
> +/* { dg-final { scan-assembler-times ".option norelax\t" 1 } } */
> +/* { dg-final { scan-assembler-times ".option pop\t" 1 } } */
> -- 
> 2.34.1



Re: [PATCH][RFC][PR117093] match.pd: Fold vec_perm with view_convert

2024-11-07 Thread Richard Biener
On Tue, 5 Nov 2024, Jennifer Schmitz wrote:

> We are working on a patch to improve the codegen for the following test case:
> uint64x2_t foo (uint64x2_t r) {
> uint32x4_t a = vreinterpretq_u32_u64 (r);
> uint32_t t;
> t = a[0]; a[0] = a[1]; a[1] = t;
> t = a[2]; a[2] = a[3]; a[3] = t;
> return vreinterpretq_u64_u32 (a);
> }
> that GCC currently compiles to (-O1):
> foo:
> mov v31.16b, v0.16b
> ins v0.s[0], v0.s[1]
> ins v0.s[1], v31.s[0]
> ins v0.s[2], v31.s[3]
> ins v0.s[3], v31.s[2]
> ret
> whereas LLVM produces the preferable sequence
> foo:
>   rev64   v0.4s, v0.4s
> ret
> 
> On gimple level, we currently have:
>   _1 = VIEW_CONVERT_EXPR(r_3(D));
>   t_4 = BIT_FIELD_REF ;
>   a_5 = VEC_PERM_EXPR <_1, _1, { 1, 1, 2, 3 }>;
>   a_6 = BIT_INSERT_EXPR ;
>   t_7 = BIT_FIELD_REF ;
>   _2 = BIT_FIELD_REF ;
>   a_8 = BIT_INSERT_EXPR ;
>   a_9 = BIT_INSERT_EXPR ;
>   _10 = VIEW_CONVERT_EXPR(a_9);
>   return _10;
> 
> whereas the desired sequence is:
>   _1 = VIEW_CONVERT_EXPR(r_2(D));
>   a_3 = VEC_PERM_EXPR <_1, _1, { 1, 0, 3, 2 }>;
>   _4 = VIEW_CONVERT_EXPR(a_3);
>   return _4;
> 
> If we remove the casts from the test case, the forwprop1 dump shows that
> a series of match.pd is applied (repeatedly, only showing the first
> iteration here):
> Applying pattern match.pd:10881, gimple-match-1.cc:25213
> Applying pattern match.pd:11099, gimple-match-1.cc:25714
> Applying pattern match.pd:9549, gimple-match-1.cc:24274
> gimple_simplified to a_7 = VEC_PERM_EXPR ;
> 
> The reason why these patterns cannot be applied with casts seems to be
> the failing types_match (@0, @1) in the following pattern:
> /* Simplify vector inserts of other vector extracts to a permute.  */
> (simplify
>  (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
>  (if (VECTOR_TYPE_P (type)
>   && (VECTOR_MODE_P (TYPE_MODE (type))
>   || optimize_vectors_before_lowering_p ())
>   && types_match (@0, @1)
>   && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
>   && TYPE_VECTOR_SUBPARTS (type).is_constant ()
>   && multiple_p (wi::to_poly_offset (@rpos),
>  wi::to_poly_offset (TYPE_SIZE (TREE_TYPE (type)
>   (with
>{
>  [...]
>}
>(if (!VECTOR_MODE_P (TYPE_MODE (type))
> || can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, 
> false))
> (vec_perm @0 @1 { vec_perm_indices_to_tree
> (build_vector_type (ssizetype, nunits), sel); })
> 
> The types_match fails, because the following pattern has already removed the
> view_convert expression, thereby changing the type of @0:
> (simplify
>  (BIT_FIELD_REF (view_convert @0) @1 @2)
>   [...]
>   (BIT_FIELD_REF @0 @1 @2)))
> 
> One attempt to make the types_match true was to add a single_use flag to
> the view_convert expression in the pattern above, preventing it from
> being applied.
> While this actually fixed the test case and produced the intended
> instruction sequence, it caused another test to fail that relies on 
> application
> of the pattern with multiple use of the view_convert expression
> (gcc.target/i386/vect-strided-3.c).
> 
> Hence, the RFC: How can we make the types_match work with view_convert
> expressions in the arguments?

You could remove the types_match (@0, @1) with

diff --git a/gcc/match.pd b/gcc/match.pd
index 00988241348..820a589b577 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9539,7 +9539,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (VECTOR_TYPE_P (type)
   && (VECTOR_MODE_P (TYPE_MODE (type))
  || optimize_vectors_before_lowering_p ())
-  && types_match (@0, @1)
+  && operand_equal_p (TYPE_SIZE (TREE_TYPE (@0)),
+ TYPE_SIZE (TREE_TYPE (@1)), 0)
   && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
   && TYPE_VECTOR_SUBPARTS (type).is_constant ()
   && multiple_p (wi::to_poly_offset (@rpos),
@@ -9547,7 +9548,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (with
{
  unsigned HOST_WIDE_INT elsz
-   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1;
+   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@0;
  poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
  poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
  unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
@@ -9559,7 +9560,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
}
(if (!VECTOR_MODE_P (TYPE_MODE (type))
|| can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, 
false))
-(vec_perm @0 @1 { vec_perm_indices_to_tree
+(vec_perm @0 (view_convert @1) { vec_perm_indices_to_tree
 (build_vector_type (ssizetype, nunits), sel); 
})
 
 (if (canonicalize_math_after_vectorization_p ())

or alternatively avoid the BIT_FIELD_REF (view_convert @) transform
iff the original ref type-wise matches a vector ele

Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-07 Thread Richard Earnshaw (lists)
On 06/11/2024 19:50, Torbjorn SVENSSON wrote:
> 
> 
> On 2024-11-06 19:06, Richard Earnshaw (lists) wrote:
>> On 06/11/2024 13:50, Torbjorn SVENSSON wrote:
>>>
>>>
>>> On 2024-11-06 14:04, Richard Earnshaw (lists) wrote:
 On 06/11/2024 12:23, Torbjorn SVENSSON wrote:
>
>
> On 2024-11-06 12:26, Richard Earnshaw (lists) wrote:
>> On 06/11/2024 07:44, Christophe Lyon wrote:
>>> On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON
>>>  wrote:

 While the regression was reported on GCC15, I'm sure that same
 regression will be seen on GCC14 when it's tested in the
 arm-linux-gnueabihf configuration.

 Ok for trunk and releases/gcc-14?

 -- 

 This fixes reported regression at
 https://linaro.atlassian.net/browse/GNU-1407.

 gcc/testsuite/ChangeLog:

    * gcc.target/arm/pr68620.c: Use effective-target arm_fp.

 Signed-off-by: Torbjörn SVENSSON 
 ---
     gcc/testsuite/gcc.target/arm/pr68620.c | 4 +++-
     1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/gcc/testsuite/gcc.target/arm/pr68620.c 
 b/gcc/testsuite/gcc.target/arm/pr68620.c
 index 6e38671752f..1ed84f4ac75 100644
 --- a/gcc/testsuite/gcc.target/arm/pr68620.c
 +++ b/gcc/testsuite/gcc.target/arm/pr68620.c
 @@ -1,8 +1,10 @@
     /* { dg-do compile } */
     /* { dg-skip-if "-mpure-code supports M-profile without Neon only" 
 { *-*-* } { "-mpure-code" } } */
     /* { dg-require-effective-target arm_arch_v7a_ok } */
 -/* { dg-options "-mfp16-format=ieee -mfpu=auto -mfloat-abi=softfp" } 
 */
 +/* { dg-require-effective-target arm_fp_ok } */
 +/* { dg-options "-mfp16-format=ieee -mfpu=auto" } */
     /* { dg-add-options arm_arch_v7a } */
 +/* { dg-add-options arm_fp } */

>>>
>>> So... this partially reverts your previous patch (bringing back
>>> arm_fp). What is the problem now?
>>>
>>
>> Yeah, that sounds wrong.  arm_fp_ok tries to find options to add to the 
>> basic testsuite options, but it can't be combined with arm_arch_v7a as 
>> that picks a totally different set of flags for the architecture.
>
> The problem is that for arm-linux-gnueabihf, we cannot use 
> -mfloat-abi=softfp as there is no multilib available for that ABI, or at 
> least that's my interpretation of below error message.
>
> This is the output from the CI run:
>
> Executing on host: 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/bin/armv8l-unknown-linux-gnueabihf-gcc
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c
>     -fdiagnostics-plain-output   -mfp16-format=ieee -mfpu=auto 
> -mfloat-abi=softfp -mcpu=unset -march=armv7-a+fp -S -o pr68620.s (timeout 
> = 600)
> spawn -ignore SIGHUP 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/bin/armv8l-unknown-linux-gnueabihf-gcc
>  
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c
>  -fdiagnostics-plain-output -mfp16-format=ieee -mfpu=auto 
> -mfloat-abi=softfp -mcpu=unset -march=armv7-a+fp -S -o pr68620.s
> In file included from /usr/include/features.h:510,
>    from 
> /usr/include/arm-linux-gnueabihf/bits/libc-header-start.h:33,
>    from /usr/include/stdint.h:26,
>    from 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/stdint.h:11,
>    from 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/arm_fp16.h:34,
>    from 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/arm_neon.h:41,
>    from 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c:7:
> /usr/include/arm-linux-gnueabihf/gnu/stubs.h:7:11: fatal error: 
> gnu/stubs-soft.h: No such file or directory
> compilation terminated.
> compiler exited with status 1
> output is:
> In file included from /usr/include/features.h:510,
>    from 
> /usr/include/arm-linux-gnueabihf/bits/libc-header-start.h:33,
>    from /usr/include/stdint.h:26,
>    from 
> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-g

Re: [PATCH] rtl-optimization/117467 - 33% compile-time in rest of compilation

2024-11-07 Thread Jeff Law
On 11/7/24 2:15 AM, Richard Biener wrote:
> ext-dce uses TV_NONE, that's not OK for a pass taking 33% compile-time.
> The following adds a timevar to it for proper blaming.
>
> Bootstrap running on x86_64-unknown-linux-gnu.
>
> 	PR rtl-optimization/117467
> 	* timevar.def (TV_EXT_DCE): New.
> 	* ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
Definitely not OK on multiple levels.  33% of compile time is absurd. 
Clearly mine.  Thanks for taking care of the timevar.


jeff



[PATCH] testsuite: arm: Allow vst1.32 instruction in pr40457-2.c

2024-11-07 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

--

When building the test case with neon, the 'vst1.32' instruction is used
instead of 'strd'. Allow both variants to make the test pass.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr40457-2.c: Add vst1.32 as an allowed
instruction.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/pr40457-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr40457-2.c 
b/gcc/testsuite/gcc.target/arm/pr40457-2.c
index 31624d35127..5f742a3029a 100644
--- a/gcc/testsuite/gcc.target/arm/pr40457-2.c
+++ b/gcc/testsuite/gcc.target/arm/pr40457-2.c
@@ -7,4 +7,4 @@ void foo(int* p)
   p[1] = 0;
 }
 
-/* { dg-final { scan-assembler "strd|stm" } } */
+/* { dg-final { scan-assembler "strd|stm|vst1\\.32" } } */
-- 
2.25.1



Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-11-07 Thread Richard Sandiford
"Robin Dapp"  writes:
>>> If the problem is tracking liveness, wouldn't it be better to
>>> iterate over the "then" block in reverse order?  We would start
>>> with the liveness set for the join block and update as we move
>>> backwards through the "then" block.  This liveness set would
>>> tell us whether the current instruction needs to preserve a
>>> particular register.  That should make it possible to do the
>>> transformation in one step, and so avoid the risk that the
>>> second attempt does something that is unexpectedly different
>>> from the first attempt.
>>
>> I agree that the current approach is rather cumbersome.  Indeed
>> the second attempt was conditional at first and I changed it to
>> be unconditional after some patch iterations.
>> Your reverse-order idea sounds like it should work.  To further
>> clean up the algorithm we could also make it more explicit
>> that a "cmov" depends on either the condition or the CC and
>> basically track two separate paths through the block, one CC
>> path and one "condition" path.
>
> I gave this another thought.  Right now we keep track of the
> generated targets and temporaries in forward order, using those
> for the source rewiring.  I don't see how we could do that in
> reverse order other than have another fixup iteration afterwards.
>
>>> FWIW, the reason for asking was that it seemed safer to pass
>>> use_cond_earliest back from noce_convert_multiple_sets_1
>>> to noce_convert_multiple_sets, as another parameter,
>>> and then do the adjustment around noce_convert_multiple_sets's
>>> call to targetm.noce_conversion_profitable_p.  That would avoid
>>> the new for a new if_info field, which in turn would make it
>>> less likely that stale information is carried over from one attempt
>>> to the next (e.g. if other ifcvt techniques end up using the same
>>> field in future).
>>
>> Would something like the attached v4 be OK that uses a parameter
>> instead (I mean without having refactored the full algorithm)?
>> At least I changed the comment before the second attempt to
>> hopefully cause a tiny bit less confusion :)
>> I haven't fully bootstrapped it yet.
>
> That v4 was bootstrapped and regtested on x86 and aarch64 in the meanwhile and
> it has been in our internal tree for a while without problems.
>
> Would it be OK for trunk without further refactoring?

I think it'd be better if I abstain from this.  I probably disagree too
much with the current structure and the way that the code is developing.
I won't object if anyone else approves it though.

Thanks,
Richard


[PATCH v4 5/8] aarch64: Add masked-load else operands.

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

This adds zero else operands to masked loads and their intrinsics.
I needed to adjust more than initially thought because we rely on
combine for several instructions and a change in a "base" pattern
needs to propagate to all those.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins-base.cc: Add else
handling.
* config/aarch64/aarch64-sve-builtins.cc 
(function_expander::use_contiguous_load_insn):
Ditto.
* config/aarch64/aarch64-sve-builtins.h: Add else operand to
contiguous load.
* config/aarch64/aarch64-sve.md (@aarch64_load
_):
Split and add else operand.
(@aarch64_load_):
Ditto.

(*aarch64_load__mov):
Ditto.
* config/aarch64/aarch64-sve2.md: Ditto.
* config/aarch64/iterators.md: Remove unused iterators.
* config/aarch64/predicates.md (aarch64_maskload_else_operand):
Add zero else operand.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 24 +
 gcc/config/aarch64/aarch64-sve-builtins.cc| 12 -
 gcc/config/aarch64/aarch64-sve-builtins.h |  2 +-
 gcc/config/aarch64/aarch64-sve.md | 52 ---
 gcc/config/aarch64/aarch64-sve2.md|  3 +-
 gcc/config/aarch64/iterators.md   |  4 --
 gcc/config/aarch64/predicates.md  |  4 ++
 7 files changed, 77 insertions(+), 24 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 1c9f515a52c..e70aedd2917 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1523,11 +1523,12 @@ public:
 gimple_seq stmts = NULL;
 tree pred = f.convert_pred (stmts, vectype, 0);
 tree base = f.fold_contiguous_base (stmts, vectype);
+tree els = build_zero_cst (vectype);
 gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT);
 
 tree cookie = f.load_store_cookie (TREE_TYPE (vectype));
-gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD, 3,
- base, cookie, pred);
+gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD, 4,
+ base, cookie, pred, els);
 gimple_call_set_lhs (new_call, f.lhs);
 return new_call;
   }
@@ -1541,7 +1542,7 @@ public:
 e.vector_mode (0), e.gp_mode (0));
 else
   icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0));
-return e.use_contiguous_load_insn (icode);
+return e.use_contiguous_load_insn (icode, true);
   }
 };
 
@@ -1554,10 +1555,10 @@ public:
   rtx
   expand (function_expander &e) const override
   {
-insn_code icode = code_for_aarch64_load (UNSPEC_LD1_SVE, extend_rtx_code 
(),
+insn_code icode = code_for_aarch64_load (extend_rtx_code (),
 e.vector_mode (0),
 e.memory_vector_mode ());
-return e.use_contiguous_load_insn (icode);
+return e.use_contiguous_load_insn (icode, true);
   }
 };
 
@@ -1576,6 +1577,8 @@ public:
 e.prepare_gather_address_operands (1);
 /* Put the predicate last, as required by mask_gather_load_optab.  */
 e.rotate_inputs_left (0, 5);
+/* Add the else operand.  */
+e.args.quick_push (CONST0_RTX (e.vector_mode (0)));
 machine_mode mem_mode = e.memory_vector_mode ();
 machine_mode int_mode = aarch64_sve_int_mode (mem_mode);
 insn_code icode = convert_optab_handler (mask_gather_load_optab,
@@ -1599,6 +1602,8 @@ public:
 e.rotate_inputs_left (0, 5);
 /* Add a constant predicate for the extension rtx.  */
 e.args.quick_push (CONSTM1_RTX (VNx16BImode));
+/* Add the else operand.  */
+e.args.quick_push (CONST0_RTX (e.vector_mode (1)));
 insn_code icode = code_for_aarch64_gather_load (extend_rtx_code (),
e.vector_mode (0),
e.memory_vector_mode ());
@@ -1741,6 +1746,7 @@ public:
 /* Get the predicate and base pointer.  */
 gimple_seq stmts = NULL;
 tree pred = f.convert_pred (stmts, vectype, 0);
+tree els = build_zero_cst (vectype);
 tree base = f.fold_contiguous_base (stmts, vectype);
 gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT);
 
@@ -1759,8 +1765,8 @@ public:
 
 /* Emit the load itself.  */
 tree cookie = f.load_store_cookie (TREE_TYPE (vectype));
-gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3,
- base, cookie, pred);
+gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 4,
+ base, cookie, pred, els);
 gimple_call_set_lhs (new_call, lhs_array);
 gsi_insert_after (f.gsi, new_call, GSI_SAME_STMT);
 
@@ -1773,7 +1779,7 @@ public:
  

[PATCH v4 0/8] Add maskload else operand.

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

Hi,

changes from v3:

- Check if we support vec_cond_expr for the selected mode in case we
  need to set the inactive elements to zero.
- Add another undef operand to gcn.
- Remove unnecessary changes in i386 patch.

Robin Dapp (8):
  docs: Document maskload else operand and behavior.
  ifn: Add else-operand handling.
  tree-ifcvt: Add zero maskload else value.
  vect: Add maskload else value support.
  aarch64: Add masked-load else operands.
  gcn: Add else operand to masked loads.
  i386: Add zero maskload else operand.
  RISC-V: Add else operand to masked loads [PR115336].

 .../aarch64/aarch64-sve-builtins-base.cc  |  24 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc|  12 +-
 gcc/config/aarch64/aarch64-sve-builtins.h |   2 +-
 gcc/config/aarch64/aarch64-sve.md |  52 ++-
 gcc/config/aarch64/aarch64-sve2.md|   3 +-
 gcc/config/aarch64/iterators.md   |   4 -
 gcc/config/aarch64/predicates.md  |   4 +
 gcc/config/gcn/gcn-valu.md|  23 +-
 gcc/config/gcn/predicates.md  |   2 +
 gcc/config/i386/sse.md|  21 +-
 gcc/config/riscv/autovec.md   |  50 +--
 gcc/config/riscv/predicates.md|   3 +
 gcc/config/riscv/riscv-v.cc   |  30 +-
 gcc/doc/md.texi   |  63 ++--
 gcc/internal-fn.cc| 148 ++--
 gcc/internal-fn.h |  13 +-
 gcc/optabs-query.cc   |  70 +++-
 gcc/optabs-query.h|   3 +-
 gcc/optabs-tree.cc|  66 +++-
 gcc/optabs-tree.h |   8 +-
 .../gcc.target/riscv/rvv/autovec/pr115336.c   |  20 ++
 .../gcc.target/riscv/rvv/autovec/pr116059.c   |  15 +
 gcc/tree-if-conv.cc   |  12 +-
 gcc/tree-vect-data-refs.cc|  74 ++--
 gcc/tree-vect-patterns.cc |  12 +-
 gcc/tree-vect-slp.cc  |  25 +-
 gcc/tree-vect-stmts.cc| 326 +++---
 gcc/tree-vectorizer.h |  10 +-
 28 files changed, 854 insertions(+), 241 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c

-- 
2.47.0



[PATCH v4 1/8] docs: Document maskload else operand and behavior.

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

This patch amends the documentation for masked loads (maskload,
vec_mask_load_lanes, and mask_gather_load as well as their len
counterparts) with an else operand.

gcc/ChangeLog:

* doc/md.texi: Document masked load else operand.
---
 gcc/doc/md.texi | 63 -
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6d9c8643739..38d839ac4c9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5014,8 +5014,10 @@ This pattern is not allowed to @code{FAIL}.
 @item @samp{vec_mask_load_lanes@var{m}@var{n}}
 Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
 mask operand (operand 2) that specifies which elements of the destination
-vectors should be loaded.  Other elements of the destination
-vectors are set to zero.  The operation is equivalent to:
+vectors should be loaded.  Other elements of the destination vectors are
+taken from operand 3, which is an else operand similar to the one in
+@code{maskload}.
+The operation is equivalent to:
 
 @smallexample
 int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
@@ -5025,7 +5027,7 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
   operand0[i][j] = operand1[j * c + i];
   else
 for (i = 0; i < c; i++)
-  operand0[i][j] = 0;
+  operand0[i][j] = operand3[j];
 @end smallexample
 
 This pattern is not allowed to @code{FAIL}.
@@ -5033,16 +5035,20 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{vec_mask_len_load_lanes@var{m}@var{n}} instruction pattern
 @item @samp{vec_mask_len_load_lanes@var{m}@var{n}}
 Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
-mask operand (operand 2), length operand (operand 3) as well as bias operand 
(operand 4)
-that specifies which elements of the destination vectors should be loaded.
-Other elements of the destination vectors are undefined.  The operation is 
equivalent to:
+mask operand (operand 2), length operand (operand 4) as well as bias operand
+(operand 5) that specifies which elements of the destination vectors should be
+loaded.  Other elements of the destination vectors are taken from operand 3,
+which is an else operand similar to the one in @code{maskload}.
+The operation is equivalent to:
 
 @smallexample
 int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
-for (j = 0; j < operand3 + operand4; j++)
-  if (operand2[j])
-for (i = 0; i < c; i++)
+for (j = 0; j < operand4 + operand5; j++)
+  for (i = 0; i < c; i++)
+if (operand2[j])
   operand0[i][j] = operand1[j * c + i];
+else
+  operand0[i][j] = operand3[j];
 @end smallexample
 
 This pattern is not allowed to @code{FAIL}.
@@ -5122,18 +5128,25 @@ address width.
 @cindex @code{mask_gather_load@var{m}@var{n}} instruction pattern
 @item @samp{mask_gather_load@var{m}@var{n}}
 Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand as
-operand 5.  Bit @var{i} of the mask is set if element @var{i}
+operand 5.
+Other elements of the destination vectors are taken from operand 6,
+which is an else operand similar to the one in @code{maskload}.
+Bit @var{i} of the mask is set if element @var{i}
 of the result should be loaded from memory and clear if element @var{i}
-of the result should be set to zero.
+of the result should be set to operand 6.
 
 @cindex @code{mask_len_gather_load@var{m}@var{n}} instruction pattern
 @item @samp{mask_len_gather_load@var{m}@var{n}}
-Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand 
(operand 5),
-a len operand (operand 6) as well as a bias operand (operand 7).  Similar to 
mask_len_load,
-the instruction loads at most (operand 6 + operand 7) elements from memory.
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand
+(operand 5) and an else operand (operand 6) as well as a len operand
+(operand 7) and a bias operand (operand 8).
+
+Similar to mask_len_load the instruction loads at
+most (operand 7 + operand 8) elements from memory.
 Bit @var{i} of the mask is set if element @var{i} of the result should
-be loaded from memory and clear if element @var{i} of the result should be 
undefined.
-Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+be loaded from memory and clear if element @var{i} of the result should
+be set to element @var{i} of operand 6.
+Mask elements @var{i} with @var{i} > (operand 7 + operand 8) are ignored.
 
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
@@ -5365,8 +5378,13 @@ Operands 4 and 5 have a target-dependent scalar integer 
mode.
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
-into register operand 0.  Mask is provided in register operand 2 of
-mode @var{n}.
+into register operand 0.  The mask is provided in register operand 2 of
+mode @var{n}.  Operand 3 (

RE: [PATCH 5/5] Allow multiple vectorized epilogs via --param vect-epilogues-nomask=N

2024-11-07 Thread Richard Biener
On Thu, 7 Nov 2024, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, November 6, 2024 2:32 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: RISC-V CI ; Tamar Christina
> > ; Richard Sandiford 
> > Subject: [PATCH 5/5] Allow multiple vectorized epilogs via --param 
> > vect-epilogues-
> > nomask=N
> > 
> > The following is a prototype allowing N possible vector epilogues.
> > In the end I'd like the target to tell us a set of (or no) vector modes
> > to consider for the epilogue of the main or the current epilog analyzed loop
> > in a way similar as to how we communicate back suggested_unroll_factor.
> > 
> > The main motivation is SPEC CPU 2017 525.x264_r which when doing
> > AVX512 vectorization ends up with using the scalar epilogue in
> > a hot function because the AVX2 epilogue has a too high VF.  Using
> > two vector epilogues mitigates this and also avoids regressing in
> > 527.cam4_r which has a loop iteration count exactly matching the
> > AVX2 epilogue (one of the original ideas was to always use a SSE2
> > vector epilogue, even with a AVX512 main loop).
> > 
> > It turns out that two vector epilogues even create smaller code
> > in some cases since we tend to fully unroll epilogues with less
> > than 16 iterations.  So a simple (int x[])
> > 
> >   for (int i = 0; i < n; ++i)
> > x[i] *= 3;
> > 
> > has a -O3 -march=znver4 code size
> > 
> > N vector epilogues   size
> > 0                    615
> > 1                    429
> > 2                    388
> > 3                    392
> > 
> > I'm unsure how important/effective multiple vector epilogues are
> > for non-x86 ISAs who all seem to have only a single vector size
> > or VLA vectors.  For better target control on x86 I'd like to
> > tell the vectorizer the array of modes to consider for the
> > epilogue of the current loop plus a flag whether to consider
> > using partial vectors (x86 does not have that encoded into the mode).
> > So I'd add m_epilog_vec_modes[] and m_epilog_vec_mode_partial,
> > since currently x86 doesn't do cost compares the latter can be a
> > flag and we'd try that first when set, together with (only?) the
> > first mode?  Alternatively only hint a single mode, but this won't
> > ever scale to cost compare targets?
> > 
> > So using --param vect-epilogues-nomask=N is mainly for this RFC,
> > not sure if it has to prevail.
> > 
> > Note I didn't manage to get aarch64 to use more than one epilogue,
> > not even with -msve-vector-bits=512.
> > 
> 
> My guess is it's probably due to partial SVE vector type support not
> being as robust as full vector.  And once you say all vectors are 512bits
> to use a smaller one it needs support for partial vectors.
> 
> I think this change would be useful for AArch64 as well, but I (personally)
> think the most useful mode for us is to be able to generate different
> kinds of epilogues.
> 
> With that I mean, having an unpredicated SVE main loop,
> unpredicated Adv. SIMD first epilogue and predicated SVE second epilogue.
> 
> For that I think this change is a good step forward :)
> 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
> > built SPEC CPU 2017 with --param vect-epilogues-nomask=2 - as
> > said, I want the target to have more control, even on x86 we
> > probably only want two epilogues when doing 512bit vectorization
> > for the main loop and possibly depend on its VF.
> 
> Agreed, for AArch64 we'd definitely like this as the cases we'd generate more
> than one epilogue would have a large overlap with ones where we unrolled.

OK.  I'll for now push the prerequesites (1-4/5), after fixing a
compile issue in 3/5 caused by splitting the series.  I'll then post
a RFC for the target control and the x86 implementation, for now
skipping the --param change.  It's then also easier to iterate on
the interface between the vectorizer and the target without breaking
the user interaction - on the x86 side we'd want to control defaults
based on -mtune= with manual control via the x86 -mtune-ctrl=, I
do not expect much heuristics on the x86 side for now.

Thanks for looking,
Richard.

> Cheers,
> Tamar
> 
> > 
> > Any comments sofar?
> > 
> > Thanks,
> > Richard.
> > 
> > * doc/invoke.texi (vect-epilogues-nomask): Adjust.
> > * params.opt (vect-epilogues-nomask): Adjust max value and
> > documentation.
> > * tree-vect-loop.cc (vect_analyze_loop): Hack in multiple
> > vectorized epilogs.
> > ---
> >  gcc/doc/invoke.texi   |  3 ++-
> >  gcc/params.opt|  2 +-
> >  gcc/tree-vect-loop.cc | 23 +--
> >  3 files changed, 20 insertions(+), 8 deletions(-)
> > 
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index f2555ec83a1..73e54a47381 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -16870,7 +16870,8 @@ The maximum number of insns in loop header
> > duplicated
> >  by the copy loop headers pass.
> > 
> >  @item vect-epilogues-nomask
> > -Enable

[PATCH v4 4/8] vect: Add maskload else value support.

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

This patch adds an else operand to vectorized masked load calls.
The current implementation adds else-value arguments to the respective
target-querying functions that are used to supply the vectorizer with
the proper else value.

We query the target for its supported else operand and use that for the
maskload call.  If necessary, i.e. if the mode has padding bits and if
the else operand is nonzero, a VEC_COND enforcing a zero else value is
emitted.
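
As a sketch, the resulting gimple has roughly this shape (hypothetical
SSA names; the VEC_COND_EXPR is only emitted in the nonzero-else /
padding-bits case described above):

  vect__1 = .MASK_LOAD (ptr_5, align, loop_mask_6, else_val_7);
  vect__2 = VEC_COND_EXPR <loop_mask_6, vect__1, { 0, ... }>;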

gcc/ChangeLog:

* optabs-query.cc (supports_vec_convert_optab_p): Return icode.
(get_supported_else_val): Return supported else value for
optab's operand at index.
(supports_vec_gather_load_p): Add else argument.
(supports_vec_scatter_store_p): Ditto.
* optabs-query.h (supports_vec_gather_load_p): Ditto.
(get_supported_else_val): Ditto.
* optabs-tree.cc (target_supports_mask_load_store_p): Ditto.
(can_vec_mask_load_store_p): Ditto.
(target_supports_len_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* optabs-tree.h (target_supports_mask_load_store_p): Ditto.
(can_vec_mask_load_store_p): Ditto.
* tree-vect-data-refs.cc (vect_lanes_optab_supported_p): Ditto.
(vect_gather_scatter_fn_p): Ditto.
(vect_check_gather_scatter): Ditto.
(vect_load_lanes_supported): Ditto.
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern):
Ditto.
* tree-vect-slp.cc (vect_get_operand_map): Adjust indices for
else operand.
(vect_slp_analyze_node_operations): Skip undefined else operand.
* tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p):
Add else operand handling.
(vect_get_vec_defs_for_operand): Handle undefined else operand.
(check_load_store_for_partial_vectors): Add else argument.
(vect_truncate_gather_scatter_offset): Ditto.
(vect_use_strided_gather_scatters_p): Ditto.
(get_group_load_store_type): Ditto.
(get_load_store_type): Ditto.
(vect_get_mask_load_else): Ditto.
(vect_get_else_val_from_tree): Ditto.
(vect_build_one_gather_load_call): Add zero else operand.
(vectorizable_load): Use else operand.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Add else
argument.
(vect_load_lanes_supported): Ditto.
(vect_get_mask_load_else): Ditto.
(vect_get_else_val_from_tree): Ditto.

---
 gcc/optabs-query.cc|  70 +---
 gcc/optabs-query.h |   3 +-
 gcc/optabs-tree.cc |  66 ++--
 gcc/optabs-tree.h  |   8 +-
 gcc/tree-vect-data-refs.cc |  74 ++---
 gcc/tree-vect-patterns.cc  |  12 +-
 gcc/tree-vect-slp.cc   |  25 ++-
 gcc/tree-vect-stmts.cc | 326 +++--
 gcc/tree-vectorizer.h  |  10 +-
 9 files changed, 468 insertions(+), 126 deletions(-)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index cc52bc0f5ea..c1f3558af92 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -29,6 +29,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl.h"
 #include "recog.h"
 #include "vec-perm-indices.h"
+#include "internal-fn.h"
+#include "memmodel.h"
+#include "optabs.h"
 
 struct target_optabs default_target_optabs;
 struct target_optabs *this_fn_optabs = &default_target_optabs;
@@ -672,34 +675,57 @@ lshift_cheap_p (bool speed_p)
that mode, given that the second mode is always an integer vector.
If MODE is VOIDmode, return true if OP supports any vector mode.  */
 
-static bool
-supports_vec_convert_optab_p (optab op, machine_mode mode)
+static enum insn_code
+supported_vec_convert_optab (optab op, machine_mode mode)
 {
   int start = mode == VOIDmode ? 0 : mode;
   int end = mode == VOIDmode ? MAX_MACHINE_MODE - 1 : mode;
+  enum insn_code icode = CODE_FOR_nothing;
   for (int i = start; i <= end; ++i)
 if (VECTOR_MODE_P ((machine_mode) i))
   for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j)
-   if (convert_optab_handler (op, (machine_mode) i,
-  (machine_mode) j) != CODE_FOR_nothing)
- return true;
+   {
+ if ((icode
+  = convert_optab_handler (op, (machine_mode) i,
+   (machine_mode) j)) != CODE_FOR_nothing)
+   return icode;
+   }
 
-  return false;
+  return icode;
 }
 
 /* If MODE is not VOIDmode, return true if vec_gather_load is available for
that mode.  If MODE is VOIDmode, return true if gather_load is available
-   for at least one vector mode.  */
+   for at least one vector mode.
+   In that case, and if ELSVALS is nonzero, store the supported else values
+   into the vector it points to.  */
 
 bool
-supports_vec_gather_load_p (machine_mode mode)
+supports_vec_gather_load_p (machine_mode mode, vec *elsvals)
 {
-  if (!this_fn_optabs->supports_vec_gather_load[mode])
-this_fn_opt

[PATCH v4 7/8] i386: Add zero maskload else operand.

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

gcc/ChangeLog:

* config/i386/sse.md (maskload):
Call maskload..._1.
(maskload_1): Rename.
---
 gcc/config/i386/sse.md | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 22c6c817dd7..1523e2c4d75 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -28641,7 +28641,7 @@ (define_insn 
"_maskstore"
(set_attr "btver2_decode" "vector") 
(set_attr "mode" "")])
 
-(define_expand "maskload"
+(define_expand "maskload_1"
   [(set (match_operand:V48_128_256 0 "register_operand")
(unspec:V48_128_256
  [(match_operand: 2 "register_operand")
@@ -28649,13 +28649,28 @@ (define_expand "maskload"
  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
+(define_expand "maskload"
+  [(set (match_operand:V48_128_256 0 "register_operand")
+   (unspec:V48_128_256
+ [(match_operand: 2 "register_operand")
+  (match_operand:V48_128_256 1 "memory_operand")
+  (match_operand:V48_128_256 3 "const0_operand")]
+ UNSPEC_MASKMOV))]
+  "TARGET_AVX"
+{
+  emit_insn (gen_maskload_1 (operands[0],
+  operands[1],
+  operands[2]));
+  DONE;
+})
+
 (define_expand "maskload"
   [(set (match_operand:V48_AVX512VL 0 "register_operand")
(vec_merge:V48_AVX512VL
  (unspec:V48_AVX512VL
[(match_operand:V48_AVX512VL 1 "memory_operand")]
UNSPEC_MASKLOAD)
- (match_dup 0)
+  (match_operand:V48_AVX512VL 3 "const0_operand")
  (match_operand: 2 "register_operand")))]
   "TARGET_AVX512F")
 
@@ -28665,7 +28680,7 @@ (define_expand "maskload"
  (unspec:VI12HFBF_AVX512VL
[(match_operand:VI12HFBF_AVX512VL 1 "memory_operand")]
UNSPEC_MASKLOAD)
- (match_dup 0)
+  (match_operand:VI12HFBF_AVX512VL 3 "const0_operand")
  (match_operand: 2 "register_operand")))]
   "TARGET_AVX512BW")
 
-- 
2.47.0



[PATCH v4 8/8] RISC-V: Add else operand to masked loads [PR115336].

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

This patch adds else operands to masked loads.  Currently the default
else operand predicate just accepts "undefined" (i.e. SCRATCH) values.

PR middle-end/115336
PR middle-end/116059

gcc/ChangeLog:

* config/riscv/autovec.md: Add else operand.
* config/riscv/predicates.md (maskload_else_operand): New
predicate.
* config/riscv/riscv-v.cc (get_else_operand): Remove static.
(expand_load_store): Use get_else_operand and adjust index.
(expand_gather_scatter): Ditto.
(expand_lanes_load_store): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr115336.c: New test.
* gcc.target/riscv/rvv/autovec/pr116059.c: New test.
---
 gcc/config/riscv/autovec.md   | 50 +++
 gcc/config/riscv/predicates.md|  3 ++
 gcc/config/riscv/riscv-v.cc   | 30 +++
 .../gcc.target/riscv/rvv/autovec/pr115336.c   | 20 
 .../gcc.target/riscv/rvv/autovec/pr116059.c   | 15 ++
 5 files changed, 88 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1f1849d5237..26489e537c6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -26,8 +26,9 @@ (define_expand "mask_len_load"
   [(match_operand:V 0 "register_operand")
(match_operand:V 1 "memory_operand")
(match_operand: 2 "vector_mask_operand")
-   (match_operand 3 "autovec_length_operand")
-   (match_operand 4 "const_0_operand")]
+   (match_operand:V 3 "maskload_else_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")]
   "TARGET_VECTOR"
 {
   riscv_vector::expand_load_store (operands, true);
@@ -57,8 +58,9 @@ (define_expand 
"mask_len_gather_load"
(match_operand 3 "")
(match_operand 4 "")
(match_operand: 5 "vector_mask_operand")
-   (match_operand 6 "autovec_length_operand")
-   (match_operand 7 "const_0_operand")]
+   (match_operand 6 "maskload_else_operand")
+   (match_operand 7 "autovec_length_operand")
+   (match_operand 8 "const_0_operand")]
   "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
@@ -72,8 +74,9 @@ (define_expand 
"mask_len_gather_load"
(match_operand 3 "")
(match_operand 4 "")
(match_operand: 5 "vector_mask_operand")
-   (match_operand 6 "autovec_length_operand")
-   (match_operand 7 "const_0_operand")]
+   (match_operand 6 "maskload_else_operand")
+   (match_operand 7 "autovec_length_operand")
+   (match_operand 8 "const_0_operand")]
   "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
@@ -87,8 +90,9 @@ (define_expand 
"mask_len_gather_load"
(match_operand 3 "")
(match_operand 4 "")
(match_operand: 5 "vector_mask_operand")
-   (match_operand 6 "autovec_length_operand")
-   (match_operand 7 "const_0_operand")]
+   (match_operand 6 "maskload_else_operand")
+   (match_operand 7 "autovec_length_operand")
+   (match_operand 8 "const_0_operand")]
   "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
@@ -102,8 +106,9 @@ (define_expand 
"mask_len_gather_load"
(match_operand 3 "")
(match_operand 4 "")
(match_operand: 5 "vector_mask_operand")
-   (match_operand 6 "autovec_length_operand")
-   (match_operand 7 "const_0_operand")]
+   (match_operand 6 "maskload_else_operand")
+   (match_operand 7 "autovec_length_operand")
+   (match_operand 8 "const_0_operand")]
   "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
@@ -117,8 +122,9 @@ (define_expand 
"mask_len_gather_load"
(match_operand 3 "")
(match_operand 4 "")
(match_operand: 5 "vector_mask_operand")
-   (match_operand 6 "autovec_length_operand")
-   (match_operand 7 "const_0_operand")]
+   (match_operand 6 "maskload_else_operand")
+   (match_operand 7 "autovec_length_operand")
+   (match_operand 8 "const_0_operand")]
   "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
@@ -132,8 +138,9 @@ (define_expand 
"mask_len_gather_load"
(match_operand 3 "")
(match_operand 4 "")
(match_operand: 5 "vector_mask_operand")
-   (match_operand 6 "autovec_length_operand")
-   (match_operand 7 "const_0_operand")]
+   (match_operand 6 "maskload_else_operand")
+   (match_operand 7 "autovec_length_operand")
+   (match_operand 8 "const_0_operand")]
   "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p 
(mode)"
 {
   riscv_vector::expand_gather_scatter (operands, true);
@@ -151,8 +158,9 @@ (define_expan

[PATCH v4 2/8] ifn: Add else-operand handling.

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

This patch adds else-operand handling to the internal functions.

gcc/ChangeLog:

* internal-fn.cc (add_mask_and_len_args): Rename...
(add_mask_else_and_len_args): ...to this and add else handling.
(expand_partial_load_optab_fn): Use adjusted function.
(expand_partial_store_optab_fn): Ditto.
(expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_fn_len_index): Add else handling.
(internal_fn_else_index): Ditto.
(internal_fn_mask_index): Ditto.
(get_supported_else_vals): New function.
(supported_else_val_p): New function.
(internal_gather_scatter_fn_supported_p): Add else operand.
* internal-fn.h (internal_gather_scatter_fn_supported_p): Define
else constants.
(MASK_LOAD_ELSE_ZERO): Ditto.
(MASK_LOAD_ELSE_M1): Ditto.
(MASK_LOAD_ELSE_UNDEFINED): Ditto.
(get_supported_else_vals): Declare.
(supported_else_val_p): Ditto.
---
 gcc/internal-fn.cc | 148 ++---
 gcc/internal-fn.h  |  13 +++-
 2 files changed, 139 insertions(+), 22 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 1b3fe7be047..6b4f344b40e 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -333,17 +333,18 @@ get_multi_vector_move (tree array_type, convert_optab 
optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
-/* Add mask and len arguments according to the STMT.  */
+/* Add mask, else, and len arguments according to the STMT.  */
 
 static unsigned int
-add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
+add_mask_else_and_len_args (expand_operand *ops, unsigned int opno, gcall 
*stmt)
 {
   internal_fn ifn = gimple_call_internal_fn (stmt);
   int len_index = internal_fn_len_index (ifn);
   /* BIAS is always consecutive next of LEN.  */
   int bias_index = len_index + 1;
   int mask_index = internal_fn_mask_index (ifn);
-  /* The order of arguments are always {len,bias,mask}.  */
+
+  /* The order of arguments is always {mask, else, len, bias}.  */
   if (mask_index >= 0)
 {
   tree mask = gimple_call_arg (stmt, mask_index);
@@ -365,6 +366,22 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
   create_input_operand (&ops[opno++], mask_rtx,
TYPE_MODE (TREE_TYPE (mask)));
 }
+
+  int els_index = internal_fn_else_index (ifn);
+  if (els_index >= 0)
+{
+  tree els = gimple_call_arg (stmt, els_index);
+  tree els_type = TREE_TYPE (els);
+  if (TREE_CODE (els) == SSA_NAME
+ && SSA_NAME_IS_DEFAULT_DEF (els)
+ && VAR_P (SSA_NAME_VAR (els)))
+   create_undefined_input_operand (&ops[opno++], TYPE_MODE (els_type));
+  else
+   {
+ rtx els_rtx = expand_normal (els);
+ create_input_operand (&ops[opno++], els_rtx, TYPE_MODE (els_type));
+   }
+}
   if (len_index >= 0)
 {
   tree len = gimple_call_arg (stmt, len_index);
@@ -3016,7 +3033,7 @@ static void
 expand_partial_load_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
 {
   int i = 0;
-  class expand_operand ops[5];
+  class expand_operand ops[6];
   tree type, lhs, rhs, maskt;
   rtx mem, target;
   insn_code icode;
@@ -3046,7 +3063,7 @@ expand_partial_load_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
   target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   create_call_lhs_operand (&ops[i++], target, TYPE_MODE (type));
   create_fixed_operand (&ops[i++], mem);
-  i = add_mask_and_len_args (ops, i, stmt);
+  i = add_mask_else_and_len_args (ops, i, stmt);
   expand_insn (icode, i, ops);
 
   assign_call_lhs (lhs, target, &ops[0]);
@@ -3092,7 +3109,7 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab
   reg = expand_normal (rhs);
   create_fixed_operand (&ops[i++], mem);
   create_input_operand (&ops[i++], reg, TYPE_MODE (type));
-  i = add_mask_and_len_args (ops, i, stmt);
+  i = add_mask_else_and_len_args (ops, i, stmt);
   expand_insn (icode, i, ops);
 }
 
@@ -3678,7 +3695,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
   create_integer_operand (&ops[i++], scale_int);
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
-  i = add_mask_and_len_args (ops, i, stmt);
+  i = add_mask_else_and_len_args (ops, i, stmt);
 
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)),
   TYPE_MODE (TREE_TYPE (offset)));
@@ -3701,13 +3718,13 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
 
   int i = 0;
-  class expand_operand ops[8];
+  class expand_operand ops[9];
   create_call_lhs_operand (&ops[i++], lhs_rtx, TYPE_

[PATCH v4 6/8] gcn: Add else operand to masked loads.

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

This patch adds an undefined else operand to the masked loads.

gcc/ChangeLog:

* config/gcn/predicates.md (maskload_else_operand): New
predicate.
* config/gcn/gcn-valu.md: Use new predicate.
---
 gcc/config/gcn/gcn-valu.md   | 23 +++
 gcc/config/gcn/predicates.md |  2 ++
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index cb2f4a78035..ce7a68f0e2d 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3989,7 +3989,8 @@ (define_expand "while_ultsidi"
 (define_expand "maskloaddi"
   [(match_operand:V_MOV 0 "register_operand")
(match_operand:V_MOV 1 "memory_operand")
-   (match_operand 2 "")]
+   (match_operand 2 "")
+   (match_operand:V_MOV 3 "maskload_else_operand")]
   ""
   {
 rtx exec = force_reg (DImode, operands[2]);
@@ -3998,11 +3999,8 @@ (define_expand "maskloaddi"
 rtx as = gen_rtx_CONST_INT (VOIDmode, MEM_ADDR_SPACE (operands[1]));
 rtx v = gen_rtx_CONST_INT (VOIDmode, MEM_VOLATILE_P (operands[1]));
 
-/* Masked lanes are required to hold zero.  */
-emit_move_insn (operands[0], gcn_vec_constant (mode, 0));
-
 emit_insn (gen_gather_expr_exec (operands[0], addr, as, v,
-  operands[0], exec));
+  gcn_gen_undef (mode), exec));
 DONE;
   })
 
@@ -4027,7 +4025,8 @@ (define_expand "mask_gather_load"
(match_operand: 2 "register_operand")
(match_operand 3 "immediate_operand")
(match_operand:SI 4 "gcn_alu_operand")
-   (match_operand:DI 5 "")]
+   (match_operand:DI 5 "")
+   (match_operand:V_MOV 6 "maskload_else_operand")]
   ""
   {
 rtx exec = force_reg (DImode, operands[5]);
@@ -4036,18 +4035,18 @@ (define_expand "mask_gather_load"
  operands[2], operands[4],
  INTVAL (operands[3]), exec);
 
-/* Masked lanes are required to hold zero.  */
-emit_move_insn (operands[0], gcn_vec_constant (mode, 0));
-
 if (GET_MODE (addr) == mode)
   emit_insn (gen_gather_insn_1offset_exec (operands[0], addr,
 const0_rtx, const0_rtx,
-const0_rtx, operands[0],
-exec));
+gcn_gen_undef (mode),
+operands[0], exec));
 else
   emit_insn (gen_gather_insn_2offsets_exec (operands[0], operands[1],
  addr, const0_rtx,
- const0_rtx, const0_rtx,
+ const0_rtx,
+ gcn_gen_undef (mode),
  operands[0], exec));
 DONE;
   })
diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md
index 3f59396a649..21beeb586a4 100644
--- a/gcc/config/gcn/predicates.md
+++ b/gcc/config/gcn/predicates.md
@@ -228,3 +228,5 @@ (define_predicate "ascending_zero_int_parallel"
   return gcn_stepped_zero_int_parallel_p (op, 1);
 })
 
+(define_predicate "maskload_else_operand"
+  (match_operand 0 "scratch_operand"))
-- 
2.47.0



[PATCH v4 3/8] tree-ifcvt: Add zero maskload else value.

2024-11-07 Thread Robin Dapp
From: Robin Dapp 

When predicating a load we implicitly assume that the else value is
zero.  This matters in case the loaded value is padded (like e.g.
a _Bool) and we must ensure that the padding bytes are zero on targets
that don't implicitly zero inactive elements.

A former version of this patch still had this handling in ifcvt but
the latest version defers it to the vectorizer.
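
Concretely (a scalar sketch of mine, not code from the patch), the
assumption being made explicit is:

  /* Model of a predicated load with a zero else value; the actual
     transform emits IFN_MASK_LOAD with a fourth (else) argument.  */
  bool
  predicated_load (const bool *p, bool cond)
  {
    /* Inactive lanes read as zero, which also keeps the padding bits
       of a padded type such as bool zeroed.  */
    return cond ? *p : false;
  }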

gcc/ChangeLog:

* tree-if-conv.cc (predicate_load_or_store): Add zero else
operand and comment.
---
 gcc/tree-if-conv.cc | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index eb981642bae..f1a1f8fd0d3 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -2555,9 +2555,17 @@ predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask)
   ref);
   if (TREE_CODE (lhs) == SSA_NAME)
 {
+  /* Get a zero else value.  This might not be what a target actually uses
+but we cannot be sure about which vector mode the vectorizer will
+choose.  Therefore, leave the decision whether we need to force the
+inactive elements to zero to the vectorizer.  */
+  tree els = vect_get_mask_load_else (MASK_LOAD_ELSE_ZERO,
+ TREE_TYPE (lhs));
+
   new_stmt
-   = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr,
- ptr, mask);
+   = gimple_build_call_internal (IFN_MASK_LOAD, 4, addr,
+ ptr, mask, els);
+
   gimple_call_set_lhs (new_stmt, lhs);
   gimple_set_vuse (new_stmt, gimple_vuse (stmt));
 }
-- 
2.47.0



Re: [PATCH v4 6/8] gcn: Add else operand to masked loads.

2024-11-07 Thread Andrew Stubbs

On 07/11/2024 17:57, Robin Dapp wrote:

From: Robin Dapp 

This patch adds an undefined else operand to the masked loads.

[...]


LGTM.

Andrew


[committed] btf: check hash maps are non-null before emptying

2024-11-07 Thread David Faust
These maps will always be non-null in btf_finalize under normal
circumstances, but be safe and verify that before trying to empty them.

Tested on x86_64-linux-gnu and x86_64-linux-gnu host for bpf-unknown-none
target. Pushed as obvious.

gcc/
* btfout.cc (btf_finalize): Check that hash maps are non-null before
emptying them.
---
 gcc/btfout.cc | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 083ca48d627..4a6b5453e08 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -1661,13 +1661,19 @@ btf_finalize (void)
   datasecs.release ();
 
   funcs = NULL;
-  func_map->empty ();
-  func_map = NULL;
+  if (func_map)
+{
+  func_map->empty ();
+  func_map = NULL;
+}
 
   if (debug_prune_btf)
 {
-  btf_used_types->empty ();
-  btf_used_types = NULL;
+  if (btf_used_types)
+   {
+ btf_used_types->empty ();
+ btf_used_types = NULL;
+   }
 
   fixups.release ();
   forwards = NULL;
-- 
2.45.2



Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-07 Thread Torbjorn SVENSSON




On 2024-11-07 16:33, Richard Earnshaw (lists) wrote:

On 06/11/2024 19:50, Torbjorn SVENSSON wrote:



On 2024-11-06 19:06, Richard Earnshaw (lists) wrote:

On 06/11/2024 13:50, Torbjorn SVENSSON wrote:



On 2024-11-06 14:04, Richard Earnshaw (lists) wrote:

On 06/11/2024 12:23, Torbjorn SVENSSON wrote:



On 2024-11-06 12:26, Richard Earnshaw (lists) wrote:

On 06/11/2024 07:44, Christophe Lyon wrote:

On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON
 wrote:


While the regression was reported on GCC15, I'm sure that same
regression will be seen on GCC14 when it's tested in the
arm-linux-gnueabihf configuration.

Ok for trunk and releases/gcc-14?

--

This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1407.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/pr68620.c: Use effective-target arm_fp.

Signed-off-by: Torbjörn SVENSSON 
---
     gcc/testsuite/gcc.target/arm/pr68620.c | 4 +++-
     1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr68620.c 
b/gcc/testsuite/gcc.target/arm/pr68620.c
index 6e38671752f..1ed84f4ac75 100644
--- a/gcc/testsuite/gcc.target/arm/pr68620.c
+++ b/gcc/testsuite/gcc.target/arm/pr68620.c
@@ -1,8 +1,10 @@
     /* { dg-do compile } */
     /* { dg-skip-if "-mpure-code supports M-profile without Neon only" { *-*-* } { 
"-mpure-code" } } */
     /* { dg-require-effective-target arm_arch_v7a_ok } */
-/* { dg-options "-mfp16-format=ieee -mfpu=auto -mfloat-abi=softfp" } */
+/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-options "-mfp16-format=ieee -mfpu=auto" } */
     /* { dg-add-options arm_arch_v7a } */
+/* { dg-add-options arm_fp } */



So... this partially reverts your previous patch (bringing back
arm_fp). What is the problem now?



Yeah, that sounds wrong.  arm_fp_ok tries to find options to add to the basic 
testsuite options, but it can't be combined with arm_arch_v7a as that picks a 
totally different set of flags for the architecture.


The problem is that for arm-linux-gnueabihf, we cannot use -mfloat-abi=softfp 
as there is no multilib available for that ABI, or at least that's my 
interpretation of below error message.

This is the output from the CI run:

Executing on host: 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/bin/armv8l-unknown-linux-gnueabihf-gcc
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c
    -fdiagnostics-plain-output   -mfp16-format=ieee -mfpu=auto 
-mfloat-abi=softfp -mcpu=unset -march=armv7-a+fp -S -o pr68620.s (timeout = 600)
spawn -ignore SIGHUP 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/bin/armv8l-unknown-linux-gnueabihf-gcc
 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c
 -fdiagnostics-plain-output -mfp16-format=ieee -mfpu=auto -mfloat-abi=softfp 
-mcpu=unset -march=armv7-a+fp -S -o pr68620.s
In file included from /usr/include/features.h:510,
    from 
/usr/include/arm-linux-gnueabihf/bits/libc-header-start.h:33,
    from /usr/include/stdint.h:26,
    from 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/stdint.h:11,
    from 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/arm_fp16.h:34,
    from 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/arm_neon.h:41,
    from 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c:7:
/usr/include/arm-linux-gnueabihf/gnu/stubs.h:7:11: fatal error: 
gnu/stubs-soft.h: No such file or directory
compilation terminated.
compiler exited with status 1
output is:
In file included from /usr/include/features.h:510,
    from 
/usr/include/arm-linux-gnueabihf/bits/libc-header-start.h:33,
    from /usr/include/stdint.h:26,
    from 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/stdint.h:11,
    from 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/arm_fp16.h:34,
    from 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/arm_neon.h:41,
    from 
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c:7:
/usr/include/arm-linux-gnuea

[PATCH] bpf: avoid possible null deref in btf_ext_output [PR target/117447]

2024-11-07 Thread David Faust
The BPF-specific .BTF.ext section is always generated for BPF programs
if -gbtf is specified, and generating it requires BTF information and
assumes that the BTF info has already been generated.

Compiling non-C languages to BPF is not supported, nor is generating
CTF/BTF for non-C.  But, compiling another language like C++ to BPF
with -gbtf specified meant that we would try to generate the .BTF.ext
section anyway, and then ICE because no BTF information was available.

Add a check to bail out of btf_ext_output if the TU CTFC does not exist,
meaning no BTF info is available.

Tested on x86_64-linux-gnu host for bpf-unknown-none.

gcc/
PR target/117447
* config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.
---
 gcc/config/bpf/btfext-out.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/bpf/btfext-out.cc b/gcc/config/bpf/btfext-out.cc
index ca6241aa52e..760b2b59ff6 100644
--- a/gcc/config/bpf/btfext-out.cc
+++ b/gcc/config/bpf/btfext-out.cc
@@ -611,6 +611,9 @@ btf_ext_init (void)
 void
 btf_ext_output (void)
 {
+  if (!ctf_get_tu_ctfc ())
+return;
+
   output_btfext_header ();
   output_btfext_func_info (btf_ext);
   if (TARGET_BPF_CORE)
-- 
2.45.2



Re: [PATCH] ifcombine: For short circuit case, allow 2 defining statements [PR85605]

2024-11-07 Thread Andrew Pinski
On Fri, Nov 1, 2024 at 4:06 PM Andrew Pinski  wrote:
>
> On Tue, Oct 29, 2024 at 10:10 AM Andrew Pinski  wrote:
> >
> > On Tue, Oct 29, 2024 at 5:59 AM Richard Biener
> >  wrote:
> > >
> > > On Tue, Oct 29, 2024 at 4:29 AM Andrew Pinski  
> > > wrote:
> > > >
> > > > r0-126134-g5d2a9da9a7f7c1 added support for circuiting and combining the 
> > > > ifs
> > > > into using either AND or OR. But it only allowed the inner condition
> > > > basic block having the conditional only. This changes to allow up to 2 
> > > > defining
> > > > statements as long as they are just nop conversions for either the lhs 
> > > > or rhs
> > > > of the conditional.
> > > >
> > > > This should allow to use ccmp on aarch64 and x86_64 (APX) slightly more 
> > > > than before.
> > > >
> > > > Boootstrapped and tested on x86_64-linux-gnu.
> > > >
> > > > PR tree-optimization/85605
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): 
> > > > New function.
> > > > (ifcombine_ifandif): Use can_combine_bbs_with_short_circuit 
> > > > instead of checking
> > > > if iterator is one before the last statement.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test.
> > > > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test.
> > > > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test.
> > > >
> > > > Signed-off-by: Andrew Pinski 
> > > > ---
> > > >  .../g++.dg/tree-ssa/ifcombine-ccmp-1.C| 27 +
> > > >  .../gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c| 18 +
> > > >  .../gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c| 19 +
> > > >  gcc/tree-ssa-ifcombine.cc | 39 ++-
> > > >  4 files changed, 101 insertions(+), 2 deletions(-)
> > > >  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifcombine-ccmp-1.C
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c
> > > >
> > > > diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifcombine-ccmp-1.C 
> > > > b/gcc/testsuite/g++.dg/tree-ssa/ifcombine-ccmp-1.C
> > > > new file mode 100644
> > > > index 000..282cec8c628
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/g++.dg/tree-ssa/ifcombine-ccmp-1.C
> > > > @@ -0,0 +1,27 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -g -fdump-tree-optimized --param 
> > > > logical-op-non-short-circuit=1" } */
> > > > +
> > > > +/* PR tree-optimization/85605 */
> > > > +#include 
> > > > +
> > > > +template
> > > > +inline bool cmp(T a, T2 b) {
> > > > +  return a<0 ? true : T2(a) < b;
> > > > +}
> > > > +
> > > > +template
> > > > +inline bool cmp2(T a, T2 b) {
> > > > +  return (a<0) | (T2(a) < b);
> > > > +}
> > > > +
> > > > +bool f(int a, int b) {
> > > > +return cmp(int64_t(a), unsigned(b));
> > > > +}
> > > > +
> > > > +bool f2(int a, int b) {
> > > > +return cmp2(int64_t(a), unsigned(b));
> > > > +}
> > > > +
> > > > +
> > > > +/* Both of these functions should be optimized to the same, and have 
> > > > an | in them. */
> > > > +/* { dg-final { scan-tree-dump-times " \\\| " 2 "optimized" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c 
> > > > b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c
> > > > new file mode 100644
> > > > index 000..1bdbb9358b4
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c
> > > > @@ -0,0 +1,18 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -g -fdump-tree-optimized --param 
> > > > logical-op-non-short-circuit=1" } */
> > > > +
> > > > +/* PR tree-optimization/85605 */
> > > > +/* Like ssa-ifcombine-ccmp-1.c but with conversion from unsigned to 
> > > > signed in the
> > > > +   inner bb which should be able to move too. */
> > > > +
> > > > +int t (int a, unsigned b)
> > > > +{
> > > > +  if (a > 0)
> > > > +  {
> > > > +signed t = b;
> > > > +if (t > 0)
> > > > +  return 0;
> > > > +  }
> > > > +  return 1;
> > > > +}
> > > > +/* { dg-final { scan-tree-dump "\&" "optimized" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c 
> > > > b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c
> > > > new file mode 100644
> > > > index 000..8d74b4932c5
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -g -fdump-tree-optimized --param 
> > > > logical-op-non-short-circuit=1" } */
> > > > +
> > > > +/* PR tree-optimization/85605 */
> > > > +/* Like ssa-ifcombine-ccmp-2.c but with conversion from unsigned to 
> > > > signed in the
> > > > +   inner bb which should be able to move too. */
> > > > +
> > > > +int t (int a, unsigned b)
> > > > +{
> > > > +  if (a > 0)
> > > > +goto L1;
> > > > +  signed t = b;
> 

[RFC/PATCH] c++: Unwrap type traits defined in terms of builtins within concept diagnostics [PR117294]

2024-11-07 Thread Nathaniel Shead
Does this approach seem reasonable?  I'm pretty sure that the way I've
handled the templating here is unideal but I'm not sure what a neat way
to do what I'm trying to do here would be; any comments are welcome.

-- >8 --

Currently, concept failures of standard type traits just report
'expression X evaluates to false'.  However, many type traits are
actually defined in terms of compiler builtins; we can do better here.
For instance, 'is_constructible_v' could go on to explain why the type
is not constructible, or 'is_invocable_v' could list potential
candidates.

As a first step to supporting that we need to be able to map the
standard type traits to the builtins that they use.  Rather than adding
another list that would need to be kept up-to-date whenever a builtin is
added, this patch instead tries to detect any variable template defined
directly in terms of a TRAIT_EXPR.

To avoid false positives, we ignore any variable templates that have any
specialisations (partial or explicit), even if we wouldn't have chosen
that specialisation anyway.  This shouldn't affect any of the standard
library type traits that I could see.
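
For instance (illustrative, and simplified relative to the real
libstdc++ sources), the shape of definition this detection targets is:

  // A variable template defined directly in terms of a trait builtin;
  // a use of it can now be unwrapped back to the underlying TRAIT_EXPR.
  template<typename T, typename... Args>
  inline constexpr bool is_constructible_v = __is_constructible(T, Args...);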

The new diagnostics from this patch are not immediately much better;
however, it would be relatively straightforward to update the messages
in 'diagnose_trait_expr' to provide these new details.

This logic could also perhaps be used by 'diagnose_failing_condition' so
that cases like 'static_assert(std::is_constructible_v)' get the same
treatment; this patch doesn't attempt to update this yet.

PR c++/117294
PR c++/113854

gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Take location to diagnose
at explicitly.
(maybe_unwrap_standard_trait): New function.
(diagnose_atomic_constraint): Use it; pass in the location of
the atomic constraint to diagnose_trait_expr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-traits4.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constraint.cc  | 52 +--
 gcc/testsuite/g++.dg/cpp2a/concepts-traits4.C | 31 +++
 2 files changed, 79 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-traits4.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 8a36b9c88c4..c683e6a44dd 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3108,10 +3108,8 @@ get_constraint_error_location (tree t)
 /* Emit a diagnostic for a failed trait.  */
 
 static void
-diagnose_trait_expr (tree expr, tree args)
+diagnose_trait_expr (location_t loc, tree expr, tree args)
 {
-  location_t loc = cp_expr_location (expr);
-
   /* Build a "fake" version of the instantiated trait, so we can
  get the instantiated types from result.  */
   ++processing_template_decl;
@@ -3323,6 +3321,51 @@ diagnose_trait_expr (tree expr, tree args)
 }
 }
 
+/* Attempt to detect if this is a standard type trait, defined in terms
+   of a compiler builtin (above).  If so, this will allow us to provide
+   slightly more helpful diagnostics.
+
+   However, don't unwrap if the type has been specialized (even if we
+   wouldn't have used said specialization)..  */
+
+static void
+maybe_unwrap_standard_trait (tree *expr, tree *args)
+{
+  if (TREE_CODE (*expr) != TEMPLATE_ID_EXPR)
+return;
+
+  tree templ = TREE_OPERAND (*expr, 0);
+  if (TREE_CODE (templ) != TEMPLATE_DECL
+  || !variable_template_p (templ))
+return;
+
+  tree gen_tmpl = most_general_template (templ);
+  if (DECL_TEMPLATE_SPECIALIZATIONS (gen_tmpl))
+return;
+
+  for (tree inst = DECL_TEMPLATE_INSTANTIATIONS (gen_tmpl);
+   inst; inst = TREE_CHAIN (inst))
+if (DECL_TEMPLATE_SPECIALIZATION (TREE_VALUE (inst)))
+  return;
+
+  tree pattern = DECL_TEMPLATE_RESULT (gen_tmpl);
+  tree initial = DECL_INITIAL (pattern);
+  if (TREE_CODE (initial) != TRAIT_EXPR)
+return;
+
+  /* At this point we're definitely providing a TRAIT_EXPR, update
+ *expr to point at it and provide remapped *args for it.  */
+  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (gen_tmpl);
+  tree targs = TREE_OPERAND (*expr, 1);
+  if (targs)
+targs = tsubst_template_args (targs, *args, tf_none, NULL_TREE);
+  targs = add_outermost_template_args (templ, targs);
+  targs = coerce_template_parms (parms, targs, templ, tf_none);
+
+  *expr = initial;
+  *args = targs;
+}
+
 /* Diagnose a substitution failure in the atomic constraint T using ARGS.  */
 
 static void
@@ -3347,10 +3390,11 @@ diagnose_atomic_constraint (tree t, tree args, tree 
result, sat_info info)
   /* Generate better diagnostics for certain kinds of expressions.  */
   tree expr = ATOMIC_CONSTR_EXPR (t);
   STRIP_ANY_LOCATION_WRAPPER (expr);
+  maybe_unwrap_standard_trait (&expr, &args);
   switch (TREE_CODE (expr))
 {
 case TRAIT_EXPR:
-  diagnose_trait_expr (expr, args);
+  diagnose_trait_expr (loc, expr, args);
   break;
 case REQUIRES_EXPR:
   gcc_checking_as

Re: [PATCH v2 2/2] VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]

2024-11-07 Thread Andrew Pinski
On Thu, Nov 7, 2024 at 12:50 AM Richard Biener
 wrote:
>
> On Thu, Nov 7, 2024 at 12:43 AM Andrew Pinski  
> wrote:
> >
> > After the last patch, we also want to record `(A CMP B) != 0`
> > as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the
> > true/false edges swapped.
> >
> > This shows up more due to the new handling of
> > `(A | B) ==/!= 0` in insert_predicates_for_cond
> > as now we can notice these comparisons which were not seen before.
> >
> > This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c`
> > and make sure we don't regress it when enhancing ifcombine.
> >
> > This adds that predicate and allows us to optimize f
> > in fre-predicated-3.c.
> >
> > Changes since v1:
> > * v2:  Use vn_valueize.
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > PR tree-optimization/117414
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) 
> > !=/== 0`.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/fre-predicated-3.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  .../gcc.dg/tree-ssa/fre-predicated-3.c| 46 +++
> >  gcc/tree-ssa-sccvn.cc | 14 ++
> >  2 files changed, 60 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-3.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-3.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-3.c
> > new file mode 100644
> > index 000..4a89372fd70
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-3.c
> > @@ -0,0 +1,46 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR tree-optimization/117414 */
> > +
> > +/* Fre1 should figure out that `*aaa != 0`
> > +   For f0, f1, and f2. */
> > +
> > +void foo();
> > +int f(int *aaa, int j, int t)
> > +{
> > +  int b = *aaa;
> > +  int c = b == 0;
> > +  int d = t != 1;
> > +  if (c | d)
> > +return 0;
> > +
> > +  for(int i = 0; i < j; i++)
> > +  {
> > +if (*aaa)
> > +  ;
> > +else
> > +  foo();
> > +  }
> > +  return 0;
> > +}
> > +
> > +int f1(int *aaa, int j, int t)
> > +{
> > +  int b = *aaa;
> > +  if (b == 0)
> > +return 0;
> > +  if (t != 1)
> > +return 0;
> > +  for(int i = 0; i < j; i++)
> > +  {
> > +if (*aaa)
> > +  ;
> > +else
> > +  foo();
> > +  }
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "foo " "optimized" } } */
> > +/* { dg-final { scan-tree-dump "return 0;" "optimized" } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index c60ba6d..67ed2cd8ffe 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -7948,6 +7948,20 @@ insert_predicates_for_cond (tree_code code, tree 
> > lhs, tree rhs,
> >&& (code == NE_EXPR || code == EQ_EXPR))
> >  {
> >gimple *def_stmt = SSA_NAME_DEF_STMT (lhs);
> > +  /* (A CMP B) != 0 is the same as (A CMP B).
> > +(A CMP B) == 0 is just (A CMP B) with the edges swapped.  */
> > +  if (is_gimple_assign (def_stmt)
> > + && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == 
> > tcc_comparison)
> > + {
> > +   tree_code nc = gimple_assign_rhs_code (def_stmt);
> > +   tree nlhs = vn_valueize (gimple_assign_rhs1 (def_stmt));
> > +   tree nrhs = vn_valueize (gimple_assign_rhs2 (def_stmt));
> > +   edge nt = true_e;
> > +   edge nf = false_e;
> > +   if (code == EQ_EXPR)
> > + std::swap (nt, nf);
>
> I'll note there's canonicalization for tree_swap_operands as well,
> _1 < _2 vs. _2 > _1 - but I think this can be fixed as followup?

Yes I will handle that later today in a followup patch because it will
touch other areas of sccvn including places which does the lookup.

Thanks,
Andrew

>
> OK.
>
> Thanks,
> Richard.
>
> > +   insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf);
> > + }
> >/* (a | b) == 0 ->
> > on true edge assert: a == 0 & b == 0. */
> >/* (a | b) != 0 ->
> > --
> > 2.43.0
> >


Re: [PATCH] bpf: avoid possible null deref in btf_ext_output [PR target/117447]

2024-11-07 Thread Jose E. Marchesi


Hi Faust.
Thanks for the patch.  OK for master.

> The BPF-specific .BTF.ext section is always generated for BPF programs
> if -gbtf is specified, and generating it requires BTF information and
> assumes that the BTF info has already been generated.
>
> Compiling non-C languages to BPF is not supported, nor is generating
> CTF/BTF for non-C.  But, compiling another language like C++ to BPF
> with -gbtf specified meant that we would try to generate the .BTF.ext
> section anyway, and then ICE because no BTF information was available.
>
> Add a check to bail out of btf_ext_output if the TU CTFC does not exist,
> meaning no BTF info is available.
>
> Tested on x86_64-linux-gnu host for bpf-unknown-none.
>
> gcc/
>   PR target/117447
>   * config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.
> ---
>  gcc/config/bpf/btfext-out.cc | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/config/bpf/btfext-out.cc b/gcc/config/bpf/btfext-out.cc
> index ca6241aa52e..760b2b59ff6 100644
> --- a/gcc/config/bpf/btfext-out.cc
> +++ b/gcc/config/bpf/btfext-out.cc
> @@ -611,6 +611,9 @@ btf_ext_init (void)
>  void
>  btf_ext_output (void)
>  {
> +  if (!ctf_get_tu_ctfc ())
> +return;
> +
>output_btfext_header ();
>output_btfext_func_info (btf_ext);
>if (TARGET_BPF_CORE)


[PATCH][ivopts]: perform affine fold to unsigned on non address expressions. [PR114932]

2024-11-07 Thread Tamar Christina
Hi All,

When the patch for PR114074 was applied we saw a good boost in exchange2.

This boost was partially caused by a simplification of the addressing modes.
With the patch applied IV opts saw the following form for the base addressing;

  Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) l0_19(D) *
324) + 36)

vs what we normally get:

  Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) l0_19(D)
* 81) + 9) * 4

This is because the patch promoted multiplies where one operand is a constant
from a signed multiply to an unsigned one, to attempt to fold away the constant.

This patch attempts the same, but due to the various problems with SCEV and
niters not being able to analyze the resulting forms (i.e. PR114322) we can't
do it during SCEV or in the general form in fold-const the way extract_muldiv
attempts.

Instead this applies the simplification during IVopts initialization when we
create the IV.  Essentially, when we know the IV won't overflow with regard to
niters, we perform an affine fold that simplifies the internal
computation even if it is signed, because we know that for IVopts' uses the
IV won't ever overflow.  This allows IVopts to see the simplified form
without influencing the rest of the compiler.

As mentioned in PR114074, it would be good to fix the missed optimization in
the other passes so we can perform this in general.

The reason this has a big impact on Fortran code is that Fortran doesn't seem
to have unsigned integer types.  As such all its addressing is created with
signed types, and folding does not happen on it due to the possible overflow.
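
To make the overflow concern concrete, here is a small illustration
(mine, not from the patch) of why the fold is only safe once we know the
IV cannot overflow:

  /* With a signed index, i * 81 may overflow (undefined behaviour), so
     the middle end cannot rewrite addr_signed into addr_folded in
     general.  IVopts knows the IV stays within the niters bounds, so
     there the two forms are equivalent.  */
  unsigned long addr_signed (long i) { return (unsigned long)(i * 81 + 9) * 4; }
  unsigned long addr_folded (long i) { return (unsigned long)i * 324 + 36; }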

Concretely, on AArch64 this changes the generated code from:

mov x27, -108
mov x24, -72
mov x23, -36
add x21, x1, x0, lsl 2
add x19, x20, x22
.L5:
add x0, x22, x19
add x19, x19, 324
ldr d1, [x0, x27]
add v1.2s, v1.2s, v15.2s
str d1, [x20, 216]
ldr d0, [x0, x24]
add v0.2s, v0.2s, v15.2s
str d0, [x20, 252]
ldr d31, [x0, x23]
add v31.2s, v31.2s, v15.2s
str d31, [x20, 288]
bl  digits_20_
cmp x21, x19
bne .L5

into:

.L5:
ldr d1, [x19, -108]
add v1.2s, v1.2s, v15.2s
str d1, [x20, 216]
ldr d0, [x19, -72]
add v0.2s, v0.2s, v15.2s
str d0, [x20, 252]
ldr d31, [x19, -36]
add x19, x19, 324
add v31.2s, v31.2s, v15.2s
str d31, [x20, 288]
bl  digits_20_
cmp x21, x19
bne .L5

The two patches together result in a 10% performance increase in exchange2 in
SPECCPU 2017, a 4% reduction in binary size, and a 5% improvement in compile
time.  There's also a 5% performance improvement in fotonik3d and a similar
reduction in binary size.

The patch folds every IV to unsigned to canonicalize them.  At the end of the
pass match.pd will then remove unneeded conversions.

Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu (-m32, -m64), with some issues below:

 * gcc.dg/torture/bitint-49.c   -O1  execution test
 * gcc.c-torture/execute/pr110115.c   -O1  execution test

These two start to fail now because of a bug in the stack slot sharing
conflict function.  Basically the change turns the addressing from ADDR_REF
into (unsigned) ADDR_REF, and add_scope_conflicts_2 does not look deep enough
through the promotion to realize that the two values are live at the same
time.

Both of these issues are fixed by Andrew's patch [1].  Since that patch
rewrites the entire thing, it didn't seem useful for me to provide a spot fix
here.

[1] 
https://inbox.sourceware.org/gcc-patches/20241017024205.2660484-1-quic_apin...@quicinc.com/

Ok for master after Andrew's patch gets in?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/114932
* tree-scalar-evolution.cc (alloc_iv): Perform affine unsigned fold.

gcc/testsuite/ChangeLog:

PR tree-optimization/114932
* gcc.dg/tree-ssa/pr64705.c: Update dump file scan.
* gcc.target/i386/pr115462.c: The testcase shares 3 IVs which calculate
the same thing but with a slightly different increment offset.  The test
checks for 3 complex addressing loads, one for each IV.  But with this
change they now all share one IV.  That is, the loop now has only one
complex addressing mode.  This is ultimately driven by the backend costing
and the current costing says this is preferred so updating the testcase.
* gfortran.dg/addressing-modes_1.f90: New test.

---
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64705.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr64705.c
index 
fd24e38a53e9f3a4659dece5af4d71fbc0ce2c18..3c9c2e5deed1ba755d1fae15a6553d68b6ad0098
 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr64705.c
+++ b/gcc/testsuite/gcc.dg/tree-

Re: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-07 Thread Jeff Law




On 11/7/24 8:07 AM, Tamar Christina wrote:



-Original Message-
From: Li, Pan2 
Sent: Thursday, November 7, 2024 12:57 PM
To: Tamar Christina ; Richard Biener

Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
SAT_ADD into branchless

I see your point that the backend can leverage condition move to emit the branch
code.


For instance see https://godbolt.org/z/fvrq3aq6K
On ISAs with conditional operations the branch version gets ifconverted.
On AArch64 we get:
sat_add_u_1(unsigned int, unsigned int):
 adds    w0, w0, w1
 csinv   w0, w0, wzr, cc
 ret
so just 2 instructions, and also branchless. On x86_64 we get:
sat_add_u_1(unsigned int, unsigned int):
 add edi, esi
 mov eax, -1
 cmovnc  eax, edi
 ret
so 3 instructions but a dependency chain of 2.
also branchless.  This patch would regress both of these.


But the above Godbolt example may not be good evidence, because both
x86_64 and aarch64 already implement usadd.
Thus, they all go to usadd. For example, as below, sat_add_u_1 and
sat_add_u_2 are almost the same when the backend implements usadd.

#include 

#define T uint32_t

   T sat_add_u_1 (T x, T y)
   {
 return (T)(x + y) < x ? -1 : (x + y);
   }

   T sat_add_u_2 (T x, T y)
   {
 return (x + y) | -((x + y) < x);
   }

It becomes different when taking GCC 14.2 (which doesn't have the .SAT_ADD
GIMPLE IR); there x86_64 has the below asm dump for -O3. Looks like no
obvious difference here.

sat_add_u_1(unsigned int, unsigned int):
 add edi, esi
 mov eax, -1
 cmovnc  eax, edi
 ret

sat_add_u_2(unsigned int, unsigned int):
 add edi, esi
 sbb eax, eax
 or  eax, edi
 ret



Because CE is able to recognize the idiom back into a conditional move.
Pick a target that doesn't have conditional instructions, like PowerPC
https://godbolt.org/z/4bTv18WMv

You'll see that this canonicalization has made codegen worse.

After:

.L.sat_add_u_1(unsigned int, unsigned int):
 add 4,3,4
 rldicl 9,4,0,32
 subf 3,3,9
 sradi 3,3,63
 or 3,3,4
 rldicl 3,3,0,32
 blr

and before

.L.sat_add_u_1(unsigned int, unsigned int):
 add 4,3,4
 cmplw 0,4,3
 bge 0,.L2
 li 4,-1
.L2:
 rldicl 3,4,0,32
 blr

It means now it always has to execute 6 instructions, whereas before it was 4 
or 5 depending
on the order of the branch. So for those architectures, it's always slower.
I'm not sure it's that simple.  It'll depend on the micro-architecture. 
So things like strength of the branch predictors, how fetch blocks are 
handled (can you have embedded not-taken branches, short-forward-branch 
optimizations, etc).


Jeff


[PATCH] VN: Canonicalize compares before calling vn_nary_op_lookup_pieces

2024-11-07 Thread Andrew Pinski
This is the followup as mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667987.html .
We need to canonicalize the compares using tree_swap_operands_p instead
of checking CONSTANT_CLASS_P.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-sccvn.cc (visit_phi): Swap the operands
before calling vn_nary_op_lookup_pieces if
tree_swap_operands_p returns true.
(insert_predicates_for_cond): Use tree_swap_operands_p
instead of checking for CONSTANT_CLASS_P.
(process_bb): Swap the comparison and operands
if tree_swap_operands_p returns true.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-sccvn.cc | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 1967bbdca84..16299662b95 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -6067,6 +6067,9 @@ visit_phi (gimple *phi, bool *inserted, bool 
backedges_varying_p)
tree ops[2];
ops[0] = def;
ops[1] = sameval;
+   /* Canonicalize the operands order for eq below. */
+   if (tree_swap_operands_p (ops[0], ops[1]))
+ std::swap (ops[0], ops[1]);
tree val = vn_nary_op_lookup_pieces (2, EQ_EXPR,
 boolean_type_node,
 ops, &vnresult);
@@ -7905,8 +7908,9 @@ insert_predicates_for_cond (tree_code code, tree lhs, 
tree rhs,
   if (!true_e && !false_e)
 return;
 
-  /* Canonicalize the comparison so the rhs are constants.  */
-  if (CONSTANT_CLASS_P (lhs))
+  /* Canonicalize the comparison if needed, putting
+ the constant in the rhs.  */
+  if (tree_swap_operands_p (lhs, rhs))
 {
   std::swap (lhs, rhs);
   code = swap_tree_comparison (code);
@@ -8145,7 +8149,15 @@ process_bb (rpo_elim &avail, basic_block bb,
  {
tree lhs = vn_valueize (gimple_cond_lhs (last));
tree rhs = vn_valueize (gimple_cond_rhs (last));
-   tree val = gimple_simplify (gimple_cond_code (last),
+   tree_code cmpcode = gimple_cond_code (last);
+   /* Canonicalize the comparison if needed, putting
+  the constant in the rhs.  */
+   if (tree_swap_operands_p (lhs, rhs))
+ {
+   std::swap (lhs, rhs);
+   cmpcode = swap_tree_comparison (cmpcode);
+  }
+   tree val = gimple_simplify (cmpcode,
boolean_type_node, lhs, rhs,
NULL, vn_valueize);
/* If the condition didn't simplfy see if we have recorded
@@ -8156,7 +8168,7 @@ process_bb (rpo_elim &avail, basic_block bb,
tree ops[2];
ops[0] = lhs;
ops[1] = rhs;
-   val = vn_nary_op_lookup_pieces (2, gimple_cond_code (last),
+   val = vn_nary_op_lookup_pieces (2, cmpcode,
boolean_type_node, ops,
&vnresult);
/* Got back a ssa name, then try looking up `val != 0`
@@ -8193,14 +8205,13 @@ process_bb (rpo_elim &avail, basic_block bb,
   important as early cleanup.  */
edge true_e, false_e;
extract_true_false_edges_from_block (bb, &true_e, &false_e);
-   enum tree_code code = gimple_cond_code (last);
if ((do_region && bitmap_bit_p (exit_bbs, true_e->dest->index))
|| !can_track_predicate_on_edge (true_e))
  true_e = NULL;
if ((do_region && bitmap_bit_p (exit_bbs, false_e->dest->index))
|| !can_track_predicate_on_edge (false_e))
  false_e = NULL;
-   insert_predicates_for_cond (code, lhs, rhs, true_e, false_e);
+   insert_predicates_for_cond (cmpcode, lhs, rhs, true_e, false_e);
  }
break;
  }
-- 
2.43.0



Re: [PATCH] testsuite: arm: Use check-function-bodies in epilog-1.c test

2024-11-07 Thread Christophe Lyon
On Thu, 7 Nov 2024 at 20:35, Torbjörn SVENSSON
 wrote:
>
> The generated assembler is:
>
> armv7-m:
> push    {r4, lr}
> ldr     r4, .L6
> ldr     r4, [r4]
> lsls    r4, r4, #29
> it      mi
> addmi   r2, r2, #1
> bl      bar
> movs    r0, #0
> pop {r4, pc}
>
>
> armv8.1-m.main:
> push    {r3, r4, r5, lr}
> ldr     r4, .L5
> ldr     r5, [r4]
> tst     r5, #4
> csinc   r2, r2, r2, eq
> bl      bar
> movs    r0, #0
> pop {r3, r4, r5, pc}
>
>
> Ok for trunk and releases/gcc-14?

LGTM, but wait for Richard's approval.
Maybe include the above info in the commit message, it might be
helpful later when doing archaeologiy :-)
(I notice that the testcase hasn't changed since it was introduced in 2012...)

Thanks,

Christophe

>
> --
>
> Update test case for armv8.1-m.main that supports conditional
> arithmetic.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/epilog-1.c: Use check-function-bodies.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/gcc.target/arm/epilog-1.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/epilog-1.c 
> b/gcc/testsuite/gcc.target/arm/epilog-1.c
> index f97f1ebeaaf..903251a70e6 100644
> --- a/gcc/testsuite/gcc.target/arm/epilog-1.c
> +++ b/gcc/testsuite/gcc.target/arm/epilog-1.c
> @@ -2,16 +2,28 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mthumb -Os" } */
>  /* { dg-require-effective-target arm_thumb2_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
>
>  volatile int g_k;
>  extern void bar(int, int, int, int);
>
> +/*
> +** foo:
> +** ...
> +** (
> +** lslsr[0-9]+, r[0-9]+, #29
> +** it  mi
> +** addmi   r2, r2, #1
> +** |
> +** tst r[0-9]+, #4
> +** csinc   r2, r2, r2, eq
> +** )
> +** bl  bar
> +** ...
> +*/
>  int foo(int a, int b, int c, int d)
>  {
>if (g_k & 4) c++;
>bar (a, b, c, d);
>return 0;
>  }
> -
> -/* { dg-final { scan-assembler-times "lsls.*#29" 1 } } */
> -/* { dg-final { scan-assembler-not "tst" } } */
> --
> 2.25.1
>


[PATCH v2] c: Implement C2y N3356, if declarations [PR117019]

2024-11-07 Thread Marek Polacek
On Wed, Nov 06, 2024 at 06:06:46PM +, Joseph Myers wrote:
> On Wed, 6 Nov 2024, Marek Polacek wrote:
> 
> > On Wed, Nov 06, 2024 at 09:42:02AM -0500, Marek Polacek wrote:
> > > On reflection, I'm not so sure about these anymore:
> > > 
> > > On Mon, Nov 04, 2024 at 06:26:47PM -0500, Marek Polacek wrote:
> > > > +  switch (extern int i = 0);  /* { dg-error "in condition|both 
> > > > .extern. and initializer" } */
> > > 
> > > I think this is definitely valid.
> > 
> > Ugh, *INvalid*.
> >  
> > > > +  switch (register int i = 0); /* { dg-error "in condition" } */
> > > > +  switch (static int i = 0); /* { dg-error "in condition" } */
> > > > +  switch (thread_local int i = 0); /* { dg-error "in 
> > > > condition|function-scope" } */
> > > 
> > > All three may be valid, actually.
> > > 
> > > > +  switch (typedef int i); /* { dg-error "in condition|initializer" } */
> > > > +  switch (typedef int i = 0); /* { dg-error "in condition|initialized" 
> > > > } */
> > > 
> > > Both remain invalid.
> > > 
> > > Joseph, let me know if you agree, and I'll adjust the patch.  Thanks.
> 
> There are no restrictions on storage class specifiers for 
> simple-declarations separate from those applying to declarations in 
> general.  When such a declaration would be invalid inside a function, it 
> remains invalid as a simple-declaration; otherwise, it appears such 
> statements are valid.  The wording is certainly questionable for such 
> cases, since it talks about "the controlling expression is the value of 
> the single declared object after initialization", but initialization in 
> the static storage duration case occurs once at translation time whereas 
> one would expect the if/switch to determine the controlling expression 
> value once each execution.

Thanks, I've removed the check to prohibit storage class specifiers.

> I don't see any -std=c2y -pedantic-errors tests in the patch.  Such tests 
> should be present for both valid and invalid code.

Added.
 
> The logic for dealing with a simple declaration appears to use the 
> declaration directly to extract its initialized value, without any 
> convert_lvalue_to_rvalue, though there's one subsequently for switch 
> statements.  But for if statements, I don't see anything that would ensure 
> convert_lvalue_to_rvalue gets used.  In particular, although it's not 
> specified whether the initialized value is re-read from the object, if the 
> object is atomic it seems wrong to do a non-atomic read from it to get the 
> value for the condition.

Ah, right.  I've added the call.
 
> Another case to test: a simple declaration with array type.  Maybe "the 
> value of the single declared object after initialization" means there is 
> no conversion from array to pointer, in which case it would be invalid in 
> both if and switch.

I.e., this:

  if (int arr[] = { 1 });

With the convert_lvalue_to_rvalue call this now works, which I think it should.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements C2y N3356, if declarations as described at
.

This feature is cognate with C++17 Selection statements with initializer
,
but they are not the same yet.  For example, C++17 allows

  if (lock (); int i = getval ())

whereas C2y does not.
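
For a positive example (illustrative only; getval and use are
placeholders), both C2y and C++17 accept a plain declaration as the
selection header:

  if (int i = getval ())   /* i is in scope in both branches */
    use (i);

  switch (int c = getval ())
    { /* ... */ }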

The proposal adds new grammar productions.  selection-header is handled
in c_parser_selection_header which is the gist of the patch.
simple-declaration is handled by c_parser_declaration_or_fndef, which
gets a new parameter.

PR c/117019

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Adjust declaration.
(c_parser_external_declaration): Adjust a call to
c_parser_declaration_or_fndef.
(c_parser_declaration_or_fndef): New bool parameter.  Return a tree
instead of void.  Adjust for N3356.  Adjust a call to
c_parser_declaration_or_fndef.
(c_parser_compound_statement_nostart): Adjust calls to
c_parser_declaration_or_fndef.
(c_parser_selection_header): New.
(c_parser_paren_selection_header): New.
(c_parser_if_statement): Call c_parser_paren_selection_header
instead of c_parser_paren_condition.
(c_parser_switch_statement): Call c_parser_selection_header instead of
c_parser_expression.
(c_parser_for_statement): Adjust calls to c_parser_declaration_or_fndef.
(c_parser_objc_methodprotolist): Likewise.
(c_parser_oacc_routine): Likewise.
(c_parser_omp_loop_nest): Likewise.
(c_parser_omp_declare_simd): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/c23-if-decls-1.c: New test.
* gcc.dg/c23-if-decls-2.c: New test.
* gcc.dg/c2y-if-decls-1.c: New test.
* gcc.dg/c2y-if-decls-2.c: New test.
* gcc.dg/c2y-if-decls-3.c: New test.
* gcc.dg/c2y-if-decls-4.c: New test.
   

[RFC 2/9] aarch64: add new define_insn for subg

2024-11-07 Thread Indu Bhagat
subg (Subtract with Tag) is an Armv8.5-A memory tagging (MTE)
instruction.  It can be used to subtract an immediate value scaled by
the tag granule from the address in the source register.
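
As a rough scalar model of the instruction (my sketch, matching my
reading of the pattern below; choose_tag is a hypothetical stand-in for
the hardware tag choice represented by UNSPEC_GEN_TAG):

  /* Assumptions: the tag lives in address bits [59:56] and the
     immediate is the already-scaled byte offset (a multiple of 16).  */
  extern unsigned long choose_tag (unsigned long old_tag, unsigned int offset);

  unsigned long
  subg_model (unsigned long xn, unsigned long imm, unsigned int tag_offset)
  {
    unsigned long addr = (xn - imm) & ~(0xfUL << 56); /* clear tag bits */
    unsigned long tag = choose_tag ((xn >> 56) & 0xf, tag_offset);
    return addr | (tag << 56);                        /* insert new tag */
  }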

gcc/ChangeLog:

* config/aarch64/aarch64.md (subg): New definition.
---
 gcc/config/aarch64/aarch64.md | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8d10197c9e8d..1ec872afef71 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -8193,6 +8193,23 @@
   [(set_attr "type" "memtag")]
 )
 
+(define_insn "subg"
+  [(set (match_operand:DI 0 "register_operand" "=rk")
+   (ior:DI
+(and:DI (minus:DI (match_operand:DI 1 "register_operand" "rk")
+ (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
+(const_int -1080863910568919041)) ;; 0xf0ff...
+(ashift:DI
+ (unspec:QI
+  [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
+   (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
+  UNSPEC_GEN_TAG)
+ (const_int 56]
+  "TARGET_MEMTAG"
+  "subg\\t%0, %1, #%2, #%3"
+  [(set_attr "type" "memtag")]
+)
+
 (define_insn "subp"
   [(set (match_operand:DI 0 "register_operand" "=r")
(minus:DI
-- 
2.43.0



[RFC 3/9] aarch64: add new insn definition for st2g

2024-11-07 Thread Indu Bhagat
Store Allocation Tags (st2g) is an Armv8.5-A memory tagging (MTE)
instruction. It stores an allocation tag to two tag granules of memory.
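
As a rough scalar model (my sketch; write_granule_tag is a hypothetical
stand-in for the architectural tag store):

  extern void write_granule_tag (unsigned long addr, unsigned int tag);

  /* ST2G: write the tag from bits [59:56] of Xt into the tag storage of
     two adjacent 16-byte granules at Xn + offset.  */
  void
  st2g_model (unsigned long xt, unsigned long xn, long offset)
  {
    unsigned int tag = (xt >> 56) & 0xf;
    write_granule_tag (xn + offset, tag);
    write_granule_tag (xn + offset + 16, tag);
  }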

TBD:
  - Not too sure yet what the best way to generate st2g is; a
subsequent patch will emit them in one of the target hooks.
  - The current define_insn may need fixing.  Should the construct comparing
the two offsets rather be defined as a new predicate?

gcc/ChangeLog:

* config/aarch64/aarch64.md (st2g): New definition.
---
 gcc/config/aarch64/aarch64.md | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1ec872afef71..a2a69a9c0d3e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -8252,6 +8252,26 @@
   [(set_attr "type" "memtag")]
 )
 
+;; ST2G updates allocation tags for two memory granules (i.e. 32 bytes) at
+;; once, without zero initialization.
+(define_insn "st2g"
+  [(set (mem:QI (unspec:DI
+[(plus:DI (match_operand:DI 1 "register_operand" "rk")
+  (match_operand:DI 2 "aarch64_granule16_simm9" "i"))]
+UNSPEC_TAG_SPACE))
+   (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
+(const_int 56)) (const_int 15)))
+   (set (mem:QI (unspec:DI
+[(plus:DI (match_dup 1)
+  (match_operand:DI 3 "aarch64_granule16_simm9" "i"))]
+UNSPEC_TAG_SPACE))
+   (and:QI (lshiftrt:DI (match_dup 0)
+(const_int 56)) (const_int 15)))]
+  "TARGET_MEMTAG && (INTVAL (operands[2]) - 16 == INTVAL (operands[3]))"
+  "st2g\\t%0, [%1, #%2]"
+  [(set_attr "type" "memtag")]
+)
+
 ;; Load/Store 64-bit (LS64) instructions.
 (define_insn "ld64b"
   [(set (match_operand:V8DI 0 "register_operand" "=r")
-- 
2.43.0



[RFC 4/9] opts: doc: aarch64: add new memtag sanitizer

2024-11-07 Thread Indu Bhagat
Add new command line option -fsanitize=memtag with the following
new params:
 --param memtag-instrument-stack [0,1] (default 1) to use MTE
insns for enabling dynamic checking of stack variables.
 --param memtag-instrument-alloca [0,1] (default 1) to use MTE
insns for enabling dynamic checking of stack allocas.
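
For context, a minimal sketch of the class of bug this mode is meant to
catch at run time (hypothetical usage; assumes an MTE-capable target):

  /* Hypothetical invocation: gcc -march=armv8.5-a+memtag -fsanitize=memtag
     Each stack slot gets a hardware-generated tag; a stale pointer keeps
     the old tag, so the load below faults instead of silently reading.  */
  int *escaped;
  void f (void) { int local = 42; escaped = &local; }
  int g (void) { f (); return *escaped; } /* tag mismatch -> MTE fault */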

Add errors to convey that memtag sanitizer does not work with
hwaddress and address sanitizers.  Also error out if memtag ISA
extension is not enabled.

MEMTAG sanitizer will use the HWASAN machinery, but with a few
differences:
  - The tags are always generated at runtime by the hardware, so
-fsanitize=memtag enforces a --param hwasan-random-frame-tag=1

Add documentation in gcc/doc/invoke.texi.

TBD:
  - Add new command line option -fsanitize-memtag-mode=str, where str
can be sync, async or asymm (see
https://docs.kernel.org/arch/arm64/memory-tagging-extension.html).  This
option will not affect code generation; the information will eventually
need to be passed, perhaps to the linker.  This is contingent on
what ABI we define between userspace applications and the kernel for
communicating that the stack be PROT_MTE.

gcc/
* builtins.def: Adjust the macro to include the new
SANITIZE_MEMTAG.
* flag-types.h (enum sanitize_code): Add new enumerator for
SANITIZE_MEMTAG.
* opts.cc (finish_options): MEMTAG conflicts with hwaddress and
address sanitizers.
(common_handle_option): SANITIZE_MEMTAG tags are always
generated (randomly) by the hardware.
* params.opt: Add new params for MEMTAG sanitizer.
doc/
* invoke.texi: Update documentation.

gcc/config/
* aarch64/aarch64.cc (aarch64_override_options_internal): Error
out if MTE is not available.
---
 gcc/builtins.def  |  1 +
 gcc/config/aarch64/aarch64.cc |  4 
 gcc/doc/invoke.texi   | 11 +--
 gcc/flag-types.h  |  2 ++
 gcc/opts.cc   | 15 +++
 gcc/params.opt|  8 
 6 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 0c76ebc5e31a..659c7c2b5c13 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -245,6 +245,7 @@ along with GCC; see the file COPYING3.  If not see
   true, true, true, ATTRS, true, \
  (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
| SANITIZE_HWADDRESS \
+   | SANITIZE_MEMTAG \
| SANITIZE_UNDEFINED \
| SANITIZE_UNDEFINED_NONDEFAULT) \
   || flag_sanitize_coverage))
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f2b53475adbe..1ef2dbcf9030 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18518,6 +18518,10 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
   && !fixed_regs[R18_REGNUM])
 error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
 
+  if (flag_sanitize & SANITIZE_MEMTAG && !TARGET_MEMTAG)
+error ("%<-fsanitize=memtag%> requires the ISA extension %qs",
+  "memtag");
+
   aarch64_feature_flags isa_flags = aarch64_get_isa_flags (opts);
   if ((isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
   && !(isa_flags & AARCH64_FL_SME))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7146163d66d0..f8bd273b07ad 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17626,8 +17626,9 @@ the available options are shown at startup of the 
instrumented program.  See
 
@url{https://github.com/google/sanitizers/wiki/AddressSanitizerFlags#run-time-flags}
 for a list of supported options.
 The option cannot be combined with @option{-fsanitize=thread} or
-@option{-fsanitize=hwaddress}.  Note that the only target
-@option{-fsanitize=hwaddress} is currently supported on is AArch64.
+@option{-fsanitize=hwaddress} or @option{-fsanitize=memtag}.  Note that
+@option{-fsanitize=hwaddress} and @option{-fsanitize=memtag} are currently
+supported only on AArch64.
 
 To get more accurate stack traces, it is possible to use options such as
 @option{-O0}, @option{-O1}, or @option{-Og} (which, for instance, prevent
@@ -17676,6 +17677,12 @@ possible by specifying the command-line options
 @option{--param hwasan-instrument-allocas=1} respectively. Using a random frame
 tag is not implemented for kernel instrumentation.
 
+@opindex fsanitize=memtag
+@item -fsanitize=memtag
+Use Memory Tagging Extension instructions instead of instrumentation to allow
+the detection of memory errors.  This option is available only on those AArch64
+architectures that support Memory Tagging Extensions.
+
 @opindex fsanitize=pointer-compare
 @item -fsanitize=pointer-compare
 Instrument comparison operation (<, <=, >, >=) with pointer operands.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index df56337

[committed 1/2] libstdc++: Define __is_pair variable template for C++11

2024-11-07 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (__is_pair): Define for C++11 and
C++14 as well.
---
Tested powerpc64le-linux. Pushed to trunk.

 libstdc++-v3/include/bits/stl_pair.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index e92fcad2d66..527fb9105f0 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -1189,12 +1189,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 inline constexpr size_t tuple_size_v> = 2;
+#endif
 
+#if __cplusplus >= 201103L
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++14-extensions" // variable templates
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // inline variables
   template
 inline constexpr bool __is_pair = false;
 
   template
 inline constexpr bool __is_pair> = true;
+#pragma GCC diagnostic pop
 #endif
 
   /// @cond undocumented
-- 
2.47.0



[RFC 7/9] hwasan: add support for generating MTE instructions for memory tagging

2024-11-07 Thread Indu Bhagat
Memory tagging is used for detecting memory safety bugs.  On AArch64, the
memory tagging extension (MTE) helps in reducing the overheads of memory
tagging:
 - CPU: MTE instructions for efficiently tagging and untagging memory.
 - Memory: New memory type, Normal Tagged Memory, added to the Arm
   Architecture.

The MEMory TAGging (MEMTAG) sanitizer uses the same infrastructure as
HWASAN.  MEMTAG and HWASAN are both hardware-assisted solutions, and
rely on the same sanitizer machinery in parts.  So, define new
constructs that allow MEMTAG and HWASAN to share the infrastructure:

  - hwassist_sanitize_p () is true when either SANITIZE_MEMTAG or
SANITIZE_HWASAN is true.
  - hwassist_sanitize_stack_p () is true when hwassist_sanitize_p () holds
and stack variables are to be sanitized; see the sketch below.
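
A minimal sketch of the shared predicates; the names are the ones this
patch adds, but the exact bodies here are assumptions:

  bool
  hwassist_sanitize_p ()
  {
    /* True when either hardware-assisted scheme is active.  */
    return hwasan_sanitize_p () || memtag_sanitize_p ();
  }

  bool
  hwassist_sanitize_stack_p ()
  {
    /* Only instrument the stack when stack sanitization is requested.  */
    return hwassist_sanitize_p ()
	   && (hwasan_sanitize_stack_p () || memtag_sanitize_stack_p ());
  }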

MEMTAG and HWASAN do have differences, however, and hence the need to
conditionalize using memtag_sanitize_p () in the relevant places.  E.g.,

  - Instead of generating the libcall __hwasan_tag_memory, MEMTAG needs
to invoke the target-specific hook TARGET_MEMTAG_TAG_MEMORY to tag
memory.  A similar approach is used for handling
handle_builtin_alloca, where target hooks are used instead of the
gimple transformations.

  - Add a new internal function HWASAN_ALLOCA_POISON to handle
dynamically allocated stack when the MEMTAG sanitizer is enabled.  At
expansion, this allows us to invoke target hooks to increment the tag,
and to use the generated tag to finally tag the dynamically allocated
memory.

TBD:
 - Not sure if we really need param_memtag_instrument_mem_intrinsics
   explicitly.
 - Conditionalizing using hwassist_sanitize_p (), memtag_sanitize_p ()
   etc. looks unappetizing in some cases.  Not sure if there is a better
   way.  Is this generally the right thing to do, or are there some
   desirable refactorings?
 - Adding decl to hwasan_stack_var; double check whether this is necessary.
   See how we update the RTL for decl at expand_one_stack_var_at, and
   then use the RTL for decl in hwasan_emit_prologue.
 - In hwasan_frame_base (), see if checking for memtag_sanitize_p () for
   force_reg etc is really necessary.  Revisit to see what gives, fix or
   add documentation.
 - Error out if the user specifies a stack alloc alignment that is not a
   factor of 16?

gcc/ChangeLog:

* asan.cc (struct hwasan_stack_var):
(handle_builtin_stack_restore): Accommodate MEMTAG sanitizer.
(handle_builtin_alloca): Expand differently if MEMTAG sanitizer.
(get_mem_refs_of_builtin_call): Include MEMTAG along with
HWASAN.
(memtag_sanitize_stack_p): New definition.
(memtag_sanitize_allocas_p): Likewise.
(memtag_memintrin): Likewise.
(hwassist_sanitize_p): Likewise.
(hwassist_sanitize_stack_p): Likewise.
(report_error_func): Include MEMTAG along with HWASAN.
(build_check_stmt): Likewise.
(instrument_derefs): MEMTAG too does not deal with globals yet.
(instrument_builtin_call):
(maybe_instrument_call): Include MEMTAG along with HWASAN.
(asan_expand_mark_ifn): Likewise.
(asan_expand_check_ifn): Likewise.
(asan_expand_poison_ifn): Expand differently if MEMTAG sanitizer.
(asan_instrument):
(hwasan_frame_base):
(hwasan_record_stack_var):
(hwasan_emit_prologue): Expand differently if MEMTAG sanitizer.
(hwasan_emit_untag_frame): Likewise.
* asan.h (hwasan_record_stack_var):
(memtag_sanitize_stack_p): New declaration.
(memtag_sanitize_allocas_p): Likewise.
(hwassist_sanitize_p): Likewise.
(hwassist_sanitize_stack_p): Likewise.
(asan_sanitize_use_after_scope): Include MEMTAG along with
HWASAN.
* cfgexpand.cc (align_local_variable): Likewise.
(expand_one_stack_var_at): Likewise.
(expand_stack_vars): Likewise.
(expand_one_stack_var_1): Likewise.
(init_vars_expansion): Likewise.
(expand_used_vars): Likewise.
(pass_expand::execute): Likewise.
* gimplify.cc (asan_poison_variable): Likewise.
* internal-fn.cc (expand_HWASAN_ALLOCA_POISON): New definition.
(expand_HWASAN_ALLOCA_UNPOISON): Expand differently if MEMTAG
sanitizer.
(expand_HWASAN_MARK): Likewise.
* internal-fn.def (HWASAN_ALLOCA_POISON): Define new.
* params.opt: Document new param. FIXME.
* sanopt.cc (pass_sanopt::execute): Include MEMTAG along with
HWASAN.
---
 gcc/asan.cc | 236 +---
 gcc/asan.h  |   9 +-
 gcc/cfgexpand.cc|  37 +++
 gcc/gimplify.cc |   5 +-
 gcc/internal-fn.cc  |  69 +++--
 gcc/internal-fn.def |   1 +
 gcc/params.opt  |   4 +
 gcc/sanopt.cc   |   2 +-
 8 files changed, 275 insertions(+), 88 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 95a9009011f7..92c16c67c7e5 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -299,6 

[RFC 8/9] asan: memtag: enable pass_asan for memtag sanitizer

2024-11-07 Thread Indu Bhagat
Check for SANITIZER_MEMTAG in the gate function for pass_asan gimple
pass; enable it.

TBD:
  - This commit was initially carved out in order to ensure each patch
works in isolation.  Need to revisit and double check this.

gcc/ChangeLog:

* asan.cc (memtag_sanitize_p): Fix definition.
(gate): Add gate_memtag ().
(gate_memtag): New definition.
* asan.h (gate_memtag): New declaration.
---
 gcc/asan.cc | 12 +---
 gcc/asan.h  |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 92c16c67c7e5..762c83ce5e5f 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1889,7 +1889,7 @@ hwasan_memintrin (void)
 bool
 memtag_sanitize_p ()
 {
-  return false;
+  return sanitize_flags_p (SANITIZE_MEMTAG);
 }
 
 /* Are we tagging the stack?  */
@@ -4414,7 +4414,7 @@ public:
   opt_pass * clone () final override { return new pass_asan (m_ctxt); }
   bool gate (function *) final override
   {
-return gate_asan () || gate_hwasan ();
+return gate_asan () || gate_hwasan () || gate_memtag ();
   }
   unsigned int execute (function *) final override
   {
@@ -4456,7 +4456,7 @@ public:
   /* opt_pass methods: */
   bool gate (function *) final override
 {
-  return !optimize && (gate_asan () || gate_hwasan ());
+  return !optimize && (gate_asan () || gate_hwasan () || gate_memtag ());
 }
   unsigned int execute (function *) final override
   {
@@ -4990,4 +4990,10 @@ gate_hwasan ()
   return hwasan_sanitize_p ();
 }
 
+bool
+gate_memtag ()
+{
+  return memtag_sanitize_p ();
+}
+
 #include "gt-asan.h"
diff --git a/gcc/asan.h b/gcc/asan.h
index d169a769f780..8aa16c9931ed 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -57,6 +57,7 @@ extern bool gate_hwasan (void);
 extern bool memtag_sanitize_p (void);
 extern bool memtag_sanitize_stack_p (void);
 extern bool memtag_sanitize_allocas_p (void);
+extern bool gate_memtag (void);
 
 bool hwassist_sanitize_p (void);
 bool hwassist_sanitize_stack_p (void);
-- 
2.43.0



[RFC 1/9] opts: use unsigned HOST_WIDE_INT for sanitizer flags

2024-11-07 Thread Indu Bhagat
Currently, the data type of sanitizer flags is unsigned int, with
SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
enumerator for enum sanitize_code.  Use the 'unsigned HOST_WIDE_INT' data
type to allow more distinct instrumentation modes to be added when
needed.
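
The pattern that runs out of bits looks roughly like this; a sketch of
flag-types.h rather than the literal enum, and the MEMTAG enumerator is
only added by a later patch in this series:

  enum sanitize_code {
    /* ...  */
    SANITIZE_SHADOW_CALL_STACK = 1UL << 31,  /* last bit of a 32-bit mask */
    /* A new mode such as SANITIZE_MEMTAG needs bit 32 or above, which
       only fits once the flags are unsigned HOST_WIDE_INT.  */
  };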

FIXME:
1. Is using d_ulong_type for build_int_cst in gcc/d/d-attribs.cc, and
   uint64_type_node in gcc/c-family/c-attribs.cc, OK to get the type
   associated with unsigned HOST_WIDE_INT?

gcc/ChangeLog:

* asan.h (sanitize_flags_p): Use 'unsigned HOST_WIDE_INT'
instead of 'unsigned int'.
* common.opt: Likewise.
* dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
* opts.cc (find_sanitizer_argument): Likewise.
(report_conflicting_sanitizer_options): Likewise.
(parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* opts.h (parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* tree-cfg.cc (print_no_sanitize_attr_value): Likewise.

gcc/c-family/ChangeLog:

* c-attribs.cc (add_no_sanitize_value): Likewise.
(handle_no_sanitize_attribute): Likewise.
(handle_no_sanitize_address_attribute): Likewise.
(handle_no_sanitize_thread_attribute): Likewise.
(handle_no_address_safety_analysis_attribute): Likewise.
* c-common.h (add_no_sanitize_value): Likewise.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Likewise.

gcc/cp/ChangeLog:

* typeck.cc (get_member_function_from_ptrfunc): Likewise.

gcc/d/ChangeLog:

* d-attribs.cc (d_handle_no_sanitize_attribute): Likewise.
---
 gcc/asan.h|  5 +++--
 gcc/c-family/c-attribs.cc | 17 +
 gcc/c-family/c-common.h   |  2 +-
 gcc/c/c-parser.cc |  4 ++--
 gcc/common.opt|  6 +++---
 gcc/cp/typeck.cc  |  2 +-
 gcc/d/d-attribs.cc|  9 +
 gcc/dwarf2asm.cc  |  2 +-
 gcc/opts.cc   | 21 -
 gcc/opts.h|  9 +
 gcc/tree-cfg.cc   |  2 +-
 11 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/gcc/asan.h b/gcc/asan.h
index d1bf8b1e701b..751ead187e35 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -239,9 +239,10 @@ asan_protect_stack_decl (tree decl)
remove all flags mentioned in "no_sanitize" of DECL_ATTRIBUTES.  */
 
 inline bool
-sanitize_flags_p (unsigned int flag, const_tree fn = current_function_decl)
+sanitize_flags_p (unsigned HOST_WIDE_INT flag,
+ const_tree fn = current_function_decl)
 {
-  unsigned int result_flags = flag_sanitize & flag;
+  unsigned HOST_WIDE_INT result_flags = flag_sanitize & flag;
   if (result_flags == 0)
 return false;
 
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 7fd480e6d41b..66e66ba5d575 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -1401,23 +1401,24 @@ handle_cold_attribute (tree *node, tree name, tree 
ARG_UNUSED (args),
 /* Add FLAGS for a function NODE to no_sanitize_flags in DECL_ATTRIBUTES.  */
 
 void
-add_no_sanitize_value (tree node, unsigned int flags)
+add_no_sanitize_value (tree node, unsigned HOST_WIDE_INT flags)
 {
   tree attr = lookup_attribute ("no_sanitize", DECL_ATTRIBUTES (node));
   if (attr)
 {
-  unsigned int old_value = tree_to_uhwi (TREE_VALUE (attr));
+  unsigned HOST_WIDE_INT old_value = tree_to_uhwi (TREE_VALUE (attr));
   flags |= old_value;
 
   if (flags == old_value)
return;
 
-  TREE_VALUE (attr) = build_int_cst (unsigned_type_node, flags);
+  TREE_VALUE (attr) = build_int_cst (TREE_TYPE (attr), flags);
 }
   else
 DECL_ATTRIBUTES (node)
   = tree_cons (get_identifier ("no_sanitize"),
-  build_int_cst (unsigned_type_node, flags),
+  // FIXME
+  build_int_cst (uint64_type_node, flags),
   DECL_ATTRIBUTES (node));
 }
 
@@ -1428,7 +1429,7 @@ static tree
 handle_no_sanitize_attribute (tree *node, tree name, tree args, int,
  bool *no_add_attrs)
 {
-  unsigned int flags = 0;
+  unsigned HOST_WIDE_INT flags = 0;
   *no_add_attrs = true;
   if (TREE_CODE (*node) != FUNCTION_DECL)
 {
@@ -1465,7 +1466,7 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
   if (TREE_CODE (*node) != FUNCTION_DECL)
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
-add_no_sanitize_value (*node, SANITIZE_ADDRESS);
+add_no_sanitize_value (*node, (HOST_WIDE_INT) SANITIZE_ADDRESS);
 
   return NULL_TREE;
 }
@@ -1481,7 +1482,7 @@ handle_no_sanitize_thread_attribute (tree *node, tree 
name, tree, int,
   if (TREE_CODE (*node) != FUNCTION_DECL)
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
-add_no_sanitize_value (*node, SANITIZE_THREAD);
+add_no_sanitize_value (*node, (HOST_WIDE_INT) SANITIZE_THREAD);
 
   retu

[RFC 5/9] targhooks: add new target hook TARGET_MEMTAG_TAG_MEMORY

2024-11-07 Thread Indu Bhagat
Add a new target hook TARGET_MEMTAG_TAG_MEMORY to tag (and untag)
memory.  The default implementation calls gcc_unreachable ().

Hardware-assisted sanitizers on architectures providing instructions to
tag/untag memory can then make use of this target hook.  On AArch64,
e.g., the MEMTAG sanitizer will use this hook to tag and untag memory
using MTE insns.
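
As a rough sketch, a middle-end caller reaches the hook through targetm;
the argument roles follow the hook signature added below, while the local
rtx names are assumptions:

  /* Tag the granules covering [base, base + size) using the address
     tag carried in tagged_pointer.  */
  rtx ret = targetm.memtag.tag_memory (base, size, tagged_pointer);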

gcc/ChangeLog:

* doc/tm.texi: Re-generate.
* doc/tm.texi.in: Add documentation for new target hooks.
* target.def: Add new hook.
* targhooks.cc (default_memtag_tag_memory): New hook.
* targhooks.h (default_memtag_tag_memory): Likewise.
---
 gcc/doc/tm.texi| 5 +
 gcc/doc/tm.texi.in | 2 ++
 gcc/target.def | 6 ++
 gcc/targhooks.cc   | 7 +++
 gcc/targhooks.h| 1 +
 5 files changed, 21 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 4deb3d2c283a..fbc8efb7ede9 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12820,6 +12820,11 @@ Store the result in @var{target} if convenient.
 The default clears the top byte of the original pointer.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_MEMTAG_TAG_MEMORY (rtx @var{base}, rtx 
@var{size}, rtx @var{tagged_pointer})
+Tag memory at address @var{base} TBD FIXME.
+The default clears the top byte of the original pointer.
+@end deftypefn
+
 @deftypevr {Target Hook} bool TARGET_HAVE_SHADOW_CALL_STACK
 This value is true if the target platform supports
 @option{-fsanitize=shadow-call-stack}.  The default value is false.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9f147ccb95cc..b25ef8cc0f91 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8121,6 +8121,8 @@ maintainer is familiar with.
 
 @hook TARGET_MEMTAG_UNTAGGED_POINTER
 
+@hook TARGET_MEMTAG_TAG_MEMORY
+
 @hook TARGET_HAVE_SHADOW_CALL_STACK
 
 @hook TARGET_HAVE_LIBATOMIC
diff --git a/gcc/target.def b/gcc/target.def
index 523ae7ec9aaa..415cb8076ca4 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -7419,6 +7419,12 @@ Store the result in @var{target} if convenient.\n\
 The default clears the top byte of the original pointer.",
   rtx, (rtx tagged_pointer, rtx target), default_memtag_untagged_pointer)
 
+DEFHOOK
+(tag_memory,
+ "Tag memory at address @var{base} TBD FIXME.\n\
+The default clears the top byte of the original pointer.",
+  rtx, (rtx base, rtx size, rtx tagged_pointer), default_memtag_tag_memory)
+
 HOOK_VECTOR_END (memtag)
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 304b35ed7724..aa5d38a69fde 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2843,4 +2843,11 @@ default_memtag_untagged_pointer (rtx tagged_pointer, rtx 
target)
   return untagged_base;
 }
 
+/* The default implementation of TARGET_MEMTAG_TAG_MEMORY.  */
+rtx
+default_memtag_tag_memory (rtx, rtx, rtx)
+{
+  gcc_unreachable ();
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 2704d6008f14..44f5a28e0dd2 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -308,5 +308,6 @@ extern rtx default_memtag_add_tag (rtx, poly_int64, 
uint8_t);
 extern rtx default_memtag_set_tag (rtx, rtx, rtx);
 extern rtx default_memtag_extract_tag (rtx, rtx);
 extern rtx default_memtag_untagged_pointer (rtx, rtx);
+extern rtx default_memtag_tag_memory (rtx, rtx, rtx);
 
 #endif /* GCC_TARGHOOKS_H */
-- 
2.43.0



[RFC 6/9] aarch64: memtag: implement target hooks

2024-11-07 Thread Indu Bhagat
MEMTAG sanitizer, which is based on the HWASAN sanitizer, will invoke
the target-specific hooks to create a random tag, add a tag to a memory
address, and finally tag and untag memory.

Implement the target hooks to emit MTE instructions if MEMTAG sanitizer
is in effect.  Continue to use the default target hook if HWASAN is
being used.  Following target hooks are implemented:
   - TARGET_MEMTAG_INSERT_RANDOM_TAG
   - TARGET_MEMTAG_ADD_TAG
   - TARGET_MEMTAG_TAG_MEMORY

Apart from the target-specific hooks, set the following to values
defined by the Memory Tagging Extension (MTE) in aarch64:
   - TARGET_MEMTAG_TAG_SIZE
   - TARGET_MEMTAG_GRANULE_SIZE

As noted earlier, TARGET_MEMTAG_TAG_MEMORY is a target-specific hook,
the _only_ user of which is the MEMTAG sanitizer.  On aarch64,
TARGET_MEMTAG_TAG_MEMORY will emit MTE instructions to tag/untag memory
of a given size.  The TARGET_MEMTAG_TAG_MEMORY implementation may emit
an actual loop to tag/untag memory when the size of the memory block is
an expression that must be evaluated at run time.  Both aarch64_memtag_tag_memory () and
aarch64_memtag_tag_memory_via_loop () may generate stg or st2g,
depending on the number of iterations.
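
For a size known only at run time, the emitted tagging loop would be along
these lines; illustrative only, and the register choices and the stg vs.
st2g selection are assumptions:

	// x0 = tagged pointer, x1 = size in bytes (whole 16-byte granules)
	add	x2, x0, x1		// x2 = end of the region
.Ltag_loop:
	stg	x0, [x0], #16		// tag one granule, post-increment
	cmp	x0, x2
	b.ne	.Ltag_loop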

TBD:
- rtx generation in the target-hooks not tested well.  WIP.
- See how AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD is defined and then
used to generate a loop to tag/untag memory.  Is there a better way
to do this ?

gcc/ChangeLog:

* asan.cc (memtag_sanitize_p): New definition.
* asan.h (memtag_sanitize_p): New declaration.
* config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
Define.
(AARCH64_MEMTAG_TAG_SIZE): Define.
(aarch64_can_tag_addresses): Add MEMTAG specific handling.
(aarch64_memtag_tag_size): Likewise.
(aarch64_memtag_granule_size): Likewise.
(aarch64_memtag_insert_random_tag): Generate irg insn.
(aarch64_memtag_add_tag): Generate addg/subg insn.
(AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Define.
(aarch64_memtag_tag_memory_via_loop): New definition.
(aarch64_memtag_tag_memory): Likewise.
(TARGET_MEMTAG_TAG_SIZE): Define target-hook.
(TARGET_MEMTAG_GRANULE_SIZE): Likewise.
(TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
(TARGET_MEMTAG_ADD_TAG): Likewise.
(TARGET_MEMTAG_TAG_MEMORY): Likewise.
---
 gcc/asan.cc   |  12 ++
 gcc/asan.h|   2 +
 gcc/config/aarch64/aarch64.cc | 266 +-
 3 files changed, 279 insertions(+), 1 deletion(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 408c25de4de3..95a9009011f7 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1832,6 +1832,18 @@ hwasan_memintrin (void)
   return (hwasan_sanitize_p () && param_hwasan_instrument_mem_intrinsics);
 }
 
+/* MEMoryTAGging sanitizer (memtag) uses a hardware based capability known as
+   memory tagging to detect memory safety vulnerabilities.  Similar to hwasan,
+   it is also a probabilistic method.  */
+
+/* Returns whether we are tagging pointers and checking those tags on memory
+   access.  */
+bool
+memtag_sanitize_p ()
+{
+  return false;
+}
+
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
directly, epilogue sequence returned.  BASE is the register holding the
stack base, against which OFFSETS array offsets are relative to, OFFSETS
diff --git a/gcc/asan.h b/gcc/asan.h
index 751ead187e35..c34b1d304288 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -54,6 +54,8 @@ extern bool hwasan_expand_check_ifn (gimple_stmt_iterator *, 
bool);
 extern bool hwasan_expand_mark_ifn (gimple_stmt_iterator *);
 extern bool gate_hwasan (void);
 
+extern bool memtag_sanitize_p (void);
+
 extern gimple_stmt_iterator create_cond_insert_point
  (gimple_stmt_iterator *, bool, bool, bool, basic_block *, basic_block *);
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1ef2dbcf9030..1bd70568d80e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -29376,15 +29376,264 @@ aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, 
const_tree type1,
   return NULL;
 }
 
+#define AARCH64_MEMTAG_GRANULE_SIZE  16
+#define AARCH64_MEMTAG_TAG_SIZE  4
+
 /* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES.  Here we tell the rest of the
compiler that we automatically ignore the top byte of our pointers, which
-   allows using -fsanitize=hwaddress.  */
+   allows using -fsanitize=hwaddress.  In case of -fsanitize=memtag, we
+   additionally ensure that target supports MEMTAG insns.  */
 bool
 aarch64_can_tag_addresses ()
 {
+  if (memtag_sanitize_p ())
+return !TARGET_ILP32 && TARGET_MEMTAG;
   return !TARGET_ILP32;
 }
 
+/* Implement TARGET_MEMTAG_TAG_SIZE.  */
+unsigned char
+aarch64_memtag_tag_size ()
+{
+  if (memtag_sanitize_p ())
+return AARCH64_MEMTAG_TAG_SIZE;
+  return default_memtag_tag_size ();
+}
+
+/* Implement TARGET_MEMTAG_GRANULE_SIZE.  */
+unsigned char
+aarch64_memtag_gra

Re: [PATCH] c++: Fix ICE on constexpr virtual function [PR117317]

2024-11-07 Thread Jason Merrill

On 10/30/24 3:17 AM, Jakub Jelinek wrote:

Hi!

Since C++20 virtual methods can be constexpr, and if they are
constexpr evaluated, we choose tentative_decl_linkage for those,
deferring their output and deciding again at_eof.
On the following testcases we ICE though, because if
expand_or_defer_fn_1 decides to use tentative_decl_linkage, it
returns true and the caller in that case calls emit_associated_thunks,
where use_thunk, which it calls, asserts DECL_INTERFACE_KNOWN on the
thunk destination, which isn't the case for tentative_decl_linkage.

The following patch fixes the ICE by not emitting the thunks
for the DECL_DEFER_OUTPUT fns just yet but waiting until at_eof
time when we return to those.
Note, the second testcase ICEs already since r0-110035 with -std=c++0x
before it gets a chance to diagnose constexpr virtual method.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and eventually for backports?


OK.


2024-10-30  Jakub Jelinek  

PR c++/117317
* semantics.cc (emit_associated_thunks): Do nothing for
!DECL_INTERFACE_KNOWN && DECL_DEFER_OUTPUT fns.

* g++.dg/cpp2a/pr117317-1.C: New test.
* g++.dg/cpp2a/pr117317-2.C: New test.

--- gcc/cp/semantics.cc.jj  2024-10-25 10:00:29.433768358 +0200
+++ gcc/cp/semantics.cc 2024-10-29 13:10:32.234068524 +0100
@@ -5150,7 +5150,10 @@ emit_associated_thunks (tree fn)
   enabling you to output all the thunks with the function itself.  */
if (DECL_VIRTUAL_P (fn)
/* Do not emit thunks for extern template instantiations.  */
-  && ! DECL_REALLY_EXTERN (fn))
+  && ! DECL_REALLY_EXTERN (fn)
+  /* Do not emit thunks for tentative decls, those will be processed
+again at_eof if really needed.  */
+  && (DECL_INTERFACE_KNOWN (fn) || !DECL_DEFER_OUTPUT (fn)))
  {
tree thunk;
  
--- gcc/testsuite/g++.dg/cpp2a/pr117317-1.C.jj	2024-10-29 13:12:23.373519669 +0100

+++ gcc/testsuite/g++.dg/cpp2a/pr117317-1.C 2024-10-29 13:12:18.223591437 
+0100
@@ -0,0 +1,19 @@
+// PR c++/117317
+// { dg-do compile { target c++20 } }
+
+struct C {
+  constexpr bool operator== (const C &b) const { return foo (); }
+  constexpr virtual bool foo () const = 0;
+};
+class A : public C {};
+class B : public C {};
+template 
+struct D : A, B
+{
+  constexpr bool operator== (const D &) const = default;
+  constexpr bool foo () const override { return true; }
+};
+struct E : D<1> {};
+constexpr E e;
+constexpr E f;
+static_assert (e == f, "");
--- gcc/testsuite/g++.dg/cpp2a/pr117317-2.C.jj  2024-10-29 13:16:10.101359947 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/pr117317-2.C 2024-10-29 13:16:15.981278003 
+0100
@@ -0,0 +1,15 @@
+// PR c++/117317
+// { dg-do compile { target c++20 } }
+
+struct C {
+  constexpr virtual bool foo () const = 0;
+};
+struct A : public C {};
+struct B : public C {};
+template 
+struct D : A, B
+{
+  constexpr bool foo () const override { return true; }
+};
+constexpr D<0> d;
+static_assert (d.foo (), "");

Jakub





[RFC 9/9] memtag: testsuite: add new tests

2024-11-07 Thread Indu Bhagat
Add basic tests for MEMTAG sanitizer.  MEMTAG sanitizer uses target
hooks to emit AArch64 specific MTE instructions.

Add new target-specific tests.

The currently generated code has quite a few limitations:

1. For basic-1.c testcase, currently we generate:
	subg	x0, x0, #16, #0
	stg	x0, [x0, #0]
	str	w1, [x0]
   The subg can be optimized out.  Adding #0 to the tag is
   inconsequential.  The address generation component (x0+16) can be
   folded into the addr operands of stg.

2. Need to generate stgp (pre-index, post-index) above.
   Need to look into how aarch64 backend generates the
   store-pair/load-pair operations currently.  We will likely need to use
   the same framework for generating the store-pair-with-tag (pre-indexed
   and post-indexed) variants for MTE.

3. Also stzp is not generated at all.

TBD:
  - Any suggestions on any of the above 3 will be helpful.
  - Are the tests fittingly placed in gcc.target/aarch64 ? Suggestions
on other tests are also most welcome.

gcc/testsuite/

* gcc.target/aarch64/memtag/alloca-1.c: New test.
* gcc.target/aarch64/memtag/alloca-3.c: New test.
* gcc.target/aarch64/memtag/arguments-1.c: New test.
* gcc.target/aarch64/memtag/arguments-2.c: New test.
* gcc.target/aarch64/memtag/arguments-4.c: New test.
* gcc.target/aarch64/memtag/arguments.c: New test.
* gcc.target/aarch64/memtag/basic-1.c: New test.
* gcc.target/aarch64/memtag/basic-3.c: New test.
* gcc.target/aarch64/memtag/basic-struct.c: New test.
* gcc.target/aarch64/memtag/large-array.c: New test.
* gcc.target/aarch64/memtag/local-no-escape.c: New test.
* gcc.target/aarch64/memtag/memtag.exp: New test.
* gcc.target/aarch64/memtag/no-sanitize-attribute.c: New test.
* gcc.target/aarch64/memtag/vararray-gimple.c: New test.
* gcc.target/aarch64/memtag/vararray.c: New test.
* lib/target-supports.exp: Define new proc to detect whether
AArch64 target supports MTE.
---
 .../gcc.target/aarch64/memtag/alloca-1.c  | 14 
 .../gcc.target/aarch64/memtag/alloca-3.c  | 24 ++
 .../gcc.target/aarch64/memtag/arguments-1.c   |  3 ++
 .../gcc.target/aarch64/memtag/arguments-2.c   |  3 ++
 .../gcc.target/aarch64/memtag/arguments-4.c   | 16 ++
 .../gcc.target/aarch64/memtag/arguments.c |  3 ++
 .../gcc.target/aarch64/memtag/basic-1.c   | 15 +
 .../gcc.target/aarch64/memtag/basic-3.c   | 16 ++
 .../gcc.target/aarch64/memtag/basic-struct.c  | 23 +
 .../gcc.target/aarch64/memtag/large-array.c   | 24 ++
 .../aarch64/memtag/local-no-escape.c  | 20 
 .../gcc.target/aarch64/memtag/memtag.exp  | 32 +++
 .../aarch64/memtag/no-sanitize-attribute.c| 17 ++
 .../aarch64/memtag/vararray-gimple.c  | 17 ++
 .../gcc.target/aarch64/memtag/vararray.c  | 14 
 gcc/testsuite/lib/target-supports.exp | 12 +++
 16 files changed, 253 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/alloca-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/alloca-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-struct.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/large-array.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/local-no-escape.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/memtag.exp
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/memtag/no-sanitize-attribute.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vararray-gimple.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vararray.c

diff --git a/gcc/testsuite/gcc.target/aarch64/memtag/alloca-1.c 
b/gcc/testsuite/gcc.target/aarch64/memtag/alloca-1.c
new file mode 100644
index ..76cf2fe64669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/memtag/alloca-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+
+extern int use (int * b);
+
+int foo (int n)
+{
+  int * b = __builtin_alloca (n);
+  int a = use (b);
+  return a;
+}
+
+/* { dg-final { scan-assembler-times {\tirg\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tstg\t} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/memtag/alloca-3.c 
b/gcc/testsuite/gcc.target/aarch64/memtag/alloca-3.c
new file mode 100644
index ..6a336158732a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch6

[RFC 0/9] Add -fsanitize=memtag

2024-11-07 Thread Indu Bhagat
Hi,

Sending the current state of the work.

I would like to get feedback on whether this is generally the right
direction of adding the MEMTAG sanitizer in GCC.  I have added some
TBD/FIXME notes to each commit log.  These are some of the things I am
aware of and need to be resolved.  Please let me know any comments on
those or other issues that you may see.

Another incentive to send the series in the current state is to get
started on the discussion on what else needs to be done on the toolchain
side to support userspace programs which use MTE extension.  See the
section on "Additional necessary pieces" with notes on "Kernel and User
space ABI" and "MTE aware exeption handling and unwinding routines"
towards the end of the cover letter.

Thanks!

==
MTE on AArch64 and Memory Tagging
-
Memory Tagging Extension (MTE) is an AArch64 extension.  This extension allows
coloring of 16-byte memory granules with 4-bit tag values.  The extension
provides additional instructions in ISA and a new memory type, Normal Tagged
Memory, added to the Arm Architecture.  This hardware-assisted mechanism can be
used to detect memory bugs like buffer overrun or use-after-free.  The
detection is probabilistic.

Under the hoods, the MTE extension introduces two types of tags:
  - Address Tags, and
  - Allocation Tags (a.k.a., Memory Tags)

Address Tag: which acts as the key.  This adds four bits to the top of a
virtual address.  It is built on the AArch64 'top-byte-ignore' (TBI) feature.

Allocation Tag: which acts as the lock.  Allocation tags also consist of four
bits, linked with every aligned 16-byte region in the physical memory space.
Arm refers to these 16-byte regions as tag granules.  The way Allocation tags
are stored is a hardware implementation detail.

A subset of the MTE instructions which are relevant in the current
context are:

[Xn, Xd are registers containing addresses].

- irg Xd, Xn
  Copy Xn into Xd, insert a random 4-bit Address Tag into Xd.
- addg Xd, Xn, #immA, #immB
  Xd = Xn + immA, with the Address Tag modified by #immB.  Similarly, there
  exists a subg.
- stg Xd, [Xn]
  (Store Allocation Tag) updates Allocation Tag for [Xn, Xn + 16) to the
  Address Tag of Xd.
- stzg Xd, [Xn]
  writes zero to [Xn, Xn + 16) and updates the Allocation Tag for
  [Xn, Xn + 16) to the Address Tag of Xd.
- stgp Xt, Xt2, [Xn]
  Similar to STP, writes a pair of registers to memory at [Xn, Xn + 16) and
  updates Allocation Tag to match the Address Tag of Xn.

Additionally, note that load and store instructions with SP base
register do not check tags.

MEMTAG sanitizer for stack
--
Use MTE instructions to instrument stack accesses to detect memory safety
issues.

Detecting stack-related memory bugs requires the compiler to:
  - ensure that each object on the stack is allocated in its own 16-byte
granule. 
  - Tag/Color: put tags into each stack variable pointer.
  - Untag: the function epilogue will untag the (stack) memory.
Above should work with dynamic stack allocation as well.
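
As an illustration of the intended code shape (hand-written, not output of
this series; register choices are assumptions), tagging a single 16-byte
local could look like:

	irg	x0, sp			// x0 = sp with a random Address Tag
	stg	x0, [x0]		// color the granule holding the local
	// ... accesses to the local go through x0 and are tag-checked ...
	stg	sp, [x0]		// epilogue: sp carries tag 0, untagging the granule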

GCC has HWASAN machinery for coloring stack variables.  Extend the machinery to
emit MTE instructions when MEMTAG sanitizer is in effect.

Deploying and running user space programs built with -fsanitize=memtag will
need the following additional pieces in place.  If there is any existing work /
ideas on any of the following, please send comments to help define these
topics.

Additional necessary pieces


* Kernel and User space ABI
  A user program may exercise MTE on stack, heap and/or globals data accesses.
The applicable memory range must be mapped with the Normal-Tagged memory
attribute ([1]).  When available and enabled, the kernel advertises the feature
to userspace via HWCAP2_MTE.  The new flag PROT_MTE (for mmap () and mprotect
()) specifies that the associated pages allow access to the MTE allocation
tags.
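
For the userspace side of the mapping, a sketch (error handling elided; the
fallback macro values match current Linux/AArch64 UAPI headers but are
stated here as assumptions):

  #include <stddef.h>
  #include <sys/mman.h>
  #include <sys/auxv.h>

  #ifndef PROT_MTE
  # define PROT_MTE 0x20
  #endif
  #ifndef HWCAP2_MTE
  # define HWCAP2_MTE (1 << 18)
  #endif

  static void *
  map_tagged (size_t len)
  {
    if (!(getauxval (AT_HWCAP2) & HWCAP2_MTE))
      return MAP_FAILED;  /* Kernel does not advertise MTE.  */
    return mmap (NULL, len, PROT_READ | PROT_WRITE | PROT_MTE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  }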

glibc currently provides a tunable glibc.mem.tagging ([3]) and MTE aware
malloc.  The tunable can be used to enable the malloc subsystem to allocate
tagged memory with either precise or deferred faulting mode.

For userspace programs using MTE on AArch64, we need to define the necessary
ABI, so that the OS can set up the initial memory maps with PROT_MTE.  Along
with enabling PROT_MTE on address ranges, the OS will also set up the faulting
mode: default, sync, async and asymm.  The latter is done by privileged insns
operating on privileged registers and hence must be carried out by the OS.

This will mean additional Binutils support to tag the ELF components (object
files, executables and shared libraries) and additional assembler and linker
command line options.  The linker may want to warn when linking code with
mixed MTE usage.

I will create a separate thread to discuss these ABI specification aspects soon.

* MTE aware exception handling and unwinding routines
The additional stack coloring must work with C++ exceptions and C 
setjmp

[committed 2/2] libstdc++: Fix conversions to key/value types for hash table insertion [PR115285]

2024-11-07 Thread Jonathan Wakely
The conversions to key_type and value_type that are performed when
inserting into _Hashtable need to be fixed to do any required
conversions explicitly. The current code assumes that conversions from
the parameter to the key_type or value_type can be done implicitly,
which isn't necessarily true.

Remove the _S_forward_key function which doesn't handle all cases and
either forward the parameter if it already has type cv key_type, or
explicitly construct a temporary of type key_type.

Similarly, the _ConvertToValueType specialization for maps doesn't
handle all cases either: for std::pair arguments, only some value
categories are handled.  Remove _ConvertToValueType and for the _M_insert
function for unique keys, either forward the argument unchanged or
explicitly construct a temporary of type value_type.

For the _M_insert overload for non-unique keys we don't need any
conversion at all, we can just forward the argument directly to where we
construct a node.
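
For context, the kind of insertion this unblocks is one where the element
type is only explicitly convertible to the container's value_type; a
reduced sketch in the spirit of the PR, not the actual testcase:

  #include <cstddef>
  #include <unordered_set>
  #include <vector>

  struct Val {
    int v;
    bool operator==(const Val& o) const { return v == o.v; }
  };

  struct ValHash {
    std::size_t operator()(const Val& x) const
    { return static_cast<std::size_t>(x.v); }
  };

  struct Src {
    int v;
    explicit operator Val() const { return Val{v}; }  // explicit only
  };

  int main() {
    std::vector<Src> in{ {1}, {2}, {3} };
    std::unordered_set<Val, ValHash> s;
    s.insert(in.begin(), in.end());  // value_type must be direct-initialized
  }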

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_S_forward_key): Remove.
(_Hashtable::_M_insert_unique_aux): Replace _S_forward_key with
a static_cast to a type defined using conditional_t.
(_Hashtable::_M_insert): Replace _ConvertToValueType with a
static_cast to a type defined using conditional_t.
* include/bits/hashtable_policy.h (_ConvertToValueType): Remove.
* testsuite/23_containers/unordered_map/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/96088.cc: Adjust
expected number of allocations.
---
This is the start of several patch series refactoring the internals
for  and .

This is a minimal fix suitable for backporting. More changes on trunk
are coming.

Tested powerpc64le-linux. Pushed to trunk.

 libstdc++-v3/include/bits/hashtable.h | 33 +
 libstdc++-v3/include/bits/hashtable_policy.h  | 34 --
 .../unordered_map/insert/115285.cc| 47 +++
 .../23_containers/unordered_set/96088.cc  |  2 +-
 .../unordered_set/insert/115285.cc| 28 +++
 5 files changed, 88 insertions(+), 56 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/unordered_map/insert/115285.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/unordered_set/insert/115285.cc

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 6c553fb4b08..bd514cab798 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -929,25 +929,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::pair
_M_insert_unique(_Kt&&, _Arg&&, _NodeGenerator&);
 
-  template
-   key_type
-   _S_forward_key(_Kt&& __k)
-   { return std::forward<_Kt>(__k); }
-
-  static const key_type&
-  _S_forward_key(const key_type& __k)
-  { return __k; }
-
-  static key_type&&
-  _S_forward_key(key_type&& __k)
-  { return std::move(__k); }
-
   template
std::pair
_M_insert_unique_aux(_Arg&& __arg, _NodeGenerator& __node_gen)
{
+ using _Kt = decltype(_ExtractKey{}(std::forward<_Arg>(__arg)));
+ constexpr bool __is_key_type
+   = is_same<__remove_cvref_t<_Kt>, key_type>::value;
+ using _Fwd_key = __conditional_t<__is_key_type, _Kt&&, key_type>;
  return _M_insert_unique(
-   _S_forward_key(_ExtractKey{}(std::forward<_Arg>(__arg))),
+   static_cast<_Fwd_key>(_ExtractKey{}(std::forward<_Arg>(__arg))),
std::forward<_Arg>(__arg), __node_gen);
}
 
@@ -956,10 +947,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_insert(_Arg&& __arg, _NodeGenerator& __node_gen,
  true_type /* __uks */)
{
- using __to_value
-   = __detail::_ConvertToValueType<_ExtractKey, value_type>;
+ using __detail::_Identity;
+ using _Vt = __conditional_t::value
+   || __is_pair<__remove_cvref_t<_Arg>>,
+ _Arg&&, value_type>;
  return _M_insert_unique_aux(
-   __to_value{}(std::forward<_Arg>(__arg)), __node_gen);
+  static_cast<_Vt>(std::forward<_Arg>(__arg)), __node_gen);
}
 
   template
@@ -967,10 +960,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_insert(_Arg&& __arg, _NodeGenerator& __node_gen,
  false_type __uks)
{
- using __to_value
-   = __detail::_ConvertToValueType<_ExtractKey, value_type>;
- return _M_insert(cend(),
-   __to_value{}(std::forward<_Arg>(__arg)), __node_gen, __uks);
+ return _M_insert(cend(), std::forward<_Arg>(__arg),
+  __node_gen, __uks);
}
 
   // Insert with hint, not used when keys are unique.
diff --git a/libstdc++-v3/include/b

[committed] libstdc++: Improve comment for _Hashtable::_M_insert_unique_node

2024-11-07 Thread Jonathan Wakely
Clarify the effects if rehashing is needed. Document the __n_elt
parameter.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_M_insert_unique_node): Improve
comment.
---
Pushed as obvious.

 libstdc++-v3/include/bits/hashtable.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index bd514cab798..6bcba2de368 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -893,9 +893,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   pair<__node_ptr, __hash_code>
   _M_compute_hash_code(__node_ptr __hint, const key_type& __k) const;
 
-  // Insert node __n with hash code __code, in bucket __bkt if no
-  // rehash (assumes no element with same key already present).
+  // Insert node __n with hash code __code, in bucket __bkt (or another
+  // bucket if rehashing is needed).
+  // Assumes no element with equivalent key is already present.
   // Takes ownership of __n if insertion succeeds, throws otherwise.
+  // __n_elt is an estimated number of elements we expect to insert,
+  // used as a hint for rehashing when inserting a range.
   iterator
   _M_insert_unique_node(size_type __bkt, __hash_code,
__node_ptr __n, size_type __n_elt = 1);
-- 
2.47.0



Re: [PATCH v2][GCC14] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2024-11-07 Thread Richard Sandiford
"Yuta Mukai (Fujitsu)"  writes:
> Thank you for pushing to trunk.
> Can I also ask for a backport to GCC14?
>
> I have attached the patch for GCC14.
> FP8 has been excluded from the list as it is not supported in GCC14.
>
> Bootstrapped/regtested on aarch64-unknown-linux-gnu.

LGTM, thanks.  Pushed to gcc-14 branch.

Richard

>
> Thanks,
> Yuta
> --
> Yuta Mukai
> Fujitsu Limited
>
>>> Thank you for the reviews! I attached a patch that fixes the problems.
>>>
> On 31 Oct 2024, at 11:50, Richard Sandiford  
> wrote:
> 
> "Yuta Mukai (Fujitsu)"  writes:
>> Hello,
>> 
>> This patch adds initial support for FUJITSU-MONAKA CPU, which we are 
>> developing.
>> This is the slides for the CPU: 
>> https://www.fujitsu.com/downloads/SUPER/topics/isc24/next-arm-based-processor-fujitsu-monaka-and-its-software-ecosystem.pdf
>> 
>> Bootstrapped/regtested on aarch64-unknown-linux-gnu.
>> 
>> We will post a patch for backporting to GCC 14 later.
>> 
>> We would be grateful if someone could push this on our behalf, as we do 
>> not have write access.
> 
> Thanks for the patch, it looks good.  I just have a couple of minor 
> comments:
> 
>> @@ -132,6 +132,7 @@ AARCH64_CORE("octeontx2f95mm", octeontx2f95mm, 
>> cortexa57, V8_2A,  (CRYPTO, PROFI
>> 
>> /* Fujitsu ('F') cores. */
>> AARCH64_CORE("a64fx", a64fx, a64fx, V8_2A,  (F16, SVE), a64fx, 0x46, 
>> 0x001, -1)
>> +AARCH64_CORE("fujitsu-monaka", fujitsu_monaka, cortexa57, V9_3A, (AES, 
>> CRYPTO, F16, F16FML, FP8, LS64, RCPC, RNG, SHA2, SHA3, SM4, SVE2_AES, 
>> SVE2_BITPERM, SVE2_SHA3, SVE2_SM4), fujitsu_monaka, 0x46, 0x003, -1)
> 
> Usually this file omits listing a feature if it is already implied by the
> architecture level.  In this case, I think V9_3A should enable F16FML and
> RCPC automatically, and so we could drop those features from the list.
> 
> Also, we should be able to rely on transitive dependencies for the
> SVE2 crypto extensions.  So I think it should be enough to list:
> 
> AARCH64_CORE("fujitsu-monaka", fujitsu_monaka, cortexa57, V9_3A, (F16, 
> FP8, LS64, RNG, SVE2_AES, SVE2_BITPERM, SVE2_SHA3, SVE2_SM4), 
> fujitsu_monaka, 0x46, 0x003, -1)
> 
> which should have the same effect.
> 
> Could you check whether that works?
>>>
>>> Thanks for the list.
>>> CRYPTO was found not to be implied by SHA2, so I left only it there.
>>>
>>> Incidentally, the manual says that LS64 is automatically enabled for V9_2A, 
>>> but it is not.
>>> Should the manual be corrected?
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-march
 ‘armv9.2-a’Armv9.2-A   ‘armv9.1-a’, ‘+ls64’
>>
>>Oops, yes!  Thanks for pointing that out.  I'll push a patch separately.
>>
>> diff --git a/gcc/config/aarch64/tuning_models/fujitsu_monaka.h 
>> b/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
>> new file mode 100644
>> index 0..8d6f297b8
>> --- /dev/null
>> +++ b/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
>> @@ -0,0 +1,65 @@
>> +/* Tuning model description for AArch64 architecture.
> 
> It's probably worth changing "AArch64 architecture" to "FUJITSU-MONAKA".
>>>
>>> Fixed.
>>>
> 
> The patch looks good to me otherwise.

Looks ok to me modulo those comments as well.
The ChangeLog should be improved a little bit too.

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add fujitsu-monaka
* config/aarch64/aarch64-tune.md: Regenerate
* config/aarch64/aarch64.cc: Include fujitsu-monaka tuning model
* doc/invoke.texi: Document -mcpu=fujitsu-monaka
* config/aarch64/tuning_models/fujitsu_monaka.h: New file.

The sentences should end in full stop “.”
>>>
>>> Fixed.
>>
>>Thanks for the patch.  I've pushed it to trunk.
>>
>>Richard
>>


Re: [PATCH] testsuite: arm: Use effective-target arm_fp for pr68620.c test

2024-11-07 Thread Christophe Lyon
On Thu, 7 Nov 2024 at 19:09, Torbjorn SVENSSON
 wrote:
>
>
>
> On 2024-11-07 16:33, Richard Earnshaw (lists) wrote:
> > On 06/11/2024 19:50, Torbjorn SVENSSON wrote:
> >>
> >>
> >> On 2024-11-06 19:06, Richard Earnshaw (lists) wrote:
> >>> On 06/11/2024 13:50, Torbjorn SVENSSON wrote:
> 
> 
>  On 2024-11-06 14:04, Richard Earnshaw (lists) wrote:
> > On 06/11/2024 12:23, Torbjorn SVENSSON wrote:
> >>
> >>
> >> On 2024-11-06 12:26, Richard Earnshaw (lists) wrote:
> >>> On 06/11/2024 07:44, Christophe Lyon wrote:
>  On Wed, 6 Nov 2024 at 07:20, Torbjörn SVENSSON
>   wrote:
> >
> > While the regression was reported on GCC15, I'm sure that same
> > regression will be seen on GCC14 when it's tested in the
> > arm-linux-gnueabihf configuration.
> >
> > Ok for trunk and releases/gcc-14?
> >
> > --
> >
> > This fixes reported regression at
> > https://linaro.atlassian.net/browse/GNU-1407.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/arm/pr68620.c: Use effective-target arm_fp.
> >
> > Signed-off-by: Torbjörn SVENSSON 
> > ---
> >  gcc/testsuite/gcc.target/arm/pr68620.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/pr68620.c 
> > b/gcc/testsuite/gcc.target/arm/pr68620.c
> > index 6e38671752f..1ed84f4ac75 100644
> > --- a/gcc/testsuite/gcc.target/arm/pr68620.c
> > +++ b/gcc/testsuite/gcc.target/arm/pr68620.c
> > @@ -1,8 +1,10 @@
> >  /* { dg-do compile } */
> >  /* { dg-skip-if "-mpure-code supports M-profile without Neon 
> > only" { *-*-* } { "-mpure-code" } } */
> >  /* { dg-require-effective-target arm_arch_v7a_ok } */
> > -/* { dg-options "-mfp16-format=ieee -mfpu=auto -mfloat-abi=softfp" 
> > } */
> > +/* { dg-require-effective-target arm_fp_ok } */
> > +/* { dg-options "-mfp16-format=ieee -mfpu=auto" } */
> >  /* { dg-add-options arm_arch_v7a } */
> > +/* { dg-add-options arm_fp } */
> >
> 
>  So... this partially reverts your previous patch (bringing back
>  arm_fp). What is the problem now?
> 
> >>>
> >>> Yeah, that sounds wrong.  arm_fp_ok tries to find options to add to 
> >>> the basic testsuite options, but it can't be combined with 
> >>> arm_arch_v7a as that picks a totally different set of flags for the 
> >>> architecture.
> >>
> >> The problem is that for arm-linux-gnueabihf, we cannot use 
> >> -mfloat-abi=softfp as there is no multilib available for that ABI, or 
> >> at least that's my interpretation of below error message.
> >>
> >> This is the output from the CI run:
> >>
> >> Executing on host: 
> >> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/bin/armv8l-unknown-linux-gnueabihf-gcc
> >> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c
> >> -fdiagnostics-plain-output   -mfp16-format=ieee -mfpu=auto 
> >> -mfloat-abi=softfp -mcpu=unset -march=armv7-a+fp -S -o pr68620.s 
> >> (timeout = 600)
> >> spawn -ignore SIGHUP 
> >> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/bin/armv8l-unknown-linux-gnueabihf-gcc
> >>  
> >> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c
> >>  -fdiagnostics-plain-output -mfp16-format=ieee -mfpu=auto 
> >> -mfloat-abi=softfp -mcpu=unset -march=armv7-a+fp -S -o pr68620.s
> >> In file included from /usr/include/features.h:510,
> >> from 
> >> /usr/include/arm-linux-gnueabihf/bits/libc-header-start.h:33,
> >> from /usr/include/stdint.h:26,
> >> from 
> >> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/stdint.h:11,
> >> from 
> >> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/arm_fp16.h:34,
> >> from 
> >> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/armv8l-unknown-linux-gnueabihf/lib/gcc/armv8l-unknown-linux-gnueabihf/15.0.0/include/arm_neon.h:41,
> >> from 
> >> /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.target/arm/pr68620.c:7:
> >> /usr/include/arm-linux-gnueabihf/gnu/stubs.h:7:11: fatal error: 
> >> gnu/stubs-soft.h: No such file or direct

[PATCH] libstdc++: Simplify _Hashtable merge functions

2024-11-07 Thread Jonathan Wakely
I realised that _M_merge_unique and _M_merge_multi call extract(iter)
which then has to call _M_get_previous_node to iterate through the
bucket to find the node before the one iter points to. Since the merge
function is already iterating over the entire container, we had the
previous node a moment ago. Walking the whole bucket to find it again is
wasteful. We could just rewrite the loop in terms of node pointers
instead of iterators, and then call _M_extract_node directly. However,
this is only possible when the source container is the same type as the
destination, because otherwise we can't access the source's private
members (_M_before_begin, _M_begin, _M_extract_node etc.)

Add overloads of _M_merge_unique and _M_merge_multi that work with
source containers of the same type, to enable this optimization.

For both overloads of _M_merge_unique we can also remove the conditional
modifications to __n_elt and just consistently decrement it for every
element processed. Use a multiplier of one or zero that dictates whether
__n_elt is passed to _M_insert_unique_node or not. We can also remove
the repeated calls to size() and just keep track of the size in a local
variable.

Although _M_merge_unique and _M_merge_multi should be safe for
"self-merge", i.e. when doing c.merge(c), it's wasteful to search/insert
every element when we don't need to do anything. Add 'this == &source'
checks to the overloads taking an lvalue of the container's own type.
Because those checks aren't needed for the rvalue overloads, change
those to call the underlying _M_merge_xxx function directly instead of
going through the lvalue overload that checks the address.

I've also added more extensive tests for better coverage of the new
overloads added in this commit.
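
For reference, the observable behavior of merge is unchanged; a small usage
sketch, including the self-merge case the new check short-circuits:

  #include <cassert>
  #include <unordered_set>

  int main() {
    std::unordered_set<int> a{1, 2, 3};
    std::unordered_set<int> b{3, 4};
    a.merge(b);           // 4 moves into a; 3 stays in b (duplicate key)
    assert(a.size() == 4 && b.size() == 1);
    a.merge(a);           // self-merge: valid, and now returns immediately
    assert(a.size() == 4);
  }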

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_M_merge_unique): Add overload for
merging from same type.
(_M_merge_unique): Simplify size tracking. Add
comment.
(_M_merge_multi): Add overload for merging from same type.
(_M_merge_multi): Add comment.
* include/bits/unordered_map.h (unordered_map::merge): Check for
self-merge in the lvalue overload. Call _M_merge_unique directly
for the rvalue overload.
(unordered_multimap::merge): Likewise.
* include/bits/unordered_set.h (unordered_set::merge): Likewise.
(unordered_multiset::merge): Likewise.
* testsuite/23_containers/unordered_map/modifiers/merge.cc:
Add more tests.
* testsuite/23_containers/unordered_multimap/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_set/modifiers/merge.cc:
Likewise.
---
Tested x86_64-linux.

Also available for review at:
https://forge.sourceware.org/gcc/gcc-TEST/pulls/18

 libstdc++-v3/include/bits/hashtable.h | 118 
 libstdc++-v3/include/bits/unordered_map.h |  19 ++-
 libstdc++-v3/include/bits/unordered_set.h |  19 ++-
 .../unordered_map/modifiers/merge.cc  | 130 ++
 .../unordered_multimap/modifiers/merge.cc | 119 
 .../unordered_multiset/modifiers/merge.cc | 121 
 .../unordered_set/modifiers/merge.cc  | 128 +
 7 files changed, 626 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 6bcba2de368..56cada368f4 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1159,6 +1159,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return __nh;
   }
 
+  /// Merge from another container of the same type.
+  void
+  _M_merge_unique(_Hashtable& __src)
+  {
+   __glibcxx_assert(get_allocator() == __src.get_allocator());
+
+   auto __size = size();
+   auto __n_elt = __src.size();
+   size_type __first = 1;
+   // For a container of identical type we can use its private members.
+   auto __p = static_cast<__node_ptr>(&__src._M_before_begin);
+   while (__n_elt--)
+ {
+   const auto __prev = __p;
+   __p = __p->_M_next();
+   const auto& __node = *__p;
+   const key_type& __k = _ExtractKey{}(__node._M_v());
+   if (__size <= __small_size_threshold())
+ {
+   auto __n = _M_begin();
+   for (; __n; __n = __n->_M_next())
+ if (this->_M_key_equals(__k, *__n))
+   break;
+   if (__n)
+ continue;
+ }
+
+   __hash_code __code
+ = _M_src_hash_code(__src.hash_function(), __k, __node);
+   size_type __bkt = _M_bucket_index(__code);
+   if (__size > __small_size_threshold())
+ if (_M_find_node(__bkt, __k, __code) != nullptr)
+   continue;
+
+   __hash_code __src_code

Re: [PATCH] testsuite: arm: Use effective-target for nomve_fp_1 test

2024-11-07 Thread Christophe Lyon
On Thu, 7 Nov 2024 at 18:33, Torbjorn SVENSSON
 wrote:
>
>
>
> On 2024-11-07 11:40, Christophe Lyon wrote:
> > Hi Torbjörn,
> >
> > On Thu, 31 Oct 2024 at 19:34, Torbjörn SVENSSON
> >  wrote:
> >>
> >> Ok for trunk and releases/gcc-14?
> >>
> >> --
> >>
> >> Test uses MVE, so add effective-target arm_fp requirement.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * g++.target/arm/mve/general-c++/nomve_fp_1.c: Use
> >>  effective-target arm_fp.
> >>
> > I see I made a similar change to the corresponding "C" test:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624404.html
> >
> > Is your patch fixing the same issue?
>
> Yes, it looks like it's the same issue (and resolution).
>
Thanks for confirming. The patch is OK.

Thanks,

Christophe

> Kind regards,
> Torbjörn
>
> >
> > Thanks,
> >
> > Christophe
> >
> >> Signed-off-by: Torbjörn SVENSSON 
> >> ---
> >>   gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >>
> >> diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c 
> >> b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
> >> index e0692ceb8c8..a2069d353cf 100644
> >> --- a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
> >> +++ b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
> >> @@ -1,9 +1,11 @@
> >>   /* { dg-do compile } */
> >> +/* { dg-require-effective-target arm_fp_ok } */
> >>   /* { dg-require-effective-target arm_v8_1m_mve_ok } */
> >>   /* Do not use dg-add-options arm_v8_1m_mve, because this might expand to 
> >> "",
> >>  which could imply mve+fp depending on the user settings. We want to 
> >> make
> >>  sure the '+fp' extension is not enabled.  */
> >>   /* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
> >> +/* { dg-add-options arm_fp } */
> >>
> >>   #include 
> >>
> >> --
> >> 2.25.1
> >>
>


[PATCH] testsuite: arm: Use check-function-bodies in epilog-1.c test

2024-11-07 Thread Torbjörn SVENSSON
The generated assembler is:

armv7-m:
push{r4, lr}
ldr r4, .L6
ldr r4, [r4]
lslsr4, r4, #29
it  mi
addmi   r2, r2, #1
bl  bar
movsr0, #0
pop {r4, pc}


armv8.1-m.main:
push{r3, r4, r5, lr}
ldr r4, .L5
ldr r5, [r4]
tst r5, #4
csinc   r2, r2, r2, eq
bl  bar
movsr0, #0
pop {r3, r4, r5, pc}


Ok for trunk and releases/gcc-14?

--

Update the test case for armv8.1-m.main, which supports conditional
arithmetic.

gcc/testsuite/ChangeLog:

* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/epilog-1.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/epilog-1.c 
b/gcc/testsuite/gcc.target/arm/epilog-1.c
index f97f1ebeaaf..903251a70e6 100644
--- a/gcc/testsuite/gcc.target/arm/epilog-1.c
+++ b/gcc/testsuite/gcc.target/arm/epilog-1.c
@@ -2,16 +2,28 @@
 /* { dg-do compile } */
 /* { dg-options "-mthumb -Os" } */
 /* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 volatile int g_k;
 extern void bar(int, int, int, int);
 
+/*
+** foo:
+** ...
+** (
+** lslsr[0-9]+, r[0-9]+, #29
+** it  mi
+** addmi   r2, r2, #1
+** |
+** tst r[0-9]+, #4
+** csinc   r2, r2, r2, eq
+** )
+** bl  bar
+** ...
+*/
 int foo(int a, int b, int c, int d)
 {
   if (g_k & 4) c++;
   bar (a, b, c, d);
   return 0;
 }
-
-/* { dg-final { scan-assembler-times "lsls.*#29" 1 } } */
-/* { dg-final { scan-assembler-not "tst" } } */
-- 
2.25.1



RE: [PATCH 3/5] Add LOOP_VINFO_MAIN_LOOP_INFO

2024-11-07 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, November 6, 2024 2:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Tamar Christina
> 
> Subject: [PATCH 3/5] Add LOOP_VINFO_MAIN_LOOP_INFO
> 
> The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
> LOOP_VINFO_ORIG_LOOP_INFO so one can have access to both the main
> vectorized loop info and the preceding vectorized epilogue.
> This is critical for correctness as we need to disallow never
> executed epilogues by costing in vect_analyze_loop_costing as
> we assume those do not exist when deciding to add a skip-vector
> edge during peeling.  The patch also changes how multiple vector
> epilogues are handled - instead of the epilogue_vinfos array in
> the main loop info we now record the single epilogue_vinfo there
> and further epilogues in the epilogue_vinfo member of the
> epilogue info.  This simplifies code.

Nice!

FWIW, makes sense and looks good to me :)

Cheers,
Tamar
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
>   * tree-vectorizer.h (_loop_vec_info::main_loop_info): New.
>   (LOOP_VINFO_MAIN_LOOP_INFO): Likewise.
>   (_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos
>   from array to single element.
>   * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
>   main_loop_info and epilogue_vinfo.  Remove epilogue_vinfos
>   allocation.
>   (_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos.
>   (vect_create_loop_vinfo): Rename parameter, set
>   LOOP_VINFO_MAIN_LOOP_INFO.
>   (vect_analyze_loop_1): Rename parameter.
>   (vect_analyze_loop_costing): Properly distinguish between
>   the main vector loop and the preceding epilogue.
>   (vect_analyze_loop): Change for epilogue_vinfos no longer
>   being a vector.
>   * tree-vect-loop-manip.cc (vect_do_peeling): Simplify and
>   thereby handle a vector epilogue of a vector epilogue.
> ---
>  gcc/tree-vect-loop-manip.cc | 22 +---
>  gcc/tree-vect-loop.cc   | 67 -
>  gcc/tree-vectorizer.h   | 12 +--
>  3 files changed, 53 insertions(+), 48 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 5bbeeddd854..c8dc7153298 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3100,12 +3100,12 @@ vect_get_main_loop_result (loop_vec_info
> loop_vinfo, tree main_loop_value,
> The analysis resulting in this epilogue loop's loop_vec_info was performed
> in the same vect_analyze_loop call as the main loop's.  At that time
> vect_analyze_loop constructs a list of accepted loop_vec_info's for lower
> -   vectorization factors than the main loop.  This list is stored in the main
> -   loop's loop_vec_info in the 'epilogue_vinfos' member.  Everytime we 
> decide to
> -   vectorize the epilogue loop for a lower vectorization factor,  the
> -   loop_vec_info sitting at the top of the epilogue_vinfos list is removed,
> -   updated and linked to the epilogue loop.  This is later used to vectorize
> -   the epilogue.  The reason the loop_vec_info needs updating is that it was
> +   vectorization factors than the main loop.  This list is chained in the
> +   loop's loop_vec_info in the 'epilogue_vinfo' member.  When we decide to
> +   vectorize the epilogue loop for a lower vectorization factor, the
> +   loop_vec_info in epilogue_vinfo is updated and linked to the epilogue 
> loop.
> +   This is later used to vectorize the epilogue.
> +   The reason the loop_vec_info needs updating is that it was
> constructed based on the original main loop, and the epilogue loop is a
> copy of this loop, so all links pointing to statements in the original 
> loop
> need updating.  Furthermore, these loop_vec_infos share the
> @@ -3128,7 +3128,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters,
> tree nitersm1,
>profile_probability prob_prolog, prob_vector, prob_epilog;
>int estimated_vf;
>int prolog_peeling = 0;
> -  bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
> +  bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL;
>/* We currently do not support prolog peeling if the target alignment is 
> not
>   known at compile time.  'vect_gen_prolog_loop_niters' depends on the
>   target alignment being constant.  */
> @@ -3255,13 +3255,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
>else
>  niters_prolog = build_int_cst (type, 0);
> 
> -  loop_vec_info epilogue_vinfo = NULL;
> -  if (vect_epilogues)
> -{
> -  epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
> -  loop_vinfo->epilogue_vinfos.ordered_remove (0);
> -}
> -
> +  loop_vec_info epilogue_vinfo = loop_vinfo->epilogue_vinfo;
>tree niters_vector_mult_vf = NULL_TREE;
>/* Saving NITERs before the loop, as this may be changed by prologue.  */
>tree before_loop

Re: [PATCH] testsuite: arm: Use effective-target for nomve_fp_1 test

2024-11-07 Thread Torbjorn SVENSSON




On 2024-11-07 11:40, Christophe Lyon wrote:

Hi Torbjörn,

On Thu, 31 Oct 2024 at 19:34, Torbjörn SVENSSON
 wrote:


Ok for trunk and releases/gcc-14?

--

Test uses MVE, so add effective-target arm_fp requirement.

gcc/testsuite/ChangeLog:

 * g++.target/arm/mve/general-c++/nomve_fp_1.c: Use
 effective-target arm_fp.


I see I made a similar change to the corresponding "C" test:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624404.html

Is your patch fixing the same issue?


Yes, it looks like it's the same issue (and resolution).

Kind regards,
Torbjörn



Thanks,

Christophe


Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c 
b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
index e0692ceb8c8..a2069d353cf 100644
--- a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
+++ b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
@@ -1,9 +1,11 @@
  /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp_ok } */
  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
  /* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
 which could imply mve+fp depending on the user settings. We want to make
 sure the '+fp' extension is not enabled.  */
  /* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
+/* { dg-add-options arm_fp } */

  #include <arm_mve.h>

--
2.25.1





Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-11-07 Thread Robin Dapp
> I think it'd be better if I abstain from this.  I probably disagree too
> much with the current structure and the way that the code is developing.
> I won't object if anyone else approves it though.

It's not that I'm happy with the current state either and I thought about
how to rewrite it more than once.  So if you have an idea for a good
rework of it (or larger parts of ifcvt) I'd be all ears.

My argument right now would be that this patch doesn't make things worse in
terms of complexity and improves SPEC 2017's deepsjeng considerably on riscv.
IMHO it doesn't inhibit a future rewrite and even simplifies things
ever so slightly.

-- 
Regards
 Robin



RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-07 Thread Tamar Christina
> -Original Message-
> From: Li, Pan2 
> Sent: Thursday, November 7, 2024 1:45 AM
> To: Tamar Christina ; Richard Biener
> 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> SAT_ADD into branchless
> 
> I see, thanks Tamar for the explanation.
> 
> > The problem with the rewrite is that it pessimizes the code if the 
> > saturating
> > instructions are not recognized afterwards.
> 
> The original idea is somewhat independent of whether the backend supports
> IFN_SAT_* or not.
> Given we have several forms of IFN_SAT_*, some of them are cheap while others
> are heavy
> from the perspective of gimple. It is possible to do some canonicalization
> here.
> 
> > Additionally the branch version will get the benefit of branch prediction 
> > when ran
> > inside a loop.
> 
> Not very sure if my understanding is correct, but the branchless version is
> preferred in most cases IMO if
> they have nearly the same count of stmts. Not sure if it is still true during
> the vectorization.

I don't think this is true.  The branchless version must be either fewer
instructions or
cheaper instructions.  The branches, especially in a loop, become mostly free due
to
branch prediction. Modern branch predictors are really good at idioms like this.

That said, the other issue with doing this in gimple is that it interferes with 
the RTL
conditional execution pass.

For instance see https://godbolt.org/z/fvrq3aq6K

On ISAs with conditional operations the branch version gets ifconverted.

On AArch64 we get:

sat_add_u_1(unsigned int, unsigned int):
adds    w0, w0, w1
csinv   w0, w0, wzr, cc
ret

so just 2 instructions, and also branchless. On x86_64 we get:

sat_add_u_1(unsigned int, unsigned int):
add edi, esi
mov eax, -1
cmovnc  eax, edi
ret

so 3 instructions but a dependency chain of 2.

also branchless.  This patch would regress both of these.

Thanks,
Tamar

> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Thursday, November 7, 2024 12:25 AM
> To: Li, Pan2 ; Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> SAT_ADD into branchless
> 
> > -Original Message-
> > From: Li, Pan2 
> > Sent: Wednesday, November 6, 2024 1:31 PM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org; Tamar Christina ;
> > juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com;
> > rdapp@gmail.com
> > Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> > SAT_ADD into branchless
> >
> > Never mind and thanks Richard for comments.
> >
> > > Sorry for falling back in reviewing - it's not exactly clear the "cheap" 
> > > form is
> > > cheaper.  When I count the number of gimple statements (sub-expressions)
> > > the original appears as 3 while the result looks to have 5.
> >
> > I may have a question about how we count the stmts; I'm not sure whether the
> > x <= _1 and the following goto should be counted
> 
> They should, because they both become instructions that need to be executed 
> in
> the execution chain.
> 
> > as 1 or 2 stmts in the below example. I tried to count it myself but am not
> > very sure the
> > understanding is correct.
> >
> > Before this patch:
> >1   │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> >2   │ {
> >3   │   uint8_t D.2809;
> >4   │
> >5   │   _1 = x + y;  // 1
> >6   │   if (x <= _1) goto ; else goto ; // 2 for x <= 
> > _1, 3 for goto
> > ???
> >7   │   :
> >8   │   D.2809 = x + y; // 4 (token)
> >9   │   goto ; // 5 (token) for goto ???
> >   10   │   :
> >   11   │   D.2809 = 255; // 4 (not token)
> >   12   │   :
> >   13   │   return D.2809; // 6 for token, 5 for not token
> >   14   │ }
> >
> > After this patch:
> >1   │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> >2   │ {
> >3   │   uint8_t D.2809;
> >4   │
> >5   │   _1 = x + y; // 1
> >6   │   _2 = x + y; // 2
> >7   │   _3 = x > _2; // 3
> >8   │   _4 = (unsigned char) _3; // 4
> >9   │   _5 = -_4; // 5
> >   10   │   D.2809 = _1 | _5; // 6
> >   11   │   return D.2809; // 7
> >   12   │ }
> >
> > > The catch is of
> > > course that the original might involve control flow and a PHI.
> 
> _1 and _2 are counted as one, as they'll be CSE'd.
> After the patch your dependency chain is 5 instructions, as in,
> to create the result you must execute 5 sequential instructions.
> 
>1   │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
>2   │ {
>3   │   uint8_t D.2809;
>4   │
>5   │   _1 = x + y;
>6   │   _2 = x + y; // 1
>7   │   _3 = x > _2; // 2
>8   │   _4 = (unsigned char) _3; // 3
>9   │   _5 = -_4; // 4
>   10   │   D.2809 = _1 | _5; //5
>   11   │   return D.2809;

Re: [PATCH v2 1/2] VN: Handle `(a | b) !=/== 0` for predicates [PR117414]

2024-11-07 Thread Andrew Pinski
On Thu, Nov 7, 2024 at 12:48 AM Richard Biener
 wrote:
>
> On Thu, Nov 7, 2024 at 12:43 AM Andrew Pinski  
> wrote:
> >
> > For `(a | b) == 0`, we can "assert" on the true edge that
> > both `a == 0` and `b == 0` but nothing on the false edge.
> > For `(a | b) != 0`, we can "assert" on the false edge that
> > both `a == 0` and `b == 0` but nothing on the true edge.
> > This adds that predicate and allows us to optimize f0, f1,
> > and f2 in fre-predicated-[12].c.
> >
> > Changes since v1:
> > * v2: Use vn_valueize. Also canonicalize the comparison
> >   at the begining of insert_predicates_for_cond for
> >   constants to be on the rhs. Return early for
> >   non-ssa names on the lhs (after canonicalization).
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > PR tree-optimization/117414
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the 
> > comparison.
> > Don't insert anything if lhs is not a SSA_NAME. Handle `(a | b) 
> > !=/== 0`.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/fre-predicated-1.c: New test.
> > * gcc.dg/tree-ssa/fre-predicated-2.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  .../gcc.dg/tree-ssa/fre-predicated-1.c| 53 +++
> >  .../gcc.dg/tree-ssa/fre-predicated-2.c| 27 ++
> >  gcc/tree-ssa-sccvn.cc | 36 +
> >  3 files changed, 116 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-2.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-1.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-1.c
> > new file mode 100644
> > index 000..d56952f5f24
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-1.c
> > @@ -0,0 +1,53 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR tree-optimization/117414 */
> > +
> > +/* Fre1 should figure out that `*aaa != 0`
> > +   For f0, f1, and f2. */
> > +
> > +
> > +void foo();
> > +int f0(int *aaa, int j, int t)
> > +{
> > +  int b = *aaa;
> > +  int c = b != 0;
> > +  int d = t !=  0;
> > +  if (d | c)
> > +return 0;
> > +  for(int i = 0; i < j; i++)
> > +  {
> > +if (*aaa) foo();
> > +  }
> > +  return 0;
> > +}
> > +
> > +int f1(int *aaa, int j, int t)
> > +{
> > +  int b = *aaa;
> > +  if (b != 0 || t != 0)
> > +return 0;
> > +  for(int i = 0; i < j; i++)
> > +  {
> > +if (*aaa) foo();
> > +  }
> > +  return 0;
> > +}
> > +
> > +
> > +int f2(int *aaa, int j, int t)
> > +{
> > +  int b = *aaa;
> > +  if (b != 0)
> > +return 0;
> > +  if (t != 0)
> > +return 0;
> > +  for(int i = 0; i < j; i++)
> > +  {
> > +if (*aaa) foo();
> > +  }
> > +   return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "foo " "optimized" } } */
> > +/* { dg-final { scan-tree-dump "return 0;" "optimized" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-2.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-2.c
> > new file mode 100644
> > index 000..0123a5b54f7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/fre-predicated-2.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR tree-optimization/117414 */
> > +
> > +/* Fre1 should figure out that `*aaa != 0`
> > +   For f0, f1, and f2. */
> > +
> > +
> > +void foo();
> > +int f0(int *aaa, int j, int t)
> > +{
> > +  int b = *aaa;
> > +  int d = b | t;
> > +  if (d == 0)
> > +;
> > +  else
> > +return 0;
> > +  for(int i = 0; i < j; i++)
> > +  {
> > +if (*aaa) foo();
> > +  }
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "foo " "optimized" } } */
> > +/* { dg-final { scan-tree-dump "return 0;" "optimized" } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index a11bf968670..c60ba6d 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -7901,6 +7901,21 @@ static void
> >  insert_predicates_for_cond (tree_code code, tree lhs, tree rhs,
> > edge true_e, edge false_e)
> >  {
> > +  /* If both edges are null, then there is nothing to be done. */
> > +  if (!true_e && !false_e)
> > +return;
> > +
> > +  /* Canonicalize the comparison so the rhs are constants.  */
> > +  if (CONSTANT_CLASS_P (lhs))
> > +{
> > +  std::swap (lhs, rhs);
> > +  code = swap_tree_comparison (code);
> > +}
> > +
> > +  /* If the lhs is not a ssa name, don't record anything. */
> > +  if (TREE_CODE (lhs) != SSA_NAME)
> > +return;
> > +
> >tree_code icode = invert_tree_comparison (code, HONOR_NANS (lhs));
> >tree ops[2];
> >ops[0] = lhs;
> > @@ -7929,6 +7944,27 @@ insert_predicates_for_cond (tree_code code, tree 
> > lhs, tree rhs,
> >if (false_e)
> > 

Re: [PATCH] RISC-V: Add norelax function attribute

2024-11-07 Thread Kito Cheng
LGTM, thanks! I will defer this for a little bit to let the
c-api side stabilize :)

On Fri, Nov 8, 2024 at 12:19 AM  wrote:
>
> From: yulong 
>
> This patch adds a norelax function attribute, as discussed in
> riscv-c-api-doc PR#94.
> URL:https://github.com/riscv-non-isa/riscv-c-api-doc/pull/94
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_declare_function_name): Add new 
> attribute.
>
> ---
>  gcc/config/riscv/riscv.cc | 18 +---
>  .../gcc.target/riscv/target-attr-norelax.c| 21 +++
>  2 files changed, 36 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/target-attr-norelax.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 2e9ac280c8f2..42525ff6faa3 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -654,6 +654,10 @@ static const attribute_spec riscv_gnu_attributes[] =
>   types.  */
>{"riscv_rvv_vector_bits", 1, 1, false, true, false, true,
> riscv_handle_rvv_vector_bits_attribute, NULL},
> +  /* This attribute is used to declare a function, forcing it to disable
> +linker relaxation. Syntax:
> +__attribute__((norelax)). */
> +  {"norelax", 0, 0, true, false, false, false, NULL, NULL},
>  };
>
>  static const scoped_attribute_specs riscv_gnu_attribute_table  =
> @@ -10051,10 +10055,17 @@ riscv_declare_function_name (FILE *stream, const 
> char *name, tree fndecl)
>riscv_asm_output_variant_cc (stream, fndecl, name);
>ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
>ASM_OUTPUT_FUNCTION_LABEL (stream, name, fndecl);
> -  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
> +  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
> +  || lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
>  {
>fprintf (stream, "\t.option push\n");
> -
> +  if (lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
> +   {
> + fprintf (stream, "\t.option norelax\n");
> +   }
> +}
> +  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
> +{
>struct cl_target_option *local_cl_target =
> TREE_TARGET_OPTION (DECL_FUNCTION_SPECIFIC_TARGET (fndecl));
>struct cl_target_option *global_cl_target =
> @@ -10078,7 +10089,8 @@ riscv_declare_function_size (FILE *stream, const char 
> *name, tree fndecl)
>if (!flag_inhibit_size_directive)
>  ASM_OUTPUT_MEASURED_SIZE (stream, name);
>
> -  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl))
> +  if (DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
> +  || lookup_attribute ("norelax", DECL_ATTRIBUTES (fndecl)))
>  {
>fprintf (stream, "\t.option pop\n");
>  }
> diff --git a/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c 
> b/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c
> new file mode 100644
> index ..77de6195ad1e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/target-attr-norelax.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc" { target { rv32 } } } */
> +/* { dg-options "-march=rv64gc" { target { rv64 } } } */
> +
> +__attribute__((norelax))
> +void foo1()
> +{}
> +
> +void foo2(void)
> +{}
> +
> +int main()
> +{
> +  foo1();
> +  foo2();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times ".option push\t" 1 } } */
> +/* { dg-final { scan-assembler-times ".option norelax\t" 1 } } */
> +/* { dg-final { scan-assembler-times ".option pop\t" 1 } } */
> --
> 2.34.1
>


[PATCH v2] testsuite: arm: Use effective-target arm_libc_fp_abi for pr68620.c test

2024-11-07 Thread Torbjörn SVENSSON
Changes since v1:

- Switch to arm_libc_fp_abi from arm_fp

@Christophe, can you test this patch in the linaro farm to ensure that
it does not fail again?

Ok for trunk and releases/gcc-14?

--

This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1407.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr68620.c: Use effective-target
arm_libc_fp_abi.
* lib/target-supports.exp: Define effective-target
arm_libc_fp_abi.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Richard Earnshaw 
---
 gcc/testsuite/gcc.target/arm/pr68620.c |  4 ++-
 gcc/testsuite/lib/target-supports.exp  | 35 ++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr68620.c 
b/gcc/testsuite/gcc.target/arm/pr68620.c
index 6e38671752f..3ffaa5c5a9c 100644
--- a/gcc/testsuite/gcc.target/arm/pr68620.c
+++ b/gcc/testsuite/gcc.target/arm/pr68620.c
@@ -1,8 +1,10 @@
 /* { dg-do compile } */
 /* { dg-skip-if "-mpure-code supports M-profile without Neon only" { *-*-* } { 
"-mpure-code" } } */
 /* { dg-require-effective-target arm_arch_v7a_ok } */
-/* { dg-options "-mfp16-format=ieee -mfpu=auto -mfloat-abi=softfp" } */
+/* { dg-require-effective-target arm_libc_fp_abi_ok } */
+/* { dg-options "-mfp16-format=ieee -mfpu=auto" } */
 /* { dg-add-options arm_arch_v7a } */
+/* { dg-add-options arm_libc_fp_abi } */
 
 #include "arm_neon.h"
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 75703ddca60..0c2fd83f45c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4950,6 +4950,41 @@ proc add_options_for_arm_fp { flags } {
 return "$flags $et_arm_fp_flags"
 }
 
+# Some libc headers will only compile correctly if the correct ABI flags
+# are picked for the target environment.  Try to find an ABI setting
+# that works.  Glibc falls into this category.  This test is intended
+# to enable FP as far as possible, so does not try -mfloat-abi=soft.
+proc check_effective_target_arm_libc_fp_abi_ok_nocache { } {
+global et_arm_libc_fp_abi_flags
+set et_arm_libc_fp_abi_flags ""
+if { [check_effective_target_arm32] } {
+   foreach flags {"-mfloat-abi=hard" "-mfloat-abi=softfp"} {
+   if { [check_no_compiler_messages_nocache arm_libc_fp_abi_ok object {
+   #include 
+   } "$flags"] } {
+   set et_arm_libc_fp_abi_flags $flags
+   return 1
+   }
+   }
+}
+return 0
+}
+
+proc check_effective_target_arm_libc_fp_abi_ok { } {
+return [check_cached_effective_target arm_libc_fp_abi_ok \
+   check_effective_target_arm_libc_fp_abi_ok_nocache]
+}
+
+# Add flags that pick the right ABI for the supported libc headers on
+# this platform.
+proc add_options_for_arm_libc_fp_abi { flags } {
+if { ! [check_effective_target_arm_libc_fp_abi_ok] } {
+   return "$flags"
+}
+global et_arm_libc_fp_abi_flags
+return "$flags $et_arm_libc_fp_abi_flags"
+}
+
 # Return 1 if this is an ARM target defining __ARM_FP with
 # double-precision support. We may need -mfloat-abi=softfp or
 # equivalent options.  Some multilibs may be incompatible with these
-- 
2.25.1



RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-07 Thread Li, Pan2
Thanks Tamar and Jeff for comments.

> I'm not sure it's that simple.  It'll depend on the micro-architecture. 
> So things like strength of the branch predictors, how fetch blocks are 
> handled (can you have embedded not-taken branches, short-forward-branch 
> optimizations, etc).

> After:
> 
> .L.sat_add_u_1(unsigned int, unsigned int):
>  add 4,3,4
>  rldicl 9,4,0,32
>  subf 3,3,9
>  sradi 3,3,63
>  or 3,3,4
>  rldicl 3,3,0,32
>  blr
> 
> and before
> 
> .L.sat_add_u_1(unsigned int, unsigned int):
>  add 4,3,4
>  cmplw 0,4,3
>  bge 0,.L2
>  li 4,-1
> .L2:
>  rldicl 3,4,0,32
>  blr

I am not familiar with branch prediction, but the branch should be 50% taken
and 50% not-taken
according to the range of sat add inputs. Is that the worst case for branch
prediction? I mean, if we call it
100 times with a taken, not-taken, taken, not-taken... sequence, will the branch
version still be faster?
Feel free to correct me if I'm wrong.
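
For what it's worth, this could be probed with a tiny driver along these
lines (just a sketch, reusing the sat_add_u_1 from above):

  /* Calls sat_add_u_1 with a strictly alternating overflow/no-overflow
     input, i.e. a periodic taken/not-taken pattern.  Modern predictors
     usually learn a period-2 pattern, so this is likely not the worst
     case for them; random inputs would be.  */
  unsigned int
  alternate_100 (void)
  {
    unsigned int sum = 0;
    for (int i = 0; i < 100; i++)
      sum += sat_add_u_1 (i & 1 ? ~0u : 0u, 1u);
    return sum;
  }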

Back to these 16 forms of sat add below: is there any suggestion as to which one
or two form(s) may be
cheaper than the others from the perspective of gimple IR, independent of
whether the backend implements SAT_ADD or not?

#define DEF_SAT_U_ADD_1(T)   \
T sat_u_add_##T##_1 (T x, T y)   \
{\
  return (T)(x + y) >= x ? (x + y) : -1; \
}

#define DEF_SAT_U_ADD_2(T)  \
T sat_u_add_##T##_2 (T x, T y)  \
{   \
  return (T)(x + y) < x ? -1 : (x + y); \
}

#define DEF_SAT_U_ADD_3(T)   \
T sat_u_add_##T##_3 (T x, T y)   \
{\
  return x <= (T)(x + y) ? (x + y) : -1; \
}

#define DEF_SAT_U_ADD_4(T)  \
T sat_u_add_##T##_4 (T x, T y)  \
{   \
  return x > (T)(x + y) ? -1 : (x + y); \
}

#define DEF_SAT_U_ADD_5(T)  \
T sat_u_add_##T##_5 (T x, T y)  \
{   \
  if ((T)(x + y) >= x)  \
return x + y;   \
  else  \
return -1;  \
}

#define DEF_SAT_U_ADD_6(T)  \
T sat_u_add_##T##_6 (T x, T y)  \
{   \
  if ((T)(x + y) < x)   \
return -1;  \
  else  \
return x + y;   \
}

#define DEF_SAT_U_ADD_7(T)  \
T sat_u_add_##T##_7 (T x, T y)  \
{   \
  if (x <= (T)(x + y))  \
return x + y;   \
  else  \
return -1;  \
}

#define DEF_SAT_U_ADD_8(T)  \
T sat_u_add_##T##_8 (T x, T y)  \
{   \
  if (x > (T)(x + y))   \
return -1;  \
  else  \
return x + y;   \
}

#define DEF_SAT_U_ADD_9(T) \
T sat_u_add_##T##_9 (T x, T y) \
{  \
  T ret;   \
  return __builtin_add_overflow (x, y, &ret) == 0 ? ret : - 1; \
}

#define DEF_SAT_U_ADD_10(T)\
T sat_u_add_##T##_10 (T x, T y)\
{  \
  T ret;   \
  return !__builtin_add_overflow (x, y, &ret) ? ret : - 1; \
}

#define DEF_SAT_U_ADD_11(T) \
T sat_u_add_##T##_11 (T x, T y) \
{   \
  T ret;\
  if (__builtin_add_overflow (x, y, &ret) == 0) \
return ret; \
  else  \
return -1;  \
}

#define DEF_SAT_U_ADD_12(T) \
T sat_u_add_##T##_12 (T x, T y) \
{   \
  T ret;\
  if (!__builtin_add_overflow (x, y, &ret)) \
return ret; \
  else  \
return -1;  \
}

#define DEF_SAT_U_ADD_13(T)   \
T sat_u_add_##T##_13 (T x, T y)   \
{ \
  T ret;  \
  return __builtin_add_overflow (x, y, &ret) != 0 ? -1 : ret; \
}

#define DEF_SAT_U_ADD_14(T)  \
T sat_u_add_##T##_14 (T x, T y)  \
{\
  T ret; \
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
}

#define DEF_SAT_U_ADD_15(T) \
T sat

Re: [PATCH v4 7/8] i386: Add zero maskload else operand.

2024-11-07 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 1:58 AM Robin Dapp  wrote:
>
> From: Robin Dapp 
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (maskload<mode><sseintvecmodelower>):
> Call maskload<mode>..._1.
> (maskload<mode><sseintvecmodelower>_1): Rename.
Ok for x86 part.
> ---
>  gcc/config/i386/sse.md | 21 ++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 22c6c817dd7..1523e2c4d75 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -28641,7 +28641,7 @@ (define_insn "<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>"
> (set_attr "btver2_decode" "vector")
> (set_attr "mode" "")])
>
> -(define_expand "maskload"
> +(define_expand "maskload_1"
>[(set (match_operand:V48_128_256 0 "register_operand")
> (unspec:V48_128_256
>   [(match_operand: 2 "register_operand")
> @@ -28649,13 +28649,28 @@ (define_expand "maskload"
>   UNSPEC_MASKMOV))]
>"TARGET_AVX")
>
> +(define_expand "maskload"
> +  [(set (match_operand:V48_128_256 0 "register_operand")
> +   (unspec:V48_128_256
> + [(match_operand: 2 "register_operand")
> +  (match_operand:V48_128_256 1 "memory_operand")
> +  (match_operand:V48_128_256 3 "const0_operand")]
> + UNSPEC_MASKMOV))]
> +  "TARGET_AVX"
> +{
> +  emit_insn (gen_maskload<mode><sseintvecmodelower>_1 (operands[0],
> +  operands[1],
> +  operands[2]));
> +  DONE;
> +})
> +
>  (define_expand "maskload"
>[(set (match_operand:V48_AVX512VL 0 "register_operand")
> (vec_merge:V48_AVX512VL
>   (unspec:V48_AVX512VL
> [(match_operand:V48_AVX512VL 1 "memory_operand")]
> UNSPEC_MASKLOAD)
> - (match_dup 0)
> +  (match_operand:V48_AVX512VL 3 "const0_operand")
>   (match_operand: 2 "register_operand")))]
>"TARGET_AVX512F")
>
> @@ -28665,7 +28680,7 @@ (define_expand "maskload<mode><avx512fmaskmodelower>"
>   (unspec:VI12HFBF_AVX512VL
> [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand")]
> UNSPEC_MASKLOAD)
> - (match_dup 0)
> +  (match_operand:VI12HFBF_AVX512VL 3 "const0_operand")
>   (match_operand: 2 "register_operand")))]
>"TARGET_AVX512BW")
>
> --
> 2.47.0
>


-- 
BR,
Hongtao


Re: [PATCH] vect: Do not try to duplicate_and_interleave one-element mode.

2024-11-07 Thread Robin Dapp
> Could you walk me through the failure in more detail?  It sounds
> like can_duplicate_and_interleave_p eventually gets to the point of
> subdividing the original elements, instead of either combining consecutive
> elements (the best case), or leaving them as-is (the expected fallback
> for SVE).  But it sounds like those attempts fail in this case, but an
> attempt to subdivide the elements succeeds.  Is that right?  And if so,
> why does that happen?

Apologies for the very late response.

What I see is that we start with a base_vector_type vector([1,1]) long int
and a count of 2, so ELT_BYTES = 16.
We don't have a TI vector mode (and creating a single-element vector by
interleaving is futile anyway) so the first attempt fails.
The type in the second attempt is vector([1,1]) unsigned long but this
is rejected because of

  && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
                 2, &half_nelts))

Then we try vector([2,2]) unsigned int which "succeeds".  This, however,
eventually causes the ICE when we try to build a vector with 0 elements.

Maybe another option would be to decline 1-element vectors right away?

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index eac16e80ecd..d3e52489fa8 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -427,7 +427,9 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
tree *permutes)
 {
   tree base_vector_type = get_vectype_for_scalar_type (vinfo, elt_type, count);
-  if (!base_vector_type || !VECTOR_MODE_P (TYPE_MODE (base_vector_type)))
+  if (!base_vector_type
+  || !VECTOR_MODE_P (TYPE_MODE (base_vector_type))
+  || maybe_lt (GET_MODE_NUNITS (TYPE_MODE (base_vector_type)), 2))
 return false;

Regards
 Robin


Re: [PATCHv2 2/3] ada: Fix GNU/Hurd priority range

2024-11-07 Thread Marc Poulhiès
Samuel Thibault  writes:

> GNU/Mach currently uses a 0..63 range.
>
> gcc/ada/ChangeLog:
>
>   * libgnat/system-gnu.ads: New file.
>   * Makefile.rtl (x86-gnuhurd): Use libgnat/system-gnu.ads instead of
>   libgnat/system-freebsd.ads.
>
> Signed-off-by: Samuel Thibault 
> ---

OK without the ChangeLog part.

Thanks,
Marc


Re: [PATCHv2 3/3] ada: Add GNU/Hurd x86_64 support

2024-11-07 Thread Marc Poulhiès
Samuel Thibault  writes:

> This is essentially the same as the i386-pc-gnu section, the differences
> are the same as between freebsd i386 and freebsd x86_64.
>
> gcc/ada/ChangeLog:
>
> * Makefile.rtl: Add x86_64-pc-gnu section.
>
> Signed-off-by: Samuel Thibault 

OK without the ChangeLog part.

Thanks,
Marc


[PATCH 09/15] arm: [MVE intrinsics] add load_ext_gather_offset shape

2024-11-07 Thread Christophe Lyon
This patch adds the load_ext_gather_offset shape description.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (struct load_ext_gather):
New.
(struct load_ext_gather_offset_def): New.
* config/arm/arm-mve-builtins-shapes.h (load_ext_gather_offset):
New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 58 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 59 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 03714ffb435..28b90454417 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1535,6 +1535,64 @@ struct load_ext_def : public nonoverloaded_base
 };
 SHAPE (load_ext)
 
+/* Base class for load_ext_gather_offset and load_ext_gather_shifted_offset,
+   which differ only in the units of the displacement.  */
+struct load_ext_gather : public overloaded_base<0>
+{
+  bool
+  explicit_mode_suffix_p (enum predication_index, enum mode_suffix_index) 
const override
+  {
+return true;
+  }
+
+  bool
+  mode_after_pred () const override
+  {
+return false;
+  }
+};
+
+/* <T0>_t vfoo[_t0](<X>_t const *, <T1>_t)
+
+   where <X> might be tied to <t0> (for non-extending loads) or might
+   depend on the function base name (for extending loads),
+   <T1> has the same width as <T0> but is of unsigned type.
+
+   Example: vldrhq_gather_offset
+   int16x8_t [__arm_]vldrhq_gather_offset[_s16](int16_t const *base, uint16x8_t offset)
+   int32x4_t [__arm_]vldrhq_gather_offset_z[_s32](int16_t const *base, uint32x4_t offset, mve_pred16_t p)  */
+struct load_ext_gather_offset_def : public load_ext_gather
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_offset, preserve_user_namespace);
+build_all (b, "v0,al,vu0", group, MODE_offset, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+mode_suffix_index mode = MODE_offset;
+type_suffix_index ptr_type;
+type_suffix_index offset_type;
+if (!r.check_gp_argument (2, i, nargs)
+   || (ptr_type = r.infer_pointer_type (0)) == NUM_TYPE_SUFFIXES
+   || (offset_type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+/* tclass comes from base argument, element bits come from the offset
+   argument.  */
+type_suffix_index type = find_type_suffix (type_suffixes[ptr_type].tclass,
+  type_suffixes[offset_type].element_bits);
+
+return r.resolve_to (mode, type);
+  }
+};
+SHAPE (load_ext_gather_offset)
+
 /* _t vfoo[_t0](_t)
_t vfoo_n_t0(_t)
 
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index 1d361addd76..9113d55dab4 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -63,6 +63,7 @@ namespace arm_mve
 extern const function_shape *const inherent;
 extern const function_shape *const load;
 extern const function_shape *const load_ext;
+extern const function_shape *const load_ext_gather_offset;
 extern const function_shape *const mvn;
 extern const function_shape *const store;
 extern const function_shape *const store_scatter_base;
-- 
2.34.1



[PATCH 15/15] arm: [MVE intrinsics] remove useless call_properties implementations.

2024-11-07 Thread Christophe Lyon
vstrq_impl derives from store_truncating and vldrq_impl derives from
load_extending which both implement call_properties.

No need to re-implement them in the derived classes.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (vstrq_impl): Remove
call_properties.
(vldrq_impl): Likewise.
---
 gcc/config/arm/arm-mve-builtins-base.cc | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 7938efcdf68..737403527a9 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -203,11 +203,6 @@ class vstrq_impl : public store_truncating
 public:
   using store_truncating::store_truncating;
 
-  unsigned int call_properties (const function_instance &) const override
-  {
-return CP_WRITE_MEMORY;
-  }
-
   rtx expand (function_expander &e) const override
   {
 insn_code icode;
@@ -369,11 +364,6 @@ class vldrq_impl : public load_extending
 public:
   using load_extending::load_extending;
 
-  unsigned int call_properties (const function_instance &) const override
-  {
-return CP_READ_MEMORY;
-  }
-
   rtx expand (function_expander &e) const override
   {
 insn_code icode;
-- 
2.34.1



Re: [PATCH v2] aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]

2024-11-07 Thread Richard Sandiford
Soumya AR  writes:
> Changes since v1:
>
> This revision makes use of the extended definition of aarch64_ptrue_reg to
> generate predicate registers with the appropriate set bits.
>
> Earlier, there was a suggestion to add support for half floats as well. I
> extended the patch to include HFs but GCC still emits a libcall for ldexpf16.
> For example, in the following case, the call does not lower to fscale:
>
> _Float16 test_ldexpf16 (_Float16 x, int i) {
>   return __builtin_ldexpf16 (x, i);
> }
>
> Any suggestions as to why this may be?

You'd need to change:

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 2d455938271..469835b1d62 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -441,7 +441,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, 
vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)

 /* FP scales.  */
-DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
+DEF_INTERNAL_FLT_FLOATN_FN (LDEXP, ECF_CONST, ldexp, binary)

 /* Ternary math functions.  */
 DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)

A couple of comments below, but otherwise it looks good:

> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 0bc98315bb6..7f708ea14f9 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -449,6 +449,9 @@
>  ;; All fully-packed SVE floating-point vector modes.
>  (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF])
>  
> +;; Fully-packed SVE floating-point vector modes and 32-bit and 64-bit floats.
> +(define_mode_iterator SVE_FULL_F_SCALAR [VNx8HF VNx4SF VNx2DF HF SF DF])

The comment is out of date.  How about:

;; Fully-packed SVE floating-point vector modes and their scalar equivalents.
(define_mode_iterator SVE_FULL_F_SCALAR [SVE_FULL_F GPF_HF])

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> new file mode 100644
> index 000..251b4ef9188
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +
> +float
> +test_ldexpf (float x, int i)
> +{
> +  return __builtin_ldexpf (x, i);
> +}
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, 
> z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +
> +double
> +test_ldexp (double x, int i)
> +{
> +  return __builtin_ldexp (x, i);
> +} 
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m, 
> z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */

It would be good to check the ptrues as well, to make sure that we only
enable one lane.
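
For instance, something along these lines (a sketch; the exact ptrue
form is an assumption and depends on how aarch64_ptrue_reg emits the
single-lane predicate):

/* { dg-final { scan-assembler-times {\tptrue\tp[0-7]\.s, vl1\n} 1 } } */
/* { dg-final { scan-assembler-times {\tptrue\tp[0-7]\.d, vl1\n} 1 } } */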

Thanks,
Richard


[PATCH] inline-asm, i386, v2: Add "redzone" clobber support

2024-11-07 Thread Jakub Jelinek
On Thu, Nov 07, 2024 at 09:12:34AM +0100, Uros Bizjak wrote:
> On Thu, Nov 7, 2024 at 9:00 AM Jakub Jelinek  wrote:
> >
> > On Thu, Nov 07, 2024 at 08:47:34AM +0100, Uros Bizjak wrote:
> > > Maybe we should always recognize "redzone", even for targets without
> > > it. This is the way we recognize "cc" even for targets without CC reg
> > > (e.g. alpha). This would simplify the definition and processing - if
> > > the hook returns NULL_RTX (the default), then it (obviously) won't be
> > > added to the clobber list.
> >
> > Dunno, am open to that, but thought it would be just weird if one says
> > "redzone" on targets which don't have such a concept.
> 
> Let's look at the situation with x86_32 and x86_64. The "redzone" for
> the former is just an afterthought, so we can safely say that it
> doesn't support it. So, the code that targets both targets (e.g. linux
> kernel) would (in a pedantic way) have to redefine many shared asm
> defines, one to have clobber and one without it. We don't want that,
> we want one definition and "let's compiler sort it out".
> 
> For targets without clobber concept, well - don't add it to the
> clobber list if it is always ineffective. One *can* add "cc" to all
> alpha asms, but well.. ;)

Ok, here is a variant of the patch which just ignores "redzone" clobber if
it doesn't make sense.
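
For reference, a minimal sketch of what the clobber looks like in user
code (the asm body here is hypothetical):

void
foo (void)
{
  /* The asm may write below the stack pointer (e.g. via a call), so
     the compiler must not keep live values in the red zone.  */
  asm volatile ("call bar" ::: "redzone");
}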

2024-11-07  Jakub Jelinek  

gcc/
* target.def (redzone_clobber): New target hook.
* varasm.cc (decode_reg_name_and_count): Return -5 for
"redzone".
* cfgexpand.cc (expand_asm_stmt): Handle redzone clobber.
* config/i386/i386.h (struct machine_function): Add
asm_redzone_clobber_seen member.
* config/i386/i386.cc (ix86_compute_frame_layout): Don't
use red zone if cfun->machine->asm_redzone_clobber_seen.
(ix86_redzone_clobber): New function.
(TARGET_REDZONE_CLOBBER): Redefine.
* doc/extend.texi (Clobbers and Scratch Registers): Document
the "redzone" clobber.
* doc/tm.texi.in: Add @hook TARGET_REDZONE_CLOBBER.
* doc/tm.texi: Regenerate.
gcc/testsuite/
* gcc.dg/asm-redzone-1.c: New test.
* gcc.target/i386/asm-redzone-1.c: New test.

--- gcc/target.def.jj   2024-11-06 18:53:10.836843793 +0100
+++ gcc/target.def  2024-11-07 10:57:58.697898800 +0100
@@ -3376,6 +3376,16 @@ to be used.",
  bool, (machine_mode mode),
  NULL)
 
+DEFHOOK
+(redzone_clobber,
+ "Define this to return some RTL for the @code{redzone} @code{asm} clobber\n\
+if target has a red zone and wants to support the @code{redzone} clobber\n\
+or return NULL if the clobber should be ignored.\n\
+\n\
+The default is to ignore the @code{redzone} clobber.",
+ rtx, (),
+ NULL)
+
 /* Support for named address spaces.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_ADDR_SPACE_"
--- gcc/varasm.cc.jj2024-11-06 18:53:10.838843765 +0100
+++ gcc/varasm.cc   2024-11-07 10:55:46.858763724 +0100
@@ -965,9 +965,11 @@ set_user_assembler_name (tree decl, cons
 
 /* Decode an `asm' spec for a declaration as a register name.
Return the register number, or -1 if nothing specified,
-   or -2 if the ASMSPEC is not `cc' or `memory' and is not recognized,
+   or -2 if the ASMSPEC is not `cc' or `memory' or `redzone' and is not
+   recognized,
or -3 if ASMSPEC is `cc' and is not recognized,
-   or -4 if ASMSPEC is `memory' and is not recognized.
+   or -4 if ASMSPEC is `memory' and is not recognized,
+   or -5 if ASMSPEC is `redzone' and is not recognized.
Accept an exact spelling or a decimal number.
Prefixes such as % are optional.  */
 
@@ -1034,6 +1036,9 @@ decode_reg_name_and_count (const char *a
   }
 #endif /* ADDITIONAL_REGISTER_NAMES */
 
+  if (!strcmp (asmspec, "redzone"))
+   return -5;
+
   if (!strcmp (asmspec, "memory"))
return -4;
 
--- gcc/cfgexpand.cc.jj 2024-11-06 18:53:10.803844259 +0100
+++ gcc/cfgexpand.cc2024-11-07 11:00:16.212953571 +0100
@@ -3205,6 +3205,12 @@ expand_asm_stmt (gasm *stmt)
  rtx x = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode));
  clobber_rvec.safe_push (x);
}
+ else if (j == -5)
+   {
+ if (targetm.redzone_clobber)
+   if (rtx x = targetm.redzone_clobber ())
+ clobber_rvec.safe_push (x);
+   }
  else
{
  /* Otherwise we should have -1 == empty string
--- gcc/config/i386/i386.h.jj   2024-11-06 18:53:10.807844203 +0100
+++ gcc/config/i386/i386.h  2024-11-07 10:55:46.904763076 +0100
@@ -2881,6 +2881,9 @@ struct GTY(()) machine_function {
   /* True if red zone is used.  */
   BOOL_BITFIELD red_zone_used : 1;
 
+  /* True if inline asm with redzone clobber has been seen.  */
+  BOOL_BITFIELD asm_redzone_clobber_seen : 1;
+
   /* The largest alignment, in bytes, of stack slot actually used.  */
   unsigned int max_used_stack_alignment;
 
--- gcc/config/i38

RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-07 Thread Li, Pan2
I see your point that the backend can leverage conditional moves when
emitting the branch code.

> For instance see https://godbolt.org/z/fvrq3aq6K
> On ISAs with conditional operations the branch version gets ifconverted.
> On AArch64 we get:
> sat_add_u_1(unsigned int, unsigned int):
> adds    w0, w0, w1
> csinv   w0, w0, wzr, cc
> ret
> so just 2 instructions, and also branchless. On x86_64 we get:
> sat_add_u_1(unsigned int, unsigned int):
> add edi, esi
> mov eax, -1
> cmovnc  eax, edi
> ret
> so 3 instructions but a dependency chain of 2.
> also branchless.  This patch would regress both of these.

But the above Godbolt may not be good evidence, because both
x86_64 and aarch64 implement usadd
already.
Thus, they all go to usadd. For example, as below, sat_add_u_1 and
sat_add_u_2 are almost the
same when the backend implements usadd.

#include 

#define T uint32_t

  T sat_add_u_1 (T x, T y)
  {
return (T)(x + y) < x ? -1 : (x + y);
  }

  T sat_add_u_2 (T x, T y)
  {
return (x + y) | -((x + y) < x);
  }

It becomes different with gcc 14.2 (which doesn't have the .SAT_ADD GIMPLE
IR); the x86_64
asm dump below is for -O3. There looks to be no obvious difference here.

sat_add_u_1(unsigned int, unsigned int):
add edi, esi
mov eax, -1
cmovnc  eax, edi
ret

sat_add_u_2(unsigned int, unsigned int):
add edi, esi
sbb eax, eax
or  eax, edi
ret

The same holds for the vector mode on x86_64 with gcc 14.2, as below.

  void vec_sat_add_u_1 (T *a, T *b, T *out, int n)
  {
for (int i = 0; i < n; i++) {
T x = a[i];
T y = b[i];
out[i] = (T)(x + y) < x ? -1 : (x + y);
}
  }

  void vec_sat_add_u_2 (T * __restrict a, T * __restrict b,
  T * __restrict out, int n)
  {
for (int i = 0; i < n; i++) {
T x = a[i];
T y = b[i];
out[i] = (x + y) | -((x + y) < x);
}
  }

vec_sat_add_u_1(unsigned int*, unsigned int*, unsigned int*, int):

.L15:
movdqu  xmm0, XMMWORD PTR [rdi+rax]
movdqu  xmm1, XMMWORD PTR [rsi+rax]
paddd   xmm1, xmm0
psubd   xmm0, xmm2
movdqa  xmm3, xmm1
psubd   xmm3, xmm2
pcmpgtd xmm0, xmm3
por xmm0, xmm1
movups  XMMWORD PTR [rdx+rax], xmm0
add rax, 16
cmp r8, rax
jne .L15
mov eax, ecx
and eax, -4
mov r8d, eax
cmp ecx, eax
je  .L11
sub ecx, eax
mov r9d, ecx
cmp ecx, 1
je  .L17
.L14:
movq    xmm2, QWORD PTR .LC2[rip]
mov ecx, r8d
movq    xmm0, QWORD PTR [rdi+rcx*4]
movq    xmm1, QWORD PTR [rsi+rcx*4]
paddd   xmm1, xmm0
psubd   xmm0, xmm2
movdqa  xmm3, xmm1
psubd   xmm3, xmm2
pcmpgtd xmm0, xmm3
movdqa  xmm2, xmm0
pandn   xmm2, xmm1
por xmm0, xmm2
movq    QWORD PTR [rdx+rcx*4], xmm0


vec_sat_add_u_2(unsigned int*, unsigned int*, unsigned int*, int):
...
.L50:
movdqu  xmm0, XMMWORD PTR [rdi+rax]
movdqu  xmm1, XMMWORD PTR [rsi+rax]
paddd   xmm1, xmm0
psubd   xmm0, xmm2
movdqa  xmm3, xmm1
psubd   xmm3, xmm2
pcmpgtd xmm0, xmm3
por xmm0, xmm1
movups  XMMWORD PTR [rdx+rax], xmm0
add rax, 16
cmp rax, r8
jne .L50
mov eax, ecx
and eax, -4
mov r8d, eax
cmp ecx, eax
je  .L64
.L49:
sub ecx, r8d
cmp ecx, 1
je  .L52
movq    xmm0, QWORD PTR [rdi+r8*4]
movq    xmm1, QWORD PTR [rsi+r8*4]
movq    xmm2, QWORD PTR .LC2[rip]
paddd   xmm1, xmm0
psubd   xmm0, xmm2
movdqa  xmm3, xmm1
psubd   xmm3, xmm2
pcmpgtd xmm0, xmm3
movdqa  xmm2, xmm0
pandn   xmm2, xmm1
por xmm0, xmm2
movq    QWORD PTR [rdx+r8*4], xmm0
...

Pan

-Original Message-
From: Tamar Christina  
Sent: Thursday, November 7, 2024 7:43 PM
To: Li, Pan2 ; Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD 
into branchless

> -Original Message-
> From: Li, Pan2 
> Sent: Thursday, November 7, 2024 1:45 AM
> To: Tamar Christina ; Richard Biener
> 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> SAT_ADD into branchless
> 
> I see, thanks Tamar for the explanation.
> 
> > The problem with the rewrite is that it pessimizes the code if the 
> > saturating
> > instructions are not recognized afte

Re: [PATCH] AArch64: Block combine_and_move from creating FP literal loads

2024-11-07 Thread Richard Sandiford
Wilco Dijkstra  writes:
> The IRA combine_and_move pass runs if the scheduler is disabled and 
> aggressively
> combines moves.  The movsf/df patterns allow all FP immediates since they rely
> on a split pattern.  However splits do not happen during IRA, so the result is
> extra literal loads.  To avoid this, use a more accurate check that blocks
> creating FP immediates that need a split during combine_and_move.
>
> double f(void) { return 128.0; }
>
> -O2 -fno-schedule-insns gives:
>
> adrpx0, .LC0
> ldr d0, [x0, #:lo12:.LC0]
> ret
>
> After patch:
>
> mov x0, 4638707616191610880
> fmovd0, x0
> ret
>
> Passes bootstrap & regress, OK for commit?
>
> gcc/ChangeLog:
> * config/aarch64/aarch64.md (movhf_aarch64): Use 
> aarch64_valid_fp_move.
> (movsf_aarch64): Likewise.
> (movdf_aarch64): Likewise.
> * config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
> * config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 9be64913091443a62dc6d1a80c295dc52aaeb950..f4839413cf3e995871b728e2a36e332b89cd6abf
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -758,6 +758,7 @@ bool aarch64_advsimd_struct_mode_p (machine_mode mode);
>  opt_machine_mode aarch64_vq_mode (scalar_mode);
>  opt_machine_mode aarch64_full_sve_mode (scalar_mode);
>  bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
> +bool aarch64_valid_fp_move (rtx, rtx, machine_mode);
>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
>  bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
> HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> a6cc00e74abd4d96fa47f5612f271eb4fc95e7a1..130c1ff1e363db253b008e71c7e8e5deec8c46c8
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -11144,6 +11144,37 @@ aarch64_can_const_movi_rtx_p (rtx x, machine_mode 
> mode)
>return aarch64_simd_valid_mov_imm (v_op);
>  }
>
> +/* Return TRUE if DST and SRC with mode MODE is a valid fp move.  */
> +bool
> +aarch64_valid_fp_move (rtx dst, rtx src, machine_mode mode)
> +{
> +  if (!TARGET_FLOAT)
> +return false;
> +
> +  if (aarch64_reg_or_fp_zero (src, mode))
> +return true;
> +
> +  if (!register_operand (dst, mode))
> +return false;
> +
> +  if (MEM_P (src))
> +return true;
> +
> +  if (!DECIMAL_FLOAT_MODE_P (mode))
> +{
> +  if (aarch64_can_const_movi_rtx_p (src, mode)
> + || aarch64_float_const_representable_p (src)
> + || aarch64_float_const_zero_rtx_p (src))
> +   return true;
> +
> +  /* Block combine_and_move pass from creating FP immediates which
> +require a split during IRA - only allow this before regalloc.  */
> +  if (aarch64_float_const_rtx_p (src))
> +   return can_create_pseudo_p () && !ira_in_progress;
> +}
> +
> +  return can_create_pseudo_p ();

It's ok for instructions to require properties that are false during
early RTL passes and then transition to true.  But they can't require
properties that go from true to false, since that would mean that
existing instructions become unrecognisable at certain points during
the compilation process.

Also, why are the conditions tighter for aarch64_float_const_rtx_p
(which we can split) but not for the general case (which we can't,
and presumably need to force to memory)?  I.e. for what cases do we want
the final return to be (sometimes) true?  If it's going to be forced
into memory anyway then wouldn't we get better optimisation by exposing
that early?

Would it be possible to handle the split during expand instead?
Or do we expect to discover new FP constants during RTL optimisation?
If so, where do they come from?

Sorry for all the questions :)

Richard

> +}
>
>  /* Return the fixed registers used for condition codes.  */
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 20956fc49d8232763b127629ded17037ad7d7960..5d3fa9628952031f52474291e160b957d774b011
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1644,8 +1644,7 @@ (define_expand "mov<mode>"
>  (define_insn "*mov<mode>_aarch64"
>[(set (match_operand:HFBF 0 "nonimmediate_operand")
> (match_operand:HFBF 1 "general_operand"))]
> -  "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> -|| aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> +  "aarch64_valid_fp_move (operands[0], operands[1], <MODE>mode)"
>{@ [ cons: =0 , 1   ; attrs: type , arch  ]
>   [ w, Y   ; neon_move   , simd  ] movi\t%0.4h, #0
>   [ w, ?rY ; f_mcr   , fp16  ] fmov\t%h0, %w1
> @@ -1668,8 +1667,7 @@ (define_insn "*mov<mode>_aarch64"
>  (define_insn "*mov<mode>_aarch64"
>[(set

[committed] libstdc++: Tweak comments on includes in hashtable headers

2024-11-07 Thread Jonathan Wakely
std::is_permutation is only used in  not in
, so move the comment referring to it.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Add is_permutation to comment.
* include/bits/hashtable_policy.h: Remove it from comment.
---
Pushed as obvious.

 libstdc++-v3/include/bits/hashtable.h| 2 +-
 libstdc++-v3/include/bits/hashtable_policy.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 8b312d25d7a..d36b32a7e3f 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -36,7 +36,7 @@
 
 #include <bits/hashtable_policy.h>
 #include <bits/enable_special_members.h>
-#include <bits/stl_algobase.h> // fill_n
+#include <bits/stl_algobase.h> // fill_n, is_permutation
 #include <bits/stl_function.h> // __has_is_transparent_t
 #if __cplusplus > 201402L
 # include <bits/node_handle.h>
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h
index 5d79e2ba26f..d8b201864c1 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -33,7 +33,7 @@
 
 #include <tuple>                  // for std::tuple, std::forward_as_tuple
 #include <bits/functional_hash.h> // for __is_fast_hash
-#include <bits/stl_algobase.h>    // for std::min, std::is_permutation.
+#include <bits/stl_algobase.h>    // for std::min
 #include <bits/stl_pair.h>        // for std::pair
 #include <ext/aligned_buffer.h>   // for __gnu_cxx::__aligned_buffer
 #include <bits/alloc_traits.h>    // for std::__alloc_rebind
-- 
2.47.0



Re: [patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines

2024-11-07 Thread Tobias Burnus

As there were no further remarks, I have now committed it as
r15-5017-ge52cfd4bc23de1 with minor changes:

* Referring to v6.0 not TR13 (same section numbers),
* fixed one item in the 5.2 to-do list:
  'declare mapper with iterator and present modifiers' comes from Appendix B
  and we additionally had
  'iterator and mapper as map-type modifier in declare mapper', which duplicated 'iterator'. (The item remains, but now
  only covers 'mapper' as map-type modifier.)

Comments and follow-up suggestions are still welcome.

See (in about 9 hours, for the new version):
- https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-5_002e2.html
  for the 'declare_mapper' implementation status
- https://gcc.gnu.org/onlinedocs/libgomp/Runtime-Library-Routines.html
  for the interop routines

Tobias

PS: Previous patch was posted on Wed Aug 28, 2024 to
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661711.html


Re: [r15-4988 Regression] FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 16, 0\\);" 1 on Linux/x86_64

2024-11-07 Thread Jakub Jelinek
On Thu, Nov 07, 2024 at 10:54:40AM +, Andrew Stubbs wrote:
> On 07/11/2024 00:37, haochen.jiang wrote:
> > d334f729e53867b838e867375b3f475ba793d96e is the first bad commit
> > commit d334f729e53867b838e867375b3f475ba793d96e
> > Author: Andrew Stubbs 
> > Date:   Wed Nov 6 12:26:08 2024 +
> > 
> >  openmp: Add testcases for omp_max_vf
> > 
> > caused
> > 
> > FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "\\+ 16" 1
> > FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "\\* 16" 2
> > FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp 
> > "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 16, 0\\);" 1
> > 
> > with GCC configured with
> > 
> > ../../gcc/configure 
> > --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-4988/usr 
> > --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet 
> > --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> > 
> > To reproduce:
> > 
> > $ cd {build_dir}/gcc && make check 
> > RUNTESTFLAGS="gomp.exp=gcc.dg/gomp/max_vf-1.c --target_board='unix{-m32\ 
> > -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check 
> > RUNTESTFLAGS="gomp.exp=gcc.dg/gomp/max_vf-1.c --target_board='unix{-m64\ 
> > -march=cascadelake}'"
> 
> This problem was supposed to be avoided by explicitly passing "-msse2" in
> the testcase. Apparently -march=cascadelake silently overrides that setting
> ... maybe don't do that?

What do you mean by overrides?
-march=cascadelake -msse2
(or the other ordering too) certainly doesn't override the enabling of SSE2,
the -mISA and -mno-ISA flags take precedence over -march=; though other
flags can be set too from the -march= or its default set when configuring
the compiler.
So, if the testcase relies on SSE2 enabled and SSE3 not enabled, it should
use -msse2 -mno-sse3.
Seems the testcase actually relies on AVX not being enabled, so it should
use "-msse2 -mno-avx".
That will work fine even with -march=cascadelake, whether it is the default
or requested through --target_board.
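
Concretely, the testcase's option line could then look something like
this (a sketch; the -fopenmp/-fdump flags are assumptions based on what
the scan-tree-dump checks appear to need):

/* { dg-options "-fopenmp -fdump-tree-ompexp -msse2 -mno-avx" } */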

Jakub



Re: [PATCH 07/10] aarch64: Add testcase for C/C++ ops on SVE ACLE types.

2024-11-07 Thread Richard Sandiford
Tejas Belagod  writes:
> This patch adds a test case to cover C/C++ operators on SVE ACLE types.  This
> does not cover all types, but covers most representative types.
>
> gcc/testsuite:
>
>   * gcc.target/aarch64/sve/acle/general/cops.c: New test.
> ---
>  .../aarch64/sve/acle/general/cops.c   | 570 ++
>  1 file changed, 570 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cops.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cops.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cops.c
> new file mode 100644
> index 000..28da602301e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cops.c
> @@ -0,0 +1,570 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2" } */
> +
> +#include 
> +#include 
> +
> +#define DECL_FUNC_UNARY(type, name, op, intr, su, sz, id) \
> +  __attribute__ ((noipa)) \
> +  type func_ ## name ## type ## _unary (type a) { \
> +return op (a); \
> +  } \
> +  void checkfunc_ ## name ## type ## _unary () { \
> +type data = svindex_ ## su ## sz (0, 1); \
> +type zr = svindex_ ## su ## sz (0, 0); \
> +type one = svindex_ ## su ## sz (1, 0); \
> +type mone = svindex_ ## su ## sz (-1, 0); \
> +svbool_t pg = svptrue_b ## sz (); \
> +type exp = intr ## su ## sz ## _z (pg, id, data); \
> +type actual = func_ ## name ## type ## _unary (data); \
> +svbool_t res = svcmpeq_ ## su ## sz (pg, exp, actual); \
> +if (svptest_any (pg, svnot_b_z (pg, res))) \
> +  __builtin_abort (); \
> +  }
> +
> +#define DECL_FUNC_UNARY_FLOAT(type, name, op, intr, su, sz, id) \
> +  __attribute__ ((noipa)) \
> +  type func_ ## name ## type ## _unary (type a) { \
> +return op (a); \
> +  } \
> +  void checkfunc_ ## name ## type ## _unary () { \
> +type data = svdup_n_ ## su ## sz (2.0); \
> +type zr = svdup_n_ ## su ## sz (0.0); \
> +type one = svdup_n_ ## su ## sz (1.0); \
> +type mone = svdup_n_ ## su ## sz (-1.0); \
> +svbool_t pg = svptrue_b ## sz (); \
> +type exp = intr ## su ## sz ## _z (pg, id, data); \
> +type actual = func_ ## name ## type ## _unary (data); \
> +svbool_t res = svcmpeq_ ## su ## sz (pg, exp, actual); \
> +if (svptest_any (pg, svnot_b_z (pg, res))) \
> +  __builtin_abort (); \
> +  }
> +
> +#define DECL_FUNC_INDEX(rtype, type, intr, su, sz)  \
> +  __attribute__ ((noipa)) \
> +  rtype func_ ## rtype ## type ## _vindex (type a, int n) { \
> +return (a[n]); \
> +  } \
> +  __attribute__ ((noipa)) \
> +  rtype func_ ## rtype ## type ## _cindex (type a) { \
> +return (a[0]); \
> +  } \
> +  void checkfunc_ ## rtype ## type ## _vindex () { \
> +type a = svindex_ ## su ## sz (0, 1); \
> +int n = 2; \
> +if (2 != func_ ## rtype ## type ## _vindex (a, n)) \
> +  __builtin_abort (); \
> +  } \
> +  void checkfunc_ ## rtype ## type ## _cindex () { \
> +type a = svindex_ ## su ## sz (1, 0); \
> +if (1 != func_ ## rtype ## type ## _cindex (a)) \
> +  __builtin_abort (); \
> +  }
> +
> +#define DECL_FUNC_INDEX_FLOAT(rtype, type, intr, su, sz)  \
> +  __attribute__ ((noipa)) \
> +  rtype func_ ## rtype ## type ## _vindex (type a, int n) { \
> +return (a[n]); \
> +  } \
> +  __attribute__ ((noipa)) \
> +  rtype func_ ## rtype ## type ## _cindex (type a) { \
> +return (a[0]); \
> +  } \
> +  void checkfunc_ ## rtype ## type ## _vindex () { \
> +type a = svdup_n_ ## su ## sz (2.0); \
> +int n = 2; \
> +if (2.0 != func_ ## rtype ## type ## _vindex (a, n)) \
> +  __builtin_abort (); \
> +  } \
> +  void checkfunc_ ## rtype ## type ## _cindex () { \
> +type a = svdup_n_ ## su ## sz (4.0); \
> +if (4.0 != func_ ## rtype ## type ## _cindex (a)) \
> +  __builtin_abort (); \
> +  }
> +
> +#define DECL_FUNC_BINARY(type, name, op, intr, su, sz)  \
> +  __attribute__ ((noipa)) \
> +  type func_ ## name  ## type ## _binary(type a, type b) { \
> +return (a) op (b); \
> +  } \
> +  void checkfunc_ ## name ## type ## _binary () { \
> +type a = svindex_ ## su ## sz (0, 1); \
> +type b = svindex_ ## su ## sz (0, 2); \
> +svbool_t all_true = svptrue_b ## sz (); \
> +type exp = intr ## su ## sz ## _z (all_true, a, b); \
> +type actual = func_ ## name ## type ## _binary (a, b); \
> +svbool_t res = svcmpeq_ ## su ## sz (all_true, exp, actual); \
> +if (svptest_any (all_true, svnot_b_z (all_true, res))) \
> +  __builtin_abort (); \
> +  }
> +
> +#define DECL_FUNC_BINARY_SHIFT(type, name, op, intr, su, sz)  \
> +  __attribute__ ((noipa)) \
> +  type func_ ## name  ## type ## _binary(type a, type b) { \
> +return (a) op (b); \
> +  } \
> +  void checkfunc_ ## name ## type ## _binary () { \
> +type a = svindex_ ## su ## sz (0, 1); \
> +svuint ## sz ## _t b = svindex_u ## sz (0, 2); \
> +type c = svindex_ ## su ## sz (0, 2); \
> +svbool_t all_true = svptrue_b ## sz (); \
> +t

Re: [PATCH 06/10] rtl: Validate subreg info when optimizing vec_select.

2024-11-07 Thread Richard Sandiford
Tejas Belagod  writes:
> When optimizing for NOPs in the case of overlapping regs in VEC_SELECT
> expressions, validate the subreg info before using simplify_subreg_regno.
> There is no real SUBREG rtx here, only a pseudo-subreg query to check
> whether such a subreg would be possible.
>
> gcc/ChangeLog:
>
>   * rtlanal.cc (set_noop_p): Validate subreg constraints before checking
>   for overlapping regs using simplify_subreg_regno.

OK, thanks.  I think this can go in as an independent fix.

Richard

> ---
>  gcc/rtlanal.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index cb0c0c0d719..b58401e8309 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -1686,6 +1686,7 @@ set_noop_p (const_rtx set)
>   }
>return
>   REG_CAN_CHANGE_MODE_P (REGNO (dst), GET_MODE (src0), GET_MODE (dst))
> + && validate_subreg (GET_MODE (dst), GET_MODE (src0), src0, offset)
>   && simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
> offset, GET_MODE (dst)) == (int) REGNO (dst);
>  }


Re: [PATCH 00/10] aarch64: Enable C/C++ operations on SVE ACLE types.

2024-11-07 Thread Richard Sandiford
Tejas Belagod  writes:
> Hi,
>
> This patchset enables C/C++ operations on SVE ACLE types.

I've replied to some of the individual patches, but otherwise the
AArch64 parts look good to me.

Thanks,
Richard


Re: [PATCH 04/10] gimple: Disallow sizeless types in BIT_FIELD_REFs.

2024-11-07 Thread Richard Biener
On Thu, Nov 7, 2024 at 8:25 AM Tejas Belagod  wrote:
>
> On 11/6/24 6:02 PM, Richard Biener wrote:
> > On Wed, Nov 6, 2024 at 12:49 PM Tejas Belagod  wrote:
> >>
> >> Ensure sizeless types don't end up trying to be canonicalised to 
> >> BIT_FIELD_REFs.
> >
> > You mean variable-sized?  But don't we know, when there's a constant
> > array index,
> > that the size is at least so this indexing is OK?  So what's wrong with a
> > fixed position, fixed size BIT_FIELD_REF extraction of a VLA object?
> >
> > Richard.
> >
>
> Ah! The code and comment/description don't match, sorry. This change
> started out as gating out all canonicalizations of VLA vectors when I
> had limited understanding of how this worked, but eventually was
> simplified to gate in only those offsets that were known_le, but missed
> out fixing the comment/description. So, for eg.
>
> int foo (svint32_t v) { return v[3]; }
>
> canonicalises to a BIT_FIELD_REF 
>
> but something like:
>
> int foo (svint32_t v) { return v[4]; }

So this is possibly out-of-bounds?

> reduces to a VEC_EXTRACT <>

But if out-of-bounds a VEC_EXTRACT isn't any better than a BIT_FIELD_REF, no?

> I'll fix the comment/description.
>
> Thanks,
> Tejas.
>
> >> gcc/ChangeLog:
> >>
> >>  * gimple-fold.cc (maybe_canonicalize_mem_ref_addr): Disallow 
> >> sizeless
> >>  types in BIT_FIELD_REFs.
> >> ---
> >>   gcc/gimple-fold.cc | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> >> index c19dac0dbfd..dd45d9f7348 100644
> >> --- a/gcc/gimple-fold.cc
> >> +++ b/gcc/gimple-fold.cc
> >> @@ -6281,6 +6281,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool 
> >> is_debug = false)
> >> && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (*t, 0), 
> >> 0
> >>   {
> >> tree vtype = TREE_TYPE (TREE_OPERAND (TREE_OPERAND (*t, 0), 0));
> >> +  /* BIT_FIELD_REF can only happen on constant-size vectors.  */
> >> if (VECTOR_TYPE_P (vtype))
> >>  {
> >>tree low = array_ref_low_bound (*t);
> >> @@ -6294,7 +6295,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool 
> >> is_debug = false)
> >>   (TYPE_SIZE (TREE_TYPE (*t;
> >>widest_int ext
> >>  = wi::add (idx, wi::to_widest (TYPE_SIZE (TREE_TYPE 
> >> (*t;
> >> - if (wi::les_p (ext, wi::to_widest (TYPE_SIZE (vtype
> >> + if (known_le (ext, wi::to_poly_widest (TYPE_SIZE 
> >> (vtype
> >>  {
> >>*t = build3_loc (EXPR_LOCATION (*t), BIT_FIELD_REF,
> >> TREE_TYPE (*t),
> >> --
> >> 2.25.1
> >>
>


[PATCH 07/15] arm: [MVE intrinsics] rework vstr scatter_base

2024-11-07 Thread Christophe Lyon
Implement vstr?q_scatter_base using the new MVE builtins framework.

We need to introduce a new iterator (MVE_4) to support the set needed
by vstr?q_scatter_base (V4SI V4SF V2DI).
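
For reference, the user-facing intrinsic covered here (a sketch
assuming an MVE-enabled toolchain; behaviour is unchanged by the
rework):

#include <arm_mve.h>

/* Store each 64-bit lane of VALUE at the byte address ADDR[i] + 8.
   The immediate must be a multiple of 8 in the range [-1016, 1016].  */
void
scatter2 (uint64x2_t addr, int64x2_t value)
{
  vstrdq_scatter_base_s64 (addr, 8, value);
}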

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_strsbs_qualifiers)
(arm_strsbu_qualifiers, arm_strsbs_p_qualifiers)
(arm_strsbu_p_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (class
vstrq_scatter_base_impl): New.
(vstrwq_scatter_base, vstrdq_scatter_base): New.
* config/arm/arm-mve-builtins-base.def (vstrwq_scatter_base)
(vstrdq_scatter_base): New.
* config/arm/arm-mve-builtins-base.h (vstrwq_scatter_base)
(vstrdq_scatter_base): New.
* config/arm/arm_mve.h (vstrwq_scatter_base): Delete.
(vstrwq_scatter_base_p): Delete.
(vstrdq_scatter_base_p): Delete.
(vstrdq_scatter_base): Delete.
(vstrwq_scatter_base_s32): Delete.
(vstrwq_scatter_base_u32): Delete.
(vstrwq_scatter_base_p_s32): Delete.
(vstrwq_scatter_base_p_u32): Delete.
(vstrdq_scatter_base_p_s64): Delete.
(vstrdq_scatter_base_p_u64): Delete.
(vstrdq_scatter_base_s64): Delete.
(vstrdq_scatter_base_u64): Delete.
(vstrwq_scatter_base_f32): Delete.
(vstrwq_scatter_base_p_f32): Delete.
(__arm_vstrwq_scatter_base_s32): Delete.
(__arm_vstrwq_scatter_base_u32): Delete.
(__arm_vstrwq_scatter_base_p_s32): Delete.
(__arm_vstrwq_scatter_base_p_u32): Delete.
(__arm_vstrdq_scatter_base_p_s64): Delete.
(__arm_vstrdq_scatter_base_p_u64): Delete.
(__arm_vstrdq_scatter_base_s64): Delete.
(__arm_vstrdq_scatter_base_u64): Delete.
(__arm_vstrwq_scatter_base_f32): Delete.
(__arm_vstrwq_scatter_base_p_f32): Delete.
(__arm_vstrwq_scatter_base): Delete.
(__arm_vstrwq_scatter_base_p): Delete.
(__arm_vstrdq_scatter_base_p): Delete.
(__arm_vstrdq_scatter_base): Delete.
* config/arm/arm_mve_builtins.def (vstrwq_scatter_base_s)
(vstrwq_scatter_base_u, vstrwq_scatter_base_p_s)
(vstrwq_scatter_base_p_u, vstrdq_scatter_base_s)
(vstrwq_scatter_base_f, vstrdq_scatter_base_p_s)
(vstrwq_scatter_base_p_f, vstrdq_scatter_base_u)
(vstrdq_scatter_base_p_u): Delete.
* config/arm/iterators.md (MVE_4): New.
(supf): Remove VSTRWQSB_S, VSTRWQSB_U.
(VSTRWSBQ): Delete.
* config/arm/mve.md (mve_vstrwq_scatter_base_v4si): Delete.
(mve_vstrwq_scatter_base_p_v4si): Delete.
(mve_vstrdq_scatter_base_p_v2di): Delete.
(mve_vstrdq_scatter_base_v2di): Delete.
(mve_vstrwq_scatter_base_fv4sf): Delete.
(mve_vstrwq_scatter_base_p_fv4sf): Delete.
(@mve_vstrq_scatter_base_<mode>): New.
(@mve_vstrq_scatter_base_p_<mode>): New.
* config/arm/unspecs.md (VSTRWQSB_S, VSTRWQSB_U, VSTRWQSB_F):
Delete.
(VSTRSBQ, VSTRSBQ_P): New.
---
 gcc/config/arm/arm-builtins.cc   |  23 ---
 gcc/config/arm/arm-mve-builtins-base.cc  |  38 +
 gcc/config/arm/arm-mve-builtins-base.def |   3 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 196 ---
 gcc/config/arm/arm_mve_builtins.def  |  10 --
 gcc/config/arm/iterators.md  |   5 +-
 gcc/config/arm/mve.md| 150 +++--
 gcc/config/arm/unspecs.md|   5 +-
 9 files changed, 72 insertions(+), 360 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 416b76dc815..15f663e2a0e 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -610,29 +610,6 @@ arm_quadop_unone_unone_unone_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_unone_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_strsbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_unsigned, qualifier_immediate, qualifier_none};
-#define STRSBS_QUALIFIERS (arm_strsbs_qualifiers)
-
-static enum arm_type_qualifiers
-arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
-#define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
-
-static enum arm_type_qualifiers
-arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_none, qualifier_predicate};
-#define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
-
-static enum arm_type_qualifiers
-arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned, qualifier_predicate};
-#define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ldrgu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qua

[PATCH 10/15] arm: [MVE intrinsics] rework vldr gather_offset

2024-11-07 Thread Christophe Lyon
Implement vldr?q_gather_offset using the new MVE builtins framework.

The patch introduces a new attribute iterator (MVE_u_elem) to
accommodate the fact that ACLE's expected output description uses "uNN"
for all modes, except V8HF where it expects ".f16".  Using "V_sz_elem"
would work, but would require updating several testcases.
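
For context, the kind of intrinsic being reworked (a sketch; the
user-facing semantics are unchanged):

#include <arm_mve.h>

/* Gather one byte from BASE[OFFSET[i]] into each of the 16 lanes.  */
uint8x16_t
gather16 (uint8_t const *base, uint8x16_t offset)
{
  return vldrbq_gather_offset_u8 (base, offset);
}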

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (class vldrq_gather_impl):
New.
(vldrbq_gather, vldrdq_gather, vldrhq_gather, vldrwq_gather): New.
* config/arm/arm-mve-builtins-base.def (vldrbq_gather)
(vldrdq_gather, vldrhq_gather, vldrwq_gather): New.
* config/arm/arm-mve-builtins-base.h (vldrbq_gather)
(vldrdq_gather, vldrhq_gather, vldrwq_gather): New.
* config/arm/arm_mve.h (vldrbq_gather_offset): Delete.
(vldrbq_gather_offset_z): Delete.
(vldrhq_gather_offset): Delete.
(vldrhq_gather_offset_z): Delete.
(vldrdq_gather_offset): Delete.
(vldrdq_gather_offset_z): Delete.
(vldrwq_gather_offset): Delete.
(vldrwq_gather_offset_z): Delete.
(vldrbq_gather_offset_u8): Delete.
(vldrbq_gather_offset_s8): Delete.
(vldrbq_gather_offset_u16): Delete.
(vldrbq_gather_offset_s16): Delete.
(vldrbq_gather_offset_u32): Delete.
(vldrbq_gather_offset_s32): Delete.
(vldrbq_gather_offset_z_s16): Delete.
(vldrbq_gather_offset_z_u8): Delete.
(vldrbq_gather_offset_z_s32): Delete.
(vldrbq_gather_offset_z_u16): Delete.
(vldrbq_gather_offset_z_u32): Delete.
(vldrbq_gather_offset_z_s8): Delete.
(vldrhq_gather_offset_s32): Delete.
(vldrhq_gather_offset_s16): Delete.
(vldrhq_gather_offset_u32): Delete.
(vldrhq_gather_offset_u16): Delete.
(vldrhq_gather_offset_z_s32): Delete.
(vldrhq_gather_offset_z_s16): Delete.
(vldrhq_gather_offset_z_u32): Delete.
(vldrhq_gather_offset_z_u16): Delete.
(vldrdq_gather_offset_s64): Delete.
(vldrdq_gather_offset_u64): Delete.
(vldrdq_gather_offset_z_s64): Delete.
(vldrdq_gather_offset_z_u64): Delete.
(vldrhq_gather_offset_f16): Delete.
(vldrhq_gather_offset_z_f16): Delete.
(vldrwq_gather_offset_f32): Delete.
(vldrwq_gather_offset_s32): Delete.
(vldrwq_gather_offset_u32): Delete.
(vldrwq_gather_offset_z_f32): Delete.
(vldrwq_gather_offset_z_s32): Delete.
(vldrwq_gather_offset_z_u32): Delete.
(__arm_vldrbq_gather_offset_u8): Delete.
(__arm_vldrbq_gather_offset_s8): Delete.
(__arm_vldrbq_gather_offset_u16): Delete.
(__arm_vldrbq_gather_offset_s16): Delete.
(__arm_vldrbq_gather_offset_u32): Delete.
(__arm_vldrbq_gather_offset_s32): Delete.
(__arm_vldrbq_gather_offset_z_s8): Delete.
(__arm_vldrbq_gather_offset_z_s32): Delete.
(__arm_vldrbq_gather_offset_z_s16): Delete.
(__arm_vldrbq_gather_offset_z_u8): Delete.
(__arm_vldrbq_gather_offset_z_u32): Delete.
(__arm_vldrbq_gather_offset_z_u16): Delete.
(__arm_vldrhq_gather_offset_s32): Delete.
(__arm_vldrhq_gather_offset_s16): Delete.
(__arm_vldrhq_gather_offset_u32): Delete.
(__arm_vldrhq_gather_offset_u16): Delete.
(__arm_vldrhq_gather_offset_z_s32): Delete.
(__arm_vldrhq_gather_offset_z_s16): Delete.
(__arm_vldrhq_gather_offset_z_u32): Delete.
(__arm_vldrhq_gather_offset_z_u16): Delete.
(__arm_vldrdq_gather_offset_s64): Delete.
(__arm_vldrdq_gather_offset_u64): Delete.
(__arm_vldrdq_gather_offset_z_s64): Delete.
(__arm_vldrdq_gather_offset_z_u64): Delete.
(__arm_vldrwq_gather_offset_s32): Delete.
(__arm_vldrwq_gather_offset_u32): Delete.
(__arm_vldrwq_gather_offset_z_s32): Delete.
(__arm_vldrwq_gather_offset_z_u32): Delete.
(__arm_vldrhq_gather_offset_f16): Delete.
(__arm_vldrhq_gather_offset_z_f16): Delete.
(__arm_vldrwq_gather_offset_f32): Delete.
(__arm_vldrwq_gather_offset_z_f32): Delete.
(__arm_vldrbq_gather_offset): Delete.
(__arm_vldrbq_gather_offset_z): Delete.
(__arm_vldrhq_gather_offset): Delete.
(__arm_vldrhq_gather_offset_z): Delete.
(__arm_vldrdq_gather_offset): Delete.
(__arm_vldrdq_gather_offset_z): Delete.
(__arm_vldrwq_gather_offset): Delete.
(__arm_vldrwq_gather_offset_z): Delete.
* config/arm/arm_mve_builtins.def (vldrbq_gather_offset_u)
(vldrbq_gather_offset_s, vldrbq_gather_offset_z_s)
(vldrbq_gather_offset_z_u, vldrhq_gather_offset_z_u)
(vldrhq_gather_offset_u, vldrhq_gather_offset_z_s)
(vldrhq_gather_offset_s, vldrdq_gather_offset_s)
(vldrhq_gather_offset_f, vldrwq_gather_offset_f)
(vldrwq_gather_offset_s, vldrdq_gather_offset_z_s)
(vldrhq_gather_offset_z_f,

[PATCH 13/15] arm: [MVE intrinsics] rework vldr gather_base

2024-11-07 Thread Christophe Lyon
Implement vldr?q_gather_base using the new MVE builtins framework.

The patch updates two testcases rather than using different iterators
for predicated and non-predicated versions. According to ACLE:
vldrdq_gather_base_s64 is expected to generate VLDRD.64
vldrdq_gather_base_z_s64 is expected to generate VLDRDT.U64

Both are equally valid, however.
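
For context, the non-predicated variant mentioned above (a sketch):

#include <arm_mve.h>

/* Load one 64-bit lane from each byte address ADDR[i] + 8; expected
   to assemble to VLDRD.64 (or the equally valid .U64).  */
int64x2_t
gather2 (uint64x2_t addr)
{
  return vldrdq_gather_base_s64 (addr, 8);
}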

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_ldrgbs_qualifiers)
(arm_ldrgbu_qualifiers, arm_ldrgbs_z_qualifiers)
(arm_ldrgbu_z_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (class
vldrq_gather_base_impl): New.
(vldrdq_gather_base, vldrwq_gather_base): New.
* config/arm/arm-mve-builtins-base.def (vldrdq_gather_base)
(vldrwq_gather_base): New.
* config/arm/arm-mve-builtins-base.h: (vldrdq_gather_base)
(vldrwq_gather_base): New.
* config/arm/arm_mve.h (vldrwq_gather_base_s32): Delete.
(vldrwq_gather_base_u32): Delete.
(vldrwq_gather_base_z_u32): Delete.
(vldrwq_gather_base_z_s32): Delete.
(vldrdq_gather_base_s64): Delete.
(vldrdq_gather_base_u64): Delete.
(vldrdq_gather_base_z_s64): Delete.
(vldrdq_gather_base_z_u64): Delete.
(vldrwq_gather_base_f32): Delete.
(vldrwq_gather_base_z_f32): Delete.
(__arm_vldrwq_gather_base_s32): Delete.
(__arm_vldrwq_gather_base_u32): Delete.
(__arm_vldrwq_gather_base_z_s32): Delete.
(__arm_vldrwq_gather_base_z_u32): Delete.
(__arm_vldrdq_gather_base_s64): Delete.
(__arm_vldrdq_gather_base_u64): Delete.
(__arm_vldrdq_gather_base_z_s64): Delete.
(__arm_vldrdq_gather_base_z_u64): Delete.
(__arm_vldrwq_gather_base_f32): Delete.
(__arm_vldrwq_gather_base_z_f32): Delete.
* config/arm/arm_mve_builtins.def (vldrwq_gather_base_s)
(vldrwq_gather_base_u, vldrwq_gather_base_z_s)
(vldrwq_gather_base_z_u, vldrdq_gather_base_s)
(vldrwq_gather_base_f, vldrdq_gather_base_z_s)
(vldrwq_gather_base_z_f, vldrdq_gather_base_u)
(vldrdq_gather_base_z_u): Delete.
* config/arm/iterators.md (supf): Remove VLDRWQGB_S, VLDRWQGB_U,
VLDRDQGB_S, VLDRDQGB_U.
(VLDRWGBQ, VLDRDGBQ): Delete.
* config/arm/mve.md (mve_vldrwq_gather_base_v4si): Delete.
(mve_vldrwq_gather_base_z_v4si): Delete.
(mve_vldrdq_gather_base_v2di): Delete.
(mve_vldrdq_gather_base_z_v2di): Delete.
(mve_vldrwq_gather_base_fv4sf): Delete.
(mve_vldrwq_gather_base_z_fv4sf): Delete.
(@mve_vldrq_gather_base_<mode>): New.
(@mve_vldrq_gather_base_z_<mode>): New.
* config/arm/unspecs.md (VLDRWQGB_S, VLDRWQGB_U, VLDRDQGB_S)
(VLDRDQGB_U, VLDRWQGB_F): Delete.
(VLDRGBQ, VLDRGBQ_Z): New.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_s64.c: Update
expected output.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_u64.c:
Likewise.
---
 gcc/config/arm/arm-builtins.cc|  22 ---
 gcc/config/arm/arm-mve-builtins-base.cc   |  32 
 gcc/config/arm/arm-mve-builtins-base.def  |   3 +
 gcc/config/arm/arm-mve-builtins-base.h|   2 +
 gcc/config/arm/arm_mve.h  |  80 -
 gcc/config/arm/arm_mve_builtins.def   |  10 --
 gcc/config/arm/iterators.md   |   5 -
 gcc/config/arm/mve.md | 155 --
 gcc/config/arm/unspecs.md |   7 +-
 .../mve/intrinsics/vldrdq_gather_base_s64.c   |   4 +-
 .../mve/intrinsics/vldrdq_gather_base_u64.c   |   4 +-
 11 files changed, 75 insertions(+), 249 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 40056f14981..60ee12839fb 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -610,28 +610,6 @@ arm_quadop_unone_unone_unone_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_unone_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ldrgbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_unsigned, qualifier_immediate};
-#define LDRGBS_QUALIFIERS (arm_ldrgbs_qualifiers)
-
-static enum arm_type_qualifiers
-arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
-#define LDRGBU_QUALIFIERS (arm_ldrgbu_qualifiers)
-
-static enum arm_type_qualifiers
-arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_predicate};
-#define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
-
-static enum arm_type_qualifiers
-arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_predicate};
-#define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
-

Re: [PATCH v2] Doc: Add doc for standard name mask_len_strided_load{store}m

2024-11-07 Thread Richard Biener
On Thu, Nov 7, 2024 at 2:49 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> I would like to double-check the doc wording, as I am not a native speaker.
> It may be referenced by other developers, and I want to make sure there is
> nothing misleading or fuzzy.
> Thanks a lot.

The docs look good to me - but I'm also not a native speaker.

Thanks,
Richard.

> Pan
>
> -Original Message-
> From: Li, Pan2 
> Sent: Wednesday, October 30, 2024 7:56 PM
> To: gcc-patches@gcc.gnu.org
> Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
> rdapp@gmail.com; Li, Pan2 
> Subject: [PATCH v2] Doc: Add doc for standard name 
> mask_len_strided_load{store}m
>
> From: Pan Li 
>
> This patch would like to add doc for the below 2 standard names.
>
> 1. strided load: v = mask_len_strided_load (ptr, stried, mask, len, bias)
> 2. strided store: mask_len_stried_store (ptr, stride, v, mask, len, bias)
>
> gcc/ChangeLog:
>
> * doc/md.texi: Add doc for mask_len_stried_load{store}.
>
> Signed-off-by: Pan Li 
> Co-Authored-By: Juzhe-Zhong 
> ---
>  gcc/doc/md.texi | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6d9c8643739..25ded86f0d1 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5135,6 +5135,20 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with zero as base and 
> operand 2 as step.
> +For each element the load address is operand 1 + @var{i} * operand 2.
> +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
> 5) elements from memory.
> +Element @var{i} of the mask (operand 3) is set if element @var{i} of the 
> result should
> +be loaded from memory and clear if element @var{i} of the result should be 
> zero.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5172,6 +5186,19 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> +Operand 2 is the vector of values that should be stored, which is of mode 
> @var{m}.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with zero as base and 
> operand 1 as step.
> +For each element the store address is operand 0 + @var{i} * operand 1.
> +Similar to mask_len_store, the instruction stores at most (operand 4 + 
> operand 5) elements of
> +mask (operand 3) to memory.  Element @var{i} of the mask is set if element 
> @var{i} of (operand 3)
> +should be stored.  Mask elements @var{i} with @var{i} > (operand 4 + operand 
> 5) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> --
> 2.43.0
>
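
For readers less familiar with these optabs, a rough scalar model of
the documented load semantics (a sketch only; the stride is counted
in elements here, whereas the optab uses byte addressing):

/* Scalar model of mask_len_strided_load: active elements below
   len + bias are loaded from base + i * stride, the rest are zero.  */
void
strided_load_model (int *dst, const int *base, long stride,
                    const _Bool *mask, int len, int bias)
{
  for (int i = 0; i < len + bias; i++)
    dst[i] = mask[i] ? base[i * stride] : 0;
}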


[PATCH 04/15] arm: [MVE intrinsics] rework vstr_scatter_shifted_offset

2024-11-07 Thread Christophe Lyon
Implement vstr?q_scatter_shifted_offset intrinsics using the MVE
builtins framework.

We use the same approach as the previous patch, and we now have four
sets of patterns:
- vector scatter stores with shifted offset (non-truncating)
- predicated vector scatter stores with shifted offset (non-truncating)
- truncating vector scatter stores with shifted offset
- predicated truncating vector scatter stores with shifted offset

Note that the truncating patterns do not use an iterator since there
is only one such variant: V4SI to V4HI.
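
For reference, the lone truncating variant at the source level (a
sketch):

#include <arm_mve.h>

/* Truncating V4SI -> V4HI scatter: each 32-bit lane of VALUE is
   narrowed to 16 bits and stored at BASE + (OFFSET[i] << 1).  */
void
scatter_trunc (int16_t *base, uint32x4_t offset, int32x4_t value)
{
  vstrhq_scatter_shifted_offset_s32 (base, offset, value);
}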

We need to introduce new iterators:
- MVE_VLD_ST_scatter_shifted, same as MVE_VLD_ST_scatter without V16QI
- MVE_scatter_shift to map the mode to the shift amount

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_strss_qualifiers)
(arm_strsu_qualifiers, arm_strsu_p_qualifiers)
(arm_strss_p_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_impl):
Add support for shifted version.
(vstrdq_scatter_shifted, vstrhq_scatter_shifted)
(vstrwq_scatter_shifted): New.
* config/arm/arm-mve-builtins-base.def (vstrhq_scatter_shifted)
(vstrwq_scatter_shifted, vstrdq_scatter_shifted): New.
* config/arm/arm-mve-builtins-base.h (vstrhq_scatter_shifted)
(vstrwq_scatter_shifted, vstrdq_scatter_shifted): New.
* config/arm/arm_mve.h (vstrhq_scatter_shifted_offset): Delete.
(vstrhq_scatter_shifted_offset_p): Delete.
(vstrdq_scatter_shifted_offset_p): Delete.
(vstrdq_scatter_shifted_offset): Delete.
(vstrwq_scatter_shifted_offset_p): Delete.
(vstrwq_scatter_shifted_offset): Delete.
(vstrhq_scatter_shifted_offset_s32): Delete.
(vstrhq_scatter_shifted_offset_s16): Delete.
(vstrhq_scatter_shifted_offset_u32): Delete.
(vstrhq_scatter_shifted_offset_u16): Delete.
(vstrhq_scatter_shifted_offset_p_s32): Delete.
(vstrhq_scatter_shifted_offset_p_s16): Delete.
(vstrhq_scatter_shifted_offset_p_u32): Delete.
(vstrhq_scatter_shifted_offset_p_u16): Delete.
(vstrdq_scatter_shifted_offset_p_s64): Delete.
(vstrdq_scatter_shifted_offset_p_u64): Delete.
(vstrdq_scatter_shifted_offset_s64): Delete.
(vstrdq_scatter_shifted_offset_u64): Delete.
(vstrhq_scatter_shifted_offset_f16): Delete.
(vstrhq_scatter_shifted_offset_p_f16): Delete.
(vstrwq_scatter_shifted_offset_f32): Delete.
(vstrwq_scatter_shifted_offset_p_f32): Delete.
(vstrwq_scatter_shifted_offset_p_s32): Delete.
(vstrwq_scatter_shifted_offset_p_u32): Delete.
(vstrwq_scatter_shifted_offset_s32): Delete.
(vstrwq_scatter_shifted_offset_u32): Delete.
(__arm_vstrhq_scatter_shifted_offset_s32): Delete.
(__arm_vstrhq_scatter_shifted_offset_s16): Delete.
(__arm_vstrhq_scatter_shifted_offset_u32): Delete.
(__arm_vstrhq_scatter_shifted_offset_u16): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_s32): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_s16): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_u32): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_u16): Delete.
(__arm_vstrdq_scatter_shifted_offset_p_s64): Delete.
(__arm_vstrdq_scatter_shifted_offset_p_u64): Delete.
(__arm_vstrdq_scatter_shifted_offset_s64): Delete.
(__arm_vstrdq_scatter_shifted_offset_u64): Delete.
(__arm_vstrwq_scatter_shifted_offset_p_s32): Delete.
(__arm_vstrwq_scatter_shifted_offset_p_u32): Delete.
(__arm_vstrwq_scatter_shifted_offset_s32): Delete.
(__arm_vstrwq_scatter_shifted_offset_u32): Delete.
(__arm_vstrhq_scatter_shifted_offset_f16): Delete.
(__arm_vstrhq_scatter_shifted_offset_p_f16): Delete.
(__arm_vstrwq_scatter_shifted_offset_f32): Delete.
(__arm_vstrwq_scatter_shifted_offset_p_f32): Delete.
(__arm_vstrhq_scatter_shifted_offset): Delete.
(__arm_vstrhq_scatter_shifted_offset_p): Delete.
(__arm_vstrdq_scatter_shifted_offset_p): Delete.
(__arm_vstrdq_scatter_shifted_offset): Delete.
(__arm_vstrwq_scatter_shifted_offset_p): Delete.
(__arm_vstrwq_scatter_shifted_offset): Delete.
* config/arm/arm_mve_builtins.def
(vstrhq_scatter_shifted_offset_p_u)
(vstrhq_scatter_shifted_offset_u)
(vstrhq_scatter_shifted_offset_p_s)
(vstrhq_scatter_shifted_offset_s, vstrdq_scatter_shifted_offset_s)
(vstrhq_scatter_shifted_offset_f, vstrwq_scatter_shifted_offset_f)
(vstrwq_scatter_shifted_offset_s)
(vstrdq_scatter_shifted_offset_p_s)
(vstrhq_scatter_shifted_offset_p_f)
(vstrwq_scatter_shifted_offset_p_f)
(vstrwq_scatter_shifted_offset_p_s)
(vstrdq_scatter_shifted_offset_u, vstrwq_scatter_shifted_offset_u)
(vstrdq_scatter_shifted_offset_p_u)
(vstrwq_scatter_shifted_offset_p_u): De

[PATCH 08/15] arm: [MVE intrinsics] rework vstr scatter_base_wb

2024-11-07 Thread Christophe Lyon
Implement vstr?q_scatter_base_wb using the new MVE builtins framework.

The patch introduces a new 'b' type for signatures, which
represents the type of the 'base' argument of vstr?q_scatter_base_wb.
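
For reference, the writeback form updates the address vector in place
(a sketch):

#include <arm_mve.h>

/* Store each lane of VALUE at (*ADDR)[i] + 8 and write the updated
   addresses back to *ADDR.  */
void
scatter_wb (uint32x4_t *addr, int32x4_t value)
{
  vstrwq_scatter_base_wb_s32 (addr, 8, value);
}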

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_strsbwbs_qualifiers)
(arm_strsbwbu_qualifiers, arm_strsbwbs_p_qualifiers)
(arm_strsbwbu_p_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (vstrq_scatter_base_impl):
Add support for MODE_wb.
* config/arm/arm-mve-builtins-shapes.cc (parse_type): Add support
for 'b' type.
(store_scatter_base): Add support for MODE_wb.
* config/arm/arm-mve-builtins.cc
(function_resolver::require_pointer_to_type): New.
* config/arm/arm-mve-builtins.h
(function_resolver::require_pointer_to_type): New.
* config/arm/arm_mve.h (vstrdq_scatter_base_wb): Delete.
(vstrdq_scatter_base_wb_p): Delete.
(vstrwq_scatter_base_wb_p): Delete.
(vstrwq_scatter_base_wb): Delete.
(vstrdq_scatter_base_wb_p_s64): Delete.
(vstrdq_scatter_base_wb_p_u64): Delete.
(vstrdq_scatter_base_wb_s64): Delete.
(vstrdq_scatter_base_wb_u64): Delete.
(vstrwq_scatter_base_wb_p_s32): Delete.
(vstrwq_scatter_base_wb_p_f32): Delete.
(vstrwq_scatter_base_wb_p_u32): Delete.
(vstrwq_scatter_base_wb_s32): Delete.
(vstrwq_scatter_base_wb_u32): Delete.
(vstrwq_scatter_base_wb_f32): Delete.
(__arm_vstrdq_scatter_base_wb_s64): Delete.
(__arm_vstrdq_scatter_base_wb_u64): Delete.
(__arm_vstrdq_scatter_base_wb_p_s64): Delete.
(__arm_vstrdq_scatter_base_wb_p_u64): Delete.
(__arm_vstrwq_scatter_base_wb_p_s32): Delete.
(__arm_vstrwq_scatter_base_wb_p_u32): Delete.
(__arm_vstrwq_scatter_base_wb_s32): Delete.
(__arm_vstrwq_scatter_base_wb_u32): Delete.
(__arm_vstrwq_scatter_base_wb_f32): Delete.
(__arm_vstrwq_scatter_base_wb_p_f32): Delete.
(__arm_vstrdq_scatter_base_wb): Delete.
(__arm_vstrdq_scatter_base_wb_p): Delete.
(__arm_vstrwq_scatter_base_wb_p): Delete.
(__arm_vstrwq_scatter_base_wb): Delete.
* config/arm/arm_mve_builtins.def (vstrwq_scatter_base_wb_u)
(vstrdq_scatter_base_wb_u, vstrwq_scatter_base_wb_p_u)
(vstrdq_scatter_base_wb_p_u, vstrwq_scatter_base_wb_s)
(vstrwq_scatter_base_wb_f, vstrdq_scatter_base_wb_s)
(vstrwq_scatter_base_wb_p_s, vstrwq_scatter_base_wb_p_f)
(vstrdq_scatter_base_wb_p_s): Delete.
* config/arm/iterators.md (supf): Remove VSTRWQSBWB_S,
VSTRWQSBWB_U, VSTRDQSBWB_S, VSTRDQSBWB_U.
(VSTRDSBQ, VSTRWSBWBQ, VSTRDSBWBQ): Delete.
* config/arm/mve.md (mve_vstrwq_scatter_base_wb_v4si): Delete.
(mve_vstrwq_scatter_base_wb_p_v4si): Delete.
(mve_vstrwq_scatter_base_wb_fv4sf): Delete.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Delete.
(mve_vstrdq_scatter_base_wb_v2di): Delete.
(mve_vstrdq_scatter_base_wb_p_v2di): Delete.
(@mve_vstrq_scatter_base_wb_<mode>): New.
(@mve_vstrq_scatter_base_wb_p_<mode>): New.
* config/arm/unspecs.md (VSTRWQSBWB_S, VSTRWQSBWB_U, VSTRWQSBWB_F)
(VSTRDQSBWB_S, VSTRDQSBWB_U): Delete.
(VSTRSBWBQ, VSTRSBWBQ_P): New.
---
 gcc/config/arm/arm-builtins.cc|  22 ---
 gcc/config/arm/arm-mve-builtins-base.cc   |  42 -
 gcc/config/arm/arm-mve-builtins-shapes.cc |  36 -
 gcc/config/arm/arm-mve-builtins.cc|  25 +++
 gcc/config/arm/arm-mve-builtins.h |   1 +
 gcc/config/arm/arm_mve.h  | 187 --
 gcc/config/arm/arm_mve_builtins.def   |  10 --
 gcc/config/arm/iterators.md   |   8 +-
 gcc/config/arm/mve.md | 168 ---
 gcc/config/arm/unspecs.md |   7 +-
 10 files changed, 128 insertions(+), 378 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 15f663e2a0e..72f63b16959 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -687,28 +687,6 @@ arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_predicate};
 #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
 
-static enum arm_type_qualifiers
-arm_strsbwbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_const, qualifier_none};
-#define STRSBWBS_QUALIFIERS (arm_strsbwbs_qualifiers)
-
-static enum arm_type_qualifiers
-arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_const, qualifier_unsigned};
-#define STRSBWBU_QUALIFIERS (arm_strsbwbu_qualifiers)
-
-static enum arm_type_qualifiers
-arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_none, qualifier_predicate};
-#define STRSBWBS_P_QUALIFIERS (arm_strsbw

[PATCH 14/15] arm: [MVE intrinsics] rework vldr gather_base_wb

2024-11-07 Thread Christophe Lyon
Implement vldr?q_gather_base_wb using the new MVE builtins framework.
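
For reference, a minimal use of the writeback gather (a sketch):

#include <arm_mve.h>

/* Load two 64-bit lanes from (*ADDR)[i] + 8 and write the updated
   address vector back to *ADDR.  */
int64x2_t
gather_wb (uint64x2_t *addr)
{
  return vldrdq_gather_base_wb_s64 (addr, 8);
}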

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_ldrgbwbxu_qualifiers)
(arm_ldrgbwbxu_z_qualifiers, arm_ldrgbwbs_qualifiers)
(arm_ldrgbwbu_qualifiers, arm_ldrgbwbs_z_qualifiers)
(arm_ldrgbwbu_z_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (vldrq_gather_base_impl):
Add support for MODE_wb.
* config/arm/arm-mve-builtins-shapes.cc (struct
load_gather_base_def): Likewise.
* config/arm/arm_mve.h (vldrdq_gather_base_wb_s64): Delete.
(vldrdq_gather_base_wb_u64): Delete.
(vldrdq_gather_base_wb_z_s64): Delete.
(vldrdq_gather_base_wb_z_u64): Delete.
(vldrwq_gather_base_wb_f32): Delete.
(vldrwq_gather_base_wb_s32): Delete.
(vldrwq_gather_base_wb_u32): Delete.
(vldrwq_gather_base_wb_z_f32): Delete.
(vldrwq_gather_base_wb_z_s32): Delete.
(vldrwq_gather_base_wb_z_u32): Delete.
(__arm_vldrdq_gather_base_wb_s64): Delete.
(__arm_vldrdq_gather_base_wb_u64): Delete.
(__arm_vldrdq_gather_base_wb_z_s64): Delete.
(__arm_vldrdq_gather_base_wb_z_u64): Delete.
(__arm_vldrwq_gather_base_wb_s32): Delete.
(__arm_vldrwq_gather_base_wb_u32): Delete.
(__arm_vldrwq_gather_base_wb_z_s32): Delete.
(__arm_vldrwq_gather_base_wb_z_u32): Delete.
(__arm_vldrwq_gather_base_wb_f32): Delete.
(__arm_vldrwq_gather_base_wb_z_f32): Delete.
* config/arm/arm_mve_builtins.def (vldrwq_gather_base_nowb_z_u)
(vldrdq_gather_base_nowb_z_u, vldrwq_gather_base_nowb_u)
(vldrdq_gather_base_nowb_u, vldrwq_gather_base_nowb_z_s)
(vldrwq_gather_base_nowb_z_f, vldrdq_gather_base_nowb_z_s)
(vldrwq_gather_base_nowb_s, vldrwq_gather_base_nowb_f)
(vldrdq_gather_base_nowb_s, vldrdq_gather_base_wb_z_s)
(vldrdq_gather_base_wb_z_u, vldrdq_gather_base_wb_s)
(vldrdq_gather_base_wb_u, vldrwq_gather_base_wb_z_s)
(vldrwq_gather_base_wb_z_f, vldrwq_gather_base_wb_z_u)
(vldrwq_gather_base_wb_s, vldrwq_gather_base_wb_f)
(vldrwq_gather_base_wb_u): Delete
* config/arm/iterators.md (supf): Remove VLDRWQGBWB_S,
VLDRWQGBWB_U, VLDRDQGBWB_S, VLDRDQGBWB_U.
(VLDRWGBWBQ, VLDRDGBWBQ): Delete.
* config/arm/mve.md (mve_vldrwq_gather_base_wb_v4si): Delete.
(mve_vldrwq_gather_base_nowb_v4si): Delete.
(mve_vldrwq_gather_base_wb_v4si_insn): Delete.
(mve_vldrwq_gather_base_wb_z_v4si): Delete.
(mve_vldrwq_gather_base_nowb_z_v4si): Delete.
(mve_vldrwq_gather_base_wb_z_v4si_insn): Delete.
(mve_vldrwq_gather_base_wb_fv4sf): Delete.
(mve_vldrwq_gather_base_nowb_fv4sf): Delete.
(mve_vldrwq_gather_base_wb_fv4sf_insn): Delete.
(mve_vldrwq_gather_base_wb_z_fv4sf): Delete.
(mve_vldrwq_gather_base_nowb_z_fv4sf): Delete.
(mve_vldrwq_gather_base_wb_z_fv4sf_insn): Delete.
(mve_vldrdq_gather_base_wb_v2di): Delete.
(mve_vldrdq_gather_base_nowb_v2di): Delete.
(mve_vldrdq_gather_base_wb_v2di_insn): Delete.
(mve_vldrdq_gather_base_wb_z_v2di): Delete.
(mve_vldrdq_gather_base_nowb_z_v2di): Delete.
(mve_vldrdq_gather_base_wb_z_v2di_insn): Delete.
(@mve_vldrq_gather_base_wb_<mode>): New.
(@mve_vldrq_gather_base_wb_z_<mode>): New.
* config/arm/unspecs.md (VLDRWQGBWB_S, VLDRWQGBWB_U, VLDRWQGBWB_F)
(VLDRDQGBWB_S, VLDRDQGBWB_U): Delete
(VLDRGBWBQ, VLDRGBWBQ_Z): New.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c:
Update expected output.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c:
Likewise.
---
 gcc/config/arm/arm-builtins.cc|  33 --
 gcc/config/arm/arm-mve-builtins-base.cc   |  39 +-
 gcc/config/arm/arm-mve-builtins-shapes.cc |   4 +-
 gcc/config/arm/arm_mve.h  | 110 --
 gcc/config/arm/arm_mve_builtins.def   |  20 -
 gcc/config/arm/iterators.md   |   5 +-
 gcc/config/arm/mve.md | 352 ++
 gcc/config/arm/unspecs.md |   7 +-
 .../intrinsics/vldrdq_gather_base_wb_s64.c|   4 +-
 .../intrinsics/vldrdq_gather_base_wb_u64.c|   4 +-
 10 files changed, 78 insertions(+), 500 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 60ee12839fb..01bdbbf943d 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -610,39 +610,6 @@ arm_quadop_unone_unone_unone_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_unone_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
-#de

[PATCH 06/15] arm: [MVE intrinsics] Add store_scatter_base shape

2024-11-07 Thread Christophe Lyon
This patch adds the store_scatter_base shape description.
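
In practice the shape's check translates into constraints like these
for the 32-bit element case, where the immediate must be a multiple
of 4 within +/-508 (a sketch):

#include <arm_mve.h>

void
immediates (uint32x4_t addr, int32x4_t value)
{
  vstrwq_scatter_base_s32 (addr, 4, value);     /* OK: multiple of 4.  */
  vstrwq_scatter_base_s32 (addr, -508, value);  /* OK: lower bound.  */
  /* vstrwq_scatter_base_s32 (addr, 2, value);     would be rejected:
     not a multiple of the element size, per check () below.  */
}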

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (store_scatter_base): New.
* config/arm/arm-mve-builtins-shapes.h (store_scatter_base): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 49 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 50 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 9350805c2a2..64d4ba5d74e 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1677,6 +1677,55 @@ struct store_scatter_offset_def : public store_scatter
 };
 SHAPE (store_scatter_offset)
 
+/* void vfoo[_t0](_t, const int, _t)
+
+   where  is tied to .
+has the same width as  but is of unsigned type.
+
+   Example: vstrwq_scatter_base
+   void [__arm_]vstrwq_scatter_base[_s32](uint32x4_t addr, const int offset, 
int32x4_t value)
+   void [__arm_]vstrwq_scatter_base_p[_s32](uint32x4_t addr, const int offset, 
int32x4_t value, mve_pred16_t p)  */
+struct store_scatter_base_def : public store_scatter
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "_,vu0,ss64,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (3, i, nargs)
+   || !r.require_integer_immediate (1)
+   || (type = r.infer_vector_type (2)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+type_suffix_index base_type
+  = find_type_suffix (TYPE_unsigned, type_suffixes[type].element_bits);
+
+/* Base (arg 0) should be a vector of unsigned with same width as value
+   (arg 2).  */
+if (!r.require_matching_vector_type (0, base_type))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+int multiple = c.type_suffix (0).element_bits / 8;
+int bound = 127 * multiple;
+return c.require_immediate_range_multiple (1, -bound, bound, multiple);
+  }
+};
+SHAPE (store_scatter_base)
+
 /* _t vfoo[_t0](_t, _t, _t)
 
i.e. the standard shape for ternary operations that operate on
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index be0e09755cc..1d361addd76 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -65,6 +65,7 @@ namespace arm_mve
 extern const function_shape *const load_ext;
 extern const function_shape *const mvn;
 extern const function_shape *const store;
+extern const function_shape *const store_scatter_base;
 extern const function_shape *const store_scatter_offset;
 extern const function_shape *const ternary;
 extern const function_shape *const ternary_lshift;
-- 
2.34.1



Re: [r15-4988 Regression] FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 16, 0\\);" 1 on Linux/x86_64

2024-11-07 Thread Jakub Jelinek
On Thu, Nov 07, 2024 at 11:31:17AM +, Andrew Stubbs wrote:
> Anyway, I think the attached patch should fix it. It passes on my
> configuration, but I don't have a Cascade Lake.

You could have tested with whatever you have (if it has AVX) as -march=

> OK?

Yes, thanks.

Jakub



Re: [patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines

2024-11-07 Thread Tobias Burnus
I intended – but forgot – to actually attach the committed patch. Here 
it is …


Tobias Burnus wrote:

As there were no further remarks, I have now committed it as
r15-5017-ge52cfd4bc23de1 with minor changes:

* Referring to v6.0 not TR13 (same section numbers),
* fixed one item in the 5.2 to-do list:
   'declare mapper with iterator and present modifiers' comes from Appendix B
   and we had before additionally
   'iterator' and 'mapper' as map-type modifier in 'declare mapper', which
   duplicated 'iterator'.  (The item remains, but now only covering 'mapper'
   as map-type modifier.)

Comments and follow-up suggestions are still welcome.

See (in about 9 hours for the new version) at:
-https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-5_002e2.html
   for the 'declare_mapper' implementation status
-https://gcc.gnu.org/onlinedocs/libgomp/Runtime-Library-Routines.html
   for the interop routines

Tobias

PS: Previous patch was posted on Wed Aug 28, 2024 to
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661711.html

commit e52cfd4bc23de14f1e1795bdf7ec161d94b8c087
Author: Tobias Burnus 
Date:   Thu Nov 7 16:13:06 2024 +0100

libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

* libgomp.texi (OpenMP Technical Report 13): Remove 'iterator'
in 'map' clause of 'declare mapper' as it is already the list above.
(Interoperability Routines): Add.
(omp_target_memcpy_async, omp_target_memcpy_rect_async):
Document that depobj_list may be omitted in C++ and Fortran.
---
 libgomp/libgomp.texi | 333 +++
 1 file changed, 312 insertions(+), 21 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 6860963f368..6679f6da4b9 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -443,8 +443,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
   of the @code{interop} construct @tab N @tab
 @item Invoke virtual member functions of C++ objects created on the host device
   on other devices @tab N @tab
-@item @code{iterator} and @code{mapper} as map-type modifier in @code{declare mapper}
-  @tab N @tab
+@item @code{mapper} as map-type modifier in @code{declare mapper} @tab N @tab
 @end multitable
 
 
@@ -668,7 +667,7 @@ specification in version 5.2.
 * Lock Routines::
 * Timing Routines::
 * Event Routine::
-@c * Interoperability Routines::
+* Interoperability Routines::
 * Memory Management Routines::
 @c * Tool Control Routine::
 * Environment Display Routine::
@@ -2211,8 +2210,9 @@ to the destination device's @var{dst} address shifted by @var{dst_offset}.
 Task dependence is expressed by passing an array of depend objects to
 @var{depobj_list}, where the number of array elements is passed as
 @var{depobj_count}; if the count is zero, the @var{depobj_list} argument is
-ignored.  The routine returns zero if the copying process has successfully
-been started and non-zero otherwise.
+ignored.  In C++ and Fortran, the @var{depobj_list} argument can also be
+omitted in that case.   The routine returns zero if the copying process has
+successfully been started and non-zero otherwise.
 
 Running this routine in a @code{target} region except on the initial device
 is not supported.
@@ -2332,7 +2332,8 @@ respectively.  The offset per dimension to the first element to be copied is
 given by the @var{dst_offset} and @var{src_offset} arguments.  Task dependence
 is expressed by passing an array of depend objects to @var{depobj_list}, where
 the number of array elements is passed as @var{depobj_count}; if the count is
-zero, the @var{depobj_list} argument is ignored.  The routine
+zero, the @var{depobj_list} argument is ignored.  In C++ and Fortran, the
+@var{depobj_list} argument can also be omitted in that case.  The routine
 returns zero on success and non-zero otherwise.
 
 The OpenMP specification only requires that @var{num_dims} up to three is
@@ -2961,21 +2962,311 @@ event handle that has already been fulfilled is also undefined.
 
 
 
-@c @node Interoperability Routines
-@c @section Interoperability Routines
-@c
-@c Routines to obtain properties from an @code{omp_interop_t} object.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_get_num_interop_properties:: 
-@c * omp_get_interop_int:: 
-@c * omp_get_interop_ptr:: 
-@c * omp_get_interop_str:: 
-@c * omp_get_interop_name:: 
-@c * omp_get_interop_type_desc:: 
-@c * omp_get_interop_rc_desc:: 
-@c @end menu
+@node Interoperability Routines
+@section Interoperability Routines
+
+Routines to obtain properties from an object of OpenMP interop type.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_get_num_interop_properties:: Get the number of implementation-specific properties
+* omp_get_interop_int:: Obtain integer-valued interoperability property
+* omp_get_interop_ptr:: Obtain pointer-valued interoperability property
+* omp_get

[PATCH] rs6000: Add PowerPC inline asm redzone clobber support

2024-11-07 Thread Jakub Jelinek
Hi!

The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667949.html
patch adds rs6000 part of the support (the only other target I'm aware of
which clearly has red zone as well).
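
The user-visible effect is that inline asm can now declare that it
clobbers the red zone, for instance (a sketch; this relies on the
generic "redzone" clobber support from the patch referenced above):

long
touch_redzone (long x)
{
  /* Scribbles below the stack pointer, so the compiler must not keep
     live data in the red zone across this asm.  */
  asm volatile ("std %0,-8(1)" : : "r" (x) : "redzone");
  return x;
}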

2024-11-07  Jakub Jelinek  

* config/rs6000/rs6000.h (struct machine_function): Add
asm_redzone_clobber_seen member.
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Force
info->push_p if cfun->machine->asm_redzone_clobber_seen.
* config/rs6000/rs6000.cc (TARGET_REDZONE_CLOBBER): Redefine.
(rs6000_redzone_clobber): New function.

* gcc.target/powerpc/asm-redzone-1.c: New test.

--- gcc/config/rs6000/rs6000.h.jj   2024-08-30 09:09:45.407624634 +0200
+++ gcc/config/rs6000/rs6000.h  2024-11-07 12:25:44.979466003 +0100
@@ -2424,6 +2424,7 @@ typedef struct GTY(()) machine_function
  global entry.  It helps to control the patchable area before and after
  local entry.  */
   bool global_entry_emitted;
+  bool asm_redzone_clobber_seen;
 } machine_function;
 #endif
 
--- gcc/config/rs6000/rs6000-logue.cc.jj	2024-10-25 10:00:29.389768987 +0200
+++ gcc/config/rs6000/rs6000-logue.cc   2024-11-07 12:36:05.985688899 +0100
@@ -918,7 +918,7 @@ rs6000_stack_info (void)
   else if (DEFAULT_ABI == ABI_V4)
 info->push_p = non_fixed_size != 0;
 
-  else if (frame_pointer_needed)
+  else if (frame_pointer_needed || cfun->machine->asm_redzone_clobber_seen)
 info->push_p = 1;
 
   else
--- gcc/config/rs6000/rs6000.cc.jj  2024-10-25 10:00:29.393768930 +0200
+++ gcc/config/rs6000/rs6000.cc 2024-11-07 12:34:21.679163134 +0100
@@ -1752,6 +1752,9 @@ static const scoped_attribute_specs *con
 #undef TARGET_CAN_CHANGE_MODE_CLASS
 #define TARGET_CAN_CHANGE_MODE_CLASS rs6000_can_change_mode_class
 
+#undef TARGET_REDZONE_CLOBBER
+#define TARGET_REDZONE_CLOBBER rs6000_redzone_clobber
+
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT rs6000_constant_alignment
 
@@ -13727,6 +13730,24 @@ rs6000_can_change_mode_class (machine_mo
   return true;
 }
 
+/* Implement TARGET_REDZONE_CLOBBER.  */
+
+static rtx
+rs6000_redzone_clobber ()
+{
+  cfun->machine->asm_redzone_clobber_seen = true;
+  if (DEFAULT_ABI != ABI_V4)
+{
+  int red_zone_size = TARGET_32BIT ? 220 : 288;
+  rtx base = plus_constant (Pmode, stack_pointer_rtx,
+   GEN_INT (-red_zone_size));
+  rtx mem = gen_rtx_MEM (BLKmode, base);
+  set_mem_size (mem, red_zone_size);
+  return mem;
+}
+  return NULL_RTX;
+}
+
 /* Debug version of rs6000_can_change_mode_class.  */
 static bool
 rs6000_debug_can_change_mode_class (machine_mode from,
--- gcc/testsuite/gcc.target/powerpc/asm-redzone-1.c.jj	2024-11-07 13:01:36.935064863 +0100
+++ gcc/testsuite/gcc.target/powerpc/asm-redzone-1.c	2024-11-07 13:01:31.449142367 +0100
@@ -0,0 +1,71 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-options "-O2" } */
+
+__attribute__((noipa)) int
+foo (void)
+{
+  int a = 1;
+  int b = 2;
+  int c = 3;
+  int d = 4;
+  int e = 5;
+  int f = 6;
+  int g = 7;
+  int h = 8;
+  int i = 9;
+  int j = 10;
+  int k = 11;
+  int l = 12;
+  int m = 13;
+  int n = 14;
+  int o = 15;
+  int p = 16;
+  int q = 17;
+  int r = 18;
+  int s = 19;
+  int t = 20;
+  int u = 21;
+  int v = 22;
+  int w = 23;
+  int x = 24;
+  int y = 25;
+  int z = 26;
+  asm volatile ("" : "+g" (a), "+g" (b), "+g" (c), "+g" (d), "+g" (e));
+  asm volatile ("" : "+g" (f), "+g" (g), "+g" (h), "+g" (i), "+g" (j));
+  asm volatile ("" : "+g" (k), "+g" (l), "+g" (m), "+g" (n), "+g" (o));
+  asm volatile ("" : "+g" (k), "+g" (l), "+g" (m), "+g" (n), "+g" (o));
+  asm volatile ("" : "+g" (p), "+g" (q), "+g" (s), "+g" (t), "+g" (u));
+  asm volatile ("" : "+g" (v), "+g" (w), "+g" (y), "+g" (z));
+#ifdef __PPC64__
+  asm volatile ("std 1,-8(1); std 1,-16(1); std 1,-24(1); std 1,-32(1)"
+   : : : "18", "19", "20", "redzone");
+#elif defined(_AIX)
+  asm volatile ("stw 1,-4(1); stw 1,-8(1); stw 1,-12(1); stw 1,-16(1)"
+   : : : "18", "19", "20", "redzone");
+#endif
+  asm volatile ("" : "+g" (a), "+g" (b), "+g" (c), "+g" (d), "+g" (e));
+  asm volatile ("" : "+g" (f), "+g" (g), "+g" (h), "+g" (i), "+g" (j));
+  asm volatile ("" : "+g" (k), "+g" (l), "+g" (m), "+g" (n), "+g" (o));
+  asm volatile ("" : "+g" (p), "+g" (q), "+g" (s), "+g" (t), "+g" (u));
+  asm volatile ("" : "+g" (v), "+g" (w), "+g" (y), "+g" (z));
+  return a + b + c + d + e + f + g + h + i + j + k + l + m + n;
+}
+
+__attribute__((noipa)) void
+bar (char *p, long *q)
+{
+  (void) p;
+  *q = 42;
+}
+
+int
+main ()
+{
+  volatile int x = 256;
+  long y;
+  bar (__builtin_alloca (x), &y);
+  if (foo () != 105)
+__builtin_abort ();
+  if (y != 42)
+__builtin_abort ();
+}

Jakub



Re: [PATCH] Optimize incoming integer argument promotion

2024-11-07 Thread Richard Biener
On Thu, Nov 7, 2024 at 5:50 AM H.J. Lu  wrote:
>
> On Wed, Nov 6, 2024 at 6:01 PM Richard Biener
>  wrote:
> >
> > On Wed, Nov 6, 2024 at 10:52 AM H.J. Lu  wrote:
> > >
> > > On Wed, Nov 6, 2024 at 4:29 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu  wrote:
> > > > >
> > > > > On Tue, Nov 5, 2024 at 5:27 PM Richard Biener
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Nov 5, 2024 at 5:23 AM Jeff Law  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 11/4/24 8:13 PM, H.J. Lu wrote:
> > > > > > > > > On Tue, Nov 5, 2024 at 10:57 AM Jeff Law 
> > > > > > > > >  wrote:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On 11/4/24 7:52 PM, H.J. Lu wrote:
> > > > > > > > >>> On Tue, Nov 5, 2024 at 8:48 AM Jeff Law 
> > > > > > > > >>>  wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > >  On 11/4/24 5:42 PM, H.J. Lu wrote:
> > > > > > > > > On Tue, Nov 5, 2024 at 8:07 AM Jeff Law 
> > > > > > > > >  wrote:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On 11/1/24 4:32 PM, H.J. Lu wrote:
> > > > > > > > >>> For targets, like x86, which define 
> > > > > > > > >>> TARGET_PROMOTE_PROTOTYPES to return
> > > > > > > > >>> true, all integer arguments smaller than int are passed 
> > > > > > > > >>> as int:
> > > > > > > > >>>
> > > > > > > > >>> [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > > > > > >>> extern int baz (char c1);
> > > > > > > > >>>
> > > > > > > > >>> int
> > > > > > > > >>> foo (char c1)
> > > > > > > > >>> {
> > > > > > > > >>>   return baz (c1);
> > > > > > > > >>> }
> > > > > > > > >>> [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > > > > > >>> [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > > > > > >>>  .file   "x.c"
> > > > > > > > >>>  .text
> > > > > > > > >>>  .p2align 4
> > > > > > > > >>>  .globl  foo
> > > > > > > > >>>  .type   foo, @function
> > > > > > > > >>> foo:
> > > > > > > > >>> .LFB0:
> > > > > > > > >>>  .cfi_startproc
> > > > > > > > >>>  movsbl  4(%esp), %eax
> > > > > > > > >>>  movl%eax, 4(%esp)
> > > > > > > > >>>  jmp baz
> > > > > > > > >>>  .cfi_endproc
> > > > > > > > >>> .LFE0:
> > > > > > > > >>>  .size   foo, .-foo
> > > > > > > > >>>  .ident  "GCC: (GNU) 14.2.1 20240912 (Red Hat 
> > > > > > > > >>> 14.2.1-3)"
> > > > > > > > >>>  .section.note.GNU-stack,"",@progbits
> > > > > > > > >>> [hjl@gnu-tgl-3 pr14907]$
> > > > > > > > >>>
> > > > > > > > >>> But integer promotion:
> > > > > > > > >>>
> > > > > > > > >>>  movsbl  4(%esp), %eax
> > > > > > > > >>>  movl%eax, 4(%esp)
> > > > > > > > >>>
> > > > > > > > >>> isn't necessary if incoming arguments and outgoing 
> > > > > > > > >>> arguments are the
> > > > > > > > >>> same.  Use unpromoted incoming integer arguments as 
> > > > > > > > >>> outgoing arguments
> > > > > > > > >>> if incoming integer arguments are the same as outgoing 
> > > > > > > > >>> arguments to
> > > > > > > > >>> avoid unnecessary integer promotion.
> > > > > > > > >> Is there a particular reason x86 can't use the same 
> > > > > > > > >> mechanisms that
> > > > > > > > >
> > > > > > > > > Other targets define TARGET_PROMOTE_PROTOTYPES to return 
> > > > > > > > > false
> > > > > > > > > to avoid this issue.   Changing x86 
> > > > > > > > > TARGET_PROMOTE_PROTOTYPES
> > > > > > > > > to return false will break LLVM which assumes that 
> > > > > > > > > incoming char/short
> > > > > > > > > arguments on x86 are always extended to int.   The 
> > > > > > > > > following targets
> > > > > > > >  Then my suggestion would be to cover this in REE somehow.  
> > > > > > > >  We started
> > > > > > > >  looking at that a couple years ago and set it aside.   But 
> > > > > > > >  the basic
> > > > > > > >  idea was to expose the ABI guarantees to REE, then let REE 
> > > > > > > >  do its thing.
> > > > > > > > 
> > > > > > > >  Jeff
> > > > > > > > 
> > > > > > > > >>>
> > > > > > > > >>> For
> > > > > > > > >>>
> > > > > > > > >>> extern int baz (char c1);
> > > > > > > > >>>
> > > > > > > > >>> int
> > > > > > > > >>> foo (char c1)
> > > > > > > > >>> {
> > > > > > > > >>> return baz (c1);
> > > > > > > > >>> }
> > > > > > > > >>>
> > > > > > > > >>> on i386, we get these
> > > > > > > > >>>
> > > > > > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102])
> > > > > > > > >>>   (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp)
> > > > > > > > >>> 

[committed] libstdc++: Fix typo in comment in hashtable.h

2024-11-07 Thread Jonathan Wakely
And tweak grammar in a couple of comments.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Fix spelling in comment.
---
Pushed as obvious.
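
For readers without the surrounding code: a minimal functor satisfying
these static_asserts might look like the sketch below (the name and the
modulo strategy are ours, not libstdc++'s internal type).

#include <cstddef>

// Maps a hash code to a bucket index; nothrow default constructible
// and noexcept-invocable, as the static_asserts require.
struct mod_range_hash
{
  std::size_t
  operator()(std::size_t code, std::size_t n) const noexcept
  { return code % n; }
};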

 libstdc++-v3/include/bits/hashtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 47321a9cb13..8b312d25d7a 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -344,7 +344,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   struct __hash_code_base_access : __hash_code_base
   { using __hash_code_base::_M_bucket_index; };
 
-  // To get bucket index we need _RangeHash not to throw.
+  // To get bucket index we need _RangeHash to be non-throwing.
   static_assert(is_nothrow_default_constructible<_RangeHash>::value,
"Functor used to map hash code to bucket index"
" must be nothrow default constructible");
@@ -353,7 +353,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"Functor used to map hash code to bucket index must be"
" noexcept");
 
-  // To compute bucket index we also need _ExtratKey not to throw.
+  // To compute bucket index we also need _ExtractKey to be non-throwing.
   static_assert(is_nothrow_default_constructible<_ExtractKey>::value,
"_ExtractKey must be nothrow default constructible");
   static_assert(noexcept(
-- 
2.47.0



[PATCH 11/15] arm: [MVE intrinsics] rework vldr gather_shifted_offset

2024-11-07 Thread Christophe Lyon
Implement vldr?q_gather_shifted_offset using the new MVE builtins
framework.
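
For illustration, user-level code for one of the intrinsics handled here
looks as follows (a sketch of existing ACLE usage, not part of the
patch; assumes MVE is enabled, e.g. -march=armv8.1-m.main+mve):

#include <arm_mve.h>

int16x8_t
gather_halfwords (const int16_t *base, uint16x8_t offset)
{
  /* Each lane loads base[offset[i]]: the offsets are shifted left by 1
     (log2 of the element size) before being added to the base.  */
  return vldrhq_gather_shifted_offset_s16 (base, offset);
}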

gcc/ChangeLog:

* config/arm/arm-builtins.cc (arm_ldrgu_qualifiers)
(arm_ldrgs_qualifiers, arm_ldrgs_z_qualifiers)
(arm_ldrgu_z_qualifiers): Delete.
* config/arm/arm-mve-builtins-base.cc (vldrq_gather_impl): Add
support for shifted version.
(vldrdq_gather_shifted, vldrhq_gather_shifted)
(vldrwq_gather_shifted): New.
* config/arm/arm-mve-builtins-base.def (vldrdq_gather_shifted)
(vldrhq_gather_shifted, vldrwq_gather_shifted): New.
* config/arm/arm-mve-builtins-base.h (vldrdq_gather_shifted)
(vldrhq_gather_shifted, vldrwq_gather_shifted): New.
* config/arm/arm_mve.h (vldrhq_gather_shifted_offset): Delete.
(vldrhq_gather_shifted_offset_z): Delete.
(vldrdq_gather_shifted_offset): Delete.
(vldrdq_gather_shifted_offset_z): Delete.
(vldrwq_gather_shifted_offset): Delete.
(vldrwq_gather_shifted_offset_z): Delete.
(vldrhq_gather_shifted_offset_s32): Delete.
(vldrhq_gather_shifted_offset_s16): Delete.
(vldrhq_gather_shifted_offset_u32): Delete.
(vldrhq_gather_shifted_offset_u16): Delete.
(vldrhq_gather_shifted_offset_z_s32): Delete.
(vldrhq_gather_shifted_offset_z_s16): Delete.
(vldrhq_gather_shifted_offset_z_u32): Delete.
(vldrhq_gather_shifted_offset_z_u16): Delete.
(vldrdq_gather_shifted_offset_s64): Delete.
(vldrdq_gather_shifted_offset_u64): Delete.
(vldrdq_gather_shifted_offset_z_s64): Delete.
(vldrdq_gather_shifted_offset_z_u64): Delete.
(vldrhq_gather_shifted_offset_f16): Delete.
(vldrhq_gather_shifted_offset_z_f16): Delete.
(vldrwq_gather_shifted_offset_f32): Delete.
(vldrwq_gather_shifted_offset_s32): Delete.
(vldrwq_gather_shifted_offset_u32): Delete.
(vldrwq_gather_shifted_offset_z_f32): Delete.
(vldrwq_gather_shifted_offset_z_s32): Delete.
(vldrwq_gather_shifted_offset_z_u32): Delete.
(__arm_vldrhq_gather_shifted_offset_s32): Delete.
(__arm_vldrhq_gather_shifted_offset_s16): Delete.
(__arm_vldrhq_gather_shifted_offset_u32): Delete.
(__arm_vldrhq_gather_shifted_offset_u16): Delete.
(__arm_vldrhq_gather_shifted_offset_z_s32): Delete.
(__arm_vldrhq_gather_shifted_offset_z_s16): Delete.
(__arm_vldrhq_gather_shifted_offset_z_u32): Delete.
(__arm_vldrhq_gather_shifted_offset_z_u16): Delete.
(__arm_vldrdq_gather_shifted_offset_s64): Delete.
(__arm_vldrdq_gather_shifted_offset_u64): Delete.
(__arm_vldrdq_gather_shifted_offset_z_s64): Delete.
(__arm_vldrdq_gather_shifted_offset_z_u64): Delete.
(__arm_vldrwq_gather_shifted_offset_s32): Delete.
(__arm_vldrwq_gather_shifted_offset_u32): Delete.
(__arm_vldrwq_gather_shifted_offset_z_s32): Delete.
(__arm_vldrwq_gather_shifted_offset_z_u32): Delete.
(__arm_vldrhq_gather_shifted_offset_f16): Delete.
(__arm_vldrhq_gather_shifted_offset_z_f16): Delete.
(__arm_vldrwq_gather_shifted_offset_f32): Delete.
(__arm_vldrwq_gather_shifted_offset_z_f32): Delete.
(__arm_vldrhq_gather_shifted_offset): Delete.
(__arm_vldrhq_gather_shifted_offset_z): Delete.
(__arm_vldrdq_gather_shifted_offset): Delete.
(__arm_vldrdq_gather_shifted_offset_z): Delete.
(__arm_vldrwq_gather_shifted_offset): Delete.
(__arm_vldrwq_gather_shifted_offset_z): Delete.
* config/arm/arm_mve_builtins.def
(vldrhq_gather_shifted_offset_z_u, vldrhq_gather_shifted_offset_u)
(vldrhq_gather_shifted_offset_z_s, vldrhq_gather_shifted_offset_s)
(vldrdq_gather_shifted_offset_s, vldrhq_gather_shifted_offset_f)
(vldrwq_gather_shifted_offset_f, vldrwq_gather_shifted_offset_s)
(vldrdq_gather_shifted_offset_z_s)
(vldrhq_gather_shifted_offset_z_f)
(vldrwq_gather_shifted_offset_z_f)
(vldrwq_gather_shifted_offset_z_s, vldrdq_gather_shifted_offset_u)
(vldrwq_gather_shifted_offset_u, vldrdq_gather_shifted_offset_z_u)
(vldrwq_gather_shifted_offset_z_u): Delete.
* config/arm/iterators.md (supf): Remove VLDRHQGSO_S, VLDRHQGSO_U,
VLDRDQGSO_S, VLDRDQGSO_U, VLDRWQGSO_S, VLDRWQGSO_U.
(VLDRHGSOQ, VLDRDGSOQ, VLDRWGSOQ): Delete.
* config/arm/mve.md
(mve_vldrhq_gather_shifted_offset_<supf><mode>): Delete.
(mve_vldrhq_gather_shifted_offset_z_<supf><mode>): Delete.
(mve_vldrdq_gather_shifted_offset_v2di): Delete.
(mve_vldrdq_gather_shifted_offset_z_v2di): Delete.
(mve_vldrhq_gather_shifted_offset_fv8hf): Delete.
(mve_vldrhq_gather_shifted_offset_z_fv8hf): Delete.
(mve_vldrwq_gather_shifted_offset_fv4sf): Delete.
(mve_vldrwq_gather_shifted_offset_v4si): Delete.
(mve_vldrwq_gather_shifted_offset_z_fv4sf): Delete.

[PATCH 01/15] arm: [MVE intrinsics] add mode_after_pred helper in function_shape

2024-11-07 Thread Christophe Lyon
This new helper returns true if the mode suffix goes after the
predicate suffix.  This is true in most cases, so the base
implementations in nonoverloaded_base and overloaded_base return true.
For instance: vaddq_m_n_s32.

This will be useful in later patches to implement
vstr?q_scatter_offset_p (_p appears after _offset).
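
Concretely, the two name layouts are (vstrbq_scatter_offset_p_s16 used
here as the assumed example of the new ordering):

  mode_after_pred () == true :  base + pred + mode + type -> vaddq_m_n_s32
  mode_after_pred () == false:  base + mode + pred + type -> vstrbq_scatter_offset_p_s16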

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (struct
nonoverloaded_base): Implement mode_after_pred.
(struct overloaded_base): Likewise.
* config/arm/arm-mve-builtins.cc (function_builder::get_name):
Call mode_after_pred as needed.
* config/arm/arm-mve-builtins.h (function_shape): Add
mode_after_pred.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 12 
 gcc/config/arm/arm-mve-builtins.cc|  9 -
 gcc/config/arm/arm-mve-builtins.h |  1 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 12e62122ae4..b5bd03a465b 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -387,6 +387,12 @@ struct nonoverloaded_base : public function_shape
 return false;
   }
 
+  bool
+  mode_after_pred () const override
+  {
+return true;
+  }
+
   tree
   resolve (function_resolver &) const override
   {
@@ -417,6 +423,12 @@ struct overloaded_base : public function_shape
   {
 return false;
   }
+
+  bool
+  mode_after_pred () const override
+  {
+return true;
+  }
 };
 
 /* <T0>_t vfoo[_t0](<T0>_t, <T0>_t)
diff --git a/gcc/config/arm/arm-mve-builtins.cc 
b/gcc/config/arm/arm-mve-builtins.cc
index af1908691b6..4af32b5faa2 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -880,11 +880,18 @@ function_builder::get_name (const function_instance &instance,
   if (preserve_user_namespace)
 append_name ("__arm_");
   append_name (instance.base_name);
-  append_name (pred_suffixes[instance.pred]);
+
+  if (instance.shape->mode_after_pred ())
+append_name (pred_suffixes[instance.pred]);
+
   if (!overloaded_p
   || instance.shape->explicit_mode_suffix_p (instance.pred,
 instance.mode_suffix_id))
 append_name (instance.mode_suffix ().string);
+
+  if (!instance.shape->mode_after_pred ())
+append_name (pred_suffixes[instance.pred]);
+
   for (unsigned int i = 0; i < 2; ++i)
 if (!overloaded_p
|| instance.shape->explicit_type_suffix_p (i, instance.pred,
diff --git a/gcc/config/arm/arm-mve-builtins.h 
b/gcc/config/arm/arm-mve-builtins.h
index 2e48d91d5aa..3e0796f7c09 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -580,6 +580,7 @@ public:
   enum mode_suffix_index) const = 0;
   virtual bool skip_overload_p (enum predication_index,
enum mode_suffix_index) const = 0;
+  virtual bool mode_after_pred () const = 0;
 
   /* Define all functions associated with the given group.  */
   virtual void build (function_builder &,
-- 
2.34.1



[PATCH] rtl-optimization/117467 - 33% compile-time in rest of compilation

2024-11-07 Thread Richard Biener
ext-dce uses TV_NONE, which is not OK for a pass taking 33% of the
compile time.  The following adds a timevar to it for proper blaming.

Bootstrap running on x86_64-unknown-linux-gnu.

PR rtl-optimization/117467
* timevar.def (TV_EXT_DCE): New.
* ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.
---
 gcc/ext-dce.cc  | 2 +-
 gcc/timevar.def | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index a449b9f6b49..0ece37726c7 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -1103,7 +1103,7 @@ const pass_data pass_data_ext_dce =
   RTL_PASS, /* type */
   "ext_dce", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE, /* tv_id */
+  TV_EXT_DCE, /* tv_id */
   PROP_cfglayout, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 0f9d2c0b032..ae80a311a2d 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -313,6 +313,7 @@ DEFTIMEVAR (TV_INITIALIZE_RTL, "initialize rtl")
 DEFTIMEVAR (TV_GIMPLE_LADDRESS   , "address lowering")
 DEFTIMEVAR (TV_TREE_LOOP_IFCVT   , "tree loop if-conversion")
 DEFTIMEVAR (TV_WARN_ACCESS   , "access analysis")
+DEFTIMEVAR (TV_EXT_DCE   , "ext dce")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL  , "early local passes")
-- 
2.43.0


[PATCH 05/15] arm: [MVE intrinsics] Check immediate is a multiple in a range

2024-11-07 Thread Christophe Lyon
This patch adds support for checking that an immediate is a multiple of
a given value in a given range.

This will be used, for instance, by scatter_base to check that the
offset is in +/-4*[0..127].

Unlike require_immediate_range, require_immediate_range_multiple
accepts signed range bounds to handle the above case.
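
For instance, a scatter_base shape could then check its immediate like
this (a hypothetical sketch, not part of this patch; +/-4*[0..127] is
the set of multiples of 4 in [-508, 508]):

  bool
  check (function_checker &c) const override
  {
    return c.require_immediate_range_multiple (1, -508, 508, 4);
  }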

gcc/ChangeLog:

* config/arm/arm-mve-builtins.cc (report_out_of_range_multiple):
New.
(function_checker::require_signed_immediate): New.
(function_checker::require_immediate_range_multiple): New.
* config/arm/arm-mve-builtins.h
(function_checker::require_immediate_range_multiple): New.
(function_checker::require_signed_immediate): New.
---
 gcc/config/arm/arm-mve-builtins.cc | 60 ++
 gcc/config/arm/arm-mve-builtins.h  |  3 ++
 2 files changed, 63 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins.cc 
b/gcc/config/arm/arm-mve-builtins.cc
index 7b88ca6cce5..3b280228e66 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -633,6 +633,20 @@ report_out_of_range (location_t location, tree fndecl, unsigned int argno,
min, max);
 }
 
+/* Report that LOCATION has a call to FNDECL in which argument ARGNO has the
+   value ACTUAL, whereas the function requires a value multiple of MULT in the
+   range [MIN, MAX].  ARGNO counts from zero.  */
+static void
+report_out_of_range_multiple (location_t location, tree fndecl,
+ unsigned int argno,
+ HOST_WIDE_INT actual, HOST_WIDE_INT min,
+ HOST_WIDE_INT max, HOST_WIDE_INT mult)
+{
+  error_at (location, "passing %wd to argument %d of %qE, which expects"
+   " a value multiple of %wd in the range [%wd, %wd]", actual,
+   argno + 1, fndecl, mult, min, max);
+}
+
 /* Report that LOCATION has a call to FNDECL in which argument ARGNO has
the value ACTUAL, whereas the function requires a valid value of
enum type ENUMTYPE.  ARGNO counts from zero.  */
@@ -1977,6 +1991,26 @@ function_checker::require_immediate (unsigned int argno,
   return true;
 }
 
+/* Check that argument ARGNO is a signed integer constant expression and store
+   its value in VALUE_OUT if so.  The caller should first check that argument
+   ARGNO exists.  */
+bool
+function_checker::require_signed_immediate (unsigned int argno,
+   HOST_WIDE_INT &value_out)
+{
+  gcc_assert (argno < m_nargs);
+  tree arg = m_args[argno];
+
+  if (!tree_fits_shwi_p (arg))
+{
+  report_non_ice (location, fndecl, argno);
+  return false;
+}
+
+  value_out = tree_to_shwi (arg);
+  return true;
+}
+
 /* Check that argument REL_ARGNO is an integer constant expression that has
a valid value for enumeration type TYPE.  REL_ARGNO counts from the end
of the predication arguments.  */
@@ -2064,6 +2098,32 @@ function_checker::require_immediate_range (unsigned int rel_argno,
   return true;
 }
 
+/* Check that argument REL_ARGNO is a signed integer constant expression in
+   the range [MIN, MAX] and that its value is a multiple of MULT.  */
+bool
+function_checker::require_immediate_range_multiple (unsigned int rel_argno,
+   HOST_WIDE_INT min,
+   HOST_WIDE_INT max,
+   HOST_WIDE_INT mult)
+{
+  unsigned int argno = m_base_arg + rel_argno;
+  if (!argument_exists_p (argno))
+return true;
+
+  HOST_WIDE_INT actual;
+  if (!require_signed_immediate (argno, actual))
+return false;
+
+  if (!IN_RANGE (actual, min, max)
+  || (actual % mult) != 0)
+{
+      report_out_of_range_multiple (location, fndecl, argno, actual, min, max,
+				    mult);
+  return false;
+}
+
+  return true;
+}
+
 /* Perform semantic checks on the call.  Return true if the call is valid,
otherwise report a suitable error.  */
 bool
diff --git a/gcc/config/arm/arm-mve-builtins.h 
b/gcc/config/arm/arm-mve-builtins.h
index 3e0796f7c09..5a191b0cde3 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -436,6 +436,8 @@ public:
   bool require_immediate_one_of (unsigned int, HOST_WIDE_INT, HOST_WIDE_INT,
 HOST_WIDE_INT, HOST_WIDE_INT);
   bool require_immediate_range (unsigned int, HOST_WIDE_INT, HOST_WIDE_INT);
+  bool require_immediate_range_multiple (unsigned int, HOST_WIDE_INT,
+HOST_WIDE_INT, HOST_WIDE_INT);
 
   bool check ();
 
@@ -443,6 +445,7 @@ private:
   bool argument_exists_p (unsigned int);
 
   bool require_immediate (unsigned int, HOST_WIDE_INT &);
+  bool require_signed_immediate (unsigned int, HOST_WIDE_INT &);
 
   /* The type of the resolved function.  */
   tree m_fntype;
-- 
2.34.1



[PATCH 02/15] arm: [MVE intrinsics] add store_scatter_offset shape

2024-11-07 Thread Christophe Lyon
This patch adds the store_scatter_offset shape and uses a new helper
class (store_scatter), which will also be used by later patches.
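
As a user-level illustration (a sketch of existing ACLE usage, not part
of the patch), an intrinsic covered by this shape is the truncating
scatter store used as the example in the comment below:

#include <arm_mve.h>

void
scatter_store_bytes (int8_t *base, uint16x8_t offset, int16x8_t value)
{
  /* The low byte of each int16 lane of VALUE is stored at
     base[offset[i]].  */
  vstrbq_scatter_offset_s16 (base, offset, value);
}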

gcc/ChangeLog:

* config/arm/arm-mve-builtins-shapes.cc (struct store_scatter): New.
(struct store_scatter_offset_def): New.
* config/arm/arm-mve-builtins-shapes.h (store_scatter_offset): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 64 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 65 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index b5bd03a465b..9350805c2a2 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1613,6 +1613,70 @@ struct store_def : public overloaded_base<0>
 };
 SHAPE (store)
 
+/* Base class for store_scatter_offset and store_scatter_shifted_offset, which
+   differ only in the units of the displacement.  Also used by
+   store_scatter_base.  */
+struct store_scatter : public overloaded_base<0>
+{
+  bool
+  explicit_mode_suffix_p (enum predication_index, enum mode_suffix_index) const override
+  {
+return true;
+  }
+
+  bool
+  mode_after_pred () const override
+  {
+return false;
+  }
+};
+
+/* void vfoo[_t0](<X>_t *, <Y>_t, <t0>_t)
+
+   where <X> might be tied to <t0> (for non-truncating stores) or might
+   depend on the function base name (for truncating stores),
+   <Y> has the same width as <t0> but is of unsigned type.
+
+   Example: vstrbq_scatter_offset
+   void [__arm_]vstrbq_scatter_offset[_s16](int8_t *base, uint16x8_t offset, int16x8_t value)
+   void [__arm_]vstrbq_scatter_offset_p[_s16](int8_t *base, uint16x8_t offset, int16x8_t value, mve_pred16_t p)  */
+struct store_scatter_offset_def : public store_scatter
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_offset, preserve_user_namespace);
+build_all (b, "_,as,vu0,v0", group, MODE_offset, preserve_user_namespace);
+  }
+
+  /* Resolve a scatter store that takes a scalar pointer base and a vector
+ displacement.
+
+ The stored data is the final argument, and it determines the
+ type suffix.  */
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (3, i, nargs)
+   || !r.require_pointer_type (0)
+   || (type = r.infer_vector_type (2)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+/* Offset (arg 1) should be a vector of unsigned with same width as value
+   (arg 2).  */
+type_suffix_index offset_type
+  = find_type_suffix (TYPE_unsigned, type_suffixes[type].element_bits);
+if (!r.require_matching_vector_type (1, offset_type))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (store_scatter_offset)
+
 /* <T0>_t vfoo[_t0](<T0>_t, <T0>_t, <T0>_t)
 
i.e. the standard shape for ternary operations that operate on
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index db7c6311728..be0e09755cc 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -65,6 +65,7 @@ namespace arm_mve
 extern const function_shape *const load_ext;
 extern const function_shape *const mvn;
 extern const function_shape *const store;
+extern const function_shape *const store_scatter_offset;
 extern const function_shape *const ternary;
 extern const function_shape *const ternary_lshift;
 extern const function_shape *const ternary_n;
-- 
2.34.1



Re: [PATCHv2 1/3] ada: Factorize bsd signal definitions

2024-11-07 Thread Marc Poulhiès
Samuel Thibault  writes:

> They are all the same on all BSD-like systems (including GNU/Hurd).
>
> gcc/ada/ChangeLog:
>
>   * libgnarl/a-intnam__freebsd.ads: Rename to...
>   * libgnarl/a-intnam__bsd.ads: ... new file.
>   * libgnarl/a-intnam__dragonfly.ads: Remove file.
>   * Makefile.rtl (x86-kfreebsd, x86-gnuhurd, x86_64-kfreebsd,
>   aarch64-freebsd, x86-freebsd, x86_64-freebsd): Use
>   libgnarl/a-intnam__bsd.ads instead of libgnarl/a-intnam__freebsd.ads.
>   * ada/Makefile.rtl (x86_64-dragonfly): Use libgnarl/a-intnam__bsd.ads
>   instead of libgnarl/a-intnam__dragonfly.ads.
>
> Signed-off-by: Samuel Thibault 

OK without the ChangeLog part.

Thanks,
Marc


Re: [PATCH 04/10] gimple: Disallow sizeless types in BIT_FIELD_REFs.

2024-11-07 Thread Tejas Belagod

On 11/7/24 2:36 PM, Richard Biener wrote:

On Thu, Nov 7, 2024 at 8:25 AM Tejas Belagod  wrote:


On 11/6/24 6:02 PM, Richard Biener wrote:

On Wed, Nov 6, 2024 at 12:49 PM Tejas Belagod  wrote:


Ensure sizeless types don't end up trying to be canonicalised to BIT_FIELD_REFs.


You mean variable-sized?  But don't we know, when there's a constant
array index, that the size is at least such that this indexing is OK?
So what's wrong with a fixed-position, fixed-size BIT_FIELD_REF
extraction of a VLA object?

Richard.



Ah! The code and comment/description don't match, sorry. This change
started out as gating out all canonicalizations of VLA vectors when I
had a limited understanding of how this worked, but it was eventually
simplified to gate in only those offsets that were known_le; I missed
fixing the comment/description. So, for example,

int foo (svint32_t v) { return v[3]; }

canonicalises to a BIT_FIELD_REF

but something like:

int foo (svint32_t v) { return v[4]; }


So this is possibly out-of-bounds?


reduces to a VEC_EXTRACT <>


But if out-of-bounds a VEC_EXTRACT isn't any better than a BIT_FIELD_REF, no?


Someone may have code protecting accesses like so:

 /* svcntw () returns num of 32-bit elements in a vec */
 if (svcntw () >= 8)
   return v[4];

So I didn't error or warn (-Warray-bounds) for this, or for that matter
make it UB, as that would be spurious. So technically, it may not be an
OOB access.


Therefore BIT_FIELD_REFs are generated for anything within the range of
an Adv. SIMD register, and anything beyond is left to be extracted with
SVE instructions via VEC_EXTRACT.
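
Roughly, the intended lowering is (a GIMPLE-like sketch; SSA names and
exact operands are illustrative, assuming 128-bit Adv. SIMD registers):

int foo3 (svint32_t v) { return v[3]; }  /* _1 = BIT_FIELD_REF <v_2(D), 32, 96>;  */
int foo4 (svint32_t v) { return v[4]; }  /* _1 = .VEC_EXTRACT (v_2(D), 4);  */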


Thanks,
Tejas.





I'll fix the comment/description.

Thanks,
Tejas.


gcc/ChangeLog:

  * gimple-fold.cc (maybe_canonicalize_mem_ref_addr): Disallow sizeless
  types in BIT_FIELD_REFs.
---
   gcc/gimple-fold.cc | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c19dac0dbfd..dd45d9f7348 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -6281,6 +6281,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = false)
	  && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (*t, 0), 0))))
   {
 tree vtype = TREE_TYPE (TREE_OPERAND (TREE_OPERAND (*t, 0), 0));
+  /* BIT_FIELD_REF can only happen on constant-size vectors.  */
 if (VECTOR_TYPE_P (vtype))
  {
tree low = array_ref_low_bound (*t);
@@ -6294,7 +6295,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = false)
		     (TYPE_SIZE (TREE_TYPE (*t))));
		 widest_int ext
		   = wi::add (idx, wi::to_widest (TYPE_SIZE (TREE_TYPE (*t))));
-		 if (wi::les_p (ext, wi::to_widest (TYPE_SIZE (vtype))))
+		 if (known_le (ext, wi::to_poly_widest (TYPE_SIZE (vtype))))
  {
*t = build3_loc (EXPR_LOCATION (*t), BIT_FIELD_REF,
 TREE_TYPE (*t),
--
2.25.1







Re: [PATCH] arm: Don't ICE on arm_mve.h pragma without MVE types [PR117408]

2024-11-07 Thread Christophe Lyon
Hi,


On Fri, 1 Nov 2024 at 22:10, Torbjörn SVENSSON
 wrote:
>
> There is one more problem, that this patch does not address, and that is
> that there are warnings like below, but I do not know what's causing them.
>
> .../gcc/testsuite/gcc.target/arm/pr117408-1.c:8:9: warning: 'pure' attribute on function returning 'void' [-Wattributes]
> .../gcc/testsuite/gcc.target/arm/pr117408-2.c:8:9: warning: 'pure' attribute on function returning 'void' [-Wattributes]
>
> Both warnings are repeated several times and generated by the #pragma line.
> Should I dg-prune-output the warning lines or is this ok as-is for trunk and
> releases/gcc-14?
>

It seems these warnings are related to a patch series I pushed recently.
I didn't see them in my builds, but I'll have a look.

Regarding your patch, the precommit CI has reported that it breaks the build:
gcc/config/arm/arm-mve-builtins.cc:540:51: error: expected ';' before 'return'

Maybe the error message should be more helpful, and tell the user what to do?
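
For instance, something along these lines (our sketch only, not a
wording from the thread):

  error ("%<#pragma GCC arm \"arm_mve.h\"%> requires the MVE types; "
	 "use %<#pragma GCC arm \"arm_mve_types.h\"%> first");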

Thanks,

Christophe


> --
>
> Starting with r14-435-g00d97bf3b5a, doing `#pragma GCC arm "arm_mve.h"
> false` or `#pragma GCC arm "arm_mve.h" true` without first doing
> `#pragma GCC arm "arm_mve_types.h"` causes GCC to ICE.
>
> gcc/ChangeLog:
>
>   PR target/117408
>   * config/arm/arm-mve-builtins.cc (handle_arm_mve_h): Detect if MVE
>   types are missing and if so, return an error.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/117408
>   * gcc.target/arm/mve/pr117408-1.c: New test.
>   * gcc.target/arm/mve/pr117408-2.c: Likewise.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/config/arm/arm-mve-builtins.cc| 6 ++
>  gcc/testsuite/gcc.target/arm/mve/pr117408-1.c | 7 +++
>  gcc/testsuite/gcc.target/arm/mve/pr117408-2.c | 7 +++
>  3 files changed, 20 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr117408-2.c
>
> diff --git a/gcc/config/arm/arm-mve-builtins.cc 
> b/gcc/config/arm/arm-mve-builtins.cc
> index af1908691b6..c730fe1f0b9 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -535,6 +535,12 @@ handle_arm_mve_h (bool preserve_user_namespace)
>return;
>  }
>
> +  if (!handle_arm_mve_types_p)
> +{
> +  error ("this definition requires MVE types")
> +  return;
> +}
> +
>/* Define MVE functions.  */
>function_table = new hash_table (1023);
>function_builder builder;
> diff --git a/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c 
> b/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
> new file mode 100644
> index 000..5dddf86efa0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +
> +/* It doesn't really matter if this produces errors about missing types,
> +   but it mustn't trigger an ICE.  */
> +#pragma GCC arm "arm_mve.h" false /* { dg-error "this definition requires MVE types" } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c 
> b/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c
> new file mode 100644
> index 000..6451ee3577e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +
> +/* It doesn't really matter if this produces errors about missing types,
> +   but it mustn't trigger an ICE.  */
> +#pragma GCC arm "arm_mve.h" true /* { dg-error "this definition requires MVE types" } */
> --
> 2.25.1
>


[PATCH] c++: Disallow decomposition of lambda bases [PR90321]

2024-11-07 Thread Nathaniel Shead
Bootstrapped and lightly regtested on x86_64-pc-linux-gnu (so far just
dg.exp), OK for trunk if full regtest succeeds?

-- >8 --

Decomposition of lambda closure types is not allowed by
[dcl.struct.bind] p6, since members of a closure have no name.

r244909 made this an error, but missed the case where a lambda is used
as a base.  This patch moves the check to find_decomp_class_base to
handle this case.

As a drive-by improvement, we also slightly improve the diagnostics to
indicate why a base class was being inspected.  Ideally the diagnostic
would point directly at the relevant base, but there doesn't seem to be
an easy way to get this location just from the binfo so I don't worry
about that here.

PR c++/90321

gcc/cp/ChangeLog:

* decl.cc (find_decomp_class_base): Check for decomposing a
lambda closure type.  Report base class chains if needed.
(cp_finish_decomp): Remove no-longer-needed check.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/decomp62.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/decl.cc| 20 ++--
 gcc/testsuite/g++.dg/cpp1z/decomp62.C | 12 
 2 files changed, 26 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/decomp62.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 0e4533c6fab..87480dca1ac 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -9268,6 +9268,14 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
 static tree
 find_decomp_class_base (location_t loc, tree type, tree ret)
 {
+  if (LAMBDA_TYPE_P (type))
+{
+  error_at (loc, "cannot decompose lambda closure type %qT", type);
+  inform (DECL_SOURCE_LOCATION (TYPE_NAME (type)),
+ "lambda declared here");
+  return error_mark_node;
+}
+
   bool member_seen = false;
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 if (TREE_CODE (field) != FIELD_DECL
@@ -9310,9 +9318,14 @@ find_decomp_class_base (location_t loc, tree type, tree ret)
   for (binfo = TYPE_BINFO (type), i = 0;
BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
 {
+  auto_diagnostic_group d;
   tree t = find_decomp_class_base (loc, TREE_TYPE (base_binfo), ret);
   if (t == error_mark_node)
-   return error_mark_node;
+   {
+ inform (DECL_SOURCE_LOCATION (TYPE_NAME (type)),
+ "in base class of %qT", type);
+ return error_mark_node;
+   }
   if (t != NULL_TREE && t != ret)
{
  if (ret == type)
@@ -9768,11 +9781,6 @@ cp_finish_decomp (tree decl, cp_decomp *decomp, bool test_p)
   error_at (loc, "cannot decompose non-array non-class type %qT", type);
   goto error_out;
 }
-  else if (LAMBDA_TYPE_P (type))
-{
-  error_at (loc, "cannot decompose lambda closure type %qT", type);
-  goto error_out;
-}
   else if (processing_template_decl && complete_type (type) == error_mark_node)
 goto error_out;
   else if (processing_template_decl && !COMPLETE_TYPE_P (type))
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp62.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp62.C
new file mode 100644
index 000..b0ce10570c7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp62.C
@@ -0,0 +1,12 @@
+// PR c++/90321
+// { dg-do compile { target c++17 } }
+
+template<typename F> struct hack : F { };
+template<typename F> hack(F) -> hack<F>;
+
+int main()
+{
+  auto f = [x = 1, y = 2]() { };
+  auto [a, b] = hack { f };  // { dg-error "cannot decompose lambda closure type" }
+  return b;
+}
-- 
2.47.0



RE: [PATCH 5/5] Allow multiple vectorized epilogs via --param vect-epilogues-nomask=N

2024-11-07 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, November 6, 2024 2:32 PM
> To: gcc-patches@gcc.gnu.org
> Cc: RISC-V CI ; Tamar Christina
> ; Richard Sandiford 
> Subject: [PATCH 5/5] Allow multiple vectorized epilogs via --param 
> vect-epilogues-
> nomask=N
> 
> The following is a prototype allowing N possible vector epilogues.
> In the end I'd like the target to tell us a set of (or no) vector modes
> to consider for the epilogue of the main or the current epilog analyzed loop
> in a way similar as to how we communicate back suggested_unroll_factor.
> 
> The main motivation is SPEC CPU 2017 525.x264_r, which when doing
> AVX512 vectorization ends up using the scalar epilogue in
> a hot function because the AVX2 epilogue has too high a VF.  Using
> two vector epilogues mitigates this and also avoids regressing in
> 527.cam4_r, which has a loop iteration count exactly matching the
> AVX2 epilogue (one of the original ideas was to always use an SSE2
> vector epilogue, even with an AVX512 main loop).
> 
> It turns out that two vector epilogues even create smaller code
> in some cases since we tend to fully unroll epilogues with less
> than 16 iterations.  So a simple (int x[])
> 
>   for (int i = 0; i < n; ++i)
> x[i] *= 3;
> 
> has a -O3 -march=znver4 code size
> 
> N vector epilogues   size
> 0                     615
> 1                     429
> 2                     388
> 3                     392
> 
> I'm unsure how important/effective multiple vector epilogues are
> for non-x86 ISAs, which all seem to have only a single vector size
> or VLA vectors.  For better target control on x86 I'd like to
> tell the vectorizer the array of modes to consider for the
> epilogue of the current loop plus a flag whether to consider
> using partial vectors (x86 does not have that encoded into the mode).
> So I'd add m_epilog_vec_modes[] and m_epilog_vec_mode_partial,
> since currently x86 doesn't do cost compares the latter can be a
> flag and we'd try that first when set, together with (only?) the
> first mode?  Alternatively only hint a single mode, but this won't
> ever scale to cost compare targets?
> 
> So using --param vect-epilogues-nomask=N is mainly for this RFC,
> not sure if it has to prevail.
> 
> Note I didn't manage to get aarch64 to use more than one epilogue,
> not even with -msve-vector-bits=512.
> 

My guess is it's probably due to partial SVE vector type support not
being as robust as for full vectors.  And once you say all vectors are
512 bits, using a smaller one needs support for partial vectors.

I think this change would be useful for AArch64 as well, but I (personally)
think the most useful mode for us is to be able to generate different
kinds of epilogues.

With that I mean, having an unpredicated SVE main loop,
unpredicated Adv. SIMD first epilogue and predicated SVE second epilogue.

For that I think this change is a good step forward :)

> Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
> built SPEC CPU 2017 with --param vect-epilogues-nomask=2 - as
> said, I want the target to have more control, even on x86 we
> probably only want two epilogues when doing 512bit vectorization
> for the main loop and possibly depend on its VF.

Agreed, for AArch64 we'd definitely like this, as the cases where we'd
generate more than one epilogue would overlap heavily with ones where
we unrolled.

Cheers,
Tamar

> 
> Any comments sofar?
> 
> Thanks,
> Richard.
> 
>   * doc/invoke.texi (vect-epilogues-nomask): Adjust.
>   * params.opt (vect-epilogues-nomask): Adjust max value and
>   documentation.
>   * tree-vect-loop.cc (vect_analyze_loop): Hack in multiple
>   vectorized epilogs.
> ---
>  gcc/doc/invoke.texi   |  3 ++-
>  gcc/params.opt|  2 +-
>  gcc/tree-vect-loop.cc | 23 +--
>  3 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index f2555ec83a1..73e54a47381 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -16870,7 +16870,8 @@ The maximum number of insns in loop header duplicated
>  by the copy loop headers pass.
> 
>  @item vect-epilogues-nomask
> -Enable loop epilogue vectorization using smaller vector size.
> +Enable loop epilogue vectorization using smaller vector size with up to N
> +vector epilogue loops.
> 
>  @item vect-partial-vector-usage
>  Controls when the loop vectorizer considers using partial vector loads
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 4dab7a26f9b..c77472e7ad3 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1175,7 +1175,7 @@ Common Joined UInteger Var(param_use_canonical_types) Init(1) IntegerRange(0, 1)
>  Whether to use canonical types.
> 
>  -param=vect-epilogues-nomask=
> -Common Joined UInteger Var(param_vect_epilogues_nomask) Init(1)
> IntegerRange(0, 1) Param Optimization
> +Common Joined UInteger Var(param_vect_epilogues_nomask) Init(1)
> IntegerRange(0, 8) Param Optimization
>  Enabl
