date:20240826

Re: [PATCH] vect: Fix STMT_VINFO_DEF_TYPE check for odd/even widen mult [PR116348]

2024-08-26 Thread Richard Biener

On Mon, 26 Aug 2024, Xi Ruoyao wrote:

> After fixing PR116142 some code started to trigger an ICE with -O3
> -march=znver4.  Per Richard Biener who actually made this fix:
> 
> "supportable_widening_operation fails at transform time - that's likely
> because vectorizable_reduction "puns" defs to internal_def"
> 
> so the check should use STMT_VINFO_REDUC_DEF instead of checking if
> STMT_VINFO_DEF_TYPE is vect_reduction_def.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>   PR tree-optimization/PR116348
>   * tree-vect-stmts.cc (supportable_widening_operation): Use
>   STMT_VINFO_REDUC_DEF (x) instead of
>   STMT_VINFO_DEF_TYPE (x) == vect_reduction_def.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/PR116348
>   * gcc.c-torture/compile/pr116438.c: New test.
> 
> Co-authored-by: Richard Biener 
> ---
> 
> Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk?
> 
>  gcc/testsuite/gcc.c-torture/compile/pr116438.c | 14 ++
>  gcc/tree-vect-stmts.cc |  3 +--
>  2 files changed, 15 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr116438.c
> 
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr116438.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr116438.c
> new file mode 100644
> index 000..97ab0181ab8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr116438.c
> @@ -0,0 +1,14 @@
> +/* { dg-additional-options "-march=znver4" { target x86_64-*-* i?86-*-* } } 
> */
> +
> +int *a;
> +int b;
> +long long c, d;
> +void
> +e (int f)
> +{
> +  for (; f; f++)
> +{
> +  d += (long long)a[f] * b;
> +  c += (long long)a[f] * 3;
> +}
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 385e63163c2..9eb73a59933 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -14193,8 +14193,7 @@ supportable_widening_operation (vec_info *vinfo,
>   by STMT is only directly used in the reduction statement.  */
> tree lhs = gimple_assign_lhs (vect_orig_stmt (stmt_info)->stmt);
> stmt_vec_info use_stmt_info = loop_info->lookup_single_use (lhs);
> -   if (use_stmt_info
> -   && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
> +   if (use_stmt_info && STMT_VINFO_REDUC_DEF (use_stmt_info))
>   return true;
>  }
>c1 = VEC_WIDEN_MULT_LO_EXPR;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH 2/2] [x86] Update ix86_mode_tieable_p and ix86_rtx_costs.

2024-08-26 Thread liuhongt

For mode2 bigger than 16 bytes, when it can be allocated to FIRST_SSE_REG,
it can only be allocated to ALL_SSE_REGS, and it is tieable with every
smaller mode1 that is also available to FIRST_SSE_REG.
When the mode size is equal to 16 bytes, exclude non-vector modes (TI/TFmode).
This is needed for CSE of all-ones/all-zeros: CSE checks costs with
ix86_modes_tieable_p across modes of different sizes.

Also update ix86_rtx_costs to prevent CONST0_RTX from being propagated;
otherwise CSE of CONST0_RTX fails.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
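
As an illustration only (not part of the patch), the relaxed tie rule
described above can be modelled with plain integers.  This is a minimal
sketch under stated assumptions: sse_modes_tieable_model and its
parameters are hypothetical names, sizes are in bytes, the sse_ok flags
stand in for ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode), and vector2
stands in for VECTOR_MODE_P (mode2).  It covers only the SSE branch of
ix86_modes_tieable_p, not the fall-through cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the SSE part of the tie check.
   SIZE1/SIZE2 are mode sizes in bytes.  SSE_OK1/SSE_OK2 model whether
   each mode is acceptable in FIRST_SSE_REG.  VECTOR2 models whether
   mode2 is a vector mode.  */
static bool
sse_modes_tieable_model (unsigned size1, bool sse_ok1,
                         unsigned size2, bool sse_ok2, bool vector2)
{
  /* The patch only touches the 16/32/64-byte branches.  */
  assert (size2 == 16 || size2 == 32 || size2 == 64);

  if (!sse_ok2)
    return false;

  /* A 16-byte non-vector mode2 (TI/TFmode) still needs an exact size
     match; every other case accepts any mode1 that is not larger.  */
  bool size_fits = (size2 == 16 && !vector2) ? size1 == 16
                                             : size1 <= size2;
  return size_fits && sse_ok1;
}
```

With this model a 32-byte mode1 ties with a 64-byte mode2, while an
8-byte mode1 does not tie with a 16-byte non-vector mode2.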

gcc/ChangeLog:

PR target/92080
* config/i386/i386.cc (ix86_modes_tieable_p): Relax
MODE_SIZE (mode1) to <= 64/32/16 bytes when it can be
allocated to FIRST_SSE_REG.
doesn't need to be exactly the same when >= 16.
(ix86_rtx_costs): Increase cost of const_double/const_vector
0/-1 a little to prevent propagation and enable more CSE.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr92080_vec_dup.c: New test.
* gcc.target/i386/pr92080_zero.c: New test.
---
 gcc/config/i386/i386.cc   | 14 +++--
 .../gcc.target/i386/pr92080_vec_dup.c | 48 +
 gcc/testsuite/gcc.target/i386/pr92080_zero.c  | 51 +++
 3 files changed, 108 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92080_vec_dup.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92080_zero.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 224a78cc832..72b9859e376 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20933,15 +20933,17 @@ ix86_modes_tieable_p (machine_mode mode1, 
machine_mode mode2)
  any other mode acceptable to SSE registers.  */
   if (GET_MODE_SIZE (mode2) == 64
   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
-return (GET_MODE_SIZE (mode1) == 64
+return (GET_MODE_SIZE (mode1) <= 64
&& ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
   if (GET_MODE_SIZE (mode2) == 32
   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
-return (GET_MODE_SIZE (mode1) == 32
+return (GET_MODE_SIZE (mode1) <= 32
&& ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
   if (GET_MODE_SIZE (mode2) == 16
   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
-return (GET_MODE_SIZE (mode1) == 16
+return ((VECTOR_MODE_P (mode2)
+? GET_MODE_SIZE (mode1) <= 16
+: GET_MODE_SIZE (mode1) == 16)
&& ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
 
   /* If MODE2 is appropriate for an MMX register, then tie
@@ -21507,10 +21509,12 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
case 0:
  break;
case 1:  /* 0: xor eliminates false dependency */
- *total = 0;
+ /* Add extra cost 1 to prevent propagation of CONST_VECTOR
+for SET, which will enable more CSE optimization.  */
+ *total = 0 + (outer_code == SET);
  return true;
default: /* -1: cmp contains false dependency */
- *total = 1;
+ *total = 1 + (outer_code == SET);
  return true;
}
   /* FALLTHRU */
diff --git a/gcc/testsuite/gcc.target/i386/pr92080_vec_dup.c 
b/gcc/testsuite/gcc.target/i386/pr92080_vec_dup.c
new file mode 100644
index 000..67fdd15d69c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92080_vec_dup.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64-v4 -O2" } */
+/* { dg-final { scan-assembler-times "vpbroadcast\[bwd\]" 3 } } */
+
+typedef int v16si __attribute__((vector_size(64)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v4si __attribute__((vector_size(16)));
+
+typedef short v32hi __attribute__((vector_size(64)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef short v8hi __attribute__((vector_size(16)));
+
+typedef char v64qi __attribute__((vector_size(64)));
+typedef char v32qi __attribute__((vector_size(32)));
+typedef char v16qi __attribute__((vector_size(16)));
+
+v16si sinksz;
+v8si sinksy;
+v4si sinksx;
+v32hi sinkhz;
+v16hi sinkhy;
+v8hi sinkhx;
+v64qi sinkbz;
+v32qi sinkby;
+v16qi sinkbx;
+
+void foo(char c) {
+  sinksz = __extension__(v16si){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinksy = __extension__(v8si){c,c,c,c,c,c,c,c};
+  sinksx = __extension__(v4si){c,c,c,c};
+}
+
+void foo1(char c) {
+  sinkhz = __extension__(v32hi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinkhy = __extension__(v16hi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinkhx = __extension__(v8hi){c,c,c,c,c,c,c,c};
+}
+
+void foo2(char c) {
+  sinkbz = __extension__(v64qi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinkby = __extension__(v32qi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinkbx =

[PATCH 1/2] Enhance cse_insn to handle all-zeros and all-ones for vector mode.

2024-08-26 Thread liuhongt

Also try to handle redundant broadcasts when there's already a
broadcast to a bigger mode with exactly the same component value.
For broadcast, component mode needs to be the same.
For all-zeros/ones, only need to check the bigger mode.
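
As an illustration only (not part of the patch), the idea behind reusing
a wider broadcast can be sketched in plain C with arrays standing in for
vector registers.  The names broadcast, low_part_matches, WIDE_ELTS and
NARROW_ELTS are hypothetical, and the sketch assumes the "low part" of
the wide value is its first elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { WIDE_ELTS = 16, NARROW_ELTS = 8 };

/* Fill N ints with the same value, like a vec_duplicate.  */
static void
broadcast (int *dst, size_t n, int value)
{
  assert (dst != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = value;
}

/* Return true if the low NARROW_ELTS elements of a wide broadcast
   already hold what a fresh narrow broadcast of VALUE would produce,
   i.e. the narrow broadcast is redundant.  */
static bool
low_part_matches (int value)
{
  int wide[WIDE_ELTS], narrow[NARROW_ELTS];

  broadcast (wide, WIDE_ELTS, value);
  broadcast (narrow, NARROW_ELTS, value);
  return memcmp (wide, narrow, sizeof narrow) == 0;
}
```

The same holds for all-zeros and all-ones, which is why only the size
of the wider mode matters in that case.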

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and 
aarch64-linux-gnu{-m32,}.
OK for trunk?

gcc/ChangeLog:

PR rtl-optimization/92080
* cse.cc (cse_insn): Handle all-ones/all-zeros, and vec_dup
with variables.
---
 gcc/cse.cc | 79 ++
 1 file changed, 79 insertions(+)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 65794ac5f2c..baf90910b94 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -4870,6 +4870,50 @@ cse_insn (rtx_insn *insn)
}
}
 
+  /* Try to handle special const_vector with elt 0 or -1.
+They can be represented with different modes, and can be cse.  */
+  if (src_const && src_related == 0 && CONST_VECTOR_P (src_const)
+ && (src_const == CONST0_RTX (mode)
+ || src_const == CONSTM1_RTX (mode))
+ && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+   {
+ machine_mode mode_iter;
+
+ for (int l = 0; l != 2; l++)
+   {
+ FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_VECTOR_INT)
+   {
+ if (maybe_lt (GET_MODE_SIZE (mode_iter),
+   GET_MODE_SIZE (mode)))
+   continue;
+
+ rtx src_const_iter = (src_const == CONST0_RTX (mode)
+   ? CONST0_RTX (mode_iter)
+   : CONSTM1_RTX (mode_iter));
+
+ struct table_elt *const_elt
+   = lookup (src_const_iter, HASH (src_const_iter, mode_iter),
+ mode_iter);
+
+ if (const_elt == 0)
+   continue;
+
+ for (const_elt = const_elt->first_same_value;
+  const_elt; const_elt = const_elt->next_same_value)
+   if (REG_P (const_elt->exp))
+ {
+   src_related = gen_lowpart (mode, const_elt->exp);
+   break;
+ }
+
+ if (src_related != 0)
+   break;
+   }
+ if (src_related != 0)
+   break;
+   }
+   }
+
   /* See if we have a CONST_INT that is already in a register in a
 wider mode.  */
 
@@ -5041,6 +5085,41 @@ cse_insn (rtx_insn *insn)
}
}
 
+  /* Try to find something like (vec_dup:v16si (reg:c))
+for (vec_dup:v8si (reg:c)).  */
+  if (src_related == 0
+ && VECTOR_MODE_P (mode)
+ && GET_CODE (src) == VEC_DUPLICATE)
+   {
+ poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (src)) * 2;
+ rtx inner_elt = XEXP (src, 0);
+ machine_mode result_mode;
+ struct table_elt *src_related_elt = NULL;;
+ while (related_vector_mode (mode, GET_MODE_INNER (mode),
+ nunits).exists (&result_mode))
+   {
+ rtx vec_dup = gen_rtx_VEC_DUPLICATE (result_mode, inner_elt);
+ struct table_elt* tmp = lookup (vec_dup, HASH (vec_dup, 
result_mode),
+ result_mode);
+ if (tmp)
+   src_related_elt = tmp;
+ nunits *= 2;
+   }
+
+ if (src_related_elt)
+   {
+ for (src_related_elt = src_related_elt->first_same_value;
+  src_related_elt;
+  src_related_elt = src_related_elt->next_same_value)
+   if (REG_P (src_related_elt->exp))
+ {
+   src_related = gen_lowpart (mode, src_related_elt->exp);
+   break;
+ }
+   }
+   }
+
+
   if (src == src_folded)
src_folded = 0;
 
-- 
2.31.1

[PATCH v2] Match: Add int type fits check for .SAT_ADD imm operand

2024-08-26 Thread pan2 . li

From: Pan Li 

This patch adds a strict check on the imm operand of the .SAT_ADD
matching.  Previously the imm operand had no type check, so unexpected
IL could be caught by the .SAT_ADD pattern.

We leverage int_fits_type_p here to make sure the imm operand is
an int that fits the result type of the .SAT_ADD.  For example:

Fits uint8_t:
uint8_t a;
uint8_t sum = .SAT_ADD (a, 12);
uint8_t sum = .SAT_ADD (a, 12u);
uint8_t sum = .SAT_ADD (a, 126u);
uint8_t sum = .SAT_ADD (a, 128u);
uint8_t sum = .SAT_ADD (a, 228);
uint8_t sum = .SAT_ADD (a, 223u);

Does not fit uint8_t:
uint8_t a;
uint8_t sum = .SAT_ADD (a, -1);
uint8_t sum = .SAT_ADD (a, 256u);
uint8_t sum = .SAT_ADD (a, 257);
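
As an illustration only (not part of the patch), the fits/does-not-fit
split above can be sketched in plain C.  The helpers imm_fits_uint8,
sat_add_u8 and sat_add_imm_u8 are hypothetical names; the first is a
stand-in for what int_fits_type_p decides when the result type is
uint8_t, not the GCC implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the int_fits_type_p check, specialised to
   a uint8_t result type: the immediate must lie in [0, 255].  */
static bool
imm_fits_uint8 (long long imm)
{
  return imm >= 0 && imm <= UINT8_MAX;
}

/* Unsigned saturating add: clamp to UINT8_MAX when the sum wraps.  */
static uint8_t
sat_add_u8 (uint8_t a, uint8_t b)
{
  uint8_t sum = (uint8_t) (a + b);
  return sum < a ? UINT8_MAX : sum;
}

/* Saturating add of an immediate that has already been checked.  */
static uint8_t
sat_add_imm_u8 (uint8_t a, long long imm)
{
  assert (imm_fits_uint8 (imm));
  return sat_add_u8 (a, (uint8_t) imm);
}
```

Under this sketch 12, 128 and 228 are accepted, while -1, 256 and 257
are rejected before any saturating add is formed.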

The following test suites pass with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Add int_fits_type_p check for .SAT_ADD imm operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_add_imm-11.c: Adjust test case for imm.
* gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-12.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-16.c: Ditto.
* gcc.target/riscv/sat_u_add_imm_type_check-1.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-10.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-11.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-12.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-13.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-14.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-15.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-16.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-17.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-18.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-19.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-2.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-20.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-21.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-22.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-23.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-24.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-25.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-26.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-27.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-28.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-29.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-3.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-30.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-31.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-32.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-33.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-34.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-35.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-36.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-37.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-38.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-39.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-4.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-40.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-41.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-42.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-43.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-44.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-45.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-46.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-47.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-48.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-49.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-5.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-50.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-51.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-52.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-6.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-7.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-8.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-9.c: New test.

Signed-off-by: Pan Li 
---
 gcc/match.pd |  2 +-
 gcc/testsuite/gcc.target/riscv/sat_arith.h   | 16 ++

[PATCH] libgccjit: Remove obsolete texinfo statements

2024-08-26 Thread Wilken Gottwalt

Remove texinfo statements which have been obsolete for a while now.

libgccjit.texi:18: warning: @definfoenclose is obsolete
libgccjit.texi:19: warning: @definfoenclose is obsolete

gcc/jit:
* docs/_build/texinfo/libgccjit.texi: Remove obsolete texinfo 
statements.

Signed-off-by: Wilken Gottwalt 
---
 gcc/jit/docs/_build/texinfo/libgccjit.texi | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/jit/docs/_build/texinfo/libgccjit.texi 
b/gcc/jit/docs/_build/texinfo/libgccjit.texi
index a69efeaa4a8..e7abc49b160 100644
--- a/gcc/jit/docs/_build/texinfo/libgccjit.texi
+++ b/gcc/jit/docs/_build/texinfo/libgccjit.texi
@@ -14,9 +14,6 @@
 @direntry
 * libgccjit: (libgccjit.info). GCC-based Just In Time compiler library.
 @end direntry
-
-@definfoenclose strong,`,'
-@definfoenclose emph,`,'
 @c %**end of header
 
 @copying
-- 
2.46.0

Re: [PATCH] c++: Check template parameter number in member class template specialization [PR115716]

2024-08-26 Thread Simon Martin

Hi Jason,

On 22 Aug 2024, at 19:28, Jason Merrill wrote:

> On 8/22/24 12:51 PM, Simon Martin wrote:
> >> We currently ICE upon the following invalid code, because we don't
> >> check the number of template parameters in member class template
> >> specializations. This patch fixes the PR by adding such a check.
>>
>> === cut here ===
>> template  struct x {
>>template  struct y {
>>  typedef T result2;
>>};
>> };
>> template<> template struct x::y {
>>typedef double result2;
>> };
>> int main() {
>>x::y::result2 xxx2;
>> }
>> === cut here ===
>>
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>>  PR c++/115716
>>
>> gcc/cp/ChangeLog:
>>
>>  * pt.cc (maybe_process_partial_specialization): Check number of
>>  template parameters in specialization.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/template/spec42.C: New test.
>>
>> ---
>>   gcc/cp/pt.cc   | 14 ++
>>   gcc/testsuite/g++.dg/template/spec42.C | 17 +
>>   2 files changed, 31 insertions(+)
>>   create mode 100644 gcc/testsuite/g++.dg/template/spec42.C
>>
>> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
>> index bc3ad5edcc5..db8c2a3b4de 100644
>> --- a/gcc/cp/pt.cc
>> +++ b/gcc/cp/pt.cc
>> @@ -1173,6 +1173,20 @@ maybe_process_partial_specialization (tree 
>> type)
>> type, inst);
>>  }
>>  + /* Check that the number of template parameters matches the 
>> template
>> + being specialized.  */
>> +  gcc_assert (current_template_parms);
>> +  if (TREE_VEC_LENGTH (INNERMOST_TEMPLATE_ARGS
>> +   (CLASSTYPE_TI_ARGS (type)))
>> +  != TREE_VEC_LENGTH (INNERMOST_TEMPLATE_PARMS
>> +  (current_template_parms)))
>> +{
>> +  error ("wrong number of template parameters for %qT", type);
>> +  inform (DECL_SOURCE_LOCATION (tmpl), "from definition of 
>> %q#D",
>> +  tmpl);
>
> How about printing the numbers for each place?
>
> What if the mismatch is other than in the number of parameters?  Can 
> you use template_parameter_lists_equivalent_p?  Or if that's 
> complicated, compare current_template_args() to CLASSTYPE_TI_ARGS 
> (type)?
>
Thanks for the review. After checking further, I believe we are just
missing a call to redeclare_class_template, which will catch various
template parameter mismatches and report them properly.

This is what the updated attached patch does, successfully tested on 
x86_64-pc-linux-gnu. OK for trunk?

Thanks,
   Simon

> Jason
From 7e817c158e7d2e83e0283c3fa892370dbf9a238a Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Sun, 25 Aug 2024 21:59:31 +0200
Subject: [PATCH] c++: Check template parameters in member class template 
specialization [PR115716]

We currently ICE upon the following invalid code, because we don't check
that the template parameters in a member class template specialization
are correct.

=== cut here ===
template  struct x {
  template  struct y {
typedef T result2;
  };
};
template<> template struct x::y {
  typedef double result2;
};
int main() {
  x::y::result2 xxx2;
}
=== cut here ===

This patch fixes the PR by calling redeclare_class_template.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/115716

gcc/cp/ChangeLog:

* pt.cc (maybe_process_partial_specialization): Call
redeclare_class_template.

gcc/testsuite/ChangeLog:

* g++.dg/template/spec42.C: New test.
* g++.dg/template/spec43.C: New test.

---
 gcc/cp/pt.cc   |  5 +
 gcc/testsuite/g++.dg/template/spec42.C | 17 +
 gcc/testsuite/g++.dg/template/spec43.C | 18 ++
 3 files changed, 40 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/spec42.C
 create mode 100644 gcc/testsuite/g++.dg/template/spec43.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index bc3ad5edcc5..24a6241d3a5 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1173,6 +1173,11 @@ maybe_process_partial_specialization (tree type)
   type, inst);
}
 
+ /* Make sure that the specialization is valid.  */
+ if (!redeclare_class_template (type, current_template_parms,
+current_template_constraints ()))
+   return error_mark_node;
+
  /* Mark TYPE as a specialization.  And as a result, we only
 have one level of template argument for the innermost
 class template.  */
diff --git a/gcc/testsuite/g++.dg/template/spec42.C 
b/gcc/testsuite/g++.dg/template/spec42.C
new file mode 100644
index 000..cac1264fc9f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/spec42.C
@@ -0,0 +1,17 @@
+// PR c++/115716
+// { dg-do compile }
+template  struct x {
+  template  struct y { // { dg-note "used 1 template parameter" }
+typedef T result2;
+  };
+};
+
+template<>
+template
+struct x::y { // { dg-error "redeclared with 2 template parameters" }

Re: [PATCH] Fix bootstap-errors due to enabling -gvariable-location-views

2024-08-26 Thread Richard Biener

On Mon, 26 Aug 2024, Bernd Edlinger wrote:

> This recent change triggered various bootstrap errors, mostly on
> x86 targets because line info advance address entries were output
> in the wrong section table.
> The switch to the wrong line table happened in dwarfout_set_ignored_loc.
> It must use the same section as the earlier called
> dwarf2out_switch_text_section.
> 
> But also ft32-elf was affected, because the assembler choked on
> something simple as ".2byte .LM2-.LM1", but fortunately it is
> able to use native location views, the configure test was just
> not executed because the ft32 "nop" instruction was missing.

OK for the configure part.  I don't understand how using
current_function_section is correct, or how it even makes a
difference compared to function_section.

It seems both would rely on the fact that fde->decl should be
the same as cfun->decl and both eventually resort to how
first_function_block_is_cold is set.

Is this from final_scan_insn_1 where we seem to switch
in_cold_section_p?

The [current_]function_section API might be just confusing to me
of course.  I note that dwarf2out mixes both uses and
current_function_section seems newer than function_section.  Huh.

Richard.

> gcc/ChangeLog:
> 
>   PR debug/116470
>   * configure.ac: Add the "nop" instruction for cpu type ft32.
>   * configure: Regenerate.
>   * dwarf2out.cc (dwarf2out_set_ignored_loc): Use the correct
>   line info section.
> ---
>  gcc/configure| 2 +-
>  gcc/configure.ac | 2 +-
>  gcc/dwarf2out.cc | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/configure b/gcc/configure
> index 557ea5fa3ac..3d301b6ecd3 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -31398,7 +31398,7 @@ esac
>  case "$cpu_type" in
>aarch64 | alpha | arc | arm | avr | bfin | cris | csky | i386 | loongarch 
> | m32c \
>| m68k | microblaze | mips | nds32 | nios2 | pa | riscv | rs6000 | score | 
> sparc \
> -  | visium | xstormy16 | xtensa)
> +  | visium | xstormy16 | xtensa | ft32)
>  insn="nop"
>  ;;
>ia64 | s390)
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index eaa01d0d7e5..8a2d2b0438e 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -5610,7 +5610,7 @@ esac
>  case "$cpu_type" in
>aarch64 | alpha | arc | arm | avr | bfin | cris | csky | i386 | loongarch 
> | m32c \
>| m68k | microblaze | mips | nds32 | nios2 | pa | riscv | rs6000 | score | 
> sparc \
> -  | visium | xstormy16 | xtensa)
> +  | visium | xstormy16 | xtensa | ft32)
>  insn="nop"
>  ;;
>ia64 | s390)
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index a26a07e3424..1187d32352b 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -28976,7 +28976,7 @@ dwarf2out_set_ignored_loc (unsigned int line, 
> unsigned int column,
>dw_fde_ref fde = cfun->fde;
>  
>fde->ignored_debug = false;
> -  set_cur_line_info_table (function_section (fde->decl));
> +  set_cur_line_info_table (current_function_section ());
>  
>dwarf2out_source_line (line, column, filename, 0, true);
>  }
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] tree-optimization/116166 - forward jump-threading going wild

2024-08-26 Thread Richard Biener

On Mon, 26 Aug 2024, Aldy Hernandez wrote:

> [I'm slowly coming up to speed here after my absence, so please bear with 
> me...]
> 
> I suspect there's a few things going on here, both in the forward and
> the backwards threader.  For the forward threader, you mention some
> very good points in the PR.  First, there's unnecessary recursion in
> simplify_control_stmt_condition_1 that ranger should be able to handle
> on its own.  Secondly, since we're doing a DOM walk, we should be able
> to re-use most of the path_ranger's cache instead of having to reset
> all of it on every path, especially when we're just adding empty
> blocks.  I can investigate both of these things.
> 
> The end game here is to get rid of the forward threader, so we should
> really find out why the backwards threader is choking so bad.  I
> suspect whatever the case is, will affect both threaders.  I thought
> you had added some limits in the search space last cycle?  Are they
> not being triggered?

They trigger, but they assume that the work ranger does computing
ranges on a path is O(path-size), whereas it seems to be rather
O(function-size): it happily ventures outside of the path and for
those parts (obviously(?)) ends up not using a common cache (aka
only global ranges?).  I've identified the dominator walk in the
PR (or one of the related PRs) which walks the whole function
dominator tree rather than limiting itself.

> For the record, the reason we can't get rid of the forward threader
> yet (apart from having to fix whatever is going on in PR114855 at -O2
> :)), is that we still rely on the pointer equivalency tracking with
> the DOM equiv lookup tables.  Prange does not yet handle pointer
> equivs.  Also, we'd need to audit to make sure frange handles whatever
> floating point operations were being simplified in the DOM equiv
> lookup as well.  I suspect not much, but we still need to make sure.

I see.  I'll note that with forward threading re-using the cache
when just adding blocks at the tail of the path is obvious (it
might miss optimizations though) while for backwards threading
backtracking on the head of the path requires pruning of the whole
cache (we could eventually keep a "stack" of caches...).

> Minor nit, wouldn't it be cleaner for "limit" to be a class local
> variable instead of passing it around as a function parameter?

Maybe, it's old habit of doing things.
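
As an illustration only (not code from the patch), the "limit passed
down as a parameter" pattern amounts to a budget shared by every level
of the walk.  A minimal sketch in plain C, where struct block and
walk_with_limit are hypothetical names and a pointer plays the role of
the reference parameter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain of blocks, each with at most one successor.  */
struct block
{
  struct block *next;
  bool is_target;
};

/* Walk forward from BB looking for a target block.  *LIMIT is a budget
   shared by all callers: each visited block consumes one unit and the
   walk gives up once the budget reaches zero.  */
static bool
walk_with_limit (const struct block *bb, unsigned *limit)
{
  assert (limit != NULL);

  if (bb == NULL)
    return false;
  if (*limit == 0)
    return false;
  --*limit;

  if (bb->is_target)
    return true;
  return walk_with_limit (bb->next, limit);
}
```

Because the budget is decremented through the pointer, work done in
one sub-walk reduces what is left for the next one, which is what
bounds the total search space.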

As said, the main problem is ranger itself doing work not within the
bounds that the backwards threader expects.

Richard.

> Thanks for all your work here.
> Aldy
> 
> On Tue, Aug 6, 2024 at 3:12 PM Richard Biener  wrote:
> >
> > Currently the forward threader isn't limited as to the search space
> > it explores, and now that it uses path-ranger for simplifying the
> > conditions it runs into, it has become pretty slow for degenerate cases
> > like compiling insn-emit.cc for RISC-V, esp. when compiling for
> > a host with LOGICAL_OP_NON_SHORT_CIRCUIT disabled.
> >
> > The following makes the forward threader honor the search space
> > limit I introduced for the backward threader.  This reduces
> > compile-time from minutes to seconds for the testcase in PR116166.
> >
> > Note this wasn't necessary before we had ranger, but with ranger
> > the work we do is quadratic in the length of the threading path
> > we build up (the same is true for the backwards threader).
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > OK if that succeeds?
> >
> > Thanks,
> > Richard.
> >
> > PR tree-optimization/116166
> > * tree-ssa-threadedge.h (jump_threader::thread_around_empty_blocks):
> > Add limit parameter.
> > (jump_threader::thread_through_normal_block): Likewise.
> > * tree-ssa-threadedge.cc 
> > (jump_threader::thread_around_empty_blocks):
> > Honor and decrement limit parameter.
> > (jump_threader::thread_through_normal_block): Likewise.
> > (jump_threader::thread_across_edge): Initialize limit from
> > param_max_jump_thread_paths and pass it down to workers.
> > ---
> >  gcc/tree-ssa-threadedge.cc | 30 ++
> >  gcc/tree-ssa-threadedge.h  |  4 ++--
> >  2 files changed, 24 insertions(+), 10 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-threadedge.cc b/gcc/tree-ssa-threadedge.cc
> > index 7f82639b8ec..0aa2aa85143 100644
> > --- a/gcc/tree-ssa-threadedge.cc
> > +++ b/gcc/tree-ssa-threadedge.cc
> > @@ -786,13 +786,17 @@ propagate_threaded_block_debug_into (basic_block 
> > dest, basic_block src)
> >  bool
> >  jump_threader::thread_around_empty_blocks (vec *path,
> >edge taken_edge,
> > -  bitmap visited)
> > +  bitmap visited, unsigned &limit)
> >  {
> >basic_block bb = taken_edge->dest;
> >gimple_stmt_iterator gsi;
> >gimple *stmt;
> >tree cond;
> >
> > +  if (limit == 0)
> > +return false;
> > +  --limit;
> >

Re: [PATCH] expand: Use the correct mode for store flags for popcount [PR116480]

2024-08-26 Thread Richard Biener

On Sun, Aug 25, 2024 at 10:28 PM Andrew Pinski  wrote:
>
> When expanding popcount used for equal to 1 (or rather
> __builtin_stdc_has_single_bit), the wrong mode was being used for the
> mode of the store flags. We were using the mode of the argument to
> popcount, but since popcount's return value is always int, the mode of
> the expansion here should have been the mode of the return type rather
> than the argument.
>
> Built and tested on aarch64-linux-gnu with no regressions.
> Also bootstrapped and tested on x86_64-linux-gnu.

OK.

Richard.
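
As an illustration only (not part of the patch), the source-level
pattern being expanded is "popcount (x) == 1".  The sketch below uses
hypothetical helpers popcount_u64 and has_single_bit_u64 rather than
the GCC builtins, and shows that the count is an int whatever the width
of the argument, so the comparison against 1 happens in the result
type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable population count.  The result is an int regardless of the
   width of the argument.  */
static int
popcount_u64 (uint64_t x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)  /* Clear the lowest set bit each round.  */
    n++;
  return n;
}

/* True if exactly one bit of X is set.  The popcount form and the
   classic mask form must agree.  */
static bool
has_single_bit_u64 (uint64_t x)
{
  bool by_popcount = popcount_u64 (x) == 1;
  bool by_mask = x != 0 && (x & (x - 1)) == 0;

  assert (by_popcount == by_mask);
  return by_popcount;
}
```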

> PR middle-end/116480
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_POPCOUNT): Use the correct mode
> for store flags.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr116480-1.c: New test.
> * gcc.dg/torture/pr116480-2.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/internal-fn.cc| 3 ++-
>  gcc/testsuite/gcc.dg/torture/pr116480-1.c | 8 
>  gcc/testsuite/gcc.dg/torture/pr116480-2.c | 8 
>  3 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116480-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116480-2.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index a96e61e527c..89da13b38ce 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -5311,6 +5311,7 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
>bool nonzero_arg = integer_zerop (gimple_call_arg (stmt, 1));
>tree type = TREE_TYPE (arg);
>machine_mode mode = TYPE_MODE (type);
> +  machine_mode lhsmode = TYPE_MODE (TREE_TYPE (lhs));
>do_pending_stack_adjust ();
>start_sequence ();
>expand_unary_optab_fn (fn, stmt, popcount_optab);
> @@ -5318,7 +5319,7 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
>end_sequence ();
>start_sequence ();
>rtx plhs = expand_normal (lhs);
> -  rtx pcmp = emit_store_flag (NULL_RTX, EQ, plhs, const1_rtx, mode, 0, 0);
> +  rtx pcmp = emit_store_flag (NULL_RTX, EQ, plhs, const1_rtx, lhsmode, 0, 0);
>if (pcmp == NULL_RTX)
>  {
>  fail:
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116480-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr116480-1.c
> new file mode 100644
> index 000..15a5727941c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116480-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile { target int128 } } */
> +
> +int
> +foo(unsigned __int128 b)
> +{
> +  return __builtin_popcountg(b) == 1;
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116480-2.c 
> b/gcc/testsuite/gcc.dg/torture/pr116480-2.c
> new file mode 100644
> index 000..7bf690283b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116480-2.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile { target bitint } } */
> +
> +int
> +foo(unsigned _BitInt(127) b)
> +{
> +  return __builtin_popcountg(b) == 1;
> +}
> +
> --
> 2.43.0
>

Re: [PATCH 1/2] Enhance cse_insn to handle all-zeros and all-ones for vector mode.

2024-08-26 Thread Richard Biener

On Mon, Aug 26, 2024 at 9:34 AM liuhongt  wrote:
>
> Also try to handle redundant broadcasts when there's already a
> broadcast to a bigger mode with exactly the same component value.
> For broadcast, component mode needs to be the same.
> For all-zeros/ones, only need to check the bigger mode.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and 
> aarch64-linux-gnu{-m32,}.
> OK for trunk?
>
> gcc/ChangeLog:
>
> PR rtl-optimization/92080
> * cse.cc (cse_insn): Handle all-ones/all-zeros, and vec_dup
> with variables.
> ---
>  gcc/cse.cc | 79 ++
>  1 file changed, 79 insertions(+)
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index 65794ac5f2c..baf90910b94 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4870,6 +4870,50 @@ cse_insn (rtx_insn *insn)
> }
> }
>
> +  /* Try to handle special const_vector with elt 0 or -1.
> +They can be represented with different modes, and can be cse.  */
> +  if (src_const && src_related == 0 && CONST_VECTOR_P (src_const)
> + && (src_const == CONST0_RTX (mode)
> + || src_const == CONSTM1_RTX (mode))
> + && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +   {
> + machine_mode mode_iter;
> +
> + for (int l = 0; l != 2; l++)
> +   {
> + FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_VECTOR_INT)
> +   {
> + if (maybe_lt (GET_MODE_SIZE (mode_iter),
> +   GET_MODE_SIZE (mode)))
> +   continue;
> +
> + rtx src_const_iter = (src_const == CONST0_RTX (mode)
> +   ? CONST0_RTX (mode_iter)
> +   : CONSTM1_RTX (mode_iter));
> +
> + struct table_elt *const_elt
> +   = lookup (src_const_iter, HASH (src_const_iter, 
> mode_iter),
> + mode_iter);
> +
> + if (const_elt == 0)
> +   continue;
> +
> + for (const_elt = const_elt->first_same_value;
> +  const_elt; const_elt = const_elt->next_same_value)
> +   if (REG_P (const_elt->exp))
> + {
> +   src_related = gen_lowpart (mode, const_elt->exp);
> +   break;
> + }
> +
> + if (src_related != 0)
> +   break;
> +   }
> + if (src_related != 0)
> +   break;
> +   }
> +   }
> +
>/* See if we have a CONST_INT that is already in a register in a
>  wider mode.  */
>
> @@ -5041,6 +5085,41 @@ cse_insn (rtx_insn *insn)
> }
> }
>
> +  /* Try to find something like (vec_dup:v16si (reg:c))
> +for (vec_dup:v8si (reg:c)).  */
> +  if (src_related == 0
> + && VECTOR_MODE_P (mode)
> + && GET_CODE (src) == VEC_DUPLICATE)
> +   {
> + poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (src)) * 2;
> + rtx inner_elt = XEXP (src, 0);
> + machine_mode result_mode;
> + struct table_elt *src_related_elt = NULL;;
> + while (related_vector_mode (mode, GET_MODE_INNER (mode),
> + nunits).exists (&result_mode))
> +   {
> + rtx vec_dup = gen_rtx_VEC_DUPLICATE (result_mode, inner_elt);
> + struct table_elt* tmp = lookup (vec_dup, HASH (vec_dup, 
> result_mode),
> + result_mode);
> + if (tmp)
> +   src_related_elt = tmp;

You are possibly overwriting src_related_elt - I'd suggest to either break
here or do the loop below for each found elt?

> + nunits *= 2;
> +   }
> +
> + if (src_related_elt)
> +   {
> + for (src_related_elt = src_related_elt->first_same_value;
> +  src_related_elt;
> +  src_related_elt = src_related_elt->next_same_value)
> +   if (REG_P (src_related_elt->exp))
> + {
> +   src_related = gen_lowpart (mode, src_related_elt->exp);

Do we know that will always succeed?

> +   break;
> + }
> +   }
> +   }

So on the GIMPLE side we try to handle such cases by maintaining
only a single element in the hashtables, that is, we hash and compare them
as equal ("them" in this case being (vec_dup:M (reg:c)) and
(vec_dup:N (reg:c))), leaving it up to the consumer to reject or pun
mismatches.

For constants that would hold even more - but note that CSEing vs.
duplicating constants might not be universally good.

Richard.

> +
>if (src == src_folded)
> src_folded = 0;
>
> --
> 2.31.1
>
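
The distinction drawn in the patch description (broadcasts need the same component mode, all-zeros/all-ones only need a bigger mode) can be checked on plain byte arrays. This is a minimal sketch with hypothetical helper names; the arrays merely stand in for vector registers and nothing here is taken from cse.cc.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A 32-byte "wide vector" of 32-bit lanes that are all -1 has the same
   low 16 bytes as a 16-byte vector of 8-bit lanes that are all -1.  */
bool
low_part_matches_all_ones (void)
{
  uint32_t wide[8];
  uint8_t narrow[16];
  for (int i = 0; i < 8; i++)
    wide[i] = UINT32_MAX;
  for (int i = 0; i < 16; i++)
    narrow[i] = UINT8_MAX;
  return memcmp (wide, narrow, sizeof narrow) == 0;
}

/* For a general broadcast value the lane width matters: the low part of
   the wide vector only matches when the byte patterns happen to agree.  */
bool
low_part_matches_broadcast (uint32_t value)
{
  uint32_t wide[8];
  uint8_t narrow[16];
  for (int i = 0; i < 8; i++)
    wide[i] = value;
  for (int i = 0; i < 16; i++)
    narrow[i] = (uint8_t) value;
  return memcmp (wide, narrow, sizeof narrow) == 0;
}
```

With value 1 the second check fails on both little- and big-endian hosts, which is why the vec_dup case insists on an identical component mode.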

Re: [PATCH v2] Match: Add int type fits check for .SAT_ADD imm operand

2024-08-26 Thread Richard Biener

On Mon, Aug 26, 2024 at 9:47 AM  wrote:
>
> From: Pan Li 
>
> This patch adds a strict check for the imm operand of .SAT_ADD
> matching.  Previously there was no type check for the imm operand, which
> could result in unexpected IL being caught by the .SAT_ADD pattern.
>
> We leverage int_fits_type_p here to make sure the imm operand is
> an int that fits the result type of the .SAT_ADD.  For example:
>
> Fits uint8_t:
> uint8_t a;
> uint8_t sum = .SAT_ADD (a, 12);
> uint8_t sum = .SAT_ADD (a, 12u);
> uint8_t sum = .SAT_ADD (a, 126u);
> uint8_t sum = .SAT_ADD (a, 128u);
> uint8_t sum = .SAT_ADD (a, 228);
> uint8_t sum = .SAT_ADD (a, 223u);
>
> Does not fit uint8_t:
> uint8_t a;
> uint8_t sum = .SAT_ADD (a, -1);
> uint8_t sum = .SAT_ADD (a, 256u);
> uint8_t sum = .SAT_ADD (a, 257);
>
> The following test suites pass with this patch:
> * The rv64gcv full regression test.
> * The x86 bootstrap test.
> * The x86 full regression test.

OK.

> gcc/ChangeLog:
>
> * match.pd: Add int_fits_type_p check for .SAT_ADD imm operand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/sat_arith.h: Add test helper macros.
> * gcc.target/riscv/sat_u_add_imm-11.c: Adjust test case for imm.
> * gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
> * gcc.target/riscv/sat_u_add_imm-12.c: Ditto.
> * gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
> * gcc.target/riscv/sat_u_add_imm-16.c: Ditto.
> * gcc.target/riscv/sat_u_add_imm_type_check-1.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-10.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-11.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-12.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-13.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-14.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-15.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-16.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-17.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-18.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-19.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-2.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-20.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-21.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-22.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-23.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-24.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-25.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-26.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-27.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-28.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-29.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-3.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-30.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-31.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-32.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-33.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-34.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-35.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-36.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-37.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-38.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-39.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-4.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-40.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-41.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-42.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-43.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-44.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-45.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-46.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-47.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-48.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-49.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-5.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-50.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-51.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-52.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-6.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-7.c: New test.
> * gcc.target/riscv/sat_u_add_imm_type_check-8.c: New tes
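
The fits / does-not-fit examples quoted above can be mirrored by a small range check. This is only a sketch of the idea for a uint8_t result type; `imm_fits_uint8` is a hypothetical name and the code is not GCC's int_fits_type_p.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the check when the result type is uint8_t:
   the immediate, taken at its own value, must be representable in the
   result type without wrapping.  */
bool
imm_fits_uint8 (int64_t imm)
{
  return imm >= 0 && imm <= UINT8_MAX;
}
```

Under this reading 12 and 228 are accepted, while -1, 256 and 257 are rejected, matching the two lists in the patch description.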

[PATCH v1] RISC-V: Support IMM for operand 1 of ussub pattern

2024-08-26 Thread pan2 . li

From: Pan Li 

This patch allows an IMM for operand 1 of the ussub pattern,
i.e. .SAT_SUB(x, 22), as in the example below.

Form 2:
  #define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
  T __attribute__((noinline)) \
  sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
  {   \
return x >= (T)IMM ? x - (T)IMM : 0;  \
  }

DEF_SAT_U_SUB_IMM_FMT_2(uint64_t, 1022)

It is almost the same as the support for an imm in operand 0 of the ussub
pattern, but allows the second operand to be an imm instead of the first.

The following test suites pass with this patch:
1. The rv64gcv full regression test.
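
To see what Form 2 computes, the macro from the description can be instantiated and called directly. The uint64_t/1022 instantiation is the one given above; the uint8_t/22 instantiation is an extra one added here purely for illustration.

```c
#include <stdint.h>

#define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
  T __attribute__((noinline))           \
  sat_u_sub_imm##IMM##_##T##_fmt_2 (T x) \
  {                                     \
    return x >= (T)IMM ? x - (T)IMM : 0; \
  }

/* Expands to sat_u_sub_imm1022_uint64_t_fmt_2.  */
DEF_SAT_U_SUB_IMM_FMT_2(uint64_t, 1022)

/* Illustrative extra instantiation: sat_u_sub_imm22_uint8_t_fmt_2.  */
DEF_SAT_U_SUB_IMM_FMT_2(uint8_t, 22)
```

Calling sat_u_sub_imm1022_uint64_t_fmt_2 with 2000 gives 978, while any argument at or below 1022 saturates to 0.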

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_ussub): Gen xmode for the
second operand, aka y in parameter.
* config/riscv/riscv.md (ussub3): Allow const_int for operand 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-8.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc |  2 +-
 gcc/config/riscv/riscv.md |  2 +-
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  9 +++
 .../gcc.target/riscv/sat_u_sub_imm-5.c| 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-5_1.c  | 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-5_2.c  | 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-6.c| 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-6_1.c  | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-6_2.c  | 22 
 .../gcc.target/riscv/sat_u_sub_imm-7.c| 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-7_1.c  | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-7_2.c  | 22 
 .../gcc.target/riscv/sat_u_sub_imm-8.c| 18 ++
 .../gcc.target/riscv/sat_u_sub_imm-run-5.c| 55 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-6.c| 55 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-7.c| 54 ++
 .../gcc.target/riscv/sat_u_sub_imm-run-8.c| 48 
 17 files changed, 423 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-5_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-5_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-6_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-6_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-7_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-7_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-8.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 90a6e936558..1f544c1287e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11965,7 +11965,7 @@ riscv_expand_ussub (rtx dest, rtx x, rtx y)
 {
   machine_mode mode = GET_MODE (dest);
   rtx xmode_x = riscv_gen_unsigned_xmode_reg (x, mode);
-  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_y = riscv_gen_unsigned_xmode_reg (y, mode);
   rtx xmode_lt = gen_reg_rtx (Xmode);
   rtx xmode_minus = gen_reg_rtx (Xmode);
   rtx xmode_dest = gen_reg_rtx (Xmode);
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index a94705a8e7c..3289ed2155a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4370,7 +4370,7 @@ (define_expand "usadd3"
 (define_expand "ussub3"
   [(match_operand:ANYI 0 "register_operand")
(match_operand:ANYI 1 "reg_or_int_operand")
-   (match_operand:ANYI 2 "register_operand")]
+   (match_operand:ANYI 2 "reg_or_int_operand")]
   ""
   {
 riscv_expand_ussub (operands[0], operands[1], operands[2]);
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.

[nvptx] Fix code-gen for alias attribute

2024-08-26 Thread Prathamesh Kulkarni

Hi,
For the following test (adapted from pr96390.c):

__attribute__((noipa)) int foo () { return 42; }
int bar () __attribute__((alias ("foo")));
int baz () __attribute__((alias ("bar")));

int main ()
{
  int n;
  #pragma omp target map(from:n)
n = baz ();
  return n;
}

Compiling with -fopenmp -foffload=nvptx-none -foffload=-malias 
-foffload=-mptx=6.3 results in:

ptxas fatal   : Internal error: alias to unknown symbol
nvptx-as: ptxas returned 255 exit status
nvptx mkoffload: fatal error: 
../../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 
exit status
compilation terminated.
lto-wrapper: fatal error: 
/home/prathameshk/gnu-toolchain/gcc/grcogcc-38/install/libexec/gcc/aarch64-unknown-linux-gnu/15.0.0//accel/nvptx-none/mkoffload
 returned 1 exit status
compilation terminated. 

This happens because ptx code-gen shows:

// BEGIN GLOBAL FUNCTION DEF: foo
.visible .func (.param.u32 %value_out) foo
{
.reg.u32 %value;
mov.u32 %value, 42;
st.param.u32[%value_out], %value;
ret;
}
.visible .func (.param.u32 %value_out) bar;
.alias bar,foo;
.visible .func (.param.u32 %value_out) baz;
.alias baz,bar;

The directive ".alias baz,bar" is invalid because PTX requires the aliasee to be a defined function:
https://sw-docs-dgx-station.nvidia.com/cuda-latest/parallel-thread-execution/latest-internal/#kernel-and-function-directives-alias

The patch uses cgraph_node::get(name)->ultimate_alias_target () instead of the 
provided value in nvptx_asm_output_def_from_decls.
For the above case, it now generates the following ptx:

.alias baz,foo; 
instead of:
.alias baz,bar;

which fixes the issue.
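
The effect of resolving to the ultimate alias target can be shown on a toy symbol table. This is a sketch only: the table, `find_sym` and `ultimate_target` are hypothetical and stand in for the cgraph machinery, and the walk assumes the alias chain has no cycles.

```c
#include <stddef.h>
#include <string.h>

struct sym
{
  const char *name;
  const char *alias_of;   /* NULL for a real definition.  */
};

/* The three symbols from the test case above.  */
static const struct sym symbols[] = {
  { "foo", NULL },
  { "bar", "foo" },
  { "baz", "bar" },
};

static const struct sym *
find_sym (const char *name)
{
  for (size_t i = 0; i < sizeof symbols / sizeof symbols[0]; i++)
    if (strcmp (symbols[i].name, name) == 0)
      return &symbols[i];
  return NULL;
}

/* Follow alias links until a defined symbol is reached.  Returns NULL
   for an unknown name.  Assumes there are no alias cycles.  */
const char *
ultimate_target (const char *name)
{
  const struct sym *s = find_sym (name);
  while (s != NULL && s->alias_of != NULL)
    s = find_sym (s->alias_of);
  return s != NULL ? s->name : NULL;
}
```

For "baz" the walk goes baz, bar, foo and stops at foo, which is the name that ends up in the emitted .alias directive.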

Does the patch look like it is heading in the right direction?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh

[nvptx] Fix code-gen for alias attribute.

For the following test (adapted from pr96390.c):

__attribute__((noipa)) int foo () { return 42; }
int bar () __attribute__((alias ("foo")));
int baz () __attribute__((alias ("bar")));

int main ()
{
  int n;
  #pragma omp target map(from:n)
n = baz ();
  return n;
}

gcc emits the following PTX for baz:
.visible .func (.param.u32 %value_out) bar;
.alias bar,foo;
.visible .func (.param.u32 %value_out) baz;
.alias baz,bar;

which is incorrect since PTX requires the aliasee to be a defined function.
The patch instead uses cgraph_node::get(name)->ultimate_alias_target,
which generates the following PTX:

.visible .func (.param.u32 %value_out) baz;
.alias baz,foo;

gcc/ChangeLog:

* config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls): Use
cgraph_node::get(name)->ultimate_alias_target instead of value.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 2a8f713c680..9688b0e6f2d 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -7583,7 +7583,8 @@ nvptx_mem_local_p (rtx mem)
   while (0)
 
 void
-nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
+nvptx_asm_output_def_from_decls (FILE *stream, tree name,
+tree value ATTRIBUTE_UNUSED)
 {
   if (nvptx_alias == 0 || !TARGET_PTX_6_3)
 {
@@ -7618,7 +7619,8 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, 
tree value)
   return;
 }
 
-  if (!cgraph_node::get (name)->referred_to_p ())
+  cgraph_node *cnode = cgraph_node::get (name);
+  if (!cnode->referred_to_p ())
 /* Prevent "Internal error: reference to deleted section".  */
 return;
 
@@ -7627,8 +7629,10 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree 
name, tree value)
   fputs (s.str ().c_str (), stream);
 
   tree id = DECL_ASSEMBLER_NAME (name);
+  symtab_node *alias_target_node = cnode->ultimate_alias_target ();
+  tree alias_target_id = DECL_ASSEMBLER_NAME (alias_target_node->decl);
   NVPTX_ASM_OUTPUT_DEF (stream, IDENTIFIER_POINTER (id),
-   IDENTIFIER_POINTER (value));
+   IDENTIFIER_POINTER (alias_target_id));
 }
 
 #undef NVPTX_ASM_OUTPUT_DEF

Re: [PATCH 3/3] Match: Add pattern for `(a ? b : 0) | (a ? 0 : c)` into `a ? b : c` [PR103660]

2024-08-26 Thread Marc Glisse


--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2339,6 +2339,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type))
   (bit_and @0 @1)))

+/* Fold `(a ? b : 0) | (a ? 0 : c)` into (a ? b : c).
+Handle also ^ and + in replacement of `|`. */
+(for cnd (cond vec_cond)
+ (for op (bit_ior bit_xor plus)
+  (simplify
+   (op:c
+(cnd:s @0 @00 integer_zerop)
+(cnd:s @0 integer_zerop @01))
+   (cnd @0 @00 @01


Wouldn't it fall into something more generic like

(for cnd (cond vec_cond)
 (for op (any_binary)
  (simplify
   (op
(cnd:s @0 @1 @2)
(cnd:s @0 @3 @4))
   (cnd @0 (op! @1 @3) (op! @2 @4)

?

The example given in the doc for the use of '!' is pretty close

@smallexample
(simplify
  (plus (vec_cond:s @@0 @@1 @@2) @@3)
  (vec_cond @@0 (plus! @@1 @@3) (plus! @@2 @@3)))
@end smallexample

--
Marc Glisse
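
At the source level, both the specific fold from the patch and the more generic shape suggested above can be written out as plain C. This is a sketch with made-up function names; it demonstrates the equivalences on scalars and says nothing about how match.pd decides profitability.

```c
#include <stdbool.h>
#include <stdint.h>

/* The three operators the patch handles: with one arm zero on each side,
   |, ^ and + all just select b or c.  */
uint32_t
ior_form (bool a, uint32_t b, uint32_t c)
{
  return (a ? b : 0) | (a ? 0 : c);
}

uint32_t
xor_form (bool a, uint32_t b, uint32_t c)
{
  return (a ? b : 0) ^ (a ? 0 : c);
}

uint32_t
plus_form (bool a, uint32_t b, uint32_t c)
{
  return (a ? b : 0) + (a ? 0 : c);
}

uint32_t
folded (bool a, uint32_t b, uint32_t c)
{
  return a ? b : c;
}

/* The generic shape: an operation on two selects with the same
   condition becomes a select of two operations.  Shown for minus.  */
uint32_t
minus_unfolded (bool a, uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4)
{
  return (a ? x1 : x2) - (a ? x3 : x4);
}

uint32_t
minus_folded (bool a, uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4)
{
  return a ? x1 - x3 : x2 - x4;
}
```

In the generic form the patch's case falls out when the inner operations simplify, since b | 0 and 0 | c fold to b and c.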

[PATCH 1/2] Delay edge removal in forwprop

2024-08-26 Thread Richard Biener

SSA forwprop has switch simplification code that calls remove_edge
and, as a side effect, releases dominator info.  For a followup we want
to retain dominator info, so the following delays removing edges until
the end of the pass.  As usual we have to deal with parts of the edge
vanishing due to EH/abnormal pruning, so record edges as basic-block
index pairs and remove them only when they are still there.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
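
The record-by-index, validate-at-the-end scheme can be sketched on a toy CFG. Everything below (the adjacency-matrix `struct cfg`, `struct edge_queue` and both functions) is hypothetical and only models the idea; the real code uses GCC's edge and basic_block types. Block indices are assumed to be below MAX_BLOCKS.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_BLOCKS = 8, MAX_QUEUE = 16 };

struct cfg
{
  bool block_exists[MAX_BLOCKS];
  bool edge[MAX_BLOCKS][MAX_BLOCKS];
};

struct edge_queue
{
  int src[MAX_QUEUE];
  int dest[MAX_QUEUE];
  size_t len;
};

/* Record the edge by block indices rather than by pointer, so an entry
   that goes stale before the flush is harmless.  Returns false when the
   queue is full.  */
bool
queue_edge_removal (struct edge_queue *q, int src, int dest)
{
  if (q->len == MAX_QUEUE)
    return false;
  q->src[q->len] = src;
  q->dest[q->len] = dest;
  q->len++;
  return true;
}

/* Remove every queued edge that still exists and empty the queue.
   Returns the number of edges actually removed.  */
size_t
flush_edge_removals (struct cfg *g, struct edge_queue *q)
{
  size_t removed = 0;
  for (size_t i = 0; i < q->len; i++)
    {
      int s = q->src[i], d = q->dest[i];
      if (g->block_exists[s] && g->block_exists[d] && g->edge[s][d])
        {
          g->edge[s][d] = false;
          removed++;
        }
    }
  q->len = 0;
  return removed;
}
```

If a block or edge disappears between queueing and flushing, the stale entry is simply skipped, which mirrors the `src && dest && find_edge (src, dest)` guard in the patch.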

* tree-ssa-forwprop.cc (simplify_gimple_switch_label_vec):
Delay removing edges and releasing dominator info, instead
record into edges_to_remove vector.
(simplify_gimple_switch): Pass through vector of edges to
remove.
(pass_forwprop::execute): Likewise.  Remove queued edges.
---
 gcc/tree-ssa-forwprop.cc | 34 +-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 959138c..e7342b4dc09 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -972,7 +972,8 @@ forward_propagate_addr_expr (tree name, tree rhs, bool 
parent_single_use_p)
have values outside the range of the new type.  */
 
 static void
-simplify_gimple_switch_label_vec (gswitch *stmt, tree index_type)
+simplify_gimple_switch_label_vec (gswitch *stmt, tree index_type,
+ vec<std::pair<int, int> > &edges_to_remove)
 {
   unsigned int branch_num = gimple_switch_num_labels (stmt);
   auto_vec labels (branch_num);
@@ -1026,11 +1027,8 @@ simplify_gimple_switch_label_vec (gswitch *stmt, tree 
index_type)
   for (ei = ei_start (gimple_bb (stmt)->succs); (e = ei_safe_edge (ei)); )
{
  if (! bitmap_bit_p (target_blocks, e->dest->index))
-   {
- remove_edge (e);
- cfg_changed = true;
- free_dominance_info (CDI_DOMINATORS);
-   }
+   edges_to_remove.safe_push (std::make_pair (e->src->index,
+  e->dest->index));
  else
ei_next (&ei);
} 
@@ -1042,7 +1040,8 @@ simplify_gimple_switch_label_vec (gswitch *stmt, tree 
index_type)
the condition which we may be able to optimize better.  */
 
 static bool
-simplify_gimple_switch (gswitch *stmt)
+simplify_gimple_switch (gswitch *stmt,
+   vec<std::pair<int, int> > &edges_to_remove)
 {
   /* The optimization that we really care about is removing unnecessary
  casts.  That will let us do much better in propagating the inferred
@@ -1078,7 +1077,8 @@ simplify_gimple_switch (gswitch *stmt)
  && (!max || int_fits_type_p (max, ti)))
{
  gimple_switch_set_index (stmt, def);
- simplify_gimple_switch_label_vec (stmt, ti);
+ simplify_gimple_switch_label_vec (stmt, ti,
+   edges_to_remove);
  update_stmt (stmt);
  return true;
}
@@ -3518,6 +3518,7 @@ pass_forwprop::execute (function *fun)
 |= EDGE_EXECUTABLE;
   auto_vec to_fixup;
   auto_vec to_remove;
+  auto_vec<std::pair<int, int>, 10> edges_to_remove;
   auto_bitmap simple_dce_worklist;
   auto_bitmap need_ab_cleanup;
   to_purge = BITMAP_ALLOC (NULL);
@@ -4024,7 +4025,8 @@ pass_forwprop::execute (function *fun)
  }
 
case GIMPLE_SWITCH:
- changed = simplify_gimple_switch (as_a <gswitch *> (stmt));
+ changed = simplify_gimple_switch (as_a <gswitch *> (stmt),
+   edges_to_remove);
  break;
 
case GIMPLE_COND:
@@ -4173,6 +4175,20 @@ pass_forwprop::execute (function *fun)
   cfg_changed |= gimple_purge_all_dead_abnormal_call_edges (need_ab_cleanup);
   BITMAP_FREE (to_purge);
 
+  /* Remove edges queued from switch stmt simplification.  */
+  for (auto ep : edges_to_remove)
+{
+  basic_block src = BASIC_BLOCK_FOR_FN (fun, ep.first);
+  basic_block dest = BASIC_BLOCK_FOR_FN (fun, ep.second);
+  edge e;
+  if (src && dest && (e = find_edge (src, dest)))
+   {
+ free_dominance_info (CDI_DOMINATORS);
+ remove_edge (e);
+ cfg_changed = true;
+   }
+}
+
   if (get_range_query (fun) != get_global_range_query ())
 disable_ranger (fun);
 
-- 
2.43.0

[PATCH 2/2] tree-optimization/116460 - improve forwprop compile-time

2024-08-26 Thread Richard Biener

The following improves forwprop block reachability computation, a
weakness I noticed when debugging PR116460 and which the existing
comment also notes.  It avoids processing blocks in natural loops
determined to be unreachable, thereby making the issue in PR116460 latent.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
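
The new condition distinguishes natural-loop backedges (the destination dominates the source) from backedges in irreducible regions. A minimal sketch of that test on a toy CFG follows. The bitmask representation and both function names are assumptions made for illustration, not GCC's dominance API; at most 32 blocks are supported, block 0 is the entry, and every block is assumed reachable from it.

```c
#include <stdbool.h>
#include <stdint.h>

enum { MAX_BLOCKS = 32 };

struct cfg
{
  int n;                        /* Number of blocks (1..32), entry is 0.  */
  uint32_t preds[MAX_BLOCKS];   /* Bit p of preds[b] set: edge p -> b.  */
};

/* On return, bit d of dom[b] is set when block d dominates block b.
   Classic iterative data-flow solution: a block is dominated by itself
   plus whatever dominates all of its predecessors.  */
void
compute_dominators (const struct cfg *g, uint32_t dom[MAX_BLOCKS])
{
  uint32_t all = g->n == 32 ? UINT32_MAX : (UINT32_C (1) << g->n) - 1;
  dom[0] = 1;
  for (int b = 1; b < g->n; b++)
    dom[b] = all;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int b = 1; b < g->n; b++)
        {
          uint32_t meet = all;
          for (int p = 0; p < g->n; p++)
            if (g->preds[b] & (UINT32_C (1) << p))
              meet &= dom[p];
          uint32_t next = meet | (UINT32_C (1) << b);
          if (next != dom[b])
            {
              dom[b] = next;
              changed = true;
            }
        }
    }
}

/* An edge src -> dest is a natural-loop backedge when dest dominates
   src.  Backedges inside irreducible regions fail this test.  */
bool
natural_backedge_p (const uint32_t dom[MAX_BLOCKS], int src, int dest)
{
  return (dom[src] >> dest) & 1;
}
```

For the loop 0 -> 1 -> 2 -> 1 the edge 2 -> 1 passes the test, so a not yet visited source no longer forces the header to be treated as reachable. In the irreducible region 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 1 neither inner edge passes, so both are still taken conservatively.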

PR tree-optimization/116460
* tree-ssa-forwprop.cc (pass_forwprop::execute): Do not
process blocks in unreachable natural loops.
---
 gcc/tree-ssa-forwprop.cc | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index e7342b4dc09..2964420ad1a 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -3498,6 +3498,8 @@ pass_forwprop::execute (function *fun)
 
   cfg_changed = false;
 
+  calculate_dominance_info (CDI_DOMINATORS);
+
   /* Combine stmts with the stmts defining their operands.  Do that
  in an order that guarantees visiting SSA defs before SSA uses.  */
   lattice.create (num_ssa_names);
@@ -3537,12 +3539,11 @@ pass_forwprop::execute (function *fun)
   FOR_EACH_EDGE (e, ei, bb->preds)
{
  if ((e->flags & EDGE_EXECUTABLE)
- /* With dominators we could improve backedge handling
-when e->src is dominated by bb.  But for irreducible
-regions we have to take all backedges conservatively.
-We can handle single-block cycles as we know the
-dominator relationship here.  */
- || bb_to_rpo[e->src->index] > i)
+ /* We can handle backedges in natural loops correctly but
+for irreducible regions we have to take all backedges
+conservatively when we did not visit the source yet.  */
+ || (bb_to_rpo[e->src->index] > i
+ && !dominated_by_p (CDI_DOMINATORS, e->src, e->dest)))
{
  any = true;
  break;
-- 
2.43.0

Re: [PATCH] Fix bootstap-errors due to enabling -gvariable-location-views

2024-08-26 Thread Bernd Edlinger

On 8/26/24 10:31, Richard Biener wrote:
> On Mon, 26 Aug 2024, Bernd Edlinger wrote:
> 
>> This recent change triggered various bootstrap errors, mostly on
>> x86 targets because line info advance address entries were output
>> in the wrong section table.
>> The switch to the wrong line table happened in dwarf2out_set_ignored_loc.
>> It must use the same section as the earlier called
>> dwarf2out_switch_text_section.
>>
>> But also ft32-elf was affected, because the assembler choked on
>> something as simple as ".2byte .LM2-.LM1", but fortunately it is
>> able to use native location views, the configure test was just
>> not executed because the ft32 "nop" instruction was missing.
> 
> OK for the configure part, I don't understand how using
> current_function_section is correct or how it even makes a
> difference to function_section.
> 
> It seems both would rely on the fact that fde->decl should be
> the same as cfun->decl and both eventually resort to how
> first_function_block_is_cold is set.
> 
> Is this from final_scan_insn_1 where we seem to switch
> in_cold_section_p?
> 
> The [current_]function_section API might be just confusing to me
> of course.  I note that dwarf2out mixes both uses and
> current_function_section seems newer than function_section.  Huh.
> 
Well, this is how I debugged it:
I use the successfully bootstrapped x86_64-pc-linux-gnu-gcc as host compiler
and build this:
 ../gcc-trunk/configure --target=i386-linux-gnu CC="gcc -m32 
-gno-as-loc-support" CXX="g++ -m32 -gno-as-loc-support"

make stops here:

g++ -m32 -gno-as-loc-support  -fno-PIE -c   -g -O2 -DIN_GCC 
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings  -DHAVE_CONFIG_H -fno-PIE -I. -I. -I../../gcc-trunk/gcc 
-I../../gcc-trunk/gcc/. -I../../gcc-trunk/gcc/../include  
-I../../gcc-trunk/gcc/../libcpp/include -I../../gcc-trunk/gcc/../libcody 
-I/home/ed/gnu/gcc-build-x/./gmp -I/home/ed/gnu/gcc-trunk/gmp 
-I/home/ed/gnu/gcc-build-x/./mpfr/src -I/home/ed/gnu/gcc-trunk/mpfr/src 
-I/home/ed/gnu/gcc-trunk/mpc/src  -I../../gcc-trunk/gcc/../libdecnumber 
-I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber 
-I../../gcc-trunk/gcc/../libbacktrace -I/home/ed/gnu/gcc-build-x/./isl/include 
-I/home/ed/gnu/gcc-trunk/isl/include  -o gtype-desc.o -MT gtype-desc.o -MMD -MP 
-MF ./.deps/gtype-desc.TPo gtype-desc.cc
/tmp/ccB94xhL.s: Assembler messages:
/tmp/ccB94xhL.s:563836: Error: can't resolve .text.unlikely - .LM4229
/tmp/ccB94xhL.s:563841: Error: can't resolve .text - .LM4230
/tmp/ccB94xhL.s:564103: Error: can't resolve .text.unlikely - .LM4282
/tmp/ccB94xhL.s:564108: Error: can't resolve .text - .LM4283
/tmp/ccB94xhL.s:564115: Error: can't resolve .text.unlikely - .LM4284
make[2]: *** [Makefile:1194: gtype-desc.o] Error 1

I took the original g++ command, replaced "-c" with "-S" and
"-o gtype-desc.o" with "-o /proc/self/fd/1", and added "-wrapper gdb,--args":

$ g++ -m32 -gno-as-loc-support  -fno-PIE -g -O2 -DIN_GCC 
-DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings  -DHAVE_CONFIG_H -fno-PIE -I. -I. -I../../gcc-trunk/gcc 
-I../../gcc-trunk/gcc/. -I../../gcc-trunk/gcc/../include  
-I../../gcc-trunk/gcc/../libcpp/include -I../../gcc-trunk/gcc/../libcody 
-I/home/ed/gnu/gcc-build-x/./gmp -I/home/ed/gnu/gcc-trunk/gmp 
-I/home/ed/gnu/gcc-build-x/./mpfr/src -I/home/ed/gnu/gcc-trunk/mpfr/src 
-I/home/ed/gnu/gcc-trunk/mpc/src  -I../../gcc-trunk/gcc/../libdecnumber 
-I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber 
-I../../gcc-trunk/gcc/../libbacktrace -I/home/ed/gnu/gcc-build-x/./isl/include 
-I/home/ed/gnu/gcc-trunk/isl/include -MT gtype-desc.o -MMD -MP -MF 
./.deps/gtype-desc.TPo gtype-desc.cc -S -o /proc/self/fd/1 -wrapper gdb,--args
(gdb) b dwarf2out_set_ignored_loc
(gdb) display in_cold_section_p
(gdb) display first_function_block_is_cold
(gdb) r
The first breakpoint is uninteresting:
[...]
.size   _Z22gt_pch_p_11eh_region_dPvS_PFvS_S_S_ES_, 
.-_Z22gt_pch_p_11eh_region_dPvS_PFvS_S_S_ES_
.p2align 4
.globl  _Z21gt_pch_p_10eh_catch_dPvS_PFvS_S_S_ES_
.type   _Z21gt_pch_p_10eh_catch_dPvS_PFvS_S_S_ES_, @function
_Z21gt_pch_p_10eh_catch_dPvS_PFvS_S_S_ES_:
.LFB8506:
.cfi_startproc

Breakpoint 1, dwarf2out_set_ignored_loc (line=12009, column=1, 
filename=0x391bd90 "gtype-desc.cc")
at ../../gcc-trunk/gcc/dwarf2out.cc:28976
28976 dw_fde_ref fde = cfun->fde;
1: in_cold_section_p = false
2: first_function_block_is_cold = false
(gdb) 
The next breakpoint is where the problem starts:
(gdb) c
[...]
.size   
_Z27hashtab_entry_not

Re: [PATCH] Fix bootstap-errors due to enabling -gvariable-location-views

2024-08-26 Thread Richard Biener

On Mon, 26 Aug 2024, Bernd Edlinger wrote:

> On 8/26/24 10:31, Richard Biener wrote:
> > On Mon, 26 Aug 2024, Bernd Edlinger wrote:
> > 
> >> This recent change triggered various bootstrap errors, mostly on
> >> x86 targets because line info advance address entries were output
> >> in the wrong section table.
> >> The switch to the wrong line table happened in dwarf2out_set_ignored_loc.
> >> It must use the same section as the earlier called
> >> dwarf2out_switch_text_section.
> >>
> >> But also ft32-elf was affected, because the assembler choked on
> >> something as simple as ".2byte .LM2-.LM1", but fortunately it is
> >> able to use native location views, the configure test was just
> >> not executed because the ft32 "nop" instruction was missing.
> > 
> > OK for the configure part, I don't understand how using
> > current_function_section is correct or how it even makes a
> > difference to function_section.
> > 
> > It seems both would rely on the fact that fde->decl should be
> > the same as cfun->decl and both eventually resort to how
> > first_function_block_is_cold is set.
> > 
> > Is this from final_scan_insn_1 where we seem to switch
> > in_cold_section_p?
> > 
> > The [current_]function_section API might be just confusing to me
> > of course.  I note that dwarf2out mixes both uses and
> > current_function_section seems newer than function_section.  Huh.
> > 
> Well, this is how I debugged it:
> I use the successfully bootstrapped x86_64-pc-linux-gnu-gcc as host compiler
> and build this:
>  ../gcc-trunk/configure --target=i386-linux-gnu CC="gcc -m32 
> -gno-as-loc-support" CXX="g++ -m32 -gno-as-loc-support"
> 
> make stops here:
> 
> g++ -m32 -gno-as-loc-support  -fno-PIE -c   -g -O2 -DIN_GCC 
> -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-overlength-strings  -DHAVE_CONFIG_H -fno-PIE -I. -I. 
> -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. 
> -I../../gcc-trunk/gcc/../include  -I../../gcc-trunk/gcc/../libcpp/include 
> -I../../gcc-trunk/gcc/../libcody -I/home/ed/gnu/gcc-build-x/./gmp 
> -I/home/ed/gnu/gcc-trunk/gmp -I/home/ed/gnu/gcc-build-x/./mpfr/src 
> -I/home/ed/gnu/gcc-trunk/mpfr/src -I/home/ed/gnu/gcc-trunk/mpc/src  
> -I../../gcc-trunk/gcc/../libdecnumber 
> -I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber 
> -I../../gcc-trunk/gcc/../libbacktrace 
> -I/home/ed/gnu/gcc-build-x/./isl/include -I/home/ed/gnu/gcc-trunk/isl/include 
>  -o gtype-desc.o -MT gtype-desc.o -MMD -MP -MF ./.deps/gtype-desc.TPo 
> gtype-desc.cc
> /tmp/ccB94xhL.s: Assembler messages:
> /tmp/ccB94xhL.s:563836: Error: can't resolve .text.unlikely - .LM4229
> /tmp/ccB94xhL.s:563841: Error: can't resolve .text - .LM4230
> /tmp/ccB94xhL.s:564103: Error: can't resolve .text.unlikely - .LM4282
> /tmp/ccB94xhL.s:564108: Error: can't resolve .text - .LM4283
> /tmp/ccB94xhL.s:564115: Error: can't resolve .text.unlikely - .LM4284
> make[2]: *** [Makefile:1194: gtype-desc.o] Error 1
> 
> I took the original g++ command, replaced "-c" with "-S" and
> "-o gtype-desc.o" with "-o /proc/self/fd/1", and added "-wrapper gdb,--args":
> 
> $ g++ -m32 -gno-as-loc-support  -fno-PIE -g -O2 -DIN_GCC 
> -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti 
> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-overlength-strings  -DHAVE_CONFIG_H -fno-PIE -I. -I. 
> -I../../gcc-trunk/gcc -I../../gcc-trunk/gcc/. 
> -I../../gcc-trunk/gcc/../include  -I../../gcc-trunk/gcc/../libcpp/include 
> -I../../gcc-trunk/gcc/../libcody -I/home/ed/gnu/gcc-build-x/./gmp 
> -I/home/ed/gnu/gcc-trunk/gmp -I/home/ed/gnu/gcc-build-x/./mpfr/src 
> -I/home/ed/gnu/gcc-trunk/mpfr/src -I/home/ed/gnu/gcc-trunk/mpc/src  
> -I../../gcc-trunk/gcc/../libdecnumber 
> -I../../gcc-trunk/gcc/../libdecnumber/bid -I../libdecnumber 
> -I../../gcc-trunk/gcc/../libbacktrace 
> -I/home/ed/gnu/gcc-build-x/./isl/include -I/home/ed/gnu/gcc-trunk/isl/include 
> -MT gtype-desc.o -MMD -MP -MF ./.deps/gtype-desc.TPo gtype-desc.cc -S -o 
> /proc/self/fd/1 -wrapper gdb,--args
> (gdb) b dwarf2out_set_ignored_loc
> (gdb) display in_cold_section_p
> (gdb) display first_function_block_is_cold
> (gdb) r
> The first breakpoint is uninteresting:
> [...]
>   .size   _Z22gt_pch_p_11eh_region_dPvS_PFvS_S_S_ES_, 
> .-_Z22gt_pch_p_11eh_region_dPvS_PFvS_S_S_ES_
>   .p2align 4
>   .globl  _Z21gt_pch_p_10eh_catch_dPvS_PFvS_S_S_ES_
>   .type   _Z21gt_pch_p_10eh_catch_dPvS_PFvS_S_S_ES_, @function
> _Z21gt_pch_p_10eh_catch_dPvS_PFvS_S_S_ES_:
> .LFB8506:
>   .cfi_startproc
> 
> Breakpoint 1, dwarf2out_set_ignored_loc (line=12009, column=1, 
> filename=0x391bd90 "gtype-desc.cc")
> at ../../gcc-

Re: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-26 Thread Richard Biener

On Mon, Aug 26, 2024 at 4:20 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form 1 of the scalar signed
> integer .SAT_ADD.  Aka below example:
>
> Form 1:
>   #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
>   T __attribute__((noinline))  \
>   sat_s_add_##T##_fmt_1 (T x, T y) \
>   {\
> T sum = (UT)x + (UT)y; \
> return (x ^ y) < 0 \
>   ? sum\
>   : (sum ^ x) >= 0 \
> ? sum  \
> : x < 0 ? MIN : MAX;   \
>   }
>
> DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
>
> We can tell the difference before and after this patch if the backend
> implements the ssadd3 pattern, as shown below.
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t sum;
>8   │   long unsigned int x.0_1;
>9   │   long unsigned int y.1_2;
>   10   │   long unsigned int _3;
>   11   │   long int _4;
>   12   │   long int _5;
>   13   │   int64_t _6;
>   14   │   _Bool _11;
>   15   │   long int _12;
>   16   │   long int _13;
>   17   │   long int _14;
>   18   │   long int _16;
>   19   │   long int _17;
>   20   │
>   21   │ ;;   basic block 2, loop depth 0
>   22   │ ;;pred:   ENTRY
>   23   │   x.0_1 = (long unsigned int) x_7(D);
>   24   │   y.1_2 = (long unsigned int) y_8(D);
>   25   │   _3 = x.0_1 + y.1_2;
>   26   │   sum_9 = (int64_t) _3;
>   27   │   _4 = x_7(D) ^ y_8(D);
>   28   │   _5 = x_7(D) ^ sum_9;
>   29   │   _17 = ~_4;
>   30   │   _16 = _5 & _17;
>   31   │   if (_16 < 0)
>   32   │ goto ; [41.00%]
>   33   │   else
>   34   │ goto ; [59.00%]
>   35   │ ;;succ:   3
>   36   │ ;;4
>   37   │
>   38   │ ;;   basic block 3, loop depth 0
>   39   │ ;;pred:   2
>   40   │   _11 = x_7(D) < 0;
>   41   │   _12 = (long int) _11;
>   42   │   _13 = -_12;
>   43   │   _14 = _13 ^ 9223372036854775807;
>   44   │ ;;succ:   4
>   45   │
>   46   │ ;;   basic block 4, loop depth 0
>   47   │ ;;pred:   2
>   48   │ ;;3
>   49   │   # _6 = PHI 
>   50   │   return _6;
>   51   │ ;;succ:   EXIT
>   52   │
>   53   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t _4;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   12   │   return _4;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The following test suites pass with this patch:
> * The rv64gcv full regression test.
> * The x86 bootstrap test.
> * The x86 full regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add the matching for signed .SAT_ADD.
> * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
> matching func decl.
> (match_unsigned_saturation_add): Try signed .SAT_ADD and rename
> to ...
> (match_saturation_add): ... here.
> (math_opts_dom_walker::after_dom_children): Update the above renamed
> func from caller.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 18 ++
>  gcc/tree-ssa-math-opts.cc | 35 ++-
>  2 files changed, 48 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 78f1957e8c7..b059e313415 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3192,6 +3192,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0
>
> +/* Signed saturation add, case 1:
> +   T sum = (UT)X + (UT)Y;
> +   SAT_S_ADD = (X ^ Y) < 0
> + ? sum
> + : (sum ^ x) >= 0
> +   ? sum
> +   : x < 0 ? MIN : MAX;
> +   T and UT are type pair like T=int8_t, UT=uint8_t.  */
> +(match (signed_integer_sat_add @0 @1)
> + (cond^ (lt (bit_and:c (bit_xor:c @0 (convert@2 (plus:c (convert @0)
> +   (convert @1

I think you want to use nop_convert here, for sure a truncation or
extension wouldn't be valid?

> +  (bit_not (bit_xor:c @0 @1)))

I think you don't need :c on both the inner plus and the bit_xor here?

> +   integer_zerop)
> +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)

The comment above quotes 'MIN' but that's not present here - that is,
the comment quotes a source form while we match what we see on
GIMPLE?  I do expect the matching will be quite fragile when not
being isolated.
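
To make the source-form versus GIMPLE-form distinction concrete, here is a minimal sketch for int8_t.  The function names are made up for illustration, and sat_add_gimple is only a hand transcription of the "before" dump quoted above, not compiler output.  It assumes GCC's wrapping behaviour when converting an out-of-range value to a signed type.

```c
#include <assert.h>
#include <stdint.h>

/* Source form quoted in the match.pd comment (form 1), for T = int8_t.  */
static int8_t
sat_add_source (int8_t x, int8_t y)
{
  int8_t sum = (int8_t) (uint8_t) ((uint8_t) x + (uint8_t) y);
  return (x ^ y) < 0
    ? sum
    : (sum ^ x) >= 0
      ? sum
      : x < 0 ? INT8_MIN : INT8_MAX;
}

/* Hand transcription of the GIMPLE shape: one overflow test and a
   branch-free selection of the saturated value.  No MIN appears.  */
static int8_t
sat_add_gimple (int8_t x, int8_t y)
{
  int8_t sum = (int8_t) (uint8_t) ((uint8_t) x + (uint8_t) y);
  /* Sign bit set iff x and y agree in sign but sum does not.  */
  int8_t overflow = (int8_t) ((x ^ sum) & ~(x ^ y));
  if (overflow < 0)
    /* -(x < 0) is 0 or -1; XOR with MAX yields MAX or MIN.  */
    return (int8_t) (-(int8_t) (x < 0) ^ INT8_MAX);
  return sum;
}

/* Returns 1 if both forms agree on every pair of int8_t inputs.  */
static int
forms_agree (void)
{
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    for (int y = INT8_MIN; y <= INT8_MAX; y++)
      if (sat_add_source ((int8_t) x, (int8_t) y)
	  != sat_add_gimple ((int8_t) x, (int8_t) y))
	return 0;
  return 1;
}
```

The two functions compute the same values, but only the second resembles what the pattern has to match, which is why a comment written in terms of MIN does not line up with the matched expression.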

How do you select the cases you want to support?

> +   @2)
> + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
> +  && types_m

Re: [PATCH] gimple ssa: switchconv: Use __builtin_popcount and support more types in exp transform [PR116355]

2024-08-26 Thread Richard Biener

On Sat, 24 Aug 2024, Filip Kastl wrote:

> Hi,
> 
> bootstrapped and regtested on x86_64-linux.  Ok to push?
> 
> Cheers,
> Filip Kastl
> 
> 
> -- 8< --
> 
> 
> The gen_pow2p function generates (a & -a) == a as a fallback for
> POPCOUNT (a) == 1.  Not only is the bitmagic not equivalent to
> POPCOUNT (a) == 1 but it also introduces UB (consider signed
> a = INT_MIN).
> 
> This patch rewrites gen_pow2p to always use __builtin_popcount instead.
> This means that what the end result GIMPLE code is gets decided by an
> already existing machinery in a later pass.  That is a cleaner solution
> I think.  This existing machinery also uses a ^ (a - 1) > a - 1 which is
> the correct bitmagic.
> 
> While rewriting gen_pow2p I had to add logic for converting the
> operand's type to a type that __builtin_popcount accepts.  I naturally
> also added this logic to gen_log2.  Thanks to this, exponential index
> transform gains the capability to handle all operand types with
> precision at most that of long long int.
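
The difference between the two bit tricks mentioned above can be checked with a small sketch.  The helper names are hypothetical, and the operand is unsigned so that negation is well defined; with a signed operand, -a overflows for INT_MIN, which is the UB the message refers to.

```c
#include <assert.h>

/* The removed fallback: it also accepts a == 0.  */
static int
pow2p_and_neg (unsigned a)
{
  return (a & -a) == a;
}

/* The form the message calls correct.  The parentheses are needed
   in C because '>' binds tighter than '^'.  */
static int
pow2p_xor (unsigned a)
{
  return (a ^ (a - 1)) > a - 1;
}

/* Reference definition: exactly one bit set.  */
static int
pow2p_popcount (unsigned a)
{
  return __builtin_popcount (a) == 1;
}

/* Returns 1 if pow2p_xor agrees with pow2p_popcount on [0, limit).  */
static int
xor_matches_popcount (unsigned limit)
{
  for (unsigned a = 0; a < limit; a++)
    if (pow2p_xor (a) != pow2p_popcount (a))
      return 0;
  return 1;
}
```

Zero is the input where the first form disagrees with the popcount definition: it has no bits set, yet (0 & -0) == 0 holds.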
> 
> PR tree-optimization/116355
> 
> gcc/ChangeLog:
> 
>   * tree-switch-conversion.cc (can_log2): Take into account the
>   conversion added to gen_log2.
>   (gen_log2): Add a conversion to a type compatible with FFS.
>   (can_pow2p): New function.
>   (gen_pow2p): Rewrite to use __builtin_popcount instead of
>   manually inserting an internal fn call or bitmagic.
>   (switch_conversion::is_exp_index_transform_viable): Call
>   can_pow2p.
>   (switch_conversion::exp_index_transform): Params of gen_pow2p
>   changed so update its call.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/switch-exp-transform-1.c: Don't test for
>   presence of POPCOUNT internal fn after switch conversion.  Test
>   for it after __builtin_popcount has had a chance to get
>   expanded.
>   * gcc.target/i386/switch-exp-transform-3.c: Also test char and
>   short.
> 
> Signed-off-by: Filip Kastl 
> ---
>  .../gcc.target/i386/switch-exp-transform-1.c  |   7 +-
>  .../gcc.target/i386/switch-exp-transform-3.c  |  98 ++-
>  gcc/tree-switch-conversion.cc | 117 ++
>  3 files changed, 192 insertions(+), 30 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c 
> b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> index 53d31460ba3..a8c9e03e515 100644
> --- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> @@ -1,9 +1,10 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-switchconv -mpopcnt -mbmi" } */
> +/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul 
> -mpopcnt -mbmi" } */
>  
>  /* Checks that exponential index transform enables switch conversion to 
> convert
> this switch into an array lookup.  Also checks that the "index variable 
> is a
> -   power of two" check has been generated.  */
> +   power of two" check has been generated and that it has been later expanded
> +   into an internal function.  */
>  
>  int foo(unsigned bar)
>  {
> @@ -29,4 +30,4 @@ int foo(unsigned bar)
>  }
>  
>  /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
> -/* { dg-final { scan-tree-dump "POPCOUNT" "switchconv" } } */
> +/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c 
> b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
> index 64a7b146172..5011d1ebb0e 100644
> --- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
> @@ -3,10 +3,104 @@
>  
>  /* Checks that the exponential index transformation is done for all these 
> types
> of the index variable:
> +   - (unsigned) char
> +   - (unsigned) short
> - (unsigned) int
> - (unsigned) long
> - (unsigned) long long  */
>  
> +int unopt_char(char bit_position)
> +{
> +switch (bit_position)
> +{
> +case (1 << 0):
> +return 0;
> +case (1 << 1):
> +return 1;
> +case (1 << 2):
> +return 2;
> +case (1 << 3):
> +return 3;
> +case (1 << 4):
> +return 4;
> +case (1 << 5):
> +return 5;
> +case (1 << 6):
> +return 6;
> +default:
> +return 0;
> +}
> +}
> +
> +int unopt_unsigned_char(unsigned char bit_position)
> +{
> +switch (bit_position)
> +{
> +case (1 << 0):
> +return 0;
> +case (1 << 1):
> +return 1;
> +case (1 << 2):
> +return 2;
> +case (1 << 3):
> +return 3;
> +case (1 << 4):
> +return 4;
> +case (1 << 5):
> +return 5;
> +case (1 << 6):
> +return 6;
> +default:
> +return 0;
> +}
> +}
> +
> +int unopt_short(short bit_pos

Re: [PATCH v1] RISC-V: Support IMM for operand 1 of ussub pattern

2024-08-26 Thread Jeff Law





On 8/26/24 4:26 AM, pan2...@intel.com wrote:

From: Pan Li 

This patch would like to allow IMM for the operand 1 of ussub pattern.
Aka .SAT_SUB(x, 22) as the below example.

Form 2:
   #define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
   T __attribute__((noinline)) \
   sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
   {   \
 return x >= (T)IMM ? x - (T)IMM : 0;  \
   }

DEF_SAT_U_SUB_IMM_FMT_2(uint64_t, 1022)

It is almost the same as supporting imm for operand 0 of the ussub pattern,
but allows the second operand to be imm instead of the first operand.
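
As a rough illustration of the two operand positions, here is a sketch with the macro from the message expanded by hand for uint64_t and 1022.  The second function, with the immediate as the first operand, is written here for contrast only and its name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Form 2 with the immediate as the second operand: .SAT_SUB (x, 1022).  */
static uint64_t
sat_u_sub_imm1022_uint64_t_fmt_2 (uint64_t x)
{
  return x >= (uint64_t) 1022 ? x - (uint64_t) 1022 : 0;
}

/* Hypothetical counterpart with the immediate as the first operand:
   .SAT_SUB (1022, x).  */
static uint64_t
sat_u_sub_imm1022_first (uint64_t x)
{
  return (uint64_t) 1022 >= x ? (uint64_t) 1022 - x : 0;
}
```

Both clamp at zero instead of wrapping; they differ only in which side of the subtraction is the constant.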

The following test suites pass with this patch:
1. The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_ussub): Gen xmode for the
second operand, aka y in parameter.
* config/riscv/riscv.md (ussub3): Allow const_int for operand 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-8.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-8.c: New test.

OK
jeff

[PATCH] tree-optimization/116460 - ICE with DCE in forwprop

2024-08-26 Thread Richard Biener

The following avoids removing stmts with defs that might still have
uses in the IL before calling simple_dce_from_worklist which might
remove those as that will wreck debug stmt generation.  Instead first
perform use-based DCE and then remove stmts which may have uses in
code that CFG cleanup will remove.  This requires tracking stmts
in to_remove by their SSA def so we can check whether it was removed
before without running into the issue that PHIs can be ggc_free()d
upon removal.  So this adds to_remove_defs in addition to to_remove
which has to stay to track GIMPLE_NOPs we want to elide.

Bootstrapped on x86_64-unknown-linux-gnu (on trunk and 14 branch),
testing in progress.

Richard.

PR tree-optimization/116460
* tree-ssa-forwprop.cc (pass_forwprop::execute): First do
simple_dce_from_worklist and then remove stmts in to_remove.
Track defs to be removed in to_remove_defs.

* g++.dg/torture/pr116460.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr116460.C | 609 
 gcc/tree-ssa-forwprop.cc|  38 +-
 2 files changed, 637 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr116460.C

diff --git a/gcc/testsuite/g++.dg/torture/pr116460.C 
b/gcc/testsuite/g++.dg/torture/pr116460.C
new file mode 100644
index 000..3c7d6372fba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr116460.C
@@ -0,0 +1,609 @@
+// { dg-do compile }
+// { dg-additional-options "-g" }
+
+namespace std {
+typedef __SIZE_TYPE__ size_t;
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+void __throw_length_error(const char *) __attribute__((__noreturn__, 
__cold__));
+}
+extern "C++" {
+namespace std __attribute__((__visibility__("default"))) {
+  template  struct __is_integer {
+enum { __value = 1 };
+  };
+  template  struct __is_nonvolatile_trivially_copyable {
+enum { __value = __is_trivially_copyable(_Tp) };
+  };
+  template  struct __memcpyable {};
+  template 
+  struct __memcpyable<_Tp *, _Tp *> : __is_nonvolatile_trivially_copyable<_Tp> 
{
+  };
+  template 
+  struct __memcpyable<_Tp *, const _Tp *>
+  : __is_nonvolatile_trivially_copyable<_Tp> {};
+  template  struct __is_move_iterator {
+enum { __value = 0 };
+  };
+  template  inline _Iterator __miter_base(_Iterator __it) {
+return __it;
+  }
+} // namespace )
+}
+namespace __gnu_cxx __attribute__((__visibility__("default"))) {
+  template 
+  struct __is_integer_nonstrict : public std::__is_integer<_Tp> {
+using std::__is_integer<_Tp>::__value;
+enum { __width = __value ? sizeof(_Tp) * 8 : 0 };
+  };
+  template  struct __numeric_traits_integer {
+static const bool __is_signed = (_Value)(-1) < 0;
+static const int __digits =
+__is_integer_nonstrict<_Value>::__width - __is_signed;
+static const _Value __max =
+__is_signed ? (_Value)1 << (__digits - 1)) - 1) << 1) + 1)
+: ~(_Value)0;
+  };
+  template 
+  struct __numeric_traits : public __numeric_traits_integer<_Value> {};
+} // namespace )
+namespace std __attribute__((__visibility__("default"))) {
+  template  struct integral_constant {
+static constexpr _Tp value = __v;
+using type = integral_constant<_Tp, __v>;
+  };
+  template  using __bool_constant = integral_constant;
+  using true_type = __bool_constant;
+  using false_type = __bool_constant;
+  template  struct enable_if {};
+  template  struct enable_if { using type = _Tp; };
+  template 
+  using __enable_if_t = typename enable_if<_Cond, _Tp>::type;
+  template  struct __conditional {
+template  using type = _Tp;
+  };
+  template 
+  using __conditional_t =
+  typename __conditional<_Cond>::template type<_If, _Else>;
+  namespace __detail {
+  template  auto __and_fn(...) -> false_type;
+  }
+  template 
+  struct __and_ : decltype(__detail::__and_fn<_Bn...>(0)) {};
+  template  struct __not_ : __bool_constant 
{};
+  template  using __void_t = void;
+  template 
+  struct is_trivial : public __bool_constant<__is_trivial(_Tp)> {};
+  template  _Up __declval(int);
+  template  auto declval() noexcept->decltype(__declval<_Tp>(0));
+  template 
+  using __is_constructible_impl =
+  __bool_constant<__is_constructible(_Tp, _Args...)>;
+  template 
+  struct __add_lvalue_reference_helper {
+using type = _Tp &;
+  };
+  template 
+  using __add_lval_ref_t = typename __add_lvalue_reference_helper<_Tp>::type;
+  template 
+  struct is_copy_constructible
+  : public __is_constructible_impl<_Tp, __add_lval_ref_t> {};
+  template 
+  struct __add_rvalue_reference_helper {
+using type = _Tp;
+  };
+  template 
+  using __add_rval_ref_t = typename __add_rvalue_reference_helper<_Tp>::type;
+  template 
+  struct is_move_constructible
+  : public __is_constructible_impl<_Tp, __add_rval_ref_t<_Tp>> {};
+  template 
+  using __is_nothrow_constructible_impl =
+  __bool_constant<__is_nothrow_constructible(_Tp, _Args...)>;
+  template 
+  struct is_nothrow_defau

Re: LRA: Fix setup_sp_offset

2024-08-26 Thread Michael Matz

Hello,

On Sun, 25 Aug 2024, Jeff Law wrote:

> >550: [--sp] = 0 sp_off = 0  {pushexthisi_const}
> >551: [--sp] = 37sp_off = -4 {pushexthisi_const}
> >552: [--sp] = r37   sp_off = -8 {movsi_m68k2}
> >554: [--sp] = r116 - r37sp_off = -12 {subsi3}
> >556: call   sp_off = -16
> > 
> > insn 554 doesn't match its constraints and needs some reloads:
> 
> I think you're right in that the current code isn't correct, but the 
> natural question is how in the world has this worked to-date.  Though I 
> guess targets which push arguments are a dying breed (though I would 
> have expected i386 to have tripped over this at some point).

Yeah, I wondered as well.  For things to go wrong some instructions that 
contain pre/post-inc/dec of the stack pointer need to have reloads in such 
a way that the actual SP-change side effect moves to a different 
instruction.  In this case it was:

554: [--sp] = r116 - r37

-->

996: r262 = r116
554: r262 = r262 - r37
997: [--sp] = r262

And for this to happen the target needs to have instructions that have an
SP-change side effect _and_ accept complicated expressions _and_ constrain 
the operand containing the side-effect in some way to the operands of 
these expressions.  (In this case: the subsi3 accepts a generic 
mem-operand destination, which includes pre-increment, and two generic 
register input operands; but constrains it such that the dest must be same 
as op0 of the minus).
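
A toy model may help show why it matters which instruction carries the SP change.  This is not LRA's data structure or its setup_sp_offset; the struct, the arrays and the helper are invented for illustration, and each insn simply records how much it moves SP.  The value returned corresponds to the sp_off column in the dump: the offset in effect before the insn executes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn: a uid and the change it applies to SP (-4 for a push).  */
struct toy_insn
{
  int uid;
  int sp_delta;
};

/* Sequence from the dump: 554 itself performs the push.  */
static const struct toy_insn before_reload[] = {
  { 550, -4 }, { 551, -4 }, { 552, -4 }, { 554, -4 }, { 556, 0 }
};

/* After the reloads shown above: the push has moved to insn 997.  */
static const struct toy_insn after_reload[] = {
  { 550, -4 }, { 551, -4 }, { 552, -4 },
  { 996, 0 }, { 554, 0 }, { 997, -4 }, { 556, 0 }
};

/* Returns the SP offset in effect before insn UID, or 1 if UID is not
   in the sequence (offsets in this model are never positive).  */
static int
sp_off_before (const struct toy_insn *insns, size_t n, int uid)
{
  int off = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (insns[i].uid == uid)
	return off;
      off += insns[i].sp_delta;
    }
  return 1;
}
```

In this model insns 996, 554 and 997 all see offset -12 and the call still sees -16, but only because the -4 is attributed to 997 rather than to 554.  Any bookkeeping that keeps attributing the change to 554 would be off by one slot for the insns after it.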

I guess that LRA targets until now, when they have SP-change (e.g. x86 
push/pop) are simple enough that the SP-change doesn't need reload.  E.g. 
the push on i386 only accepts a simple register or memory as input, and 
doesn't otherwise tie the SP-memory operands to the input:

   [--sp] = op0 # a simple push of simple general_operand op0

If any reloads are necessary for some reason then it will be on op0 which 
most likely will simply be a force_reg:

   regT = op0
   [--sp] = regT

The identity of the instruction that does the SP-change doesn't change.
setup_sp_offset will only be called on the new regT setter which doesn't 
contain any interesting effects on SP whatsoever, and the sp_offset value 
of the push will remain correct for it.

But if there are output reloads that contain the [--sp], things will go 
wrong, as here.  Typical RISC ISAs, even if they have SP-changes, will have 
them in their load/store insns, in which any reloads are similar to the 
x86 case: the side effect will remain on the original instruction and 
everything will work.

> OK. Though I fear there may be fallout on this one...

Me as well, but I can have hope, can I? :-)

Ciao,
Michael.

Re: LRA: Fix setup_sp_offset

2024-08-26 Thread Paul Koning




> On Aug 26, 2024, at 10:14 AM, Michael Matz  wrote:
> 
> Hello,
> 
> On Sun, 25 Aug 2024, Jeff Law wrote:
> 
>>>   550: [--sp] = 0 sp_off = 0  {pushexthisi_const}
>>>   551: [--sp] = 37sp_off = -4 {pushexthisi_const}
>>>   552: [--sp] = r37   sp_off = -8 {movsi_m68k2}
>>>   554: [--sp] = r116 - r37sp_off = -12 {subsi3}
>>>   556: call   sp_off = -16
>>> 
>>> insn 554 doesn't match its constraints and needs some reloads:
>> 
>> I think you're right in that the current code isn't correct, but the 
>> natural question is how in the world has this worked to-date.  Though I 
>> guess targets which push arguments are a dying breed (though I would 
>> have expected i386 to have tripped over this at some point).
> 
> Yeah, I wondered as well.  For things to go wrong some instructions that 
> contain pre/post-inc/dec of the stack pointer need to have reloads in such 
> a way that the actual SP-change side effect moves to a different 
> instruction.  

I think I've seen that in the past on PDP11, and reported it, but I thought 
that particular issue was fixed not too long after.

paul

Re: [PATCH] c++/coros: do not assume coros don't nest [PR113457]

2024-08-26 Thread Jason Merrill


On 8/23/24 3:49 PM, Arsen Arsenović wrote:

Iain Sandoe  writes:


static tree
get_awaitable_var (suspend_point_kind suspend_kind, tree v_type)
{
-  static int awn = 0;
+  auto cinfo = get_coroutine_info (current_function_decl);
+  gcc_assert (cinfo);


If the purpose of this is to check for mistakes during development (i.e. we do
not see a reason for having it in a released compiler) - then it’s better to use
gcc_checking_assert() which will disappear for non-checking builds.


I figured it was OK since this check is extremely light - I can use
gcc_checking_assert if you prefer.  No strong feelings in this instance.


In addition to the expense consideration, I think of _checking_ as being 
for things that I expect to be true, but the code might do something 
reasonable even if they aren't, and gcc_assert for things where the 
result is going to be complete nonsense if it isn't true.


Here, it'll immediately SEGV when trying to dereference a null pointer 
anyway in most cases, just not in the INITIAL/FINAL_SUSPEND_POINT cases.
They don't seem to have any dependence on nonnull cinfo, so I think 
_checking_ is the right choice.
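
The distinction can be sketched with toy stand-ins.  The macros and the function below are hypothetical and are not GCC's definitions or the coroutine code; they only model an assert that is always compiled in versus one that disappears in non-checking builds.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a checking-enabled build; 0 models a release build.  */
#ifndef TOY_CHECKING
#define TOY_CHECKING 0
#endif

/* Always active: abort when the condition fails.  */
#define toy_assert(EXPR) ((void) ((EXPR) ? 0 : (abort (), 0)))

/* Active only in checking builds; otherwise EXPR is not evaluated.  */
#if TOY_CHECKING
#define toy_checking_assert(EXPR) toy_assert (EXPR)
#else
#define toy_checking_assert(EXPR) ((void) (0 && (EXPR)))
#endif

enum toy_suspend_kind { TOY_INITIAL_SUSPEND, TOY_FINAL_SUSPEND, TOY_CO_AWAIT };

struct toy_coro_info
{
  int awaitable_number;
};

/* Only the TOY_CO_AWAIT path dereferences INFO, so a null INFO is
   harmless for the other two kinds when checking is disabled.  */
static int
toy_awaitable_number (struct toy_coro_info *info, enum toy_suspend_kind kind)
{
  toy_checking_assert (info != NULL);
  if (kind == TOY_INITIAL_SUSPEND || kind == TOY_FINAL_SUSPEND)
    return 0;
  return info->awaitable_number++;
}

/* Returns the difference between two consecutive numbers, i.e. 1.  */
static int
toy_demo (void)
{
  struct toy_coro_info info = { 7 };
  int first = toy_awaitable_number (&info, TOY_CO_AWAIT);
  int second = toy_awaitable_number (&info, TOY_CO_AWAIT);
  return second - first;
}
```

With TOY_CHECKING left at 0, a null info on the initial or final path still produces a reasonable answer, which matches the criterion given above for preferring the checking variant.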


OK with that tweak and the testsuite fix.

Jason

[PATCH] c++, coroutines: The frame pointer is used in the helpers [PR116482].

2024-08-26 Thread Iain Sandoe

As the PR notes, we now have two bogus warnings that the single frame
pointer parameter is unused in each of the helper functions.

This started when we began to use start_preparsed_function/finish_function
to wrap the helper function code generation.  I am puzzled a little about
why the use is not evident without marking - or perhaps it is always needed
to mark use in synthetic code?

For the destroy function, in particular, the use of the parameter is simple
- an indirect ref and then it is passed to the call to the actor.

The fix here is somewhat trivial - to mark the param as used as soon as it
is.

Tested on x86_64-darwin, reg-strapping on x86_64-darwin/linux and powerpc64
linux, OK for trunk assuming that the reg-straps are successful?
thanks
Iain

--- 8< ---

We have a bogus warning about the coroutine state frame pointers
being apparently unused in the resume and destroy functions.  Fixed
thus.

PR c++/116482

gcc/cp/ChangeLog:

* coroutines.cc (build_actor_fn): Mark the frame pointer as
used.
(build_destroy_fn): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116482.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc   |  3 ++-
 gcc/testsuite/g++.dg/coroutines/pr116482.C | 30 ++
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116482.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index c3e08221cc9..3dee844ad4e 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -2255,6 +2255,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
   tree actor_begin_label
 = create_named_label_with_ctx (loc, "actor.begin", actor);
   tree actor_frame = build1_loc (loc, INDIRECT_REF, coro_frame_type, actor_fp);
+  mark_used (actor_fp);
 
   /* Declare the continuation handle.  */
   add_decl_expr (continuation);
@@ -2526,7 +2527,7 @@ build_destroy_fn (location_t loc, tree coro_frame_type, 
tree destroy,
   tree destr_frame
 = cp_build_indirect_ref (loc, destr_fp, RO_UNARY_STAR,
 tf_warning_or_error);
-
+  mark_used (destr_fp);
   tree rat_field = lookup_member (coro_frame_type, coro_resume_index_id,
  1, 0, tf_warning_or_error);
   tree rat
diff --git a/gcc/testsuite/g++.dg/coroutines/pr116482.C 
b/gcc/testsuite/g++.dg/coroutines/pr116482.C
new file mode 100644
index 000..702d1e235bb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr116482.C
@@ -0,0 +1,30 @@
+// Override default options.
+// { dg-options "-std=c++20 -fno-exceptions -Wall -Wextra" }
+
+#include 
+
+struct SuspendNever {
+bool await_ready();
+void await_suspend(std::coroutine_handle<>);
+void await_resume();
+};
+
+struct Coroutine;
+
+struct PromiseType {
+Coroutine get_return_object();
+SuspendNever initial_suspend();
+SuspendNever final_suspend();
+#if __cpp_exceptions
+void unhandled_exception() { /*std::terminate();*/ };
+#endif
+void return_void();
+};
+
+struct Coroutine {
+using promise_type = PromiseType;
+};
+
+Coroutine __async_test_input_basic() {
+co_return;
+}
-- 
2.39.2 (Apple Git-143)

Re: LRA: Fix setup_sp_offset

2024-08-26 Thread Michael Matz

Hello,

On Mon, 26 Aug 2024, Paul Koning wrote:

> >>>   550: [--sp] = 0 sp_off = 0  {pushexthisi_const}
> >>>   551: [--sp] = 37sp_off = -4 {pushexthisi_const}
> >>>   552: [--sp] = r37   sp_off = -8 {movsi_m68k2}
> >>>   554: [--sp] = r116 - r37sp_off = -12 {subsi3}
> >>>   556: call   sp_off = -16
> >>> 
> >>> insn 554 doesn't match its constraints and needs some reloads:
> >> 
> >> I think you're right in that the current code isn't correct, but the 
> >> natural question is how in the world has this worked to-date.  Though I 
> >> guess targets which push arguments are a dying breed (though I would 
> >> have expected i386 to have tripped over this at some point).
> > 
> > Yeah, I wondered as well.  For things to go wrong some instructions that 
> > contain pre/post-inc/dec of the stack pointer need to have reloads in such 
> > a way that the actual SP-change side effect moves to a different 
> > instruction.  
> 
> I think I've seen that in the past on PDP11, and reported it, but I 
> thought that particular issue was fixed not too long after.

Do you have a reference handy?  I'd like to take a look, if for nothing 
else than curiosity ;-)


Ciao,
Michael.

RE: [PATCH 3/3] Match: Add pattern for `(a ? b : 0) | (a ? 0 : c)` into `a ? b : c` [PR103660]

2024-08-26 Thread Andrew Pinski (QUIC)

> -Original Message-
> From: Marc Glisse 
> Sent: Monday, August 26, 2024 4:46 AM
> To: Richard Biener 
> Cc: Andrew Pinski (QUIC) ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH 3/3] Match: Add pattern for `(a ? b : 0) | (a
> ? 0 : c)` into `a ? b : c` [PR103660]
> 
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -2339,6 +2339,16 @@
> DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>   (if (INTEGRAL_TYPE_P (type))
> >>(bit_and @0 @1)))
> >>
> >> +/* Fold `(a ? b : 0) | (a ? 0 : c)` into (a ? b : c).
> >> +Handle also ^ and + in replacement of `|`. */ (for cnd
> (cond
> >> +vec_cond)  (for op (bit_ior bit_xor plus)
> >> +  (simplify
> >> +   (op:c
> >> +(cnd:s @0 @00 integer_zerop)
> >> +(cnd:s @0 integer_zerop @01))
> >> +   (cnd @0 @00 @01
> 
> Wouldn't it fall into something more generic like
> 
> (for cnd (cond vec_cond)
>   (for op (any_binary)
>(simplify
> (op
>  (cnd:s @0 @1 @2)
>  (cnd:s @0 @3 @4))
> (cnd @0 (op! @1 @3) (op! @2 @4)
> 
> ?
> 
> The example given in the doc for the use of '!' is pretty close

Yes, we can extend the pattern that is already there for vec_cond too. Though I 
also think we should keep the special case for the newly added pattern, because 
otherwise we need extra steps to see that op is no longer there.
Another thing longer term is to remove VEC_COND_EXPR and merge it with 
COND_EXPR. I know this was already mentioned in a different thread but I don't 
want to duplicate work someone else might be doing; so, I have held back on 
trying to implement that.

Thanks,
Andrew Pinski 

> 
> @smallexample
> (simplify
>(plus (vec_cond:s @@0 @@1 @@2) @@3)
>(vec_cond @@0 (plus! @@1 @@3) (plus! @@2 @@3)))
> @end smallexample
> 
> --
> Marc Glisse

Re: [PATCH 1/2] c++/modules: Clean up include translation [PR110980]

2024-08-26 Thread Jason Merrill


On 8/22/24 7:49 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

Currently the handling of include translation is confusing to read,
using a tri-state integer without much clarity on what different states
mean.  This patch cleans this up to use explicit enumerators indicating
the different possible states instead, and fixes a bug where the option
'-flang-info-include-translate' ended up being accidentally unusable.
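
A reduced sketch of the two encodings, written in C with made-up names, shows why the old test could never fire.  It assumes the old integer took the values 0, -1 and +1, as read from the diff below, and it models only the note condition, not the rest of maybe_translate_include.

```c
#include <assert.h>

/* Explicit states, mirroring the enumerators the patch introduces.  */
enum xlate_kind { XLATE_UNKNOWN, XLATE_TEXT, XLATE_IMPORT };

/* Old condition for the "translated to import" note: the import case
   was encoded as +1, so 'xlate > 1' is false for every state.  */
static int
old_notes_translation (int xlate)
{
  return xlate > 1;
}

/* New condition: compare against the named state.  */
static int
new_notes_translation (enum xlate_kind kind)
{
  return kind == XLATE_IMPORT;
}
```

With named states, a comparison such as the old one has no obvious way to be off by one.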

PR c++/110980

gcc/cp/ChangeLog:

* module.cc (maybe_translate_include): Replace xlate with enum,
fix note_include_translate_yes.

gcc/testsuite/ChangeLog:

* g++.dg/modules/inc-xlate-2_a.H: New test.
* g++.dg/modules/inc-xlate-2_b.H: New test.
* g++.dg/modules/inc-xlate-3.h: New test.
* g++.dg/modules/inc-xlate-3_a.H: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc | 23 
  gcc/testsuite/g++.dg/modules/inc-xlate-2_a.H |  3 +++
  gcc/testsuite/g++.dg/modules/inc-xlate-2_b.H |  5 +
  gcc/testsuite/g++.dg/modules/inc-xlate-3.h   |  2 ++
  gcc/testsuite/g++.dg/modules/inc-xlate-3_a.H |  5 +
  5 files changed, 29 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/inc-xlate-2_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/inc-xlate-2_b.H
  create mode 100644 gcc/testsuite/g++.dg/modules/inc-xlate-3.h
  create mode 100644 gcc/testsuite/g++.dg/modules/inc-xlate-3_a.H

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 07477d33955..4cd7e1c284b 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -20118,15 +20118,19 @@ maybe_translate_include (cpp_reader *reader, 
line_maps *lmaps, location_t loc,
size_t len = strlen (path);
path = canonicalize_header_name (NULL, loc, true, path, len);
auto packet = mapper->IncludeTranslate (path, Cody::Flags::None, len);
-  int xlate = false;
+
+  enum class xlate_kind {
+unknown, text, import,
+  } translate = xlate_kind::unknown;
+
if (packet.GetCode () == Cody::Client::PC_BOOL)
-xlate = -int (packet.GetInteger ());
+translate = packet.GetInteger () ? xlate_kind::text : xlate_kind::unknown;
else if (packet.GetCode () == Cody::Client::PC_PATHNAME)
  {
/* Record the CMI name for when we do the import.  */
module_state *import = get_module (build_string (len, path));
import->set_filename (packet);
-  xlate = +1;
+  translate = xlate_kind::import;
  }
else
  {
@@ -20136,9 +20140,9 @@ maybe_translate_include (cpp_reader *reader, line_maps 
*lmaps, location_t loc,
  }
  
bool note = false;

-  if (note_include_translate_yes && xlate > 1)
+  if (note_include_translate_yes && translate == xlate_kind::import)
  note = true;
-  else if (note_include_translate_no && xlate == 0)
+  else if (note_include_translate_no && translate == xlate_kind::unknown)
  note = true;
else if (note_includes)
  /* We do not expect the note_includes vector to be large, so O(N)
@@ -20148,15 +20152,16 @@ maybe_translate_include (cpp_reader *reader, 
line_maps *lmaps, location_t loc,
note = true;
  
if (note)

-inform (loc, xlate
+inform (loc, translate == xlate_kind::import
? G_("include %qs translated to import")
-   : G_("include %qs processed textually") , path);
+   : G_("include %qs processed textually"), path);
  
-  dump () && dump (xlate ? "Translating include to import"

+  dump () && dump (translate == xlate_kind::import
+  ? "Translating include to import"
   : "Keeping include as include");
dump.pop (0);
  
-  if (!(xlate > 0))

+  if (translate != xlate_kind::import)
  return nullptr;

/* Create the translation text.  */

diff --git a/gcc/testsuite/g++.dg/modules/inc-xlate-2_a.H 
b/gcc/testsuite/g++.dg/modules/inc-xlate-2_a.H
new file mode 100644
index 000..d6a4866a676
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/inc-xlate-2_a.H
@@ -0,0 +1,3 @@
+// PR c++/110980
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
diff --git a/gcc/testsuite/g++.dg/modules/inc-xlate-2_b.H 
b/gcc/testsuite/g++.dg/modules/inc-xlate-2_b.H
new file mode 100644
index 000..f04dd430fec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/inc-xlate-2_b.H
@@ -0,0 +1,5 @@
+// PR c++/110980
+// { dg-additional-options "-fmodule-header -flang-info-include-translate" }
+// { dg-module-cmi {} }
+
+#include "inc-xlate-2_a.H"  // { dg-message "translated to import" }
diff --git a/gcc/testsuite/g++.dg/modules/inc-xlate-3.h 
b/gcc/testsuite/g++.dg/modules/inc-xlate-3.h
new file mode 100644
index 000..c0584bada0c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/inc-xlate-3.h
@@ -0,0 +1,2 @@
+// PR c++/110980
+// Just an empty file to be an include target.
diff --git a/gcc/testsuite/g++.dg/modules/inc-xlate-3_a.H 
b/gcc/testsuite/g++.dg/modules/inc-xlate-3_a

Re: [PATCH 2/2] c++/modules: Fix include translation for already-seen headers [PR99243]

2024-08-26 Thread Jason Merrill


On 8/22/24 7:51 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

After importing a header unit we learn about and set up any header
modules that we transitively depend on.  However, this causes
'set_filename' to fail an assertion if we then come across this header
as an #include and attempt to translate it into a module.  We still need
to do this translation so that libcpp learns that this is a header unit,
but we shouldn't error just because we've already seen it as an import.

Instead this patch merely checks and errors to handle the case of a
broken mapper implementation which supplies a different CMI path from
the one we already got.
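
The "accept a repeated call if the path matches" behaviour described above can be sketched in isolation. This is only a minimal standalone model under stated assumptions, not the code in module.cc: `module_record` and `set_result` are hypothetical names chosen for illustration, and the real function reports problems through GCC's diagnostics rather than a return code.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a module's CMI bookkeeping; not GCC's module_state.
enum class set_result { recorded, unchanged, mismatch };

struct module_record
{
  std::optional<std::string> filename;  // empty until the first CMI path is seen

  // Record PATH the first time; accept a repeat only if it names the same CMI.
  set_result set_filename (const std::string &path)
  {
    if (!filename)
      {
        filename = path;
        return set_result::recorded;
      }
    return *filename == path ? set_result::unchanged : set_result::mismatch;
  }
};
```

In this model a caller would treat `mismatch` as the broken-mapper case and keep the path it already had.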

As a drive-by fix, also make failing to find the CMI for a module be a
fatal error: any further errors in the TU are unlikely to be helpful.

PR c++/99243

gcc/cp/ChangeLog:

* module.cc (module_state::set_filename): Handle repeated calls
to 'set_filename' as long as the CMI path matches.
(maybe_translate_include): Adjust comment.

gcc/testsuite/ChangeLog:

* g++.dg/modules/map-2.C: Prune additional fatal error message.
* g++.dg/modules/inc-xlate-4_a.H: New test.
* g++.dg/modules/inc-xlate-4_b.H: New test.
* g++.dg/modules/inc-xlate-4_c.H: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc | 18 +-
  gcc/testsuite/g++.dg/modules/inc-xlate-4_a.H |  5 +
  gcc/testsuite/g++.dg/modules/inc-xlate-4_b.H |  5 +
  gcc/testsuite/g++.dg/modules/inc-xlate-4_c.H |  6 ++
  gcc/testsuite/g++.dg/modules/map-2.C |  3 ++-
  5 files changed, 31 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/inc-xlate-4_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/inc-xlate-4_b.H
  create mode 100644 gcc/testsuite/g++.dg/modules/inc-xlate-4_c.H

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 4cd7e1c284b..95c2405fcd4 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -20086,14 +20086,21 @@ canonicalize_header_name (cpp_reader *reader, 
location_t loc, bool unquoted,
  
  void module_state::set_filename (const Cody::Packet &packet)

  {
-  gcc_checking_assert (!filename);
if (packet.GetCode () == Cody::Client::PC_PATHNAME)
-filename = xstrdup (packet.GetString ().c_str ());
+{
+  /* If we've seen this import before we better have the same CMI.  */
+  const std::string &path = packet.GetString ();
+  if (!filename)
+   filename = xstrdup (packet.GetString ().c_str ());
+  else if (filename != path)
+   error_at (loc, "mismatching compiled module interface: "
+ "had %qs, got %qs", filename, path.c_str ());
+}
else
  {
gcc_checking_assert (packet.GetCode () == Cody::Client::PC_ERROR);
-  error_at (loc, "unknown Compiled Module Interface: %s",
-   packet.GetString ().c_str ());
+  fatal_error (loc, "unknown compiled module interface: %s",
+  packet.GetString ().c_str ());
  }
  }
  
@@ -20127,7 +20134,8 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc,

  translate = packet.GetInteger () ? xlate_kind::text : xlate_kind::unknown;
else if (packet.GetCode () == Cody::Client::PC_PATHNAME)
  {
-  /* Record the CMI name for when we do the import.  */
+  /* Record the CMI name for when we do the import.
+We may already know about this import, but libcpp doesn't yet.  */
module_state *import = get_module (build_string (len, path));
import->set_filename (packet);
translate = xlate_kind::import;
diff --git a/gcc/testsuite/g++.dg/modules/inc-xlate-4_a.H 
b/gcc/testsuite/g++.dg/modules/inc-xlate-4_a.H
new file mode 100644
index 000..8afb49d01a5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/inc-xlate-4_a.H
@@ -0,0 +1,5 @@
+// PR c++/99243
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+void foo();
diff --git a/gcc/testsuite/g++.dg/modules/inc-xlate-4_b.H 
b/gcc/testsuite/g++.dg/modules/inc-xlate-4_b.H
new file mode 100644
index 000..0e67566f571
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/inc-xlate-4_b.H
@@ -0,0 +1,5 @@
+// PR c++/99243
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+#include "inc-xlate-4_a.H"
diff --git a/gcc/testsuite/g++.dg/modules/inc-xlate-4_c.H 
b/gcc/testsuite/g++.dg/modules/inc-xlate-4_c.H
new file mode 100644
index 000..c2fa647bce8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/inc-xlate-4_c.H
@@ -0,0 +1,6 @@
+// PR c++/99243
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+#include "inc-xlate-4_b.H"
+#include "inc-xlate-4_a.H"
diff --git a/gcc/testsuite/g++.dg/modules/map-2.C 
b/gcc/testsuite/g++.dg/modules/map-2.C
index 94d3f7a1a41..3f95aea3670 100644
--- a/gcc/testsuite/g++.dg/modules/map-2.C
+++ b/gcc/testsuite/g++.dg/modules/map-2.C
@@

Re: [PATCH] c++: Check template parameter number in member class template specialization [PR115716]

2024-08-26 Thread Jason Merrill


On 8/26/24 4:26 AM, Simon Martin wrote:

Hi Jason,

On 22 Aug 2024, at 19:28, Jason Merrill wrote:


On 8/22/24 12:51 PM, Simon Martin wrote:

We currently ICE upon the following invalid code, because we don't check the
number of template parameters in member class template specializations. This
patch fixes the PR by adding such a check.

=== cut here ===
template  struct x {
template  struct y {
  typedef T result2;
};
};
template<> template struct x::y {
typedef double result2;
};
int main() {
x::y::result2 xxx2;
}
=== cut here ===

Successfully tested on x86_64-pc-linux-gnu.

PR c++/115716

gcc/cp/ChangeLog:

* pt.cc (maybe_process_partial_specialization): Check number of
template parameters in specialization.

gcc/testsuite/ChangeLog:

* g++.dg/template/spec42.C: New test.

---
   gcc/cp/pt.cc   | 14 ++
   gcc/testsuite/g++.dg/template/spec42.C | 17 +
   2 files changed, 31 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/template/spec42.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index bc3ad5edcc5..db8c2a3b4de 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1173,6 +1173,20 @@ maybe_process_partial_specialization (tree type)
   type, inst);
}
  +   /* Check that the number of template parameters matches the template
+being specialized.  */
+ gcc_assert (current_template_parms);
+ if (TREE_VEC_LENGTH (INNERMOST_TEMPLATE_ARGS
+  (CLASSTYPE_TI_ARGS (type)))
+ != TREE_VEC_LENGTH (INNERMOST_TEMPLATE_PARMS
+ (current_template_parms)))
+   {
+ error ("wrong number of template parameters for %qT", type);
 + inform (DECL_SOURCE_LOCATION (tmpl), "from definition of %q#D",
+ tmpl);


How about printing the numbers for each place?

What if the mismatch is other than in the number of parameters?  Can
you use template_parameter_lists_equivalent_p?  Or if that's
complicated, compare current_template_args() to CLASSTYPE_TI_ARGS
(type)?


Thanks for the review. After checking further, I believe we just miss a
call to redeclare_class_template, that will catch various template
parameter mismatch and properly report them.

This is what the updated attached patch does, successfully tested on
x86_64-pc-linux-gnu. OK for trunk?


OK.

Re: [RFC/RFA] [PATCH v2 09/12] Add symbolic execution support.

2024-08-26 Thread Matevos Mehrabyan

On Mon, Aug 26, 2024 at 2:44 AM Jeff Law  wrote:

>
>
> On 8/20/24 5:41 AM, Richard Biener wrote:
>
> >
> > So the store-merging variant IIRC tracks a single overall source
> > only (unless it was extended and I missed that) and operates at
> > a byte granularity.  I did want to extend it to support vector shuffles
> > at one point (with two sources then), but didn't get to it.  The
> > current implementation manages to be quite efficient - mainly due
> > to the restriction to a single source I guess.
> >
> > How does that compare to the symbolic execution engine?
> >
> > What can the symbolic execution engine handle?  The store-merging
> > machinery can only handle plain copies, can the symbolic
> > execution engine tell that for example bits 3-7 are bits 1-5 from A
> > plus constant 9 with appropriately truncated result?
> Conceptually this is the kind of thing it's supposed to handle, but
> there may be implementation details that are missing for the case you want.
>
> More importantly, the execution engine has a limited set of expressions
> it knows how to evaluate, so there's a reasonable chance if you feed it
> something more general than what is typically seen in a CRC loop that
> it's going to give up because it doesn't know how to handle more than
> just a few operators.
>
>
>
By using this symbolic execution engine, you can determine that bits 3-7
are bits 1-5 from A.
I think the documentation will help others to understand how it works and
what it does.
Since the documentation is not ready, here is a simple demo example:
For the following code:

foo(byte A) {
byte tmp = A ^ 5;
byte result = tmp << 2;
result = result | 4;
return result;
}

the symbolic executor would:

define(A);  // A = 
// Here, each bit of A is mapped to its origin A. So A[3]->get_origin() will return A.
// Besides that, each bit has an index field that denotes its initial position.
// So A[3]->get_index() will return 3 even if it is moved or assigned to another variable.
xor(tmp, A, 5);  // tmp = 
shift_left(result, tmp, 2);  // result = 
or(result, result, 4);  // result = , set result[2] = 1

After these operations, we can examine the result and see that bits 3-7 of
the result are bits 1-5 of the argument A.
For example, result[4] is the xor expression (A2 ^ 1) (this can be checked with
is_a), so it has left and right operands: one is the symbolic bit A2 and
the other is the constant 1.
So result[4]->get_left()->get_origin() will return A, and
result[4]->get_left()->get_index() will return 2,
since that was the bit's initial position.

The symbolic executor supports only a few operations; it may need to be
extended for use elsewhere.
Supported operations: AND, OR, XOR, SHIFT_RIGHT, SHIFT_LEFT, ADD, SUB, MUL,
and COMPLEMENT.
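
To make the bit-tracking idea concrete, here is a small standalone sketch of the demo above. It is not the engine from the patch: `sym_bit`, `sym_byte`, `xor_const`, `or_const` and `foo_symbolic` are hypothetical names, only xor/or with a constant and left shift are modelled, and a xor with a 0 bit is assumed to leave the symbolic bit unchanged.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of one symbolic bit; not the classes used by the patch.
struct sym_bit
{
  enum kind_t { constant, symbol, symbol_xor_1 } kind = constant;
  int value = 0;    // used when kind == constant
  char origin = 0;  // name of the source variable otherwise
  int index = -1;   // initial bit position within the origin
};

using sym_byte = std::array<sym_bit, 8>;

// Every bit of a fresh argument remembers its origin and initial position.
sym_byte define (char name)
{
  sym_byte r;
  for (int i = 0; i < 8; ++i)
    {
      r[i].kind = sym_bit::symbol;
      r[i].origin = name;
      r[i].index = i;
    }
  return r;
}

// Xor with a constant: only bits where the constant has a 1 change.
sym_byte xor_const (sym_byte a, std::uint8_t c)
{
  for (int i = 0; i < 8; ++i)
    if ((c >> i) & 1)
      switch (a[i].kind)
        {
        case sym_bit::constant: a[i].value ^= 1; break;
        case sym_bit::symbol: a[i].kind = sym_bit::symbol_xor_1; break;
        case sym_bit::symbol_xor_1: a[i].kind = sym_bit::symbol; break;
        }
  return a;
}

// Left shift: bits move up and keep their origin/index; zeros shift in.
sym_byte shift_left (const sym_byte &a, unsigned n)
{
  sym_byte r;  // default-constructed bits are constant 0
  for (unsigned i = n; i < 8; ++i)
    r[i] = a[i - n];
  return r;
}

// Or with a constant: a 1 in the constant forces the bit to constant 1.
sym_byte or_const (sym_byte a, std::uint8_t c)
{
  for (int i = 0; i < 8; ++i)
    if ((c >> i) & 1)
      {
        a[i] = sym_bit ();
        a[i].value = 1;
      }
  return a;
}

// The demo function: ((A ^ 5) << 2) | 4, evaluated symbolically.
sym_byte foo_symbolic ()
{
  sym_byte tmp = xor_const (define ('A'), 5);
  return or_const (shift_left (tmp, 2), 4);
}
```

In this model, element 4 of the value returned by `foo_symbolic ()` has kind `symbol_xor_1`, origin 'A' and index 2, which corresponds to the (A2 ^ 1) bit discussed above.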

>
> >
> > Note we should always have an eye on compile-time complexity,
> > GCC does get used on multi-megabyte machine-generated sources
> > that tend to look very uniform - variants with loops and bit operations
> > supported by symbolic execution would not catch me in surprise.
> Which is why it's a two phase recognition.  It uses simple tests to
> filter out the vast majority of loops, leaving just a few that have a
> good chance of being a CRC for the more expensive verification step
> using the symbolic execution engine.
>
> jeff
>
>
Best Regards,
Matevos.

Re: [PATCH] rs6000: Fix PTImode handling in power8 swap optimization pass [PR116415]

2024-08-26 Thread Segher Boessenkool

Hi!

On Thu, Aug 22, 2024 at 05:39:36PM +0800, Kewen.Lin wrote:
> > - if (ALTIVEC_OR_VSX_VECTOR_MODE (mode) || mode == TImode)
> > + if (ALTIVEC_OR_VSX_VECTOR_MODE (mode) || mode == TImode
> > + || mode == PTImode)
> 
> Maybe we can introduce a macro to this file like
> 
> #define TI_OR_PTI_MODE(mode) ((mode) == TImode || (mode) == PTImode)

INTEGRAL_MODE_P (mode) && MODE_UNIT_SIZE (mode) == 16  or such?

Or you might just want the check for 16, that covers all applicable
modes and nothing else, right?

The correct indentation is

  if (ALTIVEC_OR_VSX_VECTOR_MODE (mode)
  || mode == TImode
  || mode == PTImode)

btw, or you can put the TImode and PTImode on one line if you really
have to, but don't put unalike things on the same line.

I don't like macros that you use just one or two times.  It is much
clearer if you write it out wherever you use it.

> OK for trunk and all active release branches with/without these nits tweaked,
> but please give others two days or so to comment, thanks!

Okay for trunk right now, and backports after the customary wait.  Also
okay with just the  MODE_UNIT_SIZE (mode) == 16  thing, after you tested
that of course :-)

Thanks!

Segher

Re: [PATCH] rs6000: Fix PTImode handling in power8 swap optimization pass [PR116415]

2024-08-26 Thread Segher Boessenkool

Hi!

On Thu, Aug 22, 2024 at 08:48:19PM -0500, Peter Bergner wrote:
> I was a little surprised we didn't have that macro already.  Ok, consider
> it changed with your suggestion.
> 
> I agree, there probably is code in the backend that currently handles TImode
> that should probably be changed to handle both by using your new macro.

That is what mode classes are for, or just the mode size here :-)

> >> +/* { dg-do run } */
> >> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > 
> > Nit: This dg-skip-if line looks not necessary as p8vector_hw excludes 
> > *-*-darwin*.

Every single "skip-if darwin" is incorrect if it isn't added by Iain.
Skipping a test on some target is fine, if it would fail on that target
(but do put a comment in!), but skipping because you are scared it will
fail on some target, or you don't care about that target, or just
cargo-cult, is wrong, and encourages more wrongness.

Segher

[r15-3176 Regression] FAIL: gcc.target/i386/funcspec-6.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread haochen.jiang

On Linux/x86_64,

8db80b2735782d793a83a9ef7eb012d83be7660d is the first bad commit
commit 8db80b2735782d793a83a9ef7eb012d83be7660d
Author: Hongyu Wang 
Date:   Mon Aug 26 10:53:37 2024 +0800

[PATCH 1/2] AVX10.2: Support media instructions

caused

FAIL: gcc.target/i386/avx10_2-builtin-1.c (test for excess errors)
FAIL: gcc.target/i386/avx10_2-media-1.c (test for excess errors)
FAIL: gcc.target/i386/funcspec-5.c (test for excess errors)
FAIL: gcc.target/i386/funcspec-6.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3176/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-builtin-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-builtin-1.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-media-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-media-1.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/funcspec-5.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/funcspec-6.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

[r15-3179 Regression] FAIL: gcc.target/i386/avx10_2-bf16-1.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread haochen.jiang

On Linux/x86_64,

9023662464ac7a0bbac72d94078ea0845bf99c86 is the first bad commit
commit 9023662464ac7a0bbac72d94078ea0845bf99c86
Author: konglin1 
Date:   Mon Aug 26 10:53:43 2024 +0800

[PATCH 1/2] AVX10.2: Support BF16 instructions

caused

FAIL: gcc.target/i386/avx10_2-bf16-1.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3179/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-bf16-1.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-bf16-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-bf16-1.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-bf16-1.c --target_board='unix{-m64\ -march=cascadelake}'"


[r15-3177 Regression] FAIL: gcc.target/i386/avx10_2-builtin-2.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread haochen.jiang

On Linux/x86_64,

af0a06274fce2ca64456f5b13b4bc8ff864a45e4 is the first bad commit
commit af0a06274fce2ca64456f5b13b4bc8ff864a45e4
Author: Haochen Jiang 
Date:   Mon Aug 26 10:53:39 2024 +0800

[PATCH 2/2] AVX10.2: Support media instructions

caused

FAIL: gcc.target/i386/avx10_2-builtin-2.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3177/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-builtin-2.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-builtin-2.c --target_board='unix{-m64\ -march=cascadelake}'"


[r15-3185 Regression] FAIL: gcc.target/i386/avx10_2-compare-1.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread haochen.jiang

On Linux/x86_64,

576bd309ded9dfe258023f26924c064a7bf12875 is the first bad commit
commit 576bd309ded9dfe258023f26924c064a7bf12875
Author: Zhang, Jun 
Date:   Mon Aug 26 10:53:54 2024 +0800

AVX10.2: Support compare instructions

caused

FAIL: gcc.target/i386/avx10_2-compare-1.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3185/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-compare-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-compare-1.c --target_board='unix{-m64\ -march=cascadelake}'"


[r15-3178 Regression] FAIL: gcc.target/i386/avx10_2-convert-1.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread haochen.jiang

On Linux/x86_64,

2a046117a8376578337dc385f171e908155782b7 is the first bad commit
commit 2a046117a8376578337dc385f171e908155782b7
Author: Levy Hsu 
Date:   Mon Aug 26 10:53:41 2024 +0800

AVX10.2: Support convert instructions

caused

FAIL: gcc.target/i386/avx10_2-convert-1.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3178/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-convert-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-convert-1.c --target_board='unix{-m64\ -march=cascadelake}'"


[r15-3184 Regression] FAIL: gcc.target/i386/avx10_2-vmovw-1.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread haochen.jiang

On Linux/x86_64,

f6fe2962daf7b8d8532c768c3b9eab00f99cce5b is the first bad commit
commit f6fe2962daf7b8d8532c768c3b9eab00f99cce5b
Author: Zhang, Jun 
Date:   Mon Aug 26 10:53:52 2024 +0800

AVX10.2: Support vector copy instructions

caused

FAIL: gcc.target/i386/avx10_2-vmovd-1.c (test for excess errors)
FAIL: gcc.target/i386/avx10_2-vmovw-1.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3184/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-vmovd-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-vmovd-1.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-vmovw-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-vmovw-1.c --target_board='unix{-m64\ -march=cascadelake}'"


[r15-3183 Regression] FAIL: gcc.target/i386/avx10_2-minmax-1.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread haochen.jiang

On Linux/x86_64,

889f6dd0d8c7317f62578c900c0f662e919786a2 is the first bad commit
commit 889f6dd0d8c7317f62578c900c0f662e919786a2
Author: Mo, Zewei 
Date:   Mon Aug 26 10:53:50 2024 +0800

AVX10.2: Support minmax instructions

caused

FAIL: gcc.target/i386/avx10_2-minmax-1.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3183/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-minmax-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-minmax-1.c --target_board='unix{-m64\ -march=cascadelake}'"


Re: [patch][v2] libgomp.texi: Document OpenMP's Interoperability Routines

2024-08-26 Thread Sandra Loosemore


On 8/23/24 07:03, Tobias Burnus wrote:

Add documentation for OpenMP's interoperability routines.


I have only a few copy-editing type comments.


+Implementation remark: In GCC, the Fortran interface differs from the one shown
+below: the function has C binding, @var{interop} is passed by value and an
+integer of @code{c_int} kind is returnd, permitting to have the same ABI as the


s/returnd/returned/ (this seems to be the only instance in the patch)

s/permitting to have/which permits use of/g (multiple instances)


+C function.  This does not affect the usage of the function when GCC's
+@code{omp_lib} module or @code{omp_lib.h} header is used.


Stepping back to consider this from a higher-level perspective, 
shouldn't the interface documented in the GCC manual reflect what GCC 
implements, rather than what the spec says that is explicitly *not* what 
is implemented?  Or is the way you have documented this consistent with 
the way other libgomp features that don't strictly conform to the spec 
have already been documented?



+The @code{omp_get_interop_name} function returns the name of the property
+itself as string; for the properties specified by the OpenMP specification,
+the name matches the name of the named constant with the @code{omp_ipr_}
+prefix removed.


That should be @samp{omp_ipr_}, not @code markup.


+In GCC, this function returns the C/C++ data type for this property or


the name of the C/C++ data type


+@samp{N/A} if this property is not available for the given foreign runtime.


@code{"N/A"}, I think.  (It's a string literal, right?)


+If @var{interop} is @code{omp_interop_none} or for invalid property values,
+a null pointer is returned. The the effect of running this routine in a


s/The the/The/


+The @code{omp_get_interop_rc_desc} function returns a string value describing
+the @var{ret_code} in human readable form.


s/human readable form/human-readable form/

I know the libgomp manual uses different formatting conventions than the 
GCC manual or other Texinfo manuals.  Have you inspected the formatted 
output to make sure it's what you expect and consistent with the rest of 
the document?


-Sandra

[pushed] json.h: fix typo in comment

2024-08-26 Thread David Malcolm

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3196-gb835710328847a.

gcc/ChangeLog:
* json.h: Fix typo in comment about missing INCLUDE_MEMORY.

Signed-off-by: David Malcolm 
---
 gcc/json.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/json.h b/gcc/json.h
index 21f71fe1c4ab..0bafa5fbea3f 100644
--- a/gcc/json.h
+++ b/gcc/json.h
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.  If not see
json.h.  */
 
 #ifndef INCLUDE_MEMORY
-# error "You must define INCLUDE_MEMORY before including system.h to use make-unique.h"
+# error "You must define INCLUDE_MEMORY before including system.h to use json.h"
 #endif
 
 /* Implementation of JSON, a lightweight data-interchange format.
-- 
2.26.3

[pushed] pretty-print: fixes to selftests

2024-08-26 Thread David Malcolm

Add selftest coverage for %{ and %} in pretty-print.cc

No functional change intended.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3197-g276cc4324b9e8d.

gcc/ChangeLog:
* pretty-print.cc (selftest::test_urls): Make static.
(selftest::test_urls_from_braces): New.
(selftest::test_null_urls): Make static.
(selftest::test_urlification): Likewise.
(selftest::pretty_print_cc_tests): Call test_urls_from_braces.

Signed-off-by: David Malcolm 
---
 gcc/pretty-print.cc | 39 +++
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index 64713803dbe7..1d91da828212 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -3135,7 +3135,7 @@ test_prefixes_and_wrapping ()
 
 /* Verify that URL-printing works as expected.  */
 
-void
+static void
 test_urls ()
 {
   {
@@ -3169,9 +3169,40 @@ test_urls ()
   }
 }
 
+static void
+test_urls_from_braces ()
+{
+  {
+pretty_printer pp;
+pp.set_url_format (URL_FORMAT_NONE);
+pp_printf (&pp, "before %{text%} after",
+   "http://example.com");
+ASSERT_STREQ ("before text after",
+ pp_formatted_text (&pp));
+  }
+
+  {
+pretty_printer pp;
+pp.set_url_format (URL_FORMAT_ST);
+pp_printf (&pp, "before %{text%} after",
+   "http://example.com");
+ASSERT_STREQ ("before \33]8;;http://example.com\33\\text\33]8;;\33\\ after",
+ pp_formatted_text (&pp));
+  }
+
+  {
+pretty_printer pp;
+pp.set_url_format (URL_FORMAT_BEL);
+pp_printf (&pp, "before %{text%} after",
+   "http://example.com");
+ASSERT_STREQ ("before \33]8;;http://example.com\atext\33]8;;\a after",
+ pp_formatted_text (&pp));
+  }
+}
+
 /* Verify that we gracefully reject null URLs.  */
 
-void
+static void
 test_null_urls ()
 {
   {
@@ -3221,8 +3252,7 @@ pp_printf_with_urlifier (pretty_printer *pp,
   va_end (ap);
 }
 
-
-void
+static void
 test_urlification ()
 {
   class test_urlifier : public urlifier
@@ -3424,6 +3454,7 @@ pretty_print_cc_tests ()
   test_pp_format ();
   test_prefixes_and_wrapping ();
   test_urls ();
+  test_urls_from_braces ();
   test_null_urls ();
   test_urlification ();
   test_utf8 ();
-- 
2.26.3
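
For readers unfamiliar with the escape sequences these selftests expect, the following standalone sketch builds the same byte strings by hand. It is not part of the patch and does not use GCC's pretty-printer; `url_terminator` and `make_hyperlink` are hypothetical names, and the layout is inferred only from the expected strings in the tests above.

```cpp
#include <string>

// Illustrative only: these names are not part of GCC's pretty-printer API.
enum class url_terminator { st, bel };

std::string
make_hyperlink (const std::string &url, const std::string &text,
                url_terminator term)
{
  // ST is ESC followed by a backslash; BEL is '\a'.
  const std::string end = term == url_terminator::st ? "\33\\" : "\a";
  // "ESC ] 8 ; ; URL <end>" opens the link; the same prefix with an
  // empty URL closes it again after the visible text.
  return "\33]8;;" + url + end + text + "\33]8;;" + end;
}
```

With the ST terminator this yields the link portion of the string checked in the URL_FORMAT_ST case, and with BEL the portion checked in the URL_FORMAT_BEL case.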

[pushed] testsuite: generalize support for Python tests for SARIF output

2024-08-26 Thread David Malcolm

In r15-2354-g4d1f71d49e396c I added the ability to use Python to write
tests of SARIF output via a new "run-sarif-pytest" based
on "run-gcov-pytest", with a sarif.py support script in
testsuite/gcc.dg/sarif-output.

This followup patch:
(a) removes the limitation of such tests needing to be in
testsuite/gcc.dg/sarif-output by moving sarif.py to testsuite/lib
and adding logic to add that directory to PYTHONPATH when invoking
pytest.

(b) uses this to replace fragile regexp-based tests in
gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.c with
Python logic that verifies the structure within the generated JSON,
and to add test coverage for SARIF output relating to GCC plugins.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3198-gaa3b950291119a.

gcc/ChangeLog:
* diagnostic-format-sarif.cc: Add comments noting that we don't
yet capture any diagnostic_metadata::rules associated with a
diagnostic.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-metadata-sarif.c: New test,
based on diagnostic-test-metadata.c.
* gcc.dg/plugin/diagnostic-test-metadata-sarif.py: New script.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.c:
Replace scan-sarif-file directives with run-sarif-pytest, to
run...
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py:
...this new test.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
diagnostic-test-metadata-sarif.c.
* gcc.dg/sarif-output/sarif.py: Move to...
* lib/sarif.py: ...here.
* lib/scansarif.exp (run-sarif-pytest): Prepend "lib" to
PYTHONPATH before running python scripts.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-format-sarif.cc|   6 +-
 .../plugin/diagnostic-test-metadata-sarif.c   |  17 +++
 .../plugin/diagnostic-test-metadata-sarif.py  |  55 +
 ...iagnostic-test-paths-multithreaded-sarif.c |  17 +--
 ...agnostic-test-paths-multithreaded-sarif.py | 109 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   4 +-
 .../{gcc.dg/sarif-output => lib}/sarif.py |   0
 gcc/testsuite/lib/scansarif.exp   |  16 +++
 8 files changed, 208 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.c
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.py
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py
 rename gcc/testsuite/{gcc.dg/sarif-output => lib}/sarif.py (100%)

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 963a185f6ced..1d99c904ff0c 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -580,7 +580,9 @@ public:
  (SARIF v2.1.0 section 3.27.13).
- doesn't capture -Werror cleanly
- doesn't capture inlining information (can SARIF handle this?)
-   - doesn't capture macro expansion information (can SARIF handle this?).  */
+   - doesn't capture macro expansion information (can SARIF handle this?).
+   - doesn't capture any diagnostic_metadata::rules associated with
+ a diagnostic.  */
 
 class sarif_builder
 {
@@ -1522,6 +1524,8 @@ sarif_builder::make_result_object (diagnostic_context 
&context,
}
 
   diagnostic.metadata->maybe_add_sarif_properties (*result_obj);
+
+  /* We don't yet support diagnostic_metadata::rule.  */
 }
 
   /* "level" property (SARIF v2.1.0 section 3.27.10).  */
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.c 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.c
new file mode 100644
index ..246a8429090d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-format=sarif-file" } */
+
+extern char *gets (char *s);
+
+void test_cwe (void)
+{
+  char buf[1024];
+  gets (buf);
+}
+
+/* Verify that some JSON was written to a file with the expected name.  */
+/* { dg-final { verify-sarif-file } } */
+
+/* Use a Python script to verify various properties about the generated
+   .sarif file:
+   { dg-final { run-sarif-pytest diagnostic-test-metadata-sarif.c 
"diagnostic-test-metadata-sarif.py" } } */
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.py 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.py
new file mode 100644
index ..959e6f2e9942
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-metadata-sarif.py
@@ -0,0 +1,55 @@
+# We expect a warning with this textual form:
+#
+# . PATH/diagnostic-test-metadata-sarif.c: In function 'test_cwe':
+# . PATH/diagnostic-test-metadata-sarif.c:8:3: warning: never use 'gets' 
[CWE-242] [STR34-C]
+
+from sarif import *
+
+import pytest
+
+@pytest.fixture(scope='function', autouse=True)
+def sarif():
+return sarif_from_env()
+
+def tes

[pushed] diagnostics: move output formats from diagnostic.{c, h} to their own files

2024-08-26 Thread David Malcolm

In particular, move the classic text output code to a
diagnostic-text.cc (analogous to -json.cc and -sarif.cc).

No functional change intended.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3201-g92c5265d22afaa.

gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add diagnostic-format-text.o.
* diagnostic-format-json.cc: Include "diagnostic-format.h".
* diagnostic-format-sarif.cc: Likewise.
* diagnostic-format-text.cc: New file, using material from
diagnostics.cc.
* diagnostic-global-context.cc: Include
"diagnostic-format.h".
* diagnostic-format-text.h: New file, using material from
diagnostics.h.
* diagnostic-format.h: New file, using material from
diagnostics.h.
* diagnostic.cc: Include "diagnostic-format.h" and
"diagnostic-format-text.h".
(diagnostic_text_output_format::~diagnostic_text_output_format):
Move to diagnostic-format-text.cc.
(diagnostic_text_output_format::on_report_diagnostic): Likewise.
(diagnostic_text_output_format::on_diagram): Likewise.
(diagnostic_text_output_format::print_any_cwe): Likewise.
(diagnostic_text_output_format::print_any_rules): Likewise.
(diagnostic_text_output_format::print_option_information):
Likewise.
* diagnostic.h (class diagnostic_output_format): Move to
diagnostic-format.h.
(class diagnostic_text_output_format): Move to
diagnostic-format-text.h.
(diagnostic_output_format_init): Move to
diagnostic-format.h.
(diagnostic_output_format_init_json_stderr): Likewise.
(diagnostic_output_format_init_json_file): Likewise.
(diagnostic_output_format_init_sarif_stderr): Likewise.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
* gcc.cc: Include "diagnostic-format.h".
* opts.cc: Include "diagnostic-format.h".

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_group_plugin.c: Include
"diagnostic-format-text.h".

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in   |   1 +
 gcc/diagnostic-format-json.cc |   1 +
 gcc/diagnostic-format-sarif.cc|   1 +
 gcc/diagnostic-format-text.cc | 209 ++
 gcc/diagnostic-format-text.h  |  56 +
 gcc/diagnostic-format.h   |  83 +++
 gcc/diagnostic-global-context.cc  |   1 +
 gcc/diagnostic.cc | 176 +--
 gcc/diagnostic.h  |  85 +--
 gcc/gcc.cc|   1 +
 gcc/opts.cc   |   1 +
 .../gcc.dg/plugin/diagnostic_group_plugin.c   |   1 +
 12 files changed, 359 insertions(+), 257 deletions(-)
 create mode 100644 gcc/diagnostic-format-text.cc
 create mode 100644 gcc/diagnostic-format-text.h
 create mode 100644 gcc/diagnostic-format.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8fba8f7db6a2..68fda1a75918 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1821,6 +1821,7 @@ OBJS = \
 OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \
diagnostic-format-json.o \
diagnostic-format-sarif.o \
+   diagnostic-format-text.o \
diagnostic-global-context.o \
diagnostic-macro-unwinding.o \
diagnostic-path.o \
diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
index f2e9d0d79e51..c94f5f73bb5a 100644
--- a/gcc/diagnostic-format-json.cc
+++ b/gcc/diagnostic-format-json.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "selftest-diagnostic.h"
 #include "diagnostic-metadata.h"
 #include "diagnostic-path.h"
+#include "diagnostic-format.h"
 #include "json.h"
 #include "selftest.h"
 #include "logical-location.h"
diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 554bf3cb2d5c..59d9cd721839 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "diagnostic-metadata.h"
 #include "diagnostic-path.h"
+#include "diagnostic-format.h"
 #include "json.h"
 #include "cpplib.h"
 #include "logical-location.h"
diff --git a/gcc/diagnostic-format-text.cc b/gcc/diagnostic-format-text.cc
new file mode 100644
index ..b984803ff380
--- /dev/null
+++ b/gcc/diagnostic-format-text.cc
@@ -0,0 +1,209 @@
+/* Classic text-based output of diagnostics.
+   Copyright (C) 1999-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+v

[pushed] testsuite: add event IDs to multithreaded event plugin test

2024-08-26 Thread David Malcolm

Add test coverage of "%@" in event messages in a multithreaded
execution path.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3199-g6a1c359e28442c.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-inline-events.c:
Update expected output.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py:
Likewise.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-separate-events.c:
Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_paths.c
(test_diagnostic_path::add_event_2): Return the id of the added
event.
(test_diagnostic_path::add_event_2_with_event_id): New.
(example_4): Add event IDs to the deadlock messages indicating
where the locks were acquired.

Signed-off-by: David Malcolm 
---
 ...c-test-paths-multithreaded-inline-events.c |  4 +-
 ...agnostic-test-paths-multithreaded-sarif.py |  4 +-
 ...test-paths-multithreaded-separate-events.c |  4 +-
 .../plugin/diagnostic_plugin_test_paths.c | 46 +--
 4 files changed, 38 insertions(+), 20 deletions(-)

diff --git 
a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-inline-events.c
 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-inline-events.c
index 333ef7359440..b306bcc1a0f3 100644
--- 
a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-inline-events.c
+++ 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-inline-events.c
@@ -58,7 +58,7 @@ Thread: 'Thread 1'
|   NN |   acquire_lock_b ();
|  |   ^
|  |   |
-   |  |   (5) deadlocked due to waiting for lock b in thread 1...
+   |  |   (5) deadlocked due to waiting for lock b in thread 1 
(acquired by thread 2 at (4))...
|
 
 Thread: 'Thread 2'
@@ -67,6 +67,6 @@ Thread: 'Thread 2'
|   NN |   acquire_lock_a ();
|  |   ^
|  |   |
-   |  |   (6) ...whilst waiting for lock a in thread 2
+   |  |   (6) ...whilst waiting for lock a in thread 2 (acquired 
by thread 1 at (2))
|
  { dg-end-multiline-output "" } */
diff --git 
a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py
index cff78aa8ac8e..cb00faf1532a 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py
@@ -95,7 +95,7 @@ def test_result(sarif):
 == "lock a is now held by thread 1"
 assert tf0['locations'][2]['executionOrder'] == 5
 assert tf0['locations'][2]['location']['message']['text'] \
-== "deadlocked due to waiting for lock b in thread 1..."
+== "deadlocked due to waiting for lock b in thread 1 (acquired by 
thread 2 at (4))..."
 
 assert len(tf1['locations']) == 3
 assert tf1['locations'][0]['executionOrder'] == 3
@@ -106,4 +106,4 @@ def test_result(sarif):
 == "lock b is now held by thread 2"
 assert tf1['locations'][2]['executionOrder'] == 6
 assert tf1['locations'][2]['location']['message']['text'] \
-== "...whilst waiting for lock a in thread 2"
+== "...whilst waiting for lock a in thread 2 (acquired by thread 1 at 
(2))"
diff --git 
a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-separate-events.c
 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-separate-events.c
index 914918bb9e16..90464320b8e7 100644
--- 
a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-separate-events.c
+++ 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-paths-multithreaded-separate-events.c
@@ -7,12 +7,12 @@ extern void acquire_lock_b(void);
 void foo ()
 { /* { dg-message "\\(1\\) entering 'foo'" } */
   acquire_lock_a (); /* { dg-message "\\(2\\) lock a is now held by thread 1" 
} */
-  acquire_lock_b (); /* { dg-message "\\(5\\) deadlocked due to waiting for 
lock b in thread 1\.\.\." } */
+  acquire_lock_b (); /* { dg-message "\\(5\\) deadlocked due to waiting for 
lock b in thread 1 \\(acquired by thread 2 at \\(4\\)\\)\.\.\." } */
 }
 
 void bar ()
 { /* { dg-message "\\(3\\) entering 'bar'" } */
   acquire_lock_b (); /* { dg-message "\\(4\\) lock b is now held by thread 2" 
} */
   acquire_lock_a (); /* { dg-warning "deadlock due to inconsistent lock 
acquisition order" } */
-  /* { dg-message "\\(6\\) \.\.\.whilst waiting for lock a in thread 2" "" { 
target *-*-* } .-1 } */
+  /* { dg-message "\\(6\\) \.\.\.whilst waiting for lock a in thread 2 
\\(acquired by thread 1 at \\(2\\)\\)" "" { target *-*-* } .-1 } */
 }
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_paths.c 
b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_paths.c
index efa4ec475ab3..43e16a6fce11 100644
--- a/gcc/testsuit

[pushed] diagnostics: consolidate on_{begin, end}_diagnostic into on_report_diagnostic

2024-08-26 Thread David Malcolm

Previously diagnostic_context::report_diagnostic had, after the call to
pp_format (phases 1 and 2 of formatting the message):

  m_output_format->on_begin_diagnostic (*diagnostic);
  pp_output_formatted_text (this->printer, m_urlifier);
  if (m_show_cwe)
print_any_cwe (*diagnostic);
  if (m_show_rules)
print_any_rules (*diagnostic);
  if (m_show_option_requested)
  print_option_information (*diagnostic, orig_diag_kind);
  m_output_format->on_end_diagnostic (*diagnostic, orig_diag_kind);

This patch replaces all of the above with a single call to

  m_output_format->on_report_diagnostic (*diagnostic, orig_diag_kind);

moving responsibility for phase 3 of formatting and printing the result
from diagnostic_context to the output format.

This simplifies diagnostic_context::report_diagnostic and allows us to
move the code that prints CWEs, rules, and option information in textual
form from diagnostic_context to diagnostic_text_output_format, where it
belongs.
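
As a rough illustration of the shape of this change (not the actual GCC
code), here is a minimal sketch in which each output format owns the whole
"report" step. All names beginning with toy_ are hypothetical stand-ins,
as are the free functions report and demo and the option name -Wexample;
the real classes have many more responsibilities, and the toy JSON format
does no string escaping.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a diagnostic; not GCC's diagnostic_info.
struct toy_diagnostic {
  std::string message;
  int cwe;             // 0 means "no CWE"
  std::string option;  // empty means "no controlling option"
};

// Each output format is responsible for the entire report step.
class toy_output_format {
public:
  virtual ~toy_output_format () = default;
  virtual void on_report_diagnostic (const toy_diagnostic &d) = 0;
};

// Text format: prints the message, then CWE and option information.
class toy_text_format : public toy_output_format {
public:
  explicit toy_text_format (std::ostream &out) : m_out (out) {}
  void on_report_diagnostic (const toy_diagnostic &d) final override
  {
    m_out << "warning: " << d.message;
    print_any_cwe (d);
    print_option_information (d);
    m_out << '\n';
  }
private:
  void print_any_cwe (const toy_diagnostic &d)
  { if (d.cwe) m_out << " [CWE-" << d.cwe << ']'; }
  void print_option_information (const toy_diagnostic &d)
  { if (!d.option.empty ()) m_out << " [" << d.option << ']'; }
  std::ostream &m_out;
};

// Structured format: never prints CWE/option text, so nothing has to be
// "disabled" for it.  No escaping is done; this is only a toy.
class toy_json_format : public toy_output_format {
public:
  explicit toy_json_format (std::ostream &out) : m_out (out) {}
  void on_report_diagnostic (const toy_diagnostic &d) final override
  { m_out << "{\"message\": \"" << d.message << "\"}"; }
private:
  std::ostream &m_out;
};

// The "context" side shrinks to a single virtual call.
inline void report (toy_output_format &fmt, const toy_diagnostic &d)
{ fmt.on_report_diagnostic (d); }

// Small self-check exercising both formats.
inline void demo ()
{
  toy_diagnostic d {"never use 'gets'", 242, "-Wexample"};
  std::ostringstream text, json;
  toy_text_format tf (text);
  toy_json_format jf (json);
  report (tf, d);
  report (jf, d);
  assert (text.str () == "warning: never use 'gets' [CWE-242] [-Wexample]\n");
  assert (json.str () == "{\"message\": \"never use 'gets'\"}");
}
```

In this sketch the CWE and option printing lives only in the text format,
which mirrors why the JSON and SARIF formats no longer need to switch those
features off.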

No functional change intended.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3200-gac707d30ce449f.

gcc/ChangeLog:
* diagnostic-format-json.cc
(json_output_format::on_begin_diagnostic): Delete.
(json_output_format::on_end_diagnostic): Rename to...
(json_output_format::on_report_diagnostic): ...this and add call
to pp_output_formatted_text.
(diagnostic_output_format_init_json): Drop unnecessary calls
to disable textual printing of CWEs, rules, and options.
* diagnostic-format-sarif.cc (sarif_builder::end_diagnostic):
Rename to...
(sarif_builder::on_report_diagnostic): ...this and add call to
pp_output_formatted_text.
(sarif_output_format::on_begin_diagnostic): Delete.
(sarif_output_format::on_end_diagnostic): Rename to...
(sarif_output_format::on_report_diagnostic): ...this and update
call to m_builder accordingly.
(diagnostic_output_format_init_sarif): Drop unnecessary calls
to disable textual printing of CWEs, rules, and options.
* diagnostic.cc (diagnostic_context::print_any_cwe): Convert to...
(diagnostic_text_output_format::print_any_cwe): ...this.
(diagnostic_context::print_any_rules): Convert to...
(diagnostic_text_output_format::print_any_rules): ...this.
(diagnostic_context::print_option_information): Convert to...
(diagnostic_text_output_format::print_option_information):
...this.
(diagnostic_context::report_diagnostic): Replace calls to the
output format's on_begin_diagnostic, to pp_output_formatted_text,
printing CWE, rules, option info, and the call to the format's
on_end_diagnostic with a call to the format's
on_report_diagnostic.
(diagnostic_text_output_format::on_begin_diagnostic): Delete.
(diagnostic_text_output_format::on_end_diagnostic): Delete.
(diagnostic_text_output_format::on_report_diagnostic): New vfunc,
which effectively does the on_begin_diagnostic, the call to
pp_output_formatted_text, the calls for printing CWE, rules,
option info, and the call to the diagnostic_finalizer.
* diagnostic.h (diagnostic_output_format::on_begin_diagnostic):
Delete.
(diagnostic_output_format::on_end_diagnostic): Delete.
(diagnostic_output_format::on_report_diagnostic): New.
(diagnostic_text_output_format::on_begin_diagnostic): Delete.
(diagnostic_text_output_format::on_end_diagnostic): Delete.
(diagnostic_text_output_format::on_report_diagnostic): New.
(class diagnostic_context): Add friend class
diagnostic_text_output_format.
(diagnostic_context::get_urlifier): New accessor.
(diagnostic_context::print_any_cwe): Move decl...
(diagnostic_text_output_format::print_any_cwe): ...to here.
(diagnostic_context::print_any_rules): Move decl...
(diagnostic_text_output_format::print_any_rules): ...to here.
(diagnostic_context::print_option_information): Move decl...
(diagnostic_text_output_format::print_option_information): ...to
here.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-format-json.cc  |  24 +--
 gcc/diagnostic-format-sarif.cc |  34 ++---
 gcc/diagnostic.cc  | 261 +
 gcc/diagnostic.h   |  28 ++--
 4 files changed, 171 insertions(+), 176 deletions(-)

diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
index b78cb92cfd7a..f2e9d0d79e51 100644
--- a/gcc/diagnostic-format-json.cc
+++ b/gcc/diagnostic-format-json.cc
@@ -47,13 +47,8 @@ public:
 m_cur_children_array = nullptr;
   }
   void
-  on_begin_diagnostic (const diagnostic_info &) final override
-  {
-/* No-op.  */
-  }
-  void
-  on_end_diagnostic (const diagnostic_info &diagnostic,
-diagnostic_t orig_diag_kind) final override;
+  on

[PATCH] c++: Don't show constructor internal name in error message [PR105483]

2024-08-26 Thread Simon Martin

We mention 'X::__ct' instead of 'X::X' in the "names the constructor,
not the type" error for this invalid code:

=== cut here ===
struct X {};
void g () {
  X::X x;
}
=== cut here ===

The problem is that we use %<%T::%D%> to build the error message, while
%qE does exactly what we need since we have DECL_CONSTRUCTOR_P. This is
what this patch does, along with skipping until the end of the statement
to avoid emitting extra (useless) errors.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/105483

gcc/cp/ChangeLog:

* parser.cc (cp_parser_expression_statement): Use %qE instead of
incorrect %<%T::%D%>, and skip to end of statement.

gcc/testsuite/ChangeLog:

* g++.dg/tc1/dr147.C: Adjust test expectation.
* g++.dg/diagnostic/pr105483.C: New test.

---
 gcc/cp/parser.cc   | 7 ---
 gcc/testsuite/g++.dg/diagnostic/pr105483.C | 7 +++
 gcc/testsuite/g++.dg/tc1/dr147.C   | 2 +-
 3 files changed, 12 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr105483.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 28ebf2beb60..ef4e3838a86 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -13240,10 +13240,11 @@ cp_parser_expression_statement (cp_parser* parser, 
tree in_statement_expr)
   && DECL_CONSTRUCTOR_P (get_first_fn (statement)))
{
  /* A::A a; */
- tree fn = get_first_fn (statement);
  error_at (token->location,
-   "%<%T::%D%> names the constructor, not the type",
-   DECL_CONTEXT (fn), DECL_NAME (fn));
+   "%qE names the constructor, not the type",
+   get_first_fn (statement));
+ cp_parser_skip_to_end_of_block_or_statement (parser);
+ return error_mark_node;
}
 }
 
diff --git a/gcc/testsuite/g++.dg/diagnostic/pr105483.C 
b/gcc/testsuite/g++.dg/diagnostic/pr105483.C
new file mode 100644
index 000..b935bacea11
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/pr105483.C
@@ -0,0 +1,7 @@
+// PR c++/105483
+// { dg-do compile }
+
+struct X { };
+void g () {
+  X::X x; // { dg-error "'X::X' names the constructor" }
+}
diff --git a/gcc/testsuite/g++.dg/tc1/dr147.C b/gcc/testsuite/g++.dg/tc1/dr147.C
index 6b656491e81..ced18d1879c 100644
--- a/gcc/testsuite/g++.dg/tc1/dr147.C
+++ b/gcc/testsuite/g++.dg/tc1/dr147.C
@@ -21,7 +21,7 @@ void A::f()
 void f()
 {
   A::A a; // { dg-error "constructor" "constructor" }
-} // { dg-error "" "error cascade" { target *-*-* } .-1 } error cascade
+}
 }
 
 namespace N2 {
-- 
2.44.0

Re: [PATCH] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-08-26 Thread Jason Merrill


On 8/25/24 12:37 PM, Simon Martin wrote:

On 24 Aug 2024, at 23:59, Simon Martin wrote:

On 24 Aug 2024, at 15:13, Jason Merrill wrote:


On 8/23/24 12:44 PM, Simon Martin wrote:

We currently emit an incorrect -Woverloaded-virtual warning upon the
following test case
=== cut here ===
struct A {
virtual operator int() { return 42; }
virtual operator char() = 0;
};
struct B : public A {
operator char() { return 'A'; }
};
=== cut here ===

The problem is that warn_hidden relies on get_basefndecls to find the
methods in A possibly hidden by B's operator char(), and gets both the
conversion operator to int and to char. It eventually wrongly concludes
that the conversion to int is hidden.

This patch fixes this by filtering out conversion operators to different
types from the list returned by get_basefndecls.
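
To make concrete why the warning is wrong for this test case, here is a
small sketch built on the same two structs. A conversion function in a
derived class only hides a base-class conversion function that converts to
the same type, so A's conversion to int should remain usable through a B.
The helpers as_int and as_char are hypothetical names added for
illustration only.

```cpp
#include <cassert>

struct A {
  virtual operator int () { return 42; }
  virtual operator char () = 0;
};

struct B : public A {
  // Overrides only the conversion to char.
  operator char () override { return 'A'; }
};

// Implicit conversion to int: picks the inherited A::operator int,
// because B::operator char would need an extra char -> int promotion.
inline int as_int (B &b) { return b; }

// Implicit conversion to char: picks B::operator char.
inline char as_char (B &b) { return b; }
```

With a B object, as_int is expected to yield 42 and as_char to yield 'A',
which is consistent with operator int not being hidden.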


Hmm, same_signature_p already tries to handle comparing conversion
operators, why isn't that working?


It does indeed.

However, `ovl_range (fns)` does not only contain `B::operator char()` -
for which `any_override` becomes true - but also `conv_op_marker` - for
which `any_override` is false, causing `seen_non_override` to become
true. Because of that, we run the last loop, which emits a warning
for all `base_fndecls` (except `B::operator char()`, which has been
removed).

We could test `fndecl` and `base_fndecls[k]` against `conv_op_marker` in
the loop, but we’d still need to inspect the “converting to” type
in the last loop (for when `warn_overloaded_virtual` is 2). This would
make the code much more complex than the current patch.


Makes sense.


It would however probably be better if `get_basefndecls` only returned
the right conversion operator, not all of them. I’ll draft another
version of the patch that does that and submit it in this thread.


I have explored my suggestion further and it actually ends up more
complicated than the initial patch.


Yeah, you'd need to do lookup again for each member of fns.


Please find attached a new revision to fix the reported issue, as well
as new ones I discovered while testing with -Woverloaded-virtual=2.

It’s pretty close to the initial patch, but (1) adds a missing
“continue;”, (2) fixes a location problem when -Woverloaded-virtual=2,
and (3) adds more test cases. The commit log is also more comprehensive,
and should describe well the various problems and why the patch is correct.



+   if (IDENTIFIER_CONV_OP_P (name)
+   && !same_type_p (DECL_CONV_FN_TYPE (fndecl),
+DECL_CONV_FN_TYPE (base_fndecls[k])))
+ {
+   base_fndecls[k] = NULL_TREE;
+   continue;
+ }


So this removes base_fndecls[k] if it doesn't return the same type as 
fndecl.  But what if there's another conversion op in fns that does 
return the same type as base_fndecls[k]?


If I add an operator int() to both base and derived in 
Woverloaded-virt7.C, the warning disappears.



   else if (TREE_CODE (t) == OVERLOAD)
+t = OVL_FIRST (t) != conv_op_marker ? OVL_FIRST (t) : OVL_CHAIN (t);


Usually OVL_CHAIN will be another OVERLOAD, you want OVL_FIRST 
(OVL_CHAIN (t)) in that case.
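
A toy model of this point, under loud assumptions: toy_fn, toy_ovl,
toy_marker and first_real_fn are hypothetical and are not GCC's tree,
OVERLOAD or conv_op_marker. In this simplified model the chain is always
another node, so after skipping a leading marker one still has to take the
first element of that node.

```cpp
#include <cassert>

// Toy stand-in for a function declaration; NOT a GCC tree.
struct toy_fn { const char *name; };

// Toy stand-in for an overload node.
struct toy_ovl {
  const toy_fn *first;   // plays the role of OVL_FIRST
  const toy_ovl *chain;  // plays the role of OVL_CHAIN: the next node
};

// Sentinel playing the role of conv_op_marker.
static const toy_fn toy_marker = {"<marker>"};

// Return the first real function, skipping a leading marker.
inline const toy_fn *first_real_fn (const toy_ovl *t)
{
  if (t->first != &toy_marker)
    return t->first;
  // The chain is a node, not a function: take the first element of
  // that node.  With distinct C++ types, returning the node itself
  // would not even compile; in this model the types make the
  // distinction explicit.
  return t->chain ? t->chain->first : nullptr;
}
```

For a list made of the marker followed by one conversion operator, this
returns the conversion operator rather than the node that holds it.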


Jason

Re: [PATCH] c++: Add most missing C++20 and C++23 names to cxxapi-data.csv

2024-08-26 Thread Jason Merrill


On 8/24/24 9:55 AM, Jonathan Wakely wrote:

On Sat, 24 Aug 2024 at 14:14, Jason Merrill wrote:


On 8/23/24 8:41 AM, Jonathan Wakely wrote:

Tested x86_64-linux. OK for trunk?


OK.


I've just noticed that this changes the copyright dates from 2022-2024
to just 2024 (see the excerpts of the patch retained below). The
python script just prints the current year, so have previous edits to
that file manually restored the "2022-" part after auto-generating it?

Do we want this change to the script, so that the generated files
don't need to be fixed up?


OK.

Jason

Re: [PATCH] c++: Don't show constructor internal name in error message [PR105483]

2024-08-26 Thread Jason Merrill


On 8/26/24 12:49 PM, Simon Martin wrote:

We mention 'X::__ct' instead of 'X::X' in the "names the constructor,
not the type" error for this invalid code:

=== cut here ===
struct X {};
void g () {
   X::X x;
}
=== cut here ===

The problem is that we use %<%T::%D%> to build the error message, while
%qE does exactly what we need since we have DECL_CONSTRUCTOR_P. This is
what this patch does, along with skipping until the end of the statement
to avoid emitting extra (useless) errors.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/105483

gcc/cp/ChangeLog:

* parser.cc (cp_parser_expression_statement): Use %qE instead of
incorrect %<%T::%D%>, and skip to end of statement.

gcc/testsuite/ChangeLog:

* g++.dg/tc1/dr147.C: Adjust test expectation.
* g++.dg/diagnostic/pr105483.C: New test.

---
  gcc/cp/parser.cc   | 7 ---
  gcc/testsuite/g++.dg/diagnostic/pr105483.C | 7 +++
  gcc/testsuite/g++.dg/tc1/dr147.C   | 2 +-
  3 files changed, 12 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr105483.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 28ebf2beb60..ef4e3838a86 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -13240,10 +13240,11 @@ cp_parser_expression_statement (cp_parser* parser, 
tree in_statement_expr)
   && DECL_CONSTRUCTOR_P (get_first_fn (statement)))
{
  /* A::A a; */
- tree fn = get_first_fn (statement);
  error_at (token->location,
-   "%<%T::%D%> names the constructor, not the type",
-   DECL_CONTEXT (fn), DECL_NAME (fn));
+   "%qE names the constructor, not the type",
+   get_first_fn (statement));
+ cp_parser_skip_to_end_of_block_or_statement (parser);


Why block_or_statement rather than just _statement?

Maybe move the skip+return out of this block to share it with the 
preceding typename error?


Jason

Re: [PATCH] c++, coroutines: The frame pointer is used in the helpers [PR116482].

2024-08-26 Thread Jason Merrill


On 8/26/24 10:33 AM, Iain Sandoe wrote:

As the PR notes, we now have two bogus warnings that the single frame
pointer parameter is unused in each of the helper functions.

This started when we began to use start_preparsed_function/finish_function
to wrap the helper function code generation.  I am puzzled a little about
why the use is not evident without marking - or perhaps it is always needed
to mark use in synthetic code?

For the destroy function, in particular, the use of the parameter is simple
- an indirect ref and then it is passed to the call to the actor.

The fix here is somewhat trivial - to mark the param as used as soon as it
is.


You also wouldn't get the warning if the param were marked 
DECL_ARTIFICIAL, which seems desirable anyway?


Jason

[PATCH v2] c++, coroutines: The frame pointer is used in the helpers [PR116482].

2024-08-26 Thread Iain Sandoe

Hi Jason,

>>As the PR notes, we now have two bogus warnings that the single frame
>>pointer parameter is unused in each of the helper functions.
>>This started when we began to use start_preparsed_function/finish_function
>>to wrap the helper function code generation.  I am puzzled a little about
>>why the use is not evident without marking - or perhaps it is always needed
>>to mark use in synthetic code?
>>For the destroy function, in particular, the use of the parameter is simple
>>- an indirect ref and then it is passed to the call to the actor.
>>The fix here is somewhat trivial - to mark the param as used as soon as it
>>is.

>You also wouldn't get the warning if the param were marked DECL_ARTIFICIAL, 
>which seems desirable anyway?

Yes, done as attached, OK for trunk assuming that reg-testing passes?
thanks
Iain

--- 8< ---

We have a bogus warning about the coroutine state frame pointers
being apparently unused in the resume and destroy functions.  Fixed
by making the parameters DECL_ARTIFICIAL.

PR c++/116482

gcc/cp/ChangeLog:

* coroutines.cc
(coro_build_actor_or_destroy_function): Make the parameter
decls DECL_ARTIFICIAL.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116482.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc   |  1 +
 gcc/testsuite/g++.dg/coroutines/pr116482.C | 30 ++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116482.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index c3e08221cc9..8f899513691 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4058,6 +4058,7 @@ coro_build_actor_or_destroy_function (tree orig, tree 
fn_type,
 
   tree id = get_identifier ("frame_ptr");
   tree fp = build_lang_decl (PARM_DECL, id, coro_frame_ptr);
+  DECL_ARTIFICIAL (fp) = true;
   DECL_CONTEXT (fp) = fn;
   DECL_ARG_TYPE (fp) = type_passed_as (coro_frame_ptr);
   DECL_ARGUMENTS (fn) = fp;
diff --git a/gcc/testsuite/g++.dg/coroutines/pr116482.C 
b/gcc/testsuite/g++.dg/coroutines/pr116482.C
new file mode 100644
index 000..702d1e235bb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr116482.C
@@ -0,0 +1,30 @@
+// Override default options.
+// { dg-options "-std=c++20 -fno-exceptions -Wall -Wextra" }
+
+#include <coroutine>
+
+struct SuspendNever {
+bool await_ready();
+void await_suspend(std::coroutine_handle<>);
+void await_resume();
+};
+
+struct Coroutine;
+
+struct PromiseType {
+Coroutine get_return_object();
+SuspendNever initial_suspend();
+SuspendNever final_suspend();
+#if __cpp_exceptions
+void unhandled_exception() { /*std::terminate();*/ };
+#endif
+void return_void();
+};
+
+struct Coroutine {
+using promise_type = PromiseType;
+};
+
+Coroutine __async_test_input_basic() {
+co_return;
+}
-- 
2.39.2 (Apple Git-143)

Re: [PATCH v2] c++, coroutines: The frame pointer is used in the helpers [PR116482].

2024-08-26 Thread Jason Merrill


On 8/26/24 2:34 PM, Iain Sandoe wrote:

Hi Jason,


As the PR notes, we now have two bogus warnings that the single frame
pointer parameter is unused in each of the helper functions.
This started when we began to use start_preparsed_function/finish_function
to wrap the helper function code generation.  I am puzzled a little about
why the use is not evident without marking - or perhaps it is always needed
to mark use in synthetic code?
For the destroy function, in particular, the use of the parameter is simple
- an indirect ref and then it is passed to the call to the actor.
The fix here is somewhat trivial - to mark the param as used as soon as it
is.



You also wouldn't get the warning if the param were marked DECL_ARTIFICIAL, 
which seems desirable anyway?


Yes, done as attached, OK for trunk assuming that reg-testing passes?


OK.


--- 8< ---

We have a bogus warning about the coroutine state frame pointers
being apparently unused in the resume and destroy functions.  Fixed
by making the parameters DECL_ARTIFICIAL.

PR c++/116482

gcc/cp/ChangeLog:

* coroutines.cc
(coro_build_actor_or_destroy_function): Make the parameter
decls DECL_ARTIFICIAL.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr116482.C: New test.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc   |  1 +
  gcc/testsuite/g++.dg/coroutines/pr116482.C | 30 ++
  2 files changed, 31 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr116482.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index c3e08221cc9..8f899513691 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4058,6 +4058,7 @@ coro_build_actor_or_destroy_function (tree orig, tree 
fn_type,
  
tree id = get_identifier ("frame_ptr");

tree fp = build_lang_decl (PARM_DECL, id, coro_frame_ptr);
+  DECL_ARTIFICIAL (fp) = true;
DECL_CONTEXT (fp) = fn;
DECL_ARG_TYPE (fp) = type_passed_as (coro_frame_ptr);
DECL_ARGUMENTS (fn) = fp;
diff --git a/gcc/testsuite/g++.dg/coroutines/pr116482.C 
b/gcc/testsuite/g++.dg/coroutines/pr116482.C
new file mode 100644
index 000..702d1e235bb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr116482.C
@@ -0,0 +1,30 @@
+// Override default options.
+// { dg-options "-std=c++20 -fno-exceptions -Wall -Wextra" }
+
+#include <coroutine>
+
+struct SuspendNever {
+bool await_ready();
+void await_suspend(std::coroutine_handle<>);
+void await_resume();
+};
+
+struct Coroutine;
+
+struct PromiseType {
+Coroutine get_return_object();
+SuspendNever initial_suspend();
+SuspendNever final_suspend();
+#if __cpp_exceptions
+void unhandled_exception() { /*std::terminate();*/ };
+#endif
+void return_void();
+};
+
+struct Coroutine {
+using promise_type = PromiseType;
+};
+
+Coroutine __async_test_input_basic() {
+co_return;
+}

Re: [patch] libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn

2024-08-26 Thread Sandra Loosemore


On 8/24/24 09:13, Tobias Burnus wrote:


+@node Foreign-runtime support for AMD GPUs
+@subsection OpenMP @code{interop} -- Foreign-Runtime Support for AMD GPUs


Em-dash in Texinfo is three dashes with no surrounding spaces.  I 
believe that the libgomp manual uses the incorrect markup you have here 
in its usual template for documenting specific runtime routines (and I'm 
not telling you to change that everywhere), but that's not how you're 
using it here.



+An interoperability object of OpenMP @code{interop} type can be obtained using
+the @code{interop} directive; supported as foreign runtimes are HIP
+(C++ Heterogeneous-Compute Interface for Portability) and HSA (Heterogeneous
+System Architecture).  If no @code{prefer_type} argument has been specified,
+HIP is used.
+
+The following properties can then be extracted using the @ref{Interoperability
+Routines}.  Each listed property name has an associated named constant,


It would be much better to rewrite this in the active voice, phrasing as 
"you can do X" instead of "X can be done", and "GCC supports the foreign 
runtimes ..." instead of the awkward "supported as foreign runtimes are 
...".


Also, in "if no @code{prefer_type} argument has been specified": an 
argument to what?  Looking it up in the spec, it appears that it is 
actually a modifier to the @code{init} clause of the @code{interop} 
directive.  I think this needs to be said explicitly.



+consisting of @code{omp_ipr_} followed by the property name.  The following


s/@code/@samp/ (as already noted in my comments on your previous patch)


+table uses ``@emph{int}'', ``@emph{str}'' and ``@emph{ptr}'' to denote the
+routine to be used to obtain the property value.


Ugh, we should never be using literal quotes for markup purposes.  :-( 
I'd say just use @samp instead of @emph here.



+Available properties for an HIP interop object:
+@multitable @columnfractions .30 .30 .30
+@headitem Property  @tab data type @tab value (if constant)
+@item @code{fr_id}  @tab @samp{omp_interop_fr_t} @emph{(int)} @tab @samp{omp_fr_hip}
+@item @code{fr_name} @tab @samp{const char *} @emph{(str)} @tab @samp{hip}
+@item @code{vendor} @tab @samp{int}  @emph{(int)} @tab @samp{1}
+@item @code{vendor_name} @tab @samp{const char *} @emph{(str)} @tab @samp{amd}
+@item @code{device_num} @tab @samp{int}  @emph{(int)} @tab
+@item @code{platform}   @tab N/A  @tab
+@item @code{device} @tab @samp{hipDevice_t}  @emph{(int)} @tab
+@item @code{device_context} @tab @samp{hipCtx_t} @emph{(ptr)} @tab
+@item @code{targetsync} @tab @samp{hipStream_t}  @emph{(ptr)} @tab
+@end multitable


Ugh, again.  :-(  You have 4 columns in the table that you are trying to 
format into 3, and I think you really want to have 5 columns (property 
name, constant name, data type, accessor function, and value).  And I 
think you are going to overflow the right margin for PDF output of the 
manual if you try making the table that wide.  Can you instead format it 
as a vertical table for each property, like


@table @code

@item fr_id
@table @asis
@item Named constant:
@code{omp_ipr_fr_id}
@item Data type:
@code{omp_interop_fr_t}
@item Accessor:
@code{omp_get_interop_int}
@item Property value:
@code{omp_fr_hip}
@end table

...
@end table

I don't think the @samp and @emph markup you have is helpful, just use 
@code for all the fields.


Similarly for all the other tables in your patch.

Also, here:


+@node Foreign-runtime support for Nvidia GPUs
+@subsection OpenMP @code{interop} -- Foreign-Runtime Support for Nvidia GPUs
+
+An interoperability object of OpenMP @code{interop} type can be obtained using
+the @code{interop} directive; supported as foreign runtimes are the CUDA
+runtime API, the CUDA driver API, and the C++ Heterogeneous-Compute Interface
+for Portability (HIP), which is -- for CUDA-based systems -- a very thin layer
+on top of the CUDA APIs.  If no @code{prefer_type} argument has been specified,
+the CUDA runtime API is used.


You have more problems with incorrect em-dash markup (the ones in the 
body of the paragraph are certainly incorrect Texinfo usage even if you 
want to argue about the one in the @subsection name), use of the passive 
voice, and @code{prefer_type} confusion, as previously noted.


-Sandra

Re: [patch,avr] Overhaul avr-ifelse RTL optimization pass

2024-08-26 Thread Denis Chertykov

Sun, 25 Aug 2024 at 20:15, Denis Chertykov :
>
>
>
> Sun, 25 Aug 2024 at 17:55, Jeff Law :
>>
>>
>>
>> On 8/23/24 6:20 AM, Richard Biener wrote:
>> > On Fri, Aug 23, 2024 at 2:16 PM Georg-Johann Lay  wrote:
>> >>
>> >> This patch overhauls the avr-ifelse mini-pass that optimizes
>> >> two cbranch insns to one comparison and two branches.
>> >>
>> >> More optimization opportunities are realized, and the code
>> >> has been refactored.
>> >>
>> >> No new regressions.  Ok for trunk?
>> >>
>> >> There is currently no avr maintainer, so some global reviewer
>> >> might please have a look at this.
>> >
>> > I see Denis still listed?  Possibly Jeff can have a look though.
>> I think Denis is inactive at this point.  I don't really have any
>> significant interest in avr, nor do I actually know the architecture.
>> So I'm mostly just looking for high level issues rather than diving into
>> really thinking about the codegen impact.
>>
>> IIRC I've asked Georg-Johann if he'd like to take maintainership of the
>> avr port, but he declined.  So we're a bit stuck.
>
>
> Yes, I was inactive but I'm here.
> I'm interested in converting the port to LRA.
> Starting to review the patch...

I think that we can go forward.
Johann, please apply the patch.

>
> Denis
>>
>>
>>
>> Jeff

Re: [patch,avr] Overhaul avr-ifelse RTL optimization pass

2024-08-26 Thread Georg-Johann Lay


On 25.08.24 at 18:15, Denis Chertykov wrote:

Starting to review the patch...

Denis


Great to see you back!

Prior to commenting on the attached new versions of
the overhaul, let me answer Jeff's questions from the
other mail:


On 8/23/24 6:16 AM, Georg-Johann Lay wrote:

This patch overhauls the avr-ifelse mini-pass that optimizes
two cbranch insns to one comparison and two branches.

More optimization opportunities are realized, and the code
has been refactored.

No new regressions.  Ok for trunk?

[...]

AVR: Overhaul the avr-ifelse RTL optimization pass.

Mini-pass avr-ifelse realizes optimizations that replace two cbranch
insns with one comparison and two branches.  This patch adds the
following improvements:

- The right operand of the comparisons may also be REGs.
   Formerly only CONST_INT was handled.

- The code of the first comparison is no longer restricted
   to (effectively) EQ.

- When the second cbranch is located in the fallthrough path
   of the first cbranch, then difficult (expensive) comparisons
   can always be avoided.  This may require swapping the branch
   targets.  (When the second cbranch is located after the target
   label of the first one, then getting rid of difficult branches
   would require reordering blocks.)

- The code has been cleaned up:  avr_rest_of_handle_ifelse() now
   just scans the insn stream for optimization candidates.  The code
   that actually performs the transformation has been outsourced to
   the new function avr_optimize_2ifelse().

- The code to find a better representation for reg-const_int comparisons
   has been split into two parts:  First try to find codes such that the
   right-hand sides of the comparisons are the same (avr_2comparisons_rhs).
   When this succeeds then one comparison can serve two branches, and
   avr_redundant_compare() tries to get rid of difficult branches that
   may have been introduced by avr_2comparisons_rhs().  This is always
   possible when the second cbranch is located in the fallthrough path
   of the first one, or when the first code is EQ.

Some final notes on why we don't use compare-elim:  1) The two cbranch
insns may come with different scratch operands depending on the chosen
constraint alternatives.  There are cases where the outgoing comparison
requires a scratch but only one incoming cbranch has one.  2) Avoiding
difficult branches can be achieved by rewiring basic blocks.
compare-elim doesn't do that; it doesn't even know the costs of the
branch codes.  3)  avr_2comparisons_rhs() may de-canonicalize a
comparison to achieve its goal.  compare-elim doesn't know how to do
that.  4) There are more reasons, see for example the commit
message and discussion for PR115830.
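
To illustrate the first exceptional case visible in avr_2comparisons_rhs
below (cond1 == EQ, cond2 == EQ, val1 == 1, val2 == 0, where cond2 becomes
LTU), here is a small source-level sketch.  It is not taken from the patch
or its tests; the function names are made up, and the real transformation
happens on RTL cbranch insns, not on C source.

```c
#include <stdint.h>

/* Original shape: two equality tests against different constants.  */
static int
classify_two_constants (uint8_t x)
{
  if (x == 1)
    return 1;
  if (x == 0)
    return 2;
  return 0;
}

/* Rewritten shape: both tests use the constant 1, so a single comparison
   could serve both branches.  The second EQ has become an unsigned
   less-than (LTU); for an unsigned value, x < 1 holds exactly when
   x == 0.  */
static int
classify_shared_rhs (uint8_t x)
{
  if (x == 1)
    return 1;
  if (x < 1)
    return 2;
  return 0;
}
```

Both functions agree for every 8-bit input, which is what makes sharing
the right-hand side legitimate in this case.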

gcc/
 * config/avr/avr.cc (cfganal.h): Include it.
 (avr_2comparisons_rhs, avr_redundant_compare_regs)
 (avr_strict_signed_p, avr_strict_unsigned_p): New static functions.
 (avr_redundant_compare): Overhaul: Allow more cases.
 (avr_optimize_2ifelse): New static function, outsourced from...
 (avr_rest_of_handle_ifelse): ...this method.
gcc/testsuite/
 * gcc.target/avr/torture/ifelse-c.h: New file.
 * gcc.target/avr/torture/ifelse-d.h: New file.
 * gcc.target/avr/torture/ifelse-q.h: New file.
 * gcc.target/avr/torture/ifelse-r.h: New file.
 * gcc.target/avr/torture/ifelse-c-i8.c: New test.
 * gcc.target/avr/torture/ifelse-d-i8.c: New test.
 * gcc.target/avr/torture/ifelse-q-i8.c: New test.
 * gcc.target/avr/torture/ifelse-r-i8.c: New test.
 * gcc.target/avr/torture/ifelse-c-i16.c: New test.
 * gcc.target/avr/torture/ifelse-d-i16.c: New test.
 * gcc.target/avr/torture/ifelse-q-i16.c: New test.
 * gcc.target/avr/torture/ifelse-r-i16.c: New test.
 * gcc.target/avr/torture/ifelse-c-u16.c: New test.
 * gcc.target/avr/torture/ifelse-d-u16.c: New test.
 * gcc.target/avr/torture/ifelse-q-u16.c: New test.
 * gcc.target/avr/torture/ifelse-r-u16.c: New test.

ifelse-tweak.diff

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index c520b98a178..90606b73114 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
[...]
+static rtx
+avr_2comparisons_rhs (rtx_code &cond1, rtx xval1,
+  rtx_code &cond2, rtx xval2, machine_mode mode)
+{
+  HOST_WIDE_INT val1 = INTVAL (xval1);
+  HOST_WIDE_INT val2 = INTVAL (xval2);
+
+  if (val1 == val2)
+return xval1;
+
+  if (! IN_RANGE (val1 - val2, -2, 2))
+return NULL_RTX;
+
+  // First, ten exceptional cases that occur near the unsigned boundaries.
+  // All outgoing codes will have at least one EQ or NE.
+  // Similar cases will occur near the signed boundaries, but they are
+  // less common (and even more tedious).
+
+  if (cond1 == EQ && cond2 == EQ)
+{
+  if (val1 == 1 && val2 == 0)
+{
+  cond2 = LTU;
+  return xval1;
+}
+  else if (val1 == 0 && val2 == 1)
+{
+  cond1 = LTU;
+  return xval2;
+}
+  else if (val1 == -2 && val2 == -1)
+{
+

Re: [PATCH] c++: local class memfn synth from noexcept context [PR113063]

2024-08-26 Thread Jason Merrill


On 8/20/24 11:52 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk only?


OK.


-- >8 --

Extending the PR113063 testcase to additionally constant evaluate the <=>
expression causes us to trip over the assert in cxx_eval_call_expression

   /* We used to shortcut trivial constructor/op= here, but nowadays
  we can only get a trivial function here with -fno-elide-constructors.  */
   gcc_checking_assert (!trivial_fn_p (fun)
|| !flag_elide_constructors
/* We don't elide constructors when processing
   a noexcept-expression.  */
|| cp_noexcept_operand);

since the local class's <=> was first used and therefore synthesized in
a noexcept context and so its definition contains unelided trivial
constructors.

This patch fixes this by clearing cp_noexcept_operand alongside
cp_unevaluated_operand in the local class case.
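
The save/clear/restore idiom that local_state_t uses can be sketched in
plain C as follows.  This is only an illustration: the two globals and the
struct are hypothetical stand-ins, not the front end's real variables, and
the actual code in name-lookup.cc is C++ and also saves
c_inhibit_evaluation_warnings.

```c
/* Hypothetical stand-ins for the parser's global context flags.  */
static int unevaluated_operand;
static int noexcept_operand;

/* Snapshot of the flags taken before entering the local class.  */
struct local_state
{
  int unevaluated_operand;
  int noexcept_operand;
};

/* Save the current flags, then clear them so that code processed
   afterwards is treated as being in an evaluated, non-noexcept context.  */
static struct local_state
save_and_clear (void)
{
  struct local_state s;
  s.unevaluated_operand = unevaluated_operand;
  unevaluated_operand = 0;
  s.noexcept_operand = noexcept_operand;
  noexcept_operand = 0;
  return s;
}

/* Put the flags back once the local class has been handled.  */
static void
restore (const struct local_state *s)
{
  unevaluated_operand = s->unevaluated_operand;
  noexcept_operand = s->noexcept_operand;
}
```

Forgetting to include one flag in the snapshot means the enclosing context
leaks into whatever is processed in between, which is the kind of leak the
patch describes for the noexcept flag.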

PR c++/113063

gcc/cp/ChangeLog:

* name-lookup.cc (local_state_t): Clear and restore
cp_noexcept_operand as well.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/spaceship-synth16.C: Also constant evaluate the
<=> expression.
* g++.dg/cpp2a/spaceship-synth16a.C: Likewise.
---
  gcc/cp/name-lookup.cc   | 4 
  gcc/testsuite/g++.dg/cpp2a/spaceship-synth16.C  | 1 +
  gcc/testsuite/g++.dg/cpp2a/spaceship-synth16a.C | 1 +
  3 files changed, 6 insertions(+)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 70ad4cbf3b5..6fb664b0082 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -8775,6 +8775,7 @@ struct local_state_t
  {
int cp_unevaluated_operand;
int c_inhibit_evaluation_warnings;
+  int saved_cp_noexcept_operand;
  
static local_state_t

save_and_clear ()
@@ -8784,6 +8785,8 @@ struct local_state_t
  ::cp_unevaluated_operand = 0;
  s.c_inhibit_evaluation_warnings = ::c_inhibit_evaluation_warnings;
  ::c_inhibit_evaluation_warnings = 0;
+s.saved_cp_noexcept_operand = cp_noexcept_operand;
+cp_noexcept_operand = 0;
  return s;
}
  
@@ -8792,6 +8795,7 @@ struct local_state_t

{
  ::cp_unevaluated_operand = this->cp_unevaluated_operand;
  ::c_inhibit_evaluation_warnings = this->c_inhibit_evaluation_warnings;
+cp_noexcept_operand = this->saved_cp_noexcept_operand;
}
  };
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/spaceship-synth16.C b/gcc/testsuite/g++.dg/cpp2a/spaceship-synth16.C

index 37a183de0f5..7dbe7e1db75 100644
--- a/gcc/testsuite/g++.dg/cpp2a/spaceship-synth16.C
+++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-synth16.C
@@ -10,4 +10,5 @@ int main() {
X x;
static_assert(noexcept(x <=> x));
x <=> x;
+  constexpr auto r = x <=> x;
  }
diff --git a/gcc/testsuite/g++.dg/cpp2a/spaceship-synth16a.C b/gcc/testsuite/g++.dg/cpp2a/spaceship-synth16a.C
index 68388a680b2..bc0e7a54b7e 100644
--- a/gcc/testsuite/g++.dg/cpp2a/spaceship-synth16a.C
+++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-synth16a.C
@@ -13,4 +13,5 @@ int main() {
X x;
static_assert(noexcept(x <=> x));
x <=> x;
+  constexpr auto r = X{} <=> X{};
  }

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-26 Thread Qing Zhao

Hi, Martin,

It looks like there is an issue when I try to use _Generic for the 
test cases; I narrowed it down to a small test case that shows the 
problem without any change to GCC.

[opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
struct annotated {
  char b;
  int c[];
} *array_annotated;  
extern void * counted_by_ref (int *);

int main(int argc, char *argv[])
{
  typeof(counted_by_ref (array_annotated->c)) ret
= counted_by_ref (array_annotated->c); 
   _Generic (ret, void* : (void)0, default: *ret = 10);

  return 0;
}
[opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
t1.c: In function ‘main’:
t1.c:12:44: warning: dereferencing ‘void *’ pointer
   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
  |^~~~
t1.c:12:49: error: invalid use of void expression
   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
  | ^

I debugged this into GCC’s C parser routine “c_parser_generic_selection”
and found that the “default” branch of the _Generic is always parsed, even 
when an earlier association already matches. So *ret = 10 is parsed even 
when ret is a void *, hence the compilation error.

Is this the correct behavior for _Generic, or is there an obvious error in 
the small test case above? If this is the correct behavior, then it looks 
like we cannot use _Generic for this purpose.

Any comments on this?

Thanks a lot for your help.

Qing

> On Aug 21, 2024, at 11:43, Martin Uecker  wrote:
> 
> On Wednesday, 21.08.2024 at 15:24 +, Qing Zhao wrote:
>>> 
>>> But if we changed it to return a void pointer,  we could make this
>>> a compile-time check:
>>> 
>>> auto ret = __builtin_get_counted_by(__p->FAM);
>>> 
>>> _Generic(ret, void*: (void)0, default: *ret = COUNT);
>> 
>> Is there any benefit to return a void pointer than a SIZE_T pointer for
>> the NULL pointer?
> 
> Yes! You can test with _Generic (or __builtin_types_compatible_p)
> at compile-time based on the type whether you can set *ret to COUNT
> or not as in the example above.
> 
> So it is not a weird run-time test which needs to be optimized
> away.
>

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-26 Thread Bill Wendling

On Wed, Aug 21, 2024 at 8:43 AM Martin Uecker  wrote:
>
> On Wednesday, 21.08.2024 at 15:24 +, Qing Zhao wrote:
> > >
> > > But if we changed it to return a void pointer,  we could make this
> > > a compile-time check:
> > >
> > > auto ret = __builtin_get_counted_by(__p->FAM);
> > >
> > > _Generic(ret, void*: (void)0, default: *ret = COUNT);
> >
> > Is there any benefit to return a void pointer than a SIZE_T pointer for
> > the NULL pointer?
>
> Yes! You can test with _Generic (or __builtin_types_compatible_p)
> at compile-time based on the type whether you can set *ret to COUNT
> or not as in the example above.
>
> So it is not a weird run-time test which needs to be optimized
> away.
>
Using a '_Generic' moves so much of the work onto the programmer that
it would be far easier, and cleaner, for them simply to specify the
'counter' field in the macro and be done with it. Something like:

  #define alloc(PTR, COUNT, FAM, COUNTER)

If the FAM doesn't have a 'counted_by' field:

  #define alloc(PTR, COUNT, FAM)

(It would use VAR_ARGS of course). Why not simply have the compiler
automatically adjust the return type? It's perfectly capable of Doing
the Right Thing(tm). Otherwise, this builtin becomes even less
desirable to use than it currently is.
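
As a rough sketch of the four-argument form mentioned above, where the
caller names the counter field explicitly: the macro name ALLOC_FAM, the
struct, and its fields are all hypothetical and not taken from any posted
patch.

```c
#include <stddef.h>
#include <stdlib.h>

struct packet
{
  size_t len;  /* hypothetical counter field */
  int data[];  /* flexible array member (FAM) */
};

/* Allocate *PTR with room for COUNT elements of the FAM, and record
   COUNT in the COUNTER field the caller named.  COUNT is evaluated
   once; PTR is evaluated more than once, so it must have no side
   effects.  */
#define ALLOC_FAM(PTR, COUNT, FAM, COUNTER) \
  do \
    { \
      size_t count_ = (COUNT); \
      (PTR) = malloc (sizeof (*(PTR)) + count_ * sizeof ((PTR)->FAM[0])); \
      if ((PTR) != NULL) \
        (PTR)->COUNTER = count_; \
    } \
  while (0)
```

A call would look like ALLOC_FAM (p, 4, data, len).  The cost is that
every caller has to know and repeat the counter name, which is the
information the builtin is meant to supply.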

> > > > Yes, I do feel that the approach __builtin_get_counted_by is not very 
> > > > good.
> > > > Maybe it’s better to provide
> > > > A. __builtin_set_counted_by
> > > > or
> > > > B. The unary operator __counted_by(PTR) to return a Lvalue, in this 
> > > > case,
> > > > we need a __builtin_has_attribute first to check whether PTR has the
> > > > counted_by attribute first.
> > >
> > > You could potentially do the same __counted_by and test for type void.
> > >
> > > _Generic(typeof(__counted_by(PTR)), void: (void)0, __counted_by(PTR) = 
> > > COUNT);
> >
> > Oh, so, is there any benefit for the unary operator __counted_by(PTR) than
> > the current __builtin_get_counted_by?
>
> I don't know. You suggested it ;-)
>
> It probably makes it harder to test the type because you need the
> typeof / C2Y Generic combination, but maybe there are other ways
> to test.
>
>
> Martin

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-26 Thread Bill Wendling

On Thu, Aug 22, 2024 at 8:03 AM Qing Zhao  wrote:
> > On Aug 21, 2024, at 18:08, Bill Wendling  wrote:
> >> For the unary operator __counted_by(PTR),   “PTR” must have a counted_by 
> >> attribute, if not, there will be a compilation time error.
> >>
> >> Then the user could write the following code:
> >>
> >>   If __builtin_has_attributes (PTR,counted_by)
> >>   __counted_by(PTR) = COUNT;
> >>
> >>
> >> From the design point of view, I think this might be the cleanest solution.
> >>
> >> However, currently, CLANG doesn’t have __builtin_has_attributes.  In order 
> >> to provide a consistent interface, __builtin_get_counted_by(PTR) is fine.
> >>
> > This was the confusion I had during our meeting today. For the above
> > to be a compilation time error, we would have to diagnose it after
> > DCE, which is okay, but seems like we're opening ourselves up to
> > future issues when DCE misses. Maybe not the biggest concern, but...
>
> Does the DCE above mean "dead code elimination”?

Yes.

> If so, I am a little confused: CLANG has dead code elimination pass in the FE?

Not that I'm aware of.

> Could you explain a little bit here in details to clarify the issues? A small 
> example will be helpful.

Clang's front-end goes through a few phases, of course: parsing, sema,
LLVM IR code generation. I'm implementing our version partly in Sema
and the rest in LLVM IR generation. During Sema, I check the
'counter's type and adjust the builtin to return a pointer to that
type. Future checks determine that the types are compatible. Then IR
generation converts the builtins into accesses to the counter. My
worry about DCE isn't super high on my list of things to worry about,
as it should eliminate a 'if (0) ...' pretty easily, but I don't like
relying on future passes to clean up bad code. It's probably me just
being too paranoid though, so we don't need to discuss it further.

> (In the current GCC’s implementation, I implement this feature completely in 
> C parser).
>

-bw

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-26 Thread Martin Uecker

On Monday, 26.08.2024 at 19:30 +, Qing Zhao wrote:
> Hi, Martin,
> 
> Looks like that there is some issue when I tried to use the _Generic for the 
> testing cases, and then I narrowed down to a
> small testing case that shows the problem without any change to GCC.
> 
> [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
> struct annotated {
>   char b;
>   int c[];
> } *array_annotated;  
> extern void * counted_by_ref (int *);
> 
> int main(int argc, char *argv[])
> {
>   typeof(counted_by_ref (array_annotated->c)) ret
> = counted_by_ref (array_annotated->c); 
>_Generic (ret, void* : (void)0, default: *ret = 10);
> 
>   return 0;
> }
> [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
> t1.c: In function ‘main’:
> t1.c:12:44: warning: dereferencing ‘void *’ pointer
>12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>   |^~~~
> t1.c:12:49: error: invalid use of void expression
>12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>   | ^
> 
> Actually, I debugged this issue into gcc’s C routine 
> “c_parser_generic_selection”.
> And found that, the “default” branch of the _Generic is always parsed even 
> though there is already
> a match in the previous conditions. Therefore, *ret = 10 is parsed even when 
> ret is a void *, therefore the compilation error.
> 
> So, I am not sure whether this is the correct behavior of the operator 
> _Generic? 
> Or is there any obvious error in the above small testing case?
> If So, then looks like that we cannot use the _Generic operator for this 
> purpose.
> 
> Any comments on this?
> 

Ah, right.  This is indeed the correct behavior for _Generic, 
and I have overlooked this.  One could work around it like this:

 __auto_type ret = counted_by_ref (array_annotated->c); 
 *_Generic (ret, void*: &(int){ }, default: ret) = 10;

or, if one expects only specific types:

 __auto_type ret = counted_by_ref (array_annotated->c); 
 _Generic (ret, void*: 0, int*: *(int*)ret = 10,
size_t*: *(size_t*)ret = 10);

But yes, a bit less elegant.
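
A self-contained variant of the first workaround, as a sketch only:
ref_with_counter and ref_without_counter are hypothetical stand-ins for
the builtin's two possible result types, and SET_COUNT is a made-up name.
It uses { 0 } rather than { } in the compound literal so that it does not
depend on C23 empty initializers.

```c
#include <stddef.h>

/* Hypothetical stand-ins: one call yields a typed pointer to a counter,
   the other yields void * (the "no counted_by" case).  */
static size_t counter;

static size_t *
ref_with_counter (void)
{
  return &counter;
}

static void *
ref_without_counter (void)
{
  return NULL;
}

/* Every association of _Generic has to be a valid expression, including
   the ones that are not selected.  So _Generic only picks a pointer here,
   and the store happens outside of it.  In the void * case the store goes
   to a dummy compound literal and is discarded.  */
#define SET_COUNT(REF, VAL) \
  (*_Generic ((REF), void *: &(size_t){ 0 }, default: (REF)) = (VAL))
```

The controlling expression of _Generic is not evaluated, so REF is
evaluated at most once, and not at all in the void * case.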

Martin


> Thanks a lot for your help.
> 
> Qing
> 
> 
> > On Aug 21, 2024, at 11:43, Martin Uecker  wrote:
> > 
> > On Wednesday, 21.08.2024 at 15:24 +, Qing Zhao wrote:
> > > > 
> > > > But if we changed it to return a void pointer,  we could make this
> > > > a compile-time check:
> > > > 
> > > > auto ret = __builtin_get_counted_by(__p->FAM);
> > > > 
> > > > _Generic(ret, void*: (void)0, default: *ret = COUNT);
> > > 
> > > Is there any benefit to return a void pointer than a SIZE_T pointer for
> > > the NULL pointer?
> > 
> > Yes! You can test with _Generic (or __builtin_types_compatible_p)
> > at compile-time based on the type whether you can set *ret to COUNT
> > or not as in the example above.
> > 
> > So it is not a weird run-time test which needs to be optimized
> > away.
> > 
>

Re: LRA: Fix setup_sp_offset

2024-08-26 Thread Paul Koning



> On Aug 26, 2024, at 10:40 AM, Michael Matz  wrote:
> 
> Hello,
> 
> On Mon, 26 Aug 2024, Paul Koning wrote:
> 
>  550: [--sp] = 0 sp_off = 0  {pushexthisi_const}
>  551: [--sp] = 37sp_off = -4 {pushexthisi_const}
>  552: [--sp] = r37   sp_off = -8 {movsi_m68k2}
>  554: [--sp] = r116 - r37sp_off = -12 {subsi3}
>  556: call   sp_off = -16
> 
> insn 554 doesn't match its constraints and needs some reloads:
 
 I think you're right in that the current code isn't correct, but the 
 natural question is how in the world has this worked to-date.  Though I 
 guess targets which push arguments are a dying breed (though I would 
 have expected i386 to have tripped over this at some point).
>>> 
>>> Yeah, I wondered as well.  For things to go wrong some instructions that 
>>> contain pre/post-inc/dec of the stack pointer need to have reloads in such 
>>> a way that the actual SP-change sideeffect moves to a different 
>>> instruction.  
>> 
>> I think I've seen that in the past on PDP11, and reported it, but I 
>> thought that particular issue was fixed not too long after.
> 
> Do you have a reference handy?  I'd like to take a look, if for nothing 
> else than curiosity ;-)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87944 which says it was
fixed in GCC 14 on 5/30/2023.

paul

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-26 Thread Kees Cook

On Mon, Aug 26, 2024 at 07:30:15PM +, Qing Zhao wrote:
> Hi, Martin,
> 
> Looks like that there is some issue when I tried to use the _Generic for the 
> testing cases, and then I narrowed down to a
> small testing case that shows the problem without any change to GCC.
> 
> [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
> struct annotated {
>   char b;
>   int c[];
> } *array_annotated;  
> extern void * counted_by_ref (int *);
> 
> int main(int argc, char *argv[])
> {
>   typeof(counted_by_ref (array_annotated->c)) ret
> = counted_by_ref (array_annotated->c); 
>_Generic (ret, void* : (void)0, default: *ret = 10);
> 
>   return 0;
> }
> [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
> t1.c: In function ‘main’:
> t1.c:12:44: warning: dereferencing ‘void *’ pointer
>12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>   |^~~~
> t1.c:12:49: error: invalid use of void expression
>12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>   | ^

I implemented it like this[1] in the Linux kernel. So yours could be:

struct annotated {
  char b;
  int c[] __attribute__((counted_by(b)));
};
extern struct annotated *array_annotated;

int main(int argc, char *argv[])
{
  typeof(_Generic(__builtin_get_counted_by(array_annotated->c),
   void *: (size_t *)NULL,
   default: __builtin_get_counted_by(array_annotated->c)))
ret = __builtin_get_counted_by(array_annotated->c);
  if (ret)
*ret = 10;

  return 0;
}

It's a bit cumbersome, but it does what's needed.

This is, however, just doing exactly what Bill has suggested: it is
converting the (void *)NULL into (size_t *)NULL when there is no
counted_by annotation...

-Kees

[1] https://lore.kernel.org/linux-hardening/20240822231324.make.666-k...@kernel.org/

-- 
Kees Cook

[PATCH] Add new warning Wmissing-designated-initializers [PR39589]

2024-08-26 Thread Peter Frost

Currently the behaviour of Wmissing-field-initializers is inconsistent
between C and C++. The C warning assumes that missing designated 
initializers are deliberate, and does not warn. The C++ warning does warn
for missing designated initializers.

This patch changes the behaviour of Wmissing-field-initializers to
universally not warn about missing designated initializers, and adds a new
warning specifically for missing designated initializers.
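
To make the distinction concrete, here is a small sketch (not one of the
new tests; the struct and variable names are made up).  The comments state
which option would be expected to report each initializer, going by the
description above, and I have not checked that against the patched
compiler.

```c
struct point
{
  int x;
  int y;
  int z;
};

/* Positional initializer that leaves out z: the case that
   -Wmissing-field-initializers is meant to cover.  */
static struct point positional = { 1, 2 };

/* Designated initializer that leaves out z: with the patch as described,
   this would be reported only by -Wmissing-designated-initializers.  */
static struct point designated = { .x = 1, .y = 2 };
```

In both cases the omitted member is zero-initialized; the warnings are
only about whether the omission was intended.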

NOTE TO MAINTAINERS: This is my first gcc contribution, so I don't have
git write access.

Successfully tested on x86_64-pc-linux-gnu.

PR c/39589

gcc/c-family/ChangeLog:

* c.opt:

gcc/c/ChangeLog:

* c-typeck.cc (pop_init_level):

gcc/cp/ChangeLog:

* typeck2.cc (process_init_constructor_record):

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/base.C:
* gcc.dg/20011021-1.c:
* gcc.dg/missing-field-init-1.c:
* gcc.dg/pr60784.c:
* g++.dg/warn/missing-designated-initializers-1.C: New test.
* g++.dg/warn/missing-designated-initializers-2.C: New test.
* gcc.dg/missing-designated-initializers-1.c: New test.
* gcc.dg/missing-designated-initializers-2.c: New test.


---
 gcc/c-family/c.opt|  4 +++
 gcc/c/c-typeck.cc | 36 +++
 gcc/cp/typeck2.cc | 20 ---
 gcc/testsuite/g++.dg/diagnostic/base.C|  4 +--
 .../warn/missing-designated-initializers-1.C  | 11 ++
 .../warn/missing-designated-initializers-2.C  | 11 ++
 gcc/testsuite/gcc.dg/20011021-1.c |  4 +--
 .../missing-designated-initializers-1.c   | 13 +++
 .../missing-designated-initializers-2.c   | 13 +++
 gcc/testsuite/gcc.dg/missing-field-init-1.c   |  2 +-
 gcc/testsuite/gcc.dg/pr60784.c|  2 +-
 11 files changed, 96 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/missing-designated-initializers-1.C
 create mode 100644 gcc/testsuite/g++.dg/warn/missing-designated-initializers-2.C
 create mode 100644 gcc/testsuite/gcc.dg/missing-designated-initializers-1.c
 create mode 100644 gcc/testsuite/gcc.dg/missing-designated-initializers-2.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 491aa02e1a3..81e52f1417e 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -977,6 +977,10 @@ Wmissing-field-initializers
 C ObjC C++ ObjC++ Var(warn_missing_field_initializers) Warning 
EnabledBy(Wextra)
 Warn about missing fields in struct initializers.
 
+Wmissing-designated-initializers
+C ObjC C++ ObjC++ Var(warn_missing_designated_initializers) Warning 
EnabledBy(Wextra)
+Warn about missing designated initialisers in struct initializers.
+
 Wmissing-format-attribute
 C ObjC C++ ObjC++ Warning Alias(Wsuggest-attribute=format)
 ;
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 094e41fa202..72b544e8f67 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -9795,7 +9795,7 @@ pop_init_level (location_t loc, int implicit,
 }
 
   /* Warn when some struct elements are implicitly initialized to zero.  */
-  if (warn_missing_field_initializers
+  if ((warn_missing_field_initializers || warn_missing_designated_initializers)
   && constructor_type
   && TREE_CODE (constructor_type) == RECORD_TYPE
   && constructor_unfilled_fields)
@@ -9806,21 +9806,29 @@ pop_init_level (location_t loc, int implicit,
   || integer_zerop (DECL_SIZE (constructor_unfilled_fields
  constructor_unfilled_fields = DECL_CHAIN 
(constructor_unfilled_fields);
 
-   if (constructor_unfilled_fields
-   /* Do not warn if this level of the initializer uses member
-  designators; it is likely to be deliberate.  */
-   && !constructor_designated
-   /* Do not warn about initializing with { 0 } or with { }.  */
-   && !constructor_zeroinit)
- {
-   if (warning_at (input_location, OPT_Wmissing_field_initializers,
+ if (constructor_unfilled_fields
+ /* Do not warn about initializing with { 0 } or with { }.  */
+ && !constructor_zeroinit)
+   {
+   if (!constructor_designated)
+ {
+   if (warning_at (input_location, 
OPT_Wmissing_field_initializers,
"missing initializer for field %qD of %qT",
-   constructor_unfilled_fields,
-   constructor_type))
- inform (DECL_SOURCE_LOCATION (constructor_unfilled_fields),
+   constructor_unfilled_fields, constructor_type))
+ inform (DECL_SOURCE_LOCATION 
(constructor_unfilled_fields),
+ "%qD declared here", constructor_unfilled_fields);
+ }
+   else if (warn_missing_designated_initializers)
+ {
+   if (warning_at (
+

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-26 Thread Martin Uecker

On Monday, 26.08.2024 at 13:30 -0700, Kees Cook wrote:
> On Mon, Aug 26, 2024 at 07:30:15PM +, Qing Zhao wrote:
> > Hi, Martin,
> > 
> > Looks like that there is some issue when I tried to use the _Generic for 
> > the testing cases, and then I narrowed down to a
> > small testing case that shows the problem without any change to GCC.
> > 
> > [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
> > struct annotated {
> >   char b;
> >   int c[];
> > } *array_annotated;  
> > extern void * counted_by_ref (int *);
> > 
> > int main(int argc, char *argv[])
> > {
> >   typeof(counted_by_ref (array_annotated->c)) ret
> > = counted_by_ref (array_annotated->c); 
> >_Generic (ret, void* : (void)0, default: *ret = 10);
> > 
> >   return 0;
> > }
> > [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
> > t1.c: In function ‘main’:
> > t1.c:12:44: warning: dereferencing ‘void *’ pointer
> >12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
> >   |^~~~
> > t1.c:12:49: error: invalid use of void expression
> >12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
> >   | ^
> 
> I implemented it like this[1] in the Linux kernel. So yours could be:
> 
> struct annotated {
>   char b;
>   int c[] __attribute__((counted_by(b)));
> };
> extern struct annotated *array_annotated;
> 
> int main(int argc, char *argv[])
> {
>   typeof(_Generic(__builtin_get_counted_by(array_annotated->c),
>  void *: (size_t *)NULL,
>  default: __builtin_get_counted_by(array_annotated->c)))
>   ret = __builtin_get_counted_by(array_annotated->c);
>   if (ret)
>   *ret = 10;
> 
>   return 0;
> }
> 
> It's a bit cumbersome, but it does what's needed.
> 
> This is, however, just doing exactly what Bill has suggested: it is
> converting the (void *)NULL into (size_t *)NULL when there is no
> counted_by annotation...
> 
> -Kees
> 
> [1] 
> https://lore.kernel.org/linux-hardening/20240822231324.make.666-k...@kernel.org/

Interesting. Will __builtin_get_counted_by(array_annotated->c) give
a null pointer (or an invalid pointer) of the correct type if 
array_annotated is a null pointer of an annotated struct type?

I also wonder a bit about the multiple macro evaluations of the arguments
for P and SIZE.

Martin


>

[PATCH 1/5] Handle namespaced names for CodeView

2024-08-26 Thread Mark Harmstone

Run all CodeView names through a new function get_name, which chains
together a DIE's DW_AT_name with that of its parent to create a
C++-style name.
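
The two-pass approach described above (measure first, then fill the buffer from the end) can be sketched outside of GCC as follows. This is only an illustration: `struct scope` and `qualified_name` are hypothetical stand-ins for a DIE and for get_name, and the placeholder text used for unnamed scopes is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a DIE: a name plus a link to the enclosing
   scope.  A NULL parent plays the role of the compile unit.  */
struct scope
{
  const char *name;		/* NULL for an unnamed scope.  */
  const struct scope *parent;
};

/* Return "outer::inner::name" in a malloc'd buffer, or NULL if S has no
   name or allocation fails.  */
static char *
qualified_name (const struct scope *s)
{
  static const char anon[] = "<anonymous>";	/* Illustrative placeholder.  */
  static const char sep[] = "::";
  const struct scope *p;
  size_t len, pos;
  char *str;

  if (!s || !s->name)
    return NULL;

  /* Pass 1: add up the length of every component and separator.  */
  len = strlen (s->name);
  for (p = s->parent; p; p = p->parent)
    len += (sizeof (sep) - 1)
	   + (p->name ? strlen (p->name) : sizeof (anon) - 1);

  str = malloc (len + 1);
  if (!str)
    return NULL;
  str[len] = 0;

  /* Pass 2: write the innermost name last, then walk outwards, moving the
     write position towards the start of the buffer.  */
  pos = len - strlen (s->name);
  memcpy (str + pos, s->name, strlen (s->name));

  for (p = s->parent; p; p = p->parent)
    {
      const char *n = p->name ? p->name : anon;

      pos -= sizeof (sep) - 1;
      memcpy (str + pos, sep, sizeof (sep) - 1);
      pos -= strlen (n);
      memcpy (str + pos, n, strlen (n));
    }

  /* Both passes must agree on the total length.  */
  assert (pos == 0);
  return str;
}
```

Filling from the end means the parent chain is walked in its natural direction (inner to outer) without needing a temporary list of components.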

gcc/
* dwarf2codeview.cc (get_name): New function.
(add_enum_forward_def): Call get_name.
(get_type_num_enumeration_type): Call get_name.
(add_struct_forward_def): Call get_name.
(get_type_num_struct): Call get_name.
(add_variable): Call get_name.
(add_function): Call get_name.
* dwarf2out.cc (get_die_parent): Rename to dw_get_die_parent and make
non-static.
(generate_type_signature): Handle renamed get_die_parent.
* dwarf2out.h (dw_get_die_parent): Add declaration.
---
 gcc/dwarf2codeview.cc | 92 +--
 gcc/dwarf2out.cc  |  6 +--
 gcc/dwarf2out.h   |  1 +
 3 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index e4c67f921cd..b71c592b70c 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -4511,6 +4511,79 @@ get_type_num_volatile_type (dw_die_ref type, bool in_struct)
   return ct->num;
 }
 
+/* Return the name of a DIE, traversing its parents in order to construct a
+   C++-style name if necessary.  */
+static char *
+get_name (dw_die_ref die)
+{
+  dw_die_ref decl = get_AT_ref (die, DW_AT_specification);
+  dw_die_ref parent;
+  const char *name;
+  char *str;
+  size_t len;
+
+  static const char anon[] = "";
+  static const char sep[] = "::";
+
+  if (decl)
+die = decl;
+
+  name = get_AT_string (die, DW_AT_name);
+
+  if (!name)
+return NULL;
+
+  parent = dw_get_die_parent (die);
+
+  if (!parent || dw_get_die_tag (parent) == DW_TAG_compile_unit)
+return xstrdup (name);
+
+  len = strlen (name);
+  while (parent && dw_get_die_tag (parent) != DW_TAG_compile_unit)
+{
+  const char *ns_name = get_AT_string (parent, DW_AT_name);
+
+  len += sizeof (sep) - 1;
+
+  if (ns_name)
+   len += strlen (ns_name);
+  else
+   len += sizeof (anon) - 1;
+
+  parent = dw_get_die_parent (parent);
+}
+
+  str = (char *) xmalloc (len + 1);
+  str[len] = 0;
+
+  len -= strlen (name);
+  memcpy (str + len, name, strlen (name));
+
+  parent = dw_get_die_parent (die);
+  while (parent && dw_get_die_tag (parent) != DW_TAG_compile_unit)
+{
+  const char *ns_name = get_AT_string (parent, DW_AT_name);
+
+  len -= sizeof (sep) - 1;
+  memcpy (str + len, sep, sizeof (sep) - 1);
+
+  if (ns_name)
+   {
+ len -= strlen (ns_name);
+ memcpy (str + len, ns_name, strlen (ns_name));
+   }
+  else
+   {
+ len -= sizeof (anon) - 1;
+ memcpy (str + len, anon, sizeof (anon) - 1);
+   }
+
+  parent = dw_get_die_parent (parent);
+}
+
+  return str;
+}
+
 /* Add a forward declaration for an enum.  This is legal from C++11 onwards.  */
 
 static uint32_t
@@ -4528,7 +4601,7 @@ add_enum_forward_def (dw_die_ref type)
   ct->lf_enum.underlying_type = get_type_num (get_AT_ref (type, DW_AT_type),
  false, false);
   ct->lf_enum.fieldlist = 0;
-  ct->lf_enum.name = xstrdup (get_AT_string (type, DW_AT_name));
+  ct->lf_enum.name = get_name (type);
 
   add_custom_type (ct);
 
@@ -4688,7 +4761,7 @@ get_type_num_enumeration_type (dw_die_ref type, bool in_struct)
   ct->lf_enum.underlying_type = get_type_num (get_AT_ref (type, DW_AT_type),
  in_struct, false);
   ct->lf_enum.fieldlist = last_type;
-  ct->lf_enum.name = xstrdup (get_AT_string (type, DW_AT_name));
+  ct->lf_enum.name = get_name (type);
 
   add_custom_type (ct);
 
@@ -4775,7 +4848,7 @@ add_struct_forward_def (dw_die_ref type)
   ct->lf_structure.vshape = 0;
   ct->lf_structure.length.neg = false;
   ct->lf_structure.length.num = 0;
-  ct->lf_structure.name = xstrdup (get_AT_string (type, DW_AT_name));
+  ct->lf_structure.name = get_name (type);
 
   add_custom_type (ct);
 
@@ -4823,7 +4896,6 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
   codeview_custom_type *ct;
   uint16_t num_members = 0;
   uint32_t last_type = 0;
-  const char *name;
 
   if ((in_struct && get_AT_string (type, DW_AT_name))
   || get_AT_flag (type, DW_AT_declaration))
@@ -5010,13 +5082,7 @@ get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
   ct->lf_structure.vshape = 0;
   ct->lf_structure.length.neg = false;
   ct->lf_structure.length.num = get_AT_unsigned (type, DW_AT_byte_size);
-
-  name = get_AT_string (type, DW_AT_name);
-
-  if (name)
-ct->lf_structure.name = xstrdup (name);
-  else
-ct->lf_structure.name = NULL;
+  ct->lf_structure.name = get_name (type);
 
   add_custom_type (ct);
 
@@ -5384,7 +5450,7 @@ add_variable (dw_die_ref die)
   s->kind = get_AT (die, DW_AT_external) ? S_GDATA32 : S_LDATA32;
   s->data_symbol.type = get_type_num (get_AT_ref (die, DW_AT_type), false,

[PATCH 5/5] Write LF_MFUNC_ID types for CodeView struct member functions

2024-08-26 Thread Mark Harmstone

If recording the definition of a struct member function, write an
LF_MFUNC_ID type rather than an LF_FUNC_ID. This links directly to the
struct type, rather than to an LF_STRING_ID with its name.
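
As a rough illustration of the record layout that write_lf_mfunc_id emits, the sketch below packs the same fields into a byte buffer. It assumes a little-endian target and assumes the trailing padding uses the CodeView LF_PAD bytes (0xf3, 0xf2, 0xf1); `put_le` and `encode_mfunc_id` are hypothetical helpers, not functions from the patch, which writes assembler directives rather than raw bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LF_MFUNC_ID 0x1602

/* Store the low N bytes of V at P, least significant byte first.  */
static void
put_le (uint8_t *p, uint32_t v, int n)
{
  for (int i = 0; i < n; i++)
    p[i] = (uint8_t) (v >> (8 * i));
}

/* Encode an LF_MFUNC_ID record into BUF (capacity CAP).  Return the number
   of bytes written, or 0 if the record does not fit.  */
static size_t
encode_mfunc_id (uint8_t *buf, size_t cap, uint32_t parent_type,
		 uint32_t function_type, const char *name)
{
  assert (buf && name);

  size_t name_len = strlen (name) + 1;		/* Include the NUL.  */
  size_t unpadded = 2 + 2 + 4 + 4 + name_len;	/* size, kind, two type numbers.  */
  size_t pad = (4 - unpadded % 4) % 4;		/* Round up to 4 bytes.  */
  size_t total = unpadded + pad;

  if (total > cap || total - 2 > 0xffff)
    return 0;

  /* The size field counts everything after itself.  */
  put_le (buf, (uint32_t) (total - 2), 2);
  put_le (buf + 2, LF_MFUNC_ID, 2);
  put_le (buf + 4, parent_type, 4);
  put_le (buf + 8, function_type, 4);
  memcpy (buf + 12, name, name_len);

  /* Assumed padding convention: 0xf0 plus the number of bytes remaining.  */
  for (size_t i = 0; i < pad; i++)
    buf[unpadded + i] = (uint8_t) (0xf0 + (pad - i));

  return total;
}
```

Because the fixed part of the record is 12 bytes, the amount of padding depends only on the length of the name, which matches the `name_len % 4` computation in the patch.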

gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add LF_MFUNC_ID.
(write_lf_mfunc_id): New function.
(add_lf_func_id): New function.
(add_lf_mfunc_id): New function.
(add_function): Call add_lf_func_id or add_lf_mfunc_id.
---
 gcc/dwarf2codeview.cc | 150 ++
 1 file changed, 137 insertions(+), 13 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 1987575985a..6142431655b 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -112,6 +112,7 @@ enum cv_leaf_type {
   LF_METHOD = 0x150f,
   LF_ONEMETHOD = 0x1511,
   LF_FUNC_ID = 0x1601,
+  LF_MFUNC_ID = 0x1602,
   LF_STRING_ID = 0x1605,
   LF_CHAR = 0x8000,
   LF_SHORT = 0x8001,
@@ -4293,6 +4294,56 @@ write_lf_func_id (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_MFUNC_ID type, which is the version of LF_FUNC_ID for struct
+   functions.  Instead of an LF_STRING_ID for the parent scope, we write the
+   type number of the parent struct.  */
+
+static void
+write_lf_mfunc_id (codeview_custom_type *t)
+{
+  size_t name_len;
+
+  /* This is lf_mfunc_id in binutils and lfMFuncId in Microsoft's cvinfo.h:
+
+struct lf_mfunc_id
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t parent_type;
+  uint32_t function_type;
+  char name[];
+} ATTRIBUTE_PACKED
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_mfunc_id.parent_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_mfunc_id.function_type);
+  putc ('\n', asm_out_file);
+
+  name_len = strlen (t->lf_mfunc_id.name) + 1;
+
+  ASM_OUTPUT_ASCII (asm_out_file, t->lf_mfunc_id.name, name_len);
+
+  write_cv_padding (4 - (name_len % 4));
+
+  free (t->lf_mfunc_id.name);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write an LF_STRING_ID type, which provides a deduplicated string that other
types can reference.  */
 
@@ -4523,6 +4574,10 @@ write_custom_types (void)
  write_lf_func_id (custom_types);
  break;
 
+   case LF_MFUNC_ID:
+ write_lf_mfunc_id (custom_types);
+ break;
+
case LF_STRING_ID:
  write_lf_string_id (custom_types);
  break;
@@ -6190,21 +6245,13 @@ get_scope_string_id (dw_die_ref die)
   return ret;
 }
 
-/* Process a DW_TAG_subprogram DIE, and add an S_GPROC32_ID or S_LPROC32_ID
-   symbol for this.  */
+/* Add an LF_FUNC_ID type and return its number (see write_lf_func_id).  */
 
-static void
-add_function (dw_die_ref die)
+static uint32_t
+add_lf_func_id (dw_die_ref die, const char *name)
 {
+  uint32_t function_type, scope_type;
   codeview_custom_type *ct;
-  const char *name = get_AT_string (die, DW_AT_name);
-  uint32_t function_type, func_id_type, scope_type;
-  codeview_symbol *s;
-
-  if (!name)
-return;
-
-  /* Add an LF_FUNC_ID type for this function.  */
 
   function_type = get_type_num_subroutine_type (die, false, 0, 0, 0);
   scope_type = get_scope_string_id (die);
@@ -6219,7 +6266,84 @@ add_function (dw_die_ref die)
 
   add_custom_type (ct);
 
-  func_id_type = ct->num;
+  return ct->num;
+}
+
+/* Add an LF_MFUNC_ID type and return its number (see write_lf_mfunc_id).  */
+
+static uint32_t
+add_lf_mfunc_id (dw_die_ref die, const char *name)
+{
+  uint32_t function_type = 0, parent_type;
+  codeview_custom_type *ct;
+  dw_die_ref spec = get_AT_ref (die, DW_AT_specification);
+
+  parent_type = get_type_num (dw_get_die_parent (spec), false, false);
+
+  if (types_htab)
+{
+  codeview_type **slot;
+
+  slot = types_htab->find_slot_with_hash (spec, htab_hash_pointer (spec),
+ NO_INSERT);
+
+  if (slot && *slot)
+   function_type = (*slot)->num;
+}
+
+  if (function_type == 0)
+{
+  function_type = get_type_num_subroutine_type (die, false, parent_type,
+   0, 0);
+}
+
+  ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type));
+
+  ct->next = NULL;
+  ct->kind = LF_MFUNC_ID;
+  ct->lf_mfunc_id.parent_type = parent_type;
+  ct->lf_mfunc_id.function_type = function_type;
+  ct->lf_mfunc_id.name = xstrdup (name);
+
+  add_custom_type (ct);
+
+  return ct->num;
+}
+
+/* Process a DW_TAG_subprogram DIE, and

[PATCH 2/5] Handle scoping in CodeView LF_FUNC_ID types

2024-08-26 Thread Mark Harmstone

If a function is in a namespace, create an LF_STRING_ID type for the
name of its parent, and record this in the LF_FUNC_ID type we create
for the function.
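
The deduplication that add_string_id performs can be sketched in plain C as below. This is a simplified illustration under stated assumptions: it uses a linked list with a linear search instead of GCC's hash_table, and `struct string_table` and `intern_string_id` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One interned string and the type number assigned to it.  */
struct string_id
{
  uint32_t num;
  char *string;
  struct string_id *next;
};

/* Hypothetical table: a list of entries plus the next free type number.  */
struct string_table
{
  struct string_id *head;
  uint32_t next_num;
};

/* Return the number for S, creating a new entry only the first time S is
   seen.  Returns 0 if allocation fails.  */
static uint32_t
intern_string_id (struct string_table *t, const char *s)
{
  struct string_id *e;
  size_t n;

  assert (t && s);

  /* Reuse an existing entry if the same text was recorded before.  */
  for (e = t->head; e; e = e->next)
    if (!strcmp (e->string, s))
      return e->num;

  e = malloc (sizeof (*e));
  if (!e)
    return 0;

  n = strlen (s) + 1;
  e->string = malloc (n);
  if (!e->string)
    {
      free (e);
      return 0;
    }
  memcpy (e->string, s, n);

  e->num = t->next_num++;
  e->next = t->head;
  t->head = e;

  return e->num;
}
```

Two functions in the same namespace therefore share one scope entry, which is the point of routing every scope name through a single lookup.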

gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add LF_STRING_ID.
(struct codeview_custom_type): Add lf_string_id to union.
(struct string_id_hasher): New type.
(string_id_htab): New global variable.
(write_lf_string_id): New function.
(write_custom_types): Call write_lf_string_id.
(codeview_debug_finish): Free string_id_htab.
(add_string_id): New function.
(get_scope_string_id): New function.
(add_function): Call get_scope_string_id and set scope.
---
 gcc/dwarf2codeview.cc | 139 +-
 1 file changed, 137 insertions(+), 2 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index b71c592b70c..2535777d4cb 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -107,6 +107,7 @@ enum cv_leaf_type {
   LF_ENUM = 0x1507,
   LF_MEMBER = 0x150d,
   LF_FUNC_ID = 0x1601,
+  LF_STRING_ID = 0x1605,
   LF_CHAR = 0x8000,
   LF_SHORT = 0x8001,
   LF_USHORT = 0x8002,
@@ -1293,6 +1294,11 @@ struct codeview_custom_type
   uint32_t function_type;
   char *name;
 } lf_func_id;
+struct
+{
+  uint32_t substring;
+  char *string;
+} lf_string_id;
   };
 };
 
@@ -1302,6 +1308,21 @@ struct codeview_deferred_type
   dw_die_ref type;
 };
 
+struct string_id_hasher : nofree_ptr_hash <struct codeview_custom_type>
+{
+  typedef const char *compare_type;
+
+  static hashval_t hash (const codeview_custom_type *x)
+  {
+return htab_hash_string (x->lf_string_id.string);
+  }
+
+  static bool equal (const codeview_custom_type *x, const char *y)
+  {
+return !strcmp (x->lf_string_id.string, y);
+  }
+};
+
 static unsigned int line_label_num;
 static unsigned int func_label_num;
 static unsigned int sym_label_num;
@@ -1317,6 +1338,7 @@ static codeview_symbol *sym, *last_sym;
 static hash_table *types_htab;
 static codeview_custom_type *custom_types, *last_custom_type;
 static codeview_deferred_type *deferred_types, *last_deferred_type;
+static hash_table<string_id_hasher> *string_id_htab;
 
 static uint32_t get_type_num (dw_die_ref type, bool in_struct, bool no_fwd_ref);
 
@@ -4089,6 +4111,50 @@ write_lf_func_id (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_STRING_ID type, which provides a deduplicated string that other
+   types can reference.  */
+
+static void
+write_lf_string_id (codeview_custom_type *t)
+{
+  size_t string_len;
+
+  /* This is lf_string_id in binutils and lfStringId in Microsoft's cvinfo.h:
+
+struct lf_string_id
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t substring;
+  char string[];
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_string_id.substring);
+  putc ('\n', asm_out_file);
+
+  string_len = strlen (t->lf_string_id.string) + 1;
+
+  ASM_OUTPUT_ASCII (asm_out_file, t->lf_string_id.string, string_len);
+
+  write_cv_padding (4 - (string_len % 4));
+
+  free (t->lf_string_id.string);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -4152,6 +4218,10 @@ write_custom_types (void)
  write_lf_func_id (custom_types);
  break;
 
+   case LF_STRING_ID:
+ write_lf_string_id (custom_types);
+ break;
+
default:
  break;
}
@@ -4182,6 +4252,9 @@ codeview_debug_finish (void)
 
   if (types_htab)
 delete types_htab;
+
+  if (string_id_htab)
+delete string_id_htab;
 }
 
 /* Translate a DWARF base type (DW_TAG_base_type) into its CodeView
@@ -5461,6 +5534,67 @@ add_variable (dw_die_ref die)
   last_sym = s;
 }
 
+/* Return the type number of the LF_STRING_ID entry corresponding to the given
+   string, creating a new one if necessary.  */
+
+static uint32_t
+add_string_id (const char *s)
+{
+  codeview_custom_type **slot;
+  codeview_custom_type *ct;
+
+  if (!string_id_htab)
+string_id_htab = new hash_table<string_id_hasher> (10);
+
+  slot = string_id_htab->find_slot_with_hash (s, htab_hash_string (s),
+ INSERT);
+  if (*slot)
+return (*slot)->num;
+
+  ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type));
+
+  ct->next = NULL;
+  ct->kind = LF_STRING_ID;
+  ct->lf_string_id.substring = 0;
+  ct->lf_string_id.string = xstrdup (s);
+
+  add_custom_type (ct);
+
+  *slot = ct;
+
+  return ct->num;
+}
+

[PATCH 3/5] Record static data members in CodeView structs

2024-08-26 Thread Mark Harmstone

Record LF_STMEMBER field list subtypes to represent static data members
in structs.

gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add LF_STMEMBER.
(struct codeview_subtype): Add lf_static_member to union.
(write_lf_fieldlist): Handle LF_STMEMBER.
(add_struct_member): New function.
(add_struct_static_member): New function.
(get_accessibility): New function.
(get_type_num_struct): Split out into add_struct_member and
get_accessibility, and handle static members.
---
 gcc/dwarf2codeview.cc | 183 +++---
 1 file changed, 135 insertions(+), 48 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 2535777d4cb..610f884d73d 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -106,6 +106,7 @@ enum cv_leaf_type {
   LF_UNION = 0x1506,
   LF_ENUM = 0x1507,
   LF_MEMBER = 0x150d,
+  LF_STMEMBER = 0x150e,
   LF_FUNC_ID = 0x1601,
   LF_STRING_ID = 0x1605,
   LF_CHAR = 0x8000,
@@ -1218,6 +1219,12 @@ struct codeview_subtype
   codeview_integer offset;
   char *name;
 } lf_member;
+struct
+{
+  uint16_t attributes;
+  uint32_t type;
+  char *name;
+} lf_static_member;
   };
 };
 
@@ -3663,6 +3670,40 @@ write_lf_fieldlist (codeview_custom_type *t)
 
  break;
 
+   case LF_STMEMBER:
+ /* This is lf_static_member in binutils and lfSTMember in Microsoft's
+cvinfo.h:
+
+   struct lf_static_member
+   {
+ uint16_t kind;
+ uint16_t attributes;
+ uint32_t type;
+ char name[];
+   } ATTRIBUTE_PACKED;
+ */
+
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, LF_STMEMBER);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, v->lf_static_member.attributes);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (4, false), asm_out_file);
+ fprint_whex (asm_out_file, v->lf_static_member.type);
+ putc ('\n', asm_out_file);
+
+ name_len = strlen (v->lf_static_member.name) + 1;
+ ASM_OUTPUT_ASCII (asm_out_file, v->lf_static_member.name, name_len);
+
+ leaf_len = 8 + name_len;
+ write_cv_padding (4 - (leaf_len % 4));
+
+ free (v->lf_static_member.name);
+ break;
+
default:
  break;
}
@@ -4958,6 +4999,91 @@ create_bitfield (dw_die_ref c)
   return ct->num;
 }
 
+/* Create an LF_MEMBER field list subtype for a struct member, returning its
+   pointer in el and its size in el_len.  */
+
+static void
+add_struct_member (dw_die_ref c, uint16_t accessibility,
+  codeview_subtype **el, size_t *el_len)
+{
+  *el = (codeview_subtype *) xmalloc (sizeof (**el));
+  (*el)->next = NULL;
+  (*el)->kind = LF_MEMBER;
+  (*el)->lf_member.attributes = accessibility;
+
+  if (get_AT (c, DW_AT_data_bit_offset))
+(*el)->lf_member.type = create_bitfield (c);
+  else
+(*el)->lf_member.type = get_type_num (get_AT_ref (c, DW_AT_type),
+ true, false);
+
+  (*el)->lf_member.offset.neg = false;
+  (*el)->lf_member.offset.num = get_AT_unsigned (c, DW_AT_data_member_location);
+
+  *el_len = 11 + cv_integer_len (&(*el)->lf_member.offset);
+
+  if (get_AT_string (c, DW_AT_name))
+{
+  (*el)->lf_member.name = xstrdup (get_AT_string (c, DW_AT_name));
+  *el_len += strlen ((*el)->lf_member.name);
+}
+  else
+{
+  (*el)->lf_member.name = NULL;
+}
+
+  if (*el_len % 4)
+*el_len += 4 - (*el_len % 4);
+}
+
+/* Create an LF_STMEMBER field list subtype for a static struct member,
+   returning its pointer in el and its size in el_len.  */
+
+static void
+add_struct_static_member (dw_die_ref c, uint16_t accessibility,
+ codeview_subtype **el, size_t *el_len)
+{
+  *el = (codeview_subtype *) xmalloc (sizeof (**el));
+  (*el)->next = NULL;
+  (*el)->kind = LF_STMEMBER;
+  (*el)->lf_static_member.attributes = accessibility;
+  (*el)->lf_static_member.type = get_type_num (get_AT_ref (c, DW_AT_type),
+  true, false);
+  (*el)->lf_static_member.name = xstrdup (get_AT_string (c, DW_AT_name));
+
+  *el_len = 9 + strlen ((*el)->lf_static_member.name);
+
+  if (*el_len % 4)
+*el_len += 4 - (*el_len % 4);
+}
+
+/* Translate a DWARF DW_AT_accessibility constant into its CodeView
+   equivalent.  If implicit, follow the C++ rules.  */
+
+static uint16_t
+get_accessibility (dw_die_ref c)
+{
+  switch (get_AT_unsigned (c, DW_AT_accessibility))
+{
+case DW_ACCESS_private:
+  return CV_ACCESS_PRIVATE;
+
+case DW_ACCESS_protected:
+  return CV_ACCESS_PROTECTED;
+
+case DW_ACCESS_public:
+  return CV_ACCESS_PUBLIC;
+
+/* Members in a C++ struct or union are public by default, members
+  i

[PATCH 4/5] Record member functions in CodeView struct definitions

2024-08-26 Thread Mark Harmstone

CodeView has two ways of recording struct member functions.
Non-overloaded functions have an LF_ONEMETHOD sub-type in the field
list, which records the name and the function type (LF_MFUNCTION).
Overloaded functions have an LF_METHOD instead, which points to an
LF_METHODLIST, which is an array of links to various LF_MFUNCTION types.
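
The choice between the two representations comes down to how many member functions share a name. A minimal sketch of that decision, using hypothetical helpers (`count_overloads`, `field_kind_for`) and a plain array of names rather than the hash table in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum
{
  LF_METHOD = 0x150f,
  LF_ONEMETHOD = 0x1511
};

/* Count how many of the N member function names equal NAME.  */
static size_t
count_overloads (const char *const *names, size_t n, const char *name)
{
  size_t count = 0;

  assert (names && name);

  for (size_t i = 0; i < n; i++)
    if (!strcmp (names[i], name))
      count++;

  return count;
}

/* Pick the field list sub-type for NAME: a single function gets
   LF_ONEMETHOD, an overload set gets LF_METHOD (which then refers to an
   LF_METHODLIST holding one entry per overload).  */
static int
field_kind_for (const char *const *names, size_t n, const char *name)
{
  return count_overloads (names, n, name) > 1 ? LF_METHOD : LF_ONEMETHOD;
}
```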

gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add LF_MFUNCTION,
LF_METHODLIST, LF_METHOD, and LF_ONEMETHOD.
(struct codeview_subtype): Add lf_onemethod and lf_method to union.
(struct lf_methodlist_entry): New type.
(struct codeview_custom_type): Add lf_mfunc_id, lf_mfunction, and
lf_methodlist to union.
(struct codeview_method): New type.
(struct method_hasher): New type.
(get_type_num_subroutine_type): Add forward declaration.
(write_lf_fieldlist): Handle LF_ONEMETHOD and LF_METHOD.
(write_lf_mfunction): New function.
(write_lf_methodlist): New function.
(write_custom_types): Handle LF_MFUNCTION and LF_METHODLIST.
(add_struct_function): New function.
(get_mfunction_type): New function.
(is_templated_func): New function.
(get_type_num_struct): Handle DW_TAG_subprogram child DIEs.
(get_type_num_subroutine_type): Add containing_class_type, this_type,
and this_adjustment params, and handle creating LF_MFUNCTION types as
well as LF_PROCEDURE.
(get_type_num): New params for get_type_num_subroutine_type.
(add_function): New params for get_type_num_subroutine_type.
* dwarf2codeview.h (CV_METHOD_VANILLA, CV_METHOD_VIRTUAL): Define.
(CV_METHOD_STATIC, CV_METHOD_FRIEND, CV_METHOD_INTRO): Likewise.
(CV_METHOD_PUREVIRT, CV_METHOD_PUREINTRO): Likewise.
---
 gcc/dwarf2codeview.cc | 530 +-
 gcc/dwarf2codeview.h  |   9 +
 2 files changed, 528 insertions(+), 11 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 610f884d73d..1987575985a 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -95,9 +95,11 @@ enum cv_leaf_type {
   LF_MODIFIER = 0x1001,
   LF_POINTER = 0x1002,
   LF_PROCEDURE = 0x1008,
+  LF_MFUNCTION = 0x1009,
   LF_ARGLIST = 0x1201,
   LF_FIELDLIST = 0x1203,
   LF_BITFIELD = 0x1205,
+  LF_METHODLIST = 0x1206,
   LF_INDEX = 0x1404,
   LF_ENUMERATE = 0x1502,
   LF_ARRAY = 0x1503,
@@ -107,6 +109,8 @@ enum cv_leaf_type {
   LF_ENUM = 0x1507,
   LF_MEMBER = 0x150d,
   LF_STMEMBER = 0x150e,
+  LF_METHOD = 0x150f,
+  LF_ONEMETHOD = 0x1511,
   LF_FUNC_ID = 0x1601,
   LF_STRING_ID = 0x1605,
   LF_CHAR = 0x8000,
@@ -1225,9 +1229,27 @@ struct codeview_subtype
   uint32_t type;
   char *name;
 } lf_static_member;
+struct
+{
+  uint16_t method_attribute;
+  uint32_t method_type;
+  char *name;
+} lf_onemethod;
+struct
+{
+  uint16_t count;
+  uint32_t method_list;
+  char *name;
+} lf_method;
   };
 };
 
+struct lf_methodlist_entry
+{
+  uint16_t method_attribute;
+  uint32_t method_type;
+};
+
 struct codeview_custom_type
 {
   struct codeview_custom_type *next;
@@ -1302,10 +1324,32 @@ struct codeview_custom_type
   char *name;
 } lf_func_id;
 struct
+{
+  uint32_t parent_type;
+  uint32_t function_type;
+  char *name;
+} lf_mfunc_id;
+struct
 {
   uint32_t substring;
   char *string;
 } lf_string_id;
+struct
+{
+  uint32_t return_type;
+  uint32_t containing_class_type;
+  uint32_t this_type;
+  uint8_t calling_convention;
+  uint8_t attributes;
+  uint16_t num_parameters;
+  uint32_t arglist;
+  int32_t this_adjustment;
+} lf_mfunction;
+struct
+{
+  unsigned int count;
+  lf_methodlist_entry *entries;
+} lf_methodlist;
   };
 };
 
@@ -1330,6 +1374,31 @@ struct string_id_hasher : nofree_ptr_hash <struct codeview_custom_type>
   }
 };
 
+struct codeview_method
+{
+  uint16_t attribute;
+  uint32_t type;
+  char *name;
+  unsigned int count;
+  struct codeview_method *next;
+  struct codeview_method *last;
+};
+
+struct method_hasher : nofree_ptr_hash <struct codeview_method>
+{
+  typedef const char *compare_type;
+
+  static hashval_t hash (const codeview_method *x)
+  {
+return htab_hash_string (x->name);
+  }
+
+  static bool equal (const codeview_method *x, const char *y)
+  {
+return !strcmp (x->name, y);
+  }
+};
+
 static unsigned int line_label_num;
 static unsigned int func_label_num;
 static unsigned int sym_label_num;
@@ -1348,6 +1417,10 @@ static codeview_deferred_type *deferred_types, *last_deferred_type;
 static hash_table<string_id_hasher> *string_id_htab;
 
 static uint32_t get_type_num (dw_die_ref type, bool in_struct, bool no_fwd_ref);
+static uint32_t get_type_num_subroutine_type (dw_die_ref type, bool in_struct,
+ uint32_t containing_class_type,
+ uint32_t this_type,
+

New Chinese (simplified) PO file for 'gcc' (version 14.2.0)

2024-08-26 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Chinese (simplified) team of translators.  The file is available at:

https://translationproject.org/latest/gcc/zh_CN.po

(This file, 'gcc-14.2.0.zh_CN.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Ping: [PATCH] testsuite: Prune compilation messages for modules tests

2024-08-26 Thread Hans-Peter Nilsson

Ping...

> From: Hans-Peter Nilsson 
> Date: Mon, 19 Aug 2024 00:28:30 +0200
> 
> As noticed when verifying the dejagnu fix.  Tested cris-elf
> with a new newlib that arranges to emit the mentioned
> warning, with/without the update in dejagnu to handle the
> minuscule "in".  Ok to commit?
> 
> -- >8 --
> All testsuite compiler-calls pass default_target_compile in the
> dejagnu installation (typically /usr/share/dejagnu/target.exp) which
> also calls the dejagnu-installed prune_warnings.
> 
> Normally, tests using the dg framework (most or all tests these days)
> compile and link by calling various wrappers that end up calling
> dg-test in the dejagnu installation, typically installed as
> /usr/share/dejagnu/dg.exp.  That, besides the compiler call, also
> calls ${tool}-dg-prune (g++-dg-prune) on the messages, which in turn
> ends up calling prune_gcc_output in gcc/testsuite/lib/prune.exp.  That
> gcc-specific "pruning" function handles more cases than the dejagnu
> prune_warnings, and also has updated patterns.
> 
> But, module_do_it in modules.exp calls the lower-level
> ${tool}_target_compile "directly", i.e. g++_target_compile defined in
> gcc/testsuite/lib/g++.exp.  That does not call ${tool}-dg-prune,
> meaning those test-cases miss the gcc-specific pruning.
> 
> Noticed while testing a dejagnu update that handled the minuscule "in"
> in the warning (line-breaks added below besides the original one after
> "(void*)':")
> 
> "/path/to/cris-elf/bin/ld:
> /gccobj/cris-elf/./libstdc++-v3/src/.libs/libstdc++.a(random.o): in
> function `std::(anonymous namespace)::__libc_getentropy(void*)':
> /gccsrc/libstdc++-v3/src/c++11/random.cc:183: warning: _getentropy is
> not implemented and will always fail"
> 
> The line saying "in function" rather than "In function" (from the
> binutils linker since 2018) is pruned by prune_gcc_output. The
> prune_warnings in dejagnu-1.6.3 and earlier handles the second line
> separately.  It's an unfortunate wart that neither consumes the
> delimiting line-break, leaving to the callers to prune residual empty
> lines.  See prune_warnings in dejagnu (default_target_compile and
> dg-test) for those other line-break fixups, as alluded to in the comment.
> 
>   * g++.dg/modules/modules.exp (module_do_it): Prune compilation
>   messages.
> ---
>  gcc/testsuite/g++.dg/modules/modules.exp | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/gcc/testsuite/g++.dg/modules/modules.exp b/gcc/testsuite/g++.dg/modules/modules.exp
> index 3e8df9b89309..e6bf28d8b1a0 100644
> --- a/gcc/testsuite/g++.dg/modules/modules.exp
> +++ b/gcc/testsuite/g++.dg/modules/modules.exp
> @@ -205,9 +205,19 @@ proc module_do_it { do_what testcase std asm_list } {
>  if { !$ok } {
>   unresolved "$ident link"
>  } else {
> + global target_triplet
>   set out [${tool}_target_compile $asm_list \
>$execname executable $options]
>   eval $xfail
> +
> + # Do gcc-specific pruning.
> + set out [${tool}-dg-prune $target_triplet $out]
> + # Fix up remaining line-breaks similar to "regular" pruning
> + # calls.  Otherwise, a multi-line message stripped e.g. one
> + # part by the default prune_warnings and one part by the
> + # gcc prune_gcc_output will have a residual line-break.
> + regsub "^\[\r\n\]+" $out "" out
> +
>   if { $out == "" } {
>   pass "$ident link"
>   } else {
> -- 
> 2.30.2
>

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-26 Thread Kees Cook

On Mon, Aug 26, 2024 at 11:01:08PM +0200, Martin Uecker wrote:
> On Monday, 26.08.2024 at 13:30 -0700, Kees Cook wrote:
> > On Mon, Aug 26, 2024 at 07:30:15PM +, Qing Zhao wrote:
> > > Hi, Martin,
> > > 
> > > It looks like there is an issue when I try to use _Generic for
> > > the test cases, so I narrowed it down to a
> > > small test case that shows the problem without any change to GCC.
> > > 
> > > [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
> > > struct annotated {
> > >   char b;
> > >   int c[];
> > > } *array_annotated;  
> > > extern void * counted_by_ref (int *);
> > > 
> > > int main(int argc, char *argv[])
> > > {
> > >   typeof(counted_by_ref (array_annotated->c)) ret
> > > = counted_by_ref (array_annotated->c); 
> > >_Generic (ret, void* : (void)0, default: *ret = 10);
> > > 
> > >   return 0;
> > > }
> > > [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
> > > t1.c: In function ‘main’:
> > > t1.c:12:44: warning: dereferencing ‘void *’ pointer
> > >12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
> > >   |^~~~
> > > t1.c:12:49: error: invalid use of void expression
> > >12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
> > >   | ^
> > 
> > I implemented it like this[1] in the Linux kernel. So yours could be:
> > 
> > struct annotated {
> >   char b;
> >   int c[] __attribute__((counted_by(b)));
> > };
> > extern struct annotated *array_annotated;
> > 
> > int main(int argc, char *argv[])
> > {
> >   typeof(_Generic(__builtin_get_counted_by(array_annotated->c),
> >void *: (size_t *)NULL,
> >default: __builtin_get_counted_by(array_annotated->c)))
> > ret = __builtin_get_counted_by(array_annotated->c);
> >   if (ret)
> > *ret = 10;
> > 
> >   return 0;
> > }
> > 
> > It's a bit cumbersome, but it does what's needed.
> > 
> > This is, however, just doing exactly what Bill has suggested: it is
> > converting the (void *)NULL into (size_t *)NULL when there is no
> > counted_by annotation...
> > 
> > -Kees
> > 
> > [1] 
> > https://lore.kernel.org/linux-hardening/20240822231324.make.666-k...@kernel.org/
> 
> Interesting. Will __builtin_get_counted_by(array_annotated->c) give
> a null pointer (or an invalid pointer) of the correct type if 
> array_annotated is a null pointer of an annotated struct type?

If you mean this part:

typeof(P) __obj_ptr = NULL; \
/* Just query the counter type for type_max checking. */ \
typeof(_Generic(__flex_counter(__obj_ptr->FAM), \
void *: (size_t *)NULL, \
default: __flex_counter(__obj_ptr->FAM))) \
__counter_type_ptr = NULL; \

Where __obj_ptr starts as NULL, then yes. (Or at least, yes it does
currently with Qing's GCC patch and Bill's Clang patch.)

> I also wonder a bit about the multiple macro evaluations of the arguments
> for P and SIZE.

I tried to design it so they aren't used with anything that should
have side-effects.

Anyway, if __builtin_get_counted_by returns (size_t *)NULL then I think
the _Generic wrapping isn't needed. That would make it easier to use?
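
For anyone following along without the patched compilers, here is a minimal
sketch of the _Generic trick in plain C11. It assumes nothing about the real
builtin: COUNTER_PTR_ANNOTATED and COUNTER_PTR_UNANNOTATED are hypothetical
stand-ins for the two results __builtin_get_counted_by could give, and
TYPED_COUNTER is an illustrative name, not something from the kernel patch.

```c
#include <assert.h>
#include <stddef.h>

struct annotated {
  char b;
  int c[];
};

/* Hypothetical stand-ins for the builtin: a typed pointer to the counter
   when an annotation exists, (void *)NULL when it does not.  */
#define COUNTER_PTR_ANNOTATED(p)   (&(p)->b)
#define COUNTER_PTR_UNANNOTATED(p) ((void)(p), (void *)NULL)

/* Map a void * result to size_t * so that "*ret = value" always
   type-checks; any other pointer type is passed through unchanged.  */
#define TYPED_COUNTER(expr) \
  _Generic((expr), void *: (size_t *)NULL, default: (expr))

/* Returns 1 if a counter was found and updated, 0 otherwise.  */
int set_count_annotated(struct annotated *p, int n)
{
  assert(p != NULL);
  __typeof__(TYPED_COUNTER(COUNTER_PTR_ANNOTATED(p))) ret
    = COUNTER_PTR_ANNOTATED(p);
  if (ret) {
    *ret = (char) n;   /* ret has type char * here */
    return 1;
  }
  return 0;
}

int set_count_unannotated(struct annotated *p, int n)
{
  assert(p != NULL);
  __typeof__(TYPED_COUNTER(COUNTER_PTR_UNANNOTATED(p))) ret
    = COUNTER_PTR_UNANNOTATED(p);
  if (ret) {
    *ret = (size_t) n; /* ret has type size_t *; never reached */
    return 1;
  }
  return 0;
}
```

The point of the wrapper is only that both functions compile: without it, the
unannotated case would try to assign through a void pointer.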

-Kees

-- 
Kees Cook

[PATCH] MIPS: Support vector reduc for MSA

2024-08-26 Thread YunQiang Su

From: YunQiang Su 

MSA provides SHF.fmt and HADD_S/U.fmt, which can be used for vector
reduction.

For min/max of U8/S8, we can use:
SHF.B W1, W0, 0xb1  # swap the bytes within each halfword
MIN.B W1, W1, W0
SHF.H W2, W1, 0xb1  # swap the halfwords within each word
MIN.B W2, W2, W1
SHF.W W3, W2, 0xb1  # swap the words within each doubleword
MIN.B W4, W3, W2
SHF.W W4, W4, 0x4e  # swap the two doublewords
MIN.B W4, W4, W3

For plus of S8/U8, we can use HADD:
HADD.H  W0, W0, W0
HADD.W  W0, W0, W0
HADD.D  W0, W0, W0
SHF.W   W1, W0, 0x4e  # swap the two doublewords
ADDV.D  W1, W1, W0
COPY_S.B  T0, W1  # COPY_U.B for U8

We can do the same for S16/U16/S32/U32/S64/U64/FLOAT/DOUBLE.
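
As a rough scalar model of the shuffle-and-min sequence above (a sketch only,
not the code the patch emits): each SHF pairs lane i with lane i ^ stride for
stride = 1, 2, 4, 8, and each MIN keeps the smaller of the pair, so after four
steps every lane holds the minimum of all 16 bytes. The function names and the
LANES constant are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 16

/* One butterfly step: lane i is combined with lane i ^ stride.  */
static void min_step(uint8_t v[LANES], unsigned stride)
{
  uint8_t shuffled[LANES];
  for (unsigned i = 0; i < LANES; i++)
    shuffled[i] = v[i ^ stride];                     /* models the SHF */
  for (unsigned i = 0; i < LANES; i++)
    v[i] = v[i] < shuffled[i] ? v[i] : shuffled[i];  /* models the MIN */
}

/* Unsigned byte minimum of a 16-lane vector in log2(16) = 4 steps.  */
uint8_t reduce_umin_u8(const uint8_t in[LANES])
{
  uint8_t v[LANES];
  assert(in != NULL);
  memcpy(v, in, sizeof v);
  for (unsigned stride = 1; stride < LANES; stride <<= 1)
    min_step(v, stride);
  return v[0];  /* every lane now holds the same value */
}
```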

gcc

* config/mips/mips-msa.md: (MSA_NO_HADD): we have HADD for
S8/U8/S16/U16/S32/U32 only.
reduc_smin_scal_: New define pattern.
reduc_smax_scal_: Ditto.
reduc_umin_scal_: Ditto.
reduc_umax_scal_: Ditto.
reduc_plus_scal_: Ditto.
reduc_plus_scal_v4si: Ditto.
reduc_plus_scal_v8hi: Ditto.
reduc_plus_scal_v16qi: Ditto.
reduc__scal_: Ditto.
* config/mips/mips-protos.h: New function mips_expand_msa_reduc.
* config/mips/mips.cc: New function mips_expand_msa_reduc.
* config/mips/mips.md: Define any_bitwise iterator.

gcc/testsuite:

gcc.target/mips/msa-reduc.c: New tests.
---
 gcc/config/mips/mips-msa.md   | 128 ++
 gcc/config/mips/mips-protos.h |   1 +
 gcc/config/mips/mips.cc   |  41 +++
 gcc/config/mips/mips.md   |   4 +
 gcc/testsuite/gcc.target/mips/msa-reduc.c | 119 
 5 files changed, 293 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/mips/msa-reduc.c

diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
index 377c63f0d35..976f296402e 100644
--- a/gcc/config/mips/mips-msa.md
+++ b/gcc/config/mips/mips-msa.md
@@ -125,6 +125,9 @@ (define_mode_iterator IMSA_WH  [V4SI V8HI])
 ;; Only floating-point modes.
 (define_mode_iterator FMSA [V2DF V4SF])
 
+;; Only used for reduce_plus_scal: V4SI, V8HI, V16QI have HADD.
+(define_mode_iterator MSA_NO_HADD [V2DF V4SF V2DI])
+
 ;; The attribute gives the integer vector mode with same size.
 (define_mode_attr VIMODE
   [(V2DF "V2DI")
@@ -2802,3 +2805,128 @@ (define_insn "msa__v_"
   (set_attr "mode" "TI")
   (set_attr "compact_form" "never")
   (set_attr "branch_likely" "no")])
+
+
+;; Vector reduction operation
+(define_expand "reduc_smin_scal_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:MSA 1 "register_operand")]
+  "ISA_HAS_MSA"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  mips_expand_msa_reduc (gen_smin3, tmp, operands[1]);
+  emit_insn (gen_vec_extract (operands[0], tmp,
+ const0_rtx));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:MSA 1 "register_operand")]
+  "ISA_HAS_MSA"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  mips_expand_msa_reduc (gen_smax3, tmp, operands[1]);
+  emit_insn (gen_vec_extract (operands[0], tmp,
+ const0_rtx));
+  DONE;
+})
+
+(define_expand "reduc_umin_scal_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:IMSA 1 "register_operand")]
+  "ISA_HAS_MSA"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  mips_expand_msa_reduc (gen_umin3, tmp, operands[1]);
+  emit_insn (gen_vec_extract (operands[0], tmp,
+ const0_rtx));
+  DONE;
+})
+
+(define_expand "reduc_umax_scal_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:IMSA 1 "register_operand")]
+  "ISA_HAS_MSA"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  mips_expand_msa_reduc (gen_umax3, tmp, operands[1]);
+  emit_insn (gen_vec_extract (operands[0], tmp,
+ const0_rtx));
+  DONE;
+})
+
+(define_expand "reduc_plus_scal_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:MSA_NO_HADD 1 "register_operand")]
+  "ISA_HAS_MSA"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  mips_expand_msa_reduc (gen_add3, tmp, operands[1]);
+  emit_insn (gen_vec_extract (operands[0], tmp,
+ const0_rtx));
+  DONE;
+})
+
+(define_expand "reduc_plus_scal_v4si"
+  [(match_operand:SI 0 "register_operand")
+   (match_operand:V4SI 1 "register_operand")]
+  "ISA_HAS_MSA"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx tmp1 = gen_reg_rtx (V2DImode);
+  emit_insn (gen_msa_hadd_s_d (tmp1, operands[1], operands[1]));
+  emit_insn (gen_vec_extractv4sisi (operands[0], gen_lowpart (V4SImode, tmp1),
+   const0_rtx));
+  emit_insn (gen_vec_extractv4sisi (tmp, gen_lowpart (V4SImode, tmp1),
+   GEN_INT (2)));
+  emit_insn (gen_addsi3 (operands[0], operands[0], tmp));
+  DONE;
+})

Re: [PATCH] RISC-V: Fix double mode under RV32 not utilize vf

2024-08-26 Thread Andrew Waterman

On Fri, Jul 19, 2024 at 11:08 AM Jeff Law  wrote:
>
>
>
> On 7/19/24 2:55 AM, demin.han wrote:
> > Currently, some binops of vector vs double scalar under RV32 can't be
> > translated to vf but to vfmv+vxx.vv.
> >
> > The cause is that vec_duplicate is also expanded to broadcast for double
> > mode under RV32. last-combine can't process expanded broadcast.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/vector.md: Add !FLOAT_MODE_P constraint
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test
> >   * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto
> >   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto
> It looks like vadd-rv32gcv-nofm still isn't quite right according to the
> pre-commit testing:
>
>   >
> https://github.com/ewlu/gcc-precommit-ci/issues/1931#issuecomment-2238752679
>
>
> OK once that's fixed.  No need to wait for another review cycle.
>
> And a note.  We need to be careful as some uarchs may pay a penalty when
> the vector unit needs to get an operand from the GP or FP register
> files.  So there could well be cases where using .vf or .vx forms is
> slower.  Consider these two scenarios.
>
> First, we broadcast from the GP/FP across a vector register outside a
> loop, then use a .vv form in the loop.
>
> Second we use a .vf or .vx form in the loop instead without any broadcast.
>
> In the former case we only pay the penalty for crossing register files
> once.  In the second case we'd pay it for every iteration of the loop.
>
> Given this is going to be uarch sensitive, I don't mind biasing towards
> the .vx/.vf forms right now, but we may need to add some costing models
> to this in the future as we can test on a wider variety of uarchs.

Just wanted to chime in to say that this should indeed be a tuning
decision, but our mental model should bias us in favor of the .vf/.vx
forms when we don't have any additional information.

It's a safe assumption that, for all uarches, it's better to use a
.vf/.vx form if the scalar operand is used only once.  If the scalar
is loop-invariant, then it's definitely uarch-dependent as to whether
a hoisted splat is preferable to repeated use of .vf/.vx.  (For
SiFive's in-order vector units, the splat is pure overhead; the
.vf/.vx forms are preferred.  I know the same is not true of other
uarches, though.)
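
To make the loop-invariant case concrete, this is the shape of loop in
question (an illustrative sketch; add_scalar is a made-up name). When it is
vectorized, s can either be splatted once outside the loop (vfmv.v.f, then
vfadd.vv each iteration) or consumed directly each iteration (vfadd.vf). The
C below makes no claim about which form any particular GCC version emits.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] = src[i] + s, with s invariant across the loop.  */
void add_scalar(double *dst, const double *src, double s, size_t n)
{
  assert(dst != NULL && src != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + s;
}
```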

There's the additional complicating factor: when the scalar operand
comes from memory, some uarches will prefer to use a strided load with
rs2=x0, rather than a scalar load followed by .vf/.vx, or a scalar
load followed by a splat.  (For SiFive's in-order vector units, this
optimization is profitable when the load is a cache miss, and it's a
de-optimization otherwise.  It isn't a case that's easy to tune for,
so thus far we've relegated it to hand-written code.)


>
>
> jeff
>

[PATCH v2 1/9] RISC-V: Fix vid const vector expander for non-npatterns size steps

2024-08-26 Thread Patrick O'Neill

Prior to this patch the expander would emit vectors like:
{ 0, 0, 5, 5, 10, 10, ...}
as:
{ 0, 0, 2, 2,  4,  4, ...}

This patch sets the step size to the requested value.
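
A scalar sketch of what the fixed expander computes per element may help when
reading the diff. It mirrors Steps 2 to 5 below (mask, shift down to step 1,
scale to the requested step, add the start value). The function name and
signature are illustrative only; the real code operates on RTL, not arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill OUT with { init, ..., init + step, ..., init + 2 * step, ... },
   each value repeated NPATTERNS times.  NPATTERNS must be a power of 2.  */
void stepped_vector(int64_t *out, size_t nelts, unsigned npatterns,
                    int64_t init, int64_t step)
{
  assert(npatterns != 0 && (npatterns & (npatterns - 1)) == 0);

  unsigned shift = 0;                 /* log2 (npatterns) */
  while ((1u << shift) < npatterns)
    shift++;

  for (size_t idx = 0; idx < nelts; idx++) {
    int64_t masked = (int64_t) idx & -(int64_t) npatterns; /* Step 2 */
    int64_t unit = masked >> shift;                        /* Step 3 */
    int64_t scaled = unit * step;                          /* Step 4 */
    out[idx] = scaled + init;                              /* Step 5 */
  }
}
```

With npatterns = 2, init = 0 and step = 5 this gives { 0, 0, 5, 5, 10, 10 }.
Dropping Steps 3 and 4 leaves the masked index itself, which is the
{ 0, 0, 2, 2, 4, 4 } sequence described above.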

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix STEP size in
expander.

Signed-off-by: Patrick O'Neill 
---
Detected with the existing testsuite after patch 8/9 is applied:
FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  execution test
FAIL: gcc.dg/torture/vshuf-v8hi.c   -O2  execution test
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  execution test
---
 gcc/config/riscv/riscv-v.cc | 48 -
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c89603669e3..a3039a2cb19 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1312,25 +1312,61 @@ expand_const_vector (rtx target, rtx src)
  /* Generate the variable-length vector following this rule:
 { a, a, a + step, a + step, a + step * 2, a + step * 2, ...}
   E.g. { 0, 0, 8, 8, 16, 16, ... } */
- /* We want to create a pattern where value[ix] = floor (ix /
+
+ /* We want to create a pattern where value[idx] = floor (idx /
 NPATTERNS). As NPATTERNS is always a power of two we can
-rewrite this as = ix & -NPATTERNS.  */
+rewrite this as = idx & -NPATTERNS.  */
  /* Step 2: VID AND -NPATTERNS:
 { 0&-4, 1&-4, 2&-4, 3 &-4, 4 &-4, 5 &-4, 6 &-4, 7 &-4, ... }
  */
  rtx imm
= gen_int_mode (-builder.npatterns (), builder.inner_mode ());
- rtx tmp = gen_reg_rtx (builder.mode ());
- rtx and_ops[] = {tmp, vid, imm};
+ rtx tmp1 = gen_reg_rtx (builder.mode ());
+ rtx and_ops[] = {tmp1, vid, imm};
  icode = code_for_pred_scalar (AND, builder.mode ());
  emit_vlmax_insn (icode, BINARY_OP, and_ops);
+
+ /* Step 3: Convert to step size 1.  */
+ rtx tmp2 = gen_reg_rtx (builder.mode ());
+ /* log2 (npatterns) to get the shift amount to convert
+Eg.  { 0, 0, 0, 0, 4, 4, ... }
+into { 0, 0, 0, 0, 1, 1, ... }.  */
+ HOST_WIDE_INT shift_amt = exact_log2 (builder.npatterns ()) ;
+ rtx shift = gen_int_mode (shift_amt, builder.inner_mode ());
+ rtx shift_ops[] = {tmp2, tmp1, shift};
+ icode = code_for_pred_scalar (ASHIFTRT, builder.mode ());
+ emit_vlmax_insn (icode, BINARY_OP, shift_ops);
+
+ /* Step 4: Multiply to step size n.  */
+ HOST_WIDE_INT step_size =
+   INTVAL (builder.elt (builder.npatterns ()))
+   - INTVAL (builder.elt (0));
+ rtx tmp3 = gen_reg_rtx (builder.mode ());
+ if (pow2p_hwi (step_size))
+   {
+ /* Power of 2 can be handled with a left shift.  */
+ HOST_WIDE_INT shift = exact_log2 (step_size);
+ rtx shift_amount = gen_int_mode (shift, Pmode);
+ insn_code icode = code_for_pred_scalar (ASHIFT, mode);
+ rtx ops[] = {tmp3, tmp2, shift_amount};
+ emit_vlmax_insn (icode, BINARY_OP, ops);
+   }
+ else
+   {
+ rtx mult_amt = gen_int_mode (step_size, builder.inner_mode ());
+ insn_code icode = code_for_pred_scalar (MULT, builder.mode ());
+ rtx ops[] = {tmp3, tmp2, mult_amt};
+ emit_vlmax_insn (icode, BINARY_OP, ops);
+   }
+
+ /* Step 5: Add starting value to all elements.  */
  HOST_WIDE_INT init_val = INTVAL (builder.elt (0));
  if (init_val == 0)
-   emit_move_insn (target, tmp);
+   emit_move_insn (target, tmp3);
  else
{
  rtx dup = gen_const_vector_dup (builder.mode (), init_val);
- rtx add_ops[] = {target, tmp, dup};
+ rtx add_ops[] = {target, tmp3, dup};
  icode = code_for_pred (PLUS, builder.mode ());
  emit_vlmax_insn (icode, BINARY_OP, add_ops);
}
--
2.34.1

[PATCH v2 0/9] RISC-V: Improve const vector costing and expansion

2024-08-26 Thread Patrick O'Neill

Constant vectors are currently spilled/loaded from memory often. This series
increases the number of costed patterns via a catch-all pattern and fixes a
variety of bugs I found along the way.

v2 Changelog:
* Landed patch 1/9 from v1 of this patchset.
* Add new valid_vec_immediate_p helper
* Reorder "Handle 0.0 floating point pattern ..." to use riscv-v.h.
* Fix build failure on patches 6-8 that was previously fixed by patch 9.
* Append RFC to series.

Patrick O'Neill (9):
  RISC-V: Fix vid const vector expander for non-npatterns size steps
  RISC-V: Reorder insn cost match order to match corresponding expander
match order
  RISC-V: Handle case when constant vector construction target rtx is
not a register
  RISC-V: Emit costs for bool and stepped const vectors
  RISC-V: Handle 0.0 floating point pattern costing to match
const_vector expander
  RISC-V: Allow non-duplicate bool patterns in expand_const_vector
  RISC-V: Move helper functions above expand_const_vector
  RISC-V: Add vslide1up/down pattern to expand_const_vector
  RISC-V: Add cost model asserts

 gcc/config/riscv/riscv-v.cc   | 389 ++
 gcc/config/riscv/riscv-v.h| 158 +++
 gcc/config/riscv/riscv.cc | 209 +-
 .../riscv/rvv/autovec/materialize-1.c |  13 +
 .../riscv/rvv/autovec/materialize-2.c |  13 +
 .../riscv/rvv/autovec/materialize-3.c |  13 +
 .../riscv/rvv/autovec/materialize-4.c |  13 +
 .../riscv/rvv/autovec/materialize-5.c |  13 +
 .../riscv/rvv/autovec/materialize-6.c |  13 +
 9 files changed, 651 insertions(+), 183 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-v.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-6.c

--
2.34.1

[PATCH v2 4/9] RISC-V: Emit costs for bool and stepped const vectors

2024-08-26 Thread Patrick O'Neill

These cases are handled in the expander
(riscv-v.cc:expand_const_vector). We need the vector builder to detect
these cases so extract that out into a new riscv-v.h header file.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (class rvv_builder): Move to riscv-v.h.
* config/riscv/riscv.cc (riscv_const_insns): Emit placeholder costs for
bool/stepped const vectors.
* config/riscv/riscv-v.h: New file.

Signed-off-by: Patrick O'Neill 
---
Ack'd here: 
https://inbox.sourceware.org/gcc-patches/cd634c30-caf0-4375-a623-d9cd86498...@gmail.com/
---
 gcc/config/riscv/riscv-v.cc | 53 +-
 gcc/config/riscv/riscv-v.h  | 88 +
 gcc/config/riscv/riscv.cc   | 42 ++
 3 files changed, 131 insertions(+), 52 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-v.h

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index aea4b9b872b..897b31c069e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -51,6 +51,7 @@
 #include "targhooks.h"
 #include "predict.h"
 #include "errors.h"
+#include "riscv-v.h"

 using namespace riscv_vector;

@@ -436,58 +437,6 @@ emit_nonvlmax_insn (unsigned icode, unsigned insn_flags, rtx *ops, rtx vl)
   e.emit_insn ((enum insn_code) icode, ops);
 }

-class rvv_builder : public rtx_vector_builder
-{
-public:
-  rvv_builder () : rtx_vector_builder () {}
-  rvv_builder (machine_mode mode, unsigned int npatterns,
-  unsigned int nelts_per_pattern)
-: rtx_vector_builder (mode, npatterns, nelts_per_pattern)
-  {
-m_inner_mode = GET_MODE_INNER (mode);
-m_inner_bits_size = GET_MODE_BITSIZE (m_inner_mode);
-m_inner_bytes_size = GET_MODE_SIZE (m_inner_mode);
-m_mask_mode = get_mask_mode (mode);
-
-gcc_assert (
-  int_mode_for_size (inner_bits_size (), 0).exists (&m_inner_int_mode));
-m_int_mode
-  = get_vector_mode (m_inner_int_mode, GET_MODE_NUNITS (mode)).require ();
-  }
-
-  bool can_duplicate_repeating_sequence_p ();
-  bool is_repeating_sequence ();
-  rtx get_merged_repeating_sequence ();
-
-  bool repeating_sequence_use_merge_profitable_p ();
-  bool combine_sequence_use_slideup_profitable_p ();
-  bool combine_sequence_use_merge_profitable_p ();
-  rtx get_merge_scalar_mask (unsigned int, machine_mode) const;
-
-  bool single_step_npatterns_p () const;
-  bool npatterns_all_equal_p () const;
-  bool interleaved_stepped_npatterns_p () const;
-  bool npatterns_vid_diff_repeated_p () const;
-
-  machine_mode new_mode () const { return m_new_mode; }
-  scalar_mode inner_mode () const { return m_inner_mode; }
-  scalar_int_mode inner_int_mode () const { return m_inner_int_mode; }
-  machine_mode mask_mode () const { return m_mask_mode; }
-  machine_mode int_mode () const { return m_int_mode; }
-  unsigned int inner_bits_size () const { return m_inner_bits_size; }
-  unsigned int inner_bytes_size () const { return m_inner_bytes_size; }
-
-private:
-  scalar_mode m_inner_mode;
-  scalar_int_mode m_inner_int_mode;
-  machine_mode m_new_mode;
-  scalar_int_mode m_new_inner_mode;
-  machine_mode m_mask_mode;
-  machine_mode m_int_mode;
-  unsigned int m_inner_bits_size;
-  unsigned int m_inner_bytes_size;
-};
-
 /* Return true if the vector duplicated by a super element which is the fusion
of consecutive elements.

diff --git a/gcc/config/riscv/riscv-v.h b/gcc/config/riscv/riscv-v.h
new file mode 100644
index 000..4635b5415c7
--- /dev/null
+++ b/gcc/config/riscv/riscv-v.h
@@ -0,0 +1,88 @@
+/* Subroutines used for code generation for RISC-V 'V' Extension for
+   GNU compiler.
+   Copyright (C) 2022-2024 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#ifndef GCC_RISCV_V_H
+#define GCC_RISCV_V_H
+
+#include "rtx-vector-builder.h"
+
+using namespace riscv_vector;
+
+namespace riscv_vector {
+
+extern machine_mode get_mask_mode (machine_mode);
+extern opt_machine_mode get_vector_mode (scalar_mode, poly_uint64);
+
+class rvv_builder : public rtx_vector_builder
+{
+public:
+  rvv_builder () : rtx_vector_builder () {}
+  rvv_builder (machine_mode mode, unsigned int npatterns,
+  unsigned int nelts_per_pattern)
+: rtx_vector_builder (mode, npatterns, nelts_per_pattern)
+  {
+

[PATCH v2 6/9] RISC-V: Allow non-duplicate bool patterns in expand_const_vector

2024-08-26 Thread Patrick O'Neill

Currently we assert when encountering a non-duplicate boolean vector.
This patch allows non-duplicate vectors to fall through to the
gcc_unreachable and assert there.

This will be useful when adding a catch-all pattern to emit costs and
handle arbitrary vectors.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Allow non-duplicate
to fall through other patterns before asserting.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv-v.cc | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 32349677dc2..cb2380ad664 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1109,26 +1109,19 @@ expand_const_vector (rtx target, rtx src)
 {
   machine_mode mode = GET_MODE (target);
   rtx result = register_operand (target, mode) ? target : gen_reg_rtx (mode);
-  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
-{
-  rtx elt;
-  gcc_assert (
-   const_vec_duplicate_p (src, &elt)
-   && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-  rtx ops[] = {result, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), UNARY_MASK_OP, ops);
-
-  if (result != target)
-   emit_move_insn (target, result);
-  return;
-}
-
   rtx elt;
   if (const_vec_duplicate_p (src, &elt))
 {
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+   {
+ gcc_assert (rtx_equal_p (elt, const0_rtx)
+ || rtx_equal_p (elt, const1_rtx));
+ rtx ops[] = {result, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), UNARY_MASK_OP, ops);
+   }
   /* Element in range -16 ~ 15 integer or 0.0 floating-point,
 we use vmv.v.i instruction.  */
-  if (valid_vec_immediate_p (src))
+  else if (valid_vec_immediate_p (src))
{
  rtx ops[] = {result, src};
  emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);
--
2.34.1

[PATCH v2 8/9] RISC-V: Add vslide1up/down pattern to expand_const_vector

2024-08-26 Thread Patrick O'Neill

Also explicitly disallow CONST_VECTOR_DUPLICATE_P for now.
CONST_VECTOR_DUPLICATE_P was previously disallowed implicitly.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Update comment.
(expand_vector_init_insert_elems): Ditto.
(expand_const_vector): Add catch-all pattern.
* config/riscv/riscv.cc (riscv_const_insns): Add costing for catch-all
pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/materialize-1.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-2.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-3.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-4.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-5.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-6.c: New test.

Signed-off-by: Patrick O'Neill 
---
This causes 4 new regressions on glibc rv64gcv:
Appears to be spilling due to the increased register pressure from
materializing constants for vslide1down:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c scan-assembler-not jr
FAIL: gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c scan-assembler-not sp

Caused by vle32/64 being replaced with splat & vslide1down:
FAIL: gcc.target/riscv/rvv/autovec/vls/init-5.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times vle32\\.v 7
FAIL: gcc.target/riscv/rvv/autovec/vls/init-7.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times vle64\\.v 7

I'm not sure if it's profitable to replace a lmul8 load with 127 vslide1down.vx
ops but we're being honest with the middle end when returning the # of insns
we'll be emitting when costing...
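
To show where the instruction count comes from, here is a simplified scalar
model of building an arbitrary vector with a splat followed by slide1down
operations. It is a sketch under assumptions: it broadcasts the first element
and then slides in every remaining one, and it does not model the
duplicate-skipping or trailing-same-element shortcuts in riscv-v.cc. All names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Models vslide1down.vx: every lane moves down by one and X enters at
   the top lane.  */
static void slide1down(int *v, size_t n, int x)
{
  for (size_t i = 0; i + 1 < n; i++)
    v[i] = v[i + 1];
  v[n - 1] = x;
}

/* Build V from ELTS with one broadcast plus slides.  Returns the number
   of slide operations used, which is n - 1 in this simplified model.  */
size_t build_by_slide1down(int *v, const int *elts, size_t n)
{
  assert(n > 0);
  size_t slides = 0;

  for (size_t i = 0; i < n; i++)   /* broadcast the first element */
    v[i] = elts[0];

  for (size_t i = 1; i < n; i++) { /* slide in the remaining elements */
    slide1down(v, n, elts[i]);
    slides++;
  }
  return slides;
}
```

For a 128-element vector this model needs 127 slides, which matches the count
mentioned above.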
---
 gcc/config/riscv/riscv-v.cc   |  24 +++-
 gcc/config/riscv/riscv.cc | 108 --
 .../riscv/rvv/autovec/materialize-1.c |  13 +++
 .../riscv/rvv/autovec/materialize-2.c |  13 +++
 .../riscv/rvv/autovec/materialize-3.c |  13 +++
 .../riscv/rvv/autovec/materialize-4.c |  13 +++
 .../riscv/rvv/autovec/materialize-5.c |  13 +++
 .../riscv/rvv/autovec/materialize-6.c |  13 +++
 8 files changed, 199 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/materialize-6.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9b6c3a21e2d..a31766f3662 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1104,7 +1104,7 @@ expand_vec_series (rtx dest, rtx base, rtx step, rtx vid)
 emit_move_insn (dest, result);
 }

-/* Subroutine of riscv_vector_expand_vector_init.
+/* Subroutine of riscv_vector_expand_vector_init and expand_const_vector.
Works as follows:
(a) Initialize TARGET by broadcasting element NELTS_REQD - 1 of BUILDER.
(b) Skip leading elements from BUILDER, which are the same as
@@ -1129,7 +1129,7 @@ expand_vector_init_insert_elems (rtx target, const rvv_builder &builder,
 }
 }

-/* Subroutine of expand_vec_init to handle case
+/* Subroutine of expand_vec_init and expand_const_vector to handle case
when all trailing elements of builder are same.
This works as follows:
(a) Use expand_insn interface to broadcast last vector element in TARGET.
@@ -1248,6 +1248,8 @@ expand_const_vector (rtx target, rtx src)
 }
   builder.finalize ();

+  bool emit_catch_all_pattern = false;
+
   if (CONST_VECTOR_DUPLICATE_P (src))
 {
   /* Handle the case with repeating sequence that NELTS_PER_PATTERN = 1
@@ -1555,10 +1557,24 @@ expand_const_vector (rtx target, rtx src)
}
   else
/* TODO: We will enable more variable-length vector in the future.  */
-   gcc_unreachable ();
+   emit_catch_all_pattern = true;
 }
   else
-gcc_unreachable ();
+emit_catch_all_pattern = true;
+
+  if (emit_catch_all_pattern)
+{
+  int nelts = XVECLEN (src, 0);
+
+  /* Optimize trailing same elements sequence:
+  v = {y, y2, y3, y4, y5, x, x, x, x, x, x, x, x, x, x, x};  */
+  if (!expand_vector_init_trailing_same_elem (result, builder, nelts))
+   /* Handle common situation with vslide1down.  This function can handle
+  any case of vec_init.  Only the cases that are not optimized
+  above will fall through here.  This prevents us from dumping
+  to/reading from the stack to initialize vectors.  */
+   expand_vector_init_insert_elems (result, builder, nelts);
+}

   if (result != target)
 emit_move_insn (target, result);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/

[PATCH v2 3/9] RISC-V: Handle case when constant vector construction target rtx is not a register

2024-08-26 Thread Patrick O'Neill

This manifests in RTL that is optimized away which causes runtime failures
in the testsuite. Update all patterns to use a temp result register if required.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Use tmp register if
needed.

Signed-off-by: Patrick O'Neill 
---
 gcc/config/riscv/riscv-v.cc | 73 +
 1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a3039a2cb19..aea4b9b872b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1150,26 +1150,29 @@ static void
 expand_const_vector (rtx target, rtx src)
 {
   machine_mode mode = GET_MODE (target);
+  rtx result = register_operand (target, mode) ? target : gen_reg_rtx (mode);
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
 {
   rtx elt;
   gcc_assert (
const_vec_duplicate_p (src, &elt)
&& (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-  rtx ops[] = {target, src};
+  rtx ops[] = {result, src};
   emit_vlmax_insn (code_for_pred_mov (mode), UNARY_MASK_OP, ops);
+
+  if (result != target)
+   emit_move_insn (target, result);
   return;
 }

   rtx elt;
   if (const_vec_duplicate_p (src, &elt))
 {
-  rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx (mode);
   /* Element in range -16 ~ 15 integer or 0.0 floating-point,
 we use vmv.v.i instruction.  */
   if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
{
- rtx ops[] = {tmp, src};
+ rtx ops[] = {result, src};
  emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);
}
   else
@@ -1186,7 +1189,7 @@ expand_const_vector (rtx target, rtx src)
 instruction (vsetvl a5, zero).  */
  if (lra_in_progress)
{
- rtx ops[] = {tmp, elt};
+ rtx ops[] = {result, elt};
  emit_vlmax_insn (code_for_pred_broadcast (mode), UNARY_OP, ops);
}
  else
@@ -1194,15 +1197,15 @@ expand_const_vector (rtx target, rtx src)
  struct expand_operand ops[2];
  enum insn_code icode = optab_handler (vec_duplicate_optab, mode);
  gcc_assert (icode != CODE_FOR_nothing);
- create_output_operand (&ops[0], tmp, mode);
+ create_output_operand (&ops[0], result, mode);
  create_input_operand (&ops[1], elt, GET_MODE_INNER (mode));
  expand_insn (icode, 2, ops);
- tmp = ops[0].value;
+ result = ops[0].value;
}
}

-  if (tmp != target)
-   emit_move_insn (target, tmp);
+  if (result != target)
+   emit_move_insn (target, result);
   return;
 }

@@ -1210,7 +1213,10 @@ expand_const_vector (rtx target, rtx src)
   rtx base, step;
   if (const_vec_series_p (src, &base, &step))
 {
-  expand_vec_series (target, base, step);
+  expand_vec_series (result, base, step);
+
+  if (result != target)
+   emit_move_insn (target, result);
   return;
 }

@@ -1243,7 +1249,7 @@ expand_const_vector (rtx target, rtx src)
   all element equal to 0x0706050403020100.  */
  rtx ele = builder.get_merged_repeating_sequence ();
  rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
- emit_move_insn (target, gen_lowpart (mode, dup));
+ emit_move_insn (result, gen_lowpart (mode, dup));
}
   else
{
@@ -1272,8 +1278,8 @@ expand_const_vector (rtx target, rtx src)
  emit_vlmax_insn (code_for_pred_scalar (AND, builder.int_mode ()),
BINARY_OP, and_ops);

- rtx tmp = gen_reg_rtx (builder.mode ());
- rtx dup_ops[] = {tmp, builder.elt (0)};
+ rtx tmp1 = gen_reg_rtx (builder.mode ());
+ rtx dup_ops[] = {tmp1, builder.elt (0)};
  emit_vlmax_insn (code_for_pred_broadcast (builder.mode ()), UNARY_OP,
dup_ops);
  for (unsigned int i = 1; i < builder.npatterns (); i++)
@@ -1285,12 +1291,12 @@ expand_const_vector (rtx target, rtx src)

  /* Merge scalar to each i.  */
  rtx tmp2 = gen_reg_rtx (builder.mode ());
- rtx merge_ops[] = {tmp2, tmp, builder.elt (i), mask};
+ rtx merge_ops[] = {tmp2, tmp1, builder.elt (i), mask};
  insn_code icode = code_for_pred_merge_scalar (builder.mode ());
  emit_vlmax_insn (icode, MERGE_OP, merge_ops);
- tmp = tmp2;
+ tmp1 = tmp2;
}
- emit_move_insn (target, tmp);
+ emit_move_insn (result, tmp1);
}
 }
   else if (CONST_VECTOR_STEPPED_P (src))
@@ -1362,11 +1368,11 @@ expand_const_vector (rtx target, rtx src)
  /* Step 5: Add starting value to all elements.  */
  HOST_WIDE_INT init_val = INTVAL (builder.elt (0));

[PATCH v2 2/9] RISC-V: Reorder insn cost match order to match corresponding expander match order

2024-08-26 Thread Patrick O'Neill

The corresponding expander (riscv-v.cc:expand_const_vector) matches
const_vec_duplicate_p before const_vec_series_p. Reorder to match this
behavior when calculating costs.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Relocate.

Signed-off-by: Patrick O'Neill 
---
Ack'd here: 
https://inbox.sourceware.org/gcc-patches/3a97eb17-32fe-4cf4-874e-5c4a707b2...@gmail.com/
---
 gcc/config/riscv/riscv.cc | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8538d405f50..640394e0cb8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2142,15 +2142,6 @@ riscv_const_insns (rtx x, bool allow_new_pseudos)
  ...etc.  */
if (riscv_v_ext_mode_p (GET_MODE (x)))
  {
-   /* const series vector.  */
-   rtx base, step;
-   if (const_vec_series_p (x, &base, &step))
- {
-   /* This is not accurate, we will need to adapt the COST
-* accurately according to BASE && STEP.  */
-   return 1;
- }
-
rtx elt;
if (const_vec_duplicate_p (x, &elt))
  {
@@ -2186,6 +2177,15 @@ riscv_const_insns (rtx x, bool allow_new_pseudos)
  return 1 + 4; /*vmv.v.x + memory access.  */
  }
  }
+
+   /* const series vector.  */
+   rtx base, step;
+   if (const_vec_series_p (x, &base, &step))
+ {
+   /* This cost is not accurate, we will need to adapt the COST
+  accurately according to BASE && STEP.  */
+   return 1;
+ }
  }

/* TODO: We may support more const vector in the future.  */
--
2.34.1

[PATCH v2 9/9] RISC-V: Add cost model asserts

2024-08-26 Thread Patrick O'Neill

This patch adds some advanced checking to assert that the emitted costs match
emitted patterns for const_vecs.

Flow:
Costing: Insert into hashmap>
Expand: Check for membership in hashmap
 -> Not in hashmap: ignore, this wasn't costed
 -> In hashmap: Iterate over vec
-> if RTX not in hashmap: Ignore, this wasn't costed (hash collision)
-> if RTX in hashmap: Assert enum is expected

There are no false positive asserts with this flow.
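
The flow above can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: the `Pattern` enum, the string standing in for an rtx, and the names `record_costed` and `lookup_costed` are all hypothetical and are not the helpers added by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the enum describing which pattern was costed.
enum class Pattern { Duplicate, Series, SingleStep, Interleaved };

// A string stands in for an rtx.  Each hash maps to a bucket of
// (key, pattern) pairs so that colliding keys can share one hash entry.
using Bucket = std::vector<std::pair<std::string, Pattern>>;
static std::unordered_map<std::size_t, Bucket> costed;

// Costing side: remember which pattern the cost model assumed for KEY.
void record_costed(const std::string &key, Pattern p) {
  costed[std::hash<std::string>{}(key)].emplace_back(key, p);
}

// Expand side: return true and set OUT if KEY was costed earlier.
// A missing hash entry, or a bucket that only holds other keys (a hash
// collision), both mean "not costed", so no assert should fire.
bool lookup_costed(const std::string &key, Pattern &out) {
  auto it = costed.find(std::hash<std::string>{}(key));
  if (it == costed.end())
    return false;
  for (const auto &entry : it->second)
    if (entry.first == key) {
      out = entry.second;
      return true;
    }
  return false;
}
```

The expander would then assert on the recorded pattern only when `lookup_costed` returns true, which is why an uncosted constant can never trip the check.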

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Add RTL_CHECKING gated
asserts.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.
* config/riscv/riscv-v.h (insert_expected_pattern): Add helper function
to insert hash collisions into hash map vec key.
(get_expected_costed_type): Add helper function to get the expected
cost type for a given rtx pattern.

Signed-off-by: Patrick O'Neill 
---
Was rfc: 
https://inbox.sourceware.org/gcc-patches/054f4f37-9615-4e01-940e-0cf4d188f...@gmail.com/T/#t

While I think it's extremely valuable I'd be open to dropping it if there's
strong opposition to it. I'm not sure how often people run with checking enabled
but this seems likely to bitrot if the answer is not often.
Maybe a sign to set up some weekly rtl-checking postcommit runs?

With this patch (without the ifdefs) the riscv rv64gcv testsuite on a 32 core 
machine took:
36689.15s  user 7398.57s system 2751% cpu 26:42.05 total
max memory:844 MB

Without this patch:
35510.99s  user 7157.93s system 2772% cpu 25:39.21 total
max memory:844 MB

The hash map is never explicitly freed by GCC.
---
 gcc/config/riscv/riscv-v.cc | 47 +
 gcc/config/riscv/riscv-v.h  | 68 +
 gcc/config/riscv/riscv.cc   | 45 ++--
 3 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a31766f3662..3236ff728a6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1173,6 +1173,12 @@ expand_vector_init_trailing_same_elem (rtx target,
 static void
 expand_const_vector (rtx target, rtx src)
 {
+#ifdef ENABLE_RTL_CHECKING
+  riscv_const_expect_p* expected_pattern = NULL;
+  if (EXPECTED_CONST_PATTERN)
+expected_pattern = get_expected_costed_type (EXPECTED_CONST_PATTERN, src);
+#endif
+
   machine_mode mode = GET_MODE (target);
   rtx result = register_operand (target, mode) ? target : gen_reg_rtx (mode);
   rtx elt;
@@ -1180,6 +1186,10 @@ expand_const_vector (rtx target, rtx src)
 {
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
{
+#ifdef ENABLE_RTL_CHECKING
+ if (expected_pattern)
+   gcc_assert (*expected_pattern == RVV_DUPLICATE_BOOL);
+#endif
  gcc_assert (rtx_equal_p (elt, const0_rtx)
  || rtx_equal_p (elt, const1_rtx));
  rtx ops[] = {result, src};
@@ -1189,11 +1199,20 @@ expand_const_vector (rtx target, rtx src)
 we use vmv.v.i instruction.  */
   else if (valid_vec_immediate_p (src))
{
+#ifdef ENABLE_RTL_CHECKING
+ if (expected_pattern)
+   gcc_assert (*expected_pattern == RVV_DUPLICATE_VMV_VI);
+#endif
  rtx ops[] = {result, src};
  emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);
}
   else
{
+#ifdef ENABLE_RTL_CHECKING
+ if (expected_pattern)
+   gcc_assert (*expected_pattern == RVV_DUPLICATE_INT_FP);
+#endif
+
  /* Emit vec_duplicate split pattern before RA so that
 we could have a better optimization opportunity in LICM
 which will hoist vmv.v.x outside the loop and in fwprop && combine
@@ -1230,6 +1249,10 @@ expand_const_vector (rtx target, rtx src)
   rtx base, step;
   if (const_vec_series_p (src, &base, &step))
 {
+#ifdef ENABLE_RTL_CHECKING
+  if (expected_pattern)
+   gcc_assert (*expected_pattern == RVV_SERIES);
+#endif
   expand_vec_series (result, base, step);

   if (result != target)
@@ -1323,6 +1346,10 @@ expand_const_vector (rtx target, rtx src)
   gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
   if (builder.single_step_npatterns_p ())
{
+#ifdef ENABLE_RTL_CHECKING
+ if (expected_pattern)
+   gcc_assert (*expected_pattern == RVV_PATTERN_SINGLE_STEP);
+#endif
  /* Describe the case by choosing NPATTERNS = 4 as an example.  */
  insn_code icode;

@@ -1462,6 +1489,10 @@ expand_const_vector (rtx target, rtx src)
}
   else if (builder.interleaved_stepped_npatterns_p ())
{
+#ifdef ENABLE_RTL_CHECKING
+ if (expected_pattern)
+   gcc_assert (*expected_pattern == RVV_PATTERN_INTERLEAVED);
+#endif
  rtx base1 = builder.elt (0);
  rtx base2 = builder.elt (1);
  poly_int64 step1
@@ -1564,6 +1595,13 @@ expand_const_vector (rtx target, rtx src)

   if (emit_catch_all_pattern)
 {
+#ifdef E

[PATCH v2 5/9] RISC-V: Handle 0.0 floating point pattern costing to match const_vector expander

2024-08-26 Thread Patrick O'Neill

The comment previously here stated that the Wc0/Wc1 cases are handled by
the vi constraint but that is not true for the 0.0 Wc0 case.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Handle 0.0 floating-point
case.

Signed-off-by: Patrick O'Neill 
---
Ack'd here: 
https://inbox.sourceware.org/gcc-patches/d3mqflkxz4cq.3gakryqyo...@gmail.com/
---
 gcc/config/riscv/riscv-v.cc | 11 ++-
 gcc/config/riscv/riscv-v.h  |  2 ++
 gcc/config/riscv/riscv.cc   |  8 +++-
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 897b31c069e..32349677dc2 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -794,6 +794,15 @@ const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
   return true;
 }

+/* Returns true if the vector's elements are all duplicates in
+   range -16 ~ 15 integer or 0.0 floating-point.  */
+
+bool
+valid_vec_immediate_p (rtx x)
+{
+  return (satisfies_constraint_vi (x) || satisfies_constraint_Wc0 (x));
+}
+
 /* Return a const vector of VAL. The VAL can be either const_int or
const_poly_int.  */

@@ -1119,7 +1128,7 @@ expand_const_vector (rtx target, rtx src)
 {
   /* Element in range -16 ~ 15 integer or 0.0 floating-point,
 we use vmv.v.i instruction.  */
-  if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
+  if (valid_vec_immediate_p (src))
{
  rtx ops[] = {result, src};
  emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);
diff --git a/gcc/config/riscv/riscv-v.h b/gcc/config/riscv/riscv-v.h
index 4635b5415c7..e7b095f094e 100644
--- a/gcc/config/riscv/riscv-v.h
+++ b/gcc/config/riscv/riscv-v.h
@@ -83,6 +83,8 @@ private:
   unsigned int m_inner_bytes_size;
 };

+extern bool valid_vec_immediate_p(rtx);
+
 } // namespace riscv_vector

 #endif // GCC_RISCV_V_H
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e2718c9eb6e..400b1059666 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2158,11 +2158,9 @@ riscv_const_insns (rtx x, bool allow_new_pseudos)
if (maybe_gt (GET_MODE_SIZE (smode), UNITS_PER_WORD)
&& !immediate_operand (elt, Pmode))
  return 0;
-   /* Constants from -16 to 15 can be loaded with vmv.v.i.
-  The Wc0, Wc1 constraints are already covered by the
-  vi constraint so we do not need to check them here
-  separately.  */
-   if (satisfies_constraint_vi (x))
+   /* Constants in range -16 ~ 15 integer or 0.0 floating-point
+  can be emitted using vmv.v.i.  */
+   if (valid_vec_immediate_p (x))
  return 1;

/* Any int/FP constants can always be broadcast from a
--
2.34.1

[PATCH v2 7/9] RISC-V: Move helper functions above expand_const_vector

2024-08-26 Thread Patrick O'Neill

These subroutines will be used in expand_const_vector in a future patch.
Relocate so expand_const_vector can use them.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vector_init_insert_elems): Relocate.
(expand_vector_init_trailing_same_elem): Ditto.

Signed-off-by: Patrick O'Neill 
---
Ack'd here: 
https://inbox.sourceware.org/gcc-patches/0a08cbce-1568-4197-8df3-33966e440...@gmail.com/
---
 gcc/config/riscv/riscv-v.cc | 132 ++--
 1 file changed, 66 insertions(+), 66 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index cb2380ad664..9b6c3a21e2d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1104,6 +1104,72 @@ expand_vec_series (rtx dest, rtx base, rtx step, rtx vid)
 emit_move_insn (dest, result);
 }

+/* Subroutine of riscv_vector_expand_vector_init.
+   Works as follows:
+   (a) Initialize TARGET by broadcasting element NELTS_REQD - 1 of BUILDER.
+   (b) Skip leading elements from BUILDER, which are the same as
+   element NELTS_REQD - 1.
+   (c) Insert earlier elements in reverse order in TARGET using vslide1down.  */
+
+static void
+expand_vector_init_insert_elems (rtx target, const rvv_builder &builder,
+int nelts_reqd)
+{
+  machine_mode mode = GET_MODE (target);
+  rtx dup = expand_vector_broadcast (mode, builder.elt (0));
+  emit_move_insn (target, dup);
+  int ndups = builder.count_dups (0, nelts_reqd - 1, 1);
+  for (int i = ndups; i < nelts_reqd; i++)
+{
+  unsigned int unspec
+   = FLOAT_MODE_P (mode) ? UNSPEC_VFSLIDE1DOWN : UNSPEC_VSLIDE1DOWN;
+  insn_code icode = code_for_pred_slide (unspec, mode);
+  rtx ops[] = {target, target, builder.elt (i)};
+  emit_vlmax_insn (icode, BINARY_OP, ops);
+}
+}
+
+/* Subroutine of expand_vec_init to handle case
+   when all trailing elements of builder are same.
+   This works as follows:
+   (a) Use expand_insn interface to broadcast last vector element in TARGET.
+   (b) Insert remaining elements in TARGET using insr.
+
+   ??? The heuristic used is to do above if number of same trailing elements
+   is greater than leading_ndups, loosely based on
+   heuristic from mostly_zeros_p.  May need fine-tuning.  */
+
+static bool
+expand_vector_init_trailing_same_elem (rtx target,
+  const rtx_vector_builder &builder,
+  int nelts_reqd)
+{
+  int leading_ndups = builder.count_dups (0, nelts_reqd - 1, 1);
+  int trailing_ndups = builder.count_dups (nelts_reqd - 1, -1, -1);
+  machine_mode mode = GET_MODE (target);
+
+  if (trailing_ndups > leading_ndups)
+{
+  rtx dup = expand_vector_broadcast (mode, builder.elt (nelts_reqd - 1));
+  for (int i = nelts_reqd - trailing_ndups - 1; i >= 0; i--)
+   {
+ unsigned int unspec
+   = FLOAT_MODE_P (mode) ? UNSPEC_VFSLIDE1UP : UNSPEC_VSLIDE1UP;
+ insn_code icode = code_for_pred_slide (unspec, mode);
+ rtx tmp = gen_reg_rtx (mode);
+ rtx ops[] = {tmp, dup, builder.elt (i)};
+ emit_vlmax_insn (icode, BINARY_OP, ops);
+ /* slide1up need source and dest to be different REG.  */
+ dup = tmp;
+   }
+
+  emit_move_insn (target, dup);
+  return true;
+}
+
+  return false;
+}
+
 static void
 expand_const_vector (rtx target, rtx src)
 {
@@ -2338,31 +2404,6 @@ preferred_simd_mode (scalar_mode mode)
   return word_mode;
 }

-/* Subroutine of riscv_vector_expand_vector_init.
-   Works as follows:
-   (a) Initialize TARGET by broadcasting element NELTS_REQD - 1 of BUILDER.
-   (b) Skip leading elements from BUILDER, which are the same as
-   element NELTS_REQD - 1.
-   (c) Insert earlier elements in reverse order in TARGET using vslide1down.  */
-
-static void
-expand_vector_init_insert_elems (rtx target, const rvv_builder &builder,
-int nelts_reqd)
-{
-  machine_mode mode = GET_MODE (target);
-  rtx dup = expand_vector_broadcast (mode, builder.elt (0));
-  emit_move_insn (target, dup);
-  int ndups = builder.count_dups (0, nelts_reqd - 1, 1);
-  for (int i = ndups; i < nelts_reqd; i++)
-{
-  unsigned int unspec
-   = FLOAT_MODE_P (mode) ? UNSPEC_VFSLIDE1DOWN : UNSPEC_VSLIDE1DOWN;
-  insn_code icode = code_for_pred_slide (unspec, mode);
-  rtx ops[] = {target, target, builder.elt (i)};
-  emit_vlmax_insn (icode, BINARY_OP, ops);
-}
-}
-
 /* Use merge approach to initialize the vector with repeating sequence.
v = {a, b, a, b, a, b, a, b}.

@@ -2487,47 +2528,6 @@ expand_vector_init_merge_combine_sequence (rtx target,
   emit_vlmax_insn (icode, MERGE_OP, merge_ops);
 }

-/* Subroutine of expand_vec_init to handle case
-   when all trailing elements of builder are same.
-   This works as follows:
-   (a) Use expand_insn interface to broadcast last vector element in TARGET.
-   (b) Insert remaining elements in TARGET using

RE: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-26 Thread Li, Pan2

Thanks Richard for comments.

> I think you want to use nop_convert here, for sure a truncation or
> extension wouldn't be valid?

Oh, yes, should be nop_convert.

> I think you don't need :c on both the inner plus and the bit_xor here?

Sure. Could you please explain in more detail when I need to add :c?
For the inner plus/and/or etc., I sometimes get confused in similar scenarios.

> +   integer_zerop)
> +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)

> The comment above quotes 'MIN' but that's not present here - that is,
> the comment quotes a source form while we match what we see on
> GIMPLE?  I do expect the matching will be quite fragile when not
> being isolated.

Got it, will update the comments to gimple.

Pan

-Original Message-
From: Richard Biener  
Sent: Monday, August 26, 2024 9:40 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

On Mon, Aug 26, 2024 at 4:20 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form 1 of the scalar signed
> integer .SAT_ADD.  Aka below example:
>
> Form 1:
>   #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
>   T __attribute__((noinline))  \
>   sat_s_add_##T##_fmt_1 (T x, T y) \
>   {\
> T sum = (UT)x + (UT)y; \
> return (x ^ y) < 0 \
>   ? sum\
>   : (sum ^ x) >= 0 \
> ? sum  \
> : x < 0 ? MIN : MAX;   \
>   }
>
> DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
>
> We can tell the difference before and after this patch if backend
> implemented the ssadd3 pattern similar as below.
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t sum;
>8   │   long unsigned int x.0_1;
>9   │   long unsigned int y.1_2;
>   10   │   long unsigned int _3;
>   11   │   long int _4;
>   12   │   long int _5;
>   13   │   int64_t _6;
>   14   │   _Bool _11;
>   15   │   long int _12;
>   16   │   long int _13;
>   17   │   long int _14;
>   18   │   long int _16;
>   19   │   long int _17;
>   20   │
>   21   │ ;;   basic block 2, loop depth 0
>   22   │ ;;pred:   ENTRY
>   23   │   x.0_1 = (long unsigned int) x_7(D);
>   24   │   y.1_2 = (long unsigned int) y_8(D);
>   25   │   _3 = x.0_1 + y.1_2;
>   26   │   sum_9 = (int64_t) _3;
>   27   │   _4 = x_7(D) ^ y_8(D);
>   28   │   _5 = x_7(D) ^ sum_9;
>   29   │   _17 = ~_4;
>   30   │   _16 = _5 & _17;
>   31   │   if (_16 < 0)
>   32   │ goto ; [41.00%]
>   33   │   else
>   34   │ goto ; [59.00%]
>   35   │ ;;succ:   3
>   36   │ ;;4
>   37   │
>   38   │ ;;   basic block 3, loop depth 0
>   39   │ ;;pred:   2
>   40   │   _11 = x_7(D) < 0;
>   41   │   _12 = (long int) _11;
>   42   │   _13 = -_12;
>   43   │   _14 = _13 ^ 9223372036854775807;
>   44   │ ;;succ:   4
>   45   │
>   46   │ ;;   basic block 4, loop depth 0
>   47   │ ;;pred:   2
>   48   │ ;;3
>   49   │   # _6 = PHI 
>   50   │   return _6;
>   51   │ ;;succ:   EXIT
>   52   │
>   53   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t _4;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   12   │   return _4;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
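
For readers following along, the form 1 source and the "before" GIMPLE above can be compared with a small standalone sketch. The function names below are hypothetical, and the second function is only a hand transcription of the dump (basic blocks 2 to 4), not code from the patch. The unsigned-to-signed conversion is assumed to wrap, as it does with GCC.

```cpp
#include <cassert>
#include <cstdint>

// Form 1 from the patch description, written out for int64_t.
int64_t sat_s_add_fmt_1(int64_t x, int64_t y) {
  // Add in unsigned arithmetic so the wrap-around is well defined.
  int64_t sum = (int64_t)((uint64_t)x + (uint64_t)y);
  return (x ^ y) < 0          // different signs: cannot overflow
             ? sum
             : (sum ^ x) >= 0 // same sign as x: no overflow
                   ? sum
                   : x < 0 ? INT64_MIN : INT64_MAX;
}

// Hand transcription of the "before" GIMPLE dump.
int64_t sat_s_add_gimple_shape(int64_t x, int64_t y) {
  int64_t sum = (int64_t)((uint64_t)x + (uint64_t)y);
  // _16 = (x ^ sum) & ~(x ^ y): sign bit set only on signed overflow.
  int64_t overflow = (x ^ sum) & ~(x ^ y);
  if (overflow < 0)
    // _14 = -(long)(x < 0) ^ INT64_MAX: MIN for negative x, else MAX.
    return (-(int64_t)(x < 0)) ^ INT64_MAX;
  return sum;
}
```

Both functions should agree on every input; the pattern in match.pd is meant to recognize the second shape and replace it with a single .SAT_ADD call.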
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add the matching for signed .SAT_ADD.
> * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
> matching func decl.
> (match_unsigned_saturation_add): Try signed .SAT_ADD and rename
> to ...
> (match_saturation_add): ... here.
> (math_opts_dom_walker::after_dom_children): Update the above renamed
> func from caller.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 18 ++
>  gcc/tree-ssa-math-opts.cc | 35 ++-
>  2 files changed, 48 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 78f1957e8c7..b059e313415 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3192,6 +3192,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0
>
> +/* Signed

RE: [r15-3185 Regression] FAIL: gcc.target/i386/avx10_2-compare-1.c (test for excess errors) on Linux/x86_64

2024-08-26 Thread Jiang, Haochen

As with all the AVX10.2 patches, this is caused by the vector size warning
mentioned previously.

Thx,
Haochen

> -Original Message-
> From: haochen.jiang 
> Sent: Monday, August 26, 2024 11:54 PM
> To: jun.zh...@intel.com; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org; Jiang, Haochen 
> Subject: [r15-3185 Regression] FAIL: gcc.target/i386/avx10_2-compare-1.c
> (test for excess errors) on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 576bd309ded9dfe258023f26924c064a7bf12875 is the first bad commit
> commit 576bd309ded9dfe258023f26924c064a7bf12875
> Author: Zhang, Jun 
> Date:   Mon Aug 26 10:53:54 2024 +0800
> 
> AVX10.2: Support compare instructions
> 
> caused
> 
> FAIL: gcc.target/i386/avx10_2-compare-1.c (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-
> bisect/master/master/r15-3185/usr --enable-clocale=gnu --with-system-zlib -
> -with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --
> enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-compare-1.c --
> target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_2-compare-1.c --
> target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at
> haochen dot jiang at intel.com.) (If you met problems with cascadelake
> related, disabling AVX512F in command line might save that.) (However,
> please make sure that there is no potential problems with AVX512.)

[PATCH v2 2/2] [x86] Update ix86_mode_tieable_p and ix86_rtx_costs.

2024-08-26 Thread liuhongt

For mode2 bigger than 16 bytes, when it can be allocated to FIRST_SSE_REG,
it can only be allocated to ALL_SSE_REGS, and it can be tieable
with any smaller mode1 that is also valid in FIRST_SSE_REG.
When the mode size is equal to 16 bytes, exclude non-vector modes (TI/TFmode).
This is needed for CSE of all-ones/all-zeros: CSE checks costs with
ix86_modes_tieable_p for modes of different sizes.

Also update ix86_rtx_costs to prevent CONST0_RTX from being propagated, since
that would defeat CSE of CONST0_RTX.
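
As a rough illustration of the relaxed rule, here is a standalone sketch. The `ModeInfo` struct and `sse_tieable` are hypothetical simplifications: real machine modes carry much more information, and the real hook falls through to further checks (MMX and so on) that are not modeled here.

```cpp
#include <cassert>

// Hypothetical, simplified description of a machine mode.
struct ModeInfo {
  unsigned size_bytes; // GET_MODE_SIZE
  bool is_vector;      // VECTOR_MODE_P
  bool ok_in_sse;      // ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode)
};

// Models only the SSE part of the tieable check after this patch.
bool sse_tieable(ModeInfo mode1, ModeInfo mode2) {
  if (!mode2.ok_in_sse)
    return false; // the other register classes are not modeled
  if (mode2.size_bytes == 64 || mode2.size_bytes == 32)
    // Previously required equal sizes; now any smaller SSE mode ties.
    return mode1.size_bytes <= mode2.size_bytes && mode1.ok_in_sse;
  if (mode2.size_bytes == 16)
    // Non-vector 16-byte modes (TI/TF) still require an exact size match.
    return (mode2.is_vector ? mode1.size_bytes <= 16
                            : mode1.size_bytes == 16)
           && mode1.ok_in_sse;
  return false;
}
```

With this rule a 16-byte vector mode ties with a 64-byte one, which is what lets CSE reuse a wider all-zeros or all-ones register through a lowpart subreg.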

gcc/ChangeLog:

PR target/92080
* config/i386/i386.cc (ix86_modes_tieable_p): Relax
MODE_SIZE (mode1) to <= 64/32/16 bytes when it can be
allocated to FIRST_SSE_REG.
doesn't need to be exactly the same when >= 16.
(ix86_rtx_costs): Increase cost of const_double/const_vector
0/-1 a little to prevent propagation and enable more CSE.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr92080_vec_dup.c: New test.
* gcc.target/i386/pr92080_zero.c: New test.
---
 gcc/config/i386/i386.cc   | 14 +++--
 .../gcc.target/i386/pr92080_vec_dup.c | 48 +
 gcc/testsuite/gcc.target/i386/pr92080_zero.c  | 51 +++
 3 files changed, 108 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92080_vec_dup.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92080_zero.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 224a78cc832..72b9859e376 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20933,15 +20933,17 @@ ix86_modes_tieable_p (machine_mode mode1, machine_mode mode2)
  any other mode acceptable to SSE registers.  */
   if (GET_MODE_SIZE (mode2) == 64
   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
-return (GET_MODE_SIZE (mode1) == 64
+return (GET_MODE_SIZE (mode1) <= 64
&& ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
   if (GET_MODE_SIZE (mode2) == 32
   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
-return (GET_MODE_SIZE (mode1) == 32
+return (GET_MODE_SIZE (mode1) <= 32
&& ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
   if (GET_MODE_SIZE (mode2) == 16
   && ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode2))
-return (GET_MODE_SIZE (mode1) == 16
+return ((VECTOR_MODE_P (mode2)
+? GET_MODE_SIZE (mode1) <= 16
+: GET_MODE_SIZE (mode1) == 16)
&& ix86_hard_regno_mode_ok (FIRST_SSE_REG, mode1));
 
   /* If MODE2 is appropriate for an MMX register, then tie
@@ -21507,10 +21509,12 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
case 0:
  break;
case 1:  /* 0: xor eliminates false dependency */
- *total = 0;
+ /* Add extra cost 1 to prevent propagation of CONST_VECTOR
+for SET, which will enable more CSE optimization.  */
+ *total = 0 + (outer_code == SET);
  return true;
default: /* -1: cmp contains false dependency */
- *total = 1;
+ *total = 1 + (outer_code == SET);
  return true;
}
   /* FALLTHRU */
diff --git a/gcc/testsuite/gcc.target/i386/pr92080_vec_dup.c b/gcc/testsuite/gcc.target/i386/pr92080_vec_dup.c
new file mode 100644
index 000..67fdd15d69c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92080_vec_dup.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64-v4 -O2" } */
+/* { dg-final { scan-assembler-times "vpbroadcast\[bwd\]" 3 } } */
+
+typedef int v16si __attribute__((vector_size(64)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v4si __attribute__((vector_size(16)));
+
+typedef short v32hi __attribute__((vector_size(64)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef short v8hi __attribute__((vector_size(16)));
+
+typedef char v64qi __attribute__((vector_size(64)));
+typedef char v32qi __attribute__((vector_size(32)));
+typedef char v16qi __attribute__((vector_size(16)));
+
+v16si sinksz;
+v8si sinksy;
+v4si sinksx;
+v32hi sinkhz;
+v16hi sinkhy;
+v8hi sinkhx;
+v64qi sinkbz;
+v32qi sinkby;
+v16qi sinkbx;
+
+void foo(char c) {
+  sinksz = __extension__(v16si){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinksy = __extension__(v8si){c,c,c,c,c,c,c,c};
+  sinksx = __extension__(v4si){c,c,c,c};
+}
+
+void foo1(char c) {
+  sinkhz = __extension__(v32hi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinkhy = __extension__(v16hi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinkhx = __extension__(v8hi){c,c,c,c,c,c,c,c};
+}
+
+void foo2(char c) {
+  sinkbz = __extension__(v64qi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinkby = __extension__(v32qi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,
+c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+  sinkbx = __extension__(v16qi){c,c,c,c,c,c,c,c,c,c,c,c,c,c,c,c};
+}
diff --git a/g

[PATCH v2 1/2] Enhance cse_insn to handle all-zeros and all-ones for vector mode.

2024-08-26 Thread liuhongt

> You are possibly overwriting src_related_elt - I'd suggest to either break
> here or do the loop below for each found elt?
Changed.

> Do we know that will always succeed?
1) validate_subreg allows a subreg between two vector modes with the same
component mode.
2) gen_lowpart in cse.cc is defined as gen_lowpart_if_possible; if it fails,
it returns 0, and we just fall back to src_related = 0.

> So on the GIMPLE side we are trying to handle such cases by maintaining
> only a single element in the hashtables, thus hash and compare them
> the same - them in this case (vec_dup:M (reg:c)) and (vec_dup:N (reg:c)),
> leaving it up to the consumer to reject or pun mismatches.
rtx_cost will be used to decide whether the replacement is profitable
((subreg:M (reg:N) 0) vs (vec_dup:M (reg:c))). If M and N are
not tieable, rtx_cost will be high and the replacement will be rejected.
>
> For constants that would hold even more - note CSEing vs. duplicating
> constants might not be universally good.
Assuming you mean that (reg:c) in (vec_dup:M (reg:c)) comes from a constant, the
later RTL optimizers (i.e. forwprop/combine) will try to simplify the
constants further if rtx_cost says it is profitable.
For const_vector, this is handled by other code:

  /* Try to re-materialize a vec_dup with an existing constant.   */
  rtx src_elt;
  if ((!src_eqv_here || CONSTANT_P (src_eqv_here))
      && const_vec_duplicate_p (src, &src_elt))
    {


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

Also try to handle redundant broadcasts when there's already a
broadcast to a bigger mode with exactly the same component value.
For broadcast, component mode needs to be the same.
For all-zeros/ones, only need to check the bigger mode.
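
The search for a wider broadcast can be pictured with the following standalone sketch. It is not the cse.cc code: the set of (nunits, element) pairs stands in for the CSE hash table, `max_nunits` stands in for "no wider related vector mode exists", and `find_wider_broadcast` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Each entry records a broadcast already available in a register:
// (number of elements, name of the broadcast scalar).
using Seen = std::set<std::pair<unsigned, std::string>>;

// Return the element count of the first wider broadcast of ELT that is
// already available, doubling the width each step, or 0 if none exists.
unsigned find_wider_broadcast(const Seen &seen, unsigned nunits,
                              const std::string &elt, unsigned max_nunits) {
  for (unsigned n = nunits * 2; n <= max_nunits; n *= 2)
    if (seen.count(std::make_pair(n, elt)))
      return n; // the narrower value can be taken as a lowpart of this one
  return 0;
}
```

Only strictly wider modes are considered, and the scalar must match exactly, which mirrors the requirement that the component mode and value be the same for broadcasts.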

gcc/ChangeLog:

PR rtl-optimization/92080
* cse.cc (cse_insn): Handle all-ones/all-zeros, and vec_dup
with variables.
---
 gcc/cse.cc | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 65794ac5f2c..fab2f515f8c 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -4870,6 +4870,50 @@ cse_insn (rtx_insn *insn)
}
}
 
+  /* Try to handle special const_vector with elt 0 or -1.
+They can be represented with different modes, and can be cse.  */
+  if (src_const && src_related == 0 && CONST_VECTOR_P (src_const)
+ && (src_const == CONST0_RTX (mode)
+ || src_const == CONSTM1_RTX (mode))
+ && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+   {
+ machine_mode mode_iter;
+
+ for (int l = 0; l != 2; l++)
+   {
+ FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_VECTOR_INT)
+   {
+ if (maybe_lt (GET_MODE_SIZE (mode_iter),
+   GET_MODE_SIZE (mode)))
+   continue;
+
+ rtx src_const_iter = (src_const == CONST0_RTX (mode)
+   ? CONST0_RTX (mode_iter)
+   : CONSTM1_RTX (mode_iter));
+
+ struct table_elt *const_elt
+   = lookup (src_const_iter, HASH (src_const_iter, mode_iter),
+ mode_iter);
+
+ if (const_elt == 0)
+   continue;
+
+ for (const_elt = const_elt->first_same_value;
+  const_elt; const_elt = const_elt->next_same_value)
+   if (REG_P (const_elt->exp))
+ {
+   src_related = gen_lowpart (mode, const_elt->exp);
+   break;
+ }
+
+ if (src_related != 0)
+   break;
+   }
+ if (src_related != 0)
+   break;
+   }
+   }
+
   /* See if we have a CONST_INT that is already in a register in a
 wider mode.  */
 
@@ -5041,6 +5085,44 @@ cse_insn (rtx_insn *insn)
}
}
 
+  /* Try to find something like (vec_dup:v16si (reg:c))
+for (vec_dup:v8si (reg:c)).  */
+  if (src_related == 0
+ && VECTOR_MODE_P (mode)
+ && GET_CODE (src) == VEC_DUPLICATE)
+   {
+ poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (src)) * 2;
+ rtx inner_elt = XEXP (src, 0);
+ machine_mode result_mode;
+ struct table_elt *src_related_elt = NULL;;
+ while (related_vector_mode (mode, GET_MODE_INNER (mode),
+ nunits).exists (&result_mode))
+   {
+ rtx vec_dup = gen_rtx_VEC_DUPLICATE (result_mode, inner_elt);
+ struct table_elt* tmp = lookup (vec_dup, HASH (vec_dup, result_mode),
+ result_mode);
+ if (tmp)
+   {
+ src_related_elt = tmp;
+ break;
+   }
+ nunits *= 2;
+   }
+
+ if (src_related_elt)

[PATCH] Fix another inline7.c test failure on sparc targets

2024-08-26 Thread Bernd Edlinger

This new test was reported to be still failing on sparc targets.
Here the number of DW_AT_ranges dropped to zero.
The test should pass on this architecture with -Os, -O2 and -O3.
I tried to improve also different known problematic targets,
where only one subroutine had DW_AT_ranges:
Those are armhf (arm with hard float), powerpc and powerpc64.
The best option is to use -Os: So far the only one, where
all two inline instances in this test had two DW_AT_ranges.

gcc/testsuite/ChangeLog:

PR other/116462
* gcc.dg/debug/dwarf2/inline7.c: Switch to -Os optimization.
---
Hopefully this will be the final act, OK for trunk?

 gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c b/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
index 083df5b586c..8b2fa1210ad 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
@@ -1,6 +1,6 @@
 /* Verify that at least one of both inline instances have
a DW_AT_ranges but no extra DW_TAG_lexical_block.  */
-/* { dg-options "-O -gdwarf -dA" } */
+/* { dg-options "-Os -gdwarf -dA" } */
 /* { dg-do compile } */
 /* { dg-final { scan-assembler-times "\\(DIE \\(\[^\n\]*\\) DW_TAG_inlined_subroutine" 2 } } */
 /* { dg-final { scan-assembler " DW_AT_ranges" } } */
-- 
2.39.2

Re: [PATCH] Fix another inline7.c test failure on sparc targets

2024-08-26 Thread Richard Biener




> Am 27.08.2024 um 06:12 schrieb Bernd Edlinger :
> 
> This new test was reported to be still failing on sparc targets.
> Here the number of DW_AT_ranges dropped to zero.
> The test should pass on this architecture with -Os, -O2 and -O3.
> I tried to improve also different known problematic targets,
> where only one subroutine had DW_AT_ranges:
> Those are armhf (arm with hard float), powerpc and powerpc64.
> The best option is to use -Os: So far the only one, where
> all two inline instances in this test had two DW_AT_ranges.

Ok

Richard 

> gcc/testsuite/ChangeLog:
> 
>PR other/116462
>* gcc.dg/debug/dwarf2/inline7.c: Switch to -Os optimization.
> ---
> Hopefully this will be the final act, OK for trunk?
> 
> gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c 
> b/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
> index 083df5b586c..8b2fa1210ad 100644
> --- a/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
> +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/inline7.c
> @@ -1,6 +1,6 @@
> /* Verify that at least one of both inline instances have
>a DW_AT_ranges but no extra DW_TAG_lexical_block.  */
> -/* { dg-options "-O -gdwarf -dA" } */
> +/* { dg-options "-Os -gdwarf -dA" } */
> /* { dg-do compile } */
> /* { dg-final { scan-assembler-times "\\(DIE \\(\[^\n\]*\\) 
> DW_TAG_inlined_subroutine" 2 } } */
> /* { dg-final { scan-assembler " DW_AT_ranges" } } */
> --
> 2.39.2
>

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-26 Thread Martin Uecker

Am Montag, dem 26.08.2024 um 17:21 -0700 schrieb Kees Cook:
> On Mon, Aug 26, 2024 at 11:01:08PM +0200, Martin Uecker wrote:
> > Am Montag, dem 26.08.2024 um 13:30 -0700 schrieb Kees Cook:
> > > On Mon, Aug 26, 2024 at 07:30:15PM +, Qing Zhao wrote:
> > > > Hi, Martin,
> > > > 
> > > > Looks like that there is some issue when I tried to use the _Generic 
> > > > for the testing cases, and then I narrowed down to a
> > > > small testing case that shows the problem without any change to GCC.
> > > > 
> > > > [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
> > > > struct annotated {
> > > >   char b;
> > > >   int c[];
> > > > } *array_annotated;  
> > > > extern void * counted_by_ref (int *);
> > > > 
> > > > int main(int argc, char *argv[])
> > > > {
> > > >   typeof(counted_by_ref (array_annotated->c)) ret
> > > >     = counted_by_ref (array_annotated->c);
> > > >   _Generic (ret, void* : (void)0, default: *ret = 10);
> > > > 
> > > >   return 0;
> > > > }
> > > > [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
> > > > t1.c: In function ‘main’:
> > > > t1.c:12:44: warning: dereferencing ‘void *’ pointer
> > > >    12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
> > > >       |                                            ^~~~
> > > > t1.c:12:49: error: invalid use of void expression
> > > >    12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
> > > >       |                                                 ^
> > > 
> > > I implemented it like this[1] in the Linux kernel. So yours could be:
> > > 
> > > struct annotated {
> > >   char b;
> > >   int c[] __attribute__((counted_by(b)));
> > > };
> > > extern struct annotated *array_annotated;
> > > 
> > > int main(int argc, char *argv[])
> > > {
> > >   typeof(_Generic(__builtin_get_counted_by(array_annotated->c),
> > >  void *: (size_t *)NULL,
> > >  default: __builtin_get_counted_by(array_annotated->c)))
> > >   ret = __builtin_get_counted_by(array_annotated->c);
> > >   if (ret)
> > >     *ret = 10;
> > > 
> > >   return 0;
> > > }
> > > 
> > > It's a bit cumbersome, but it does what's needed.
> > > 
> > > This is, however, just doing exactly what Bill has suggested: it is
> > > converting the (void *)NULL into (size_t *)NULL when there is no
> > > counted_by annotation...
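The error in t1.c arises because every association of a _Generic expression has to be a valid expression, including the ones that are not selected, so `*ret = 10` is still checked with `ret` of type `void *`. The workaround described above can be sketched without the proposed builtin. Everything named here (`COUNTER_PTR`, `ref_without_annotation`, `ref_with_annotation`) is a hypothetical stand-in, and the assumption that the builtin yields `void *` for an unannotated array is taken from this thread, not from a released GCC.

```c
#include <stddef.h>

/* Hypothetical helper: map a `void *` result to a typed null pointer so
   that a later dereference still type-checks; any other pointer type is
   passed through unchanged.  */
#define COUNTER_PTR(ref) \
  _Generic ((ref), void *: (size_t *) NULL, default: (ref))

static unsigned char counter;

/* Stand-ins (assumptions) for what the builtin might return.  */
static void *ref_without_annotation (void) { return NULL; }
static unsigned char *ref_with_annotation (void) { return &counter; }

/* Each function returns 1 if a counter was found and updated.  */
static int
store_without_annotation (void)
{
  __typeof__ (COUNTER_PTR (ref_without_annotation ())) ret
    = COUNTER_PTR (ref_without_annotation ());
  if (ret)
    {
      *ret = 10;	/* Valid C: ret has type size_t *, not void *.  */
      return 1;
    }
  return 0;
}

static int
store_with_annotation (void)
{
  __typeof__ (COUNTER_PTR (ref_with_annotation ())) ret
    = COUNTER_PTR (ref_with_annotation ());
  if (ret)
    {
      *ret = 10;	/* ret has type unsigned char * here.  */
      return 1;
    }
  return 0;
}
```

The null check moves the decision to run time, which is the trade-off discussed in the rest of the thread: the code compiles for both cases instead of failing for the unannotated one.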
> > > 
> > > -Kees
> > > 
> > > [1] 
> > > https://lore.kernel.org/linux-hardening/20240822231324.make.666-k...@kernel.org/
> > 
> > Interesting. Will __builtin_get_counted_by(array_annotated->c) give
> > a null pointer (or an invalid pointer) of the correct type if 
> > array_annotated is a null pointer of an annotated struct type?
> 
> If you mean this part:
> 
>   typeof(P) __obj_ptr = NULL; \
>   /* Just query the counter type for type_max checking. */ \
>   typeof(_Generic(__flex_counter(__obj_ptr->FAM), \
>   void *: (size_t *)NULL, \
>   default: __flex_counter(__obj_ptr->FAM))) \
>   __counter_type_ptr = NULL; \
> 
> Where __obj_ptr starts as NULL, then yes. (Or at least, yes it does
> currently with Qing's GCC patch and Bill's Clang patch.)

Does __builtin_get_counted_by not evaluate its argument? In any
case, I think it should be documented whether this is
supposed to work or not.
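For comparison, the quoted macro only uses the null `__obj_ptr` inside `typeof` and as the controlling expression of `_Generic`, both of which are unevaluated contexts in standard and GNU C as long as no variably modified type is involved. The sketch below illustrates only that existing rule with `sizeof` and `__typeof__`; it says nothing about how the proposed builtin behaves when it is evaluated, which is the open question here. `struct packet`, `bump` and the function names are made up for the illustration.

```c
#include <stddef.h>

struct packet
{
  unsigned short len;
  int data[];
};

/* The operand of sizeof is not evaluated, so the null pointer is never
   dereferenced.  */
static size_t
counter_width_from_null (void)
{
  struct packet *p = NULL;
  return sizeof (p->len);
}

static int
bump (int *n)
{
  return ++*n;
}

/* Returns how many times bump() actually ran, plus terms that are zero.  */
static int
side_effects_seen (void)
{
  int n = 0;
  __typeof__ (bump (&n)) copy = 5;	/* bump is not called.  */
  size_t width = sizeof (bump (&n));	/* Not called here either.  */
  return n + (copy - 5) + (int) (width - sizeof (int));
}
```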

> 
> > I also wonder a bit about the multiple macro evaluations of the arguments
> > for P and SIZE.
> 
> I tried to design it so they aren't used with anything that should
> have side-effects.

I was more concerned about the cost of macro expansions on
compile times. I would do:

__auto_type __FOO = (FOO);

for all macro parameters that are evaluated multiple times
and are expressions which might contain macros themselves.

There is also the issue of evaluation of typeof for variably modified 
types, which might not currently affect the kernel, but this would
also become safer for such types.
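A minimal sketch of that suggestion, assuming GNU C (`__auto_type` and statement expressions): `CLAMP_ONCE` and the helper functions are hypothetical and not taken from the kernel patch.

```c
/* Hypothetical macro: each parameter is expanded exactly once into a
   local, and only the locals are used afterwards.  */
#define CLAMP_ONCE(V, LO, HI)                                   \
  __extension__ ({                                              \
    __auto_type clamp_v_ = (V);                                 \
    __auto_type clamp_lo_ = (LO);                               \
    __auto_type clamp_hi_ = (HI);                               \
    clamp_v_ < clamp_lo_ ? clamp_lo_                            \
      : (clamp_v_ > clamp_hi_ ? clamp_hi_ : clamp_v_);          \
  })

static int calls;

/* Has a visible side effect so repeated evaluation can be detected.  */
static int
next_value (void)
{
  calls++;
  return 42;
}

static int
clamp_next_value (void)
{
  return CLAMP_ONCE (next_value (), 0, 10);
}
```

Writing `typeof (V) v = (V);` instead would expand `V` twice, and for a variably modified type the `typeof` operand would also be evaluated, which is the second point made above.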


> Anyway, if __builtin_get_counted_by returns (size_t *)NULL then I think
> the _Generic wrapping isn't needed. That would make it easier to use?

It would make it easier for your use case.  I wonder though
whether other people might want to have the compile time error
when there is no attribute.


Martin

> 
> -Kees
>

Re: [PATCH] MATCH: add abs support for half float

2024-08-26 Thread Kugan Vivekanandarajah

Hi Richard,

> On 22 Aug 2024, at 10:34 pm, Richard Biener wrote:
>
> External email: Use caution opening links or attachments
>
>
> On Wed, Aug 21, 2024 at 12:08 PM Kugan Vivekanandarajah wrote:
>>
>> Hi Richard,
>>
>>> On 20 Aug 2024, at 6:09 pm, Richard Biener wrote:
>>>
>>> On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah wrote:

 Thanks for the comments.

> On 2 Aug 2024, at 8:36 pm, Richard Biener wrote:
>
> On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah wrote:
>>
>>
>>
>>> On 1 Aug 2024, at 10:46 pm, Richard Biener wrote:
>>>
>>> On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah wrote:


 On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski wrote:
>
> On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah wrote:
>>
>> On Thu, Jul 25, 2024 at 10:19 PM Richard Biener wrote:
>>>
>>> On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah wrote:

 On Tue, Jul 23, 2024 at 11:56 PM Richard Biener wrote:
>
> On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah wrote:
>>
>> On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski wrote:
>>>
>>> On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah wrote:

 Revised based on the comment and moved it into the existing
 patterns.

 gcc/ChangeLog:

 * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
 Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.

 gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/absfloat16.c: New test.
>>>
>>> The testcase needs to make sure it runs only for targets that support
>>> float16, like so:
>>>
>>> /* { dg-require-effective-target float16 } */
>>> /* { dg-add-options float16 } */
>> Added in the attached version.
>
> + /* (type)A >=/> 0 ? A : -A    same as abs (A) */
> (for cmp (ge gt)
> (simplify
> -   (cnd (cmp @0 zerop) @1 (negate @1))
> -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> -&& bitwise_equal_p (@0, @1))
> +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
> +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
> +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
> +&& ((VECTOR_TYPE_P (type)
> + && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
> +   || (!VECTOR_TYPE_P (type)
> +   && (TYPE_PRECISION (TREE_TYPE (@1))
> +   <= TYPE_PRECISION (TREE_TYPE (@0)))))
> +&& bitwise_equal_p (@1, @2))
>
> I wonder about the bitwise_equal_p which tests @1 against @2 now
> with the convert still applied to @1 - that looks odd.  You are allowing
> sign-changing conversions but doesn't that change ge/gt behavior?
> Also why are sign/zero-extensions not OK for vector types?
 Thanks for the review.
 My main motivation here is for _Float16  as below.

 _Float16 absfloat16 (_Float16 x)
 {
 float _1;
 _Float16 _2;
 _Float16 _4;
  [local count: 1073741824]:
 _1 = (float) x_3(D);
 if (_1 < 0.0)
 goto ; [41.00%]
 else
 goto ; [59.00%]
  [local count: 440234144]:
 _4 = -x_3(D);
  [local count: 1073741824]:
 # _2 = PHI <_4(3), x_3(D)(2)>
 return _2;
 }

 This is why I added  bitwise_equal_p test of @1 against @2 with
 TYPE_PRECISION checks.
 I agree that I will have to check for sign-changing conversions.

 Just to keep it simple, I disallowed vector types. I am not sure if
 this would  hit vec types. I am happy to handle this if that is
 needed.
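At source level, the shape being matched can be sketched as follows. This is an analogy under an assumption made for portability: `float` compared as `double` stands in for `_Float16` compared as `float`, and `abs_via_wider_compare` is a hypothetical name, not part of the patch or its testcase.

```c
/* The comparison is done on the widened value while the selected
   operands keep the narrow type, mirroring the GIMPLE dump above.  */
static float
abs_via_wider_compare (float x)
{
  return (double) x < 0.0 ? -x : x;
}
```

For an input of -0.0f the comparison is false and the value is returned with its sign bit still set, whereas an absolute-value operation would clear it; that difference is why the pattern is guarded by !HONOR_SIGNED_ZEROS.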
>>>
>>> I think with __builtin_convertvector you should be able to construct
>>> a testcase that does
>> Thanks.
>
