Re: [PATCH v4 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-11-27 Thread Hongyu Wang
Ping^2

On Thu, Nov 21, 2024 at 11:04, Hongyu Wang wrote:
>
> Gentle ping; it would be appreciated if anyone could help review this.
> We hope this patch will not miss GCC 15, so that APX support can be complete.
>
> On Thu, Nov 14, 2024 at 09:50, Kong, Lingling wrote:
>
> >
> > Hi,
> >
> > Many thanks to Richard for the suggestion that a conditional load is like a
> > scalar instance of maskload_optab. So this version uses the maskload and
> > maskstore optabs to expand and generate cfcmov in the ifcvt pass.
> >
> > All the changes passed bootstrap and regtest on x86_64-pc-linux-gnu.
> > We also tested SPEC with SDE, and it passed the runtime test.
> >
> > Ok for trunk?
> >
> > The APX CFCMOV[1] feature implements conditional faulting: when the
> > condition code evaluates to false, all memory faults from loading or
> > storing the memory operand are suppressed. This means a conditional move
> > can now load or store a memory operand that may trap or fault.
> >
> > Currently the middle-end does not generate a conditional move if it knows
> > that a load from A or B could trap or fault. To enable CFCMOV, use
> > mask_load and mask_store to expand.
> >
> > For a conditional memory store, the fault-suppressing conditional move
> > does not move any arithmetic calculations. For a conditional memory load,
> > only the case where one operand is a memory operand that may trap and the
> > other is neither a trapping operand nor a memory operand is currently
> > supported.
> >
> > [1] https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
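To make the conditions above concrete, here is a minimal C++ sketch of the source-level patterns the ifcvt change targets. The helper names cond_load and cond_store are hypothetical and not part of the patch. The property that matters is that neither function touches memory when test is false. A plain cmov that hoists the load out of the condition would lose that property, which is why ifcvt previously refused to convert such blocks when *a might trap.

```cpp
#include <cassert>

// Hypothetical reference models of the patterns named in the patch.

// "if (test) x = *a; else x = b;": *a is evaluated only when test is true.
inline int cond_load(bool test, const int *a, int b) {
  return test ? *a : b;
}

// "if (test) *x = a;": nothing is stored (so nothing can fault) when test
// is false.
inline void cond_store(bool test, int *x, int a) {
  if (test)
    *x = a;
}
```

For example, cond_load (false, nullptr, 3) returns 3 without dereferencing the null pointer, whereas the hoisted form `int t = *a; return test ? t : b;` would fault on the same call. CFCMOV is what lets the backend emit the branchless form while keeping the non-faulting behaviour.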
> >
> > gcc/ChangeLog:
> >
> > * ifcvt.cc (can_use_scalar_mask_store): New func for conditional
> > faulting movcc for store.
> > (can_use_scalar_mask_load_store):  New func for conditional 
> > faulting.
> > (noce_try_cmove_arith): Try to convert to conditional faulting
> > movcc.
> > (noce_process_if_block): Ditto.
> > * optabs.cc (emit_conditional_move): Handle cfmovcc.
> > (emit_conditional_move_1): Ditto.
> > ---
> >  gcc/ifcvt.cc  | 105 +-
> >  gcc/optabs.cc |  20 ++
> >  2 files changed, 115 insertions(+), 10 deletions(-)
> >
> > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> > index 74f13a637b2..b3adee35ff5 100644
> > --- a/gcc/ifcvt.cc
> > +++ b/gcc/ifcvt.cc
> > @@ -778,6 +778,8 @@ static bool noce_try_store_flag_mask (struct noce_if_info *);
> >  static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
> > rtx, rtx, rtx, rtx = NULL, rtx = NULL);
> >  static bool noce_try_cmove (struct noce_if_info *);
> > +static bool can_use_scalar_mask_store (rtx, rtx, rtx, bool);
> > +static bool can_use_scalar_mask_load_store (struct noce_if_info *);
> >  static bool noce_try_cmove_arith (struct noce_if_info *);
> >  static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
> >  static bool noce_try_minmax (struct noce_if_info *);
> > @@ -2132,6 +2134,54 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
> >    return true;
> >  }
> >
> > +/* Return TRUE if we could convert "if (test) *x = a; else skip" to a
> > +   scalar mask store and do a conditional faulting movcc, i.e.
> > +   x86 cfcmov, especially when the store to x may cause memory faults
> > +   and in else_bb x == b.  */
> > +
> > +static bool
> > +can_use_scalar_mask_store (rtx x, rtx a, rtx b, bool a_simple)
> > +{
> > +  gcc_assert (MEM_P (x));
> > +
> > +  machine_mode x_mode = GET_MODE (x);
> > +  if (convert_optab_handler (maskstore_optab, x_mode,
> > +                             x_mode) == CODE_FOR_nothing)
> > +    return false;
> > +
> > +  if (!rtx_equal_p (x, b) || !may_trap_or_fault_p (x))
> > +    return false;
> > +  if (!a_simple || !register_operand (a, x_mode))
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Return TRUE if the backend supports scalar maskload_optab/maskstore_optab,
> > +   which suppress memory faults when loading or storing a memory operand
> > +   and the condition code evaluates to false.  */
> > +
> > +static bool
> > +can_use_scalar_mask_load_store (struct noce_if_info *if_info)
> > +{
> > +  rtx a = if_info->a;
> > +  rtx b = if_info->b;
> > +  rtx x = if_info->x;
> > +
> > +  if (!MEM_P (a) && !MEM_P (b))
> > +    return false;
> > +
> > +  if (MEM_P (x))
> > +    return can_use_scalar_mask_store (x, a, b, if_info->then_simple);
> > +  else
> > +    /* Return TRUE if the backend supports scalar maskload_optab; then we
> > +       could convert "if (test) x = *a; else x = b;" or
> > +       "if (test) x = a; else x = *b;" to a conditional faulting movcc,
> > +       i.e. x86 cfcmov, especially when loading a or b may cause memory
> > +       faults.  */
> > +    return convert_optab_handler (maskload_optab, GET_MODE (a),
> > +                                  GET_MODE (a)) != CODE_FOR_nothing;
> > +}
> > +
> >  /* Try more complex cases involving conditional_move.  */
> >
> >  static bool
> > @@ -2171,7 +2221,17 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
> >/* ??? We co

[PATCH 2/2] RISC-V: Add intrinsics testcases for SiFive Xsfvqmaccqoq/dod extensions.

2024-11-27 Thread shiyulong
From: yulong 

This commit adds testcases for Xsfvqmaccqoq/dod.

Co-Authored by: Kito Cheng 
Co-Authored by: Monk Chiang 
Co-Authored by: Jiawei Chen 
Co-Authored by: Shihua Liao 
Co-Authored by: Yixuan Chen 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add xsfvector tests.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c: New test.

---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 .../riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c   | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c   | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c| 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c| 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c   | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c   | 213 ++
 9 files changed, 1706 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 448374d49db..8f5860c46b4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -37,6 +37,8 @@ dg-init
 set CFLAGS "$DEFAULT_CFLAGS -O3"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]] \
"" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/xsfvector/*.\[cS\]]] \
+   "" $CFLAGS
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
"" $CFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c
new file mode 100644
index 000..f2058a14779
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c
@@ -0,0 +1,213 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_xsfvqmaccdod -mabi=lp64d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "riscv_vector.h"
+
+/*
+** test_sf_vqmacc_2x8x2_i32m1_vint32m1_t:
+** ...
+** sf\.vqmacc\.2x8x2\tv[0-9]+,v[0-9]+,v[0-9]+
+** ...
+*/
+vint32m1_t
+test_sf_vqmacc_2x8x2_i32m1_vint32m1_t (vint32m1_t vd, vint8m1_t vs1,
+  vint8m1_t vs2, size_t vl)
+{
+  return __riscv_sf_vqmacc_2x8x2_i32m1 (vd, vs1, vs2, vl);
+}
+
+/*
+** test_sf_vqmacc_2x8x2_i32m2_vint32m2_t:
+** ...
+** sf\.vqmacc\.2x8x2\tv[0-9]+,v[0-9]+,v[0-9]+
+** ...
+*/
+vint32m2_t
+test_sf_vqmacc_2x8x2_i32m2_vint32m2_t (vint32m2_t vd, vint8m1_t vs1,
+  vint8m2_t vs2, size_t vl)
+{
+  return __riscv_sf_vqmacc_2x8x2_i32m2 (vd, vs1, vs2, vl);
+}
+
+/*
+** test_sf_vqmacc_2x8x2_i32m4_vint32m4_t:
+** ...
+** sf\.vqmacc\.2x8x2\tv[0-9]+,v[0-9]+,v[0-9]+
+** ...
+*/
+vint32m4_t
+test_sf_vqmacc_2x8x2_i32m4_vint32m4_t (vint32m4_t vd, vint8m1_t vs1,
+  vint8m4_t vs2, size_t vl)
+{
+  return __riscv_sf_vqmacc_2x8x2_i32m4 (vd, vs1, vs2, vl);
+}
+
+/*
+** test_sf_vqmacc_2x8x2_i32m8_vint32m8_t:
+** ...
+** sf\.vqmacc\.2x8x2\tv[0-9]+,v[0-9]+,v[0-9]+
+** ...
+*/
+vint32m8_t
+test_sf_vqmacc_2x8x2_i32m8_vint32m8_t (vint32m8_t vd, vint8m1_t vs1,
+  vint8m8_t vs2, size_t vl)
+{
+  return __riscv_sf_vqmacc_2x8x2_i32m8 (vd, vs1, vs2, vl);
+}
+
+/*
+** test_sf_vqmacc_2x8x2_vint32m1_t:
+** ...
+** sf\.vqmacc\.2x8x2\tv[0-9]+,v[0-9]+,v[0-9]+
+** ...
+*/
+vint32m1_t
+test_sf_vqmacc_2x8x2_vint32m1_t (vint32m1_t vd, vint8m1_t vs1, vint8m1_t vs2,
+size_t vl)
+{
+  return __riscv_sf_vqmacc_2x8x2 (vd, vs1, vs2, vl);
+}
+
+/*
+** test_sf_vqmacc_2x8x2_vint32m2_t:
+** ...
+

[PATCH 0/2] RISC-V: Add intrinsics support and testcases for SiFive Xsfvqmaccqoq/dod.

2024-11-27 Thread shiyulong
From: yulong 

This patch adds GCC support for the SiFive vendor extensions Xsfvqmaccqoq
and Xsfvqmaccdod[1], providing the intrinsic functions vqmacc
(signed-signed MAC), vqmaccu (unsigned-unsigned MAC), vqmaccsu
(signed-unsigned MAC), and vqmaccus (unsigned-signed MAC) for 4x8x4 and
2x8x2 matrix multiplication operations.

[1] https://www.sifive.com/document-file/sifive-int8-matrix-multiplication-extensions-specification
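As an illustration only, read 2x8x2 and 4x8x4 as MxKxN tile shapes. This is an assumption; the specification linked above defines the exact register layout. Under that reading, each instruction computes C[M][N] += A[M][K] * B[N][K]^T with 8-bit inputs and 32-bit accumulators, and the four variants differ only in how each 8-bit operand is extended. The scalar C++ reference sketch below uses qmacc_ref, a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: c[M][N] += a[M][K] * b[N][K]^T, with 8-bit
// inputs widened to 32 bits before multiplying. Tile shapes and operand
// layout are assumptions made for illustration.
template <typename TA, typename TB, std::size_t M, std::size_t K, std::size_t N>
void qmacc_ref(std::int32_t (&c)[M][N], const TA (&a)[M][K],
               const TB (&b)[N][K]) {
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j) {
      std::int32_t acc = c[i][j];
      for (std::size_t k = 0; k < K; ++k)
        // Sign or zero extension follows from TA/TB being signed/unsigned.
        acc += std::int32_t(a[i][k]) * std::int32_t(b[j][k]);
      c[i][j] = acc;
    }
}
```

Instantiating with (int8_t, int8_t) models vqmacc, (uint8_t, uint8_t) vqmaccu, (int8_t, uint8_t) vqmaccsu and (uint8_t, int8_t) vqmaccus. This mapping assumes the first-named signedness applies to the first multiplicand, as in the base RVV vwmaccsu/vwmaccus naming.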

Co-Authored by: Kito Cheng 
Co-Authored by: Monk Chiang 
Co-Authored by: Jiawei Chen 
Co-Authored by: Shihua Liao 
Co-Authored by: Yixuan Chen 

yulong (2):
  RISC-V: Add intrinsics support for SiFive Xsfvqmaccqoq/dod extensions.
  RISC-V: Add intrinsics testcases for SiFive Xsfvqmaccqoq/dod
extensions.

 gcc/config.gcc|   2 +-
 gcc/config/riscv/generic-vector-ooo.md|   2 +-
 gcc/config/riscv/genrvv-type-indexer.cc   |  47 
 .../riscv/riscv-vector-builtins-shapes.cc |  30 +++
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +
 .../riscv/riscv-vector-builtins-types.def |  12 +
 gcc/config/riscv/riscv-vector-builtins.cc | 151 -
 gcc/config/riscv/riscv-vector-builtins.def|  26 ++-
 gcc/config/riscv/riscv-vector-builtins.h  |  14 ++
 gcc/config/riscv/riscv.md |   4 +-
 .../riscv/sifive-vector-builtins-bases.cc | 164 ++
 .../riscv/sifive-vector-builtins-bases.h  |  35 +++
 .../sifive-vector-builtins-functions.def  |  54 +
 gcc/config/riscv/sifive-vector.md | 179 +++
 gcc/config/riscv/t-riscv  |  20 ++
 gcc/config/riscv/vector-iterators.md  |  33 +++
 gcc/config/riscv/vector.md|   1 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 .../riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c   | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c   | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c| 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c| 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c   | 213 ++
 .../riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c   | 213 ++
 26 files changed, 2463 insertions(+), 19 deletions(-)
 create mode 100644 gcc/config/riscv/sifive-vector-builtins-bases.cc
 create mode 100644 gcc/config/riscv/sifive-vector-builtins-bases.h
 create mode 100644 gcc/config/riscv/sifive-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/sifive-vector.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c

-- 
2.34.1



[committed] c: Fix gimplification ICE for shifts with invalid redeclarations

2024-11-27 Thread Joseph Myers
As reported in bug 117757, there is a C gimplification ICE for shifts
involving a variable that was incompatibly redeclared (and thus had
its type changed to error_mark_node).  Fix this with an appropriate
error_operand_p check.

Note that this is not the same issue as any of the other bugs reported
for ICEs later in the gimplifier dealing with such erroneous
redeclarations (it is, however, the same as the *second* ICE reported
in bug 115644 - the test in comment#1 for that bug, not the one in the
original bug report).

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

PR c/117757

gcc/c-family/
* c-gimplify.cc (c_gimplify_expr): Check for error_operand_p
before calling TYPE_MAIN_VARIANT for shifts.

gcc/testsuite/
* gcc.dg/pr117757-1.c: New test.

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 09ea1b791590..d4b97b4a9728 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -806,7 +806,8 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
   We should get rid of this conversion when we have a proper
   type demotion/promotion pass.  */
tree *op1_p = &TREE_OPERAND (*expr_p, 1);
-   if (!VECTOR_TYPE_P (TREE_TYPE (*op1_p))
+   if (!error_operand_p (*op1_p)
+   && !VECTOR_TYPE_P (TREE_TYPE (*op1_p))
&& !types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)),
unsigned_type_node)
&& !types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)),
diff --git a/gcc/testsuite/gcc.dg/pr117757-1.c b/gcc/testsuite/gcc.dg/pr117757-1.c
new file mode 100644
index ..238b6db42bf5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr117757-1.c
@@ -0,0 +1,10 @@
+/* Test ICE for shift with invalid redeclaration (bug 117757).  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void
+f (int a)
+{
+  1 << a;
+  int a[1]; /* { dg-error "redeclared" } */
+}

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH v3] LoongArch: Mask shift offset when emit {xv, v}{srl, sll, sra} with sameimm vector

2024-11-27 Thread Jinyang He
For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes an
out-of-range immediate when emitting the {w,h,b} variants.  Since the
hardware shifts by the register value modulo the element bit width, it
is actually unnecessary to constrain the range.  Simply mask the shift
amount with (element bit width - 1), without any constraint on the
shift range.
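The masking can be checked in isolation. The following is a minimal sketch that assumes, as the description above states, that the register forms use the shift count modulo the element width. srl_b_element is a hypothetical per-element model of vsrl.b, not a LoongArch API.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the output templates' val & (GET_MODE_UNIT_BITSIZE (mode) - 1).
// unit_bits is the element width in bits: 8, 16, 32 or 64.
inline unsigned masked_shift_imm(std::uint64_t val, unsigned unit_bits) {
  assert(unit_bits != 0 && (unit_bits & (unit_bits - 1)) == 0);  // power of two
  return static_cast<unsigned>(val & (unit_bits - 1));
}

// Hypothetical per-element model of a register-form byte shift (vsrl.b),
// assuming the hardware uses the shift count modulo 8.
inline std::uint8_t srl_b_element(std::uint8_t x, std::uint8_t count) {
  return static_cast<std::uint8_t>(x >> (count & 7));
}
```

With this change, a replicated constant of 9 on byte elements yields the immediate 1 for vsrli.b. That is the same result vsrl.b produces with 9 in the shift register, and it avoids an immediate that does not fit the byte form.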

gcc/ChangeLog:

* config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6,
add Uuvx as replicated vector const with unsigned range [0,umax].
* config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift
offset by its unit bits.
* config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_const_vector_same_int_p): Set default for low and high.
* config/loongarch/predicates.md: Replace reg_or_vector_same_uimm6_operand
with reg_or_vector_same_uimm_operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test.
---
v2: Fix indent in lsx.md and lasx.md.
Use "dg-do assemble" in the tests, as suggested by Ruoyao.
v3: Re-enable scan-assembler.

 gcc/config/loongarch/constraints.md   | 14 ++--
 gcc/config/loongarch/lasx.md  | 60 
 gcc/config/loongarch/loongarch-protos.h   |  5 +-
 gcc/config/loongarch/lsx.md   | 60 
 gcc/config/loongarch/predicates.md|  8 +--
 .../vector/lasx/lasx-shift-sameimm-vec.c  | 72 +++
 .../vector/lsx/lsx-shift-sameimm-vec.c| 72 +++
 7 files changed, 254 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index 18da8b31f49..66ef1073fad 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -334,19 +334,19 @@
   (and (match_code "const_vector")
(match_test "loongarch_const_vector_same_int_p (op, mode, -16, 15)")))
 
-(define_constraint "Uuv6"
-  "@internal
-   A replicated vector const in which the replicated value is in the range
-   [0,63]."
-  (and (match_code "const_vector")
-   (match_test "loongarch_const_vector_same_int_p (op, mode, 0, 63)")))
-
 (define_constraint "Urv8"
   "@internal
A replicated vector const with replicated byte values as well as elements"
   (and (match_code "const_vector")
(match_test "loongarch_const_vector_same_bytes_p (op, mode)")))
 
+(define_constraint "Uuvx"
+  "@internal
+   A replicated vector const in which the replicated value is in the unsigned
+   range [0,umax]."
+  (and (match_code "const_vector")
+   (match_test "loongarch_const_vector_same_int_p (op, mode)")))
+
 (define_memory_constraint "ZC"
   "A memory operand whose address is formed by a base register and offset
that is suitable for use in instructions with the same addressing mode
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 457ed163f31..90778dd8ff9 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1013,11 +1013,23 @@
   [(set (match_operand:ILASX 0 "register_operand" "=f,f")
(lshiftrt:ILASX
  (match_operand:ILASX 1 "register_operand" "f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))]
+ (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))]
   "ISA_HAS_LASX"
-  "@
-   xvsrl.\t%u0,%u1,%u2
-   xvsrli.\t%u0,%u1,%E2"
+{
+  switch (which_alternative)
+{
+case 0:
+  return "xvsrl.\t%u0,%u1,%u2";
+case 1:
+  {
+   unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 0));
+   operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1));
+   return "xvsrli.\t%u0,%u1,%d2";
+  }
+default:
+  gcc_unreachable ();
+}
+}
   [(set_attr "type" "simd_shift")
(set_attr "mode" "")])
 
@@ -1026,11 +1038,23 @@
   [(set (match_operand:ILASX 0 "register_operand" "=f,f")
(ashiftrt:ILASX
  (match_operand:ILASX 1 "register_operand" "f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))]
+ (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))]
   "ISA_HAS_LASX"
-  "@
-   xvsra.\t%u0,%u1,%u2
-   xvsrai.\t%u0,%u1,%E2"
+{
+  switch (which_alternative)
+{
+case 0:
+  return "xvsra.\t%u0,%u1,%u2";
+case 1:
+  {
+   unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 0));
+   operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1));
+   return "xvsrai.\t%u0,%u1,%d2";
+  }
+default:
+  gcc_unreachable ();
+}
+}
   [(set_attr "type" "simd_shift")
(set_attr "mode" "")])
 

Re: [PATCH v4] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-11-27 Thread Jason Merrill
On 11/6/24 3:33 PM, Marek Polacek wrote:

On Mon, Nov 04, 2024 at 11:10:05PM -0500, Jason Merrill wrote:

On 10/30/24 4:59 PM, Marek Polacek wrote:

On Wed, Oct 30, 2024 at 09:01:36AM -0400, Patrick Palka wrote:

On Tue, 29 Oct 2024, Marek Polacek wrote:

+static tree
+cp_parser_pack_index (cp_parser *parser, tree pack)
+{
+  if (cxx_dialect < cxx26)
+pedwarn (cp_lexer_peek_token (parser->lexer)->location,
+OPT_Wc__26_extensions, "pack indexing only available with "
+"%<-std=c++2c%> or %<-std=gnu++2c%>");
+  /* Consume the '...' token.  */
+  cp_lexer_consume_token (parser->lexer);
+  /* Consume the '['.  */
+  cp_lexer_consume_token (parser->lexer);
+
+  if (cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_SQUARE))
+{
+  error_at (cp_lexer_peek_token (parser->lexer)->location,
+   "pack index missing");


Maybe cp_parser_error?


Unsure.  This:

   template
   void foo(Ts...[]);

then generates:

   error: variable or field 'foo' declared void
   error: expected primary-expression before '...' token
   error: pack index missing before ']' token

which doesn't seem better.


I guess the question is whether we need to deal with the vexing parse. 
But in this case it'd be ill-formed regardless, so what you have is fine.



@@ -6368,6 +6416,12 @@ cp_parser_primary_expression (cp_parser *parser,
  = make_location (caret_loc, start_loc, finish_loc);
decl.set_location (combined_loc);
+
+   /* "T...[constant-expression]" is a C++26 pack-index-expression.  */
+   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS)
+   && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_SQUARE))
+ decl = cp_parser_pack_index (parser, decl);


Shouldn't this be in cp_parser_id_expression?


It should, but I need to wait until after finish_id_expression, so that
DECL isn't just an identifier node.


Ah, makes sense.


+ ~ computed-type-specifier


Hmm, seems we never implemented ~decltype.


Looks like CWG 1753: .


Thanks.


@@ -4031,6 +4036,15 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 *walk_subtrees = 0;
 return NULL_TREE;
+case PACK_INDEX_TYPE:
+case PACK_INDEX_EXPR:
+  /* We can have an expansion of an expansion, such as "Ts...[Is]...",
+so do look into the index.  */
+  cp_walk_tree (&PACK_INDEX_INDEX (t), &find_parameter_packs_r, ppd,
+   ppd->visited);
+  *walk_subtrees = 0;
+  return NULL_TREE;


Do we need to handle these specifically here?  I'd think the handling in
cp_walk_subtrees would be enough.


I think I do, otherwise the Ts...[Is]... test doesn't work.
It is used when calling check_for_bare_parameter_packs.


Makes sense.


I'm not seeing a test for https://eel.is/c++draft/diff#cpp23.dcl.dcl-2 or
the code to handle this case differently in C++23 vs 26.
  
Ah, right.  I've added the test (pack-indexing11.C), but we don't compile
it in C++23 mode as we should, due to:

pack-indexing11.C:7:13: error: expected ',' or '...' before '[' token
 7 | void f(T... [1]);
   | ^

which seems like a bug.  Opened .

Is fixing that a requirement for this patch?


No.  Really, given that we're reusing this grammar, it's probably fine 
to never fix it.



This patch implements C++26 Pack Indexing, as described in
.

The issue discussing how to mangle pack indexes has not been resolved
yet  and I've
made no attempt to address it so far.

Unlike v1, which used augmented TYPE/EXPR_PACK_EXPANSION codes, this
version introduces two new codes: PACK_INDEX_EXPR and PACK_INDEX_TYPE.
Both carry two operands: the pack expansion and the index.  They are
handled in tsubst_pack_index: substitute the index and the pack and
then extract the element from the vector (if possible).

To handle pack indexing in a decltype or with decltype(auto), there is
also the new PACK_INDEX_PARENTHESIZED_P flag.

With this feature, it's valid to write something like

   using U = tmpl;

where we first expand the template argument into

   Ts...[Is#0], Ts...[Is#1], ...

and then substitute each individual pack index.
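For readers without a C++26 compiler, the selection that Ts...[I] and args...[I] perform can be emulated with std::tuple. The following is a hedged library sketch using hypothetical helper names (nth_type, nth_arg, pick_types). It is not how the patch implements the feature; the patch adds the PACK_INDEX_TYPE and PACK_INDEX_EXPR tree codes.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Ts...[I]: the I-th type of a type pack.
template <std::size_t I, typename... Ts>
using nth_type = std::tuple_element_t<I, std::tuple<Ts...>>;

// args...[I]: the I-th element of a function parameter pack, returned with
// its original value category (lvalue arguments come back as lvalues).
template <std::size_t I, typename... Args>
decltype(auto) nth_arg(Args&&... args) {
  return std::get<I>(std::forward_as_tuple(std::forward<Args>(args)...));
}

// Ts...[Is]...: index the type pack once per element of the index pack Is.
// Only used in unevaluated contexts such as decltype, so it has no body.
template <typename... Ts, std::size_t... Is>
auto pick_types(std::index_sequence<Is...>) -> std::tuple<nth_type<Is, Ts...>...>;
```

pick_types mirrors the expansion-of-an-expansion case. Each element of Is selects one element of Ts, so the result is std::tuple<Ts...[Is]...>.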

+  MARK_TS_TYPE_NON_COMMON (PACK_INDEX_TYPE);


I wonder about trying to use the tree_common symtab member for the type 
index so we don't need non_common, but that's not necessary.



+   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS)
+   && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_SQUARE))


This happens a lot in the parser changes, how about factoring it into 
cp_parser_next_tokens_are_pack_index?


Or change cp_parser_pack_index to cp_parser_maybe_pack_index that does 
this check, then returns the argument if we aren't looking at a pack index?



+cp_parser_pack_index (cp_parser *parser, tree pack)
+{
+  if (cxx_dialect < cxx26)
+pedwarn (cp_lexer_peek_token (parser->lexer)->loc

[r15-5727 Regression] FAIL: gcc.target/i386/pr112600-5-u16.c scan-tree-dump-times optimized ".SAT_ADD " 3 on Linux/x86_64

2024-11-27 Thread haochen.jiang
On Linux/x86_64,

4a8685911697c237ff8c0589827eb8649f8440f1 is the first bad commit
commit 4a8685911697c237ff8c0589827eb8649f8440f1
Author: Pan Li 
Date:   Fri Nov 22 11:48:26 2024 +0800

I386: Add more testcases for unsigned SAT_ADD vector pattern

caused

FAIL: gcc.target/i386/pr112600-5-u16.c scan-tree-dump-times optimized ".SAT_ADD " 3

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-5727/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr112600-5-u16.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr112600-5-u16.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you encounter problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH] Fortran: fix crash with bounds check writing array section [PR117791]

2024-11-27 Thread Harald Anlauf
On 27.11.24 at 21:56, Jerry D wrote:

On 11/27/24 12:31 PM, Harald Anlauf wrote:

Dear all,

the attached patch fixes a wrong-code issue with bounds-checking
enabled when doing I/O of an array section and an index is either
an expression or a function result.  The problem does not occur
without bounds-checking.

When looking at the original testcase, the function occurring in
the affected index was evaluated twice, once with wrong arguments.

The simplest solution appears to be falling back to scalarization
when bounds-checking is enabled.  If someone has a quick idea for
handling this better, please speak up!

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

This seems to be a 14/15 regression, so a backport is advisable.

Thanks,
Harald



The patch looks OK to me.

I wonder if this fallback to the scalarizer should be done everywhere:
if a user has specified bounds checking, what is the point of
optimizing array references?


If an array reference is of the type A(:,f()), there is no need to
do bounds-checking for the first array index (we don't, so OK),
and we also could pass the array slice to a library function that
handles the section in one go, without generating a loop with calls.
Scalarization is then sort of a missed optimization.

The problem is that the second argument is somehow evaluated twice
with bounds-checking, but only with the I/O optimization.  I did not
see such an issue when assigning A(:,f()) to a temporary rank-1 array
and passing that array to the write().  It did create the right bounds
check, and called f() correctly just once.

Instead of creating a temporary, just passing to the scalarizer was
the simpler solution.  Maybe Paul has an idea to solve this in a
better way.

If the code works in 13, maybe we need to isolate what broke it and
intervene at that place.


Looking at the tree-dump, no bounds-check was generated in 13.
I did some work to extend bounds-checking during 14-development,
and the testcase may have just uncovered a latent issue?

(And we sometimes evaluate functions way too often, see e.g. pr114021,
so there's no lack of possibly related issues...)

Also, go ahead with backporting if no other ideas pop up.  I just fear
we are covering up something else.


I'll wait until tomorrow to see if Paul intervenes.  Otherwise I will
proceed and push.

Thanks for the review and discussion!

Harald


Jerry







[committed] libstdc++: Remove __builtin_expect from consteval assertion

2024-11-27 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/bits/c++config (__glibcxx_assert): Remove useless
__builtin_expect from constexpr-only assertion. Improve
comments.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/c++config | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index c74b03013fd..a5001d0a0b0 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -626,14 +626,17 @@ namespace std
 #endif
 
 #if defined(_GLIBCXX_ASSERTIONS)
-// Enable runtime assertion checks, and also check in constant expressions.
+// When _GLIBCXX_ASSERTIONS is defined we enable runtime assertion checks.
+// These checks will also be done during constant evaluation.
 # define __glibcxx_assert(cond)                                         \
   do { \
 if (__builtin_expect(!bool(cond), false))  \
   _GLIBCXX_ASSERT_FAIL(cond);  \
   } while (false)
 #elif _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED
-// Only check assertions during constant evaluation.
+// _GLIBCXX_ASSERTIONS is not defined, so assertion checks are only enabled
+// during constant evaluation. This ensures we diagnose undefined behaviour
+// in constant expressions.
 namespace std
 {
   __attribute__((__always_inline__,__visibility__("default")))
@@ -643,12 +646,12 @@ namespace std
 }
 # define __glibcxx_assert(cond)                                         \
   do { \
-if (std::__is_constant_evaluated())                                  \
-  if (__builtin_expect(!bool(cond), false))                          \
-   std::__glibcxx_assert_fail();   \
+if (std::__is_constant_evaluated() && !bool(cond)) \
+  std::__glibcxx_assert_fail();\
   } while (false)
 #else
-// Don't check any assertions.
+// _GLIBCXX_ASSERTIONS is not defined and __is_constant_evaluated() doesn't
+// work so don't check any assertions.
 # define __glibcxx_assert(cond)
 #endif
 
-- 
2.47.0



[committed] libstdc++: Add cold attribute to assertion failure functions [PR117650]

2024-11-27 Thread Jonathan Wakely
This helps the compiler to split the cold path into a separate clone, so
that the hot path is a smaller function that uses less icache, and the
cold path is only fetched into the icache if actually executed.
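The sketch below applies the same idiom outside libstdc++; bounds_fail and checked_get are hypothetical names. The failure routine is declared noreturn and cold, so GCC treats paths that call it as unlikely. It may also place those paths out of line, for example in a separate .cold part in .text.unlikely, which leaves the hot path as a compare, a branch and a load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Failure path: never returns and is rarely executed, so mark it noreturn
// and cold, as the patch does for __glibcxx_assert_fail.
__attribute__((__noreturn__, __cold__))
void bounds_fail(std::size_t i, std::size_t n) {
  std::fprintf(stderr, "index %zu out of range (size %zu)\n", i, n);
  std::abort();
}

// Hot path: a compare, an unlikely branch and a load.
inline int checked_get(const int *data, std::size_t n, std::size_t i) {
  if (i >= n)
    bounds_fail(i, n);
  return data[i];
}
```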

libstdc++-v3/ChangeLog:

PR libstdc++/117650
* include/bits/c++config (__glibcxx_assert_fail): Add cold
attribute.
* include/debug/formatter.h (_Error_formatter::_M_error):
Likewise.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/c++config| 2 +-
 libstdc++-v3/include/debug/formatter.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 236906d2f79..c74b03013fd 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -610,7 +610,7 @@ namespace std
 {
 #pragma GCC visibility push(default)
   // Don't use  because this should be unaffected by NDEBUG.
-  extern "C++" _GLIBCXX_NORETURN
+  extern "C++" _GLIBCXX_NORETURN __attribute__((__cold__))
   void
   __glibcxx_assert_fail /* Called when a precondition violation is detected. */
 (const char* __file, int __line, const char* __function,
diff --git a/libstdc++-v3/include/debug/formatter.h b/libstdc++-v3/include/debug/formatter.h
index 4f5a4539bb9..df7f6965a4c 100644
--- a/libstdc++-v3/include/debug/formatter.h
+++ b/libstdc++-v3/include/debug/formatter.h
@@ -571,7 +571,7 @@ namespace __gnu_debug
 _Error_formatter&
 _M_message(_Debug_msg_id __id) const throw ();
 
-_GLIBCXX_NORETURN void
+_GLIBCXX_NORETURN __attribute__((__cold__)) void
 _M_error() const;
 
 #if !_GLIBCXX_INLINE_VERSION
-- 
2.47.0



[committed] c: Fix ICE using function name in parameter type in old-style function definition [PR91193]

2024-11-27 Thread Joseph Myers
As reported in bug 91193, if an old-style function definition
redeclares a typedef name as a function, then uses that function name
at the start of the first old-style parameter definition, then the
parser interprets that token as a typedef name (because lookahead
occurred before processing of the function declarator completed), but
when it is looked up in processing that parameter definition, what is
found is the redefinition, resulting in an ICE.

The function name's scope starts at the end of its declarator, so this
is similar to other cases where we call
c_parser_maybe_reclassify_token because lookahead might have
classified a token as being a typedef or not based on information from
the wrong scope; do so in this case as well, so resulting in the
expected parse errors from using something that's no longer a typedef
name as if it were a typedef name, and eliminating the ICE.
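The scope rule involved can be seen in a small, valid C sketch that uses an ordinary block-scope declaration rather than the old-style definition from the PR: once the declarator for `T` is complete, the identifier no longer names the typedef, so a token that was classified as a typedef name before that point would be stale. The names `T` and `shadow_demo` are illustrative only.

```c
/* A file-scope typedef name.  */
typedef int T;

static int
shadow_demo (void)
{
  /* The first T is the typedef (the type specifier); the second T declares
     an int variable whose scope begins at the end of its declarator.  */
  T T = 5;
  /* From here on T names the variable, not the type, so "T + 1" is an
     ordinary addition.  */
  return T + 1;
}
```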

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

PR c/91193

gcc/c/
* c-parser.cc (c_parser_maybe_reclassify_token): Define earlier.
(c_parser_declaration_or_fndef): Call
c_parser_maybe_reclassify_token before parsing old-style parameter
definitions.

gcc/testsuite/
* gcc.dg/pr91193-1.c, gcc.dg/pr91193-2.c: New tests.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 730f70bfdc66..9f16493aa143 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -2129,6 +2129,43 @@ handle_assume_attribute (location_t here, tree attrs, bool nested)
   return remove_attribute ("gnu", "assume", attrs);
 }
 
+/* We might need to reclassify any previously-lexed identifier, e.g.
+   when we've left a for loop with an if-statement without else in the
+   body - we might have used a wrong scope for the token.  See PR67784.  */
+
+static void
+c_parser_maybe_reclassify_token (c_parser *parser)
+{
+  if (c_parser_next_token_is (parser, CPP_NAME))
+{
+  c_token *token = c_parser_peek_token (parser);
+
+  if (token->id_kind != C_ID_CLASSNAME)
+   {
+ tree decl = lookup_name (token->value);
+
+ token->id_kind = C_ID_ID;
+ if (decl)
+   {
+ if (TREE_CODE (decl) == TYPE_DECL)
+   token->id_kind = C_ID_TYPENAME;
+   }
+ else if (c_dialect_objc ())
+   {
+ tree objc_interface_decl = objc_is_class_name (token->value);
+ /* Objective-C class names are in the same namespace as
+variables and typedefs, and hence are shadowed by local
+declarations.  */
+ if (objc_interface_decl)
+   {
+ token->value = objc_interface_decl;
+ token->id_kind = C_ID_CLASSNAME;
+   }
+   }
+   }
+}
+}
+
 /* Parse a declaration or function definition (C90 6.5, 6.7.1, C99
6.7, 6.9.1, C11 6.7, 6.9.1).  If FNDEF_OK is true, a function definition
is accepted; otherwise (old-style parameter declarations) only other
@@ -3021,6 +3058,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 function definitions either.  */
   int save_debug_nonbind_markers_p = debug_nonbind_markers_p;
   debug_nonbind_markers_p = 0;
+  c_parser_maybe_reclassify_token (parser);
   while (c_parser_next_token_is_not (parser, CPP_EOF)
 && c_parser_next_token_is_not (parser, CPP_OPEN_BRACE))
c_parser_declaration_or_fndef (parser, false, false, false,
@@ -8359,43 +8397,6 @@ c_parser_else_body (c_parser *parser, const token_indent_info &else_tinfo,
   return c_end_compound_stmt (body_loc, block, flag_isoc99);
 }
 
-/* We might need to reclassify any previously-lexed identifier, e.g.
-   when we've left a for loop with an if-statement without else in the
-   body - we might have used a wrong scope for the token.  See PR67784.  */
-
-static void
-c_parser_maybe_reclassify_token (c_parser *parser)
-{
-  if (c_parser_next_token_is (parser, CPP_NAME))
-{
-  c_token *token = c_parser_peek_token (parser);
-
-  if (token->id_kind != C_ID_CLASSNAME)
-   {
- tree decl = lookup_name (token->value);
-
- token->id_kind = C_ID_ID;
- if (decl)
-   {
- if (TREE_CODE (decl) == TYPE_DECL)
-   token->id_kind = C_ID_TYPENAME;
-   }
- else if (c_dialect_objc ())
-   {
- tree objc_interface_decl = objc_is_class_name (token->value);
- /* Objective-C class names are in the same namespace as
-variables and typedefs, and hence are shadowed by local
-declarations.  */
- if (objc_interface_decl)
-   {
- token->value = objc_interface_decl;
- token->id_kind = C_ID_CLASSNAME;
-   }
-   }
-   }
-}
-}
-
 /* Parse an if statement (C90 6.6.4, C99 6.8.4, C11 6.8.4).
 
if-statement:
diff --git a/gcc/testsuite/gcc.dg/pr91193-1.c b/gcc/testsuite/gcc.dg/pr91

PING: [PATCH v4 1/7] Honor TARGET_PROMOTE_PROTOTYPES during RTL expand

2024-11-27 Thread H.J. Lu
On Thu, Nov 21, 2024, 2:02 PM H.J. Lu  wrote:

> Promote integer arguments smaller than int if TARGET_PROMOTE_PROTOTYPES
> returns true.
>
> PR middle-end/14907
> * calls.cc (initialize_argument_information): Promote small integer
> arguments if TARGET_PROMOTE_PROTOTYPES returns true.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/calls.cc | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 01c4ef51545..60eb74e5945 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1375,6 +1375,11 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
>}
>}
>
> +  bool promote_p
> += targetm.calls.promote_prototypes (fndecl
> +   ? TREE_TYPE (fndecl)
> +   : fntype);
> +
>/* I counts args in order (to be) pushed; ARGPOS counts in order
> written.  */
>for (argpos = 0; argpos < num_actuals; i--, argpos++)
>  {
> @@ -1384,6 +1389,10 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
>/* Replace erroneous argument with constant zero.  */
>if (type == error_mark_node || !COMPLETE_TYPE_P (type))
> args[i].tree_value = integer_zero_node, type = integer_type_node;
> +  else if (promote_p
> +  && INTEGRAL_TYPE_P (type)
> +  && TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node))
> +   type = integer_type_node;
>
>/* If TYPE is a transparent union or record, pass things the way
>  we would pass the first field of the union or record.  We have
>

PING.

-- 
> 2.47.0
>
>
>
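A minimal, simplified model of the condition added above, assuming a type can be reduced to an "is integral" flag plus a precision in bits. `struct arg_type`, `promoted_arg_type` and `int_precision` are hypothetical names, not GCC internals: an integral argument narrower than `int` is passed as `int` only when the target hook asks for promotion.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, reduced view of an argument type.  */
struct arg_type
{
  bool integral;
  int precision; /* in bits */
};

/* Precision of int on the host, standing in for
   TYPE_PRECISION (integer_type_node).  */
static const int int_precision = (int) (sizeof (int) * CHAR_BIT);

/* Mirror of: promote_p && INTEGRAL_TYPE_P (type)
              && TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node).  */
static struct arg_type
promoted_arg_type (struct arg_type type, bool promote_p)
{
  if (promote_p && type.integral && type.precision < int_precision)
    {
      struct arg_type int_type = { true, int_precision };
      return int_type;
    }
  return type;
}
```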


Re: [PATCH 13/15] Support for 64-bit location_t: Internal parts

2024-11-27 Thread David Malcolm
On Wed, 2024-11-27 at 13:18 -0500, Lewis Hyatt wrote:
> On Wed, Nov 27, 2024 at 09:41:13AM -0500, David Malcolm wrote:
> > On Wed, 2024-11-27 at 14:56 +0100, Richard Biener wrote:
> > > On Sun, Nov 3, 2024 at 11:28 PM Lewis Hyatt wrote:
> > > > 
> > > > Several of the selftests in diagnostic-show-locus.cc and input.cc are
> > > > sensitive to linemap internals. Adjust them here so they will support
> > > > 64-bit location_t if configured.
> > > > 
> > > > Likewise, handle 64-bit location_t in the support for
> > > > -fdump-internal-locations. As was done with the analyzer, convert to
> > > > (unsigned long long) explicitly so that 32- and 64-bit can be handled
> > > > with the same printf formats.
> > > 
> > > I was hoping David would have a look here.  Absent comments from him,
> > > this is OK when all else is approved and after giving him another
> > > week.
> > 
> > Mostly looks good, but I have a couple of questions below...
> 
> Thanks for taking a look.
> 
> > 
> > 
> > > 
> > > What's missing review now?  I've lost track ...
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > > gcc/ChangeLog:
> > > > 
> > > >     * diagnostic-show-locus.cc
> > > >     (test_one_liner_fixit_validation_adhoc_locations): Adapt so it can
> > > >     effectively test 7-bit ranges instead of 5-bit ranges.
> > > >     (test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
> > > >     * input.cc (get_end_location): Adjust types to support 64-bit
> > > >     location_t.
> > > >     (write_digit_row): Likewise.
> > > >     (dump_location_range): Likewise.
> > > >     (dump_location_info): Likewise.
> > > >     (class line_table_case): Likewise.
> > > >     (test_accessing_ordinary_linemaps): Replace some hard-coded
> > > >     constants with the values defined in line-map.h.
> > > >     (for_each_line_table_case): Likewise.
> > > > ---
> > > >  gcc/diagnostic-show-locus.cc | 128 +--
> > > >  gcc/input.cc                 | 100 ++-
> > > >  2 files changed, 157 insertions(+), 71 deletions(-)
> > > > 
> > 
> > [...snip...]
> > 
> > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > index 04462ef6f5a..1629e4aeee8 100644
> > > > --- a/gcc/input.cc
> > > > +++ b/gcc/input.cc
> > 
> > [...snip...]
> > 
> > > > @@ -3865,11 +3870,11 @@ static const location_t boundary_locations[] = {
> > > >    LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES + 0x100,
> > > > 
> > > >    /* Values near LINE_MAP_MAX_LOCATION_WITH_COLS.  */
> > > > -  LINE_MAP_MAX_LOCATION_WITH_COLS - 0x100,
> > > > +  LINE_MAP_MAX_LOCATION_WITH_COLS - 0x200,
> > > >    LINE_MAP_MAX_LOCATION_WITH_COLS - 1,
> > > >    LINE_MAP_MAX_LOCATION_WITH_COLS,
> > > >    LINE_MAP_MAX_LOCATION_WITH_COLS + 1,
> > > > -  LINE_MAP_MAX_LOCATION_WITH_COLS + 0x100,
> > > > +  LINE_MAP_MAX_LOCATION_WITH_COLS + 0x200,
> > > >  };
> > 
> > I see that this updates the offsets from 0x100 to 0x200 for the
> > _WITH_COLS case, but doesn't for the _WITH_PACKED_RANGES case.
> > 
> > What's the reasoning here?
> > 
> > In theory we can simply add new entries to boundary_locations to get
> > more test coverage, but I don't know the extent to which this part of
> > the selftests is slowing builds down on the slower configurations; the
> > selftests are meant to be fast to run.
> > 
> 
> I needed to change it for _WITH_COLS because otherwise the test
> test_lexer_string_locations_concatenation_1 fails. This is because the
> assert_char_at_range() utility it uses does not handle the case of a token
> which straddles LINE_MAP_MAX_LOCATION_WITH_COLS; it assumes that the column
> information will be available in this case, but it is not. Once the number
> of range bits increases, the 0x100 buffer is not enough to avoid straddling
> the cutoff. I could also modify that selftest, I just did this change since
> it preserves what's currently being tested. I could change
> _WITH_PACKED_RANGES too if you prefer for consistency? It wasn't necessary
> but it will work either way, or could do both.

No need, I just wanted the reasoning documented here.


> 
> > > > 
> > > >  /* Run TESTCASE multiple times, once for each case in our test matrix.  */
> > > > @@ -3884,10 +3889,9 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
> > > > 
> > > >    /* Run all tests with:
> > > >   (a) line_table->default_range_bits == 0, and
> > > > - (b) line_table->default_range_bits == 5.  */
> > > > -  int num_cases_tested = 0;
> > > > -  for (int default_range_bits = 0; default_range_bits <= 5;
> > > > -   default_range_bits += 5)
> > > > + (b) line_table->default_range_bits == line_map_suggested_range_bits.  */
> > > > +
> > > > +  for (int default_range_bits: {0, line_map
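A small sketch of the printing approach described in the patch: cast the location value to `unsigned long long` so that one `%llu`/`%llx` format works whether the type is 32 or 64 bits wide. `my_location_t`, `USE_64BIT_LOCATION_T` and `format_location` are hypothetical stand-ins, not the names used in input.cc.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for location_t, configurable as 32 or 64 bits.  */
#ifdef USE_64BIT_LOCATION_T
typedef uint64_t my_location_t;
#else
typedef uint32_t my_location_t;
#endif

/* The explicit casts let a single format string serve both widths, instead
   of choosing a format per configuration.  Returns snprintf's result.  */
static int
format_location (char *buf, size_t size, my_location_t loc)
{
  return snprintf (buf, size, "%llu (0x%llx)",
                   (unsigned long long) loc, (unsigned long long) loc);
}
```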

Re: [PATCH] __builtin_prefetch fixes [PR117608]

2024-11-27 Thread Hongtao Liu
On Wed, Nov 27, 2024 at 8:50 PM Richard Biener  wrote:
>
> On Wed, 27 Nov 2024, Jakub Jelinek wrote:
>
> > Hi!
> >
> > The r15-4833-ge9ab41b79933 patch had among tons of config/i386
> > specific changes also important change to the generic code, allowing
> > also 2 as valid value of the second argument of __builtin_prefetch:
> > -  /* Argument 1 must be either zero or one.  */
> > -  if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
> > +  /* Argument 1 must be 0, 1 or 2.  */
> > +  if (INTVAL (op1) < 0 || INTVAL (op1) > 2)
> >
> > But the patch failed to document that change in __builtin_prefetch
> > documentation, and more importantly didn't adjust any of the other
> > backends to deal with it (my understanding is the expected behavior
> > is that 2 will be silently handled as 0 unless backends have some
> > more specific way).  Some of the backends would ICE on it, in some
> > cases gcc_assert failures/gcc_unreachable, in other cases crash later
> > (e.g. accessing arrays with that value as index and due to accessing
> > garbage after the array crashing at final.cc time), others treated 2
> > silently as 0, others treated 2 silently as 1.
> >
> > And even in the i386 backend there were bugs which caused ICEs.
> > The patch added some if (write == 0) and write 2 handling into the
> > body of an if (write == 1) (badly indented, which may be the reason),
> > rather than into the else side, so those checks would always be false.
> >
> > The new *prefetch_rst2 define_insn only accepts parameters 2 1
> > (i.e. read-shared with moderate degree of locality), so in order
> > not to ICE the patch uses it only for __builtin_prefetch (ptr, 2, 1);
> > or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values
> > of the parameter.  If that isn't what we want and we want it to be used
> > also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and
> > corresponding __builtin_ia32_prefetch, maybe the define_insn could match
> > other values.
> > And there was another problem that -mno-mmx -mno-sse -mmovrs compilation
> > would ICE on most of the prefetches, so I had to add the FAIL; cases.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> OK, please leave other (target maintainers) time to comment.
LGTM for x86 part, thanks for handling this.
>
> Richard.
>
> > 2024-11-27  Jakub Jelinek  
> >
> >   PR target/117608
> >   * doc/extend.texi (__builtin_prefetch): Document that second
> >   argument may be also 2 and its meaning.
> >   * config/i386/i386.md (prefetch): Remove unreachable code.
> >   Clear write set operands[1] to const0_rtx if !TARGET_MOVRS or
> >   if locality is not 1.  Formatting fixes.
> >   * config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE.
> >   Call gen_prefetch even for TARGET_MOVRS.
> >   * config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0.
> >   * config/mips/mips.md (prefetch): Likewise.
> >   * config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise.
> >   * config/riscv/riscv.md (prefetch): Likewise.
> >   * config/loongarch/loongarch.md (prefetch): Likewise.
> >   * config/sparc/sparc.md (prefetch): Likewise.  Use IN_RANGE.
> >   * config/ia64/ia64.md (prefetch): Likewise.
> >   * config/pa/pa.md (prefetch): Likewise.
> >   * config/aarch64/aarch64.md (prefetch): Likewise.
> >   * config/rs6000/rs6000.md (prefetch): Likewise.
> >
> >   * gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument
> >   2.
> >   * gcc.target/i386/pr117608-1.c: New test.
> >   * gcc.target/i386/pr117608-2.c: New test.
> >
> > --- gcc/doc/extend.texi.jj2024-11-26 09:37:56.173574966 +0100
> > +++ gcc/doc/extend.texi   2024-11-26 17:49:35.469152396 +0100
> > @@ -15675,9 +15675,11 @@ be in the cache by the time it is access
> >
> >  The value of @var{addr} is the address of the memory to prefetch.
> >  There are two optional arguments, @var{rw} and @var{locality}.
> > -The value of @var{rw} is a compile-time constant one or zero; one
> > -means that the prefetch is preparing for a write to the memory address
> > -and zero, the default, means that the prefetch is preparing for a read.
> > +The value of @var{rw} is a compile-time constant zero, one or two; one
> > +means that the prefetch is preparing for a write to the memory address,
> > +two means that the prefetch is preparing for a shared read (expected to be
> > +read by at least one other processor before it is written if written at
> > +all) and zero, the default, means that the prefetch is preparing for a read.
> >  The value @var{locality} must be a compile-time constant integer between
> >  zero and three.  A value of zero means that the data has no temporal
> >  locality, so it need not be left in the cache after the access.  A value
> > --- gcc/config/i386/i386.md.jj2024-11-26 09:37:56.047576735 +0100
> > +++ gcc/config/i386/i386.md   2024-11-26 18:18:34.662828702 +0100
> > @@ -2
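A short sketch of how the documented arguments might be used from C: prefetch a fixed distance ahead while summing an array, requesting a shared read (`rw` = 2) only on compilers assumed to include this change (GCC 15 or later here) and a plain read (`rw` = 0) otherwise. Per the patch description, targets without a dedicated instruction treat 2 like 0. `PREFETCH_DISTANCE`, `PREFETCH_RW` and `sum_with_prefetch` are illustrative names, and the distance is not tuned for any target.

```c
#include <stddef.h>

/* Arbitrary look-ahead distance, in elements.  */
#define PREFETCH_DISTANCE 16

/* Assumption: rw == 2 is accepted from GCC 15 on; fall back to a plain read
   prefetch elsewhere (including Clang, which only accepts 0 or 1).  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 15
# define PREFETCH_RW 2
#else
# define PREFETCH_RW 0
#endif

static long
sum_with_prefetch (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Only form addresses inside the array; the prefetch is just a hint
         and never changes the result.  Locality 1 = moderate reuse.  */
      if (i + PREFETCH_DISTANCE < n)
        __builtin_prefetch (&a[i + PREFETCH_DISTANCE], PREFETCH_RW, 1);
      sum += a[i];
    }
  return sum;
}
```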

Re: [PATCH v3] LoongArch: Mask shift offset when emit {xv,v}{srl,sll,sra} with sameimm vector

2024-11-27 Thread Lulu Cheng


On 2024/11/28 9:26 AM, Jinyang He wrote:

For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes an overflow
when emitting the {w,h,b} forms.  Since the number of bits shifted is the
register value modulo the element width, it is actually unnecessary to
constrain the range.  Simply mask the shift amount with the element bit width
minus one, without any constraint on the shift range.
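The masking rule can be sketched as a tiny C helper that mirrors `val & (GET_MODE_UNIT_BITSIZE (mode) - 1)` from the new output code; `mask_shift_amount` is a hypothetical name. For byte elements only shift amounts 0 to 7 are meaningful, so a replicated constant of 9 becomes 1, consistent with the remainder behaviour the register form is said to have.

```c
#include <stdint.h>

/* Hypothetical helper: reduce a replicated shift constant VAL to the range
   accepted for elements of UNIT_BITS bits (8, 16, 32 or 64; must be a
   power of two).  */
static unsigned
mask_shift_amount (uint64_t val, unsigned unit_bits)
{
  return (unsigned) (val & (uint64_t) (unit_bits - 1));
}
```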

gcc/ChangeLog:

* config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6,
add Uuvx as replicated vector const with unsigned range [0,umax].
* config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift
offset by its unit bits.
* config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_const_vector_same_int_p): Set default for low and high.
* config/loongarch/predicates.md: Replace reg_or_vector_same_uimm6_operand
with reg_or_vector_same_uimm_operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test.
---
v2: Fix indent in lsx.md and lasx.md.
 Use "dg-do assemble" in the tests, as suggested by Ruoyao.
v3: Re-enable scan-assembler.


LGTM!

Thanks!



  gcc/config/loongarch/constraints.md   | 14 ++--
  gcc/config/loongarch/lasx.md  | 60 
  gcc/config/loongarch/loongarch-protos.h   |  5 +-
  gcc/config/loongarch/lsx.md   | 60 
  gcc/config/loongarch/predicates.md|  8 +--
  .../vector/lasx/lasx-shift-sameimm-vec.c  | 72 +++
  .../vector/lsx/lsx-shift-sameimm-vec.c| 72 +++
  7 files changed, 254 insertions(+), 37 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index 18da8b31f49..66ef1073fad 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -334,19 +334,19 @@
(and (match_code "const_vector")
 (match_test "loongarch_const_vector_same_int_p (op, mode, -16, 15)")))
  
-(define_constraint "Uuv6"
-  "@internal
-   A replicated vector const in which the replicated value is in the range
-   [0,63]."
-  (and (match_code "const_vector")
-   (match_test "loongarch_const_vector_same_int_p (op, mode, 0, 63)")))
-
  (define_constraint "Urv8"
"@internal
 A replicated vector const with replicated byte values as well as elements"
(and (match_code "const_vector")
 (match_test "loongarch_const_vector_same_bytes_p (op, mode)")))
  
+(define_constraint "Uuvx"
+  "@internal
+   A replicated vector const in which the replicated value is in the unsigned
+   range [0,umax]."
+  (and (match_code "const_vector")
+   (match_test "loongarch_const_vector_same_int_p (op, mode)")))
+
  (define_memory_constraint "ZC"
"A memory operand whose address is formed by a base register and offset
 that is suitable for use in instructions with the same addressing mode
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 457ed163f31..90778dd8ff9 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1013,11 +1013,23 @@
[(set (match_operand:ILASX 0 "register_operand" "=f,f")
(lshiftrt:ILASX
  (match_operand:ILASX 1 "register_operand" "f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))]
+ (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))]
"ISA_HAS_LASX"
-  "@
-   xvsrl.\t%u0,%u1,%u2
-   xvsrli.\t%u0,%u1,%E2"
+{
+  switch (which_alternative)
+{
+case 0:
+  return "xvsrl.\t%u0,%u1,%u2";
+case 1:
+  {
+   unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 0));
+   operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1));
+   return "xvsrli.\t%u0,%u1,%d2";
+  }
+default:
+  gcc_unreachable ();
+}
+}
[(set_attr "type" "simd_shift")
 (set_attr "mode" "")])
  
@@ -1026,11 +1038,23 @@
[(set (match_operand:ILASX 0 "register_operand" "=f,f")
(ashiftrt:ILASX
  (match_operand:ILASX 1 "register_operand" "f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))]
+ (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))]
"ISA_HAS_LASX"
-  "@
-   xvsra.\t%u0,%u1,%u2
-   xvsrai.\t%u0,%u1,%E2"
+{
+  switch (which_alternative)
+{
+case 0:
+  return "xvsra.\t%u0,%u1,%u2";
+case 1:
+  {
+   unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 0));
+   operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1));
+   return "xvsrai.\t%u0,%u1,%d2";
+  }
+default:
+  gc

Re: [PATCH 1/2] RISC-V: Add intrinsics support for SiFive Xsfvqmaccqoq/dod extensions.

2024-11-27 Thread Kito Cheng
Oh, I see the second patch just adds test cases. I think all the
comments are minor, so there's no need for a v2; I will address those
minor changes and commit after verifying them on my side :)

On Thu, Nov 28, 2024 at 2:52 PM Kito Cheng  wrote:
>
> [...snip...]


Re: [PATCH 1/2] RISC-V: Add intrinsics support for SiFive Xsfvqmaccqoq/dod extensions.

2024-11-27 Thread Kito Cheng
I'm inclined to just keep sifive-vector. Although it is actually an IME
(Integrated Matrix Extension), we don't have that many instructions, and
we will have more SiFive vector extensions that are not IME, so I think
putting it in sifive-vector is fine.

This patch is generally in good shape; I assume we just need one more
version :)

On Thu, Nov 28, 2024 at 10:39 AM  wrote:
>
> From: yulong 
>
> This commit adds intrinsics support for Xsfvqmaccqoq/dod.
>
> Co-Authored by: Kito Cheng 
> Co-Authored by: Monk Chiang 

Please drop me and Monk; we weren't really involved in the development,
so just keeping the PLCT folks is fine.

> diff --git a/gcc/config/riscv/sifive-vector.md b/gcc/config/riscv/sifive-vector.md
> new file mode 100644
> index 000..373e4d6dd86
> --- /dev/null
> +++ b/gcc/config/riscv/sifive-vector.md
> @@ -0,0 +1,179 @@

Please remove the following comment.

> +;; Keep this list and the one above riscv_print_operand in sync.
> +;; The special asm out single letter directives following a '%' are:
> +;; h -- Print the high-part relocation associated with OP, after stripping
> +;;   any outermost HIGH.
> +;; R -- Print the low-part relocation associated with OP.
> +;; C -- Print the integer branch condition for comparison OP.
> +;; A -- Print the atomic operation suffix for memory model OP.
> +;; F -- Print a FENCE if the memory model requires a release.
> +;; z -- Print x0 if OP is zero, otherwise print OP normally.
> +;; i -- Print i if the operand is not a register.
> +;; S -- Print shift-index of single-bit mask OP.
> +;; T -- Print shift-index of inverted single-bit mask OP.
> +;; ~ -- Print w if TARGET_64BIT is true; otherwise not print anything.

Up to here.


> diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> index 92cb651ce49..850fac1ba22 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -103,6 +103,7 @@
>UNSPEC_WREDUC_SUM_ORDERED
>UNSPEC_WREDUC_SUM_UNORDERED
>UNSPEC_SELECT_MASK
> +

^ Drop this unnecessary blank line.


Re: [RFC/PATCH] c++: Unwrap type traits defined in terms of builtins within concept diagnostics [PR117294]

2024-11-27 Thread Patrick Palka
On Fri, 8 Nov 2024, Nathaniel Shead wrote:

> Does this approach seem reasonable?  I'm pretty sure that the way I've
> handled the templating here is not ideal, but I'm not sure what a neat way
> to do what I'm trying to do here would be; any comments are welcome.

Clever approach, I like it!

> 
> -- >8 --
> 
> Currently, concept failures of standard type traits just report
> 'expression X evaluates to false'.  However, many type traits are
> actually defined in terms of compiler builtins; we can do better here.
> For instance, 'is_constructible_v' could go on to explain why the type
> is not constructible, or 'is_invocable_v' could list potential
> candidates.

That'd be a great improvement.

> 
> As a first step to supporting that we need to be able to map the
> standard type traits to the builtins that they use.  Rather than adding
> another list that would need to be kept up-to-date whenever a builtin is
> added, this patch instead tries to detect any variable template defined
> directly in terms of a TRAIT_EXPR.
> 
> To avoid false positives, we ignore any variable templates that have any
> specialisations (partial or explicit), even if we wouldn't have chosen
> that specialisation anyway.  This shouldn't affect any of the standard
> library type traits that I could see.

You should be able to tsubst the TEMPLATE_ID_EXPR directly and look at
its TI_PARTIAL_INFO in order to determine which (if any) partial
specialization was selected.  And if an explicit specialization was
selected the resulting VAR_DECL will have DECL_TEMPLATE_SPECIALIZATION
set.

> 
> The new diagnostics from this patch are not immediately much better;
> however, it would be relatively straightforward to update the messages
> in 'diagnose_trait_expr' to provide these new details.
> 
> This logic could also perhaps be used by 'diagnose_failing_condition' so
> that cases like 'static_assert(std::is_constructible_v)' get the same
> treatment; this patch doesn't attempt to update this yet.
> 
>   PR c++/117294
>   PR c++/113854
> 
> gcc/cp/ChangeLog:
> 
>   * constraint.cc (diagnose_trait_expr): Take location to diagnose
>   at explicitly.
>   (maybe_unwrap_standard_trait): New function.
>   (diagnose_atomic_constraint): Use it; pass in the location of
>   the atomic constraint to diagnose_trait_expr.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-traits4.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/constraint.cc  | 52 +--
>  gcc/testsuite/g++.dg/cpp2a/concepts-traits4.C | 31 +++
>  2 files changed, 79 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-traits4.C
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 8a36b9c88c4..c683e6a44dd 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3108,10 +3108,8 @@ get_constraint_error_location (tree t)
>  /* Emit a diagnostic for a failed trait.  */
>  
>  static void
> -diagnose_trait_expr (tree expr, tree args)
> +diagnose_trait_expr (location_t loc, tree expr, tree args)
>  {
> -  location_t loc = cp_expr_location (expr);
> -
>/* Build a "fake" version of the instantiated trait, so we can
>   get the instantiated types from result.  */
>++processing_template_decl;
> @@ -3323,6 +3321,51 @@ diagnose_trait_expr (tree expr, tree args)
>  }
>  }
>  
> +/* Attempt to detect if this is a standard type trait, defined in terms
> +   of a compiler builtin (above).  If so, this will allow us to provide
> +   slightly more helpful diagnostics.
> +
> +   However, don't unwrap if the type has been specialized (even if we
> +   wouldn't have used said specialization).  */
> +
> +static void
> +maybe_unwrap_standard_trait (tree *expr, tree *args)
> +{
> +  if (TREE_CODE (*expr) != TEMPLATE_ID_EXPR)
> +return;
> +
> +  tree templ = TREE_OPERAND (*expr, 0);
> +  if (TREE_CODE (templ) != TEMPLATE_DECL
> +  || !variable_template_p (templ))
> +return;
> +
> +  tree gen_tmpl = most_general_template (templ);
> +  if (DECL_TEMPLATE_SPECIALIZATIONS (gen_tmpl))
> +return;
> +
> +  for (tree inst = DECL_TEMPLATE_INSTANTIATIONS (gen_tmpl);
> +   inst; inst = TREE_CHAIN (inst))
> +if (DECL_TEMPLATE_SPECIALIZATION (TREE_VALUE (inst)))
> +  return;
> +
> +  tree pattern = DECL_TEMPLATE_RESULT (gen_tmpl);
> +  tree initial = DECL_INITIAL (pattern);
> +  if (TREE_CODE (initial) != TRAIT_EXPR)
> +return;
> +
> +  /* At this point we're definitely providing a TRAIT_EXPR, update
> + *expr to point at it and provide remapped *args for it.  */
> +  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (gen_tmpl);
> +  tree targs = TREE_OPERAND (*expr, 1);
> +  if (targs)
> +targs = tsubst_template_args (targs, *args, tf_none, NULL_TREE);
> +  targs = add_outermost_template_args (templ, targs);
> +  targs = coerce_template_parms (parms, targs, templ, tf_none);

If we substituted the TEMPLATE_ID_EXPR as

Re: PING [RFC] Prevent the scheduler from moving prefetch instructions when expanding __builtin_prefetch [PR 116713]

2024-11-27 Thread Jeff Law



On 11/27/24 5:59 AM, Oleg Endo wrote:

Hi,

Can the issue be resolved in a target independent manner as suggested below?
Or is it better to deal with this in the target code?
A blockage barrier like this patch does is probably significant 
overkill, particularly since it's emitted at the gimple->RTL expansion 
phase.


If we wanted to tackle this I'd like to hope we could do it in the 
scheduler itself to minimize overall impacts.


jeff



Re: [PATCH] genrecog: Split into separate partitions [PR111600].

2024-11-27 Thread Jeff Law



On 11/26/24 6:44 AM, Robin Dapp wrote:

Hi,

this patch makes genrecog split its output into separate files (10 by
default) in the same vein genemit does.  The changes are mostly
mechanical again, changing printfs and puts to fprintf.
As insn-recog.cc relies on being able to call other recog functions, a
header insn-recog.h is introduced that predeclares all of them.

For simplicity the number of files is determined by (re-using)
--with-insnemit-partitions.  Naming suggestions welcome :)

Bootstrapped and regtested on x86 and power10, regtested on riscv.
aarch64 bootstrap is currently blocked because of the
"maybe uninitialized" issue discussed on IRC.

Regards
  Robin

gcc/ChangeLog:

* Makefile.in: Add insn-recog split.
* configure.ac: Document that the number of insnemit partitions is
used for insn-recog as well.
* genconditions.cc (write_one_condition): Use fprintf.
* genpreds.cc (write_predicate_expr): Ditto.
(write_init_reg_class_start_regs): Ditto.
* genrecog.cc (write_header): Add header file to includes.
(printf_indent): Use fprintf.
(change_state): Ditto.
(print_code): Ditto.
(print_host_wide_int): Ditto.
(print_parameter_value): Ditto.
(print_test_rtx): Ditto.
(print_nonbool_test): Ditto.
(print_label_value): Ditto.
(print_test): Ditto.
(print_decision): Ditto.
(print_state): Ditto.
(print_subroutine_call): Ditto.
(print_acceptance): Ditto.
(print_subroutine_start): Ditto.
(print_pattern): Ditto.
(print_subroutine): Ditto.
(print_subroutine_group): Ditto.
(handle_arg): Add -O and -H for output and header file handling.
(main): Use callback.
* gentarget-def.cc (def_target_insn): Use fprintf.
* read-md.cc (md_reader::print_c_condition): Ditto.
* read-md.h (class md_reader): Ditto.
---




@@ -4293,19 +4295,19 @@ write_header (void)
 pattern that matched.  This is the same as the order in the machine\n\
 description of the entry that matched.  This number can be used as an\n\
 index into `insn_data' and other tables.\n");
-  puts ("\
+  fprintf (f, "%s", "\
Some might argue that puts->fputs would have been better.  But I think 
we've got transformations that will turn an fprintf like that into an fputs.
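One detail to keep in mind with this mechanical conversion: `puts` appends a newline, whereas `fputs` and `fprintf (f, "%s", s)` write the string verbatim. A converted call therefore needs an explicit `\n` wherever the original relied on `puts`. The sketch below uses only standard stdio and a `tmpfile()` to check that the `fprintf "%s"` form and the `fputs` form emit identical bytes. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Read back everything written to F so far (F must be an update stream).
static std::string slurp(std::FILE *f)
{
  std::string out;
  std::rewind(f);  // required before switching from writing to reading
  int c;
  while ((c = std::fgetc(f)) != EOF)
    out.push_back(static_cast<char>(c));
  return out;
}

// Hypothetical helper: emit TEXT with fprintf and a "%s" format.
std::string emit_with_fprintf(const char *text)
{
  std::FILE *f = std::tmpfile();
  assert(f != nullptr);
  std::fprintf(f, "%s", text);
  std::string result = slurp(f);
  std::fclose(f);
  return result;
}

// Hypothetical helper: emit TEXT with fputs, which (unlike puts) adds no
// trailing newline.
std::string emit_with_fputs(const char *text)
{
  std::FILE *f = std::tmpfile();
  assert(f != nullptr);
  std::fputs(text, f);
  std::string result = slurp(f);
  std::fclose(f);
  return result;
}
```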


As you note the vast majority of the changes are mechanical.  I probably 
would have split that into a preparatory patch, but no sense in doing it 
now.


OK for the trunk.  There's already an rv64 bootstrap running on my BPI 
(which should give us some data on the genemit changes), I think this 
version should start its bootstrap cycle tomorrow with data later on Friday.


Jeff


Ping: [PATCH v4 0/1] C: Support Function multiversioning in the C front end

2024-11-27 Thread Alfie Richards
Hi,

Ping for this.  If possible, I'd be grateful to know whether the direction
of this patch is okay before working on it further.

I have played around with it more and have some improvements I can make
(pending clarification PRs to the ACLE), but would like feedback on the work
so far before pushing on with that further.

Kind regards,
Alfie

On 15/11/2024 16:36, alfie.richa...@arm.com wrote:

From: Alfie Richards 

Hi Joseph and all,

I worked through Joseph's feedback, and as I fixed certain issues I came to the
conclusion that he was correct that a rethink was required.

I reworked this to only have the one FMV binding for each function set which
gets replaced with the dispatched symbol decl when the second target is
processed.
I prefer this code structure, and think it is far clearer to the reader than
what was happening previously.

I also added some more tests for more error situations in particular.

This probably still has some rough edges, but I'm hoping it's a better direction
overall.
Note that this slightly changes the interface for backend hooks. In particular,
it will require changing the assumptions for get_function_versions_dispatcher,
as it can now be called before all the functions have been processed in the C
frontend (possibly without a default version).
This shouldn't be a big deal, but it will require work in other backends if we
want to enable this for them as well.
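Conceptually, function multiversioning provides several definitions of one function, one per target feature set plus a default, together with a dispatcher that selects one of them once at run time. The portable sketch below hand-writes that shape with a resolver and a cached function pointer. It is only an analogue to illustrate the idea, not what the C front end or the aarch64 backend emits (GCC's real dispatcher is typically an ifunc resolver). `cpu_has_feature`, `dispatched_add` and the other names are hypothetical.

```cpp
#include <cassert>

using add_fn = int (*)(int, int);

// Two hypothetical versions of one function: a default and a
// feature-specific one (imagine it using extra instructions).
static int add_default(int a, int b) { return a + b; }
static int add_feature(int a, int b) { return a + b; }

// Hypothetical feature probe; a real resolver would query the CPU.
static bool cpu_has_feature() { return false; }

// Resolver: choose the best available version.  A default version must
// exist so that resolution always has something to fall back to.
static add_fn resolve_add()
{
  add_fn chosen = cpu_has_feature() ? add_feature : add_default;
  assert(chosen != nullptr);
  return chosen;
}

// Dispatcher: the single symbol callers bind to.  The function-local
// static is initialized once, thread-safely, on the first call.
int dispatched_add(int a, int b)
{
  static const add_fn impl = resolve_add();
  return impl(a, b);
}
```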

Kind regards,
Alfie Richards


Alfie Richards (1):
   C: Support Function multiversioning in the C front end

  gcc/attribs.cc|  21 ++-
  gcc/attribs.h |   3 +-
  gcc/c-family/c-gimplify.cc|  13 ++
  gcc/c/c-decl.cc   | 162 --
  gcc/calls.cc  |  17 ++
  gcc/calls.h   |   2 +
  gcc/cgraph.cc |  61 ++-
  gcc/cgraphunit.cc |  23 +++
  gcc/config/aarch64/aarch64.cc |  29 +---
  .../g++.target/aarch64/mv-symbols10.C |  42 +
  .../g++.target/aarch64/mv-symbols6.C  |  16 ++
  .../g++.target/aarch64/mv-symbols8.C  |  47 +
  .../g++.target/aarch64/mv-symbols9.C  |  44 +
  gcc/testsuite/gcc.target/aarch64/mv-1.c   |  40 +
  .../gcc.target/aarch64/mv-symbols-error1.c|  11 ++
  .../gcc.target/aarch64/mv-symbols-error10.c   |  11 ++
  .../gcc.target/aarch64/mv-symbols-error2.c|   8 +
  .../gcc.target/aarch64/mv-symbols-error3.c|   8 +
  .../gcc.target/aarch64/mv-symbols-error4.c|   8 +
  .../gcc.target/aarch64/mv-symbols-error5.c|  11 ++
  .../gcc.target/aarch64/mv-symbols-error6.c|   8 +
  .../gcc.target/aarch64/mv-symbols-error7.c|  12 ++
  .../gcc.target/aarch64/mv-symbols-error8.c|  11 ++
  .../gcc.target/aarch64/mv-symbols-error9.c|  10 ++
  .../gcc.target/aarch64/mv-symbols1.c  |  38 
  .../gcc.target/aarch64/mv-symbols10.c |  42 +
  .../gcc.target/aarch64/mv-symbols11.c |  16 ++
  .../gcc.target/aarch64/mv-symbols2.c  |  28 +++
  .../gcc.target/aarch64/mv-symbols3.c  |  27 +++
  .../gcc.target/aarch64/mv-symbols4.c  |  31 
  .../gcc.target/aarch64/mv-symbols5.c  |  36 
  .../gcc.target/aarch64/mv-symbols6.c  |  16 ++
  .../gcc.target/aarch64/mv-symbols7.c  |  47 +
  .../gcc.target/aarch64/mv-symbols8.c  |  47 +
  .../gcc.target/aarch64/mv-symbols9.c  |  44 +
  .../gcc.target/aarch64/mvc-symbols1.c |  25 +++
  .../gcc.target/aarch64/mvc-symbols2.c |  15 ++
  .../gcc.target/aarch64/mvc-symbols3.c |  19 ++
  .../gcc.target/aarch64/mvc-symbols4.c |  12 ++
  39 files changed, 1011 insertions(+), 50 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols10.C
  create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols6.C
  create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols8.C
  create mode 100644 gcc/testsuite/g++.target/aarch64/mv-symbols9.C
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error10.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error3.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error5.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error6.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error7.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error8.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols-error9.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols1.c
  create mode 100644 gcc/testsuit

Re: [PATCH 13/15] Support for 64-bit location_t: Internal parts

2024-11-27 Thread David Malcolm
On Wed, 2024-11-27 at 14:56 +0100, Richard Biener wrote:
> On Sun, Nov 3, 2024 at 11:28 PM Lewis Hyatt  wrote:
> > 
> > Several of the selftests in diagnostic-show-locus.cc and input.cc are
> > sensitive to linemap internals. Adjust them here so they will support
> > 64-bit location_t if configured.
> > 
> > Likewise, handle 64-bit location_t in the support for
> > -fdump-internal-locations. As was done with the analyzer, convert to
> > (unsigned long long) explicitly so that 32- and 64-bit can be handled
> > with the same printf formats.
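The widening technique described here can be sketched in a few lines. The `location_t` typedef and the `SKETCH_64BIT_LOCATION_T` macro below are hypothetical stand-ins, not GCC's definitions. Whichever width is selected, the value is cast to `unsigned long long` so one `%llu` format works for both.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a location type whose width is chosen at
// configure time.
#ifdef SKETCH_64BIT_LOCATION_T
typedef std::uint64_t location_t;
#else
typedef std::uint32_t location_t;
#endif

// Format a location with a single printf format by widening it first,
// so the format string does not depend on the configured width.
std::string format_location(location_t loc)
{
  char buf[32];
  int n = std::snprintf(buf, sizeof buf, "%llu",
                        static_cast<unsigned long long>(loc));
  assert(n > 0 && n < static_cast<int>(sizeof buf));
  return buf;
}
```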
> 
> I was hoping David would have a look here.  Absent comments from him,
> this is OK when all else is approved and after giving him another
> week.

Mostly looks good, but I have a couple of questions below...


> 
> What's missing review now?  I've lost track ...
> 
> Thanks,
> Richard.
> 
> > gcc/ChangeLog:
> > 
> >     * diagnostic-show-locus.cc
> >     (test_one_liner_fixit_validation_adhoc_locations): Adapt so it can
> >     effectively test 7-bit ranges instead of 5-bit ranges.
> >     (test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
> >     * input.cc (get_end_location): Adjust types to support 64-bit
> >     location_t.
> >     (write_digit_row): Likewise.
> >     (dump_location_range): Likewise.
> >     (dump_location_info): Likewise.
> >     (class line_table_case): Likewise.
> >     (test_accessing_ordinary_linemaps): Replace some hard-coded
> >     constants with the values defined in line-map.h.
> >     (for_each_line_table_case): Likewise.
> > ---
> >  gcc/diagnostic-show-locus.cc | 128 +--
> >  gcc/input.cc | 100 ++-
> >  2 files changed, 157 insertions(+), 71 deletions(-)
> > 

[...snip...]

> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index 04462ef6f5a..1629e4aeee8 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc

[...snip...]

> > @@ -3865,11 +3870,11 @@ static const location_t boundary_locations[] = {
> >    LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES + 0x100,
> > 
> >    /* Values near LINE_MAP_MAX_LOCATION_WITH_COLS.  */
> > -  LINE_MAP_MAX_LOCATION_WITH_COLS - 0x100,
> > +  LINE_MAP_MAX_LOCATION_WITH_COLS - 0x200,
> >    LINE_MAP_MAX_LOCATION_WITH_COLS - 1,
> >    LINE_MAP_MAX_LOCATION_WITH_COLS,
> >    LINE_MAP_MAX_LOCATION_WITH_COLS + 1,
> > -  LINE_MAP_MAX_LOCATION_WITH_COLS + 0x100,
> > +  LINE_MAP_MAX_LOCATION_WITH_COLS + 0x200,
> >  };

I see that this updates the offsets from 0x100 to 0x200 for the
_WITH_COLS case, but doesn't for the _WITH_PACKED_RANGES case.

What's the reasoning here?

In theory we can simply add new entries to boundary_locations to get
more test coverage, but I don't know the extent to which this part of
the selftests is slowing builds down on the slower configurations; the
selftests are meant to be fast to run.

> > 
> >  /* Run TESTCASE multiple times, once for each case in our test matrix.  */
> > @@ -3884,10 +3889,9 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
> > 
> >    /* Run all tests with:
> >   (a) line_table->default_range_bits == 0, and
> > - (b) line_table->default_range_bits == 5.  */
> > -  int num_cases_tested = 0;
> > -  for (int default_range_bits = 0; default_range_bits <= 5;
> > -   default_range_bits += 5)
> > + (b) line_table->default_range_bits == line_map_suggested_range_bits.  */
> > +
> > +  for (int default_range_bits: {0, line_map_suggested_range_bits})
> >  {
> >    /* ...and use each of the "interesting" location values as
> >  the starting location within line_table.  */
> > @@ -3895,15 +3899,9 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
> >    for (int loc_idx = 0; loc_idx < num_boundary_locations; loc_idx++)
> >     {
> >   line_table_case c (default_range_bits, boundary_locations[loc_idx]);
> > -
> >   testcase (c);
> > -
> > - num_cases_tested++;
> >     }
> >  }

I see that this eliminates the tracking of num_cases_tested and the
assert on it below.  Was this deliberate, or was the removal meant to
be a temporary thing whilst developing the patch kit?
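If the counter is worth keeping, it fits the new range-based loop without difficulty. The constexpr sketch below uses placeholder data. `kSuggestedRangeBits`, `kBoundaryLocations` and `count_cases` are hypothetical stand-ins, not the real `line_map_suggested_range_bits` or `boundary_locations` table. Note that a range-for over a braced list such as `{0, x}` needs `<initializer_list>`.

```cpp
#include <cstddef>
#include <initializer_list>

// Placeholder data standing in for the selftest's boundary_locations[]
// table and line_map_suggested_range_bits (values are illustrative only).
constexpr int kSuggestedRangeBits = 7;
constexpr unsigned kBoundaryLocations[] = {0u, 1u, 0x100u, 0x200u};
constexpr std::size_t kNumBoundaryLocations
  = sizeof kBoundaryLocations / sizeof kBoundaryLocations[0];

// Same loop shape as the new for_each_line_table_case, but still
// counting the cases it visits.
constexpr std::size_t count_cases()
{
  std::size_t num_cases_tested = 0;
  for (int default_range_bits : {0, kSuggestedRangeBits})
    for (std::size_t loc_idx = 0; loc_idx < kNumBoundaryLocations; loc_idx++)
      {
        // A real test would build a line_table_case from these two values.
        (void) default_range_bits;
        (void) kBoundaryLocations[loc_idx];
        num_cases_tested++;
      }
  return num_cases_tested;
}

// Verify that we fully covered the test matrix.
static_assert(count_cases() == 2 * kNumBoundaryLocations,
              "every (range bits, location) pair should be visited");
```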


> > -
> > -  /* Verify that we fully covered the test matrix.  */
> > -  ASSERT_EQ (num_cases_tested, 2 * 12);
> >  }
> > 
> >  /* Verify that when presented with a consecutive pair of locations with
> 

Otherwise looks good to me.

Thanks
Dave



Re: optimize basic_string

2024-11-27 Thread Jonathan Wakely
On Tue, 26 Nov 2024 at 15:43, Jan Hubicka  wrote:
>
> Hi,
> here is updated patch.
>
> I am not certain if:
> const size_t __diffmax
>   = __gnu_cxx::__numeric_traits::__max / sizeof(_CharT);
> really needs "/ sizeof (_CharT)".  I think we only need to be able to
> compute the difference between two entries in multiples of sizeof(_CharT)?
> However, it is consistent with what std::vector does.  Do we really need
> to be able to represent differences in number of bytes as well?
>
> I was also thinking about the API change implications.  I hope that
> real code does not need to know the max size, and that it is mostly needed
> to let GCC's VRP optimize the memory allocations.
>
> I also noticed that this patch triggers the empty-loop.C failure, which I
> originally attributed to a different change.  I filed PR117764 on that.
> We are no longer able to eliminate empty loops early, but we still
> optimize them late.
>
> I also changed empty() to do the test directly (as done by std::vector
> and friends) instead of computing the size.  I do not think having
> a __builtin_unreachable range check is useful there, so this should save
> some compile-time effort.

Yes, that makes sense.

> Bootstrapped/regtested x86_64-linux, OK?

Looks good, OK for trunk, thanks.

>
> libstdc++-v3/ChangeLog:
>
> * include/bits/basic_string.h (basic_string::size(),
> basic_string::length(), basic_string::capacity()): Add
> __builtin_unreachable to declare value ranges.
> (basic_string::empty): Inline test.
> (basic_string::max_size()): Account correctly the terminating 0
> and limits implied by ptrdiff_t.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/empty-loop.C: xfail optimization at cddce2 and check
> it happens at cddce3.
> * g++.dg/tree-ssa/string-1.C: New test.
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C b/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C
> index ed4a603bf5b..b7e7e27cc04 100644
> --- a/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C
> @@ -30,5 +30,8 @@ int foo (vector &v, list &l, set &s, map &m
>
>return 0;
>  }
> -/* { dg-final { scan-tree-dump-not "if" "cddce2"} } */
> +/* Adding __builtin_unreachable to std::string::size() prevents cddce2 from
> +   eliminating the loop early, see PR117764.  */
> +/* { dg-final { scan-tree-dump-not "if" "cddce2" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-not "if" "cddce3"} } */
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/string-1.C b/gcc/testsuite/g++.dg/tree-ssa/string-1.C
> new file mode 100644
> index 000..d38c23a7628
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/string-1.C
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -std=c++20 -fdump-tree-optimized" } */
> +#include 
> +std::string
> +test (std::string &a)
> +{
> +   return a;
> +}
> +/* { dg-final { scan-tree-dump-not "throw" "optimized" } } */
> diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
> index f5b320099b1..17b973c8b45 100644
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -1079,20 +1079,30 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>_GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>size_type
>size() const _GLIBCXX_NOEXCEPT
> -  { return _M_string_length; }
> +  {
> +   size_type __sz = _M_string_length;
> +   if (__sz > max_size ())
> + __builtin_unreachable ();
> +   return __sz;
> +  }
>
>///  Returns the number of characters in the string, not including any
>///  null-termination.
>_GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>size_type
>length() const _GLIBCXX_NOEXCEPT
> -  { return _M_string_length; }
> +  { return size(); }
>
>///  Returns the size() of the largest possible %string.
>_GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
>size_type
>max_size() const _GLIBCXX_NOEXCEPT
> -  { return (_Alloc_traits::max_size(_M_get_allocator()) - 1) / 2; }
> +  {
> +   const size_t __diffmax
> + = __gnu_cxx::__numeric_traits::__max / sizeof(_CharT);
> +   const size_t __allocmax = _Alloc_traits::max_size(_M_get_allocator());
> +   return (std::min)(__diffmax, __allocmax) - 1;
> +  }
>
>/**
> *  @brief  Resizes the %string to the specified number of characters.
> @@ -1184,8 +1194,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>size_type
>capacity() const _GLIBCXX_NOEXCEPT
>{
> -   return _M_is_local() ? size_type(_S_local_capacity)
> -: _M_allocated_capacity;
> +   size_t __sz = _M_is_local() ? size_type(_S_local_capacity)
> +: _M_allocated_capacity;
> +   if (__sz < _S_local_capacity || __sz > max_size ())
> + __builtin_unreachable ();
> +   return __sz;
>
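The arithmetic proposed for `max_size()` can be checked in isolation as a constexpr free function. In the sketch below, `alloc_max` stands in for `_Alloc_traits::max_size()`, and `string_max_size` and `checked_size` are hypothetical names. It simply mirrors the patch's choice of dividing by `sizeof(CharT)`, without settling the question raised above. The `__builtin_unreachable` branch shows how `size()` hands value-range propagation an upper bound without adding a runtime check.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Free-function version of the proposed max_size() arithmetic.
template<typename CharT>
constexpr std::size_t string_max_size(std::size_t alloc_max)
{
  // Largest element count whose size in bytes still fits in ptrdiff_t.
  const std::size_t diff_max
    = static_cast<std::size_t>(std::numeric_limits<std::ptrdiff_t>::max())
      / sizeof(CharT);
  // One element is reserved for the terminating null character.
  return (std::min)(diff_max, alloc_max) - 1;
}

// Hypothetical size() accessor: the unreachable branch tells the optimizer
// that the stored length never exceeds max_size().
template<typename CharT>
constexpr std::size_t checked_size(std::size_t stored_length,
                                   std::size_t alloc_max)
{
  if (stored_length > string_max_size<CharT>(alloc_max))
    __builtin_unreachable();
  return stored_length;
}
```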

Re: [PATCH v4] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-11-27 Thread Marek Polacek
Ping.

On Wed, Nov 06, 2024 at 03:33:00PM -0500, Marek Polacek wrote:
> On Mon, Nov 04, 2024 at 11:10:05PM -0500, Jason Merrill wrote:
> > On 10/30/24 4:59 PM, Marek Polacek wrote:
> > > On Wed, Oct 30, 2024 at 09:01:36AM -0400, Patrick Palka wrote:
> > > > On Tue, 29 Oct 2024, Marek Polacek wrote:
> > > --- a/gcc/cp/cp-tree.h
> > > +++ b/gcc/cp/cp-tree.h
> > > @@ -451,6 +451,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
> > > ATOMIC_CONSTR_MAP_INSTANTIATED_P (in ATOMIC_CONSTR)
> > > contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT)
> > > RETURN_EXPR_LOCAL_ADDR_P (in RETURN_EXPR)
> > > +  PACK_INDEX_PARENTHESIZED_P (in PACK_INDEX_*)
> > >  1: IDENTIFIER_KIND_BIT_1 (in IDENTIFIER_NODE)
> > > TI_PENDING_TEMPLATE_FLAG.
> > > TEMPLATE_PARMS_FOR_INLINE.
> > > @@ -2258,7 +2259,8 @@ enum languages { lang_c, lang_cplusplus };
> > >  || TREE_CODE (T) == BOUND_TEMPLATE_TEMPLATE_PARM \
> > >  || TREE_CODE (T) == DECLTYPE_TYPE\
> > >  || TREE_CODE (T) == TRAIT_TYPE   \
> > > -   || TREE_CODE (T) == DEPENDENT_OPERATOR_TYPE)
> > > +   || TREE_CODE (T) == DEPENDENT_OPERATOR_TYPE   \
> > > +   || PACK_INDEX_P (T))
> > 
> > Just PACK_INDEX_TYPE here, I think?
> 
> I think so too.  Done.
> 
> > >   /* Nonzero if T is a class (or struct or union) type.  Also nonzero
> > >  for template type parameters, typename types, and instantiated
> > > @@ -4001,6 +4003,9 @@ struct GTY(()) lang_decl {
> > >   #define PACK_EXPANSION_CHECK(NODE) \
> > > TREE_CHECK2 (NODE, TYPE_PACK_EXPANSION, EXPR_PACK_EXPANSION)
> > > +#define PACK_INDEX_CHECK(NODE) \
> > > +  TREE_CHECK2 (NODE, PACK_INDEX_TYPE, PACK_INDEX_EXPR)
> > > +
> > >   /* Extracts the type or expression pattern from a TYPE_PACK_EXPANSION or
> > >  EXPR_PACK_EXPANSION.  */
> > >   #define PACK_EXPANSION_PATTERN(NODE)\
> > > @@ -4025,6 +4030,22 @@ struct GTY(()) lang_decl {
> > >   ? &TYPE_MAX_VALUE_RAW (NODE)\
> > >   : &TREE_OPERAND ((NODE), 2))
> > > +/* True if NODE is a pack index.  */
> > > +#define PACK_INDEX_P(NODE) \
> > > +  (TREE_CODE (NODE) == PACK_INDEX_TYPE \
> > > +   || TREE_CODE (NODE) == PACK_INDEX_EXPR)
> > > +
> > > +/* For a pack index T...[N], the pack expansion T.  */
> > 
> > "the pack expansion T..."?
> 
> Done.
>  
> > > +#define PACK_INDEX_PACK(NODE) \
> > > +  (TREE_CODE (PACK_INDEX_CHECK (NODE)) == PACK_INDEX_TYPE \
> > > +   ? TREE_TYPE (NODE) : TREE_OPERAND (NODE, 0))
> > > +
> > > +/* For a pack index T...[N], the index N.  */
> > > +#define PACK_INDEX_INDEX(NODE) \
> > > +  *(TREE_CODE (PACK_INDEX_CHECK (NODE)) == PACK_INDEX_TYPE \
> > > +? &TYPE_MAX_VALUE_RAW (NODE) \
> > > +: &TREE_OPERAND ((NODE), 1))
> > > +
> > >   /* True iff this pack expansion is within a function context.  */
> > >   #define PACK_EXPANSION_LOCAL_P(NODE) \
> > > TREE_LANG_FLAG_0 (PACK_EXPANSION_CHECK (NODE))
> > > @@ -4042,6 +4063,11 @@ struct GTY(()) lang_decl {
> > >   #define PACK_EXPANSION_FORCE_EXTRA_ARGS_P(NODE) \
> > > TREE_LANG_FLAG_3 (PACK_EXPANSION_CHECK (NODE))
> > > +/* Indicates whether a pack expansion has been parenthesized.  Used for
> > > +   a pack expansion in a decltype.  */
> > > +#define PACK_INDEX_PARENTHESIZED_P(NODE) \
> > > +  TREE_LANG_FLAG_1 (PACK_INDEX_CHECK (NODE))
> > 
> > This should only apply to PACK_INDEX_EXPR, I think?
> 
> True, fixed.
> 
> > >   /* True iff the wildcard can match a template parameter pack.  */
> > >   #define WILDCARD_PACK_P(NODE) TREE_LANG_FLAG_0 (NODE)
> > > @@ -7581,6 +7607,7 @@ extern bool template_parameter_pack_p   (const_tree);
> > >   extern bool function_parameter_pack_p   (const_tree);
> > >   extern bool function_parameter_expanded_from_pack_p (tree, tree);
> > >   extern tree make_pack_expansion (tree, tsubst_flags_t = tf_warning_or_error);
> > > +extern tree make_pack_index  (tree, tree);
> > >   extern bool check_for_bare_parameter_packs  (tree, location_t = UNKNOWN_LOCATION);
> > >   extern tree build_template_info (tree, tree);
> > >   extern tree get_template_info   (const_tree);
> > > @@ -7906,6 +7933,8 @@ extern tree finish_underlying_type  (tree);
> > >   extern tree calculate_bases (tree, tsubst_flags_t);
> > >   extern tree finish_bases(tree, bool);
> > >   extern tree calculate_direct_bases  (tree, tsubst_flags_t);
> > > +extern tree pack_index_element   (tree, tree, bool,
> > > +  tsubst_flags_t);
> > >   extern tree finish_offsetof (tree, tree, location_t);
> > >   extern void finish_decl_cleanup (tree, tree);
> > >   extern void finish_eh_cleanup   (
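P2662R3 lets `Ts...[N]` name the N-th element of a pack directly, which is what the `PACK_INDEX_PACK` and `PACK_INDEX_INDEX` accessors above take apart. A rough pre-C++26 equivalent goes through `std::tuple`. The sketch below is an analogue for orientation only, and `nth_type` and `nth_value` are hypothetical names. The C++26 spelling appears only in a comment, since it needs a compiler with P2662R3 support.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Pre-C++26 stand-in for the type pack index Ts...[N]; tuple_element_t
// performs the indexing and rejects an out-of-range N at compile time.
template<std::size_t N, typename... Ts>
using nth_type = std::tuple_element_t<N, std::tuple<Ts...>>;

// Counterpart for an expression pack index xs...[N].
template<std::size_t N, typename... Ts>
constexpr auto nth_value(Ts... xs)
{
  return std::get<N>(std::tuple<Ts...>(xs...));
}

// With P2662R3 these become simply:
//   template<std::size_t N, typename... Ts> using nth_type = Ts...[N];
//   ... return xs...[N];
```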

[PATCH] libstdc++/ranges: make _RangeAdaptorClosure befriend operator|

2024-11-27 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

This declares the range adaptor pipe operators friends of the
_RangeAdaptorClosure base class so that the std module doesn't need to
export them for ADL to find them.

Note that we deliberately don't define these pipe operators as hidden
friends, see r14-3293-g4a6f3676e7dd9e.
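A reduced sketch of the pattern follows. The operator is declared and defined at namespace scope, so it is not a hidden friend and ordinary lookup can find it, and the base class additionally befriends it. As the patch comment says, in a non-modules build the friend declaration is redundant, because ADL already searches the base class's namespace. All names (`lib`, `ClosureBase`, `AddOne`) are hypothetical, and trailing `decltype` is used in place of the real concept constraints.

```cpp
#include <utility>

namespace lib
{
  // Declared and defined at namespace scope, so this is not a hidden
  // friend: qualified and unqualified lookup can find it directly.
  template<typename Range, typename Self>
    constexpr auto
    operator|(Range&& r, Self&& self)
    -> decltype(std::forward<Self>(self)(std::forward<Range>(r)))
    { return std::forward<Self>(self)(std::forward<Range>(r)); }

  // Base of every closure object.  The friend declaration redeclares the
  // template above, giving ADL through this class a declaration to find
  // even if the namespace-scope one were not exported from a module.
  struct ClosureBase
  {
    template<typename Range, typename Self>
      friend constexpr auto
      operator|(Range&& r, Self&& self)
      -> decltype(std::forward<Self>(self)(std::forward<Range>(r)));
  };

  // A hypothetical closure that adds one to its argument.
  struct AddOne : ClosureBase
  {
    constexpr int operator()(int x) const { return x + 1; }
  };
}
```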

libstdc++-v3/ChangeLog:

* include/std/ranges (views::__adaptor::_RangeAdaptorClosure):
Befriend both operator| overloads.
---
 libstdc++-v3/include/std/ranges | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 5153dcc26c4..9d30e3a8e9d 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -949,8 +949,7 @@ namespace views::__adaptor
   // _S_has_simple_call_op to true if the behavior of this adaptor is
   // independent of the constness/value category of the adaptor object.
   template
-struct _RangeAdaptorClosure
-{ };
+struct _RangeAdaptorClosure;
 
   template
 requires (!same_as<_Tp, _RangeAdaptorClosure<_Up>>)
@@ -984,6 +983,26 @@ namespace views::__adaptor
 }
 #pragma GCC diagnostic pop
 
+  template
+struct _RangeAdaptorClosure
+{
+  // In non-modules compilation ADL finds these operator| either way and
+  // the friend declarations are redundant.  But with the std module these
+  // friend declarations enable ADL to find these operators without having
+  // to export them.
+  template
+   requires __is_range_adaptor_closure<_Self>
+ && __adaptor_invocable<_Self, _Range>
+   friend constexpr auto
+   operator|(_Range&& __r, _Self&& __self);
+
+  template
+   requires __is_range_adaptor_closure<_Lhs>
+ && __is_range_adaptor_closure<_Rhs>
+   friend constexpr auto
+   operator|(_Lhs&& __lhs, _Rhs&& __rhs);
+};
+
   // The base class of every range adaptor non-closure.
   //
   // The static data member _Derived::_S_arity must contain the total number of
-- 
2.47.1.313.gcc01bad4a9



Re: [PATCH] libstdc++/ranges: make _RangeAdaptorClosure befriend operator|

2024-11-27 Thread Patrick Palka
On Wed, 27 Nov 2024, Jonathan Wakely wrote:

> On Wed, 27 Nov 2024 at 15:43, Patrick Palka  wrote:
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
> OK

Thanks, I noticed Jason already pushed his module std fixes patch so I
went ahead and removed the now unnecessary export as well.  This is
what I ultimately pushed:

-- >8 --

Subject: [PATCH] libstdc++/ranges: make _RangeAdaptorClosure befriend
 operator|

This declares the range adaptor pipe operators friends of the
_RangeAdaptorClosure base class so that the std module doesn't need to
export them for ADL to find them.

Note that we deliberately don't define these pipe operators as hidden
friends, see r14-3293-g4a6f3676e7dd9e.

libstdc++-v3/ChangeLog:

* include/std/ranges (views::__adaptor::_RangeAdaptorClosure):
Befriend both operator| overloads.
* src/c++23/std.cc.in: Don't export views::__adaptor::operator|.

Reviewed-by: Jonathan Wakely 
---
 libstdc++-v3/include/std/ranges  | 23 +--
 libstdc++-v3/src/c++23/std.cc.in |  6 --
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 5153dcc26c4..f4b89778479 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -949,8 +949,7 @@ namespace views::__adaptor
   // _S_has_simple_call_op to true if the behavior of this adaptor is
   // independent of the constness/value category of the adaptor object.
   template
-struct _RangeAdaptorClosure
-{ };
+struct _RangeAdaptorClosure;
 
   template
 requires (!same_as<_Tp, _RangeAdaptorClosure<_Up>>)
@@ -984,6 +983,26 @@ namespace views::__adaptor
 }
 #pragma GCC diagnostic pop
 
+  template
+struct _RangeAdaptorClosure
+{
+  // In non-modules compilation ADL finds these operators either way and
+  // the friend declarations are redundant.  But with the std module these
+  // friend declarations enable ADL to find these operators without having
+  // to export them.
+  template
+   requires __is_range_adaptor_closure<_Self>
+ && __adaptor_invocable<_Self, _Range>
+   friend constexpr auto
+   operator|(_Range&& __r, _Self&& __self);
+
+  template
+   requires __is_range_adaptor_closure<_Lhs>
+ && __is_range_adaptor_closure<_Rhs>
+   friend constexpr auto
+   operator|(_Lhs&& __lhs, _Rhs&& __rhs);
+};
+
   // The base class of every range adaptor non-closure.
   //
   // The static data member _Derived::_S_arity must contain the total number of
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
index 7d787a5..16e66c3d921 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -2366,12 +2366,6 @@ export namespace std
 using ranges::concat_view;
 namespace views { using views::concat; }
 #endif
-
-// FIXME can we avoid this export using friends?
-namespace views::__adaptor
-{
-  using __adaptor::operator|;
-}
   }
 }
 
-- 
2.47.1.313.gcc01bad4a9



Re: [PATCH] Fortran: fix crash with bounds check writing array section [PR117791]

2024-11-27 Thread Jerry D
On 11/27/24 12:31 PM, Harald Anlauf wrote:

Dear all,

the attached patch fixes a wrong-code issue with bounds-checking
enabled when doing I/O of an array section and an index is either
an expression or a function result.  The problem does not occur
without bounds-checking.

When looking at the original testcase, the function occurring in
the affected index was evaluated twice, once with wrong arguments.

The simplest solution appears to be to fall back to scalarization
when bounds-checking is enabled.  If someone has a quick idea to
handle this better, please speak up!

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

This seems to be a 14/15 regression, so a backport is advisable.

Thanks,
Harald



The patch looks OK to me.

I wonder whether this fallback to the scalarizer should be done everywhere
if a user has specified bounds checking; what is the point of
optimizing array references?


If the code works in 13, maybe we need to isolate what broke it and
intervene at that place.


Also go ahead with backporting if no other ideas pop up.  I just fear
we are covering up something else.


Jerry



Re: [PATCH 3/3] dwarf: lto: Stabilize external die references.

2024-11-27 Thread Richard Biener
On Wed, Nov 6, 2024 at 3:36 PM Michal Jires  wrote:
>
> During Incremental LTO, contents of LTO partitions diverge because of
> external DIE references (DW_AT_abstract_origin).
>
> External references are in form 'die_symbol+offset'.
> Originally there is only a single die_symbol for each compilation unit,
> and its offsets are in the 100'000s, which easily diverge.
>
> Die symbols have to be unique across compilation units. Originally, for
> this purpose the die symbol name is computed from a hash of the entire file.
> To avoid this I added flag_lto_debuginfo_assume_unique_filepaths,
> which computes the die_symbol only from the filepath; this seems a
> reasonable assumption for any project using incremental LTO.
> The compilation unit's die symbol name is then prepended to each die symbol
> for uniqueness.
>
> To remove divergence of offsets in case of C++, we have to add die
> symbols to DW_TAG_subprogram (functions), DW_TAG_variable and
> DW_TAG_namespace.

I wonder why you could not always do this for a subset of symbols,
namely those exported from the current TU, building a symbol
based on the symbol's assembler name?

That is, I dislike relying on a new flag_lto_debuginfo_assume_unique_filepaths
flag.

I'd also really like to see a way to get rid of those symbols at link time :/
Or at least make them smaller?  For example by hashing the assembler
name?  The BFD linker has .gnu_lto_* special-casing for sections to discard,
maybe we can add a special .note section, .note.gnu.discard_syms with
a list of symbols to discard after link editing?
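To make the naming question concrete, here is a minimal sketch of symbols that stay stable when unrelated code changes. The CU part is derived only from the file path (the proposed flag's approach), and the per-DIE part is derived from the assembler name, hashed to keep it short (the alternative suggested above). FNV-1a is a swapped-in stand-in for the MD5 that dwarf2out uses, and the `cu_<hash>.<hash>` scheme and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a; a simple stand-in for the MD5 used by dwarf2out.
std::uint64_t fnv1a(const std::string &s)
{
  std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (unsigned char c : s)
    {
      h ^= c;
      h *= 0x100000001b3ULL;                // FNV prime
    }
  return h;
}

// Hypothetical naming scheme: the CU part depends only on the file path,
// so editing the file's contents does not rename it; the per-DIE part
// depends only on the declaration's assembler name.
std::string die_symbol(const std::string &filepath,
                       const std::string &asm_name)
{
  char buf[64];
  int n = std::snprintf(buf, sizeof buf, "cu_%016llx.%016llx",
                        static_cast<unsigned long long>(fnv1a(filepath)),
                        static_cast<unsigned long long>(fnv1a(asm_name)));
  assert(n > 0 && n < static_cast<int>(sizeof buf));
  return buf;
}
```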

> Benefits:
> Before this patch Incremental LTO diverges/recompiles ~twice as much
> with '-g'. With this, additional divergence with '-g' is under 10 %.
>
> Negatives:
> When the flag is set, the added die symbols survive into the final
> executable. For the `cc1` executable, the added symbols alone represent
> an almost 10 % size increase.
> You can strip them out, but I have not found a simple way to remove them
> automatically in GCC.
> However for the purposes of Incremental LTO it should suffice. There was
> no measured compilation time increase because of streaming these
> additional symbols/strings.

I fail to see how this helps without adjusting dwarf2out_die_ref_for_decl?

Richard.

> gcc/ChangeLog:
>
> * common.opt: New flag.
> * dwarf2out.cc (compute_comp_unit_symbol):
>   With flag, don't checksum contents but filepath.
> (compute_die_symbols_from_die): New.
> (compute_die_symbols): New.
> (dwarf2out_early_finish): Call compute_die_symbols.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/lto/die_symbol_conflicts_0.C: New test.
> ---
>  gcc/common.opt|   4 +
>  gcc/dwarf2out.cc  | 120 +-
>  .../g++.dg/lto/die_symbol_conflicts_0.C   |  12 ++
>  3 files changed, 132 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 12b25ff486d..4aa80f0df8f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2253,6 +2253,10 @@ flto-partition=
> Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) Init(LTO_PARTITION_BALANCED)
>  Specify the algorithm to partition symbols and vars at linktime.
>
> +flto-debuginfo-assume-unique-filepaths
> +Common Var(flag_lto_debuginfo_assume_unique_filepaths) Init(0)
> +Assume all linked source files have unique filepaths.
> +
>  ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
>  flto-compression-level=
> Common Joined RejectNegative UInteger Var(flag_lto_compression_level) Init(-1) IntegerRange(0, 19)
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index bf1ac45ed73..af272a3a824 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -8015,9 +8015,17 @@ compute_comp_unit_symbol (dw_die_ref unit_die)
>   the name filename of the unit.  */
>
>md5_init_ctx (&ctx);
> -  mark = 0;
> -  die_checksum (unit_die, &ctx, &mark);
> -  unmark_all_dies (unit_die);
> +  if (flag_lto_debuginfo_assume_unique_filepaths)
> +{
> +  gcc_assert (die_name);
> +  md5_process_bytes (die_name, strlen (die_name), &ctx);
> +}
> +  else
> +{
> +  mark = 0;
> +  die_checksum (unit_die, &ctx, &mark);
> +  unmark_all_dies (unit_die);
> +}
>md5_finish_ctx (&ctx, checksum);
>
>/* When we this for comp_unit_die () we have a DW_AT_name that might
> @@ -33119,6 +33127,110 @@ ctf_debug_do_cu (dw_die_ref die)
>FOR_EACH_CHILD (die, c, ctf_do_die (c));
>  }
>
> +/* Recursively compute die symbols from DIE's attributes.
> +   Not all symbols can be computed this way.  */
> +static void
> +compute_die_symbols_from_die (dw_die_ref die)
> +{
> +  dw_attr_node *a;
> +  int i;
> +  const char* name = NULL;
> +
> +  if (!die->die_attr)
> +return;
> +
> +  switch (die->die_tag)
> +{
> +  /* Assumed that each die parent has at most single chil

Re: [committed] i386: x86 can use x >> -y for x >> 32-y [PR36503]

2024-11-27 Thread Uros Bizjak
On Mon, Nov 25, 2024 at 12:10 PM Jakub Jelinek  wrote:
>
> On Mon, Nov 25, 2024 at 11:52:31AM +0100, Uros Bizjak wrote:
> > > Any reason for an exact comparison rather than
> > >   && (INTVAL (operands[3]) & ( * BITS_PER_UNIT - 1)) == 0
> > > ?
> > > I mean, we can optimize this way 1U << (32 - x) or
> > > 1U << (1504 - x) or any other multiple of 32.
> >
> > Count values outside of [0, bitwidth) are undefined in general. Also,
>
> Sure, e.g. the 1504 - x case will be undefined if x is not in [1473, 1504]
> But it very well could be in that range.
>
> What I'm arguing about is that changing the test (in all the new patterns)
> will not noticeably slow compilation down and be more general.
>
> > during the bootstrap only values of 32 or 64 were found, so I thought
> > to not complicate the condition too much
> >
> > FYI, the conversion triggers 504 times during the bootstrap, I somehow
> > expected a lower number.
> >
> > > Similarly, we can optimize 1U << (32 + x) to 1U << x and
> > > again do that for any other multiples of 32.
> >
> > I don't think anybody uses the above idiom, it is valid only for
> > certain targets (x86 and the ones that mask count argument), but
> > undefined in general.
>
> No, it is again valid whenever x is in [-32, -1] range, which it very well
> could be (or if x is unsigned, in [-32U, -1U] range).
> Of course if it is rare, adding new patterns for it might not be worth it.
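The generalized condition proposed above accepts any constant that is a multiple of the operand width, because x86 shifts use only the low bits of the count. The following C++ sketch models just that arithmetic, not the RTL pattern. Both helper names are hypothetical, and the check assumes the width is a power of two.

```cpp
#include <cstdint>

// Hypothetical helper: true when C is a multiple of WIDTH.  WIDTH must be a
// power of two (32 or 64 for SImode/DImode shifts), so the test reduces to
// checking that the low log2(WIDTH) bits of C are zero, which is the shape of
// (INTVAL (operands[3]) & (width - 1)) == 0 in the proposed condition.
constexpr bool offset_is_multiple_of_width(std::int64_t c, std::uint32_t width)
{
  return (c & static_cast<std::int64_t>(width - 1)) == 0;
}

// Hypothetical helper: checks, for counts y in [0, 4 * WIDTH), that adding C
// leaves the low log2(WIDTH) bits of the count (the only bits an x86 shift
// uses) unchanged.  Unsigned arithmetic keeps the wrap-around well defined,
// so negative multiples such as -32 behave the same as positive ones.
constexpr bool masked_count_unchanged(std::int64_t c, std::uint32_t width)
{
  const std::uint64_t mask = width - 1;
  for (std::uint64_t y = 0; y < 4u * width; ++y)
    if (((static_cast<std::uint64_t>(c) + y) & mask) != (y & mask))
      return false;
  return true;
}
```

For example, 1504 is 47 * 32, so it passes the same test as 32 itself, while 33 does not.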

Huh, I was wrong. The 1U << (32 + x) to 1U << x optimization triggers
356 times during the bootstrap.

I am testing the attached patch.

Thanks,
Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index df78e4df9d8..788d2555a1a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15896,7 +15896,53 @@ (define_insn_and_split "*ashl3_mask_1"
   ""
   [(set_attr "isa" "*,bmi2")])
 
-(define_insn_and_split "*ashl3_negcnt"
+(define_insn_and_split "*ashl3_add"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (ashift:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (subreg:QI
+   (plus
+ (match_operand 2 "int248_register_operand" "c,r")
+ (match_operand 3 "const_int_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (ASHIFT, mode, operands)
+   && (INTVAL (operands[3]) & ( * BITS_PER_UNIT - 1)) == 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (ashift:SWI48 (match_dup 1)
+(match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (QImode, operands[2]);
+}
+  [(set_attr "isa" "*,bmi2")])
+
+(define_insn_and_split "*ashl3_add_1"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (ashift:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (plus:QI
+   (match_operand:QI 2 "register_operand" "c,r")
+   (match_operand:QI 3 "const_int_operand"
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (ASHIFT, mode, operands)
+   && (INTVAL (operands[3]) & ( * BITS_PER_UNIT - 1)) == 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (ashift:SWI48 (match_dup 1)
+(match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "isa" "*,bmi2")])
+
+(define_insn_and_split "*ashl3_sub"
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
(ashift:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand")
@@ -15927,7 +15973,7 @@ (define_insn_and_split "*ashl3_negcnt"
 }
   [(set_attr "isa" "*,bmi2")])
 
-(define_insn_and_split "*ashl3_negcnt_1"
+(define_insn_and_split "*ashl3_sub_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
(ashift:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand")
@@ -16678,7 +16724,53 @@ (define_insn_and_split "*3_mask_1"
   ""
   [(set_attr "isa" "*,bmi2")])
 
-(define_insn_and_split "*3_negcnt"
+(define_insn_and_split "*3_add"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (any_shiftrt:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (subreg:QI
+   (plus
+ (match_operand 2 "int248_register_operand" "c,r")
+ (match_operand 3 "const_int_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (ASHIFT, mode, operands)
+   && (INTVAL (operands[3]) & ( * BITS_PER_UNIT - 1)) == 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (any_shiftrt:SWI48 (match_dup 1)
+ (match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (QImode, operands[2]);
+}
+  [(set_attr "isa" "*,bmi2")])
+
+(define_insn_and_split "*3_add_1"
+  [(set (match_operand:SWI48 0 "noni

RE: [PATCH v4] I386: Add more testcases for unsigned SAT_ADD vector pattern

2024-11-27 Thread Li, Pan2
> OK, but please drop "a" suffix from new files (when I added original
> pr112600 testcases, "a" suffix was for char type and "b" was for short
> type, but this is not the case with your testcases).

Thanks Uros, I will commit with these changes if testing shows no surprises.

Pan

-----Original Message-----
From: Uros Bizjak  
Sent: Wednesday, November 27, 2024 4:15 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
Subject: Re: [PATCH v4] I386: Add more testcases for unsigned SAT_ADD vector 
pattern

On Wed, Nov 27, 2024 at 3:00 AM  wrote:
>
> From: Pan Li 
>
> Some forms like below failed to recog the SAT_ADD pattern

... failed to be recognized as a SAT_ADD pattern ...

> for target i386.  It is related to some match pattern
> extraction but get fixed after the refactor of the SAT_ADD
> pattern.  Thus, add testcases to ensure we may have similar
> issue in futrue.

... to ensure we won't have similar issues in the future.

>
>   #define DEF_SAT_ADD(T)   \
>   T sat_add_##T (T x, T y) \
>   {\
> T res; \
> res = x + y;   \
> res |= -(T)(res < x);  \
> return res;\
>   }
>
>   #define VEC_DEF_SAT_ADD(T)   \
>   void vec_sat_add(T * restrict a, T * restrict b) \
>   {\
> for (int i = 0; i < 8; i++)\
>   b[i] = sat_add_##T (a[i], b[i]); \
>   }
>
>   DEF_SAT_ADD (uint32_t)
>   VEC_DEF_SAT_ADD (uint32_t)
>
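The idiom in DEF_SAT_ADD computes the wrapped sum and then ORs in all-ones whenever the sum wrapped (res < x). As a sanity check outside the testsuite, this C++ sketch compares the idiom against a plain clamping reference for every pair of uint8_t values. The function names are made up for illustration and are not part of the patch.

```cpp
#include <cstdint>

// The branchless idiom from DEF_SAT_ADD, instantiated for uint8_t.
constexpr std::uint8_t sat_add_idiom(std::uint8_t x, std::uint8_t y)
{
  std::uint8_t res = static_cast<std::uint8_t>(x + y); // wraps modulo 256
  res |= -static_cast<std::uint8_t>(res < x);          // ORs in all-ones on overflow
  return res;
}

// Straightforward reference: add in a wider type, then clamp to the maximum.
constexpr std::uint8_t sat_add_reference(std::uint8_t x, std::uint8_t y)
{
  unsigned sum = unsigned{x} + unsigned{y};
  return sum > 0xFFu ? std::uint8_t{0xFF} : static_cast<std::uint8_t>(sum);
}

// Exhaustively compares both versions over all 65536 input pairs.
constexpr bool idiom_matches_reference()
{
  for (unsigned x = 0; x <= 0xFFu; ++x)
    for (unsigned y = 0; y <= 0xFFu; ++y)
      if (sat_add_idiom(static_cast<std::uint8_t>(x), static_cast<std::uint8_t>(y))
          != sat_add_reference(static_cast<std::uint8_t>(x), static_cast<std::uint8_t>(y)))
        return false;
  return true;
}
```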
> The below test suites are passed for this patch.
> make -k check-gcc RUNTESTFLAGS="--target_board=unix\{,-m32\} 
> i386.exp=pr112600-5a-*.c"
>
> PR target/112600
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr112600-5a-u16.c: New test.
> * gcc.target/i386/pr112600-5a-u32.c: New test.
> * gcc.target/i386/pr112600-5a-u64.c: New test.
> * gcc.target/i386/pr112600-5a-u8.c: New test.
> * gcc.target/i386/pr112600-5a.h: New test.

OK, but please drop "a" suffix from new files (when I added original
pr112600 testcases, "a" suffix was for char type and "b" was for short
type, but this is not the case with your testcases).

Thanks,
Uros.

>
> Signed-off-by: Pan Li 
> ---
>  .../gcc.target/i386/pr112600-5a-u16.c | 10 +
>  .../gcc.target/i386/pr112600-5a-u32.c |  9 
>  .../gcc.target/i386/pr112600-5a-u64.c | 10 +
>  .../gcc.target/i386/pr112600-5a-u8.c  | 10 +
>  gcc/testsuite/gcc.target/i386/pr112600-5a.h   | 22 +++
>  5 files changed, 61 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> new file mode 100644
> index 000..f462bfa4800
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint16_t)
> +VEC_DEF_SAT_ADD (uint16_t)
> +
> +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 3 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> new file mode 100644
> index 000..5797c97ebe9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> @@ -0,0 +1,9 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint32_t)
> +
> +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> new file mode 100644
> index 000..d5f81f72ed5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile  { target { ! ia32 } } } */
> +/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint64_t)
> +
> +
> +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
> new file mode 100644
> index 000..cb8657ecd86
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint8_t)
> +VEC_DEF_SAT_A

Re: [PATCH] libstdc++: module std fixes

2024-11-27 Thread Patrick Palka
On Tue, 26 Nov 2024, Jason Merrill wrote:

> Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> -- 8< --
> 
> Some tests were failing due to the exported using declaration of iter_move
> conflicting with friend declarations; the exported using needs to be in the
> inline namespace, like the customization point itself, rather than
> std::ranges.
> 
> Also add a few missing exports.
> 
> Some tests failed to find some operators defined in implementation-detail
> namespaces; this exports them as well, but as previously discussed it's
> probably preferable to make those operators friends so ADL can find them
> that way.
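As a sketch of the "make them friends" alternative, an operator defined as a hidden friend is found only by argument-dependent lookup on the wrapper type, so no namespace-scope operator would need to be named in an export. The wrapper below is hypothetical and only stands in for something like `__gnu_cxx::__normal_iterator`. It omits `operator<=>` so that it stays valid C++17.

```cpp
namespace detail
{
  // Hypothetical iterator wrapper standing in for an implementation-detail type.
  template<typename It>
  class wrapped_iter
  {
    It cur;

  public:
    constexpr explicit wrapped_iter(It i) : cur(i) { }

    // Hidden friends: invisible to ordinary lookup in namespace detail, but
    // found by ADL whenever a wrapped_iter is an operand of ==, != or <.
    friend constexpr bool operator==(wrapped_iter a, wrapped_iter b)
    { return a.cur == b.cur; }
    friend constexpr bool operator!=(wrapped_iter a, wrapped_iter b)
    { return !(a == b); }
    friend constexpr bool operator<(wrapped_iter a, wrapped_iter b)
    { return a.cur < b.cur; }
  };
}
```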
> 
> libstdc++-v3/ChangeLog:
> 
>   * src/c++23/std.cc.in: Fix iter_move/swap.  Add fold_left_first, to,
>   concat, and some operators.
> ---
>  libstdc++-v3/src/c++23/std.cc.in | 64 ++--
>  1 file changed, 36 insertions(+), 28 deletions(-)
> 
> diff --git a/libstdc++-v3/src/c++23/std.cc.in 
> b/libstdc++-v3/src/c++23/std.cc.in
> index d225c8b8c85..7d787a5 100644
> --- a/libstdc++-v3/src/c++23/std.cc.in
> +++ b/libstdc++-v3/src/c++23/std.cc.in
> @@ -494,8 +494,13 @@ export namespace std
>  #endif
>  #if __cpp_lib_ranges_fold
>  using ranges::fold_left;
> +using ranges::fold_left_first;
> +using ranges::fold_left_first_with_iter;
>  using ranges::fold_left_with_iter;
>  using ranges::fold_right;
> +using ranges::fold_right_last;
> +using ranges::in_value_result;
> +using ranges::out_value_result;
>  #endif
>  #if __cpp_lib_ranges_find_last
>  using ranges::find_last;
> @@ -1572,10 +1577,14 @@ export namespace std
>using std::iter_reference_t;
>using std::iter_value_t;
>using std::iterator_traits;
> -  namespace ranges
> +  // _Cpo is an implementation detail we can't avoid exposing; if we do the
> +  // using in ranges directly, it conflicts with any friend functions of the
> +  // same name, which is why the customization points are in an inline
> +  // namespace in the first place.
> +  namespace ranges::inline _Cpo
>{
> -using std::ranges::iter_move;
> -using std::ranges::iter_swap;
> +using _Cpo::iter_move;
> +using _Cpo::iter_swap;
>}
>using std::advance;
>using std::bidirectional_iterator;
> @@ -1679,6 +1688,15 @@ export namespace std
>using std::make_const_sentinel;
>  #endif
>  }
> +// FIXME these should be friends of __normal_iterator to avoid exporting
> +// __gnu_cxx.
> +export namespace __gnu_cxx
> +{
> +  using __gnu_cxx::operator==;
> +  using __gnu_cxx::operator<=>;
> +  using __gnu_cxx::operator+;
> +  using __gnu_cxx::operator-;
> +}
>  
>  // 
>  export namespace std
> @@ -2278,43 +2296,32 @@ export namespace std
>namespace views = ranges::views;
>using std::tuple_element;
>using std::tuple_size;
> -#if __glibcxx_ranges_as_const // >= C++23
>namespace ranges
>{
> +#if __glibcxx_ranges_as_const // >= C++23
>  using ranges::constant_range;
>  using ranges::const_iterator_t;
>  using ranges::const_sentinel_t;
>  using ranges::range_const_reference_t;
>  using ranges::as_const_view;
>  namespace views { using views::as_const; }
> -  }
>  #endif
>  #ifdef __glibcxx_generator  // C++ >= 23 && __glibcxx_coroutine
> -  namespace ranges
> -  {
>  using ranges::elements_of;
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_as_rvalue // C++ >= 23
> -  namespace ranges {
>  using ranges::as_rvalue_view;
>  namespace views { using views::as_rvalue; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_chunk // C++ >= 23
> -  namespace ranges {
>  using ranges::chunk_view;
>  namespace views { using views::chunk; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_slide // C++ >= 23
> -  namespace ranges {
>  using ranges::slide_view;
>  namespace views { using views::slide; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_zip // C++ >= 23
> -  namespace ranges {
>  using ranges::zip_view;
>  using ranges::zip_transform_view;
>  using ranges::adjacent_view;
> @@ -2327,44 +2334,45 @@ export namespace std
>using views::pairwise;
>using views::pairwise_transform;
>  }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_chunk_by // C++ >= 23
> -  namespace ranges {
>  using ranges::chunk_by_view;
>  namespace views { using views::chunk_by; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_join_with // C++ >= 23
> -  namespace ranges {
>  using ranges::join_with_view;
>  namespace views { using views::join_with; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_repeat // C++ >= 23
> -  namespace ranges {
>  using ranges::repeat_view;
>  namespace views { using views::repeat; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_stride // C++ >= 23
> -  namespace ranges {
>  using ranges::stride_view;
>  namespace views { using views::stride; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_cartesian_product // C++ >= 23
> -  namespace ranges {
>  using ranges::cartesian_product_view;
>

Re: [PATCH RFC] c++: modules and using-directives

2024-11-27 Thread Jason Merrill
On 11/27/24 1:43 AM, Nathaniel Shead wrote:

On Wed, Nov 27, 2024 at 12:03:23AM -0500, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu.

Does this approach make sense to you?  Any other ideas?

-- 8< --

We weren't representing 'using namespace' at all in modules, which broke
some of the  literals tests.

I experimented with various approaches to representing them, and ended up
with emitting them as a pseudo-binding for "using", which as a keyword can't
have any real bindings.  Then reading this pseudo-binding adds it to
using_directives instead of the usual handling.

+   /* ??? should we try to distinguish whether the using-directive
+  is purview/exported?  */
+   add_binding_entity (used, WMB_Flags(WMB_Using|WMB_Purview), &data);


I don't think the standard is entirely clear about how using-directives
should interact with modules; they don't declare names, and before P2615
were in fact forbidden from being explicitly exported, which implies to
me that the intention was for them to not be considered outside of the
declaring module.


P2615 is certainly clear about allowing them.  Given that, I think the 
general rules of [module.interface] apply, so it should be found by name 
lookup in an importing TU.



That said, if we were to do this I would think the logic should match
what we do for any other name, in terms of requiring it to be explicitly
exported/purview as required; in particular, I would hope that something
like this doesn't happen:

   // m.cpp
   export module M;
   using namespace std;

   // test.cpp
   #include 
   import M;
   int main() {
 cout << "hello\n";  // using-directive "inherited" from M?
   }


Good point, I have more work to do.

I think that since ADL doesn't consider using-directives, we only need 
to represent the exported ones?



  name_lookup::search_namespace_only (tree scope)
  {
bool found = false;
+  if (modules_p () && name && !id_equal (name, "using"))
+{
+  name_lookup u (get_identifier ("using"));
+  u.search_namespace_only (scope);
+}


Could we just add to the list of using-directives within read_namespaces
perhaps?  Probably as a second pass after all namespaces have been
created so that we don't run into issues with circular directives.
That would mean we wouldn't need to do this in every lookup.


That was my first thought, but I had trouble figuring out how.  Perhaps 
I'll try again.


Jason



[PATCH] c++: template-id dependence wrt local static arg [PR117792]

2024-11-27 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14/13?

-- >8 --

Here we end up ICEing at instantiation time for the call to
f ultimately because we wrongly consider the call to be
non-dependent, and so we specialize f ahead of time and then get
confused when fully substituting this specialization.

The call is dependent due to [temp.dep.temp]/3 and we miss that because
function template-id arguments aren't coerced until overload resolution,
and so the local static template argument lacks an implicit cast to
reference type that value_dependent_expression_p looks for before
considering dependence of the address.  Other kinds of template-ids aren't
affected since they're coerced ahead of time.

So when considering dependence of a function template-id, we need to
conservatively consider dependence of the address of each argument (if
applicable).

PR c++/117792

gcc/cp/ChangeLog:

* pt.cc (type_dependent_expression_p): Consider the dependence
of the address of each template argument of a function
template-id.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype7.C: New test.
---
 gcc/cp/pt.cc  | 10 --
 gcc/testsuite/g++.dg/cpp1z/nontype7.C | 22 ++
 2 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 564e368ff43..2f2ec39b083 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -28864,9 +28864,15 @@ type_dependent_expression_p (tree expression)
 
   if (TREE_CODE (expression) == TEMPLATE_ID_EXPR)
{
- if (any_dependent_template_arguments_p
- (TREE_OPERAND (expression, 1)))
+ tree args = TREE_OPERAND (expression, 1);
+ if (any_dependent_template_arguments_p (args))
return true;
+ /* Arguments of a function template-id aren't necessarily coerced
+yet so we must conservatively assume that the address (and not
+just value) of the argument matters as per [temp.dep.temp]/3.  */
+ for (tree arg : tree_vec_range (args))
+   if (has_value_dependent_address (arg))
+ return true;
  expression = TREE_OPERAND (expression, 0);
  if (identifier_p (expression))
return true;
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype7.C 
b/gcc/testsuite/g++.dg/cpp1z/nontype7.C
new file mode 100644
index 000..b03c643c987
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype7.C
@@ -0,0 +1,22 @@
+// PR c++/117792
+// { dg-do compile { target c++17 } }
+
+template
+void f(T) { }
+
+template
+void f(...) = delete;
+
+template int v;
+
+template struct A { static const int value = 0; };
+
+template
+void g() {
+  static const int local_static = 0;
+  auto x = v; // OK
+  auto y = A::value; // OK
+  f(0); // ICE
+}
+
+template void g();
-- 
2.47.1.313.gcc01bad4a9



Re: [PATCH 1/2] c++: some further concepts cleanups

2024-11-27 Thread Patrick Palka
On Tue, 5 Nov 2024, Jason Merrill wrote:

> On 10/15/24 12:45 AM, Patrick Palka wrote:
> > This patch further cleans up the concepts code following the removal of
> > Concepts TS support:
> > 
> >* concept-ids are now the only kind of "concept check", so we can
> >  simplify some code accordingly.  In particular resolve_concept_check
> >  seems like a no-op and can be removed.
> >* In turn, deduce_constrained_parameter doesn't seem to do anything
> >  interesting.
> >* In light of the above we might as well inline finish_type_constraints
> >  into its only caller.  Note that the "prototype parameter" of a
> >  concept is just the first template parameter which the caller can
> >  easily obtain itself.
> 
> But it's still a defined term in the language
> (https://eel.is/c++draft/temp#concept-7) so I think it's worth having an
> accessor function.  I agree with doing away with the function that returns a
> pair.

Done.

> 
> >* placeholder_extract_concept_and_args is only ever called on a
> >  concept-id, so it's simpler to inline it into its callers.
> >* There's no such thing as a template-template-parameter with a
> >  type-constraint, so we can remove such handling from the parser.
> >  This means is_constrained_parameter is currently equivalent to
> >  declares_constrained_template_template_parameter, so let's prefer
> >  to use the latter.
> 
> Why prefer the longer name?

"is_constrained_parameter" suggests it should return true for a
constrained non-type parameter, but that's currently not the case and
callers don't expect/want that behavior, so the longer name seems
more accurate.

> 
> > We might be able to remove WILDCARD_DECL and CONSTRAINED_PARM_PROTOTYPE
> > now as well, but I left that as future work.
> > 
> > @@ -18901,7 +18842,8 @@ cp_parser_template_parameter (cp_parser* parser,
> > bool *is_non_type,
> >   }
> >   /* The parameter may have been constrained type parameter.  */
> > -  if (is_constrained_parameter (parameter_declarator))
> > +  tree type = parameter_declarator->decl_specifiers.type;
> > +  if (declares_constrained_type_template_parameter (type))
> 
> Why not retain a function that takes the declarator?

I added another overload of declares_constrained_type_template_parameter
taking a cp_parameter_declarator.

> 
> >   return finish_constrained_parameter (parser,
> >parameter_declarator,
> >is_non_type);
> > @@ -20987,11 +20929,12 @@ cp_parser_placeholder_type_specifier (cp_parser
> > *parser, location_t loc,
> > tsubst_flags_t complain = tentative ? tf_none : tf_warning_or_error;
> >   /* Get the concept and prototype parameter for the constraint.  */
> > -  tree_pair info = finish_type_constraints (tmpl, args, complain);
> > -  tree con = info.first;
> > -  tree proto = info.second;
> > -  if (con == error_mark_node)
> > +  tree check = build_type_constraint (tmpl, args, complain);
> > +  if (check == error_mark_node)
> >   return error_mark_node;
> > +  tree con = STRIP_TEMPLATE (tmpl);
> > +  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (tmpl);
> > +  tree proto = TREE_VALUE (TREE_VEC_ELT (parms, 0));
> 
> And as mentioned above, let's have a small function that returns the prototype
> parameter of a concept, to use here.

Done.

> 
> Incidentally, why do we want to strip the template from con?  Is that also a
> relic of the different possible forms of concept?

Oops, I think we could use DECL_TEMPLATE_RESULT here instead of STRIP_TEMPLATE,
since 'tmpl' will always be a TEMPLATE_DECL.

Here's v2 which squashes the 3rd patch (removing WILDCARD_DECL) into
this one, and addresses your feedback.

-- >8 --

Subject: [PATCH] c++: some further concepts cleanups

This patch further cleans up the concepts code following the removal of
Concepts TS support:

  * concept-ids are now the only kind of "concept check", so we can
simplify some code accordingly.  In particular resolve_concept_check
seems like a no-op and can be removed.
  * In turn, deduce_constrained_parameter doesn't seem to do anything
interesting.
  * In light of the above we might as well inline finish_type_constraints
into its only caller.
  * Introduce and use a helper for obtaining the prototype parameter of
a concept, i.e. its first template parameter.
  * placeholder_extract_concept_and_args is only ever called on a
concept-id, so it's simpler to inline it into its callers.
  * There's no such thing as a template-template-parameter with a
type-constraint, so we can remove such handling from the parser.
This means is_constrained_parameter is currently equivalent to
declares_constrained_type_template_parameter, so let's prefer
to use the latter.
  * Remove WILDCARD_DECL and instead use the concept's prototype parameter
as the dummy first argument of a type-constraint during template
argument coerc

[committed] i386: x86 can use x >> y for x >> 32+y [PR36503]

2024-11-27 Thread Uros Bizjak
x86 targets mask the count of 32-bit shifts to 5 bits (and of 64-bit shifts
to 6 bits), so they can use x >> y instead of x >> 32+y.

The optimization converts:

	leal	32(%rsi), %ecx
	sall	%cl, %eax

to:

	sall	%cl, %eax
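To see why the `leal` can be dropped, here is a small C++ model of the count masking described above. The 32-bit form uses only the low 5 bits of the count and the 64-bit form the low 6 bits. `x86_shl32` and `x86_shl64` are hypothetical helpers that model only the shifted result, not the flags.

```cpp
#include <cstdint>

// Models SHL r32, cl: the count is masked to 5 bits before shifting.
constexpr std::uint32_t x86_shl32(std::uint32_t x, std::uint32_t count)
{
  return x << (count & 31u);
}

// Models SHL r64, cl: the count is masked to 6 bits before shifting.
constexpr std::uint64_t x86_shl64(std::uint64_t x, std::uint64_t count)
{
  return x << (count & 63u);
}

// Adding the operand width to the count never changes the result, so the
// extra "leal 32(%rsi), %ecx" that computes y + 32 is redundant.
constexpr bool add_width_is_redundant(std::uint32_t x)
{
  for (std::uint32_t y = 0; y < 256; ++y)
    if (x86_shl32(x, y + 32u) != x86_shl32(x, y)
        || x86_shl64(x, y + 64u) != x86_shl64(x, y))
      return false;
  return true;
}
```

The same wrap-around holds for "negative" counts: with y = -3 (0xFFFFFFFD), 32 + y is 29, and so is y & 31.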

PR target/36503

gcc/ChangeLog:

* config/i386/i386.md (*ashl3_add):
New define_insn_and_split pattern.
(*ashl3_add_1): Ditto.
(*3_add): Ditto.
(*3_add_1): Ditto.
(*ashl3_sub): Rename from *ashl3_negcnt.
(*ashl3_sub_1): Rename from *ashl3_negcnt_1.
(*3_sub): Rename from *3_negcnt.
(*3_sub_1): Rename from *3_negcnt_1.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr36503-3.c: New test.
* gcc.target/i386/pr36503-4.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index df78e4df9d8..2fc48006bca 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15896,7 +15896,53 @@ (define_insn_and_split "*ashl3_mask_1"
   ""
   [(set_attr "isa" "*,bmi2")])
 
-(define_insn_and_split "*ashl3_negcnt"
+(define_insn_and_split "*ashl3_add"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (ashift:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (subreg:QI
+   (plus
+ (match_operand 2 "int248_register_operand" "c,r")
+ (match_operand 3 "const_int_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (ASHIFT, mode, operands)
+   && (INTVAL (operands[3]) & ( * BITS_PER_UNIT - 1)) == 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (ashift:SWI48 (match_dup 1)
+(match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (QImode, operands[2]);
+}
+  [(set_attr "isa" "*,bmi2")])
+
+(define_insn_and_split "*ashl3_add_1"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (ashift:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (plus:QI
+   (match_operand:QI 2 "register_operand" "c,r")
+   (match_operand:QI 3 "const_int_operand"
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (ASHIFT, mode, operands)
+   && (INTVAL (operands[3]) & ( * BITS_PER_UNIT - 1)) == 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (ashift:SWI48 (match_dup 1)
+(match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "isa" "*,bmi2")])
+
+(define_insn_and_split "*ashl3_sub"
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
(ashift:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand")
@@ -15927,7 +15973,7 @@ (define_insn_and_split "*ashl3_negcnt"
 }
   [(set_attr "isa" "*,bmi2")])
 
-(define_insn_and_split "*ashl3_negcnt_1"
+(define_insn_and_split "*ashl3_sub_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
(ashift:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand")
@@ -16678,7 +16724,53 @@ (define_insn_and_split "*3_mask_1"
   ""
   [(set_attr "isa" "*,bmi2")])
 
-(define_insn_and_split "*3_negcnt"
+(define_insn_and_split "*3_add"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (any_shiftrt:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (subreg:QI
+   (plus
+ (match_operand 2 "int248_register_operand" "c,r")
+ (match_operand 3 "const_int_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (, mode, operands)
+   && (INTVAL (operands[3]) & ( * BITS_PER_UNIT - 1)) == 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (any_shiftrt:SWI48 (match_dup 1)
+ (match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (QImode, operands[2]);
+}
+  [(set_attr "isa" "*,bmi2")])
+
+(define_insn_and_split "*3_add_1"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (any_shiftrt:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (plus:QI
+   (match_operand:QI 2 "register_operand" "c,r")
+   (match_operand:QI 3 "const_int_operand"
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (, mode, operands)
+   && (INTVAL (operands[3]) & ( * BITS_PER_UNIT - 1)) == 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (any_shiftrt:SWI48 (match_dup 1)
+ (match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "isa" "*,bmi2")])
+
+(define_insn_and_split "*3_sub"
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
(any_shiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand")
@@ -16709

Re: [PATCH 1/3] dwarf: Delete dead code.

2024-11-27 Thread Richard Biener
On Wed, Nov 6, 2024 at 3:35 PM Michal Jires  wrote:
>
> This if branch checks for comdat_type_p (GTY union tag) and then uses
> incorrect union variant die_id.die_symbol. There is no way to create
> this combination of valid values even if we ignore the GTY.
>
> Running testsuite with abort() in branch confirms that it is never taken.

Did you test with -fdebug-types-section?  This code should still be needed
to generate the linkonce debug-type sections.  Note it doesn't work (very well)
when combined with LTO.

Richard.

> gcc/ChangeLog:
>
> * dwarf2out.cc (output_comp_unit): Delete dead code.
> ---
>  gcc/dwarf2out.cc | 25 +
>  1 file changed, 5 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index 38aedb64470..e10a5c78fe9 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -11234,8 +11234,7 @@ static void
>  output_comp_unit (dw_die_ref die, int output_if_empty,
>   const unsigned char *dwo_id)
>  {
> -  const char *secname, *oldsym;
> -  char *tmp;
> +  const char *oldsym;
>
>/* Unless we are outputting main CU, we may throw away empty ones.  */
>if (!output_if_empty && die->die_child == NULL)
> @@ -11269,21 +11268,10 @@ output_comp_unit (dw_die_ref die, int 
> output_if_empty,
>calc_die_sizes (die);
>
>oldsym = die->die_id.die_symbol;
> -  if (oldsym && die->comdat_type_p)
> -{
> -  tmp = XALLOCAVEC (char, strlen (oldsym) + 24);
>
> -  sprintf (tmp, ".gnu.linkonce.wi.%s", oldsym);
> -  secname = tmp;
> -  die->die_id.die_symbol = NULL;
> -  switch_to_section (get_section (secname, SECTION_DEBUG, NULL));
> -}
> -  else
> -{
> -  switch_to_section (debug_info_section);
> -  ASM_OUTPUT_LABEL (asm_out_file, debug_info_section_label);
> -  info_section_emitted = true;
> -}
> +  switch_to_section (debug_info_section);
> +  ASM_OUTPUT_LABEL (asm_out_file, debug_info_section_label);
> +  info_section_emitted = true;
>
>/* For LTO cross unit DIE refs we want a symbol on the start of the
>   debuginfo section, not on the CU DIE.  */
> @@ -11322,10 +11310,7 @@ output_comp_unit (dw_die_ref die, int 
> output_if_empty,
>/* Leave the marks on the main CU, so we can check them in
>   output_pubnames.  */
>if (oldsym)
> -{
> -  unmark_dies (die);
> -  die->die_id.die_symbol = oldsym;
> -}
> +unmark_dies (die);
>  }
>
>  /* Whether to generate the DWARF accelerator tables in .debug_pubnames
> --
> 2.47.0
>


Re: [PATCH 13/15] Support for 64-bit location_t: Internal parts

2024-11-27 Thread Richard Biener
On Sun, Nov 3, 2024 at 11:28 PM Lewis Hyatt  wrote:
>
> Several of the selftests in diagnostic-show-locus.cc and input.cc are
> sensitive to linemap internals. Adjust them here so they will support 64-bit
> location_t if configured.
>
> Likewise, handle 64-bit location_t in the support for
> -fdump-internal-locations. As was done with the analyzer, convert to
> (unsigned long long) explicitly so that 32- and 64-bit can be handled with
> the same printf formats.
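The printf technique described in the quoted paragraph can be sketched as follows. `loc32_t` and `loc64_t` are hypothetical stand-ins for the two possible widths of location_t, and `format_location` is an illustrative helper, not a function from input.cc.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-ins for a 32-bit and a 64-bit location_t.
using loc32_t = std::uint32_t;
using loc64_t = std::uint64_t;

// One format string serves both widths: widen explicitly to unsigned long long
// and print with %llu, instead of picking a different format per width.
template<typename Loc>
std::string format_location(Loc loc)
{
  char buf[32];
  std::snprintf(buf, sizeof buf, "location %llu",
                static_cast<unsigned long long>(loc));
  return buf;
}
```

For example, `format_location(loc64_t{1} << 40)` produces "location 1099511627776".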

I was hoping David would have a look here.  Absent comments from him,
this is OK once all else is approved and after giving him another week.

What's missing review now?  I've lost track ...

Thanks,
Richard.

> gcc/ChangeLog:
>
> * diagnostic-show-locus.cc
> (test_one_liner_fixit_validation_adhoc_locations): Adapt so it can
> effectively test 7-bit ranges instead of 5-bit ranges.
> (test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
> * input.cc (get_end_location): Adjust types to support 64-bit
> location_t.
> (write_digit_row): Likewise.
> (dump_location_range): Likewise.
> (dump_location_info): Likewise.
> (class line_table_case): Likewise.
> (test_accessing_ordinary_linemaps): Replace some hard-coded
> constants with the values defined in line-map.h.
> (for_each_line_table_case): Likewise.
> ---
>  gcc/diagnostic-show-locus.cc | 128 +--
>  gcc/input.cc | 100 ++-
>  2 files changed, 157 insertions(+), 71 deletions(-)
>
> diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
> index a2a4b047ff9..ffe9e807104 100644
> --- a/gcc/diagnostic-show-locus.cc
> +++ b/gcc/diagnostic-show-locus.cc
> @@ -4013,12 +4013,12 @@ test_one_liner_fixit_validation_adhoc_locations ()
>  {
>/* Generate a range that's too long to be packed, so must
>   be stored as an ad-hoc location (given the defaults
> - of 5 bits or 0 bits of packed range); 41 columns > 2**5.  */
> + of 5 or 7 bits or 0 bits of packed range); 150 columns > 2**7.  */
>const location_t c7 = linemap_position_for_column (line_table, 7);
> -  const location_t c47 = linemap_position_for_column (line_table, 47);
> -  const location_t loc = make_location (c7, c7, c47);
> +  const location_t c157 = linemap_position_for_column (line_table, 157);
> +  const location_t loc = make_location (c7, c7, c157);
>
> -  if (c47 > LINE_MAP_MAX_LOCATION_WITH_COLS)
> +  if (c157 > LINE_MAP_MAX_LOCATION_WITH_COLS)
>  return;
>
>ASSERT_TRUE (IS_ADHOC_LOC (loc));
> @@ -4032,7 +4032,18 @@ test_one_liner_fixit_validation_adhoc_locations ()
>
>  test_diagnostic_context dc;
>  ASSERT_STREQ (" foo = bar.field;\n"
> - "   ^~   \n"
> + "   ^~   "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  \n"
>   "   test\n",
>   dc.test_show_locus (richloc));
>}
> @@ -4040,29 +4051,62 @@ test_one_liner_fixit_validation_adhoc_locations ()
>/* Remove.  */
>{
>  rich_location richloc (line_table, loc);
> -source_range range = source_range::from_locations (loc, c47);
> +source_range range = source_range::from_locations (loc, c157);
>  richloc.add_fixit_remove (range);
>  /* It should not have been discarded by the validator.  */
>  ASSERT_EQ (1, richloc.get_num_fixit_hints ());
>
>  test_diagnostic_context dc;
>  ASSERT_STREQ (" foo = bar.field;\n"
> - "   ^~   \n"
> - "   -\n",
> + "   ^~   "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  "
> + "  \n"
> + "   -"
> + "--"
> + "--"
> + "--"
> + "--"
> + "--"
> + "--"
> + "--"
> + "--"
> + "--"
> + "--"
> + "--\n",
>   dc.test_show_locus (richloc));
>}
>
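A minimal C sketch of the printf approach described in the patch above. It assumes a hypothetical `LOCATION_T_64` macro that selects a 32- or 64-bit `location_t`; the typedef here is a stand-in, not GCC's own definition. Converting explicitly to `unsigned long long` lets a single `%llu` format serve both configurations:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a configurable location_t; LOCATION_T_64 is a
   hypothetical configuration macro used only in this sketch.  */
#ifdef LOCATION_T_64
typedef uint64_t location_t;
#else
typedef uint32_t location_t;
#endif

/* Widening to unsigned long long must never lose bits.  */
_Static_assert (sizeof (unsigned long long) >= sizeof (location_t),
                "unsigned long long must hold any location_t");

/* Write LOC into BUF using one format string that is valid for either
   width, by converting explicitly to unsigned long long.  Returns the
   snprintf result (the number of characters that would be written).  */
static int
format_location (char *buf, size_t len, location_t loc)
{
  assert (buf != NULL && len > 0);
  return snprintf (buf, len, "loc %llu", (unsigned long long) loc);
}
```

The cast is what keeps the format string portable: passing a `uint32_t` or `uint64_t` directly to `%llu` would be undefined behaviour on at least one of the two configurations.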

[PATCH v3 3/3] Match: make SAT_ADD case 7 commutative

2024-11-27 Thread Akram Ahmad
Case 7 of unsigned scalar saturating addition defines
SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1
being commutative.

The pattern for case 7 currently does not accept the alternative
where Y is used in the condition. Therefore, this commit adds the
commutative property to this case which causes more valid cases of
unsigned saturating arithmetic to be recognised.
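The equivalence of the two forms of case 7 can be checked exhaustively for `uint8_t` with a small C sketch (the function names are illustrative, not GCC's). Note the explicit casts: in C, `x + y` on `uint8_t` operands is computed in `int`, so the sum must be narrowed back before the comparison to get the wrapping behaviour that the GIMPLE pattern sees.

```c
#include <assert.h>
#include <stdint.h>

/* Case 7 as written in match.pd: the condition compares against X.  */
static uint8_t
sat_add_case7_x (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);  /* wraps modulo 256 on overflow */
  return x <= sum ? sum : (uint8_t) -1;
}

/* The commuted form: the condition compares against Y instead.  */
static uint8_t
sat_add_case7_y (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return y <= sum ? sum : (uint8_t) -1;
}

/* Reference saturating addition computed in a wider type.  */
static uint8_t
sat_add_ref (uint8_t x, uint8_t y)
{
  unsigned wide = (unsigned) x + y;
  return wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
}

/* Compare both forms against the reference for every uint8_t pair;
   assert fails on the first mismatch.  */
static void
check_case7_commutative (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t r = sat_add_ref ((uint8_t) x, (uint8_t) y);
        assert (sat_add_case7_x ((uint8_t) x, (uint8_t) y) == r);
        assert (sat_add_case7_y ((uint8_t) x, (uint8_t) y) == r);
      }
}
```

Both forms agree because, without overflow, the sum is at least as large as either operand, while with overflow the wrapped sum is smaller than both.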

Before:
 
 _1 = BIT_FIELD_REF ;
 sum_5 = _1 + a_4(D);
 if (a_4(D) <= sum_5)
   goto ; [INV]
 else
   goto ; [INV]

  :

  :
 _2 = PHI <255(3), sum_5(2)>
 return _2;

After:
   [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  _2 = .SAT_ADD (_1, a_4(D)); [tail call]
  return _2;

This passes the aarch64-none-linux-gnu regression tests with no new
failures. The tests will be skipped on targets which do not support
IFN_SAT_ADD for each of these modes via dg-require-effective-target.

gcc/ChangeLog:

* match.pd: Modify existing case for SAT_ADD.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/sat-u-add-match-1-u16.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u32.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u64.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u8.c: New test.
---
 gcc/match.pd  |  4 ++--
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 22 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 22 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 22 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 +++
 5 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 4fc5efa6247..98c50ab097f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3085,7 +3085,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
SAT_ADD = (X + Y) | -((X + Y) < X)  */
 (match (usadd_left_part_1 @0 @1)
- (plus:c @0 @1)
+ (plus @0 @1)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
@@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned saturation add, case 7 (branch with le):
SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
 (match (unsigned_integer_sat_add @0 @1)
- (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
+ (cond^ (le @0 (usadd_left_part_1:C@2 @0 @1)) @2 integer_minus_onep))
 
 /* Unsigned saturation add, case 8 (branch with gt):
SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
new file mode 100644
index 000..866ce6cdbc1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target usadd_himode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
new file mode 100644
index 000..8f841c32852
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target usadd_simode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
new file mode 100644
index 000..39548d63384
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target usadd_dimode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint64_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at 

[PATCH v3 0/3] Match: support additional cases of unsigned scalar arithmetic

2024-11-27 Thread Akram Ahmad
Hi all,

This patch series adds support for two new cases of unsigned scalar saturating
arithmetic (one addition, one subtraction), so that more valid patterns are
recognised and replaced with a call to .SAT_ADD or .SAT_SUB where relevant.

v3 of this series now introduces support for dg-require-effective-target for
both the usadd and ussub optabs, as well as the individual modes that these
optabs may be implemented for. aarch64 support for these optabs is in review,
so there are currently no targets listed in these effective-target options.

Regression tests for aarch64 all pass with no failures.

v3 changes:
- add support for new effective-target keywords.
- tests for the two new patterns now use dg-require-effective-target so that
  they are skipped on targets that do not support the relevant optabs.

v2 changes:
- add new tests for both patterns (these will fail on targets which don't
  implement the standard insn names for IFN_SAT_ADD and IFN_SAT_SUB; another
  patch series adds support for this in aarch64).
- minor adjustment to the constraints on the match statement for
  usadd_left_part_1.

If this is OK for master, please commit these on my behalf, as I do not have
the ability to do so.

Many thanks,

Akram

---

Akram Ahmad (3):
  testsuite: Support dg-require-effective-target for us{add, sub}
  Match: support new case of unsigned scalar SAT_SUB
  Match: make SAT_ADD case 7 commutative

 gcc/match.pd  | 12 +++-
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 22 
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 22 
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 22 
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 15 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 15 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 15 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 +
 gcc/testsuite/lib/target-supports.exp | 56 +++
 10 files changed, 214 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c

-- 
2.34.1



[PATCH v3 1/3] testsuite: Support dg-require-effective-target for us{add, sub}

2024-11-27 Thread Akram Ahmad
Support for middle-end representation of saturating arithmetic (via
IFN_SAT_ADD or IFN_SAT_SUB) cannot be determined externally, making it
currently impossible to selectively skip relevant tests on targets which
do not support this.

This patch adds new dg-require-effective-target keywords for each of the
unsigned saturating arithmetic optabs, for scalar QImode, HImode,
SImode, and DImode. These can then be used in future tests which focus
on these internal functions.

Currently passes aarch64 regression tests with no additional failures.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add new effective-target keywords
---
 gcc/testsuite/lib/target-supports.exp | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d113a08dff7..ec1d73970a1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4471,6 +4471,62 @@ proc check_effective_target_vect_complex_add_double { } {
}}]
 }
 
+# Return 1 if the target supports middle-end representation of saturating
+# addition for QImode, 0 otherwise.
+
+proc check_effective_target_usadd_qimode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# addition for HImode, 0 otherwise.
+
+proc check_effective_target_usadd_himode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# addition for SImode, 0 otherwise.
+
+proc check_effective_target_usadd_simode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# addition for DImode, 0 otherwise.
+
+proc check_effective_target_usadd_dimode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# subtraction for QImode, 0 otherwise.
+
+proc check_effective_target_ussub_qimode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# subtraction for HImode, 0 otherwise.
+
+proc check_effective_target_ussub_himode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# subtraction for SImode, 0 otherwise.
+
+proc check_effective_target_ussub_simode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# subtraction for DImode, 0 otherwise.
+
+proc check_effective_target_ussub_dimode { } {
+return 0
+}
+
 # Return 1 if the target supports signed int->float conversion
 #
 
-- 
2.34.1



[PATCH v3 2/3] Match: support new case of unsigned scalar SAT_SUB

2024-11-27 Thread Akram Ahmad
This patch adds a new case for unsigned scalar saturating subtraction
using a branch with a greater-than-or-equal condition. For example,

X >= (X - Y) ? (X - Y) : 0

is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars,
so that more cases of IFN_SAT_SUB are matched. New tests are added to
verify this behaviour on targets which use the standard names for
IFN_SAT_SUB; the tests are skipped if the current target does not
support IFN_SAT_SUB for each of these modes (via
dg-require-effective-target).
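Why case 11 is a valid saturating subtraction can be checked exhaustively for `uint8_t` with a short C sketch (function names are illustrative). If Y > X, the wrapped difference X - Y + 256 is always greater than X, so the condition fails and the result is 0. The cast matters: without it, `x - y` would be a negative `int` and the comparison would no longer model the unsigned GIMPLE.

```c
#include <assert.h>
#include <stdint.h>

/* Case 11: X >= (X - Y) ? (X - Y) : 0, with the subtraction wrapping.  */
static uint8_t
sat_sub_case11 (uint8_t x, uint8_t y)
{
  uint8_t diff = (uint8_t) (x - y);  /* wraps modulo 256 when y > x */
  return x >= diff ? diff : 0;
}

/* Reference unsigned saturating subtraction.  */
static uint8_t
sat_sub_ref (uint8_t x, uint8_t y)
{
  return x > y ? (uint8_t) (x - y) : 0;
}

/* Compare case 11 against the reference for every uint8_t pair.  */
static void
check_case11 (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      assert (sat_sub_case11 ((uint8_t) x, (uint8_t) y)
              == sat_sub_ref ((uint8_t) x, (uint8_t) y));
}
```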

This passes the aarch64 regression tests with no additional failures.

gcc/ChangeLog:

* match.pd: Add new match for SAT_SUB.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c: New test.
---
 gcc/match.pd  |  8 
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 15 +++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 15 +++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 15 +++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 +++
 5 files changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ee53c25cef9..4fc5efa6247 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3360,6 +3360,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (wi::eq_p (sum, wi::uhwi (0, precision)))
 
+/* Unsigned saturation sub, case 11 (branch with ge):
+  SAT_U_SUB = X >= (X - Y) ? (X - Y) : 0.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (ge @0 (minus @0 @1))
+  (convert? (minus (convert1? @0) (convert1? @1))) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
+
 /* Signed saturation sub, case 1:
T minus = (T)((UT)X - (UT)Y);
SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
new file mode 100644
index 000..641fac50858
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ussub_himode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
new file mode 100644
index 000..27f3bae7d52
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ussub_simode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
new file mode 100644
index 000..92883ce60c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ussub_dimode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint64_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
new file mode 100644
index 000..06ff91dbed0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ussub_qimode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint8_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
-- 
2.34.1



[PATCH] Fortran: fix crash with bounds check writing array section [PR117791]

2024-11-27 Thread Harald Anlauf
Dear all,

the attached patch fixes a wrong-code issue with bounds-checking
enabled when doing I/O of an array section and an index is either
an expression or a function result.  The problem does not occur
without bounds-checking.

When looking at the original testcase, the function occurring in
the affected index was evaluated twice, once with wrong arguments.

The simplest solution appears to be to fall back to scalarization
when bounds-checking is enabled.  If someone has a quick idea to
handle this better, please speak up!

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

This seems to be a 14/15 regression, so a backport is advisable.

Thanks,
Harald

From fa47a04e74a862ea4b85fa6f74b4b6ce21b61716 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 27 Nov 2024 21:11:16 +0100
Subject: [PATCH] Fortran: fix crash with bounds check writing array section
 [PR117791]

	PR fortran/117791

gcc/fortran/ChangeLog:

	* trans-io.cc (gfc_trans_transfer): When an array index depends on
	a function evaluation or an expression, do not use optimized array
	I/O of an array section and fall back to normal scalarization.

gcc/testsuite/ChangeLog:

	* gfortran.dg/bounds_check_array_io.f90: New test.
---
 gcc/fortran/trans-io.cc   | 20 
 .../gfortran.dg/bounds_check_array_io.f90 | 31 +++
 2 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/bounds_check_array_io.f90

diff --git a/gcc/fortran/trans-io.cc b/gcc/fortran/trans-io.cc
index 961a711c530..906dd7c6eb6 100644
--- a/gcc/fortran/trans-io.cc
+++ b/gcc/fortran/trans-io.cc
@@ -2648,6 +2648,26 @@ gfc_trans_transfer (gfc_code * code)
 	 || gfc_expr_attr (expr).pointer))
 	goto scalarize;

+  /* With array-bounds checking enabled, force scalarization in some
+	 situations, e.g., when an array index depends on a function
+	 evaluation or an expression and possibly has side-effects.  */
+  if ((gfc_option.rtcheck & GFC_RTCHECK_BOUNDS)
+	  && ref
+	  && ref->u.ar.type == AR_SECTION)
+	{
+	  for (n = 0; n < ref->u.ar.dimen; n++)
+	if (ref->u.ar.dimen_type[n] == DIMEN_ELEMENT
+		&& ref->u.ar.start[n])
+	  {
+		switch (ref->u.ar.start[n]->expr_type)
+		  {
+		  case EXPR_FUNCTION:
+		  case EXPR_OP:
+		goto scalarize;
+		  }
+	  }
+	}
+
   if (!(gfc_bt_struct (expr->ts.type)
 	  || expr->ts.type == BT_CLASS)
 	&& ref && ref->next == NULL
diff --git a/gcc/testsuite/gfortran.dg/bounds_check_array_io.f90 b/gcc/testsuite/gfortran.dg/bounds_check_array_io.f90
new file mode 100644
index 000..0cfc1174283
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/bounds_check_array_io.f90
@@ -0,0 +1,31 @@
+! { dg-do run }
+! { dg-additional-options "-fcheck=bounds -fdump-tree-original" }
+!
+! PR fortran/117791 - crash with bounds check writing array section
+! Contributed by Andreas van Hameren (hameren at ifj dot edu dot pl)
+
+program testprogram
+  implicit none
+  integer, parameter :: array(4,2)=reshape ([11,12,13,14 ,15,16,17,18], [4,2])
+  integer:: i(3) = [45,51,0]
+
+  write(*,*) 'line 1:',array(:,  sort_2(i(1:2)) )
+  write(*,*) 'line 2:',array(:,  3 - sort_2(i(1:2)) )
+  write(*,*) 'line 3:',array(:, int (3 - sort_2(i(1:2
+
+contains
+
+  function sort_2(i) result(rslt)
+integer,intent(in) :: i(2)
+integer:: rslt
+if (i(1) <= i(2)) then
+   rslt = 1
+else
+   rslt = 2
+endif
+  end function
+
+end program
+
+! { dg-final { scan-tree-dump-times "sort_2" 5 "original" } }
+! { dg-final { scan-tree-dump-not "_gfortran_transfer_array_write" "original" } }
--
2.35.3
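As a C analogy only (this is not the code gfortran generates), the hazard in this PR is an index expression that gets expanded once for the bounds check and again for the access. Evaluating the index once into a plain value avoids it. `UNSAFE_COLUMN_SUM` below is a hypothetical example of the double-evaluation pattern; `array` and `sort_2` mirror the Fortran testcase.

```c
#include <assert.h>

/* Number of times sort_2 has been evaluated.  */
static int sort_2_calls;

/* C stand-in for sort_2 from the testcase: 1 if i[0] <= i[1], else 2.  */
static int
sort_2 (const int i[2])
{
  sort_2_calls++;
  return i[0] <= i[1] ? 1 : 2;
}

/* The testcase's array(4,2), stored here as two columns of four.  */
static const int array[2][4] = { { 11, 12, 13, 14 }, { 15, 16, 17, 18 } };

/* Sum column COL (1-based) after a bounds check, or return -1 when it is
   out of range.  COL is a plain value, so whatever expression produced it
   was evaluated exactly once, before the check.  */
static int
checked_column_sum (int col)
{
  if (col < 1 || col > 2)
    return -1;
  int sum = 0;
  for (int r = 0; r < 4; r++)
    sum += array[col - 1][r];
  return sum;
}

/* Hypothetical bad pattern: the index expression is re-expanded for each
   bound check and again for the access, so a function call inside it
   runs several times.  */
#define UNSAFE_COLUMN_SUM(expr) \
  (((expr) < 1 || (expr) > 2) ? -1 : checked_column_sum (expr))
```

With `i = {45, 51}`, the safe version calls `sort_2` once per access, while the macro calls it three times for the same access; if the function's arguments or side effects differ between calls, the check and the access can disagree.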



Re: [PATCH v1 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx

2024-11-27 Thread Jeff Law



On 11/27/24 5:48 AM, Robin Dapp wrote:

> > This patch would like to combine the vec_duplicate + vadd.vv to the
> > vadd.vx.  From example as below:
>
> I think we concluded a while ago that we don't want this turned on universally.
> For the example/tests you provide it will be a de-optimization on any uarch
> that has non-zero GPR -> VR latency.
>
> So at least we need to define RTL costs for the combined variant and make them
> depend on the VR <-> GPR costs (so we don't do this if the latency/cost is >
> 0).
>
> Does the optimization happen in combine or late-combine BTW?  I thought
> late-combine because we need to look through the unary op (vec_duplicate).

Yea, I think that's the general agreement.  Essentially realizing that
there may be varying costs for accessing GPR or FPR data in the vector
unit depending on the uarch.


Also note this isn't a bugfix and so it ought to be a gcc-16 thing.

Jeff


Re: [PATCH 5/8] ipa: Update value range jump functions during inlining

2024-11-27 Thread Jan Hubicka
> > Hi,
> >
> > when inlining (during the analysis phase) a call graph edge, we update
> > all pass-through jump functions corresponding to edges going out of
> > the newly inlined function to be relative to the function into which
> > we are inlining or to expose the information originally captured for
> > the edge that is being inlined.
> >
> > Similarly, we can combine the value range information in pass-through
> > jump functions corresponding to both edges, which is what this patch
> > adds - at least for the case when the inlined pass-through is a
> > simple, non-arithmetic one, which is the case that we also handle for
> > constant and aggregate jump function parts.
> >
> > Bootstrapped and tested on x86_64-linux, the whole patch series has
> > additionally passed LTO and profiled-LTO bootstrap on the same platform
> > and a bootstrap and testsuite on ppc64-linux.  Aarch64-linux bootstrap
> > and testing is in progress.  OK for master if that passes too?
> >
> > Thanks,
> >
> > Martin
> >
> >
> > gcc/ChangeLog:
> >
> > 2024-11-01  Martin Jambor  
> >
> > * ipa-cp.h: Forward declare class ipa_vr.
> > (ipa_vr_operation_and_type_effects): Declare.
> > * ipa-cp.cc (ipa_vr_operation_and_type_effects): Make public.
> > * ipa-prop.cc (update_jump_functions_after_inlining): Also update
> > value range jump functions.

OK,
thanks!
Honza
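A simplified C sketch of what combining value ranges for a simple (non-arithmetic) pass-through could look like, assuming that both ranges constrain the same value after inlining, so that combining them amounts to an intersection. `struct irange_sketch` and both functions are hypothetical stand-ins, not GCC's `ipa_vr` or `vrange` API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical closed integer interval [lo, hi]; empty when lo > hi.  */
struct irange_sketch
{
  long lo, hi;
};

/* OUTER is what the inlined edge knew about the argument; INNER is what
   the pass-through jump function of the outgoing edge knew about the same
   value.  Both constraints hold, so the combined range is their
   intersection.  */
static struct irange_sketch
combine_passthrough_ranges (struct irange_sketch outer,
                            struct irange_sketch inner)
{
  struct irange_sketch r;
  r.lo = outer.lo > inner.lo ? outer.lo : inner.lo;
  r.hi = outer.hi < inner.hi ? outer.hi : inner.hi;
  return r;
}

/* True when the combined range contains no values.  */
static bool
irange_sketch_empty_p (struct irange_sketch r)
{
  return r.lo > r.hi;
}
```

Real value ranges in GCC also carry type information and may be multi-interval, which is why the patch reuses `ipa_vr_operation_and_type_effects` rather than anything this simple.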


Re: [PATCH 13/15] Support for 64-bit location_t: Internal parts

2024-11-27 Thread Lewis Hyatt
On Wed, Nov 27, 2024 at 09:41:13AM -0500, David Malcolm wrote:
> On Wed, 2024-11-27 at 14:56 +0100, Richard Biener wrote:
> > On Sun, Nov 3, 2024 at 11:28 PM Lewis Hyatt  wrote:
> > > 
> > > Several of the selftests in diagnostic-show-locus.cc and input.cc are
> > > sensitive to linemap internals. Adjust them here so they will support 64-bit
> > > location_t if configured.
> > > 
> > > Likewise, handle 64-bit location_t in the support for
> > > -fdump-internal-locations. As was done with the analyzer, convert to
> > > (unsigned long long) explicitly so that 32- and 64-bit can be handled with
> > > the same printf formats.
> > 
> > I was hoping David would have a look here.  Absent comments from him
> > this is OK when all else is approved and after giving him another week.
> 
> Mostly looks good, but I have a couple of questions below...

Thanks for taking a look.

> 
> 
> > 
> > What's missing review now?  I've lost track ...
> > 
> > Thanks,
> > Richard.
> > 
> > > gcc/ChangeLog:
> > > 
> > > * diagnostic-show-locus.cc
> > > (test_one_liner_fixit_validation_adhoc_locations): Adapt so it can
> > > effectively test 7-bit ranges instead of 5-bit ranges.
> > > (test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
> > > * input.cc (get_end_location): Adjust types to support 64-bit
> > > location_t.
> > > (write_digit_row): Likewise.
> > > (dump_location_range): Likewise.
> > > (dump_location_info): Likewise.
> > > (class line_table_case): Likewise.
> > > (test_accessing_ordinary_linemaps): Replace some hard-coded
> > > constants with the values defined in line-map.h.
> > > (for_each_line_table_case): Likewise.
> > > ---
> > >  gcc/diagnostic-show-locus.cc | 128 +--
> > >  gcc/input.cc | 100 ++-
> > >  2 files changed, 157 insertions(+), 71 deletions(-)
> > > 
> 
> [...snip...]
> 
> > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > index 04462ef6f5a..1629e4aeee8 100644
> > > --- a/gcc/input.cc
> > > +++ b/gcc/input.cc
> 
> [...snip...]
> 
> > > @@ -3865,11 +3870,11 @@ static const location_t boundary_locations[] = {
> > >LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES + 0x100,
> > > 
> > >/* Values near LINE_MAP_MAX_LOCATION_WITH_COLS.  */
> > > -  LINE_MAP_MAX_LOCATION_WITH_COLS - 0x100,
> > > +  LINE_MAP_MAX_LOCATION_WITH_COLS - 0x200,
> > >LINE_MAP_MAX_LOCATION_WITH_COLS - 1,
> > >LINE_MAP_MAX_LOCATION_WITH_COLS,
> > >LINE_MAP_MAX_LOCATION_WITH_COLS + 1,
> > > -  LINE_MAP_MAX_LOCATION_WITH_COLS + 0x100,
> > > +  LINE_MAP_MAX_LOCATION_WITH_COLS + 0x200,
> > >  };
> 
> I see that this updates the offsets from 0x100 to 0x200 for the
> _WITH_COLS case, but doesn't for the _WITH_PACKED_RANGES case.
> 
> What's the reasoning here?
> 
> In theory we can simply add new entries to boundary_locations to get
> more test coverage, but I don't know the extent to which this part of
> the selftests is slowing builds down on the slower configurations; the
> selftests are meant to be fast to run.
>

I needed to change it for _WITH_COLS because otherwise the test
test_lexer_string_locations_concatenation_1 fails. This is because the
assert_char_at_range() utility it uses does not handle the case of a token
which straddles LINE_MAP_MAX_LOCATION_WITH_COLS; it assumes that the column
information will be available in this case, but it is not. Once the number
of range bits increases, the 0x100 buffer is not enough to avoid straddling
the cutoff. I could also modify that selftest, I just did this change since
it preserves what's currently being tested. I could change
_WITH_PACKED_RANGES too if you prefer for consistency? It wasn't necessary
but it will work either way, or could do both.

> > > 
> > >  /* Run TESTCASE multiple times, once for each case in our test matrix.  */
> > > @@ -3884,10 +3889,9 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
> > > 
> > >/* Run all tests with:
> > >   (a) line_table->default_range_bits == 0, and
> > > - (b) line_table->default_range_bits == 5.  */
> > > -  int num_cases_tested = 0;
> > > -  for (int default_range_bits = 0; default_range_bits <= 5;
> > > -   default_range_bits += 5)
> > > + (b) line_table->default_range_bits == line_map_suggested_range_bits.  */
> > > +
> > > +  for (int default_range_bits: {0, line_map_suggested_range_bits})
> > >  {
> > >/* ...and use each of the "interesting" location values as
> > >  the starting location within line_table.  */
> > > @@ -3895,15 +3899,9 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
> > >for (int loc_idx = 0; loc_idx < num_boundary_locations; loc_idx++)
> > > {
> > >   line_table_case c (default_range_bits,
> > > boundary_locations

c: Do not remove _Atomic from array element type for typeof_unqual [PR117781]

2024-11-27 Thread Joseph Myers
As reported in bug 117781, my fix for bug 112841 broke the case of
typeof_unqual applied to an array of _Atomic elements, which should
not have _Atomic removed since only the element type is atomic, not
the array type.  Fix with logic to ensure that atomic element types
are preserved as such, while other qualifiers (i.e. those that are
semantically rather than only syntactically such in C) are removed.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

PR c/117781

gcc/c/
* c-parser.cc (c_parser_typeof_specifier): Do not remove _Atomic
from array element type for typeof_unqual.

gcc/testsuite/
* gcc.dg/c23-typeof-5.c: New test.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 48683572d7cd..9a2b19c10a7a 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -4443,9 +4443,19 @@ c_parser_typeof_specifier (c_parser *parser)
   parens.skip_until_found_close (parser);
   if (ret.spec != error_mark_node)
 {
-  if (is_unqual
- && TYPE_QUALS (strip_array_types (ret.spec)) != TYPE_UNQUALIFIED)
-   ret.spec = TYPE_MAIN_VARIANT (ret.spec);
+  if (is_unqual)
+   {
+ bool is_array = TREE_CODE (ret.spec) == ARRAY_TYPE;
+ int quals = TYPE_QUALS (strip_array_types (ret.spec));
+ if ((is_array ? quals & ~TYPE_QUAL_ATOMIC : quals)
+ != TYPE_UNQUALIFIED)
+   {
+ ret.spec = TYPE_MAIN_VARIANT (ret.spec);
+ if (quals & TYPE_QUAL_ATOMIC && is_array)
+   ret.spec = c_build_qualified_type (ret.spec,
+  TYPE_QUAL_ATOMIC);
+   }
+   }
   if (is_std)
{
  /* In ISO C terms, _Noreturn is not part of the type of
diff --git a/gcc/testsuite/gcc.dg/c23-typeof-5.c b/gcc/testsuite/gcc.dg/c23-typeof-5.c
new file mode 100644
index ..e8076982822d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-typeof-5.c
@@ -0,0 +1,18 @@
+/* Test C23 typeof and typeof_unqual on arrays of atomic elements (bug
+   117781).  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+
+_Atomic int a[2], b[2][2];
+const _Atomic int c[2], d[2][2];
+
+extern typeof (a) a;
+extern typeof (b) b;
+extern typeof (c) c;
+extern typeof (d) d;
+extern typeof_unqual (a) a;
+extern typeof_unqual (b) b;
+extern typeof_unqual (c) a;
+extern typeof_unqual (d) b;
+extern typeof_unqual (volatile _Atomic int [2]) a;
+extern typeof_unqual (volatile _Atomic int [2][2]) b;

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH 13/15] Support for 64-bit location_t: Internal parts

2024-11-27 Thread Lewis Hyatt
On Wed, Nov 27, 2024 at 8:56 AM Richard Biener
 wrote:
>
> On Sun, Nov 3, 2024 at 11:28 PM Lewis Hyatt  wrote:
> >
> > Several of the selftests in diagnostic-show-locus.cc and input.cc are
> > sensitive to linemap internals. Adjust them here so they will support 64-bit
> > location_t if configured.
> >
> > Likewise, handle 64-bit location_t in the support for
> > -fdump-internal-locations. As was done with the analyzer, convert to
> > (unsigned long long) explicitly so that 32- and 64-bit can be handled with
> > the same printf formats.
>
> I was hoping David would have a look here.  Absent comments from him
> this is OK when all else is approved and after giving him another week.
>
> What's missing review now?  I've lost track ...
>
> Thanks,
> Richard.
>

Thanks, sounds good. This is pretty much the last one to review from
v2 other than the final patch that switches over to 64-bit (in v2,
#14/14 https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669161.html).
That could probably wait for v3...
I am planning to send out a v3 series soon that addresses your
comments and has some new patches in response. I have already been
pushing the patches that were acked and that don't introduce any
functional changes, so v3 will not contain a lot of duplication, but
I'll indicate there which ones still need to be reviewed. The main new
ones address the issue with reallocated gphi objects: there are three
places which call loop_version() that can invalidate them, and I have
patches addressing this that I am about to test now.

-Lewis


Re: [PATCH 5/8] ipa: Update value range jump functions during inlining

2024-11-27 Thread Martin Jambor
Hello,

I believe all questions regarding the patch below have been answered and
so I would like to ping it.

Thanks,

Martin


On Tue, Nov 05 2024, Martin Jambor wrote:
> Hi,
>
> when inlining (during the analysis phase) a call graph edge, we update
> all pass-through jump functions corresponding to edges going out of
> the newly inlined function to be relative to the function into which
> we are inlining or to expose the information originally captured for
> the edge that is being inlined.
>
> Similarly, we can combine the value range information in pass-through
> jump functions corresponding to both edges, which is what this patch
> adds - at least for the case when the inlined pass-through is a
> simple, non-arithmetic one, which is the case that we also handle for
> constant and aggregate jump function parts.
>
> Bootstrapped and tested on x86_64-linux, the whole patch series has
> additionally passed LTO and profiled-LTO bootstrap on the same platform
> and a bootstrap and testsuite on ppc64-linux.  Aarch64-linux bootstrap
> and testing is in progress.  OK for master if that passes too?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2024-11-01  Martin Jambor  
>
>   * ipa-cp.h: Forward declare class ipa_vr.
>   (ipa_vr_operation_and_type_effects): Declare.
>   * ipa-cp.cc (ipa_vr_operation_and_type_effects): Make public.
>   * ipa-prop.cc (update_jump_functions_after_inlining): Also update
>   value range jump functions.
> ---
>  gcc/ipa-cp.cc   |  4 ++--
>  gcc/ipa-cp.h| 13 +
>  gcc/ipa-prop.cc | 18 ++
>  3 files changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index fa4f0feeb8d..92dd2af19e0 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -1644,7 +1644,7 @@ ipa_context_from_jfunc (ipa_node_params *info, cgraph_edge *cs, int csidx,
> DST_TYPE on value range in SRC_VR and store it to DST_VR.  Return true if
> the result is a range that is not VARYING nor UNDEFINED.  */
>  
> -static bool
> +bool
>  ipa_vr_operation_and_type_effects (vrange &dst_vr,
>  const vrange &src_vr,
>  enum tree_code operation,
> @@ -1670,7 +1670,7 @@ ipa_vr_operation_and_type_effects (vrange &dst_vr,
>  /* Same as above, but the SRC_VR argument is an IPA_VR which must
> first be extracted onto a vrange.  */
>  
> -static bool
> +bool
>  ipa_vr_operation_and_type_effects (vrange &dst_vr,
>  const ipa_vr &src_vr,
>  enum tree_code operation,
> diff --git a/gcc/ipa-cp.h b/gcc/ipa-cp.h
> index ba2ebfede63..4f569c1ee83 100644
> --- a/gcc/ipa-cp.h
> +++ b/gcc/ipa-cp.h
> @@ -299,4 +299,17 @@ ipa_vr_supported_type_p (tree type)
>return irange::supports_p (type) || prange::supports_p (type);
>  }
>  
> +class ipa_vr;
> +
> +bool ipa_vr_operation_and_type_effects (vrange &dst_vr,
> + const vrange &src_vr,
> + enum tree_code operation,
> + tree dst_type, tree src_type);
> +bool ipa_vr_operation_and_type_effects (vrange &dst_vr,
> + const ipa_vr &src_vr,
> + enum tree_code operation,
> + tree dst_type, tree src_type);
> +
> +
> +
>  #endif /* IPA_CP_H */
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 012f8a32386..3b24bcbed15 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -3486,6 +3486,24 @@ update_jump_functions_after_inlining (struct cgraph_edge *cs,
> gcc_unreachable ();
>   }
>  
> +   if (src->m_vr && src->m_vr->known_p ())
> + {
> +   value_range svr (src->m_vr->type ());
> +   if (!dst->m_vr || !dst->m_vr->known_p ())
> + ipa_set_jfunc_vr (dst, *src->m_vr);
> +   else if (ipa_vr_operation_and_type_effects (svr, *src->m_vr,
> +NOP_EXPR,
> +dst->m_vr->type (),
> +src->m_vr->type ()))
> + {
> +   value_range dvr;
> +   dst->m_vr->get_vrange (dvr);
> +   dvr.intersect (svr);
> +   if (!dvr.undefined_p ())
> + ipa_set_jfunc_vr (dst, dvr);
> + }
> + }
> +
> if (src->agg.items
> && (dst_agg_p || !src->agg.by_ref))
>   {
> -- 
> 2.47.0
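
The range-combining step this patch adds to update_jump_functions_after_inlining
can be pictured with a toy interval type. This is a sketch under simplifying
assumptions: toy_range and combine_ranges are hypothetical names, a single
[lo, hi] interval stands in for GCC's vrange/ipa_vr, and the NOP_EXPR type
conversion done through ipa_vr_operation_and_type_effects is omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Toy stand-in for a known integer value range [lo, hi].
struct toy_range
{
  std::int64_t lo, hi;
};

// Combine the range from the inlined edge (SRC) with the range already
// on the outer jump function (DST), following the shape of the patch.
std::optional<toy_range>
combine_ranges (std::optional<toy_range> dst, std::optional<toy_range> src)
{
  if (!src)
    return dst;                 // Nothing known about SRC: keep DST.
  if (!dst)
    return src;                 // Like ipa_set_jfunc_vr (dst, *src->m_vr).
  toy_range r { std::max (dst->lo, src->lo), std::min (dst->hi, src->hi) };
  if (r.lo > r.hi)
    return dst;                 // Empty ("undefined") result: leave DST alone.
  return r;                     // Intersection of both ranges.
}
```

The intersection is only ever narrower than either input, which is why it is
safe to store it when it is non-empty.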


Re: [PATCH 2/3] dwarf: lto: Allow die_symbol outside of comp_unit.

2024-11-27 Thread Richard Biener
On Wed, Nov 6, 2024 at 3:36 PM Michal Jires  wrote:
>
> Die symbols are used for external references.
> Typically, during LTO, early debug emits 'die_symbol+offset' for each
> DIE that may be referenced later. Partitions in the LTRANS phase then
> use these references.
>
> Originally, die symbols are handled only in the root comp_unit and
> in attributes.
>
> This patch allows die symbols to be attached to any DIE.
> References then use the closest parent that has a die symbol.
>
> gcc/ChangeLog:
>
> * dwarf2out.cc (dwarf2out_die_ref_for_decl):
>   Choose closest parent with die_symbol.
> (output_die): Output asm label.
> (output_unit_die_symbol_list): New.
> (output_comp_unit): Output die_symbol list.
> (reset_dies): Reset all die_symbols.
> (dwarf2out_finish): Don't reset comp_unit die_symbol.
> ---
>  gcc/dwarf2out.cc | 80 +++-
>  1 file changed, 45 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index e10a5c78fe9..bf1ac45ed73 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -6039,14 +6039,14 @@ dwarf2out_die_ref_for_decl (tree decl, const char **sym,

ah - here it is ...

>/* Similar to get_ref_die_offset_label, but using the "correct"
>   label.  */
> -  *off = die->die_offset;
> -  while (die->die_parent)
> +  unsigned HOST_WIDE_INT unit_offset = die->die_offset;
> +  while (die->die_parent && (die->comdat_type_p || !die->die_id.die_symbol))
>  die = die->die_parent;
> -  /* For the containing CU DIE we compute a die_symbol in
> +  /* Root CU DIE always contains die_symbol computed in
>   compute_comp_unit_symbol.  */
> -  if (die->die_tag == DW_TAG_compile_unit)
> +  if (!die->comdat_type_p && die->die_id.die_symbol)
>  {
> -  gcc_assert (die->die_id.die_symbol != NULL);
> +  *off = unit_offset - die->die_offset;
>*sym = die->die_id.die_symbol;
>return true;
>  }
> @@ -10798,6 +10798,10 @@ output_die (dw_die_ref die)
>unsigned long size;
>unsigned ix;
>
> +  if ((flag_generate_lto || flag_generate_offload)
> +  && !die->comdat_type_p && die->die_id.die_symbol)
> +ASM_OUTPUT_LABEL (asm_out_file, die->die_id.die_symbol);
> +
>dw2_asm_output_data_uleb128 (die->die_abbrev, "(DIE (%#lx) %s)",
>(unsigned long)die->die_offset,
>dwarf_tag_name (die->die_tag));
> @@ -11228,14 +11232,41 @@ output_compilation_unit_header (enum dwarf_unit_type ut)
>  dw2_asm_output_data (1, DWARF2_ADDR_SIZE, "Pointer Size (in bytes)");
>  }
>
> +/* Output list of all die symbols in the DIE.  */
> +static void
> +output_unit_die_symbol_list (dw_die_ref die)
> +{
> +  if (!die->comdat_type_p && die->die_id.die_symbol)
> +{
> +  const char* sym = die->die_id.die_symbol;
> +  /* ???  No way to get visibility assembled without a decl.  */
> +  tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> + get_identifier (sym), char_type_node);
> +  TREE_PUBLIC (decl) = true;
> +  TREE_STATIC (decl) = true;
> +  DECL_ARTIFICIAL (decl) = true;
> +  DECL_VISIBILITY (decl) = VISIBILITY_HIDDEN;
> +  DECL_VISIBILITY_SPECIFIED (decl) = true;
> +  targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
> +#ifdef ASM_WEAKEN_LABEL
> +  /* We prefer a .weak because that handles duplicates from duplicate
> +archive members in a graceful way.  */
> +  ASM_WEAKEN_LABEL (asm_out_file, sym);
> +#else
> +  targetm.asm_out.globalize_label (asm_out_file, sym);
> +#endif
> +}
> +
> +  dw_die_ref c;
> +  FOR_EACH_CHILD (die, c, output_unit_die_symbol_list (c));

I'm not sure it will work this way together with the output_die hunk;
assemblers likely expect all of this to happen close to the actual label
emission, so I suggest splitting out only the visibility/globalizing bits
and emitting them from output_die instead.

> +}
> +
>  /* Output the compilation unit DIE and its children.  */
>
>  static void
>  output_comp_unit (dw_die_ref die, int output_if_empty,
>   const unsigned char *dwo_id)
>  {
> -  const char *oldsym;
> -
>/* Unless we are outputting main CU, we may throw away empty ones.  */
>if (!output_if_empty && die->die_child == NULL)
>  return;
> @@ -11267,34 +11298,12 @@ output_comp_unit (dw_die_ref die, int output_if_empty,
>  : DWARF_COMPILE_UNIT_HEADER_SIZE);
>calc_die_sizes (die);
>
> -  oldsym = die->die_id.die_symbol;
> -
>switch_to_section (debug_info_section);
>ASM_OUTPUT_LABEL (asm_out_file, debug_info_section_label);
>info_section_emitted = true;
>
> -  /* For LTO cross unit DIE refs we want a symbol on the start of the
> - debuginfo section, not on the CU DIE.  */
> -  if ((flag_generate_lto || flag_generate_offload) && oldsym)
> -{
> -  /* ???  No way to get visibility assembled without a dec
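
The parent walk in the patched dwarf2out_die_ref_for_decl can be sketched as
follows. toy_die and die_ref are hypothetical; the sketch ignores comdat_type_p
and only shows how the closest ancestor (or the DIE itself) carrying a
die_symbol is chosen and how the offset becomes relative to that ancestor.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Toy DIE node; fields loosely mirror dw_die_ref but this is not GCC's type.
struct toy_die
{
  const toy_die *parent;
  std::uint64_t offset;         // Like die_offset: offset within the unit.
  const char *symbol;           // Like die_id.die_symbol, or nullptr.
};

// Return the symbol of the closest DIE (self or ancestor) that has one,
// together with the original DIE's offset relative to that DIE.
std::optional<std::pair<std::string, std::uint64_t>>
die_ref (const toy_die *die)
{
  std::uint64_t unit_offset = die->offset;
  while (die->parent && !die->symbol)
    die = die->parent;
  if (!die->symbol)
    return std::nullopt;        // No symbol anywhere on the path.
  return std::make_pair (std::string (die->symbol),
                         unit_offset - die->offset);
}
```

A reference then has the form "symbol+offset", where the symbol is the label
emitted in front of the chosen ancestor DIE.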

Re: [RFC/RFA][PATCH v6 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-11-27 Thread Jeff Law



On 11/13/24 7:16 AM, Mariam Arutunian wrote:
> To address this, I added code in target-supports.exp and modified the
> relevant tests.
>
> I've attached the patch. Could you please check whether it is correct?

Yea, it basically looks right.  Doing a test now ;-)

I think I've got the word_mode dependencies sorted out locally as well.
I'll want to do another once-over of those changes, then I'll push the
whole thing back up to my branch and do a hopefully final walk-through of
the entire series.


jeff



Re: [PATCH 3/8] ipa: Skip type conversions in jump function constructions

2024-11-27 Thread Martin Jambor
Hi,

On Fri, Nov 15 2024, Richard Biener wrote:
> On Fri, Nov 15, 2024 at 1:45 PM Jan Hubicka  wrote:
>>
>> >
>> > The patch only ever skips just one conversion, never a chain of them and
>> > deliberately so, for the reasons shown in the example.
>> >
>> > However, I have just realized that combining pass_through jump functions
>> > during inlining may unfortunately have exactly the same effect, so we
>> > indeed need to be more careful.
>>
>> Ah, right.  I was wondering whether I could trigger something like this,
>> and this option did not come to mind.
>> >
>> > The motivating example converts a bool (integer with precision one) to
>> > an int, so what about the patch below which allows converting between
>> > integers and to a higher precision?  (Assuming the patch will pass
>> > bootstrap and testing on x86_64-linux which is underway).
>>
>> Allowing only conversion that monotonously increase precision looks.

As in "looks better?"  Or perhaps even "looks good?"

>> Perhaps Richi may have opinion here.
>> In a way this is similar to what we do in gimple_call_builtin that has
>> some type checking and also allows widening conversions.  So perhaps
>> this can be unified.
>>
>> I just noticed that the testsuite has a few examples that, for example, define
>>
>> void *calloc (long, long)
>>
>> and this makes the test fail since the parameter is really unsigned long,
>> and in turn we disable some calloc optimizations even though this
>> does not affect code generation.  Some passes use
>> gimple_call_builtin while others look up the callee decl by hand.
>
> I think all conversions that are not value changing (based on incoming range)
> are OK.  Even signed int with [3, 10] -> unsigned char [3, 10] would be OK.
> But signed int with [-1, 1] -> unsigned char [0, 1] [ 0xff ] might
> cause problems.
>

Right, I was not going to use ranges for this because I suspected that
more often than not the range would be unknown.  But if that's the
suggestion, would something like the following (only very mildly tested)
function be OK?

I kept the type check because it looks quite a bit cheaper and would
work also for conversions between integers with the same precision but
different signedness which we can safely fold_convert between as well.

-- 8< --

/* If T is an SSA_NAME that is the result of a simple type conversion
   statement from an integer type to another integer which is known to
   be able to represent the values the operand of conversion can hold,
   return the operand of that conversion, otherwise return T.  */

static tree
skip_a_safe_conversion_op (tree t)
{
  if (TREE_CODE (t) != SSA_NAME
      || SSA_NAME_IS_DEFAULT_DEF (t))
    return t;

  gimple *def = SSA_NAME_DEF_STMT (t);
  if (!is_gimple_assign (def)
      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))
      || !INTEGRAL_TYPE_P (TREE_TYPE (t))
      || !INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (def))))
    return t;

  tree rhs1 = gimple_assign_rhs1 (def);
  if (TYPE_PRECISION (TREE_TYPE (t))
      >= TYPE_PRECISION (TREE_TYPE (rhs1)))
    return gimple_assign_rhs1 (def);

  if (!ipa_vr_supported_type_p (TREE_TYPE (rhs1)))
    return t;
  value_range vr (TREE_TYPE (rhs1));
  if (!get_range_query (cfun)->range_of_expr (vr, rhs1, def)
      || vr.undefined_p ())
    return t;

  widest_int new_minv = wi::to_widest (TYPE_MIN_VALUE (TREE_TYPE (t)));
  widest_int new_maxv = wi::to_widest (TYPE_MAX_VALUE (TREE_TYPE (t)));
  irange &ir = as_a (vr);
  if (new_minv <= widest_int::from (ir.lower_bound (),
                                    TYPE_SIGN (TREE_TYPE (rhs1)))
      && new_maxv >= widest_int::from (ir.upper_bound (),
                                       TYPE_SIGN (TREE_TYPE (rhs1))))
    return gimple_assign_rhs1 (def);

  return t;
}

-- 8< --

Note that while the value-range based approach also works for the
motivating test-case, that is simply because lower_bound() and
upper_bound() methods invoked on a VARYING value range of an integer
type yield the minimum and maximum value respectively.  This happens
because get_range_query (cfun)->range_of_expr invoked to find the range
of _1 at the conversion statement in the function below indeed gives
back a VARYING one.

__attribute__((noinline))
int foo (int (*) (int) f)
{
  _Bool _1;
  int _2;
  int _6;

   [local count: 1073741824]:
  _1 = f_3(D) == 0B;
  _2 = (int) _1;
  _6 = bar (_2);
  return _6;

}

On the other hand, when I added a simple fprintf before the return in
skip_a_safe_conversion_op that is taken when the value range, and not
simply the precision, shows the conversion to be safe, and then ran make
stage2-bubble (C, C++ and Fortran), it triggered 269 times.  So there is
some potential, even if it is not huge.

I think that the motivating test case is a useful one and so I'd like to
get some form of the patch in for GCC 15.  I'll be grateful for any
feedback on how to determine whether a conversion is safe.

Martin
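
Richard's criterion (a conversion is fine when the operand's known range fits
in the destination type) reduces to the bound comparison at the end of
skip_a_safe_conversion_op. A toy version follows; conversion_preserves_range is
a hypothetical name, and std::intmax_t stands in for widest_int, so it only
accepts destination types whose values fit in intmax_t.

```cpp
#include <cstdint>
#include <limits>

// True if every value in the operand range [lo, hi] is representable in
// the destination type To, i.e. the conversion cannot change a value.
template <typename To>
constexpr bool
conversion_preserves_range (std::intmax_t lo, std::intmax_t hi)
{
  static_assert (std::numeric_limits<To>::is_integer
                 && (std::numeric_limits<To>::digits
                     <= std::numeric_limits<std::intmax_t>::digits),
                 "destination type must fit in intmax_t");
  constexpr std::intmax_t to_min = std::numeric_limits<To>::min ();
  constexpr std::intmax_t to_max = std::numeric_limits<To>::max ();
  return to_min <= lo && hi <= to_max;
}
```

With Richard's examples, an int in [3, 10] converts safely to unsigned char,
while an int in [-1, 1] does not, because -1 would become 0xff.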


Re: [PATCH] libstdc++/ranges: make _RangeAdaptorClosure befriend operator|

2024-11-27 Thread Jonathan Wakely
On Wed, 27 Nov 2024 at 15:43, Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

OK

>
> -- >8 --
>
> This declares the range adaptor pipe operators friends of the
> _RangeAdaptorClosure base class so that the std module doesn't need to
> export them for ADL to find them.
>
> Note that we deliberately don't define these pipe operators as hidden
> friends, see r14-3293-g4a6f3676e7dd9e.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (views::__adaptor::_RangeAdaptorClosure):
> Befriend both operator| overloads.
> ---
>  libstdc++-v3/include/std/ranges | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index 5153dcc26c4..9d30e3a8e9d 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -949,8 +949,7 @@ namespace views::__adaptor
>// _S_has_simple_call_op to true if the behavior of this adaptor is
>// independent of the constness/value category of the adaptor object.
>template
> -struct _RangeAdaptorClosure
> -{ };
> +struct _RangeAdaptorClosure;
>
>template
>  requires (!same_as<_Tp, _RangeAdaptorClosure<_Up>>)
> @@ -984,6 +983,26 @@ namespace views::__adaptor
>  }
>  #pragma GCC diagnostic pop
>
> +  template
> +struct _RangeAdaptorClosure
> +{
> +  // In non-modules compilation ADL finds these operator| either way and
> +  // the friend declarations are redundant.  But with the std module these
> +  // friend declarations enable ADL to find these operators without having
> +  // to export them.
> +  template
> +   requires __is_range_adaptor_closure<_Self>
> + && __adaptor_invocable<_Self, _Range>
> +   friend constexpr auto
> +   operator|(_Range&& __r, _Self&& __self);
> +
> +  template
> +   requires __is_range_adaptor_closure<_Lhs>
> + && __is_range_adaptor_closure<_Rhs>
> +   friend constexpr auto
> +   operator|(_Lhs&& __lhs, _Rhs&& __rhs);
> +};
> +
>// The base class of every range adaptor non-closure.
>//
>// The static data member _Derived::_S_arity must contain the total number of
> --
> 2.47.1.313.gcc01bad4a9
>
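
A stripped-down sketch of the pattern in this patch: a namespace-scope
operator| that a base class template also declares as a friend. Everything in
namespace demo is hypothetical, and the real libstdc++ operators are
constrained; this only shows that the friend redeclaration names the same
function template, so ordinary (non-modules) ADL behaves the same with or
without it.

```cpp
#include <utility>

namespace demo
{
  template <typename Derived> struct closure;

  // Namespace-scope pipe operator, a simplified analogue of the
  // range adaptor closure operator| (no constraints here).
  template <typename Arg, typename Derived>
  constexpr auto
  operator| (Arg &&arg, const closure<Derived> &c)
  {
    return static_cast<const Derived &> (c) (std::forward<Arg> (arg));
  }

  template <typename Derived>
  struct closure
  {
    // Redeclares the operator| above as a friend.  Without modules, ADL
    // already finds it through namespace demo; per the patch description,
    // in std-module builds this is what lets ADL find it unexported.
    template <typename Arg, typename D>
    friend constexpr auto
    operator| (Arg &&arg, const closure<D> &c);
  };

  // A trivial adaptor closure object.
  struct twice : closure<twice>
  {
    constexpr int operator() (int x) const { return 2 * x; }
  };
}
```

Usage: 5 | demo::twice{} deduces Derived as demo::twice through the base class
and calls twice::operator(), giving 10.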



[PATCH]middle-end: rework vectorizable_store to iterate over single index [PR117557]

2024-11-27 Thread Tamar Christina
Hi All,

The testcase

#include 
#include 

#define N 8
#define L 8

void f(const uint8_t * restrict seq1,
   const uint8_t *idx, uint8_t *seq_out) {
  for (int i = 0; i < L; ++i) {
uint8_t h = idx[i];
memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
  }
}

compiled at -O3 -mcpu=neoverse-n1+sve

miscompiles to:

ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
st1w    z29.s, p7, [x24, z12.s, sxtw]
st1w    z31.s, p7, [x24, z12.s, sxtw]

rather than

ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
st1w    z29.s, p7, [x24, z12.s, sxtw]
addvl   x3, x24, #2
st1w    z31.s, p3, [x3, z12.s, sxtw]

Two things go wrong here: the wrong mask is used, and the addresses used by
the stores are wrong.

This issue is happening because the codegen loop in vectorizable_store is a
nested loop where in the outer loop we iterate over ncopies and in the inner
loop we loop over vec_num.

For SLP, ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop mask is
determined only by the outer-loop index, and the pointer address is only
updated in the outer loop.

As such, for SLP we always use the same predicate and the same memory location.
This patch flattens the two loops, iterating over ncopies * vec_num instead,
and simplifies the indexing.
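
To make the indexing problem concrete, here is a toy model (not GCC code;
masks_nested and masks_flat are hypothetical) that records which mask index
each emitted store would use under the old nested loops and under the
flattened loop.

```cpp
#include <vector>

// Old shape: the mask is chosen by the outer index J only, so every
// store emitted by the inner loop reuses the same mask.
std::vector<int>
masks_nested (int ncopies, int vec_num)
{
  std::vector<int> used;
  for (int j = 0; j < ncopies; j++)
    for (int i = 0; i < vec_num; i++)
      used.push_back (j);       // Ignores I: the bug described above.
  return used;
}

// New shape: a single loop over ncopies * vec_num statements, so the
// one index J selects both the mask and the store.
std::vector<int>
masks_flat (int ncopies, int vec_num)
{
  std::vector<int> used;
  int num_stmts = ncopies * vec_num;
  for (int j = 0; j < num_stmts; j++)
    used.push_back (j);
  return used;
}
```

For SLP with ncopies == 1 and vec_num == 2, the nested form uses mask 0 twice,
while the flattened form uses masks 0 and 1.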

This does not fully fix the gcc_r miscompile in SPEC CPU 2017, as the error
moves somewhere else.  I will look at that next, but the patch does fix some
other libraries that had also started failing.

Bootstrapped and regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
x86_64-pc-linux-gnu -m32, -m64 with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/117557
* tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and
vec_num loops.

gcc/testsuite/ChangeLog:

PR tree-optimization/117557
* gcc.target/aarch64/pr117557.c: New test.

---
diff --git a/gcc/testsuite/gcc.target/aarch64/pr117557.c b/gcc/testsuite/gcc.target/aarch64/pr117557.c
new file mode 100644
index 
..80b3fde41109988db70eafd715224df0b0029cd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr117557.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mcpu=neoverse-n1+sve -fdump-tree-vect" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include 
+#include 
+
+#define N 8
+#define L 8
+
+/*
+**f:
+** ...
+** ld1wz[0-9]+.s, p([0-9]+)/z, \[x[0-9]+, z[0-9]+.s, sxtw\]
+** ld1wz[0-9]+.s, p([0-9]+)/z, \[x[0-9]+, z[0-9]+.s, sxtw\]
+** st1wz[0-9]+.s, p\1, \[x[0-9]+, z[0-9]+.s, sxtw\]
+** incbx([0-9]+), all, mul #2
+** st1wz[0-9]+.s, p\2, \[x\3, z[0-9]+.s, sxtw\]
+** ret
+** ...
+*/
+void f(const uint8_t * restrict seq1,
+   const uint8_t *idx, uint8_t *seq_out) {
+  for (int i = 0; i < L; ++i) {
+uint8_t h = idx[i];
+memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
+  }
+}
+
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c2d5818b2786123fac7afe290d85c7dd2bda4308..4759c274f3ccbb111a907576539b2a8efb7726a3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9228,7 +9228,8 @@ vectorizable_store (vec_info *vinfo,
   gcc_assert (!grouped_store);
   auto_vec vec_offsets;
   unsigned int inside_cost = 0, prologue_cost = 0;
-  for (j = 0; j < ncopies; j++)
+  int num_stmts = ncopies * vec_num;
+  for (j = 0; j < num_stmts; j++)
{
  gimple *new_stmt;
  if (j == 0)
@@ -9246,14 +9247,14 @@ vectorizable_store (vec_info *vinfo,
vect_get_slp_defs (op_node, gvec_oprnds[0]);
  else
vect_get_vec_defs_for_operand (vinfo, first_stmt_info,
-  ncopies, op, gvec_oprnds[0]);
+  num_stmts, op, gvec_oprnds[0]);
  if (mask)
{
  if (slp_node)
vect_get_slp_defs (mask_node, &vec_masks);
  else
vect_get_vec_defs_for_operand (vinfo, stmt_info,
-  ncopies,
+  num_stmts,
   mask, &vec_masks,
   mask_vectype);
}
@@ -9279,281 +9280,280 @@ vectorizable_store (vec_info *vinfo,
}
 
  new_stmt = NULL;
- for (i = 0; i < vec_num; ++i)
+ if (!costing_p)
{
- if (!costing_p)
-   {
- vec_oprnd = (*gvec_oprnds[0])[vec_num * j + i];
- if (mask)
-   vec_mask = vec_masks[vec_num * j + i];
- /* We should have catched mism

Re: [PATCH] c: Fix sizeof error recovery [PR117745]

2024-11-27 Thread Marek Polacek
On Wed, Nov 27, 2024 at 10:12:25AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> Compilation of the following testcase hangs forever after emitting first
> error.  The problem is that in one place we just return error_mark_node
> directly rather than going through c_expr_sizeof_expr or c_expr_sizeof_type.
> The parsing of the expression could have called record_maybe_used_decl
> though, but nothing calls pop_maybe_used which needs to be called after
> parsing of every sizeof/typeof, successful or not.

Ah, I see.

> At the end of the toplevel declaration we free the parser_obstack, and when
> record_maybe_used_decl is later called again in another function, the
> missing pop_maybe_used leaves us with a cycle in the chain.
> 
> The following patch fixes it by just setting the error and jumping to the
> sizeof_expr label:
>   c_inhibit_evaluation_warnings--;
>   in_sizeof--;
>   mark_exp_read (expr.value);
>   if (TREE_CODE (expr.value) == COMPONENT_REF
>   && DECL_C_BIT_FIELD (TREE_OPERAND (expr.value, 1)))
> error_at (expr_loc, "% applied to a bit-field");
>   result = c_expr_sizeof_expr (expr_loc, expr);
> where c_expr_sizeof_expr will do:
>   struct c_expr ret;
>   if (expr.value == error_mark_node)
> {
>   ret.value = error_mark_node;
>   ret.original_code = ERROR_MARK;
>   ret.original_type = NULL;
>   ret.m_decimal = 0;
>   pop_maybe_used (false);
> }
> ...
>   return ret;
> which is exactly what the old code did manually except for the missing
> pop_maybe_used call.  mark_exp_read does nothing on error_mark_node and
> error_mark_node doesn't have COMPONENT_REF tree_code.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-11-27  Jakub Jelinek  
> 
>   PR c/117745
>   * c-parser.cc (c_parser_sizeof_expression): If type_name is NULL,
>   just expr.set_error () and goto sizeof_expr instead of doing error
>   recovery manually.
> 
>   * gcc.dg/pr117745.c: New test.
> 
> --- gcc/c/c-parser.cc.jj  2024-11-23 13:00:28.328028242 +0100
> +++ gcc/c/c-parser.cc 2024-11-26 11:59:18.036410746 +0100
> @@ -10405,13 +10405,8 @@ c_parser_sizeof_expression (c_parser *pa
>finish = parser->tokens_buf[0].location;
>if (type_name == NULL)
>   {
> -   struct c_expr ret;
> -   c_inhibit_evaluation_warnings--;
> -   in_sizeof--;
> -   ret.set_error ();
> -   ret.original_code = ERROR_MARK;
> -   ret.original_type = NULL;
> -   return ret;
> +   expr.set_error ();
> +   goto sizeof_expr;

Patch is OK, though I'd appreciate a comment explaining why we bother to
call c_expr_sizeof_expr.  Maybe

  /* Let c_expr_sizeof_expr call pop_maybe_used; the parsing of the
 expression could have called record_maybe_used_decl.  */

or something like that.  c_expr_sizeof_type doesn't do it, so that wouldn't
do.

>   }
>if (c_parser_next_token_is (parser, CPP_OPEN_BRACE))
>   {
> --- gcc/testsuite/gcc.dg/pr117745.c.jj 2024-11-26 12:07:11.120756946 +0100
> +++ gcc/testsuite/gcc.dg/pr117745.c   2024-11-26 12:08:00.031068850 +0100
> @@ -0,0 +1,8 @@
> +/* PR c/117745 */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +static int foo (void);
> +void bar (void) { sizeof (int [0 ? foo () : 1); }/* { dg-error "expected" } */
> +static int baz (void);
> +void qux (void) { sizeof (sizeof (int[baz ()])); }
> 
>   Jakub
> 

Marek
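
The control-flow point of Jakub's fix can be shown with a toy push/pop stack:
every sizeof path, including error recovery, must reach the code that pops.
All names here are hypothetical, with finish_sizeof playing the role of
c_expr_sizeof_expr and the goto mirroring the jump to sizeof_expr.

```cpp
#include <vector>

// Toy model of the maybe-used bookkeeping: each sizeof pushes a level
// that must be popped exactly once, successful parse or not.
struct maybe_used_stack
{
  std::vector<int> levels;
  void push () { levels.push_back (0); }
  void pop () { levels.pop_back (); }
};

// Shared tail: pops on both the error and the success path.
int
finish_sizeof (maybe_used_stack &s, bool error, int size)
{
  s.pop ();
  return error ? -1 : size;
}

// The error path jumps to the shared tail instead of returning early,
// so push and pop stay balanced.
int
parse_sizeof (maybe_used_stack &s, bool type_name_missing)
{
  s.push ();
  bool error = false;
  int size = 4;                 // Placeholder for the computed size.
  if (type_name_missing)
    {
      error = true;
      goto sizeof_expr;
    }
  // ... parse the operand and compute SIZE here ...
 sizeof_expr:
  return finish_sizeof (s, error, size);
}
```

An early "return" in the error branch would leave a stale level on the stack,
which is the toy analogue of the missing pop_maybe_used call.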



RE: [PATCH v1 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx

2024-11-27 Thread Li, Pan2
I see, thanks Robin. I will give this change a try.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Wednesday, November 27, 2024 9:44 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp
Subject: Re: [PATCH v1 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx

> I see, I wasn't aware of that. I am not sure whether we need to consider
> vsetvl here, as there are two extra insns.

I wouldn't consider it as it's outside of the loop.  What matters is latency
inside the loop.

> I see, I need to consider the cost here. Is there an example I can reference?
> Sorry, I haven't touched the cost model before.

Refer to riscv_rtx_costs.  We set all vector instruction costs to 1. What you
would need to do is pattern-match scalar (broadcast) operands inside our
IF_THEN_ELSE vector patterns.  There are some scalar examples that show how
it's done in principle.

Then make use of either
  get_vector_costs ()->regmove->GR2VR;
or riscv_register_move_cost to increase the pattern cost accordingly.

-- 
Regards
 Robin



RE: [RFC] PR81358: Enable automatic linking of libatomic

2024-11-27 Thread Joseph Myers
On Tue, 19 Nov 2024, Prathamesh Kulkarni wrote:

> +#ifdef USE_LD_AS_NEEDED
> +#define LINK_LIBATOMIC_SPEC "%{!fno-link-libatomic:" LD_AS_NEEDED_OPTION \
> + " -latomic " LD_NO_AS_NEEDED_OPTION "} "
> +#else
> +#define LINK_LIBATOMIC_SPEC ""
> +#endif

I'd expect conditionals to be set up so that, if libatomic is not built 
(typically because an unsupported target OS resulted in UNSUPPORTED=1 
being set in libatomic/configure.tgt), no attempt is ever made to link it 
in.  (So in that case, users might get undefined references to __atomic_* 
and it would be their responsibility to provide a board support package 
that links with appropriate definitions of those symbols.)
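
One way to express the conditional Joseph describes is to guard the spec on a
second macro as well. In this sketch DEMO_LIBATOMIC_BUILT is a hypothetical
macro, not an existing GCC configure macro, and the LD_AS_NEEDED_OPTION and
LD_NO_AS_NEEDED_OPTION values are illustrative only; the real ones come from
configure and depend on the linker.

```cpp
#include <string>

// Illustrative stand-ins for configure-time values.
#define USE_LD_AS_NEEDED 1
#define LD_AS_NEEDED_OPTION "--push-state --as-needed"
#define LD_NO_AS_NEEDED_OPTION "--pop-state"
// Hypothetical: set to 1 only when libatomic is built for this target.
#define DEMO_LIBATOMIC_BUILT 1

// Ask the driver to link libatomic only when the linker supports
// as-needed linking and libatomic actually exists for the target.
#if defined (USE_LD_AS_NEEDED) && DEMO_LIBATOMIC_BUILT
#define LINK_LIBATOMIC_SPEC "%{!fno-link-libatomic:" LD_AS_NEEDED_OPTION \
  " -latomic " LD_NO_AS_NEEDED_OPTION "} "
#else
#define LINK_LIBATOMIC_SPEC ""
#endif

// The spec string the driver would see, after literal concatenation.
const std::string link_libatomic_spec = LINK_LIBATOMIC_SPEC;
```

With DEMO_LIBATOMIC_BUILT set to 0 the spec is empty, so no link attempt is
made and providing __atomic_* definitions is left to the user.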

> diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am

> +AM_CFLAGS = $(XCFLAGS) -fno-link-libatomic
> +AM_CCASFLAGS = $(XCFLAGS) -fno-link-libatomic
> +AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS) -fno-link-libatomic

> diff --git a/libatomic/configure.ac b/libatomic/configure.ac

> +CFLAGS="$CFLAGS -fno-link-libatomic"

> +XCFLAGS="$XCFLAGS $XPCFLAGS -fno-link-libatomic"

I don't see any clear conceptual design here for where this flag should 
go.  It should only need to be added in one place, not three times.  
Adding to CFLAGS before the default is set in configure, and before 
save_CFLAGS is set, seems especially dubious, though maybe you avoid 
problems with losing the default CFLAGS setting if libatomic is always 
configured with CFLAGS set by the toplevel Makefile.

My expectation is that CFLAGS should not be modified until after 
save_CFLAGS is set, which should not be until after configure has executed 
the logic that sets a -g -O2 default.  Is there some problem with that 
ordering (e.g. configure tests that expect to link target programs but run 
as part of the same Autoconf macro invocation that also generates the 
logic to determine default values)?  Also, the comment on save_CFLAGS 
says:

# In order to override CFLAGS_FOR_TARGET, all of our special flags go
# in XCFLAGS.  But we need them in CFLAGS during configury.  So put them
# in both places for now and restore CFLAGS at the end of config.

So if the option is set in CFLAGS itself during configure, that should be 
after save_CFLAGS is set, meaning only the setting in XCFLAGS is relevant 
for actually building libatomic.

Also, the new command-line option should be documented in invoke.texi.

-- 
Joseph S. Myers
josmy...@redhat.com



[pushed: r15-5739] c-family: offer suggestions for missing command-line options [PR82892]

2024-11-27 Thread David Malcolm
Some builtin macros are only defined when certain command-line options
are provided.  Update the error messages for them so that we suggest
the pertinent option ('-fopenacc' for '_OPENACC', and '-fopenmp' for
'_OPENMP').
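
The lookup in get_option_for_builtin_define is a small name-to-option mapping.
Below is a table-driven sketch of the same idea; it returns option spellings
instead of diagnostic_option_id, and option_for_builtin_define and
missing_option_hint are hypothetical names that only approximate the new
inform () text.

```cpp
#include <cstring>
#include <string>

// Map a builtin macro name to the option that defines it, or nullptr.
const char *
option_for_builtin_define (const char *macro_name)
{
  static const struct { const char *macro; const char *option; } table[] = {
    { "_OPENACC", "-fopenacc" },
    { "_OPENMP", "-fopenmp" },
  };
  for (const auto &entry : table)
    if (std::strcmp (entry.macro, macro_name) == 0)
      return entry.option;
  return nullptr;               // No known option defines this macro.
}

// Build a hint in the style of the new note; empty if no option applies.
std::string
missing_option_hint (const char *macro_name)
{
  const char *opt = option_for_builtin_define (macro_name);
  if (!opt)
    return "";
  return std::string ("'") + macro_name + "' is defined when using option '"
         + opt + "'; this is probably fixable by adding '" + opt
         + "' to the command-line options";
}
```

Adding another macro/option pair would then be a one-line table change rather
than a new strcmp branch.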

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-5739-g5341eb669658c7.

gcc/c-family/ChangeLog:
PR c/82892
* c-common.h (get_option_for_builtin_define): New decl.
* c-cppbuiltin.cc (get_option_for_builtin_define): New.
* known-headers.cc: Include "opts.h".
(suggest_missing_option::suggest_missing_option): New.
(suggest_missing_option::~suggest_missing_option): New.
* known-headers.h (class suggest_missing_option): New.

gcc/c/ChangeLog:
PR c/82892
* c-decl.cc (lookup_name_fuzzy): Provide hints for missing
command-line options.

gcc/cp/ChangeLog:
PR c/82892
* name-lookup.cc (suggest_alternatives_for_1): Provide hints for
missing command-line options.

gcc/testsuite/ChangeLog:
PR c/82892
* c-c++-common/spellcheck-missing-option.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/c-family/c-common.h   |  1 +
 gcc/c-family/c-cppbuiltin.cc  | 13 +
 gcc/c-family/known-headers.cc | 28 +++
 gcc/c-family/known-headers.h  | 15 ++
 gcc/c/c-decl.cc   | 11 
 gcc/cp/name-lookup.cc | 11 
 .../c-c++-common/spellcheck-missing-option.c  | 15 ++
 7 files changed, 94 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/spellcheck-missing-option.c

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index a323982ae26b..7834e0d19590 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1260,6 +1260,7 @@ extern void c_stddef_cpp_builtins (void);
 extern void fe_file_change (const line_map_ordinary *);
 extern void c_parse_error (const char *, enum cpp_ttype, tree, unsigned char,
   rich_location *richloc);
+extern diagnostic_option_id get_option_for_builtin_define (const char *macro_name);
 
 /* In c-ppoutput.cc  */
 extern void init_pp_output (FILE *);
diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index 8fbfef561e8a..c354c794b55e 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1673,6 +1673,19 @@ c_cpp_builtins (cpp_reader *pfile)
 cpp_define (pfile, "__DECIMAL_BID_FORMAT__");
 }
 
+/* Given NAME, return the command-line option that would make it be
+   a builtin define, or 0 if unrecognized.  */
+
+diagnostic_option_id
+get_option_for_builtin_define (const char *name)
+{
+  if (!strcmp (name, "_OPENACC"))
+return OPT_fopenacc;
+  if (!strcmp (name, "_OPENMP"))
+return OPT_fopenmp;
+  return 0;
+}
+
 /* Pass an object-like macro.  If it doesn't lie in the user's
namespace, defines it unconditionally.  Otherwise define a version
with two leading underscores, and another version with two leading
diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
index 58a7259e8c49..2077cab6c098 100644
--- a/gcc/c-family/known-headers.cc
+++ b/gcc/c-family/known-headers.cc
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/name-hint.h"
 #include "c-family/known-headers.h"
 #include "gcc-rich-location.h"
+#include "opts.h"
 
 /* An enum for distinguishing between the C and C++ stdlibs.  */
 
@@ -323,3 +324,30 @@ suggest_missing_header::~suggest_missing_header ()
  " this is probably fixable by adding %<#include %s%>",
  m_name_str, m_header_hint, m_header_hint);
 }
+
+/* Implementation of class suggest_missing_option.  */
+
+/* suggest_missing_option's ctor.  */
+
+suggest_missing_option::suggest_missing_option (location_t loc,
+   const char *macro_name,
+   diagnostic_option_id option_id)
+: deferred_diagnostic (loc), m_name_str (macro_name), m_option_id (option_id)
+{
+  gcc_assert (macro_name);
+  gcc_assert (option_id.m_idx > 0);
+}
+
+/* suggest_missing_option's dtor.  */
+
+suggest_missing_option::~suggest_missing_option ()
+{
+  if (is_suppressed_p ())
+return;
+
+  const char *option_name = cl_options[m_option_id.m_idx].opt_text;
+  inform (get_location (),
+ "%qs is defined when using option %qs;"
+ " this is probably fixable by adding %qs to the command-line options",
+ m_name_str, option_name, option_name);
+}
diff --git a/gcc/c-family/known-headers.h b/gcc/c-family/known-headers.h
index 7c7ee781e739..a6cbbd2d86b2 100644
--- a/gcc/c-family/known-headers.h
+++ b/gcc/c-family/known-headers.h
@@ -41,4 +41,19 @@ class suggest_missing_header : public deferred_diagnostic
   const char *m_header_hint;
 };
 
+/* Subclass of deferred_diagnostic for suggesting to the user
+ 

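The new hook maps a builtin macro name to the option that defines it, and suggest_missing_option turns that mapping into a note. The C++ sketch below shows the same lookup-then-suggest flow. The table, struct, and function names are hypothetical, and it uses plain strings where GCC uses diagnostic_option_id and cl_options[]. Only the two macros the patch handles (_OPENACC, _OPENMP) are listed.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the mapping; GCC returns a
// diagnostic_option_id and looks the spelling up in cl_options[].
struct builtin_define_option
{
  const char *macro;
  const char *option;
};

static const builtin_define_option builtin_define_table[] = {
  {"_OPENACC", "-fopenacc"},
  {"_OPENMP", "-fopenmp"},
};

// Return the option that defines NAME, or nullptr if unrecognized.
const char *
option_for_builtin_define (const char *name)
{
  for (const auto &entry : builtin_define_table)
    if (std::strcmp (name, entry.macro) == 0)
      return entry.option;
  return nullptr;
}

// Build the note text, or an empty string when there is nothing to suggest.
std::string
missing_option_note (const char *macro)
{
  const char *opt = option_for_builtin_define (macro);
  if (!opt)
    return "";
  return std::string ("'") + macro + "' is defined when using option '"
         + opt + "'; this is probably fixable by adding '" + opt
         + "' to the command-line options";
}
```

As in the patch, an unrecognized name produces no suggestion, so callers can fall back to other hints.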
[PATCH 1/2] RISC-V: Add intrinsics support for SiFive Xsfvqmaccqoq/dod extensions.

2024-11-27 Thread shiyulong
From: yulong 

This commit adds intrinsics support for Xsfvqmaccqoq/dod.

Co-Authored by: Kito Cheng 
Co-Authored by: Monk Chiang 
Co-Authored by: Jiawei Chen 
Co-Authored by: Shihua Liao 
Co-Authored by: Yixuan Chen 

gcc/ChangeLog:

* config.gcc: Add new SiFive *.o files.
* config/riscv/generic-vector-ooo.md: New reservation.
* config/riscv/genrvv-type-indexer.cc (main): New type.
* config/riscv/riscv-vector-builtins-shapes.cc (struct sf_vqmacc_def): New function.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_QMACC_OPS): New macros type.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_QMACC_OPS): New builtins def.
(DEF_RVV_TYPE_INDEX): Ditto.
(DEF_RVV_FUNCTION): Ditto.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX): New types def.
(4x8x4): New op type.
(2x8x2): Ditto.
(quad_emul_vector): New base type.
(quad_emul_signed_vector): Ditto.
(quad_emul_unsigned_vector): Ditto.
(quad_fixed_vector): Ditto.
(quad_fixed_signed_vector): Ditto.
(quad_fixed_unsigned_vector): Ditto.
(quad_lmul1_vector): Ditto.
(quad_lmul1_signed_vector): Ditto.
(quad_lmul1_unsigned_vector): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): New extensions.
(required_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Ditto.
* config/riscv/riscv.md: New attr.
* config/riscv/t-riscv: Add include for SiFive files.
* config/riscv/vector-iterators.md: New iterator.
* config/riscv/vector.md: New include for SiFive file.
* config/riscv/sifive-vector-builtins-bases.cc: New file.
* config/riscv/sifive-vector-builtins-bases.h: New file.
* config/riscv/sifive-vector-builtins-functions.def: New file.
* config/riscv/sifive-vector.md: New file.

---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/generic-vector-ooo.md|   2 +-
 gcc/config/riscv/genrvv-type-indexer.cc   |  47 +
 .../riscv/riscv-vector-builtins-shapes.cc |  30 +++
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +
 .../riscv/riscv-vector-builtins-types.def |  12 ++
 gcc/config/riscv/riscv-vector-builtins.cc | 151 ++-
 gcc/config/riscv/riscv-vector-builtins.def|  26 ++-
 gcc/config/riscv/riscv-vector-builtins.h  |  14 ++
 gcc/config/riscv/riscv.md |   4 +-
 .../riscv/sifive-vector-builtins-bases.cc | 164 
 .../riscv/sifive-vector-builtins-bases.h  |  35 
 .../sifive-vector-builtins-functions.def  |  54 ++
 gcc/config/riscv/sifive-vector.md | 179 ++
 gcc/config/riscv/t-riscv  |  20 ++
 gcc/config/riscv/vector-iterators.md  |  33 
 gcc/config/riscv/vector.md|   1 +
 17 files changed, 757 insertions(+), 19 deletions(-)
 create mode 100644 gcc/config/riscv/sifive-vector-builtins-bases.cc
 create mode 100644 gcc/config/riscv/sifive-vector-builtins-bases.h
 create mode 100644 gcc/config/riscv/sifive-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/sifive-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 12018d2193c..afa78453197 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -552,7 +552,7 @@ riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
-   extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
+   extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o sifive-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
extra_headers="riscv_vector.h riscv_crypto.h riscv_bitmanip.h riscv_th_vector.h riscv_cmo.h"
diff --git a/gcc/config/riscv/generic-vector-ooo.md b/gcc/config/riscv/generic-vector-ooo.md
index efe6bc41e86..132ab039822 100644
--- a/gcc/config/riscv/generic-vector-ooo.md
+++ b/gcc/config/riscv/generic-vector-ooo.md
@@ -69,7 +69,7 @@
 
 ;; Vector float multiplication and FMA.
 (define_insn_reservation "vec_fmul" 6
-  (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd,vfwmaccbf16")
+  (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd,vfwmaccbf16,sf_vqmacc")
   "vxu_ooo_issue,vxu_ooo_alu")
 
 ;; Vector crypto, assumed to be a generic operation for now.
diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config

[pushed: r15-5740] analyzer, timevar: avoid naked "new" in JSON-handling

2024-11-27 Thread David Malcolm
Now that  is always included, use std::unique_ptr in a few more
places to avoid naked "new".

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-5740-g066f309db6a545.
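The pattern applied throughout this patch is to build each JSON node in a std::unique_ptr and hand ownership to its parent with std::move, so no raw owning pointer is ever live. Below is a self-contained C++ sketch of that pattern. It uses hypothetical minimal value/array/object types rather than GCC's json.h classes, and std::make_unique (C++14) where GCC uses its own ::make_unique from make-unique.h.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal JSON node types (not GCC's json.h).
struct value
{
  virtual ~value () = default;
};

struct integer_number : value
{
  explicit integer_number (long v) : m_v (v) {}
  long m_v;
};

struct array : value
{
  // The array owns its elements.
  std::vector<std::unique_ptr<value>> m_elts;
  void append (std::unique_ptr<value> v) { m_elts.push_back (std::move (v)); }
};

struct object : value
{
  // The object owns its member values.
  std::vector<std::pair<std::string, std::unique_ptr<value>>> m_members;
  void set (const std::string &key, std::unique_ptr<value> v)
  {
    m_members.emplace_back (key, std::move (v));
  }
};

// Mirrors the shape of infinite_loop::to_json: no naked "new", and
// ownership of EDGE_ARR moves into LOOP_OBJ.
std::unique_ptr<object>
make_loop_json (int enode, const std::vector<int> &eedges)
{
  auto loop_obj = std::make_unique<object> ();
  loop_obj->set ("enode", std::make_unique<integer_number> (enode));
  auto edge_arr = std::make_unique<array> ();
  for (int e : eedges)
    edge_arr->append (std::make_unique<integer_number> (e));
  loop_obj->set ("eedges", std::move (edge_arr));
  return loop_obj;
}
```

Returning std::unique_ptr<object> also documents at the call site that the caller takes ownership. A raw json::object * return type left that implicit.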

gcc/analyzer/ChangeLog:
* engine.cc (strongly_connected_components::to_json): Avoid naked
"new".
* infinite-loop.cc (infinite_loop::to_json): Convert return type
to unique_ptr.  Avoid naked "new".
* sm-signal.cc (signal_delivery_edge_info_t::to_json): Delete
unused function.
* supergraph.cc (supernode::to_json): Avoid naked "new".

gcc/ChangeLog:
* timevar.cc: Include "make-unique.h".
(timer::named_items::make_json): Convert return type to unique_ptr.
Avoid naked "new".
(make_json_for_timevar_time_def): Likewise.
(timer::timevar_def::make_json): Likewise.
(timer::make_json): Likewise.
* timevar.h (timer::make_json): Likewise.
(timer::timevar_def::make_json): Likewise.
* tree-diagnostic-client-data-hooks.cc: Update for above changes.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc   |  2 +-
 gcc/analyzer/infinite-loop.cc|  8 +++---
 gcc/analyzer/sm-signal.cc|  6 -
 gcc/analyzer/supergraph.cc   |  8 +++---
 gcc/timevar.cc   | 31 
 gcc/timevar.h|  4 +--
 gcc/tree-diagnostic-client-data-hooks.cc |  4 +--
 7 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index d19eb7a70e61..373182158121 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -2422,7 +2422,7 @@ strongly_connected_components::to_json () const
 {
   auto scc_arr = ::make_unique ();
   for (int i = 0; i < m_sg.num_nodes (); i++)
-scc_arr->append (new json::integer_number (get_scc_id (i)));
+scc_arr->append (::make_unique (get_scc_id (i)));
   return scc_arr;
 }
 
diff --git a/gcc/analyzer/infinite-loop.cc b/gcc/analyzer/infinite-loop.cc
index f1a60e8d65a4..14ceba7e5ac1 100644
--- a/gcc/analyzer/infinite-loop.cc
+++ b/gcc/analyzer/infinite-loop.cc
@@ -105,15 +105,15 @@ struct infinite_loop
&& m_loc == other.m_loc);
   }
 
-  json::object *
+  std::unique_ptr
   to_json () const
   {
-json::object *loop_obj = new json::object ();
+auto loop_obj = ::make_unique ();
 loop_obj->set_integer ("enode", m_enode.m_index);
-json::array *edge_arr = new json::array ();
+auto edge_arr = ::make_unique ();
 for (auto eedge : m_eedge_vec)
   edge_arr->append (eedge->to_json ());
-loop_obj->set ("eedges", edge_arr);
+loop_obj->set ("eedges", std::move (edge_arr));
 return loop_obj;
   }
 
diff --git a/gcc/analyzer/sm-signal.cc b/gcc/analyzer/sm-signal.cc
index 8adaa6f0e23b..3c1da5d64ed1 100644
--- a/gcc/analyzer/sm-signal.cc
+++ b/gcc/analyzer/sm-signal.cc
@@ -220,12 +220,6 @@ public:
 pp_string (pp, "signal delivered");
   }
 
-  json::object *to_json () const
-  {
-json::object *custom_obj = new json::object ();
-return custom_obj;
-  }
-
   bool update_model (region_model *model,
 const exploded_edge *eedge,
 region_model_context *) const final override
diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
index ef6ab6329295..f2e3dc4ead16 100644
--- a/gcc/analyzer/supergraph.cc
+++ b/gcc/analyzer/supergraph.cc
@@ -737,7 +737,7 @@ supernode::to_json () const
 
   /* Phi nodes.  */
   {
-json::array *phi_arr = new json::array ();
+auto phi_arr = ::make_unique ();
 for (gphi_iterator gpi = const_cast (this)->start_phis ();
 !gsi_end_p (gpi); gsi_next (&gpi))
   {
@@ -747,12 +747,12 @@ supernode::to_json () const
pp_gimple_stmt_1 (&pp, stmt, 0, (dump_flags_t)0);
phi_arr->append_string (pp_formatted_text (&pp));
   }
-snode_obj->set ("phis", phi_arr);
+snode_obj->set ("phis", std::move (phi_arr));
   }
 
   /* Statements.  */
   {
-json::array *stmt_arr = new json::array ();
+auto stmt_arr = ::make_unique ();
 int i;
 gimple *stmt;
 FOR_EACH_VEC_ELT (m_stmts, i, stmt)
@@ -762,7 +762,7 @@ supernode::to_json () const
pp_gimple_stmt_1 (&pp, stmt, 0, (dump_flags_t)0);
stmt_arr->append_string (pp_formatted_text (&pp));
   }
-snode_obj->set ("stmts", stmt_arr);
+snode_obj->set ("stmts", std::move (stmt_arr));
   }
 
   return snode_obj;
diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index 29c0152c6158..48d0c72cbdfc 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "timevar.h"
 #include "options.h"
 #include "json.h"
+#include "make-unique.h"
 
 /* Non-NULL if timevars should be used.  In GCC, this happens with
the -ftime-report flag.  */
@@ -59,7 +60,7 @@ class timer::named_items
   void pop ();
  

[PATCH] AIX Build failure with default -std=gnu23.

2024-11-27 Thread Sangamesh Mallayya
libiberty/getopt.c defines _NO_PROTO, which makes AIX header files such as
stdio.h and stdlib.h declare functions without prototypes (for example
"extern int fgetpos();").  In C23 an empty parameter list means (void), so
these declarations conflict with the prototyped ones later in the headers
and are rejected as errors; with C17 they were accepted.

Here is the error we get.

/gcc_build/./prev-gcc/xgcc -B/gcc_build/./prev-gcc/ \
  -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/bin/ \
  -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/bin/ \
  -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/lib/ \
  -isystem /home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/include \
  -isystem /home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/sys-include \
  -fno-checking -c -DHAVE_CONFIG_H -g -O2 -fno-checking -I. \
  -I/opt/freeware/src/packages/BUILD/gcc/libiberty/../include \
  -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -Wshadow=local \
  -pedantic -D_GNU_SOURCE \
  /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c -o getopt.o


In file included from /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c:45:
/gcc_build/prev-gcc/include-fixed/stdio.h:593:12: error: conflicting types for 'fgetpos64'; have 'int(FILE *, fpos64_t *)' {aka 'int(FILE *, long long int *)'}
  593 | extern int fgetpos64(FILE *, fpos64_t *);
  |^
/gcc_build/prev-gcc/include-fixed/stdio.h:298:17: note: previous declaration of 'fgetpos64' with type 'int(void)'
  298 | extern int  fgetpos();
  | ^~~
/gcc_build/prev-gcc/include-fixed/stdio.h:594:14: error: conflicting types for 'fopen64'; have 'FILE *(const char *, const char *)'
  594 | extern FILE *fopen64(const char *, const char *);
  |  ^~~
/gcc_build/prev-gcc/include-fixed/stdio.h:259:17: note: previous declaration of 'fopen64' with type 'FILE *(void)'
  259 | extern FILE *   fopen();
  | ^
/gcc_build/prev-gcc/include-fixed/stdio.h:595:14: error: conflicting types for 'freopen64'; have 'FILE *(const char *, const char *, FILE *)'
  595 | extern FILE *freopen64(const char *, const char *, FILE *);
  |  ^
/gcc_build/prev-gcc/include-fixed/stdio.h:260:17: note: previous declaration of 'freopen64' with type 'FILE *(void)'
  260 | extern FILE *   freopen();
  | ^~~
/gcc_build/prev-gcc/include-fixed/stdio.h:597:12: error: conflicting types for 'fsetpos64'; have 'int(FILE *, const fpos64_t *)' {aka 'int(FILE *, const long long int *)'}
  597 | extern int fsetpos64(FILE *, const fpos64_t *);
  |^
/gcc_build/prev-gcc/include-fixed/stdio.h:300:17: note: previous declaration of 'fsetpos64' with type 'int(void)'
  300 | extern int  fsetpos();
  | ^~~
In file included from /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c:216:
/gcc_build/prev-gcc/include-fixed/stdlib.h: In function 'strtold':
/gcc_build/prev-gcc/include-fixed/stdlib.h:233:30: error: too many arguments to function 'strtod'


Compiled with this patch on RHEL8.10 ppc64le as well.

---
 libiberty/getopt.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/libiberty/getopt.c b/libiberty/getopt.c
index 2f7086cc0c8..48736d4db41 100644
--- a/libiberty/getopt.c
+++ b/libiberty/getopt.c
@@ -23,12 +23,6 @@
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301,
USA.  */
 
-/* This tells Alpha OSF/1 not to define a getopt prototype in .
-   Ditto for AIX 3.2 and .  */
-#ifndef _NO_PROTO
-# define _NO_PROTO
-#endif
-
 #ifdef HAVE_CONFIG_H
 # include 
 #endif
-- 
2.41.0



[pushed: r15-5737] diagnostics: replace %<%s%> with %qs [PR104896]

2024-11-27 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-5737-g9f06b910a840d8.
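In GCC's diagnostic format strings, %< and %> emit opening and closing quotes, and %qs is shorthand for a quoted %s. That is why the replacement has no functional effect. The toy C++ formatter below makes the equivalence concrete. It is hypothetical and far simpler than GCC's pretty-printer: ASCII quotes only, and a single string argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy formatter (hypothetical): handles %s, %<, %>, %qs and %% with a
// single string argument; quotes are rendered as ASCII apostrophes.
std::string
format_diag (const std::string &fmt, const std::string &arg)
{
  std::string out;
  for (std::size_t i = 0; i < fmt.size (); ++i)
    {
      if (fmt[i] != '%' || i + 1 >= fmt.size ())
        {
          out += fmt[i];
          continue;
        }
      char c = fmt[++i];
      if (c == 's')
        out += arg;
      else if (c == '<' || c == '>')
        out += '\'';
      else if (c == 'q' && i + 1 < fmt.size () && fmt[i + 1] == 's')
        {
          out += '\'' + arg + '\'';
          ++i;
        }
      else if (c == '%')
        out += '%';
      else
        {
          // Unknown directive: copy it through unchanged.
          out += '%';
          out += c;
        }
    }
  return out;
}
```

With this formatter, "unrecognized %<%s%>" and "unrecognized %qs" both render "foo" as unrecognized 'foo'.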

gcc/analyzer/ChangeLog:
PR c/104896
* sm-malloc.cc: Replace "%<%s%>" with "%qs" in message wording.

gcc/c-family/ChangeLog:
PR c/104896
* c-lex.cc (c_common_lex_availability_macro): Replace "%<%s%>"
with "%qs" in message wording.
* c-opts.cc (c_common_handle_option): Likewise.
* c-warn.cc (warn_parm_array_mismatch): Likewise.

gcc/ChangeLog:
PR c/104896
* common/config/ia64/ia64-common.cc (ia64_handle_option): Replace
"%<%s%>" with "%qs" in message wording.
* common/config/rs6000/rs6000-common.cc (rs6000_handle_option):
Likewise.
* config/aarch64/aarch64.cc (aarch64_validate_sls_mitigation):
Likewise.
(aarch64_override_options): Likewise.
(aarch64_process_target_attr): Likewise.
* config/arm/aarch-common.cc (aarch_validate_mbranch_protection):
Likewise.
* config/pru/pru.cc (pru_insert_attributes): Likewise.
* config/riscv/riscv-target-attr.cc
(riscv_target_attr_parser::parse_arch): Likewise.
* omp-general.cc (oacc_verify_routine_clauses): Likewise.
* tree-ssa-uninit.cc (maybe_warn_read_write_only): Likewise.
(maybe_warn_pass_by_reference): Likewise.

gcc/cp/ChangeLog:
PR c/104896
* cvt.cc (maybe_warn_nodiscard): Replace "%<%s%>" with "%qs" in
message wording.

gcc/fortran/ChangeLog:
PR c/104896
* resolve.cc (resolve_operator): Replace "%<%s%>" with "%qs" in
message wording.

gcc/go/ChangeLog:
PR c/104896
* gofrontend/embed.cc (Gogo::initializer_for_embeds): Replace
"%<%s%>" with "%qs" in message wording.
* gofrontend/expressions.cc
(Selector_expression::lower_method_expression): Likewise.
* gofrontend/gogo.cc (Gogo::set_package_name): Likewise.
(Named_object::export_named_object): Likewise.
* gofrontend/parse.cc (Parse::struct_type): Likewise.
(Parse::parameter_list): Likewise.

gcc/rust/ChangeLog:
PR c/104896
* backend/rust-compile-expr.cc
(CompileExpr::compile_integer_literal): Replace "%<%s%>" with
"%qs" in message wording.
(CompileExpr::compile_float_literal): Likewise.
* backend/rust-compile-intrinsic.cc (Intrinsics::compile):
Likewise.
* backend/rust-tree.cc (maybe_warn_nodiscard): Likewise.
* checks/lints/rust-lint-scan-deadcode.h: Likewise.
* lex/rust-lex.cc (Lexer::parse_partial_unicode_escape): Likewise.
(Lexer::parse_raw_byte_string): Likewise.
* lex/rust-token.cc (Token::get_str): Likewise.
* metadata/rust-export-metadata.cc
(PublicInterface::write_to_path): Likewise.
* parse/rust-parse.cc
(peculiar_fragment_match_compatible_fragment): Likewise.
(peculiar_fragment_match_compatible): Likewise.
* resolve/rust-ast-resolve-path.cc (ResolvePath::resolve_path):
Likewise.
* resolve/rust-ast-resolve-toplevel.h: Likewise.
* resolve/rust-ast-resolve-type.cc (ResolveRelativeTypePath::go):
Likewise.
* rust-session-manager.cc (validate_crate_name): Likewise.
(Session::load_extern_crate): Likewise.
* typecheck/rust-hir-type-check-expr.cc (TypeCheckExpr::visit):
Likewise.
(TypeCheckExpr::resolve_fn_trait_call): Likewise.
* typecheck/rust-hir-type-check-implitem.cc
(TypeCheckImplItemWithTrait::visit): Likewise.
* typecheck/rust-hir-type-check-item.cc
(TypeCheckItem::validate_trait_impl_block): Likewise.
* typecheck/rust-hir-type-check-struct.cc
(TypeCheckStructExpr::visit): Likewise.
* typecheck/rust-tyty-call.cc (TypeCheckCallExpr::visit):
Likewise.
* typecheck/rust-tyty.cc (BaseType::bounds_compatible): Likewise.
* typecheck/rust-unify.cc (UnifyRules::emit_abi_mismatch):
Likewise.
* util/rust-attributes.cc (AttributeChecker::visit): Likewise.

libcpp/ChangeLog:
PR c/104896
* pch.cc (cpp_valid_state): Replace "%<%s%>" with "%qs" in message
wording.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-malloc.cc | 14 ++--
 gcc/c-family/c-lex.cc |  2 +-
 gcc/c-family/c-opts.cc|  2 +-
 gcc/c-family/c-warn.cc|  6 ++---
 gcc/common/config/ia64/ia64-common.cc |  2 +-
 gcc/common/config/rs6000/rs6000-common.cc |  2 +-
 gcc/config/aarch64/aarch64.cc |  6 ++---
 gcc/config/arm/aarch-common.cc|  6 ++---
 gcc/config/pru/pru.cc |  2 +-
 gcc/config/riscv/riscv-target-attr.cc |  2 +-
 gcc/cp/cvt.cc |  4 ++--
 gcc/fortran/resolve.cc   

Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-27 Thread Hongtao Liu
On Wed, Nov 27, 2024 at 9:43 PM Richard Biener
 wrote:
>
> On Wed, Nov 27, 2024 at 4:26 AM liuhongt  wrote:
> >
> > When a loop requires any kind of versioning, which could increase register
> > pressure too much, and it is inside a deeply nested big loop, don't
> > vectorize it.
> >
> > I tested the patch with both -Ofast and -O2 on SPEC2017; apart from
> > 548.exchange_r, all other benchmarks produce identical binaries.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > Any comments?
>
> The vectorizer tries to version an outer loop when vectorizing a loop nest
> and the versioning condition is invariant.  See vect_loop_versioning.  This
> tries to handle such cases.  Often the generated runtime alias checks are
> not invariant because we do not consider the outer evolutions.  I think we
> should instead fix this there.
>
> Question below ...
>
> > gcc/ChangeLog:
> >
> > pr target/117088
> > * config/i386/i386.cc
> > (ix86_vector_costs::ix86_vect_in_deep_nested_loop_p): New function.
> > (ix86_vector_costs::finish_cost): Prevent loop vectorization
> > if it's in a deeply nested loop and require versioning.
> > * config/i386/i386.opt (--param=vect-max-loop-depth=): New
> > param.
> > ---
> >  gcc/config/i386/i386.cc  | 89 
> >  gcc/config/i386/i386.opt |  4 ++
> >  2 files changed, 93 insertions(+)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 526c9df7618..608f40413d2 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -25019,6 +25019,8 @@ private:
> >
> >/* Estimate register pressure of the vectorized code.  */
> >void ix86_vect_estimate_reg_pressure ();
> > +  /* Check if vect_loop is in a deeply-nested loop.  */
> > +  bool ix86_vect_in_deep_nested_loop_p (class loop *vect_loop);
> >/* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used for
> >   estimation of register pressure.
> >   ??? Currently it's only used by vec_construct/scalar_to_vec
> > @@ -25324,6 +25326,84 @@ ix86_vector_costs::ix86_vect_estimate_reg_pressure 
> > ()
> >  }
> >  }
> >
> > +/* Return true if vect_loop is in a deeply-nested loop.
> > +   i.e. vect_loop_n in the loop structure below.
> > +loop1
> > +{
> > + loop2
> > + {
> > +  loop3
> > +  {
> > +   vect_loop_1;
> > +   loop4
> > +   {
> > +vect_loop_2;
> > +loop5
> > +{
> > + vect_loop_3;
> > + loop6
> > + {
> > +  vect_loop_4;
> > +  loop7
> > +  {
> > +   vect_loop_5;
> > +   loop8
> > +   {
> > +   loop9
> > +   }
> > +  vect_loop_6;
> > +  }
> > + vect_loop_7;
> > + }
> > +}
> > +   }
> > + }
> > + It's a big hammer to fix the O2 regression in 548.exchange_r after
> > + vectorization was enhanced by r15-4225-g70c3db511ba14f.  */
> > +bool
> > +ix86_vector_costs::ix86_vect_in_deep_nested_loop_p (class loop *vect_loop)
> > +{
> > +  if (loop_depth (vect_loop) > (unsigned) ix86_vect_max_loop_depth)
> > +return true;
> > +
> > +  if (loop_depth (vect_loop) < 2)
> > +return false;
> > +
>
> while the above two are "obvious", what you check below isn't clear to me.
> Is this trying to compute whether 'vect_loop' is inside a loop nest in which
> some sibling of vect_loop (or even a sibling of an outer loop of vect_loop,
> recursively) contains a sub-nest whose loop depth (relative to what?)
> exceeds ix86_vect_max_loop_depth?
Yes, the function checks whether vect_loop is inside a "big outer loop"
that contains an innermost loop with loop_depth > ix86_vect_max_loop_depth.
If so, vectorization of the loop is prevented unless its trip count is a
constant multiple of VF.  (My earlier "requires any kind of versioning"
was not accurate, and yes, it's a big hammer.)
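The check in the quoted patch walks the loop tree depth-first, starting from the siblings of vect_loop. The C++ sketch below reproduces that walk. It uses a hypothetical loop_node struct that imitates the inner/next/outer links and loop_depth () numbering of GCC's class loop; it is an illustration, not the patch itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop-tree node imitating the links of GCC's class loop:
// INNER is the first child, NEXT the next sibling, OUTER the parent.
// DEPTH follows loop_depth (): the function body is 0, outermost loop 1.
struct loop_node
{
  unsigned depth;
  loop_node *outer = nullptr;
  loop_node *inner = nullptr;
  loop_node *next = nullptr;
};

// Return true if VECT_LOOP, or any loop nested in VECT_LOOP or in one of
// its siblings, is deeper than MAX_DEPTH.
bool
in_deep_nested_loop_p (const loop_node *vect_loop, unsigned max_depth)
{
  if (vect_loop->depth > max_depth)
    return true;
  if (vect_loop->depth < 2)
    return false;

  // Seed the stack with all children of the parent (VECT_LOOP included).
  std::vector<const loop_node *> stack;
  for (const loop_node *l = vect_loop->outer->inner; l; l = l->next)
    stack.push_back (l);

  // Depth-first walk; the loop tree has no sharing, so each node is
  // visited once without a visited set.
  while (!stack.empty ())
    {
      const loop_node *l = stack.back ();
      stack.pop_back ();
      if (l->depth > max_depth)
        return true;
      for (const loop_node *c = l->inner; c; c = c->next)
        stack.push_back (c);
    }
  return false;
}
```

Because every loop has exactly one parent, the sketch omits the visited bitmap the patch maintains. Keeping it would be harmless but redundant.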
>
> > +  class loop* outer_loop = loop_outer (vect_loop);
> > +
> > +  auto_vec m_loop_stack;
> > +  auto_sbitmap m_visited_loops (number_of_loops (cfun));
> > +
> > +  /* Get all sibling loops for vect_loop.  */
> > +  class loop* next_loop = outer_loop->inner;
> > +  for (; next_loop; next_loop = next_loop->next)
> > +{
> > +  m_loop_stack.safe_push (next_loop);
> > +  bitmap_set_bit (m_visited_loops, next_loop->num);
> > +}
> > +
> > +  /* DFS the max depth of all sibling loop.  */
> > +  while (!m_loop_stack.is_empty ())
> > +{
> > +  next_loop = m_loop_stack.pop ();
> > +  if (loop_depth (next_loop) > (unsigned) ix86_vect_max_loop_depth)
> > +   return true;
> > +
> > +  class loop* inner_loop = next_loop->inner;
> > +  while (inner_loop)
> > +   {
> > + if (!bitmap_bit_p (m_visited_loops, inner_loop->num))
> > +   {
> > + m_loop_stack.safe_push (inner_loop);
> > + bitmap_set_bit (m_visited_loops, inner_loop->num);
> > +   }
> > + inner_loop = inner_loop->next;
> > +   }
> > +}
> > +
> > +  return false;
> > +}
> > +
> >  void
> >  ix86_vector_costs::finish_cost

[PATCH v5] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-11-27 Thread Marek Polacek
On Wed, Nov 27, 2024 at 04:19:33PM -0500, Jason Merrill wrote:
> On 11/6/24 3:33 PM, Marek Polacek wrote:
> > On Mon, Nov 04, 2024 at 11:10:05PM -0500, Jason Merrill wrote:
> > > On 10/30/24 4:59 PM, Marek Polacek wrote:
> > > > On Wed, Oct 30, 2024 at 09:01:36AM -0400, Patrick Palka wrote:
> > > > > On Tue, 29 Oct 2024, Marek Polacek wrote:
> > > > +static tree
> > > > +cp_parser_pack_index (cp_parser *parser, tree pack)
> > > > +{
> > > > +  if (cxx_dialect < cxx26)
> > > > +pedwarn (cp_lexer_peek_token (parser->lexer)->location,
> > > > +OPT_Wc__26_extensions, "pack indexing only available with "
> > > > +"%<-std=c++2c%> or %<-std=gnu++2c%>");
> > > > +  /* Consume the '...' token.  */
> > > > +  cp_lexer_consume_token (parser->lexer);
> > > > +  /* Consume the '['.  */
> > > > +  cp_lexer_consume_token (parser->lexer);
> > > > +
> > > > +  if (cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_SQUARE))
> > > > +{
> > > > +  error_at (cp_lexer_peek_token (parser->lexer)->location,
> > > > +   "pack index missing");
> > > 
> > > Maybe cp_parser_error?
> > 
> > Unsure.  This:
> > 
> >template
> >void foo(Ts...[]);
> > 
> > then generates:
> > 
> >error: variable or field 'foo' declared void
> >error: expected primary-expression before '...' token
> >error: pack index missing before ']' token
> > 
> > which doesn't seem better.
> 
> I guess the question is whether we need to deal with the vexing parse. But
> in this case it'd be ill-formed regardless, so what you have is fine.
> 
> > > > @@ -6368,6 +6416,12 @@ cp_parser_primary_expression (cp_parser *parser,
> > > >   = make_location (caret_loc, start_loc, finish_loc);
> > > > decl.set_location (combined_loc);
> > > > +
> > > > +   /* "T...[constant-expression]" is a C++26 pack-index-expression.  */
> > > > +   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS)
> > > > +   && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_SQUARE))
> > > > + decl = cp_parser_pack_index (parser, decl);
> > > 
> > > Shouldn't this be in cp_parser_id_expression?
> > 
> > It should, but I need to wait until after finish_id_expression, so that
> > DECL isn't just an identifier node.
> 
> Ah, makes sense.
> 
> > > > + ~ computed-type-specifier
> > > 
> > > Hmm, seems we never implemented ~decltype.
> > 
> > Looks like CWG 1753: .
> 
> Thanks.
> 
> > > > @@ -4031,6 +4036,15 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, void* data)
> > > >  *walk_subtrees = 0;
> > > >  return NULL_TREE;
> > > > +case PACK_INDEX_TYPE:
> > > > +case PACK_INDEX_EXPR:
> > > > +  /* We can have an expansion of an expansion, such as "Ts...[Is]...",
> > > > +so do look into the index.  */
> > > > +  cp_walk_tree (&PACK_INDEX_INDEX (t), &find_parameter_packs_r, ppd,
> > > > +   ppd->visited);
> > > > +  *walk_subtrees = 0;
> > > > +  return NULL_TREE;
> > > 
> > > Do we need to handle these specifically here?  I'd think the handling in
> > > cp_walk_subtrees would be enough.
> > 
> > I think I do, otherwise the Ts...[Is]... test doesn't work.
> > It is used when calling check_for_bare_parameter_packs.
> 
> Makes sense.
> 
> > > I'm not seeing a test for https://eel.is/c++draft/diff#cpp23.dcl.dcl-2 or
> > > the code to handle this case differently in C++23 vs 26.
> > Ah, right.  I've added the test (pack-indexing11.C) but we don't
> > compile it C++23 as we should due to:
> > 
> > pack-indexing11.C:7:13: error: expected ',' or '...' before '[' token
> >  7 | void f(T... [1]);
> >| ^
> > 
> > which seems like a bug.  Opened .
> > 
> > Is fixing that a requirement for this patch?
> 
> No.  Really, given that we're reusing this grammar, it's probably fine to
> never fix it.

I've closed it.
 
> > This patch implements C++26 Pack Indexing, as described in
> > .
> > 
> > The issue discussing how to mangle pack indexes has not been resolved
> > yet  and I've
> > made no attempt to address it so far.
> > 
> > Unlike v1, which used augmented TYPE/EXPR_PACK_EXPANSION codes, this
> > version introduces two new codes: PACK_INDEX_EXPR and PACK_INDEX_TYPE.
> > Both carry two operands: the pack expansion and the index.  They are
> > handled in tsubst_pack_index: substitute the index and the pack and
> > then extract the element from the vector (if possible).
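For comparison with pre-C++26 code: `Ts...[I]` selects the I-th type of a pack and `args...[I]` the I-th value, which is what tsubst_pack_index extracts once the index is known. A common C++14/17 emulation goes through std::tuple. The sketch below is illustrative only: the alias and function names are hypothetical, and it copies the selected value rather than forwarding it.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// C++26: Ts...[I].  Pre-C++26 emulation via std::tuple_element_t.
template <std::size_t I, typename... Ts>
using pack_element_t = std::tuple_element_t<I, std::tuple<Ts...>>;

// C++26: args...[I].  Pre-C++26 emulation; returns a copy of the I-th argument.
template <std::size_t I, typename... Ts>
auto
pack_element (Ts... args)
{
  return std::get<I> (std::tuple<Ts...> (args...));
}
```

As with real pack indexing, an out-of-range index is a compile-time error here, reported through std::tuple_element.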
> > 
> > To handle pack indexing in a decltype or with decltype(auto), there is
> > also the new PACK_INDEX_PARENTHESIZED_P flag.
> > 
> > With this feature, it's valid to write something like
> > 
> >using U = tmpl;
> > 
> > where we first expand the template argument into
> > 
> >Ts...[Is#0], Ts...[Is#1], ...
> > 
> > and then substitute each individual pack i

[PATCH] c++: P2865R5, Remove Deprecated Array Comparisons from C++26 [PR117788]

2024-11-27 Thread Marek Polacek
Not a bugfix, but this should only affect C++26.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8-- 
This patch implements P2865R5 by promoting the warning to an error in C++26
only.  -Wno-array-compare shouldn't disable the error, so adjust the call
sites as well.

In C++20 we should warn even without -Wall.  Jason fixed this in r15-5713
but let's add a test that doesn't use -Wall.
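For reference, `a == b` on two arrays compares the addresses the arrays decay to, not their contents. That is why C++20 deprecated it and P2865R5 makes it ill-formed in C++26. The short C++ sketch below shows two spellings that state the intent explicitly; the array and function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

int a[3] = {1, 2, 3};
int b[3] = {1, 2, 3};

// Identity: do A and B denote the same storage?  Unary + decays each
// array to a pointer, so this is an ordinary pointer comparison
// rather than the deprecated (C++20) / ill-formed (C++26) a == b.
bool
same_object ()
{
  return +a == +b;
}

// Contents: are the elements equal?
bool
same_contents ()
{
  return std::equal (std::begin (a), std::end (a), std::begin (b), std::end (b));
}
```

For these two distinct arrays with equal elements, same_object () is false and same_contents () is true. The deprecated `a == b` would have given the former answer silently.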

PR c++/117788

gcc/c-family/ChangeLog:

* c-warn.cc (do_warn_array_compare): Emit an error in C++26.

gcc/cp/ChangeLog:

* typeck.cc (cp_build_binary_op) : Don't check
warn_array_compare.  Check tf_warning_or_error instead of just
tf_warning.
: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/Warray-compare-1.c: Expect an error in C++26.
* c-c++-common/Warray-compare-2.c: Likewise.
* c-c++-common/Warray-compare-3.c: Likewise.
* c-c++-common/Warray-compare-4.c: New test.
* g++.dg/tree-ssa/pr15791-1.C: Expect an error in C++26.
---
 gcc/c-family/c-warn.cc| 25 +++---
 gcc/cp/typeck.cc  | 10 ++--
 gcc/testsuite/c-c++-common/Warray-compare-1.c | 21 +---
 gcc/testsuite/c-c++-common/Warray-compare-2.c | 21 +---
 gcc/testsuite/c-c++-common/Warray-compare-3.c |  7 +--
 gcc/testsuite/c-c++-common/Warray-compare-4.c | 50 +++
 gcc/testsuite/g++.dg/tree-ssa/pr15791-1.C |  2 +-
 7 files changed, 106 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Warray-compare-4.c

diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index 05d6e37edae..acda9a3ee3d 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -3818,8 +3818,9 @@ maybe_warn_sizeof_array_div (location_t loc, tree arr, tree arr_type,
 
 /* Warn about C++20 [depr.array.comp] array comparisons: "Equality
and relational comparisons between two operands of array type are
-   deprecated."  We also warn in C and earlier C++ standards.  CODE is
-   the code for this comparison, OP0 and OP1 are the operands.  */
+   deprecated."  In C++26 this is an error.  We also warn in C and earlier
+   C++ standards.  CODE is the code for this comparison, OP0 and OP1 are
+   the operands.  */
 
 void
 do_warn_array_compare (location_t location, tree_code code, tree op0, tree op1)
@@ -3832,10 +3833,22 @@ do_warn_array_compare (location_t location, tree_code code, tree op0, tree op1)
 op1 = TREE_OPERAND (op1, 0);
 
   auto_diagnostic_group d;
-  if (warning_at (location, OPT_Warray_compare,
- (c_dialect_cxx () && cxx_dialect >= cxx20)
- ? G_("comparison between two arrays is deprecated in C++20")
- : G_("comparison between two arrays")))
+  diagnostic_t kind = DK_WARNING;
+  const char *msg;
+  if (c_dialect_cxx () && cxx_dialect >= cxx20)
+{
+  /* P2865R5 made this comparison ill-formed in C++26.  */
+  if (cxx_dialect >= cxx26)
+   {
+ msg = G_("comparison between two arrays is not allowed in C++26");
+ kind = DK_ERROR;
+   }
+  else
+   msg = G_("comparison between two arrays is deprecated in C++20");
+}
+  else
+msg = G_("comparison between two arrays");
+  if (emit_diagnostic (kind, location, OPT_Warray_compare, msg))
 {
   /* C doesn't allow +arr.  */
   if (c_dialect_cxx ())
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 102a8ed131c..c3895eb5fb2 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -5816,7 +5816,7 @@ cp_build_binary_op (const op_location_t &location,
warning_at (location, OPT_Wfloat_equal,
"comparing floating-point with %<==%> "
"or % is unsafe");
-  if (complain & tf_warning)
+  if (complain & tf_warning_or_error)
{
  tree stripped_orig_op0 = tree_strip_any_location_wrapper (orig_op0);
  tree stripped_orig_op1 = tree_strip_any_location_wrapper (orig_op1);
@@ -5827,8 +5827,7 @@ cp_build_binary_op (const op_location_t &location,
warning_at (location, OPT_Waddress,
"comparison with string literal results in "
"unspecified behavior");
- else if (warn_array_compare
-  && TREE_CODE (TREE_TYPE (orig_op0)) == ARRAY_TYPE
+ else if (TREE_CODE (TREE_TYPE (orig_op0)) == ARRAY_TYPE
   && TREE_CODE (TREE_TYPE (orig_op1)) == ARRAY_TYPE)
do_warn_array_compare (location, code, stripped_orig_op0,
   stripped_orig_op1);
@@ -6111,11 +6110,10 @@ cp_build_binary_op (const op_location_t &location,
"comparison with string literal results "
"in unspecified behavior");
}
-  else if (warn_array_compare
-  && TREE_CODE (TREE_TYPE (orig_op0)) == ARRAY_TYPE
+  else if (TREE_CODE (TREE_TYPE (orig_op0)) == ARRAY_TYPE
   && TREE_COD

Re: [PATCH v8] Target-independent store forwarding avoidance.

2024-11-27 Thread Philipp Tomsich
On Thu 28. Nov 2024 at 15:36, Richard Biener 
wrote:

> On Mon, Nov 25, 2024 at 3:28 AM Philipp Tomsich
>  wrote:
> >
> > Pushed to master with the following fixups:
> >   - new timevar added
> >   - nits addressed
> >   - whitespace fixes
>
> The pass seems to be disabled by default everywhere - I thought we
> decided to avoid adding passes like this because they tend to bit-rot
> quickly and become a maintenance burden.
>
> What was the plan here?


We are preparing a follow-on commit to enable it on AArch64 and a few more
key architectures.

> Richard.
>
> > Philipp.
> >
> >
> > On Mon, 25 Nov 2024 at 03:30, Jeff Law  wrote:
> > >
> > >
> > >
> > > On 11/9/24 2:48 AM, Konstantinos Eleftheriou wrote:
> > > > From: kelefth 
> > > >
> > > > This pass detects cases of expensive store forwarding and tries to avoid them
> > > > by reordering the stores and using suitable bit insertion sequences.
> > > > For example it can transform this:
> > > >
> > > >   strb w2, [x1, 1]
> > > >   ldr  x0, [x1]  # Expensive store forwarding to larger load.
> > > >
> > > > To:
> > > >
> > > >   ldr x0, [x1]
> > > >   strb w2, [x1]
> > > >   bfi x0, x2, 0, 8
> > > >
> > > > Assembly like this can appear with bitfields or type punning /
> unions.
> > > > On stress-ng when running the cpu-union microbenchmark the following
> speedups
> > > > have been observed.
> > > >
> > > >    Neoverse-N1:      +29.4%
> > > >    Intel Coffeelake: +13.1%
> > > >    AMD 5950X:        +17.5%
> > > >
> > > > The transformation is rejected on cases that would cause
> store_bit_field
> > > > to generate subreg expressions on different register classes.
> > > > Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c
> contain
> > > > such cases and have been marked as XFAIL.
> > > >
> > > > There is a special handling for machines with BITS_BIG_ENDIAN !=
> > > > BYTES_BIG_ENDIAN. The need for this came up from an issue in H8
> > > > architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN
> > > > is false. In that case, the START parameter of store_bit_field
> > > > needs to be calculated from the end of the destination register.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * Makefile.in (OBJS): Add avoid-store-forwarding.o.
> > > >   * common.opt (favoid-store-forwarding): New option.
> > > >   * common.opt.urls: Regenerate.
> > > >   * doc/invoke.texi: New param store-forwarding-max-distance.
> > > >   * doc/passes.texi: Document new pass.
> > > >   * doc/tm.texi: Regenerate.
> > > >   * doc/tm.texi.in: Document new pass.
> > > >   * params.opt (store-forwarding-max-distance): New param.
> > > >   * passes.def: Add pass_rtl_avoid_store_forwarding before
> > > >   pass_early_remat.
> > > >   * target.def (avoid_store_forwarding_p): New DEFHOOK.
> > > >   * target.h (struct store_fwd_info): Declare.
> > > >   * targhooks.cc (default_avoid_store_forwarding_p): New
> function.
> > > >   * targhooks.h (default_avoid_store_forwarding_p): Declare.
> > > >   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> > > >   * avoid-store-forwarding.cc: New file.
> > > >   * avoid-store-forwarding.h: New file.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
> > > >   * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
> > > >   * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
> > > >   * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
> > > >   * gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
> > > >   * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c:
> New test.
> > > >  * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c:
> New test.
> > > >
> > > > Signed-off-by: Philipp Tomsich 
> > > > Signed-off-by: Konstantinos Eleftheriou <
> konstantinos.elefther...@vrull.eu>
> > > >
> > > > Series-version: 8
> > > >
> > > > Series-changes: 8
> > > >   - Fix store_bit_field call for big-endian targets, where
> > > > BITS_BIG_ENDIAN is false.
> > > >   - Handle store_forwarding_max_distance = 0 as a special case
> that
> > > > disables cost checks for avoid-store-forwarding.
> > > >   - Update testcases for AArch64 and add testcases for x86-64.
> > > >
> > > > Series-changes: 7
> > > >   - Fix bug when copying back the load register, in the case
> that the
> > > > load is eliminated.
> > > >
> > > > Series-changes: 6
> > > >   - Reject the transformation on cases that would cause
> store_bit_field
> > > > to generate subreg expressions on different register classes.
> > > > Files avoid-store-forwarding-4.c and
> avoid-store-forwarding-5.c
> > > >contain such cases and have been marked as XFAIL.
> > > >   - Use optimize_bb_for_speed_p instead of
> optimize_insn_for_speed_p.
> > > >   - Inline and remo

Re: [RFC][PATCH] RISC-V: Support TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN

2024-11-27 Thread Zhijin Zeng
I created a PR at https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/455;
maybe we can discuss it there.

Zhijin,
Thanks.

> From: "Kito Cheng"
> Date:  Tue, Nov 26, 2024, 23:41
> Subject:  Re: [RFC][PATCH] RISC-V: Support 
> TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
> To: "Zhijin Zeng"
> Cc: 
> Do you mind creating a PR to
> https://github.com/riscv/riscv-elf-psabi-doc/pull/ to start the
> discussion? I believe this should be documented somewhere since it
> should be consistent between LLVM and GCC.
> 
> On Mon, Nov 4, 2024 at 2:26 PM Zhijin Zeng  wrote:
> >
> > I can't find the vector function name mangling for RISC-V, so in order to
> > support TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN,
> > TARGET_SIMD_CLONE_ADJUST and TARGET_SIMD_CLONE_USABLE, I added RISC-V
> > vector function mangling rules as follows:
> >
> >      _ZGVNv_
> >
> >      'x' is the LMUL: if the LMUL is 1/2/4/8, 'x' is 1/2/4/8.
> >      'y' is the number of elements, i.e. 'simdlen' in GCC.
> >      'v...' depends on the number of parameters: there are as many 'v'
> > characters as there are parameters.
> >      'func_name' is the scalar function name.
> >
> > Maybe it's incorrect or incomplete, but I think it's worth
> > discussing. Combined with the glibc patch, we can use libmvec on RISC-V.
> >
> > glibc patch:
> > https://inbox.sourceware.org/libc-alpha/af3aceaa-dcf1-4fa6-ad7e-a7fd7444a...@spacemit.com/
> >
> >
> > Thanks,
> >
> > Zhijin
> >
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 0b3b2c4cba9..d181aae4ac7 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
> >
> >   #define INCLUDE_MEMORY
> >   #define INCLUDE_STRING
> > +#include 
> >   #include "config.h"
> >   #include "system.h"
> >   #include "coretypes.h"
> > @@ -34,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
> >   #include "insn-config.h"
> >   #include "insn-attr.h"
> >   #include "recog.h"
> > +#include "cgraph.h"
> >   #include "output.h"
> >   #include "alias.h"
> >   #include "tree.h"
> > @@ -5787,7 +5789,9 @@ riscv_vector_type_p (const_tree type)
> >   {
> >     /* Currently, only builtin scalable vector type is allowed, in the future,
> >        more vector types may be allowed, such as GNU vector type, etc.  */
> > -  return riscv_vector::builtin_type_p (type);
> > +  if (!type)
> > +    return false;
> > +  return riscv_vector::builtin_type_p (type) || VECTOR_TYPE_P (type);
> >   }
> >
> >   static unsigned int
> > @@ -12695,6 +12699,231 @@ riscv_stack_clash_protection_alloca_probe_range (void)
> >     return STACK_CLASH_CALLER_GUARD;
> >   }
> >
> > +/* Return true for types that could be supported as SIMD return or
> > +   argument types.  */
> > +
> > +static bool
> > +supported_simd_type (tree t)
> > +{
> > +  if (SCALAR_FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t))
> > +    {
> > +      HOST_WIDE_INT s = tree_to_shwi (TYPE_SIZE_UNIT (t));
> > +      return s == 1 || s == 2 || s == 4 || s == 8;
> > +    }
> > +  return false;
> > +}
> > +
> > +static unsigned
> > +lane_size (cgraph_simd_clone_arg_type clone_arg_type, tree type)
> > +{
> > +  gcc_assert (clone_arg_type != SIMD_CLONE_ARG_TYPE_MASK);
> > +
> > +  if (INTEGRAL_TYPE_P (type)
> > +      || SCALAR_FLOAT_TYPE_P (type))
> > +    switch (TYPE_PRECISION (type) / BITS_PER_UNIT)
> > +      {
> > +      default:
> > +       break;
> > +      case 1:
> > +      case 2:
> > +      case 4:
> > +      case 8:
> > +       return TYPE_PRECISION (type);
> > +      }
> > +  gcc_unreachable ();
> > +}
> > +
> > +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
> > +
> > +static int
> > +riscv_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
> > +                                       struct cgraph_simd_clone *clonei,
> > +                                       tree base_type ATTRIBUTE_UNUSED,
> > +                                       int num, bool explicit_p)
> > +{
> > +  tree t, ret_type;
> > +  unsigned int elt_bit = 0;
> > +  unsigned HOST_WIDE_INT const_simdlen;
> > +
> > +  if (!TARGET_VECTOR)
> > +    return 0;
> > +
> > +  if (maybe_ne (clonei->simdlen, 0U)
> > +      && clonei->simdlen.is_constant (&const_simdlen)
> > +      && (const_simdlen < 2
> > +         || const_simdlen > 1024
> > +         || (const_simdlen & (const_simdlen - 1)) != 0))
> > +    {
> > +      if (explicit_p)
> > +       warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> > +                   "unsupported simdlen %wd", const_simdlen);
> > +      return 0;
> > +    }
> > +
> > +  ret_type = TREE_TYPE (TREE_TYPE (node->decl));
> > +  if (TREE_CODE (ret_type) != VOID_TYPE
> > +      && !supported_simd_type (ret_type))
> > +    {
> > +      if (!explicit_p)
> > +       ;
> > +      else if (COMP
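The simdlen filter in riscv_simd_clone_compute_vecsize_and_simdlen rejects a nonzero constant simdlen unless it is a power of two in [2, 1024]. A stand-alone sketch of that predicate (the name `simdlen_ok` is hypothetical, and it only models nonzero explicit values, since the patch skips the check when simdlen is 0) shows why `n & (n - 1)` detects powers of two: subtracting 1 clears the lowest set bit and sets every bit below it, so the AND is zero exactly when only one bit was set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-alone version of the range/power-of-two check in
   the patch: an explicit simdlen must be a power of two in [2, 1024].  */
static bool simdlen_ok(uint64_t n)
{
    if (n < 2 || n > 1024)
        return false;
    /* n - 1 flips the lowest set bit and all bits below it, so the AND
       is zero only when n has exactly one bit set.  */
    return (n & (n - 1)) == 0;
}
```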

Re: [PATCH v8] Target-independent store forwarding avoidance.

2024-11-27 Thread Richard Biener
On Mon, Nov 25, 2024 at 3:28 AM Philipp Tomsich
 wrote:
>
> Pushed to master with the following fixups:
>   - new timevar added
>   - nits addressed
>   - whitespace fixes

The pass seems to be disabled by default everywhere - I thought we
decided to avoid adding
passes like this because they tend to bit-rot quickly and become a
maintenance burden.

What was the plan here?

Richard.

> Philipp.
>
>
> On Mon, 25 Nov 2024 at 03:30, Jeff Law  wrote:
> >
> >
> >
> > On 11/9/24 2:48 AM, Konstantinos Eleftheriou wrote:
> > > From: kelefth 
> > >
> > > This pass detects cases of expensive store forwarding and tries to avoid them
> > > by reordering the stores and using suitable bit insertion sequences.
> > > For example it can transform this:
> > >
> > >   strb w2, [x1, 1]
> > >   ldr x0, [x1]  # Expensive store forwarding to larger load.
> > >
> > > To:
> > >
> > >   ldr x0, [x1]
> > >   strb w2, [x1]
> > >   bfi x0, x2, 0, 8
> > >
> > > Assembly like this can appear with bitfields or type punning / unions.
> > > On stress-ng when running the cpu-union microbenchmark, the following speedups
> > > have been observed:
> > >
> > >    Neoverse-N1:      +29.4%
> > >    Intel Coffeelake: +13.1%
> > >    AMD 5950X:        +17.5%
> > >
> > > The transformation is rejected on cases that would cause store_bit_field
> > > to generate subreg expressions on different register classes.
> > > Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain
> > > such cases and have been marked as XFAIL.
> > >
> > > There is a special handling for machines with BITS_BIG_ENDIAN !=
> > > BYTES_BIG_ENDIAN. The need for this came up from an issue in H8
> > > architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN
> > > is false. In that case, the START parameter of store_bit_field
> > > needs to be calculated from the end of the destination register.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * Makefile.in (OBJS): Add avoid-store-forwarding.o.
> > >   * common.opt (favoid-store-forwarding): New option.
> > >   * common.opt.urls: Regenerate.
> > >   * doc/invoke.texi: New param store-forwarding-max-distance.
> > >   * doc/passes.texi: Document new pass.
> > >   * doc/tm.texi: Regenerate.
> > >   * doc/tm.texi.in: Document new pass.
> > >   * params.opt (store-forwarding-max-distance): New param.
> > >   * passes.def: Add pass_rtl_avoid_store_forwarding before
> > >   pass_early_remat.
> > >   * target.def (avoid_store_forwarding_p): New DEFHOOK.
> > >   * target.h (struct store_fwd_info): Declare.
> > >   * targhooks.cc (default_avoid_store_forwarding_p): New function.
> > >   * targhooks.h (default_avoid_store_forwarding_p): Declare.
> > >   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> > >   * avoid-store-forwarding.cc: New file.
> > >   * avoid-store-forwarding.h: New file.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
> > >   * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
> > >   * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
> > >   * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
> > >   * gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
> > >   * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test.
> > >   * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test.
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > Signed-off-by: Konstantinos Eleftheriou 
> > > 
> > >
> > > Series-version: 8
> > >
> > > Series-changes: 8
> > >   - Fix store_bit_field call for big-endian targets, where
> > > BITS_BIG_ENDIAN is false.
> > >   - Handle store_forwarding_max_distance = 0 as a special case that
> > > disables cost checks for avoid-store-forwarding.
> > >   - Update testcases for AArch64 and add testcases for x86-64.
> > >
> > > Series-changes: 7
> > >   - Fix bug when copying back the load register, in the case that the
> > > load is eliminated.
> > >
> > > Series-changes: 6
> > >   - Reject the transformation on cases that would cause store_bit_field
> > > to generate subreg expressions on different register classes.
> > > Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c
> > >contain such cases and have been marked as XFAIL.
> > >   - Use optimize_bb_for_speed_p instead of optimize_insn_for_speed_p.
> > >   - Inline and remove get_load_mem.
> > >   - New implementation for is_store_forwarding.
> > >   - Refactor the main loop in avoid_store_forwarding.
> > >   - Avoid using the word 'forwardings'.
> > >   - Use lowpart_subreg instead of validate_subreg + gen_rtx_subreg.
> > >   - Don't use df_insn_rescan where not needed.
> > >   - Change order of emitting stores and bit insert ins
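The reordering described in the quoted commit message can be illustrated at the C level. This is only a sketch, not the pass itself (which works on RTL and uses store_bit_field). It assumes an 8-byte buffer and a single narrow store, and shows that "store a byte, then load the whole word" yields the same value as "load the word, store the byte, then insert the byte into the loaded value". The helper names are hypothetical. The byte-order handling is only a rough C-level analogue of the endianness adjustment to store_bit_field's START parameter that the commit message describes.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first.  */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Original order: a narrow store followed by an overlapping wider load.
   On many cores this load has to wait for store forwarding.  */
static uint64_t store_then_load(unsigned char buf[8], unsigned off, uint8_t byte)
{
    uint64_t x;
    buf[off] = byte;
    memcpy(&x, buf, sizeof x);
    return x;
}

/* Reordered: load first, then store, then insert the stored byte into
   the loaded value with a bit-field insert (like AArch64 bfi).  */
static uint64_t load_then_insert(unsigned char buf[8], unsigned off, uint8_t byte)
{
    uint64_t x;
    memcpy(&x, buf, sizeof x);
    buf[off] = byte;
    /* Memory byte OFF sits at bit 8*OFF on little-endian hosts and is
       counted from the other end of the word on big-endian hosts.  */
    unsigned shift = 8u * (host_is_little_endian() ? off : 7u - off);
    x &= ~((uint64_t)0xff << shift);
    x |= (uint64_t)byte << shift;
    return x;
}
```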

[PATCH v2] LoongArch: Mask shift offset when emit {xv, v}{srl, sll, sra} with sameimm vector

2024-11-27 Thread Jinyang He
For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes an overflow
when emitting the {w,h,b} forms.  Since the number of bits shifted is the
register value modulo the element width, it is actually unnecessary to
constrain the range.  Simply mask the shift amount with (element bit width - 1),
without any
constraint on the shift range.
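A small C model of the masking may help review. It assumes, as the message above states, that the register-operand shift uses the count modulo the element width; the function names are hypothetical. For byte elements, for example, a replicated count of 9 was accepted by Uuv6 but presumably does not fit the 3-bit immediate of the .b form; masking turns it into 1, which matches what the register form computes.

```c
#include <stdint.h>

/* Hypothetical model of the immediate the patch emits for a replicated
   shift count: keep only the low log2(unit_bits) bits.  */
static unsigned masked_shift_imm(uint64_t val, unsigned unit_bits)
{
    return (unsigned)(val & (unit_bits - 1));
}

/* Model of one byte lane of a register-operand logical right shift,
   assuming the hardware uses the count modulo the element width.  */
static uint8_t lane_srl_b(uint8_t x, uint64_t count)
{
    return (uint8_t)(x >> (count % 8));
}
```

Because the element widths are powers of two, `val & (unit_bits - 1)` equals `val % unit_bits`, so the immediate form and the register form agree for every count.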

gcc/ChangeLog:

* config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6,
add Uuvx as replicated vector const with unsigned range [0,umax].
* config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift
offset by its unit bits.
* config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_const_vector_same_int_p): Set default for low and high.
* config/loongarch/predicates.md: Replace reg_or_vector_same_uimm6_operand
with reg_or_vector_same_uimm_operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test.
---
v2: Fix indent in lsx.md and lasx.md.
Use "dg-do assemble" in the tests, as suggested by Ruoyao.

 gcc/config/loongarch/constraints.md   | 14 ++---
 gcc/config/loongarch/lasx.md  | 60 +++
 gcc/config/loongarch/loongarch-protos.h   |  5 +-
 gcc/config/loongarch/lsx.md   | 60 +++
 gcc/config/loongarch/predicates.md|  8 +--
 .../vector/lasx/lasx-shift-sameimm-vec.c  | 48 +++
 .../vector/lsx/lsx-shift-sameimm-vec.c| 48 +++
 7 files changed, 206 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index 18da8b31f49..66ef1073fad 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -334,19 +334,19 @@
   (and (match_code "const_vector")
(match_test "loongarch_const_vector_same_int_p (op, mode, -16, 15)")))
 
-(define_constraint "Uuv6"
-  "@internal
-   A replicated vector const in which the replicated value is in the range
-   [0,63]."
-  (and (match_code "const_vector")
-   (match_test "loongarch_const_vector_same_int_p (op, mode, 0, 63)")))
-
 (define_constraint "Urv8"
   "@internal
A replicated vector const with replicated byte values as well as elements"
   (and (match_code "const_vector")
(match_test "loongarch_const_vector_same_bytes_p (op, mode)")))
 
+(define_constraint "Uuvx"
+  "@internal
+   A replicated vector const in which the replicated value is in the unsigned
+   range [0,umax]."
+  (and (match_code "const_vector")
+   (match_test "loongarch_const_vector_same_int_p (op, mode)")))
+
 (define_memory_constraint "ZC"
   "A memory operand whose address is formed by a base register and offset
that is suitable for use in instructions with the same addressing mode
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 457ed163f31..90778dd8ff9 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1013,11 +1013,23 @@
   [(set (match_operand:ILASX 0 "register_operand" "=f,f")
(lshiftrt:ILASX
  (match_operand:ILASX 1 "register_operand" "f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))]
+ (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))]
   "ISA_HAS_LASX"
-  "@
-   xvsrl.\t%u0,%u1,%u2
-   xvsrli.\t%u0,%u1,%E2"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "xvsrl.\t%u0,%u1,%u2";
+    case 1:
+      {
+        unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 0));
+        operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1));
+        return "xvsrli.\t%u0,%u1,%d2";
+      }
+    default:
+      gcc_unreachable ();
+    }
+}
   [(set_attr "type" "simd_shift")
(set_attr "mode" "")])
 
@@ -1026,11 +1038,23 @@
   [(set (match_operand:ILASX 0 "register_operand" "=f,f")
(ashiftrt:ILASX
  (match_operand:ILASX 1 "register_operand" "f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))]
+ (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))]
   "ISA_HAS_LASX"
-  "@
-   xvsra.\t%u0,%u1,%u2
-   xvsrai.\t%u0,%u1,%E2"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "xvsra.\t%u0,%u1,%u2";
+    case 1:
+      {
+        unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 0));
+        operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1));
+        return "xvsrai.\t%u0,%u1,%d2";
+      }
+    default:
+      gcc_unreachable ();
+    }
+}
   [(set_attr "type" "simd_shift")
(set_attr "mode" "")])
 
@@ -1039,11 +1063,23 @@
   [(se

Re: [PATCH v4] I386: Add more testcases for unsigned SAT_ADD vector pattern

2024-11-27 Thread Uros Bizjak
On Wed, Nov 27, 2024 at 3:00 AM  wrote:
>
> From: Pan Li 
>
> Some forms like below failed to recog the SAT_ADD pattern

... failed to be recognized as a SAT_ADD pattern ...

> for target i386.  It is related to some match pattern
> extraction but get fixed after the refactor of the SAT_ADD
> pattern.  Thus, add testcases to ensure we may have similar
> issue in futrue.

... to ensure we won't have similar issues in the future.

>
>   #define DEF_SAT_ADD(T)   \
>   T sat_add_##T (T x, T y) \
>   {\
> T res; \
> res = x + y;   \
> res |= -(T)(res < x);  \
> return res;\
>   }
>
>   #define VEC_DEF_SAT_ADD(T)   \
>   void vec_sat_add(T * restrict a, T * restrict b) \
>   {\
> for (int i = 0; i < 8; i++)\
>   b[i] = sat_add_##T (a[i], b[i]); \
>   }
>
>   DEF_SAT_ADD (uint32_t)
>   VEC_DEF_SAT_ADD (uint32_t)
>
> The below test suites are passed for this patch.
> make -k check-gcc RUNTESTFLAGS="--target_board=unix\{,-m32\} 
> i386.exp=pr112600-5a-*.c"
>
> PR target/112600
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr112600-5a-u16.c: New test.
> * gcc.target/i386/pr112600-5a-u32.c: New test.
> * gcc.target/i386/pr112600-5a-u64.c: New test.
> * gcc.target/i386/pr112600-5a-u8.c: New test.
> * gcc.target/i386/pr112600-5a.h: New test.

OK, but please drop "a" suffix from new files (when I added original
pr112600 testcases, "a" suffix was for char type and "b" was for short
type, but this is not the case with your testcases).

Thanks,
Uros.
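The idiom in the quoted DEF_SAT_ADD macro is the form the middle end is expected to recognize as .SAT_ADD. A plain C restatement for two widths (function names hypothetical, mirroring the sat_add_##T pattern) shows why it saturates: if x + y wraps, the truncated sum is smaller than x, so the comparison yields 1, and negating it in the unsigned type produces an all-ones mask.

```c
#include <stdint.h>

/* Branchless unsigned saturating add, same shape as DEF_SAT_ADD.
   On wrap-around (res < x) is 1; its negation in the unsigned type is
   all ones, so OR-ing it in clamps the result to the maximum value.
   Without overflow the OR adds 0 and leaves the sum unchanged.  */
static uint8_t sat_add_u8(uint8_t x, uint8_t y)
{
    uint8_t res = (uint8_t)(x + y);
    res |= (uint8_t)-(res < x);
    return res;
}

static uint32_t sat_add_u32(uint32_t x, uint32_t y)
{
    uint32_t res = x + y;
    res |= -(uint32_t)(res < x);
    return res;
}
```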

>
> Signed-off-by: Pan Li 
> ---
>  .../gcc.target/i386/pr112600-5a-u16.c | 10 +
>  .../gcc.target/i386/pr112600-5a-u32.c |  9 
>  .../gcc.target/i386/pr112600-5a-u64.c | 10 +
>  .../gcc.target/i386/pr112600-5a-u8.c  | 10 +
>  gcc/testsuite/gcc.target/i386/pr112600-5a.h   | 22 +++
>  5 files changed, 61 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112600-5a.h
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> new file mode 100644
> index 000..f462bfa4800
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u16.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint16_t)
> +VEC_DEF_SAT_ADD (uint16_t)
> +
> +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 3 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> new file mode 100644
> index 000..5797c97ebe9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u32.c
> @@ -0,0 +1,9 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint32_t)
> +
> +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> new file mode 100644
> index 000..d5f81f72ed5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u64.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile  { target { ! ia32 } } } */
> +/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint64_t)
> +
> +
> +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
> new file mode 100644
> index 000..cb8657ecd86
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a-u8.c
> @@ -0,0 +1,10 @@
> +/* PR target/112600 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -fdump-tree-optimized" } */
> +
> +#include "pr112600-5a.h"
> +
> +DEF_SAT_ADD (uint8_t)
> +VEC_DEF_SAT_ADD (uint8_t)
> +
> +/* { dg-final { scan-tree-dump-times ".SAT_ADD " 2 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr112600-5a.h 
> b/gcc/testsuite/gcc.target/i386/pr112600-5a.h
> new file mode 100644
> index 000..482c865e953
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112600-5a.h
> @@ -0,0 +1,22 @@
> +#ifndef HAVE_DEFINED_PR112600_5A_H
> +#define HAVE_DEFINED_PR112600_5A_H
> +
> +#include 
> +
> +#define DEF_SAT_ADD(T)   \
> +T sat_add_##T (T x, T y) \
> +{

[Patch, fortran] PR117768 - [15.0 regression] ICE in diagnostic_impl (?)

2024-11-27 Thread Paul Richard Thomas
Hi All,

The "fix" for PR84674 caused this regression.

The diagnostics that I had used for PR117763 allowed me to find a much
better fix for PR84674 and so this patch reverts the chunk in resolve.cc.

The chunk in class.cc is needed because non_overridable typebound procedures
whose parent is abstract do not have the 'overridden' field set. This
caused an immediate return from 'add_proc_comp', which led to viable
typebound procedures being rejected. The fix checks the vtype component for
a specific typebound procedure that is abstract and uses this to suppress
the immediate return. I also tried not adding the initialization expression
when the specific is abstract; although that version regression-tested OK,
I decided to keep the patch as minimal as possible.

OK for mainline and, after a decent interval, to backport the chunk in
class.cc to the branches affected by PR84674?

Regards

Paul

Fortran: Fix non_overridable typebound proc problems [PR84674/117768].

2024-11-27  Paul Thomas  

gcc/fortran/ChangeLog

PR fortran/84674
* class.cc (add_proc_comp): If the component points to a tbp
that is abstract, do not return since the new version is more
likely to be usable.
PR fortran/117768
* resolve.cc (resolve_fl_derived): Remove the condition that
rejected a completely empty derived type extension.

gcc/testsuite/ChangeLog

PR fortran/117768
* gfortran.dg/pr117768.f90: New test.
diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index 59ac0d97e08..64a0e726eeb 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -884,11 +884,21 @@ static void
 add_proc_comp (gfc_symbol *vtype, const char *name, gfc_typebound_proc *tb)
 {
   gfc_component *c;
-
+  bool is_abstract = false;
 
   c = gfc_find_component (vtype, name, true, true, NULL);
 
-  if (tb->non_overridable && !tb->overridden && c)
+  /* If the present component typebound proc is abstract, the new version
+     should unconditionally be tested if it is a suitable replacement.  */
+  if (c && c->tb && c->tb->u.specific
+      && c->tb->u.specific->n.sym->attr.abstract)
+    is_abstract = true;
+
+  /* Pass on the new tb being not overridable if a component is found and
+     either there is not an overridden specific or the present component
+     tb is abstract. This ensures that possible, viable replacements are
+     loaded.  */
+  if (tb->non_overridable && !tb->overridden && !is_abstract && c)
     return;
 
   if (c == NULL)
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 0d3845f9ce3..afed8db7852 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -3229,8 +3229,8 @@ static bool check_pure_function (gfc_expr *e)
   const char *name = NULL;
   code_stack *stack;
   bool saw_block = false;
-  
-  /* A BLOCK construct within a DO CONCURRENT construct leads to 
+
+  /* A BLOCK construct within a DO CONCURRENT construct leads to
  gfc_do_concurrent_flag = 0 when the check for an impure function
  occurs.  Check the stack to see if the source code has a nested
  BLOCK construct.  */
@@ -16305,10 +16305,6 @@ resolve_fl_derived (gfc_symbol *sym)
   && sym->ns->proc_name
   && sym->ns->proc_name->attr.flavor == FL_MODULE
   && sym->attr.access != ACCESS_PRIVATE
-  && !(sym->attr.extension
-	   && sym->attr.zero_comp
-	   && !sym->f2k_derived->tb_sym_root
-	   && !sym->f2k_derived->tb_uop_root)
   && !(sym->attr.vtype || sym->attr.pdt_template))
 {
   gfc_symbol *vtab = gfc_find_derived_vtab (sym);
diff --git a/gcc/testsuite/gfortran.dg/pr117768.f90 b/gcc/testsuite/gfortran.dg/pr117768.f90
new file mode 100644
index 000..f9cf46421c1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr117768.f90
@@ -0,0 +1,76 @@
+! { dg-do compile }
+!
+! Fix a regression caused by the first patch for PR84674.
+!
+! Contributed by Juergen Reuter  
+!
+module m1
+  implicit none
+  private
+  public :: t1
+  type, abstract :: t1
+  end type t1
+end module m1
+
+module t_base
+  use m1, only: t1
+  implicit none
+  private
+  public :: t_t
+  type, abstract :: t_t
+   contains
+ procedure (t_out), deferred :: output
+  end type t_t
+
+  abstract interface
+ subroutine t_out (t, handle)
+   import
+   class(t_t), intent(inout) :: t
+   class(t1), intent(inout), optional :: handle
+ end subroutine t_out
+  end interface
+
+end module t_base
+
+
+module t_ascii
+  use m1, only: t1
+  use t_base
+  implicit none
+  private
+
+  type, abstract, extends (t_t) :: t1_t
+   contains
+ procedure :: output => t_ascii_output
+  end type t1_t
+  type, extends (t1_t) :: t2_t
+  end type t2_t
+  type, extends (t1_t) :: t3_t
+ logical :: verbose = .true.
+  end type t3_t
+
+  interface
+module subroutine t_ascii_output &
+ (t, handle)
+  class(t1_t), intent(inout) :: t
+  class(t1), intent(inout), optional :: handle
+end subroutine t_ascii_output
+  end interface
+end module t_ascii
+
+submodule (t_ascii) t_ascii_s
+  implicit none
+contains
+  module 

Re: [Patch, fortran] PR117768 - [15.0 regression] ICE in diagnostic_impl (?)

2024-11-27 Thread Paul Richard Thomas
Hi Andre,

Yes indeed, I did regtest the patch :-)

Thanks for the thumbs up.

Paul


On Wed, 27 Nov 2024 at 09:07, Andre Vehreschild  wrote:

> Hi Paul,
>
> the patch looks fine to me. I assume you have regtested it? (Because you
> don't state so.)
>
> Thanks for the patch.
>
> - Andre
>
> On Wed, 27 Nov 2024 08:57:25 +
> Paul Richard Thomas  wrote:
>
> > Hi All,
> >
> > The "fix" for PR84674 caused this regression.
> >
> > The diagnostics that I had used for PR117763 allowed me to find a much
> > better fix for PR84674 and so this patch reverts the chunk in resolve.cc.
> >
> > The chunk in class.cc works because non_overridable typebound procedures,
> > whose parent is abstract, do not have the 'overridden' field set. This
> > caused an immediate return from 'add_proc_comp' and this led to viable
> > typebound procedures being rejected. The fix checks the vtype component
> for
> > a specific typebound procedure that is abstract and uses this to suppress
> > the immediate return. I tested not adding the initialization expression
> if
> > the specific is abstract but, although this version regression tested OK,
> > decided to keep the patch as minimal as possible.
> >
> > OK for mainline and, after a decent interval, to backport the chunk in
> > class.cc to the branches affected by PR84674?
> >
> > Regards
> >
> > Paul
> >
> > Fortran: Fix non_overridable typebound proc problems [PR84674/117768].
> >
> > 2024-11-27  Paul Thomas  
> >
> > gcc/fortran/ChangeLog
> >
> > PR fortran/84674
> > * class.cc (add_proc_comp): If the component points to a tbp
> > that is abstract, do not return since the new version is more
> > likely to be usable.
> > PR fortran/117768
> > * resolve.cc (resolve_fl_derived): Remove the condition that
> > rejected a completely empty derived type extension.
> >
> > gcc/testsuite/ChangeLog
> >
> > PR fortran/117768
> > * gfortran.dg/pr117768.f90: New test.
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>


Re: [PATCH v2 1/3] c++: Fix mangling of otherwise unattached class-scope lambdas [PR107741]

2024-11-27 Thread Nathaniel Shead
On Thu, Nov 21, 2024 at 07:51:55PM +0100, Jason Merrill wrote:
> On 11/9/24 9:22 AM, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?  Given
> > that this doesn't actually fix the modules PR c++/116568 anymore I've
> > pulled my workaround for that out as a separate patch (3/3).
> 
> In general, mangling changes should depend on -fabi-version, and tests
> should verify both the old and new mangling.  Likewise for patch 2/3.
> 

OK, thanks.  Does this include C++20-only ABI-changes, such as for
unevaluated lambdas or lambdas in template arguments?  (Since my
understanding is that we currently consider C++20 to be unstable.)

If we're going to need to do this anyway I think I might wait until I
can create actual correct manglings in all cases, not just this slightly
better one (see [1] for where I got stuck last time I had a chance to
look at this).

Also FYI, due to some recent changes in life circumstances I do not
currently have much time to make contributions, so I probably won't be
able to work on this until next year.  I haven't merged patch 3/3
because it turns out that it does depend on these patches to avoid
regressions.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668298.html

> > This is a step closer to implementing the suggested changes for
> > https://github.com/itanium-cxx-abi/cxx-abi/pull/85.  Most lambdas
> > defined within a class should have an extra scope of that class so that
> > uses across different TUs are properly merged by the linker.  This also
> > needs to happen during template instantiation.
> > 
> > While I was working on this I found some other cases where the mangling
> > of lambdas was incorrect and causing issues, notably the testcase
> > lambda-ctx3.C which currently emits the same mangling for the base class
> > and member lambdas, causing mysterious assembler errors. However, this
> > doesn't fix the A::x case of the linker PR at this time so I've left
> > that as an XFAIL.
> > 
> > One notable case not handled either here or in the ABI is what is
> > supposed to happen with lambdas declared in alias templates; see
> > lambda-ctx4.C.  I believe that by the C++ standard, such lambdas should
> > also dedup across TUs, but this isn't currently implemented (for
> > class-scope or not).  I wasn't able to work out how to fix the mangling
> > logic for this case easily so I've just excluded alias templates from
> > the class-scope mangling rules in template instantiation.
> > 
> >  PR c++/107741
> > 
> > gcc/cp/ChangeLog:
> > 
> >  * cp-tree.h (LAMBDA_EXPR_EXTRA_SCOPE): Adjust comment.
> >  * parser.cc (cp_parser_class_head): Start (and do not finish)
> >  lambda scope for all valid types.
> >  (cp_parser_class_specifier): Finish lambda scope after parsing
> >  members instead.
> >  (cp_parser_member_declaration): Adjust comment to mention
> >  missing lambda scoping for static member initializers.
> >  * pt.cc (instantiate_class_template): Add lambda scoping.
> >  (instantiate_template): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/abi/lambda-ctx2.C: New test.
> > * g++.dg/abi/lambda-ctx3.C: New test.
> > * g++.dg/abi/lambda-ctx4.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/cp-tree.h   |  3 ++-
> >   gcc/cp/parser.cc   | 31 ++-
> >   gcc/cp/pt.cc   | 14 ++-
> >   gcc/testsuite/g++.dg/abi/lambda-ctx2.C | 34 ++
> >   gcc/testsuite/g++.dg/abi/lambda-ctx3.C | 21 
> >   gcc/testsuite/g++.dg/abi/lambda-ctx4.C | 22 +
> >   6 files changed, 111 insertions(+), 14 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2.C
> >   create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx3.C
> >   create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx4.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index f98a1de42ca..f6cf1754d86 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -1513,7 +1513,8 @@ enum cp_lambda_default_capture_mode_type {
> > (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->locus)
> >   /* The mangling scope for the lambda: FUNCTION_DECL, PARM_DECL, VAR_DECL,
> > -   FIELD_DECL or NULL_TREE.  If this is NULL_TREE, we have no linkage.  */
> > +   FIELD_DECL, TYPE_DECL, or NULL_TREE.  If this is NULL_TREE, we have no
> > +   linkage.  */
> >   #define LAMBDA_EXPR_EXTRA_SCOPE(NODE) \
> > (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->extra_scope)
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index c1375ecdbb5..7f22384d8a7 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -27107,6 +27107,8 @@ cp_parser_class_specifier (cp_parser* parser)
> > if (!braces.require_open (parser))
> >   {
> > pop_deferring_access_checks ();

Re: [PATCH] libstdc++: module std fixes

2024-11-27 Thread Jonathan Wakely
On Wed, 27 Nov 2024, 04:56 Jason Merrill,  wrote:

> Tested x86_64-pc-linux-gnu, OK for trunk?
>

OK, thanks.

I'll make the change to use hidden friends for the __normal_iterator ops.



> -- 8< --
>
> Some tests were failing due to the exported using declaration of iter_move
> conflicting with friend declarations; the exported using needs to be in the
> inline namespace, like the customization point itself, rather than
> std::ranges.
>
> Also add a few missing exports.
>
> Some tests failed to find some operators defined in implementation-detail
> namespaces; this exports them as well, but as previously discussed it's
> probably preferable to make those operators friends so ADL can find them
> that way.
>
> libstdc++-v3/ChangeLog:
>
> * src/c++23/std.cc.in: Fix iter_move/swap.  Add fold_left_first, to,
> concat, and some operators.
> ---
>  libstdc++-v3/src/c++23/std.cc.in | 64 ++--
>  1 file changed, 36 insertions(+), 28 deletions(-)
>
> diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
> index d225c8b8c85..7d787a5 100644
> --- a/libstdc++-v3/src/c++23/std.cc.in
> +++ b/libstdc++-v3/src/c++23/std.cc.in
> @@ -494,8 +494,13 @@ export namespace std
>  #endif
>  #if __cpp_lib_ranges_fold
>  using ranges::fold_left;
> +using ranges::fold_left_first;
> +using ranges::fold_left_first_with_iter;
>  using ranges::fold_left_with_iter;
>  using ranges::fold_right;
> +using ranges::fold_right_last;
> +using ranges::in_value_result;
> +using ranges::out_value_result;
>  #endif
>  #if __cpp_lib_ranges_find_last
>  using ranges::find_last;
> @@ -1572,10 +1577,14 @@ export namespace std
>using std::iter_reference_t;
>using std::iter_value_t;
>using std::iterator_traits;
> -  namespace ranges
> +  // _Cpo is an implementation detail we can't avoid exposing; if we do the
> +  // using in ranges directly, it conflicts with any friend functions of the
> +  // same name, which is why the customization points are in an inline
> +  // namespace in the first place.
> +  namespace ranges::inline _Cpo
>{
> -using std::ranges::iter_move;
> -using std::ranges::iter_swap;
> +using _Cpo::iter_move;
> +using _Cpo::iter_swap;
>}
>using std::advance;
>using std::bidirectional_iterator;
> @@ -1679,6 +1688,15 @@ export namespace std
>using std::make_const_sentinel;
>  #endif
>  }
> +// FIXME these should be friends of __normal_iterator to avoid exporting
> +// __gnu_cxx.
> +export namespace __gnu_cxx
> +{
> +  using __gnu_cxx::operator==;
> +  using __gnu_cxx::operator<=>;
> +  using __gnu_cxx::operator+;
> +  using __gnu_cxx::operator-;
> +}
>
>  // 
>  export namespace std
> @@ -2278,43 +2296,32 @@ export namespace std
>namespace views = ranges::views;
>using std::tuple_element;
>using std::tuple_size;
> -#if __glibcxx_ranges_as_const // >= C++23
>namespace ranges
>{
> +#if __glibcxx_ranges_as_const // >= C++23
>  using ranges::constant_range;
>  using ranges::const_iterator_t;
>  using ranges::const_sentinel_t;
>  using ranges::range_const_reference_t;
>  using ranges::as_const_view;
>  namespace views { using views::as_const; }
> -  }
>  #endif
>  #ifdef __glibcxx_generator  // C++ >= 23 && __glibcxx_coroutine
> -  namespace ranges
> -  {
>  using ranges::elements_of;
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_as_rvalue // C++ >= 23
> -  namespace ranges {
>  using ranges::as_rvalue_view;
>  namespace views { using views::as_rvalue; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_chunk // C++ >= 23
> -  namespace ranges {
>  using ranges::chunk_view;
>  namespace views { using views::chunk; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_slide // C++ >= 23
> -  namespace ranges {
>  using ranges::slide_view;
>  namespace views { using views::slide; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_zip // C++ >= 23
> -  namespace ranges {
>  using ranges::zip_view;
>  using ranges::zip_transform_view;
>  using ranges::adjacent_view;
> @@ -2327,44 +2334,45 @@ export namespace std
>using views::pairwise;
>using views::pairwise_transform;
>  }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_chunk_by // C++ >= 23
> -  namespace ranges {
>  using ranges::chunk_by_view;
>  namespace views { using views::chunk_by; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_join_with // C++ >= 23
> -  namespace ranges {
>  using ranges::join_with_view;
>  namespace views { using views::join_with; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_repeat // C++ >= 23
> -  namespace ranges {
>  using ranges::repeat_view;
>  namespace views { using views::repeat; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_stride // C++ >= 23
> -  namespace ranges {
>  using ranges::stride_view;
>  namespace views { using views::stride; }
> -  }
>  #endif
>  #ifdef __cpp_lib_ranges_carte

Backport two LRA patches to gcc-14 branch

2024-11-27 Thread Uros Bizjak
Hello!

I'd like to backport two LRA patches to gcc-14 branch:

1. [PR114942][LRA]: Don't reuse input reload reg of inout early clobber operand
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=9585317f0715699197b1313bbf939c6ea3c1ace6

2. [PR117105][LRA]: Use unique value reload pseudo for early clobber operand
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4b09e2c67ef593db171b0755b46378964421782b

They both fix RA failure with strict_low_part family of instructions:

(insn 24 55 54 4 (parallel [
(set (strict_low_part (reg:QI 2 cx [orig:109 e ] [109]))
(and:QI (subreg:QI (zero_extract:HI (reg/v:HI 2 cx
[orig:109 e ] [109])
(const_int 8 [0x8])
(const_int 8 [0x8])) 0)
(reg:QI 1 dx [orig:115 _6 ] [115])))
(clobber (reg:CC 17 flags))

that were added by me for PR target/78904, so I have some interest in
the backport.

The backport of two patches was bootstrapped and regression tested
with the current gcc-14 branch.

Is the backport OK for branch?

Thanks,
Uros.


Re: [Patch, fortran] PR117768 - [15.0 regression] ICE in diagnostic_impl (?)

2024-11-27 Thread Andre Vehreschild
Hi Paul,

the patch looks fine to me. I assume you have regtested it? (Because you don't
state so.)

Thanks for the patch.

- Andre

On Wed, 27 Nov 2024 08:57:25 +
Paul Richard Thomas  wrote:

> Hi All,
>
> The "fix" for PR84674 caused this regression.
>
> The diagnostics that I had used for PR117763 allowed me to find a much
> better fix for PR84674 and so this patch reverts the chunk in resolve.cc.
>
> The chunk in class.cc works because non_overridable typebound procedures,
> whose parent is abstract, do not have the 'overridden' field set. This
> caused an immediate return from 'add_proc_comp' and this led to viable
> typebound procedures being rejected. The fix checks the vtype component for
> a specific typebound procedure that is abstract and uses this to suppress
> the immediate return. I tested not adding the initialization expression if
> the specific is abstract but, although this version regression tested OK,
> decided to keep the patch as minimal as possible.
>
> OK for mainline and, after a decent interval, to backport the chunk in
> class.cc to the branches affected by PR84674?
>
> Regards
>
> Paul
>
> Fortran: Fix non_overridable typebound proc problems [PR84674/117768].
>
> 2024-11-27  Paul Thomas  
>
> gcc/fortran/ChangeLog
>
> PR fortran/84674
> * class.cc (add_proc_comp): If the component points to a tbp
> that is abstract, do not return since the new version is more
> likely to be usable.
> PR fortran/117768
> * resolve.cc (resolve_fl_derived): Remove the condition that
> rejected a completely empty derived type extension.
>
> gcc/testsuite/ChangeLog
>
> PR fortran/117768
> * gfortran.dg/pr117768.f90: New test.


--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH v2] fold fold_truth_andor field merging into ifcombine

2024-11-27 Thread Alexandre Oliva
On Nov 22, 2024, Alexandre Oliva  wrote:

> - Rework BIT_XOR handling to avoid having to match patterns again.

I goofed here, and only an -O3 profiling bootstrap of gcc-14 caught it.
Please consider this incremental patchlet as if included as part of the
previous one.  Bootstrapped on top of the other on x86_64-linux-gnu.

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 731f6ccd5597e..d0caabd8a4b4d 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -7486,6 +7486,10 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
  exp = res_ops[1];
  gcc_checking_assert (!xor_cmp_op);
}
+  else if (!xor_cmp_op)
+   /* Not much we can do when xor appears in the right-hand compare
+  operand.  */
+   return NULL_TREE;
   else
{
  *xor_p = true;


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH] c: Fix sizeof error recovery [PR117745]

2024-11-27 Thread Jakub Jelinek
Hi!

Compilation of the following testcase hangs forever after emitting the first
error.  The problem is that in one place we just return error_mark_node
directly rather than going through c_expr_sizeof_expr or c_expr_sizeof_type.
The parsing of the expression could have called record_maybe_used_decl,
but nothing then calls pop_maybe_used, which needs to be called after
parsing of every sizeof/typeof, successful or not.
At the end of the toplevel declaration we free the parser_obstack, and when
record_maybe_used_decl is called again in another function, the missing
pop_maybe_used call leaves us with a cycle in the chain.

The following patch fixes it by just setting the error and jumping to the
sizeof_expr label:
  c_inhibit_evaluation_warnings--;
  in_sizeof--;
  mark_exp_read (expr.value);
  if (TREE_CODE (expr.value) == COMPONENT_REF
  && DECL_C_BIT_FIELD (TREE_OPERAND (expr.value, 1)))
error_at (expr_loc, "% applied to a bit-field");
  result = c_expr_sizeof_expr (expr_loc, expr);
where c_expr_sizeof_expr will do:
  struct c_expr ret;
  if (expr.value == error_mark_node)
{
  ret.value = error_mark_node;
  ret.original_code = ERROR_MARK;
  ret.original_type = NULL;
  ret.m_decimal = 0;
  pop_maybe_used (false);
}
...
  return ret;
which is exactly what the old code did manually except for the missing
pop_maybe_used call.  mark_exp_read does nothing on error_mark_node and
error_mark_node doesn't have COMPONENT_REF tree_code.
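
The invariant at stake is that every push of the maybe-used state must be
balanced by a pop on every exit path, error paths included.  A minimal
sketch of that shape, using a hypothetical depth counter and hypothetical
names (push_state, pop_state, parse_sizeof are illustrative, not GCC
functions): the error path sets its result and jumps to the shared exit
code instead of returning early, so the depth always returns to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parser's maybe-used stack; only its
   depth matters for this illustration.  */
static int state_depth;

static void
push_state (void)
{
  state_depth++;
}

static void
pop_state (void)
{
  /* Popping more than was pushed would indicate a bug.  */
  assert (state_depth > 0);
  state_depth--;
}

/* A sizeof-like parse routine.  PARSE_OK says whether the operand parsed;
   the return value says whether a valid result was produced.  Both paths
   leave through the same exit code, so the push is always popped.  */
static bool
parse_sizeof (bool parse_ok)
{
  bool result = false;

  push_state ();
  if (!parse_ok)
    /* Error recovery: keep the error result, then share the normal exit
       path instead of returning early.  */
    goto done;

  result = true;

 done:
  pop_state ();
  return result;
}
```

An early "return false;" in the error branch would skip pop_state and
leave state_depth at 1, which is the analogue of the missing
pop_maybe_used call.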

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-27  Jakub Jelinek  

PR c/117745
* c-parser.cc (c_parser_sizeof_expression): If type_name is NULL,
just expr.set_error () and goto sizeof_expr instead of doing error
recovery manually.

* gcc.dg/pr117745.c: New test.

--- gcc/c/c-parser.cc.jj	2024-11-23 13:00:28.328028242 +0100
+++ gcc/c/c-parser.cc   2024-11-26 11:59:18.036410746 +0100
@@ -10405,13 +10405,8 @@ c_parser_sizeof_expression (c_parser *pa
   finish = parser->tokens_buf[0].location;
   if (type_name == NULL)
{
- struct c_expr ret;
- c_inhibit_evaluation_warnings--;
- in_sizeof--;
- ret.set_error ();
- ret.original_code = ERROR_MARK;
- ret.original_type = NULL;
- return ret;
+ expr.set_error ();
+ goto sizeof_expr;
}
   if (c_parser_next_token_is (parser, CPP_OPEN_BRACE))
{
--- gcc/testsuite/gcc.dg/pr117745.c.jj  2024-11-26 12:07:11.120756946 +0100
+++ gcc/testsuite/gcc.dg/pr117745.c 2024-11-26 12:08:00.031068850 +0100
@@ -0,0 +1,8 @@
+/* PR c/117745 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+static int foo (void);
+void bar (void) { sizeof (int [0 ? foo () : 1); }  /* { dg-error "expected" } */
+static int baz (void);
+void qux (void) { sizeof (sizeof (int[baz ()])); }

Jakub



Re: [Patch, fortran] PR117768 - [15.0 regression] ICE in diagnostic_impl (?)

2024-11-27 Thread Paul Richard Thomas
Pushed as r15-5716.

Paul


On Wed, 27 Nov 2024 at 09:15, Paul Richard Thomas <
paul.richard.tho...@gmail.com> wrote:

> Hi Andre,
>
> Yes indeed, I did regtest the patch :-)
>
> Thanks for the thumbs up.
>
> Paul
>
>
> On Wed, 27 Nov 2024 at 09:07, Andre Vehreschild  wrote:
>
>> Hi Paul,
>>
>> the patch looks fine to me. I assume you have regtested it? (Because you
>> don't
>> state so.)
>>
>> Thanks for the patch.
>>
>> - Andre
>>
>> On Wed, 27 Nov 2024 08:57:25 +
>> Paul Richard Thomas  wrote:
>>
>> > Hi All,
>> >
>> > The "fix" for PR84674 caused this regression.
>> >
>> > The diagnostics that I had used for PR117763 allowed me to find a much
>> > better fix for PR84674 and so this patch reverts the chunk in
>> resolve.cc.
>> >
>> > The chunk in class.cc works because non_overridable typebound
>> procedures,
>> > whose parent is abstract, do not have the 'overridden' field set. This
>> > caused an immediate return from 'add_proc_comp' and this led to viable
>> > typebound procedures being rejected. The fix checks the vtype component
>> for
>> > a specific typebound procedure that is abstract and uses this to
>> suppress
>> > the immediate return. I tested not adding the initialization expression
>> if
>> > the specific is abstract but, although this version regression tested
>> OK,
>> > decided to keep the patch as minimal as possible.
>> >
>> > OK for mainline and, after a decent interval, to backport the chunk in
>> > class.cc to the branches affected by PR84674?
>> >
>> > Regards
>> >
>> > Paul
>> >
>> > Fortran: Fix non_overridable typebound proc problems [PR84674/117768].
>> >
>> > 2024-11-27  Paul Thomas  
>> >
>> > gcc/fortran/ChangeLog
>> >
>> > PR fortran/84674
>> > * class.cc (add_proc_comp): If the component points to a tbp
>> > that is abstract, do not return since the new version is more
>> > likely to be usable.
>> > PR fortran/117768
>> > * resolve.cc (resolve_fl_derived): Remove the condition that
>> > rejected a completely empty derived type extension.
>> >
>> > gcc/testsuite/ChangeLog
>> >
>> > PR fortran/117768
>> > * gfortran.dg/pr117768.f90: New test.
>>
>>
>> --
>> Andre Vehreschild * Email: vehre ad gmx dot de
>>
>


[PATCH] match.pd: Avoid introducing UB in the ((X /[ex] C1) +- C2) * (C1 * C3) simplification [PR117692]

2024-11-27 Thread Jakub Jelinek
Hi!

As the pr117692.c testcase shows, the generalized pattern can introduce
UB when there wasn't any.
The old pattern was, I believe, correct: it is as if C3 were always 1 in
the new pattern, and I don't see how that could have introduced UB.
But if type is signed, C3 (aka factor) isn't 1, and X and C2 could have
different signs for + (or the same sign for -), then doing the
addition/subtraction first could decrease the absolute value, while
multiplying by C3 first could already invoke UB during that
multiplication.
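
A worked instance, assuming 32-bit int and values chosen only for
illustration (C1 = 2, C2 = 1, C3 = 2, so C1 * C2 * C3 = 4 does not
overflow): X is negative and C2 positive, i.e. different signs for +.
The sketch evaluates both forms in 64-bit arithmetic and compares against
the int range, so it never executes the overflow itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* True if V is representable in a 32-bit int.  */
static bool
fits_int (int64_t v)
{
  return v >= INT_MIN && v <= INT_MAX;
}

/* Original form ((X /[ex] C1) + C2) * (C1 * C3), evaluated in 64 bits.
   The division is exact, so X must be a multiple of C1.  */
static int64_t
original_form (int64_t x, int64_t c1, int64_t c2, int64_t c3)
{
  assert (x % c1 == 0);
  return (x / c1 + c2) * (c1 * c3);
}

/* First step of the rewritten form (X * C3) + (C1 * C2 * C3).  */
static int64_t
rewritten_first_step (int64_t x, int64_t c3)
{
  return x * c3;
}
```

With X = -1073741826, original_form returns -2147483648 (INT_MIN), which
fits in int, while rewritten_first_step returns -2147483652, which does
not; in signed 32-bit arithmetic that multiplication would be UB even
though adding 4 afterwards brings the value back to INT_MIN.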

The following patch fixes it by going through the casts to utype if
ranger (get_range_pos_neg) detects the sign compared to sign of C2
(INTEGER_CST) could be the same or could be different depending on op
because then the absolute value will not increase.

Another possibility (perhaps as an additional check if this one doesn't succeed)
would be to test whether X * C3 could actually overflow.
vr-values.cc has check_for_binary_op_overflow (currently not exported)
which I think does what we'd need to check, if it returns true and sets
ovf to false.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-27  Jakub Jelinek  

PR tree-optimization/117692
* tree.cc (get_range_pos_neg): Adjust function comment, use
non-negative instead of positive.
* match.pd
(((X /[ex] C1) +- C2) * (C1 * C3) -> (X * C3) +- (C1 * C2 * C3)):
Use casts to utype if type is signed, factor isn't 1 and
C1 and C2 could have different sign for + or could have the
same sign for -.

* gcc.dg/tree-ssa/mulexactdiv-5.c: Expect 8 nop_exprs.
* gcc.dg/tree-ssa/pr117692.c: New test.

--- gcc/tree.cc.jj  2024-11-23 13:00:31.615980802 +0100
+++ gcc/tree.cc 2024-11-26 13:50:45.910803860 +0100
@@ -14510,8 +14510,8 @@ verify_type (const_tree t)
 
 
 /* Return 1 if ARG interpreted as signed in its precision is known to be
-   always positive or 2 if ARG is known to be always negative, or 3 if
-   ARG may be positive or negative.  */
+   always non-negative or 2 if ARG is known to be always negative, or 3 if
+   ARG may be non-negative or negative.  */
 
 int
 get_range_pos_neg (tree arg)
--- gcc/match.pd.jj 2024-11-26 09:37:32.563906412 +0100
+++ gcc/match.pd	2024-11-26 18:36:33.037742888 +0100
@@ -5577,7 +5577,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  && TREE_CODE (@3) == INTEGER_CST
  && (mul = wi::mul (wi::to_wide (@2), wi::to_wide (@3),
 TYPE_SIGN (type), &overflow),
- !overflow))
+ !overflow)
+ && (TYPE_UNSIGNED (type)
+ /* Not using unsigned arithmetics is unsafe if factor
+isn't 1 and if for op plus @0 and @2 could have different
+sign or for op minus @0 and @2 could have the same sign.  */
+ || known_eq (factor, 1)
+ || (get_range_pos_neg (@0)
+ | (((op == PLUS_EXPR) ^ (tree_int_cst_sgn (@2) < 0))
+? 1 : 2)) != 3))
   (op (mult @0 { wide_int_to_tree (type, factor); })
  { wide_int_to_tree (type, mul); })
   (with { tree utype = unsigned_type_for (type); }
--- gcc/testsuite/gcc.dg/tree-ssa/mulexactdiv-5.c.jj	2024-10-24 18:53:41.438042287 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/mulexactdiv-5.c	2024-11-26 14:25:07.507313308 +0100
@@ -18,7 +18,7 @@ TEST_CMP (f4, 8, 4, 200)
 
 /* { dg-final { scan-tree-dump-not {<[a-z]*_div_expr,} "optimized" } } */
 /* { dg-final { scan-tree-dump-not {

[PATCH] builtins: Emit __sync_lock_release_{8,16} call as last resort instead of doing nothing [PR117642]

2024-11-27 Thread Jakub Jelinek
Hi!

As the following testcases show, in the case of a multi-word
__sync_lock_test_and_set where we don't actually support atomics for that
size (__int128 for x86_64 lp64 with -mno-cx16, long long for ia32 with
-march=i{3,4}86), the last fallback, when we don't know anything else, is
to emit calls to __sync_lock_test_and_set_{8,16}.  Those aren't defined
in libatomic, but perhaps users could define them themselves.
__sync_lock_release, on the other hand, just does nothing at all if it
gives up and has no way to emit the atomic store, so users get no clear
sign that something went wrong and that the code will not do what they
expected.
This regressed when __atomic_* support was introduced; previously we
would just emit those calls even in this case.

The patch just emits the call as the last fallback.
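
For context, these two builtins are normally paired as a simple spinlock.
A minimal sketch (spin_lock, spin_unlock and lock_word are illustrative
names) using an int lock word, for which GCC normally emits inline atomics
on x86, so it does not hit the multi-word fallback discussed above:

```c
#include <assert.h>

/* Lock word: 0 = free, 1 = held.  */
static volatile int lock_word;

static void
spin_lock (volatile int *lock)
{
  /* __sync_lock_test_and_set atomically stores 1 and returns the previous
     value, with acquire semantics; retry until the lock was free.  */
  while (__sync_lock_test_and_set (lock, 1))
    ;
}

static void
spin_unlock (volatile int *lock)
{
  /* Sanity check for this sketch: only release a held lock.  */
  assert (*lock == 1);
  /* __sync_lock_release normally stores 0, with release semantics.  */
  __sync_lock_release (lock);
}
```

If the release silently expanded to nothing, as in the multi-word case
before this patch, the lock word would stay at 1 and the next spin_lock
would spin forever.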

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-27  Jakub Jelinek  

PR target/117642
* builtins.cc (expand_builtin_sync_lock_release): Change return type
from void to rtx, return result of expand_atomic_store.
(expand_builtin) : If
expand_builtin_sync_lock_release returns NULL, do a break rather
than return const0_rtx.

* gcc.target/i386/pr117642-1.c: New test.
* gcc.target/i386/pr117642-2.c: New test.

--- gcc/builtins.cc.jj  2024-11-26 09:46:39.513228477 +0100
+++ gcc/builtins.cc 2024-11-26 15:07:16.236123486 +0100
@@ -6587,7 +6587,7 @@ expand_builtin_sync_lock_test_and_set (m
 
 /* Expand the __sync_lock_release intrinsic.  EXP is the CALL_EXPR.  */
 
-static void
+static rtx
 expand_builtin_sync_lock_release (machine_mode mode, tree exp)
 {
   rtx mem;
@@ -6595,7 +6595,7 @@ expand_builtin_sync_lock_release (machin
   /* Expand the operands.  */
   mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
 
-  expand_atomic_store (mem, const0_rtx, MEMMODEL_SYNC_RELEASE, true);
+  return expand_atomic_store (mem, const0_rtx, MEMMODEL_SYNC_RELEASE, true);
 }
 
 /* Given an integer representing an ``enum memmodel'', verify its
@@ -8605,8 +8605,9 @@ expand_builtin (tree exp, rtx target, rt
 case BUILT_IN_SYNC_LOCK_RELEASE_8:
 case BUILT_IN_SYNC_LOCK_RELEASE_16:
   mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_LOCK_RELEASE_1);
-  expand_builtin_sync_lock_release (mode, exp);
-  return const0_rtx;
+  if (expand_builtin_sync_lock_release (mode, exp))
+   return const0_rtx;
+  break;
 
 case BUILT_IN_SYNC_SYNCHRONIZE:
   expand_builtin_sync_synchronize ();
--- gcc/testsuite/gcc.target/i386/pr117642-1.c.jj	2024-11-26 15:18:07.270066467 +0100
+++ gcc/testsuite/gcc.target/i386/pr117642-1.c	2024-11-26 15:19:20.467048094 +0100
@@ -0,0 +1,19 @@
+/* PR target/117642 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-mno-cx16" } */
+/* { dg-final { scan-assembler "__sync_lock_test_and_set_16" } } */
+/* { dg-final { scan-assembler "__sync_lock_release_16" } } */
+
+__int128 t = 1;
+
+void
+foo (void)
+{
+  __sync_lock_test_and_set (&t, 1);
+}
+
+void
+bar (void)
+{
+  __sync_lock_release (&t);
+}
--- gcc/testsuite/gcc.target/i386/pr117642-2.c.jj	2024-11-26 15:19:33.174871291 +0100
+++ gcc/testsuite/gcc.target/i386/pr117642-2.c	2024-11-26 15:20:04.092441144 +0100
@@ -0,0 +1,19 @@
+/* PR target/117642 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-march=i486" } */
+/* { dg-final { scan-assembler "__sync_lock_test_and_set_8" } } */
+/* { dg-final { scan-assembler "__sync_lock_release_8" } } */
+
+long long t = 1;
+
+void
+foo (void)
+{
+  __sync_lock_test_and_set (&t, 1);
+}
+
+void
+bar (void)
+{
+  __sync_lock_release (&t);
+}

Jakub



Re: Backport two LRA patches to gcc-14 branch

2024-11-27 Thread Richard Biener
On Wed, 27 Nov 2024, Uros Bizjak wrote:

> Hello!
> 
> I'd like to backport two LRA patches to gcc-14 branch:
> 
> 1. [PR114942][LRA]: Don't reuse input reload reg of inout early clobber operand
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=9585317f0715699197b1313bbf939c6ea3c1ace6
> 
> 2. [PR117105][LRA]: Use unique value reload pseudo for early clobber operand
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4b09e2c67ef593db171b0755b46378964421782b
> 
> They both fix RA failure with strict_low_part family of instructions:
> 
> (insn 24 55 54 4 (parallel [
> (set (strict_low_part (reg:QI 2 cx [orig:109 e ] [109]))
> (and:QI (subreg:QI (zero_extract:HI (reg/v:HI 2 cx
> [orig:109 e ] [109])
> (const_int 8 [0x8])
> (const_int 8 [0x8])) 0)
> (reg:QI 1 dx [orig:115 _6 ] [115])))
> (clobber (reg:CC 17 flags))
> 
> that were added by me for PR target/78904, so I have some interest in
> the backport.
> 
> The backport of two patches was bootstrapped and regression tested
> with the current gcc-14 branch.
> 
> Is the backport OK for branch?

OK from my side, please leave others time to object.

Richard.


Re: [PATCH] aarch64: Extend SVE2 bit-select instructions for Neon modes.

2024-11-27 Thread Richard Sandiford
Soumya AR  writes:
> NBSL, BSL1N, and BSL2N are bit-select instructions on SVE2 with certain
> operands inverted. These can be extended to work with Neon modes.
>
> Since these instructions are unpredicated, duplicate patterns were added with
> the predicate removed to generate these instructions for Neon modes.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Soumya AR 
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve2.md
>   (*aarch64_sve2_nbsl_unpred): New pattern to match unpredicated
>   form.
>   (*aarch64_sve2_bsl1n_unpred): Likewise.
>   (*aarch64_sve2_bsl2n_unpred): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/bitsel.c: New test.

Thanks for the patch.  But since this is a new optimisation, and is not
fixing a regression, I'm not sure whether it would be appropriate during
stage 3.  Let's see what other maintainers say.

Richard
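
For readers less familiar with these instructions, a scalar model of the
per-bit semantics on one 64-bit lane may help.  This sketch follows Arm's
descriptions (the selector picks the first input where its bit is 1 and
the second input where it is 0); the function names are illustrative and
are not the patterns or intrinsics added by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* BSL: bits of K select A (K bit set) or B (K bit clear).  */
static uint64_t
bsl (uint64_t a, uint64_t b, uint64_t k)
{
  return (a & k) | (b & ~k);
}

/* NBSL: bitwise select with the result inverted.  */
static uint64_t
nbsl (uint64_t a, uint64_t b, uint64_t k)
{
  return ~bsl (a, b, k);
}

/* BSL1N: bitwise select with the first input inverted.  */
static uint64_t
bsl1n (uint64_t a, uint64_t b, uint64_t k)
{
  return bsl (~a, b, k);
}

/* BSL2N: bitwise select with the second input inverted.  */
static uint64_t
bsl2n (uint64_t a, uint64_t b, uint64_t k)
{
  return bsl (a, ~b, k);
}
```

Without these instructions, each variant needs an extra NOT (or an
equivalent) around a plain BSL on Neon, which is the saving the patch
is after.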



Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

2024-11-27 Thread Thomas Schwinge
Hi Tobias!

On 2024-11-18T14:23:24+0100, Tobias Burnus  wrote:
> This fixes a C23 error that causes a build failure: 'false'
> should have been 'NULL'.

ACK.

> The NULL value is not really handled as the code calling
> maybe_init_omp_async assumes that agent->omp_async_queue can be 
> dereferenced. Hence, besides fixing the false/NULL issue, it switches to 
> a 'fatal' error. Comments before I commit it after lunch?

(Please don't bundle up unrelated changes...)

That's a bug in 'libgomp/plugin/plugin-gcn.c:maybe_init_omp_async' (or
its users); the real user of 'GOMP_OFFLOAD_openacc_async_exec' does
handle the error condition:

libgomp/oacc-async.c-  if (!dev->openacc.async.asyncqueue[async])
libgomp/oacc-async.c-{
libgomp/oacc-async.c-  dev->openacc.async.asyncqueue[async]
libgomp/oacc-async.c:   = dev->openacc.async.construct_func (dev->target_id);
libgomp/oacc-async.c-
libgomp/oacc-async.c-  if (!dev->openacc.async.asyncqueue[async])
libgomp/oacc-async.c-   {
libgomp/oacc-async.c- gomp_mutex_unlock (&dev->openacc.async.lock);
libgomp/oacc-async.c- gomp_fatal ("async %d creation failed", async);
libgomp/oacc-async.c-   }

..., but needs to 'gomp_mutex_unlock' before 'gomp_fatal', which your
change now circumvents.  Therefore, please revert these 's%error%fatal'
changes, and instead fix up the libgomp GCN plugin-internal usage of
'GOMP_OFFLOAD_openacc_async_construct'.
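
A minimal sketch of the shape being asked for, with hypothetical names
(struct queue, queue_construct, queue_lookup, table_lock) standing in for
the plugin and libgomp structures: the constructor reports failure by
returning NULL after tearing down whatever it already initialized, and
the caller releases its own lock before treating NULL as fatal.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct queue
{
  pthread_mutex_t mutex;
  pthread_cond_t cond_in;
  pthread_cond_t cond_out;
};

/* Return a new queue, or NULL on failure; partially initialized state is
   destroyed before returning NULL.  */
static struct queue *
queue_construct (void)
{
  struct queue *q = malloc (sizeof *q);
  if (!q)
    return NULL;
  if (pthread_mutex_init (&q->mutex, NULL))
    goto fail_mutex;
  if (pthread_cond_init (&q->cond_in, NULL))
    goto fail_cond_in;
  if (pthread_cond_init (&q->cond_out, NULL))
    goto fail_cond_out;
  return q;

 fail_cond_out:
  pthread_cond_destroy (&q->cond_in);
 fail_cond_in:
  pthread_mutex_destroy (&q->mutex);
 fail_mutex:
  free (q);
  return NULL;
}

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller side, modelled on the oacc-async.c excerpt: create on first use,
   and unlock before reporting a fatal error.  */
static struct queue *
queue_lookup (struct queue **slot)
{
  pthread_mutex_lock (&table_lock);
  if (!*slot)
    {
      *slot = queue_construct ();
      if (!*slot)
        {
          pthread_mutex_unlock (&table_lock);
          fprintf (stderr, "queue creation failed\n");
          exit (EXIT_FAILURE);
        }
    }
  struct queue *q = *slot;
  pthread_mutex_unlock (&table_lock);
  return q;
}
```

Keeping the fatal error in the caller lets it drop the lock first, which
a fatal error raised inside the constructor cannot do.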


Grüße
 Thomas


> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -4388,7 +4388,8 @@ GOMP_OFFLOAD_openacc_async_exec (void (*fn_ptr) (void *),
>gcn_exec (kernel, devaddrs, dims, targ_mem_desc, true, aq);
>  }
>  
> -/* Create a new asynchronous thread and queue for running future kernels.  */
> +/* Create a new asynchronous thread and queue for running future kernels;
> +   fails with a fatal error as all callers expect the queue to exist.  */
>  
>  struct goacc_asyncqueue *
>  GOMP_OFFLOAD_openacc_async_construct (int device)
> @@ -4416,18 +4417,18 @@ GOMP_OFFLOAD_openacc_async_construct (int device)
>  
>if (pthread_mutex_init (&aq->mutex, NULL))
>  {
> -  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue mutex");
> -  return false;
> +  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue mutex");
> +  return NULL;
>  }
>if (pthread_cond_init (&aq->queue_cond_in, NULL))
>  {
> -  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
> -  return false;
> +  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> +  return NULL;
>  }
>if (pthread_cond_init (&aq->queue_cond_out, NULL))
>  {
> -  GOMP_PLUGIN_error ("Failed to initialize a GCN agent queue cond");
> -  return false;
> +  GOMP_PLUGIN_fatal ("Failed to initialize a GCN agent queue cond");
> +  return NULL;
>  }
>  
>hsa_status_t status = hsa_fns.hsa_queue_create_fn (agent->id,


[PATCH] __builtin_prefetch fixes [PR117608]

2024-11-27 Thread Jakub Jelinek
Hi!

The r15-4833-ge9ab41b79933 patch had among tons of config/i386
specific changes also important change to the generic code, allowing
also 2 as valid value of the second argument of __builtin_prefetch:
-  /* Argument 1 must be either zero or one.  */
-  if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
+  /* Argument 1 must be 0, 1 or 2.  */
+  if (INTVAL (op1) < 0 || INTVAL (op1) > 2)

But the patch failed to document that change in __builtin_prefetch
documentation, and more importantly didn't adjust any of the other
backends to deal with it (my understanding is the expected behavior
is that 2 will be silently handled as 0 unless backends have some
more specific way).  Some of the backends would ICE on it, in some
cases gcc_assert failures/gcc_unreachable, in other cases crash later
(e.g. accessing arrays with that value as index and due to accessing
garbage after the array crashing at final.cc time), others treated 2
silently as 0, others treated 2 silently as 1.

And even in the i386 backend there were bugs which caused ICEs.
The patch added some if (write == 0) and write 2 handling into
a (badly indented, maybe that is the reason, if (write == 1) body),
rather than into the else side, so it would be always false.

The new *prefetch_rst2 define_insn only accepts parameters 2 1
(i.e. read-shared with moderate degree of locality), so in order
not to ICE the patch uses it only for __builtin_prefetch (ptr, 2, 1);
or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values
of the parameter.  If that isn't what we want and we want it to be used
also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and
corresponding __builtin_ia32_prefetch, maybe the define_insn could match
other values.
And there was another problem that -mno-mmx -mno-sse -mmovrs compilation
would ICE on most of the prefetches, so I had to add the FAIL; cases.
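
For reference, a typical use of the builtin in a streaming loop.  This
sketch sticks to rw values 0 and 1, which every GCC release accepts;
rw = 2 (shared read) needs a compiler with the change discussed here and,
per the description above, is expected to behave like 0 on targets
without a more specific instruction.  The prefetch distance of 16
elements is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; tune per target.  */
#define PREFETCH_DIST 16

/* Sum A[0..N-1], prefetching ahead for reading (rw = 0) with high
   temporal locality (locality = 3).  */
static long
sum_with_prefetch (const long *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Stay inside the array so the address computation is valid.  */
      if (i + PREFETCH_DIST < n)
        __builtin_prefetch (&a[i + PREFETCH_DIST], 0, 3);
      sum += a[i];
    }
  return sum;
}
```

The prefetch is only a hint: removing it leaves the result unchanged,
which is why treating an unsupported rw value as 0 is a safe default.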

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-27  Jakub Jelinek  

PR target/117608
* doc/extend.texi (__builtin_prefetch): Document that second
argument may be also 2 and its meaning.
* config/i386/i386.md (prefetch): Remove unreachable code.
Clear write set operands[1] to const0_rtx if !TARGET_MOVRS or
of locality is not 1.  Formatting fixes.
* config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE.
Call gen_prefetch even for TARGET_MOVRS.
* config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0.
* config/mips/mips.md (prefetch): Likewise.
* config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise.
* config/riscv/riscv.md (prefetch): Likewise.
* config/loongarch/loongarch.md (prefetch): Likewise.
* config/sparc/sparc.md (prefetch): Likewise.  Use IN_RANGE.
* config/ia64/ia64.md (prefetch): Likewise.
* config/pa/pa.md (prefetch): Likewise.
* config/aarch64/aarch64.md (prefetch): Likewise.
* config/rs6000/rs6000.md (prefetch): Likewise.

* gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument
2.
* gcc.target/i386/pr117608-1.c: New test.
* gcc.target/i386/pr117608-2.c: New test.

--- gcc/doc/extend.texi.jj  2024-11-26 09:37:56.173574966 +0100
+++ gcc/doc/extend.texi 2024-11-26 17:49:35.469152396 +0100
@@ -15675,9 +15675,11 @@ be in the cache by the time it is access
 
 The value of @var{addr} is the address of the memory to prefetch.
 There are two optional arguments, @var{rw} and @var{locality}.
-The value of @var{rw} is a compile-time constant one or zero; one
-means that the prefetch is preparing for a write to the memory address
-and zero, the default, means that the prefetch is preparing for a read.
+The value of @var{rw} is a compile-time constant zero, one or two; one
+means that the prefetch is preparing for a write to the memory address,
+two means that the prefetch is preparing for a shared read (expected to be
+read by at least one other processor before it is written if written at
+all) and zero, the default, means that the prefetch is preparing for a read.
 The value @var{locality} must be a compile-time constant integer between
 zero and three.  A value of zero means that the data has no temporal
 locality, so it need not be left in the cache after the access.  A value
--- gcc/config/i386/i386.md.jj  2024-11-26 09:37:56.047576735 +0100
+++ gcc/config/i386/i386.md 

[PATCH] libatomic: Cleanup AArch64 ifunc selection

2024-11-27 Thread Wilco Dijkstra
Simplify and clean up the ifunc selection logic.  Since LRCPC3 does
not imply LSE2, has_rcpc3() should also check that LSE2 is enabled.
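The new check order in has_rcpc3() can be sketched as a pure function. The HWCAP bit values below are hypothetical placeholders, not the kernel's definitions, and isar1 stands in for the ID_AA64ISAR1_EL1 value that the real code reads with MRS. The point is only that a missing LSE2 rejects LRCPC3 before any other check runs.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder bit values: the real HWCAP_USCAT, HWCAP_CPUID and
// HWCAP2_LRCPC3 come from the kernel headers and differ from these.
constexpr std::uint64_t kHwcapUscat = 1u << 0;    // LSE2
constexpr std::uint64_t kHwcapCpuid = 1u << 1;    // ID registers readable via MRS
constexpr std::uint64_t kHwcap2Lrcpc3 = 1u << 0;

// Hypothetical model of the reworked has_rcpc3 () control flow.
// have_hwcap2 models the _IFUNC_ARG_HWCAP flag; isar1 models the value
// of ID_AA64ISAR1_EL1 that the real code reads with MRS.
bool has_rcpc3_sketch (std::uint64_t hwcap, bool have_hwcap2,
                       std::uint64_t hwcap2, std::uint64_t isar1)
{
  // LSE2 is a prerequisite, so check it before anything else.
  if (!(hwcap & kHwcapUscat))
    return false;
  // Trust the kernel's HWCAP2 bit when it is available.
  if (have_hwcap2 && (hwcap2 & kHwcap2Lrcpc3))
    return true;
  // Otherwise fall back to the 4-bit field at bits [23:20].
  if (hwcap & kHwcapCpuid)
    return ((isar1 >> 20) & 15) >= 3;
  return false;
}
```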

Passes regress and bootstrap, OK for commit?

libatomic:
* config/linux/aarch64/host-config.h (has_lse2): Cleanup.
(has_lse128): Likewise.
(has_rcpc3): Add early check for LSE2.

---

diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index 93f367d587803ce26b3c9a45881ac2d9b2e37168..d9d9239897c82d2eebff2bf38f6bac3a7c7b23ea 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -91,69 +91,62 @@ has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
   /* Check for LSE2.  */
   if (hwcap & HWCAP_USCAT)
 return true;
-  /* No point checking further for atomic 128-bit load/store if LSE
- prerequisite not met.  */
-  if (!(hwcap & HWCAP_ATOMICS))
-return false;
-  if (!(hwcap & HWCAP_CPUID))
-return false;
 
-  unsigned long midr;
-  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+  /* If LSE and CPUID are supported, check MIDR.  */
+  if (hwcap & HWCAP_CPUID && hwcap & HWCAP_ATOMICS)
+{
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
 
-  /* Neoverse N1 supports atomic 128-bit load/store.  */
-  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM (midr) == 0xd0c)
-return true;
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  return MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM (midr) == 0xd0c;
+}
 
   return false;
 }
 
-/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic,
-   bits[23:20].  The expected value is 0b0011.  Check that.  */
+/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic, bits[23:20].
+   The minimum value for LSE128 is 0b0011.  */
 
 #define AT_FEAT_FIELD(isar0)   (((isar0) >> 20) & 15)
 
 static inline bool
 has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
-  if (hwcap & _IFUNC_ARG_HWCAP
-  && features->_hwcap2 & HWCAP2_LSE128)
-return true;
-  /* A 0 HWCAP2_LSE128 bit may be just as much a sign of missing HWCAP2 bit
- support in older kernels as it is of CPU feature absence.  Try fallback
- method to guarantee LSE128 is not implemented.
-
- In the absence of HWCAP_CPUID, we are unable to check for LSE128.
- If feature check available, check LSE2 prerequisite before proceeding.  */
-  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
- return false;
-
-  unsigned long isar0;
-  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
-  if (AT_FEAT_FIELD (isar0) >= 3)
+  if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LSE128)
 return true;
+
+  /* If LSE2 and CPUID are supported, check for LSE128.  */
+  if (hwcap & HWCAP_CPUID && hwcap & HWCAP_USCAT)
+{
+  unsigned long isar0;
+  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
+  return AT_FEAT_FIELD (isar0) >= 3;
+}
+
   return false;
 }
 
-/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].  The
-   expected value is 0b0011.  Check that.  */
+/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].
+   The minimum value for LRCPC3 is 0b0011.  */
 
 static inline bool
 has_rcpc3 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
-  if (hwcap & _IFUNC_ARG_HWCAP
-  && features->_hwcap2 & HWCAP2_LRCPC3)
-return true;
-  /* Try fallback feature check method to guarantee LRCPC3 is not implemented.
-
- In the absence of HWCAP_CPUID, we are unable to check for RCPC3, return.
- If feature check available, check LSE2 prerequisite before proceeding.  */
-  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
+  /* LSE2 is a prerequisite for atomic LDIAPP/STILP.  */
+  if (!(hwcap & HWCAP_USCAT))
 return false;
-  unsigned long isar1;
-  asm volatile ("mrs %0, ID_AA64ISAR1_EL1" : "=r" (isar1));
-  if (AT_FEAT_FIELD (isar1) >= 3)
+
+  if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LRCPC3)
 return true;
+
+  if (hwcap & HWCAP_CPUID)
+{
+  unsigned long isar1;
+  asm volatile ("mrs %0, ID_AA64ISAR1_EL1" : "=r" (isar1));
+  return AT_FEAT_FIELD (isar1) >= 3;
+}
+
   return false;
 }
 

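For readers unfamiliar with the register layouts, here is a sketch of the field extraction the patch relies on. The helper names are illustrative: libatomic has its own MIDR_IMPLEMENTOR/MIDR_PARTNUM and AT_FEAT_FIELD macros. The bit positions follow the Arm MIDR_EL1 layout, and 0x413FD0C1 is simply a value built from implementer 'A' (0x41) and part number 0xd0c.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers (not libatomic's macros).  MIDR_EL1 layout:
// Implementer [31:24], Variant [23:20], Architecture [19:16],
// PartNum [15:4], Revision [3:0].
constexpr unsigned midr_implementer (std::uint64_t midr)
{ return static_cast<unsigned> ((midr >> 24) & 0xff); }

constexpr unsigned midr_partnum (std::uint64_t midr)
{ return static_cast<unsigned> ((midr >> 4) & 0xfff); }

// Same extraction as AT_FEAT_FIELD: the 4-bit field at bits [23:20].
constexpr unsigned feat_field_23_20 (std::uint64_t reg)
{ return static_cast<unsigned> ((reg >> 20) & 15); }

// The MIDR test has_lse2 () performs once LSE and CPUID are present.
constexpr bool is_neoverse_n1 (std::uint64_t midr)
{ return midr_implementer (midr) == 'A' && midr_partnum (midr) == 0xd0c; }
```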

Re: [PATCH] aarch64: Extend SVE2 bit-select instructions for Neon modes.

2024-11-27 Thread Kyrylo Tkachov

> On 27 Nov 2024, at 09:34, Richard Sandiford  wrote:
> 
> Soumya AR  writes:
>> NBSL, BSL1N, and BSL2N are bit-select instructions on SVE2 with certain operands
>> inverted. These can be extended to work with Neon modes.
>> 
>> Since these instructions are unpredicated, duplicate patterns were added with
>> the predicate removed to generate these instructions for Neon modes.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Soumya AR 
>> 
>> gcc/ChangeLog:
>> 
>> * config/aarch64/aarch64-sve2.md
>> (*aarch64_sve2_nbsl_unpred): New pattern to match unpredicated
>> form.
>> (*aarch64_sve2_bsl1n_unpred): Likewise.
>> (*aarch64_sve2_bsl2n_unpred): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.target/aarch64/sve/bitsel.c: New test.
> 
> Thanks for the patch.  But since this is a new optimisation, and is not
> fixing a regression, I'm not sure whether it would be appropriate during
> stage 3.  Let's see what other maintainers say.

IMO it’s not high risk, but it’s a nice-to-have optimisation rather than one
driven by a concrete motivating workload.
Given that we have a few such patches (like the ASRD patch from Soumya) it
would be consistent to either take them all now or stage them all for GCC 16.
I’d be okay with deferring them to GCC 16 but would appreciate it if they
received some feedback on the implementation beforehand so they can be
polished for the next stage1.

Thanks,
Kyrill


> 
> Richard
> 

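For reference, a scalar model of what the bit-select family computes on one 64-bit lane, following my reading of the Arm definitions. The function and operand names are illustrative, not the assembler's. Written on vector types, these are presumably the expression shapes the new unpredicated patterns have to recognise.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the SVE2 bitwise-select family on one 64-bit lane.
// For each bit: take `a` where the selector `k` is 1, `b` where it is 0.
constexpr std::uint64_t bsl (std::uint64_t a, std::uint64_t b, std::uint64_t k)
{ return (a & k) | (b & ~k); }

// NBSL: the BSL result, inverted.
constexpr std::uint64_t nbsl (std::uint64_t a, std::uint64_t b, std::uint64_t k)
{ return ~bsl (a, b, k); }

// BSL1N: the first input is inverted before selecting.
constexpr std::uint64_t bsl1n (std::uint64_t a, std::uint64_t b, std::uint64_t k)
{ return bsl (~a, b, k); }

// BSL2N: the second input is inverted before selecting.
constexpr std::uint64_t bsl2n (std::uint64_t a, std::uint64_t b, std::uint64_t k)
{ return bsl (a, ~b, k); }
```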


Re: [PATCH] match.pd: Avoid introducing UB in the ((X /[ex] C1) +- C2) * (C1 * C3) simplification [PR117692]

2024-11-27 Thread Richard Biener
On Wed, 27 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> As the pr117692.c testcase shows, the generalized pattern can introduce
> UB when there wasn't any.
> The old pattern was I believe correct, it is as if in the new
> pattern C3 was always 1 and I don't see how that could have introduced
> UB.
> But if type is signed and C3 (aka factor) isn't 1 and for + X and C2
> could have different sign or for - X and C2 could have the same sign,
> when doing the addition/subtraction first the absolute value could
> decrease, while if first multiplying by C3 we could invoke UB already
> during that multiplication.
> 
> The following patch fixes it by going through the casts to utype unless
> ranger (get_range_pos_neg) can prove that X has the same sign as C2
> (INTEGER_CST) for +, or the opposite sign for -, because only then
> can the addition/subtraction not decrease the absolute value.
> 
> Other possibility (perhaps as another check if this check doesn't succeed)
> would be to test whether X * C3 could actually overflow.
> vr-values.cc has check_for_binary_op_overflow (currently not exported)
> which I think does what we'd need to check, if it returns true and sets
> ovf to false.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

As an improvement, if required, we could do the range evaluation or
perform the simplified arithmetic in an unsigned type and cast back.
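The suggestion can be illustrated on the testcase from the patch, where C1 = 8, C2 = 4 and C3 = 25. This is a minimal sketch with hypothetical function names. signed_rewrite_ok mirrors the sign test the patch adds, using the get_range_pos_neg encoding (1 = non-negative, 2 = negative, 3 = unknown). For x = -85899352, x * 25 alone overflows int, although the original expression does not.

```cpp
#include <cassert>

// The PR117692 testcase: ((x /[ex] 8) + 4) * 200, which the pattern
// rewrites to x * 25 + 800.
int original (int x) { return (x / 8 + 4) * 200; }

// The rewritten form evaluated in unsigned arithmetic, which wraps
// instead of overflowing, then converted back to int.  The conversion
// is modular since C++20 (implementation-defined before; GCC documents
// it as modular).
int folded_unsigned (int x)
{
  unsigned ux = static_cast<unsigned> (x);
  return static_cast<int> (ux * 25u + 800u);
}

// Keep the signed rewrite only when X is known to have the sign class
// that cannot shrink |X +- C2|; OR-ing in any other class yields 3.
bool signed_rewrite_ok (int pos_neg, bool is_plus, bool c2_negative)
{
  int required = (is_plus ^ c2_negative) ? 1 : 2;
  return (pos_neg | required) != 3;
}
```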

Thanks,
Richard.

> 2024-11-27  Jakub Jelinek  
> 
>   PR tree-optimization/117692
>   * tree.cc (get_range_pos_neg): Adjust function comment, use
>   non-negative instead of positive.
>   * match.pd
>   (((X /[ex] C1) +- C2) * (C1 * C3) -> (X * C3) +- (C1 * C2 * C3)):
>   Use casts to utype if type is signed, factor isn't 1 and
>   X and C2 could have different sign for + or could have the
>   same sign for -.
> 
>   * gcc.dg/tree-ssa/mulexactdiv-5.c: Expect 8 nop_exprs.
>   * gcc.dg/tree-ssa/pr117692.c: New test.
> 
> --- gcc/tree.cc.jj  2024-11-23 13:00:31.615980802 +0100
> +++ gcc/tree.cc   2024-11-26 13:50:45.910803860 +0100
> @@ -14510,8 +14510,8 @@ verify_type (const_tree t)
>  
>  
>  /* Return 1 if ARG interpreted as signed in its precision is known to be
> -   always positive or 2 if ARG is known to be always negative, or 3 if
> -   ARG may be positive or negative.  */
> +   always non-negative or 2 if ARG is known to be always negative, or 3 if
> +   ARG may be non-negative or negative.  */
>  
>  int
>  get_range_pos_neg (tree arg)
> --- gcc/match.pd.jj   2024-11-26 09:37:32.563906412 +0100
> +++ gcc/match.pd  2024-11-26 18:36:33.037742888 +0100
> @@ -5577,7 +5577,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && TREE_CODE (@3) == INTEGER_CST
> && (mul = wi::mul (wi::to_wide (@2), wi::to_wide (@3),
>TYPE_SIGN (type), &overflow),
> -   !overflow))
> +   !overflow)
> +   && (TYPE_UNSIGNED (type)
> +   /* Not using unsigned arithmetics is unsafe if factor
> +  isn't 1 and if for op plus @0 and @2 could have different
> +  sign or for op minus @0 and @2 could have the same sign.  */
> +   || known_eq (factor, 1)
> +   || (get_range_pos_neg (@0)
> +   | (((op == PLUS_EXPR) ^ (tree_int_cst_sgn (@2) < 0))
> +  ? 1 : 2)) != 3))
>(op (mult @0 { wide_int_to_tree (type, factor); })
> { wide_int_to_tree (type, mul); })
>(with { tree utype = unsigned_type_for (type); }
> --- gcc/testsuite/gcc.dg/tree-ssa/mulexactdiv-5.c.jj  2024-10-24 18:53:41.438042287 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/mulexactdiv-5.c 2024-11-26 14:25:07.507313308 +0100
> @@ -18,7 +18,7 @@ TEST_CMP (f4, 8, 4, 200)
>  
>  /* { dg-final { scan-tree-dump-not {<[a-z]*_div_expr,} "optimized" } } */
>  /* { dg-final { scan-tree-dump-not { -/* { dg-final { scan-tree-dump-not { +/* { dg-final { scan-tree-dump-times {  /* { dg-final { scan-tree-dump { } */
>  /* { dg-final { scan-tree-dump { } */
>  /* { dg-final { scan-tree-dump { } */
> --- gcc/testsuite/gcc.dg/tree-ssa/pr117692.c.jj   2024-11-26 14:16:46.689274363 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr117692.c  2024-11-26 14:18:55.161491240 +0100
> @@ -0,0 +1,17 @@
> +/* PR tree-optimization/117692 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-vrp1" } */
> +/* { dg-final { scan-tree-dump " \\\* 25;" "vrp1" } } */
> +/* { dg-final { scan-tree-dump " \\\+ 800;" "vrp1" } } */
> +/* { dg-final { scan-tree-dump " = \\\(unsigned int\\\) " "vrp1" } } */
> +/* { dg-final { scan-tree-dump " = \\\(int\\\) " "vrp1" } } */
> +
> +int
> +foo (int x)
> +{
> +  if (x & 7)
> +__builtin_unreachable ();
> +  x /= 8;
> +  x += 4;
> +  return x * 200;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH v1 3/3] RISC-V: Add testcases for vec_duplicate + vadd.vv combine to vadd.vx

2024-11-27 Thread pan2 . li
From: Pan Li 

Add asm dump checks and run tests for the vec_duplicate + vadd.vv combine
to vadd.vx.  Introduce a new folder to hold all related testcases.
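The kind of loop these tests target can be sketched as follows. The function is hypothetical, since the real vx_binary.h macros are truncated in this message. The point is that the same scalar x is added to every element: without the combine, the vectorizer broadcasts x (vec_duplicate) and uses vadd.vv; with it, vadd.vx can take x directly from a scalar register. Whether the vx form is actually emitted depends on the target options and on the combine from the other patches in this series.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: add one scalar to every element.  For narrow
// unsigned T the sum is converted back to T and wraps modulo 2^N.
template <typename T>
void vadd_scalar (T *out, const T *in, T x, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<T> (in[i] + x);
}
```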

The following test suites passed for this patch:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add new folder vx_vf for all
vec_dup + vv to vx testcases.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  17 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 401 ++
 .../riscv/rvv/autovec/vx_vf/vx_binary_run.h   |  26 ++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c|   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c   |   8 +
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c|   8 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i8.c  |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u16.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u32.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u64.c |  14 +
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u8.c  |  14 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 20 files changed, 622 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_run.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
new file mode 100644
index 000..66654eb9022
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
@@ -0,0 +1,17 @@
+#ifndef HAVE_DEFINED_VX_VF_BINARY_H
+#define HAVE_DEFINED_VX_VF_BINARY_H
+
+#include 
+
+#define DEF_VX_BI

[PATCH v1 2/3] RISC-V: Adjust the testcases after vec_duplicate + vadd.vv combine

2024-11-27 Thread pan2 . li
From: Pan Li 

After we support the vec_duplicate + vadd.vv combine to vadd.vx, the
asm dump check counts in some existing testcases need adjusting.

The following test suites passed for this patch:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adjust
the asm dump check times.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Ditto.
* gcc.target/riscv/struct_vect_24.c: Ditto.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c  | 3 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c   | 3 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c  | 3 ++-
 .../gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c   | 3 ++-
 gcc/testsuite/gcc.target/riscv/struct_vect_24.c | 6 +++---
 5 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c
index 667f457d658..7db55b298d1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c
@@ -3,7 +3,8 @@
 
 #include "vadd-template.h"
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 16 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 10 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vx} 6 } } */
 /* { dg-final { scan-assembler-times {\tvadd\.vi} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vv} 3 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vf} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c
index a3b012631be..65e569d9d1c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c
@@ -3,6 +3,7 @@
 
 #include "vadd-template.h"
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 16 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 10 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vx} 6 } } */
 /* { dg-final { scan-assembler-times {\tvadd\.vi} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vv} 9 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c
index 1d8a19ce0b2..4a48fce435e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c
@@ -3,7 +3,8 @@
 
 #include "vadd-template.h"
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 16 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vx} 8 } } */
 /* { dg-final { scan-assembler-times {\tvadd\.vi} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vv} 3 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vf} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c
index ef52f49657b..1cf6c06ecca 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c
@@ -3,6 +3,7 @@
 
 #include "vadd-template.h"
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 16 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vx} 8 } } */
 /* { dg-final { scan-assembler-times {\tvadd\.vi} 8 } } */
 /* { dg-final { scan-assembler-times {\tvfadd\.vv} 9 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/struct_vect_24.c b/gcc/testsuite/gcc.target/riscv/struct_vect_24.c
index 7c0852f1a55..9d36796f2ec 100644
--- a/gcc/testsuite/gcc.target/riscv/struct_vect_24.c
+++ b/gcc/testsuite/gcc.target/riscv/struct_vect_24.c
@@ -42,6 +42,6 @@ TEST (test)
 
 /* Check the vectorized loop for stack clash probing.  */
 
-/* { dg-final { scan-assembler-times {sd\tzero,1024\(sp\)} 6 } } */
-/* { dg-final { scan-assembler-times {bge\tt1,t0,.[^\\r\\n]*} 2 } } */
-/* { dg-final { scan-assembler-times {sub\s+t1,t1,t0} 2 } } */
+/* { dg-final { scan-assembler-times {sd\tzero,1024\(sp\)} 4 } } */
+/* { dg-final { scan-assembler-not {bge\tt1,t0,.[^\\r\\n]*} } } */
+/* { dg-final { scan-assembler-not {sub\s+t1,t1,t0} } } */
-- 
2.43.0



[committed] libstdc++: Simplify std::forward_list assignment using 'if constexpr'

2024-11-27 Thread Jonathan Wakely
Use diagnostic pragmas to allow using `if constexpr` in C++11 mode, so
that we don't need to use tag dispatching.

The unused member functions are preserved for the purposes of explicit
instantiations. The _M_assign function template can be removed, because
member function templates aren't instantiated by explicit instantiations
anyway.
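The same technique can be shown outside the library with a hypothetical free function that has the shape of the new assign(InputIterator, InputIterator), written against std::forward_list's public API. It is plain C++17; the patch itself relies on the diagnostic pragmas to use if constexpr in older modes. This sketch tests whether an lvalue T can be assigned from *first; that choice is mine, not necessarily the library's exact trait.

```cpp
#include <cassert>
#include <forward_list>
#include <type_traits>

// Hypothetical sketch: reuse existing nodes when elements are assignable,
// otherwise rebuild the list.  The discarded branch is not instantiated.
template <typename T, typename InputIt>
void assign_reusing_nodes (std::forward_list<T> &list,
                           InputIt first, InputIt last)
{
  if constexpr (std::is_assignable<T &, decltype (*first)>::value)
    {
      auto prev = list.before_begin ();
      auto curr = list.begin ();
      auto end = list.end ();
      // Overwrite existing elements in place.
      while (curr != end && first != last)
        {
          *curr = *first;
          ++prev;
          ++curr;
          ++first;
        }
      if (first != last)
        list.insert_after (prev, first, last);  // source was longer
      else if (curr != end)
        list.erase_after (prev, end);           // list was longer
    }
  else
    {
      list.clear ();
      list.insert_after (list.cbefore_begin (), first, last);
    }
}
```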

libstdc++-v3/ChangeLog:

* include/bits/forward_list.h (operator=(forward_list&&)): Use
if constexpr instead of dispatching to _M_move_assign.
(assign(InputIterator, InputIterator)): Use if constexpr instead
of dispatching to _M_assign.
(assign(size_type, const T&)): Use if constexpr instead of
dispatching to _M_assign_n.
(_M_move_assign, _M_assign_n): Do not define for versioned
namespace.
(_M_assign): Remove.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/forward_list.h | 119 +++
 1 file changed, 79 insertions(+), 40 deletions(-)

diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index 663ed0c46af..ac1b3593c79 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -644,6 +644,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   forward_list&
   operator=(const forward_list& __list);
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
   /**
*  @brief  The %forward_list move assignment operator.
*  @param  __list  A %forward_list of identical element and allocator
@@ -663,7 +665,24 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
constexpr bool __move_storage =
  _Node_alloc_traits::_S_propagate_on_move_assign()
  || _Node_alloc_traits::_S_always_equal();
-   _M_move_assign(std::move(__list), __bool_constant<__move_storage>());
+   if constexpr (!__move_storage)
+ {
+   if (__list._M_get_Node_allocator() != this->_M_get_Node_allocator())
+ {
+   // The rvalue's allocator cannot be moved, or is not equal,
+   // so we need to individually move each element.
+   this->assign(std::make_move_iterator(__list.begin()),
+std::make_move_iterator(__list.end()));
+   return *this;
+ }
+ }
+
+   clear();
+   this->_M_impl._M_head._M_next = __list._M_impl._M_head._M_next;
+   __list._M_impl._M_head._M_next = nullptr;
+   if constexpr (_Node_alloc_traits::_S_propagate_on_move_assign())
+ this->_M_get_Node_allocator()
+ = std::move(__list._M_get_Node_allocator());
return *this;
   }
 
@@ -699,9 +718,30 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
void
assign(_InputIterator __first, _InputIterator __last)
{
- typedef is_assignable<_Tp, decltype(*__first)> __assignable;
- _M_assign(__first, __last, __assignable());
+ if constexpr (is_assignable<_Tp, decltype(*__first)>::value)
+   {
+ auto __prev = before_begin();
+ auto __curr = begin();
+ auto __end = end();
+ while (__curr != __end && __first != __last)
+   {
+ *__curr = *__first;
+ ++__prev;
+ ++__curr;
+ ++__first;
+   }
+ if (__first != __last)
+   insert_after(__prev, __first, __last);
+ else if (__curr != __end)
+   erase_after(__prev, __end);
+   }
+ else
+   {
+ clear();
+ insert_after(cbefore_begin(), __first, __last);
+   }
}
+#pragma GCC diagnostic pop
 
 #if __glibcxx_ranges_to_container // C++ >= 23
   /**
@@ -736,6 +776,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
}
 #endif // ranges_to_container
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
   /**
*  @brief  Assigns a given value to a %forward_list.
*  @param  __n  Number of elements to be assigned.
@@ -748,7 +790,31 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*/
   void
   assign(size_type __n, const _Tp& __val)
-  { _M_assign_n(__n, __val, is_copy_assignable<_Tp>()); }
+  {
+   if constexpr (is_copy_assignable<_Tp>::value)
+ {
+   auto __prev = before_begin();
+   auto __curr = begin();
+   auto __end = end();
+   while (__curr != __end && __n > 0)
+ {
+   *__curr = __val;
+   ++__prev;
+   ++__curr;
+   --__n;
+ }
+   if (__n > 0)
+ insert_after(__prev, __n, __val);
+   else if (__curr != __end)
+ erase_after(__prev, __end);
+ }
+   else
+ {
+   clear();
+   insert_after(cbefore_begin(), __n, __val);
+ }
+  }
+#pragm

Re: Backport two LRA patches to gcc-14 branch

2024-11-27 Thread Vladimir Makarov


On 11/27/24 04:05, Uros Bizjak wrote:

Hello!

I'd like to backport two LRA patches to the gcc-14 branch:

1. [PR114942][LRA]: Don't reuse input reload reg of inout early clobber operand
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=9585317f0715699197b1313bbf939c6ea3c1ace6

2. [PR117105][LRA]: Use unique value reload pseudo for early clobber operand
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4b09e2c67ef593db171b0755b46378964421782b

They both fix RA failures with the strict_low_part family of instructions:

(insn 24 55 54 4 (parallel [
 (set (strict_low_part (reg:QI 2 cx [orig:109 e ] [109]))
 (and:QI (subreg:QI (zero_extract:HI (reg/v:HI 2 cx [orig:109 e ] [109])
 (const_int 8 [0x8])
 (const_int 8 [0x8])) 0)
 (reg:QI 1 dx [orig:115 _6 ] [115])))
 (clobber (reg:CC 17 flags))

that were added by me for PR target/78904, so I have some interest in
the backport.

The backport of the two patches was bootstrapped and regression tested
with the current gcc-14 branch.

Is the backport OK for the branch?


OK.  They are both safe.  I don't expect any issues with them.

Thank you, Uros.




Re: [PATCH] libstdc++: Add debug assertions to std::list and std::forward_list

2024-11-27 Thread Jonathan Wakely
On Mon, 18 Nov 2024 at 18:32, François Dumont  wrote:
>
>
> On 18/11/2024 19:24, François Dumont wrote:
> >
> > On 16/11/2024 02:18, Jonathan Wakely wrote:
> >> On Sat, 16 Nov 2024 at 01:09, Jonathan Wakely 
> >> wrote:
> >>> While working on fancy pointer support for the linked lists I noticed
> >>> they didn't have any debug assertions. This adds the obvious non-empty
> >>> assertions to front(), back(), pop_front() and pop_back().
> >>>
> >>> For the pop members, adding an assertion to the underlying function
> >>> that erases a member means it also checks erase(end()), which is always
> >>> invalid, and erase(begin()) on an empty list. For those erase members
> >>> we can also add a check so that we return without doing anything if
> >>> the assertion is disabled, but would have failed had it been enabled.
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>  * include/bits/forward_list.h (forward_list::front): Add
> >>>  non-empty assertions.
> >>>  * include/bits/forward_list.tcc (_Fwd_list_base::_M_erase_after):
> >>>  Likewise. Return immediately if argument is invalid.
> >>>  * include/bits/stl_list.h (list::front, list::back): Add
> >>>  non-empty assertions.
> >>>  (list::_M_erase): Likewise. Return immediately if argument is
> >>>  invalid.
> >>> ---
> >>>
> >>> Tested x86_64-linux.
> >>>
> >>> As pull request: https://forge.sourceware.org/gcc/gcc-TEST/pulls/26
> >>>
> >>>   libstdc++-v3/include/bits/forward_list.h   |  3 +++
> >>>   libstdc++-v3/include/bits/forward_list.tcc |  6 ++
> >>>   libstdc++-v3/include/bits/stl_list.h   | 19 +--
> >>>   3 files changed, 26 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
> >>> index c9238cef96f..3fac657518c 100644
> >>> --- a/libstdc++-v3/include/bits/forward_list.h
> >>> +++ b/libstdc++-v3/include/bits/forward_list.h
> >>> @@ -42,6 +42,7 @@
> >>>   #include 
> >>>   #include 
> >>>   #include 
> >>> +#include 
> >>>   #if __glibcxx_ranges_to_container // C++ >= 23
> >>>   # include  // ranges::begin, ranges::distance etc.
> >>>   # include  // ranges::subrange
> >>> @@ -884,6 +885,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >>> reference
> >>> front()
> >>> {
> >>> +   __glibcxx_requires_nonempty();
> >>>  _Node* __front = static_cast<_Node*>(this->_M_impl._M_head._M_next);
> >>>  return *__front->_M_valptr();
> >>> }
> >>> @@ -896,6 +898,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >>> const_reference
> >>> front() const
> >>> {
> >>> +   __glibcxx_requires_nonempty();
> >>>  _Node* __front = static_cast<_Node*>(this->_M_impl._M_head._M_next);
> >>>  return *__front->_M_valptr();
> >>> }
> >>> diff --git a/libstdc++-v3/include/bits/forward_list.tcc b/libstdc++-v3/include/bits/forward_list.tcc
> >>> index 9750c7c0502..50acdb9f26b 100644
> >>> --- a/libstdc++-v3/include/bits/forward_list.tcc
> >>> +++ b/libstdc++-v3/include/bits/forward_list.tcc
> >>> @@ -63,6 +63,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >>>   _Fwd_list_base<_Tp, _Alloc>::
> >>>   _M_erase_after(_Fwd_list_node_base* __pos)
> >>>   {
> >>> +  if (__pos == nullptr || __pos->_M_next == nullptr) [[__unlikely__]]
> >>> +   {
> >>> + __glibcxx_assert(__pos != nullptr && __pos->_M_next != nullptr);
> >>> + return nullptr;
> >>> +   }
> >>> +
> >>> _Node* __curr = static_cast<_Node*>(__pos->_M_next);
> >>> __pos->_M_next = __curr->_M_next;
> >>> _Node_alloc_traits::destroy(_M_get_Node_allocator(),
> >>> diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
> >>> index 7deb04b4bfe..d70ba90b8fa 100644
> >>> --- a/libstdc++-v3/include/bits/stl_list.h
> >>> +++ b/libstdc++-v3/include/bits/stl_list.h
> >>> @@ -59,6 +59,7 @@
> >>>
> >>>   #include 
> >>>   #include 
> >>> +#include 
> >>>   #if __cplusplus >= 201103L
> >>>   #include 
> >>>   #include 
> >>> @@ -1249,7 +1250,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> >>> _GLIBCXX_NODISCARD
> >>> reference
> >>> front() _GLIBCXX_NOEXCEPT
> >>> -  { return *begin(); }
> >>> +  {
> >>> +   __glibcxx_requires_nonempty();
> >>> +   return *begin();
> >>> +  }
> >>>
> >>> /**
> >>>  *  Returns a read-only (constant) reference to the data at the first
> >>> @@ -1258,7 +1262,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> >>> _GLIBCXX_NODISCARD
> >>> const_reference
> >>> front() const _GLIBCXX_NOEXCEPT
> >>> -  { return *begin(); }
> >>> +  {
> >>> +   __glibcxx_requires_nonempty();
> >>> +   return *begin();
> >>> +  }
> >>>
> >>> /**
> >>>  *  Returns a read/write reference to the data at the last element
>
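The patched `_M_erase_after` checks for a null position or a missing successor before unlinking anything. The same guard pattern can be sketched on a hypothetical singly linked `node` type (`node` and `erase_after` are illustrative names, not libstdc++ internals), with a plain `assert` standing in for `__glibcxx_assert`: with assertions enabled a bad position aborts, and with `NDEBUG` the function returns `nullptr` instead of dereferencing a null pointer.

```cpp
#include <cassert>

// Hypothetical singly linked node, standing in for _Fwd_list_node_base.
struct node
{
  node* next = nullptr;
  int value = 0;
};

// Unlink and free the node after `pos`, then return the node that now
// follows `pos` (nullptr if `pos` became the last node).
node* erase_after (node* pos)
{
  // Unlikely path: nothing to erase.  Abort in checked builds; in NDEBUG
  // builds degrade to a no-op instead of dereferencing a null pointer.
  if (pos == nullptr || pos->next == nullptr)
    {
      assert (pos != nullptr && pos->next != nullptr);
      return nullptr;
    }
  node* victim = pos->next;
  pos->next = victim->next;
  delete victim;
  return pos->next;
}
```

Whether the `NDEBUG` no-op is the right behaviour is exactly what the `_M_erase` sub-thread below debates; the sketch only shows the mechanics.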

Re: [COMMITED] [lto] ipcp don't propagate where not needed

2024-11-27 Thread Martin Jambor
On Wed, Nov 06 2024, Michal Jires wrote:
> On Wed, 2024-11-06 at 17:33:50 +, Jonathan Wakely wrote:
>> 
>> If there's going to be a constructor then it should initialize the members.
>> 
>> Otherwise, your original patch was better, because you could write
>> this to get an all-zeros object:
>> 
>>   lto_encoder_entry e{};
>> 
>> Now you can't safely initialize it, because the default constructor
>> leaves everything indeterminate. That's just a bug waiting to happen.
>> 
>
> Using all-zeros would probably be a bug anyway, and explicitly initializing
> might encourage thinking that such default values are supposed to be
> used.
>
> Anyway, I misread the code for which this was needed, and we can
> trivially get rid of it.
>
> Is this now OK?
>

The patch is OK (this should still fall under the "callgraph" category
and in any case I think the change is rather obvious.)

You need to commit this with a ChangeLog entry, though.

Thanks,

Martin


> ---
>  gcc/lto-cgraph.cc  | 3 +--
>  gcc/lto-streamer.h | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
> index b18d2b34e46..c9b846a04d6 100644
> --- a/gcc/lto-cgraph.cc
> +++ b/gcc/lto-cgraph.cc
> @@ -142,7 +142,6 @@ lto_symtab_encoder_delete_node (lto_symtab_encoder_t encoder,
>                                   symtab_node *node)
>  {
>    int index;
> -  lto_encoder_entry last_node;
>  
>    size_t *slot = encoder->map->get (node);
>    if (slot == NULL || !*slot)
> @@ -153,7 +152,7 @@ lto_symtab_encoder_delete_node (lto_symtab_encoder_t encoder,
>  
>    /* Remove from vector. We do this by swapping node with the last element
>       of the vector.  */
> -  last_node = encoder->nodes.pop ();
> +  lto_encoder_entry last_node = encoder->nodes.pop ();
>    if (last_node.node != node)
>      {
>        bool existed = encoder->map->put (last_node.node, index + 1);
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 1c416a7a1b9..294e7b3e328 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -443,8 +443,7 @@ struct lto_stats_d
>  /* Entry of LTO symtab encoder.  */
>  struct lto_encoder_entry
>  {
> -  /* Constructors.  */
> -  lto_encoder_entry () {}
> +  /* Constructor.  */
>    lto_encoder_entry (symtab_node* n)
>      : node (n), in_partition (false), body (false), only_for_inlining (true),
>        initializer (false)
> -- 
> 2.47.0
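Jonathan's point about `lto_encoder_entry e{};` and the shape the struct ends up with can be checked in plain C++. A minimal sketch with hypothetical stand-in types (`symtab_node_stub`, `entry_aggregate`, `entry_with_empty_ctor` and `entry_converting_only` are illustrative, not the GCC types): an aggregate is zeroed by `{}`, a user-provided empty default constructor leaves the members indeterminate even when written as `{}`, and dropping the default constructor (as the committed patch does) means every object must be built from a node, as in `lto_encoder_entry last_node = encoder->nodes.pop ();`.

```cpp
#include <cassert>
#include <type_traits>

struct symtab_node_stub {};  // hypothetical stand-in for symtab_node

// Aggregate: `entry_aggregate e{};` value-initializes every member to zero.
struct entry_aggregate
{
  symtab_node_stub* node;
  bool in_partition;
};

// User-provided empty constructor: `entry_with_empty_ctor e{};` just calls
// it, so the members are NOT zeroed and reading them is undefined.
struct entry_with_empty_ctor
{
  entry_with_empty_ctor () {}
  symtab_node_stub* node;
  bool in_partition;
};

// Only a converting constructor, like the patched lto_encoder_entry:
// an object cannot be created without supplying a node.
struct entry_converting_only
{
  entry_converting_only (symtab_node_stub* n)
    : node (n), in_partition (false)
  {}
  symtab_node_stub* node;
  bool in_partition;
};

static_assert (std::is_default_constructible<entry_aggregate>::value, "");
static_assert (!std::is_default_constructible<entry_converting_only>::value,
               "no default constructor, so no uninitialized entries");
```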


Re: [PATCH] builtins: Emit __sync_lock_release_{8,16} call as last resort instead of doing nothing [PR117642]

2024-11-27 Thread Richard Biener
On Wed, 27 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcases show, in case of multi-word
> __sync_lock_test_and_set where we don't actually support atomics for that
> size (__int128 for x86_64 lp64 with -mno-cx16, long long for ia32 with
> -march=i{3,4}86), as the last fallback if we don't know anything else
> we just emit calls to __sync_lock_test_and_set_{8,16}.  Those aren't defined
> in libatomic, but perhaps users could define them themselves.
> By contrast, if __sync_lock_release gives up and has no way to emit the
> atomic store, it just does nothing at all, so there is no clear sign to
> users that something went wrong and that the code will not do what they
> expected.
> This regressed when __atomic_* support was introduced; previously we
> would just emit those calls even in this case.
> 
> The patch just emits the call as the last fallback.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-11-27  Jakub Jelinek  
> 
>   PR target/117642
>   * builtins.cc (expand_builtin_sync_lock_release): Change return type
>   from void to rtx, return result of expand_atomic_store.
>   (expand_builtin) : If
>   expand_builtin_sync_lock_release returns NULL, do a break rather
>   than return const0_rtx.
> 
>   * gcc.target/i386/pr117642-1.c: New test.
>   * gcc.target/i386/pr117642-2.c: New test.
> 
> --- gcc/builtins.cc.jj	2024-11-26 09:46:39.513228477 +0100
> +++ gcc/builtins.cc   2024-11-26 15:07:16.236123486 +0100
> @@ -6587,7 +6587,7 @@ expand_builtin_sync_lock_test_and_set (m
>  
>  /* Expand the __sync_lock_release intrinsic.  EXP is the CALL_EXPR.  */
>  
> -static void
> +static rtx
>  expand_builtin_sync_lock_release (machine_mode mode, tree exp)
>  {
>    rtx mem;
> @@ -6595,7 +6595,7 @@ expand_builtin_sync_lock_release (machin
>    /* Expand the operands.  */
>    mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
>  
> -  expand_atomic_store (mem, const0_rtx, MEMMODEL_SYNC_RELEASE, true);
> +  return expand_atomic_store (mem, const0_rtx, MEMMODEL_SYNC_RELEASE, true);
>  }
>  
>  /* Given an integer representing an ``enum memmodel'', verify its
> @@ -8605,8 +8605,9 @@ expand_builtin (tree exp, rtx target, rt
>      case BUILT_IN_SYNC_LOCK_RELEASE_8:
>      case BUILT_IN_SYNC_LOCK_RELEASE_16:
>        mode = get_builtin_sync_mode (fcode - BUILT_IN_SYNC_LOCK_RELEASE_1);
> -      expand_builtin_sync_lock_release (mode, exp);
> -      return const0_rtx;
> +      if (expand_builtin_sync_lock_release (mode, exp))
> +        return const0_rtx;
> +      break;
>  
>      case BUILT_IN_SYNC_SYNCHRONIZE:
>        expand_builtin_sync_synchronize ();
> --- gcc/testsuite/gcc.target/i386/pr117642-1.c.jj	2024-11-26 15:18:07.270066467 +0100
> +++ gcc/testsuite/gcc.target/i386/pr117642-1.c	2024-11-26 15:19:20.467048094 +0100
> @@ -0,0 +1,19 @@
> +/* PR target/117642 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-mno-cx16" } */
> +/* { dg-final { scan-assembler "__sync_lock_test_and_set_16" } } */
> +/* { dg-final { scan-assembler "__sync_lock_release_16" } } */
> +
> +__int128 t = 1;
> +
> +void
> +foo (void)
> +{
> +  __sync_lock_test_and_set (&t, 1);
> +}
> +
> +void
> +bar (void)
> +{
> +  __sync_lock_release (&t);
> +}
> --- gcc/testsuite/gcc.target/i386/pr117642-2.c.jj	2024-11-26 15:19:33.174871291 +0100
> +++ gcc/testsuite/gcc.target/i386/pr117642-2.c	2024-11-26 15:20:04.092441144 +0100
> @@ -0,0 +1,19 @@
> +/* PR target/117642 */
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-march=i486" } */
> +/* { dg-final { scan-assembler "__sync_lock_test_and_set_8" } } */
> +/* { dg-final { scan-assembler "__sync_lock_release_8" } } */
> +
> +long long t = 1;
> +
> +void
> +foo (void)
> +{
> +  __sync_lock_test_and_set (&t, 1);
> +}
> +
> +void
> +bar (void)
> +{
> +  __sync_lock_release (&t);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
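For sizes the target does support, `__sync_lock_test_and_set` and `__sync_lock_release` are typically paired as a minimal spinlock acquire/release. A sketch assuming GCC or Clang (the `__sync_*` builtins are compiler extensions; `lock_word`, `spin_lock`, `spin_trylock` and `spin_unlock` are hypothetical names). It only ever stores the constant 1, which the GCC manual notes is the one value some targets support for `__sync_lock_test_and_set`. On the unsupported multi-word sizes from PR117642, the same calls now lower to out-of-line `__sync_lock_test_and_set_{8,16}` / `__sync_lock_release_{8,16}` calls, so a missing definition shows up at link time instead of the release store being silently dropped.

```cpp
#include <cassert>

// Hypothetical lock word; an int-sized lock is natively supported almost everywhere.
static int lock_word = 0;

// Try once: atomically store 1 and report whether the old value was 0.
// __sync_lock_test_and_set is an acquire barrier and returns the old value.
inline bool spin_trylock (int* word)
{
  return __sync_lock_test_and_set (word, 1) == 0;
}

// Acquire: keep trying until the previous holder has released the lock.
inline void spin_lock (int* word)
{
  while (!spin_trylock (word))
    {
      // busy-wait
    }
}

// Release: __sync_lock_release stores 0 with release semantics.
inline void spin_unlock (int* word)
{
  __sync_lock_release (word);
}
```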


Re: [PATCH v1 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx

2024-11-27 Thread Robin Dapp
> This patch would like to combine the vec_duplicate + vadd.vv to the
> vadd.vx.  From example as below:

I think we concluded a while ago that we don't want this turned on universally.
For the example/tests you provide it will be a de-optimization on any uarch
that has non-zero GPR -> VR latency.

So at least we need to define RTL costs for the combined variant and make them
depend on the VR <-> GPR costs (so we don't do this if the latency/cost is >
0).
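The de-optimization argument can be made concrete with a back-of-the-envelope cost model. This is purely illustrative: `vec_costs`, `cost_vv`, `cost_vx` and `prefer_vx` are hypothetical, the unit costs are made up, and the model assumes the `vmv.v.x` broadcast is hoisted out of the loop (paying the GPR -> VR transfer once) while every `vadd.vx` pays it per iteration. It is not the RISC-V backend's actual cost hook.

```cpp
#include <cassert>

// Hypothetical per-instruction costs for the loop in the example.
struct vec_costs
{
  int vv_op;      // cost of one vadd.vv / vadd.vx body operation
  int gr2vr;      // extra cost of getting a GPR value into the vector unit
  int broadcast;  // cost of the vmv.v.x splat itself (hoisted, paid once)
};

// Before the combine: one hoisted splat (including one GPR -> VR transfer),
// then n iterations of vadd.vv.
long cost_vv (const vec_costs& c, long n)
{
  return c.broadcast + c.gr2vr + n * c.vv_op;
}

// After the combine: each vadd.vx reads the scalar directly, so any
// GPR -> VR transfer cost is paid on every iteration.
long cost_vx (const vec_costs& c, long n)
{
  return n * (c.vv_op + c.gr2vr);
}

// Only fold to .vx when it is not more expensive for the trip count.
bool prefer_vx (const vec_costs& c, long n)
{
  return cost_vx (c, n) <= cost_vv (c, n);
}
```

With a zero transfer cost the `.vx` form always wins; with any non-zero cost it loses once the loop runs more than a couple of iterations, which is why the combined pattern needs RTL costs tied to the VR <-> GPR cost.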

Does the optimization happen in combine or late-combine BTW?  I thought
late-combine because we need to look through the unary op (vec_duplicate).

-- 
Regards
 Robin



Re: [PATCH] __builtin_prefetch fixes [PR117608]

2024-11-27 Thread Richard Biener
On Wed, 27 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> The r15-4833-ge9ab41b79933 patch had among tons of config/i386
> specific changes also important change to the generic code, allowing
> also 2 as valid value of the second argument of __builtin_prefetch:
> -  /* Argument 1 must be either zero or one.  */
> -  if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
> +  /* Argument 1 must be 0, 1 or 2.  */
> +  if (INTVAL (op1) < 0 || INTVAL (op1) > 2)
> 
> But the patch failed to document that change in the __builtin_prefetch
> documentation, and more importantly didn't adjust any of the other
> backends to deal with it (my understanding is the expected behavior
> is that 2 will be silently handled as 0 unless backends have some
> more specific way).  Some of the backends would ICE on it, in some
> cases gcc_assert failures/gcc_unreachable, in other cases crash later
> (e.g. accessing arrays with that value as index and due to accessing
> garbage after the array crashing at final.cc time), others treated 2
> silently as 0, others treated 2 silently as 1.
> 
> And even in the i386 backend there were bugs which caused ICEs.
> The patch added some if (write == 0) and write == 2 handling into
> the body of an if (write == 1) (badly indented, maybe that is the reason),
> rather than into the else side, so those conditions would always be false.
> 
> The new *prefetch_rst2 define_insn only accepts parameters 2 1
> (i.e. read-shared with moderate degree of locality), so in order
> not to ICE the patch uses it only for __builtin_prefetch (ptr, 2, 1);
> or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values
> of the parameter.  If that isn't what we want and we want it to be used
> also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and
> corresponding __builtin_ia32_prefetch, maybe the define_insn could match
> other values.
> And there was another problem that -mno-mmx -mno-sse -mmovrs compilation
> would ICE on most of the prefetches, so I had to add the FAIL; cases.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, please leave others (target maintainers) time to comment.

Richard.
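Per the updated documentation, the second argument of `__builtin_prefetch` may be 0 (read), 1 (write) or 2 (shared read), and backends without a dedicated shared-read prefetch are expected to treat 2 like 0. A sketch of both sides, assuming GCC 15 or later for the value 2 (older GCC releases reject it, and as far as I know so does Clang, hence the preprocessor guard). `prefetch_kind` and `backend_prefetch_kind` are hypothetical and only mirror the fallback rule described above, including the i386 choice of using the shared-read form only for locality 1; they are not code from any backend.

```cpp
#include <cassert>
#include <cstddef>

enum class prefetch_kind { read, write, read_shared };

// Hypothetical backend-side mapping: targets without a shared-read
// prefetch (or for locality values without a matching pattern) fall
// back to an ordinary read, i.e. rw == 2 behaves like rw == 0.
prefetch_kind backend_prefetch_kind (int rw, bool has_shared_read, int locality)
{
  assert (rw >= 0 && rw <= 2 && locality >= 0 && locality <= 3);
  if (rw == 1)
    return prefetch_kind::write;
  if (rw == 2 && has_shared_read && locality == 1)
    return prefetch_kind::read_shared;
  return prefetch_kind::read;
}

// User side: sum an array while prefetching a few elements ahead.
long sum_with_prefetch (const long* a, std::size_t n)
{
  long total = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      if (i + 16 < n)
        {
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 15
          __builtin_prefetch (&a[i + 16], 2, 1);  // shared read, locality 1
#elif defined (__GNUC__)
          __builtin_prefetch (&a[i + 16], 0, 1);  // plain read
#endif
        }
      total += a[i];
    }
  return total;
}
```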

> 2024-11-27  Jakub Jelinek  
> 
>   PR target/117608
>   * doc/extend.texi (__builtin_prefetch): Document that second
>   argument may be also 2 and its meaning.
>   * config/i386/i386.md (prefetch): Remove unreachable code.
>   Clear write (set operands[1] to const0_rtx) if !TARGET_MOVRS or
>   if locality is not 1.  Formatting fixes.
>   * config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE.
>   Call gen_prefetch even for TARGET_MOVRS.
>   * config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0.
>   * config/mips/mips.md (prefetch): Likewise.
>   * config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise.
>   * config/riscv/riscv.md (prefetch): Likewise.
>   * config/loongarch/loongarch.md (prefetch): Likewise.
>   * config/sparc/sparc.md (prefetch): Likewise.  Use IN_RANGE.
>   * config/ia64/ia64.md (prefetch): Likewise.
>   * config/pa/pa.md (prefetch): Likewise.
>   * config/aarch64/aarch64.md (prefetch): Likewise.
>   * config/rs6000/rs6000.md (prefetch): Likewise.
> 
>   * gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument
>   2.
>   * gcc.target/i386/pr117608-1.c: New test.
>   * gcc.target/i386/pr117608-2.c: New test.
> 
> --- gcc/doc/extend.texi.jj	2024-11-26 09:37:56.173574966 +0100
> +++ gcc/doc/extend.texi   2024-11-26 17:49:35.469152396 +0100
> @@ -15675,9 +15675,11 @@ be in the cache by the time it is access
>  
>  The value of @var{addr} is the address of the memory to prefetch.
>  There are two optional arguments, @var{rw} and @var{locality}.
> -The value of @var{rw} is a compile-time constant one or zero; one
> -means that the prefetch is preparing for a write to the memory address
> -and zero, the default, means that the prefetch is preparing for a read.
> +The value of @var{rw} is a compile-time constant zero, one or two; one
> +means that the prefetch is preparing for a write to the memory address,
> +two means that the prefetch is preparing for a shared read (expected to be
> +read by at least one other processor before it is written if written at
> +all) and zero, the default, means that the prefetch is preparing for a read.
>  The value @var{locality} must be a compile-time constant integer betwe

[committed] libstdc++: Simplify std::list assignment using 'if constexpr'

2024-11-27 Thread Jonathan Wakely
Use diagnostic pragmas to allow using `if constexpr` in C++11 mode, so
that we don't need to use tag dispatching.

The _M_move_assign overloads that were previously used for tag
dispatching are no longer used, but are retained here (at least for the
default config) so that an explicit instantiation will still define
those members. This ensures that old code which expects an explicit
instantiation in some other translation unit will still link. I'm not
sure if that's really needed; we should probably have a policy about
whether we support explicit instantiations where the declaration and
definition use different versions of the headers.
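The new `operator=(list&&)` chooses between stealing the nodes and moving element by element with `if constexpr` on the allocator traits, instead of overloading `_M_move_assign` on `true_type`/`false_type`. A minimal C++17 sketch of that decision using the standard `std::allocator_traits` members (`propagate_on_container_move_assignment`, `is_always_equal`) that libstdc++'s `_S_propagate_on_move_assign()` and `_S_always_equal()` wrap; `move_path`, `choose_move_path` and `tagged_alloc` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

enum class move_path { steal_nodes, move_elements };

// Decide how `dst = std::move(src)` must be implemented for a node-based
// container, following the structure of the patched list::operator=.
template<typename Alloc>
move_path choose_move_path (const Alloc& dst, const Alloc& src)
{
  using traits = std::allocator_traits<Alloc>;
  constexpr bool move_storage =
    traits::propagate_on_container_move_assignment::value
    || traits::is_always_equal::value;
  if constexpr (!move_storage)
    {
      // The allocator stays put, and nodes owned by an unequal allocator
      // cannot be adopted, so each element has to be moved individually.
      if (dst != src)
        return move_path::move_elements;
    }
  return move_path::steal_nodes;
}

// Hypothetical stateful allocator: not always equal, not propagated on move.
template<typename T>
struct tagged_alloc
{
  using value_type = T;
  int id;
  explicit tagged_alloc (int i) : id (i) {}
  template<typename U>
  tagged_alloc (const tagged_alloc<U>& other) : id (other.id) {}
  T* allocate (std::size_t n) { return std::allocator<T> ().allocate (n); }
  void deallocate (T* p, std::size_t n) { std::allocator<T> ().deallocate (p, n); }
  friend bool operator== (const tagged_alloc& a, const tagged_alloc& b)
  { return a.id == b.id; }
  friend bool operator!= (const tagged_alloc& a, const tagged_alloc& b)
  { return !(a == b); }
};
```

For `std::allocator` the runtime comparison is discarded entirely, which is the same effect the tag-dispatched overloads used to achieve.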

libstdc++-v3/ChangeLog:

* include/bits/stl_list.h (operator=(list&&)): Use if constexpr
instead of dispatching to _M_move_assign.
(_M_move_assign): Do not define for versioned namespace.
---

Tested x86_64-linux. Pushed to trunk.

I don't know if we really need to keep the unused functions around. We
should decide and document whether or not we support explicit
instantiations across releases.

 libstdc++-v3/include/bits/stl_list.h | 29 ++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
index 7deb04b4bfe..3f92de42996 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -935,6 +935,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   operator=(const list& __x);
 
 #if __cplusplus >= 201103L
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
   /**
*  @brief  %List move assignment operator.
*  @param  __x  A %list of identical element and allocator types.
@@ -952,9 +954,29 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
constexpr bool __move_storage =
  _Node_alloc_traits::_S_propagate_on_move_assign()
  || _Node_alloc_traits::_S_always_equal();
-   _M_move_assign(std::move(__x), __bool_constant<__move_storage>());
+   if constexpr (!__move_storage)
+ {
+   if (__x._M_get_Node_allocator() != this->_M_get_Node_allocator())
+ {
+   // The rvalue's allocator cannot be moved, or is not equal,
+   // so we need to individually move each element.
+   _M_assign_dispatch(std::make_move_iterator(__x.begin()),
+  std::make_move_iterator(__x.end()),
+  __false_type{});
+   return *this;
+ }
+ }
+
+   this->clear();
+   this->_M_move_nodes(std::move(__x));
+
+   if constexpr (_Node_alloc_traits::_S_propagate_on_move_assign())
+ this->_M_get_Node_allocator()
+ = std::move(__x._M_get_Node_allocator());
+
return *this;
   }
+#pragma GCC diagnostic pop
 
   /**
*  @brief  %List initializer list assignment operator.
@@ -2156,7 +2178,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   const_iterator
   _M_resize_pos(size_type& __new_size) const;
 
-#if __cplusplus >= 201103L
+#if __cplusplus >= 201103L && ! _GLIBCXX_INLINE_VERSION
+  // XXX GLIBCXX_ABI Deprecated
+  // These are unused and only kept so that explicit instantiations will
+  // continue to define the symbols.
   void
   _M_move_assign(list&& __x, true_type) noexcept
   {
-- 
2.47.0



[committed] libstdc++: Fix unsigned wraparound in codecvt::do_length [PR105857]

2024-11-27 Thread Jonathan Wakely
When the max argument to std::codecvt::length
is SIZE_MAX/4+1 or greater, the multiplication by sizeof(wchar_t) (4 on
GNU/Linux) will wrap to a small value, and the alloca call will allocate a
buffer that's
smaller than requested. The call to mbsnrtowcs then has a buffer that is
smaller than the value passed as the buffer length. When libstdc++.so is
built with -D_FORTIFY_SOURCE=3 the mismatched buffer and length will get
detected and will abort inside Glibc.

When it doesn't abort, there's no buffer overflow because Glibc's
mbsnrtowcs has the same len * sizeof(wchar_t) calculation to determine
the size of the buffer in bytes, and that will wrap to the same small
number as the alloca argument. So luckily Glibc agrees with the caller
about the real size of the buffer, and won't overflow it.
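The wraparound is plain unsigned arithmetic: for an N-bit `size_t`, `SIZE_MAX/4 + 1` is exactly 2^(N-2), so multiplying it by a 4-byte `sizeof(wchar_t)` gives 2^N, which wraps to 0. A small sketch that checks this at compile time and shows one overflow-checked way to compute a byte count. `checked_bytes` is a hypothetical helper; it is not what was committed, since the committed fix avoids the large multiplication altogether by capping the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// (SIZE_MAX / 4 + 1) elements of 4 bytes is 2^N bytes, which wraps to 0.
static_assert ((SIZE_MAX / 4 + 1) * std::size_t (4) == 0,
               "SIZE_MAX/4+1 four-byte elements wrap to zero bytes");

// Hypothetical helper: compute count * elem, reporting overflow instead
// of silently wrapping.  Returns false if the product does not fit.
bool checked_bytes (std::size_t count, std::size_t elem, std::size_t& bytes)
{
  if (elem != 0 && count > SIZE_MAX / elem)
    return false;
  bytes = count * elem;
  return true;
}
```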

Even when the max argument isn't large enough to wrap, it can still be
much too large to safely pass to alloca, so we should limit that. We
already have a loop that processes chunks so that we can handle null
characters in the middle of the input. If we limit the alloca buffer to
1024 wchar_t elements (4kB with a 4-byte wchar_t), then we'll just loop
each time that buffer is filled.
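The fix keeps the existing loop and caps each `mbsnrtowcs` call at the scratch buffer's size, so a huge `max` only means more iterations. A simplified sketch of that idea, assuming a POSIX.1-2008 `mbsnrtowcs` and input without embedded NUL characters (the real `do_length` also splits the input at NULs and rewinds precisely on conversion errors); `bounded_length` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwchar>
#include <string>

// Count the wide characters produced by converting the first `nbytes`
// bytes of `src`, stopping after at most `max` of them.  The scratch
// buffer has a fixed size, so no buffer-size computation depends on `max`.
std::size_t bounded_length (const char* src, std::size_t nbytes,
                            std::size_t max)
{
  constexpr std::size_t to_len = 1024;  // 4kB with a 4-byte wchar_t
  wchar_t to[to_len];
  std::mbstate_t state = std::mbstate_t ();
  const char* from = src;
  const char* const end = src + nbytes;
  std::size_t total = 0;

  while (from < end && max)
    {
      std::size_t chunk = max > to_len ? to_len : max;
      std::size_t conv = mbsnrtowcs (to, &from,
                                     static_cast<std::size_t> (end - from),
                                     chunk, &state);
      if (conv == static_cast<std::size_t> (-1) || conv == 0)
        break;  // invalid or incomplete sequence: stop counting
      total += conv;
      max -= conv;
      if (from == nullptr)
        break;  // converted a NUL terminator: nothing left to scan
    }
  return total;
}
```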

libstdc++-v3/ChangeLog:

PR libstdc++/105857
* config/locale/dragonfly/codecvt_members.cc (do_length): Limit
size of alloca buffer to 4k.
* config/locale/gnu/codecvt_members.cc (do_length): Likewise.
* testsuite/22_locale/codecvt/length/wchar_t/105857.cc: New
test.
---

Tested x86_64-linux. Pushed to trunk, backports to follow.

 .../locale/dragonfly/codecvt_members.cc   |  9 +---
 .../config/locale/gnu/codecvt_members.cc  |  9 +---
 .../codecvt/length/wchar_t/105857.cc  | 21 +++
 3 files changed, 33 insertions(+), 6 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/length/wchar_t/105857.cc

diff --git a/libstdc++-v3/config/locale/dragonfly/codecvt_members.cc b/libstdc++-v3/config/locale/dragonfly/codecvt_members.cc
index f84143a4d58..188d8db14a8 100644
--- a/libstdc++-v3/config/locale/dragonfly/codecvt_members.cc
+++ b/libstdc++-v3/config/locale/dragonfly/codecvt_members.cc
@@ -226,12 +226,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 // mbsnrtowcs is *very* fast but stops if encounters NUL characters:
 // in case we advance past it and then continue, in a loop.
-// NB: mbsnrtowcs is a GNU extension
+// NB: mbsnrtowcs is in POSIX.1-2008
+
+const size_t __to_len = 1024; // Size of alloca'd output buffer
 
 // A dummy internal buffer is needed in order for mbsnrtocws to consider
 // its fourth parameter (it wouldn't with NULL as first parameter).
 wchar_t* __to = static_cast(__builtin_alloca(sizeof(wchar_t)
-  * __max));
+  * __to_len));
 while (__from < __end && __max)
   {
const extern_type* __from_chunk_end;
@@ -244,7 +246,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
const extern_type* __tmp_from = __from;
size_t __conv = mbsnrtowcs(__to, &__from,
   __from_chunk_end - __from,
-  __max, &__state);
+  __max > __to_len ? __to_len : __max,
+  &__state);
if (__conv == static_cast(-1))
  {
// In case of error, in order to stop at the exact place we
diff --git a/libstdc++-v3/config/locale/gnu/codecvt_members.cc b/libstdc++-v3/config/locale/gnu/codecvt_members.cc
index 794f25a5f35..e0c9330fbdd 100644
--- a/libstdc++-v3/config/locale/gnu/codecvt_members.cc
+++ b/libstdc++-v3/config/locale/gnu/codecvt_members.cc
@@ -230,12 +230,14 @@ namespace
 
 // mbsnrtowcs is *very* fast but stops if encounters NUL characters:
 // in case we advance past it and then continue, in a loop.
-// NB: mbsnrtowcs is a GNU extension
+// NB: mbsnrtowcs is in POSIX.1-2008
+
+const size_t __to_len = 1024; // Size of alloca'd output buffer
 
 // A dummy internal buffer is needed in order for mbsnrtocws to consider
 // its fourth parameter (it wouldn't with NULL as first parameter).
 wchar_t* __to = static_cast(__builtin_alloca(sizeof(wchar_t)
-  * __max));
+  * __to_len));
 while (__from < __end && __max)
   {
const extern_type* __from_chunk_end;
@@ -248,7 +250,8 @@ namespace
const extern_type* __tmp_from = __from;
size_t __conv = mbsnrtowcs(__to, &__from,
   __from_chunk_end - __from,
-  __max, &__state);
+  __max > __to_len ? __to_len : __max,
+  &__state);
if (__conv == static_cast(-1))
  {
// In case of error, in order to stop at the exact place we
diff --git a/libstdc++-v3/testsui

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx

2024-11-27 Thread pan2 . li
From: Pan Li 

This patch combines vec_duplicate + vadd.vv into vadd.vx, as in the
example below:

  #include <stdint.h>

  #define DEF_VX_BINARY(T, OP)                                          \
  void                                                                  \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n)   \
  {                                                                     \
    for (unsigned i = 0; i < n; i++)                                    \
      out[i] = in[i] OP x;                                              \
  }

  DEF_VX_BINARY(int32_t, +)

Before this patch:
  test_binary_vx_add:
      beq     a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma  // eliminated
      vmv.v.x v2,a2                 // Ditto.
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub     a3,a3,a5
      add     a1,a1,a4
      vadd.vv v1,v2,v1
      vse32.v v1,0(a0)
      add     a0,a0,a4
      bne     a3,zero,.L3

After this patch:
  test_binary_vx_add:
      beq     a3,zero,.L8
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub     a3,a3,a5
      add     a1,a1,a4
      vadd.vx v1,v1,a2
      vse32.v v1,0(a0)
      add     a0,a0,a4
      bne     a3,zero,.L3
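The combine is semantically safe because adding a splatted scalar lane by lane gives the same result as the `.vx` form that reads the scalar operand directly. A scalar C++ model of the two forms; `vreg`, `broadcast`, `vadd_vv` and `vadd_vx` are hypothetical helpers modelling the instructions, not RVV intrinsics. Additions are done in `uint32_t` so they wrap modulo 2^32 like the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using vreg = std::vector<std::int32_t>;  // one "vector register" of e32 lanes

// vmv.v.x: splat a scalar into every lane.
vreg broadcast (std::int32_t x, std::size_t vl)
{
  return vreg (vl, x);
}

// Wrapping 32-bit add, matching the modular behaviour of vadd.
static std::int32_t wrap_add (std::int32_t a, std::int32_t b)
{
  return static_cast<std::int32_t> (static_cast<std::uint32_t> (a)
                                    + static_cast<std::uint32_t> (b));
}

// vadd.vv: lane-wise addition of two vector operands.
vreg vadd_vv (const vreg& a, const vreg& b)
{
  vreg r (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    r[i] = wrap_add (a[i], b[i]);
  return r;
}

// vadd.vx: lane-wise addition of a vector and a scalar register.
vreg vadd_vx (const vreg& a, std::int32_t x)
{
  vreg r (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    r[i] = wrap_add (a[i], x);
  return r;
}
```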

The following test suites pass with this patch:
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*_vx_): Add new
combine pattern to convert vec_duplicate + vadd.vv to vadd.vx.
* config/riscv/vector-iterators.md: Add new iterator for vx.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec-opt.md  | 22 ++
 gcc/config/riscv/vector-iterators.md |  4 
 2 files changed, 26 insertions(+)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 4b33a145c17..6bc0388a087 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1611,3 +1611,25 @@ (define_insn_and_split "*vandn_"
 DONE;
   }
   [(set_attr "type" "vandn")])
+
+;; =============================================================================
+;; Combine vec_duplicate + op.vv to op.vx
+;; Include
+;; - vadd.vx
+;; =============================================================================
+(define_insn_and_split "*_vx_"
+ [(set (match_operand:V_VLSI 0 "register_operand")
+   (any_int_binop_no_shift_vx:V_VLSI
+(vec_duplicate:V_VLSI
+  (match_operand: 1 "register_operand"))
+(match_operand:V_VLSI  2 "")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[2], operands[1]};
+riscv_vector::emit_vlmax_insn (code_for_pred_scalar (, mode),
+  riscv_vector::BINARY_OP, ops);
+  }
+  [(set_attr "type" "vialu")])
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 92cb651ce49..80c184e297b 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -3987,6 +3987,10 @@ (define_code_iterator any_int_binop_no_shift
  [plus minus and ior xor smax umax smin umin mult div udiv mod umod
 ])
 
+(define_code_iterator any_int_binop_no_shift_vx
+ [plus
+])
+
 (define_code_iterator any_sat_int_binop [ss_plus ss_minus us_plus us_minus])
 (define_code_iterator sat_int_plus_binop [ss_plus us_plus])
 (define_code_iterator sat_int_minus_binop [ss_minus us_minus])
-- 
2.43.0



Re: [PATCH] libstdc++: Add debug assertions to std::list and std::forward_list

2024-11-27 Thread Jonathan Wakely
On Mon, 18 Nov 2024 at 19:10, Marc Glisse  wrote:
>
> On Sat, 16 Nov 2024, Jonathan Wakely wrote:
>
> >>void
> >>_M_erase(iterator __position) _GLIBCXX_NOEXCEPT
> >>{
> >> +   if (__builtin_expect(empty(), 0))
> >> + {
> >> +   __glibcxx_requires_nonempty();
> >> +   return;
> >> + }
> >
> > Hmm, I'm having second thoughts about the "return without doing
> > anything" part now.
> > For this simple test:
> >
> > #include 
> >
> > int main()
> > {
> > std::list l;
> > l.erase(l.begin());
> > }
> >
> > Currently it crashes (bad), but with -O1 there's a nice warning:
> >
> > /usr/include/c++/14/bits/new_allocator.h:172:33: warning: ‘void
> > operator delete(void*, std::size_t)’ called on unallocated object ‘l’
> > [-Wfree-nonheap-object]
> >
> > And Asan can diagnose it too.
> >
> > Adding an assertion is definitely an improvement, as it avoids the
> > crash. But returning when the assertion is disabled, so that the
> > function is a no-op, means that the warning about freeing a null
> > pointer goes away, because the compiler can see it's never reached.
> > And now Asan can't diagnose it.
> >
> > I think on balance, making it a no-op and avoiding arbitrary UB is
> > better. The warning is only possible in trivial cases where the
> > compiler can see the pointer is definitely null. In more realistic
> > code, there will be no warning, and UB, so turning it into a silent
> > no-op does seem safer. If you want to detect the bug, enable
> > assertions.
> >
> > What do others think? Better to add the assertion but leave the UB
> > present when assertions are disabled, or add the assertion and
> > silently remove the UB when assertions are disabled?
>
> This all seems related to the current flamewar^Wdiscussion on the
> C++ reflectors. A precondition is violated, and the question is what we
> should do about it.
>
> The ASAN regression looks bad. With special code protected by the relevant
> macros if necessary, it would be good if it still gave a diagnostic.

Yes, that should be doable, and preferable to silently ignoring the bug.

> As a personal opinion, I am not convinced that no-op is a good default
> behavior. I am fine with UB if I am not in a debug or hardened mode. But
> *if* we are going to the trouble of testing the precondition, stopping the
> program (trap) or raising an exception seems less surprising. Ignoring the
> operation and continuing, which can have unpredictable behavior since the
> logic of the program has failed already at this point, looks like
> something that should only happen if the user explicitly asked for a
> no-fail mode that tries its best to continue even when asked to execute
> nonsense.

Some users definitely want no-fail modes, but I like your suggestion
of requiring explicit opt-in.

Ideally we'd have a single syntax that works for all cases, something like:

if (!__precondition(pos != end()))
  return;

In hardened modes __precondition would either return true or
abort/trap. In no-fail mode it would check and return false if the
precondition is violated.
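A minimal sketch of that single-syntax idea, assuming a hypothetical global checking mode. `check_mode`, `current_mode`, `precondition` and `safe_erase_first` are illustrative names only; nothing like this exists in libstdc++, and a real implementation would select the mode with macros rather than a runtime variable. In hardened mode a violation aborts, in no-fail mode the caller returns early with no effects, and with checks off the condition is simply assumed to hold.

```cpp
#include <cassert>
#include <cstdlib>
#include <list>

enum class check_mode { off, hardened, no_fail };

// Hypothetical configuration knob (C++17 inline variable).
inline check_mode current_mode = check_mode::no_fail;

// Returns true if the caller may go ahead with the operation.
inline bool precondition (bool cond)
{
  switch (current_mode)
    {
    case check_mode::hardened:
      if (!cond)
        std::abort ();  // diagnose and stop
      return true;
    case check_mode::no_fail:
      return cond;      // caller skips the operation when false
    case check_mode::off:
      break;
    }
  return true;          // unchecked: a violated precondition is still UB
}

// The pattern from the mail: if (!__precondition(...)) return;
template<typename T>
void safe_erase_first (std::list<T>& l)
{
  if (!precondition (l.begin () != l.end ()))
    return;
  l.erase (l.begin ());
}
```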

But I think we'd need other forms too, e.g. "Asan is enabled and will
detect this one anyway so just return true without checking". That
wouldn't be applicable for all precondition checks, so we'd need to
pass flags to the __precondition function, or have more than one
function. That needs more investigation and is not going to happen for
GCC 15.

> (things would obviously be different if the standard standardized the
> no-op behavior on these specific functions)

I'm actually thinking about writing a proposal to do exactly that, in
select places. There are some preconditions that could be changed from
"get this right, or you have UB" to "erroneously returns with no
effects" or similar wording. The C++26 notion of erroneous behaviour
means that we are allowed to diagnose (abort, trap, call a violation
handler etc) in checking modes with assertions enabled, but if we
don't diagnose it then the behaviour has to be well-defined. So no UB.

I think some of the standard library's preconditions could be turned
from UB to EB with very little loss of performance.

See also https://isocpp.org/files/papers/P3471R1.html for another
approach to making precondition checks more reliable. I am very
strongly in favour of the P3471 direction.


