date:20250529

On Thu, May 29, 2025 at 8:06 AM Kito Cheng  wrote:
>
> `--enable-default-pie` is an option to specify whether to enable
> position-independent executables by default for `target`.
>
> However, c++tools is built for `host`, so it should just follow the
> `--enable-host-pie` option to determine whether to build with
> position-independent executables or not.
>
> NOTE:
>
> I checked PR 98324 and built with the same configure options
> (`--enable-default-pie` and LTO bootstrap) on x86-64 Linux to make sure
> this won't cause the same problem.

Makes sense to me, thus OK if nobody objects over the weekend.

Richard.

> c++tools/ChangeLog:
>
> * configure.ac: Don't check `--enable-default-pie`.
> * configure: Regen.
> ---
>  c++tools/configure    | 11 -----------
>  c++tools/configure.ac |  6 ------
>  2 files changed, 17 deletions(-)
>
> diff --git a/c++tools/configure b/c++tools/configure
> index 1353479beca..6df4a2f0dfa 100755
> --- a/c++tools/configure
> +++ b/c++tools/configure
> @@ -700,7 +700,6 @@ enable_option_checking
>  enable_c___tools
>  enable_maintainer_mode
>  enable_checking
> -enable_default_pie
>  enable_host_pie
>  enable_host_bind_now
>  with_gcc_major_version_only
> @@ -1335,7 +1334,6 @@ Optional Features:
>                            enable expensive run-time checks. With LIST, enable
>                            only specific categories of checks. Categories are:
>                            yes,no,all,none,release.
> -  --enable-default-pie    enable Position Independent Executable as default
>    --enable-host-pie       build host code as PIE
>    --enable-host-bind-now  link host code as BIND_NOW
>
> @@ -2946,15 +2944,6 @@ $as_echo "#define ENABLE_ASSERT_CHECKING 1" >>confdefs.h
>
>  fi
>
> -# Check whether --enable-default-pie was given.
> -# Check whether --enable-default-pie was given.
> -if test "${enable_default_pie+set}" = set; then :
> -  enableval=$enable_default_pie; PICFLAG=-fPIE
> -else
> -  PICFLAG=
> -fi
> -
> -
>  # Enable --enable-host-pie
>  # Check whether --enable-host-pie was given.
>  if test "${enable_host_pie+set}" = set; then :
> diff --git a/c++tools/configure.ac b/c++tools/configure.ac
> index db34ee678e0..8c4b72a8023 100644
> --- a/c++tools/configure.ac
> +++ b/c++tools/configure.ac
> @@ -97,12 +97,6 @@ if test x$ac_assert_checking != x ; then
>  [Define if you want assertions enabled.  This is a cheap check.])
>  fi
>
> -# Check whether --enable-default-pie was given.
> -AC_ARG_ENABLE(default-pie,
> -[AS_HELP_STRING([--enable-default-pie],
> - [enable Position Independent Executable as default])],
> -[PICFLAG=-fPIE], [PICFLAG=])
> -
>  # Enable --enable-host-pie
>  AC_ARG_ENABLE(host-pie,
>  [AS_HELP_STRING([--enable-host-pie],
> --
> 2.34.1
>

[PATCH v1 3/3] RISC-V: Add test cases for avg_ceil vaadd implementation

From: Pan Li 

Add asm and run testcase for avg_ceil vaadd implementation.

The following test suites passed for this patch series.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/avg.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/avg_data.h: Add test data for
avg_ceil.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i32-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i16.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i32-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i16.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i64.c: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/avg.h|  17 ++
 .../rvv/autovec/avg_ceil-1-i16-from-i32.c |  12 ++
 .../rvv/autovec/avg_ceil-1-i16-from-i64.c |  12 ++
 .../rvv/autovec/avg_ceil-1-i32-from-i64.c |  12 ++
 .../rvv/autovec/avg_ceil-1-i8-from-i16.c  |  12 ++
 .../rvv/autovec/avg_ceil-1-i8-from-i32.c  |  12 ++
 .../rvv/autovec/avg_ceil-1-i8-from-i64.c  |  12 ++
 .../rvv/autovec/avg_ceil-run-1-i16-from-i32.c |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i16-from-i64.c |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i32-from-i64.c |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i8-from-i16.c  |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i8-from-i32.c  |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i8-from-i64.c  |  16 ++
 .../gcc.target/riscv/rvv/autovec/avg_data.h   | 176 ++
 14 files changed, 361 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i32-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i32-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i64.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
index 746c635ae57..4aeb637bba7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
@@ -20,4 +20,21 @@ test_##NAME##_##WT##_##NT##_0(NT * restrict a, NT * restrict b, \
 #define RUN_AVG_0_WRAP(NT, WT, NAME, a, b, out, n) \
   RUN_AVG_0(NT, WT, NAME, a, b, out, n)
 
+#define DEF_AVG_1(NT, WT, NAME)                                 \
+__attribute__((noinline))                                       \
+void                                                            \
+test_##NAME##_##WT##_##NT##_1(NT * restrict a, NT * restrict b, \
+                              NT * restrict out, int n)         \
+{                                                               \
+  for (int i = 0; i < n; i++) {                                 \
+    out[i] = (NT)(((WT)a[i] + (WT)b[i] + 1) >> 1);              \
+  }                                                             \
+}
+#define DEF_AVG_1_WRAP(NT, WT, NAME) DEF_AVG_1(NT, WT, NAME)
+
+#define RUN_AVG_1(NT, WT, NAME, a, b, out, n) \
+  test_##NAME##_##WT##_##NT##_1(a, b, out, n)
+#define RUN_AVG_1_WRAP(NT, WT, NAME, a, b, out, n) \
+  RUN_AVG_1(NT, WT, NAME, a, b, out, n)
+
 #endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i32.c
new file mode 100644
index 000..138124c8c4a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i32.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d" } */
+
+#include "avg.h"
+
+#define NT int16_t
+#defi

[PATCH v1 2/3] RISC-V: Reconcile the existing test for avg_ceil

From: Pan Li 

Some existing avg_floor tests need to be updated due to the change to
leverage vaadd.vv directly.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/avg-4.c: Update asm check
to vaadd.
* gcc.target/riscv/rvv/autovec/vls/avg-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/avg-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-4.c  | 6 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-5.c  | 6 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-6.c  | 6 ++
 .../gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c| 2 +-
 5 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-4.c
index 8d106aaeed0..986a0ff21cf 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-4.c
@@ -25,11 +25,9 @@ DEF_AVG_CEIL (uint8_t, uint16_t, 512)
 DEF_AVG_CEIL (uint8_t, uint16_t, 1024)
 DEF_AVG_CEIL (uint8_t, uint16_t, 2048)
 
-/* { dg-final { scan-assembler-times {vwadd\.vv} 10 } } */
-/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*0} 10 } } */
-/* { dg-final { scan-assembler-times {vnsra\.wi} 10 } } */
+/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*0} 20 } } */
+/* { dg-final { scan-assembler-times {vaadd\.vv} 10 } } */
 /* { dg-final { scan-assembler-times {vaaddu\.vv} 10 } } */
-/* { dg-final { scan-assembler-times {vadd\.vi} 10 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
 /* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
 /* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-5.c
index 981abd51588..c450f80291a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-5.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-5.c
@@ -23,11 +23,9 @@ DEF_AVG_CEIL (uint16_t, uint32_t, 256)
 DEF_AVG_CEIL (uint16_t, uint32_t, 512)
 DEF_AVG_CEIL (uint16_t, uint32_t, 1024)
 
-/* { dg-final { scan-assembler-times {vwadd\.vv} 9 } } */
-/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*0} 9 } } */
-/* { dg-final { scan-assembler-times {vnsra\.wi} 9 } } */
+/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*0} 18 } } */
 /* { dg-final { scan-assembler-times {vaaddu\.vv} 9 } } */
-/* { dg-final { scan-assembler-times {vadd\.vi} 9 } } */
+/* { dg-final { scan-assembler-times {vaadd\.vv} 9 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
 /* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
 /* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-6.c
index bfe4ba3c4bd..3473e193a5c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-6.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/avg-6.c
@@ -21,11 +21,9 @@ DEF_AVG_CEIL (uint16_t, uint32_t, 128)
 DEF_AVG_CEIL (uint16_t, uint32_t, 256)
 DEF_AVG_CEIL (uint16_t, uint32_t, 512)
 
-/* { dg-final { scan-assembler-times {vwadd\.vv} 8 } } */
-/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*0} 8 } } */
-/* { dg-final { scan-assembler-times {vnsra\.wi} 8 } } */
+/* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*0} 16 } } */
 /* { dg-final { scan-assembler-times {vaaddu\.vv} 8 } } */
-/* { dg-final { scan-assembler-times {vadd\.vi} 8 } } */
+/* { dg-final { scan-assembler-times {vaadd\.vv} 8 } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
 /* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
 /* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c
index b7246a38dba..a5224e78d94 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c
@@ -5,4 +5,4 @@
 
 /* { dg-final { scan-assembler-times {csrwi\s*vxrm,\s*2} 6 } } */
 /* { dg-final { scan-assembler-times {vaaddu\.vv} 6 } } */
-/* { dg-final { scan-assembler-times {vaadd\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {vaadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c
index 3ffe0ef39ee..32446ae3c23 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c
@@ -5,4 +5,4 @@
 
 /* { dg-final { scan-assembler-times {csrwi\s*vxrm,

[PATCH v1 1/3] RISC-V: Leverage vaadd.vv for signed standard name avg_ceil

From: Pan Li 

avg_ceil rounds towards +inf, and vaadd.vv with the rnu rounding mode
matches these semantics exactly.  From the RVV spec, for the fixed-point
vaadd.vv with rnu:

roundoff_signed(v, d) = (signed(v) >> d) + r
r = v[d - 1]

For vaadd, d = 1, then we have

roundoff_signed(v, 1) = (signed(v) >> 1) + v[0]

If v[0] is 0, nothing needs to be done as there is no rounding.
If v[0] is 1, rounding occurs, with 2 cases.

Case 1: v is positive.
  roundoff_signed(v, 1) = (signed(v) >> 1) + 1, aka round towards +inf
  roundoff_signed(2 + 3, 1) = (5 >> 1) + 1 = 3

Case 2: v is negative.
  roundoff_signed(v, 1) = (signed(v) >> 1) + 1, aka round towards +inf
  roundoff_signed(-9 + 2, 1) = (-7 >> 1) + 1 = -4 + 1 = -3

Thus, we can leverage the vaadd with rnu directly for avg_ceil.
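
The equivalence argued above can be checked exhaustively for 8-bit
elements with a small scalar model.  This is only a sketch: the function
names are made up for illustration, the widened type is int16_t, and it
assumes that >> on a negative signed value is an arithmetic shift (which
is implementation-defined in C, but is what GCC does).

```c
#include <assert.h>
#include <stdint.h>

/* Reference avg_ceil, written the same way as the DEF_AVG_1 test macro:
   widen, add, add 1, then shift right by 1.  */
static int8_t
avg_ceil_ref (int8_t a, int8_t b)
{
  return (int8_t) (((int16_t) a + (int16_t) b + 1) >> 1);
}

/* Scalar model of roundoff_signed (v, 1) with rnu: (v >> 1) + v[0].
   Hypothetical helper, not the hardware or GCC implementation.  */
static int8_t
avg_ceil_rnu (int8_t a, int8_t b)
{
  int16_t v = (int16_t) ((int16_t) a + (int16_t) b);
  return (int8_t) ((v >> 1) + (v & 1));
}

/* Count the input pairs for which the two formulations disagree.  */
static int
count_mismatches (void)
{
  int mismatches = 0;
  for (int a = INT8_MIN; a <= INT8_MAX; a++)
    for (int b = INT8_MIN; b <= INT8_MAX; b++)
      if (avg_ceil_ref ((int8_t) a, (int8_t) b)
          != avg_ceil_rnu ((int8_t) a, (int8_t) b))
        mismatches++;
  return mismatches;
}
```

Calling count_mismatches () should return 0 if the two forms agree for
every int8_t pair, and avg_ceil_rnu (-9, 2) reproduces the Case 2 value.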

The following test suites passed for this patch series.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (avg3_ceil): Add insn
expand to leverage vaadd with rnu directly.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md | 25 ++---
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index a54f552a80c..5ac7b62c2cf 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2510,25 +2510,12 @@ (define_expand "avg3_ceil"
(match_operand: 2 "register_operand")))
   (const_int 1)]
   "TARGET_VECTOR"
-{
-  /* First emit a widening addition.  */
-  rtx tmp1 = gen_reg_rtx (mode);
-  rtx ops1[] = {tmp1, operands[1], operands[2]};
-  insn_code icode = code_for_pred_dual_widen (PLUS, SIGN_EXTEND, mode);
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops1);
-
-  /* Then add 1.  */
-  rtx tmp2 = gen_reg_rtx (mode);
-  rtx ops2[] = {tmp2, tmp1, const1_rtx};
-  icode = code_for_pred_scalar (PLUS, mode);
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops2);
-
-  /* Finally, a narrowing shift.  */
-  rtx ops3[] = {operands[0], tmp2, const1_rtx};
-  icode = code_for_pred_narrow_scalar (ASHIFTRT, mode);
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
-  DONE;
-})
+  {
+    insn_code icode = code_for_pred (UNSPEC_VAADD, mode);
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP_VXRM_RNU, operands);
+    DONE;
+  }
+)
 
 ;; csrwi vxrm, 2
 ;; vaaddu.vv vd, vs2, vs1
-- 
2.43.0

[PATCH v1 0/3] Refine the avg_ceil with fixed point vaadd

From: Pan Li 

Similar to the avg_floor case, avg_ceil rounds towards +inf, and
vaadd.vv with the rnu rounding mode matches these semantics exactly.
From the RVV spec, for the fixed-point vaadd.vv with rnu:

roundoff_signed(v, d) = (signed(v) >> d) + r
r = v[d - 1]

For vaadd, d = 1, then we have

roundoff_signed(v, 1) = (signed(v) >> 1) + v[0]

If v[0] is 0, nothing needs to be done as there is no rounding.
If v[0] is 1, rounding occurs, with 2 cases.

Case 1: v is positive.
  roundoff_signed(v, 1) = (signed(v) >> 1) + 1, aka round towards +inf
  roundoff_signed(2 + 3, 1) = (5 >> 1) + 1 = 3

Case 2: v is negative.
  roundoff_signed(v, 1) = (signed(v) >> 1) + 1, aka round towards +inf
  roundoff_signed(-9 + 2, 1) = (-7 >> 1) + 1 = -4 + 1 = -3

Thus, we can leverage the vaadd with rnu directly for avg_ceil.

The following test suites passed for this patch series.
* The rv64gcv full regression test.

Pan Li (3):
  RISC-V: Leverage vaadd.vv for signed standard name avg_ceil
  RISC-V: Reconcile the existing test for avg_ceil
  RISC-V: Add test cases for avg_ceil vaadd implementation

 gcc/config/riscv/autovec.md   |  25 +--
 .../gcc.target/riscv/rvv/autovec/avg.h|  17 ++
 .../rvv/autovec/avg_ceil-1-i16-from-i32.c |  12 ++
 .../rvv/autovec/avg_ceil-1-i16-from-i64.c |  12 ++
 .../rvv/autovec/avg_ceil-1-i32-from-i64.c |  12 ++
 .../rvv/autovec/avg_ceil-1-i8-from-i16.c  |  12 ++
 .../rvv/autovec/avg_ceil-1-i8-from-i32.c  |  12 ++
 .../rvv/autovec/avg_ceil-1-i8-from-i64.c  |  12 ++
 .../rvv/autovec/avg_ceil-run-1-i16-from-i32.c |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i16-from-i64.c |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i32-from-i64.c |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i8-from-i16.c  |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i8-from-i32.c  |  16 ++
 .../rvv/autovec/avg_ceil-run-1-i8-from-i64.c  |  16 ++
 .../gcc.target/riscv/rvv/autovec/avg_data.h   | 176 ++
 .../gcc.target/riscv/rvv/autovec/vls/avg-4.c  |   6 +-
 .../gcc.target/riscv/rvv/autovec/vls/avg-5.c  |   6 +-
 .../gcc.target/riscv/rvv/autovec/vls/avg-6.c  |   6 +-
 .../riscv/rvv/autovec/widen/vec-avg-rv32gcv.c |   2 +-
 .../riscv/rvv/autovec/widen/vec-avg-rv64gcv.c |   2 +-
 20 files changed, 375 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i16-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i32-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-1-i8-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i32-from-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i64.c

-- 
2.43.0

Re: [PATCH] rtl-ssa: Reject non-address uses of autoinc regs [PR120347]

On Wed, May 28, 2025 at 6:55 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Thu, May 22, 2025 at 12:19 PM Richard Sandiford
> >  wrote:
> >>
> >> As the rtl.texi documentation of RTX_AUTOINC expressions says:
> >>
> >>   If a register used as the operand of these expressions is used in
> >>   another address in an insn, the original value of the register is
> >>   used.  Uses of the register outside of an address are not permitted
> >>   within the same insn as a use in an embedded side effect expression
> >>   because such insns behave differently on different machines and hence
> >>   must be treated as ambiguous and disallowed.
> >>
> >> late-combine was failing to follow this rule.  One option would have
> >> been to enforce it during the substitution phase, like combine does.
> >> This could either be a dedicated condition in the substitution code
> >> or, more generally, an extra condition in can_merge_accesses.
> >> (The latter would include extending is_pre_post_modify to uses.)
> >>
> >> However, since the restriction applies to patterns rather than to
> >> actions on patterns, the more robust fix seemed to be to test for and
> >> reject this case in (a subroutine of) rtl_ssa::recog.  We already do
> >> something similar for hard-coded register clobbers.
> >>
> >> Using vec_rtx_properties isn't the lightest-weight operation
> >> out there.  I did wonder about relying on the is_pre_post_modify
> >> flag of the definitions in the new_defs array, but that would
> >> require callers that create new autoincs to set the flag before
> >> calling recog.  Normally these flags are instead updated
> >> automatically based on the final pattern.
> >>
> >> Besides, recog itself has had to traverse the whole pattern,
> >> and it is even less light-weight than vec_rtx_properties.
> >> At least the pattern should be in cache.
> >>
> >> Tested on arm-linux-gnueabihf, aarch64-linux-gnu and
> >> x86_64-linux-gnu.  OK for trunk and backports?
> >
> > LGTM, note the 14 branch is currently frozen.
>
> Thanks.  It turns out that I looked at the wrong results for the
> arm-linux-gnueabihf testing :-(, and the Linaro CI flagged up a
> regression.  Although I think the rtl-ssa fix is still the right
> one, it showed up a mistake (of mine) in the rtl_properties walker:
> try_to_add_src would drop all flags except IN_NOTE before recursing
> into RTX_AUTOINC addresses.
>
> RTX_AUTOINCs only occur in addresses, and so for them, the flags coming
> into try_to_add_src are set by:
>
>   unsigned int base_flags = flags & rtx_obj_flags::STICKY_FLAGS;
>   ...
>   if (MEM_P (x))
>     {
>       ...
>
>       unsigned int addr_flags = base_flags | rtx_obj_flags::IN_MEM_STORE;
>       if (flags & rtx_obj_flags::IS_READ)
>         addr_flags |= rtx_obj_flags::IN_MEM_LOAD;
>       try_to_add_src (XEXP (x, 0), addr_flags);
>       return;
>     }
>
> This means that the only flags that can be set are:
>
> - IN_NOTE (the sole member of STICKY_FLAGS)
> - IN_MEM_STORE
> - IN_MEM_LOAD
>
> Thus dropping all flags except IN_NOTE had the effect of dropping
> IN_MEM_STORE and IN_MEM_LOAD, and nothing else.  But those flags
> are the ones that mark something as being part of a mem address.
> The exclusion was therefore exactly wrong.
>
> So is the patch OK with the extra rtlanal.cc hunk below?  I was wondering
> whether it would count as obvious, but the length of the explanation above
> suggests not :)
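
The flag argument quoted above can be modelled with plain bit masks.
This is a minimal sketch, not the rtlanal.cc code: the bit values are
made up, the helper names are hypothetical, and the "new" behaviour is
simply assumed to pass the incoming flags through unchanged, which is
one reading of the ChangeLog entry.

```c
#include <assert.h>

/* Hypothetical bit assignments; the real rtx_obj_flags values may differ.  */
enum
{
  IS_READ      = 1 << 0,
  IN_MEM_LOAD  = 1 << 1,
  IN_MEM_STORE = 1 << 2,
  IN_NOTE      = 1 << 3,
  STICKY_FLAGS = IN_NOTE   /* IN_NOTE is the sole sticky flag.  */
};

/* Flags handed to the address of a MEM, following the quoted snippet.  */
static unsigned int
mem_addr_flags (unsigned int flags)
{
  unsigned int base_flags = flags & STICKY_FLAGS;
  unsigned int addr_flags = base_flags | IN_MEM_STORE;
  if (flags & IS_READ)
    addr_flags |= IN_MEM_LOAD;
  return addr_flags;
}

/* Old behaviour as described: keep only IN_NOTE before recursing into
   an autoinc address.  */
static unsigned int
autoinc_flags_old (unsigned int addr_flags)
{
  return addr_flags & IN_NOTE;
}

/* Assumed new behaviour: keep the address flags.  */
static unsigned int
autoinc_flags_new (unsigned int addr_flags)
{
  return addr_flags;
}

/* A reference counts as part of an address if either IN_MEM flag is set.  */
static int
in_address (unsigned int flags)
{
  return (flags & (IN_MEM_LOAD | IN_MEM_STORE)) != 0;
}
```

In this model, in_address (autoinc_flags_old (f)) is false for every f
produced by mem_addr_flags, so an autoinc register would never be
recognised as an address use, which is the mistake described above.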

Yes, the patch is OK.  The 14 branch is unfrozen, the 13 branch is frozen now.

Richard.

> Richard
>
>
> gcc/
> PR rtl-optimization/120347
> * rtlanal.cc (rtx_properties::try_to_add_src): Don't drop the
> IN_MEM_LOAD and IN_MEM_STORE flags for autoinc registers.
> * rtl-ssa/changes.cc (recog_level2): Check whether an
> RTX_AUTOINCed register also appears outside of an address.
>
> gcc/testsuite/
> PR rtl-optimization/120347
> * gcc.dg/torture/pr120347.c: New test.
> ---
>  gcc/rtl-ssa/changes.cc  | 18 ++
>  gcc/rtlanal.cc  |  2 +-
>  gcc/testsuite/gcc.dg/torture/pr120347.c | 13 +
>  3 files changed, 32 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr120347.c
>
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index eb579ad3ad7..f7aa6a66cdf 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -1106,6 +1106,24 @@ recog_level2 (insn_change &change, add_regno_clobber_fn add_regno_clobber)
> }
> }
>
> +  // Per rtl.texi, registers that are modified using RTX_AUTOINC operations
> +  // cannot also appear outside an address.
> +  vec_rtx_properties properties;
> +  properties.add_pattern (pat);
> +  for (rtx_obj_reference def : properties.refs ())
> +if (def.is_pre_post_modify ())
> +  for (rtx_obj_reference use : properties.refs ())
> +   if (def.regno == use.regno && !use.in_address ())
> + {
> +   if

Re: [PATCH] expmed: Prevent non-canonical subreg generation in store_bit_field [PR118873]

On Thu, May 29, 2025 at 12:27 PM Konstantinos Eleftheriou
 wrote:
>
> Hi Richard, thanks for the response.
>
> On Mon, May 26, 2025 at 11:55 AM Richard Biener  wrote:
> >
> > On Mon, 26 May 2025, Konstantinos Eleftheriou wrote:
> >
> > > In `store_bit_field_1`, when the value to be written in the bitfield
> > > and/or the bitfield itself have vector modes, non-canonical subregs
> > > > are generated, like `(subreg:V4SI (reg:V8SI x) 0)`. If one of them is
> > > > a scalar, this happens only when the scalar mode is different from the
> > > > vector's inner mode.
> > >
> > > This patch tries to prevent this, using vec_set patterns when
> > > possible.
> >
> > I know almost nothing about this code, but why does the patch
> > fixup things after the fact rather than avoid generating the
> > SUBREG in the first place?
>
> That's what we are doing, we are trying to prevent the non-canonical
> subreg generation (it's not always possible). But, there are cases
> where these types of subregs are passed into `store_bit_field` by its
> caller, in which case we choose not to touch them.
>
> > ISTR it also (unfortunately) depends on the target which forms
> > are considered canonical.
>
> But, the way that we interpret the documentation, the
> canonicalizations are machine-independent. Is that not true? Or,
> specifically for the subregs that operate on vectors, is there any
> target that considers them canonical?
>
> > I'm also not sure you got endianness right for all possible
> > values of SUBREG_BYTE.  One more reason to not generate such
> > subreg in the first place but stick to vec_select/concat.
>
> The only way that we would generate subregs is from the calls to
> `extract_bit_field` or `store_bit_field_1` and these should handle the
> endianness. Also, these subregs wouldn't operate on vectors. Do you
> mean that something could go wrong with these calls?

I wanted to remark that endianness WRT memory order (which is
what store/extract_bit_field deal with) isn't always the same as
endianness in register order (which is what vec_concat and friends
operate on).  If we can avoid transitioning from one to the other
this will help avoid mistakes.

In general it would be more obvious (to me) if you fixed the callers
that create those subregs.

Now, I didn't want to pretend I'm reviewing the patch - so others please
do that (as said, I'm not familiar enough with the code to tell whether
it's actually correct).

Richard.

>
> Konstantinos
>
>
> > Richard.
> >
> > > Bootstrapped/regtested on AArch64 and x86_64.
> > >
> > >   PR rtl-optimization/118873
> > >
> > > gcc/ChangeLog:
> > >
> > >   * expmed.cc (generate_vec_concat): New function.
> > >   (store_bit_field_1): Check for cases where the value
> > >   to be written and/or the bitfield have vector modes
> > >   and try to generate the corresponding vec_set patterns
> > >   instead of subregs.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/i386/pr118873.c: New test.
> > > ---
> > >  gcc/expmed.cc| 174 ++-
> > >  gcc/testsuite/gcc.target/i386/pr118873.c |  33 +
> > >  2 files changed, 200 insertions(+), 7 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr118873.c
> > >
> > > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> > > index 8cf10d9c73bf..8c641f55b9c6 100644
> > > --- a/gcc/expmed.cc
> > > +++ b/gcc/expmed.cc
> > > @@ -740,6 +740,42 @@ store_bit_field_using_insv (const extraction_insn 
> > > *insv, rtx op0,
> > >return false;
> > >  }
> > >
> > > > +/* Helper function for store_bit_field_1, used in the case that the bitfield
> > > > +   and the destination are both vectors.  It extracts the elements of OP from
> > > > +   LOWER_BOUND to UPPER_BOUND using a vec_select and uses a vec_concat to
> > > > +   concatenate the extracted elements with the VALUE.  */
> > > +
> > > > +rtx
> > > > +generate_vec_concat (machine_mode fieldmode, rtx op, rtx value,
> > > > +                     HOST_WIDE_INT lower_bound,
> > > > +                     HOST_WIDE_INT upper_bound)
> > > > +{
> > > > +  if (!VECTOR_MODE_P (fieldmode))
> > > > +    return NULL_RTX;
> > > > +
> > > > +  rtvec vec = rtvec_alloc (GET_MODE_NUNITS (fieldmode).to_constant ());
> > > > +  machine_mode outermode = GET_MODE (op);
> > > > +
> > > > +  for (HOST_WIDE_INT i = lower_bound; i < upper_bound; ++i)
> > > > +    RTVEC_ELT (vec, i) = GEN_INT (i);
> > > > +  rtx par = gen_rtx_PARALLEL (VOIDmode, vec);
> > > > +  rtx select = gen_rtx_VEC_SELECT (fieldmode, op, par);
> > > > +  if (BYTES_BIG_ENDIAN)
> > > > +    {
> > > > +      if (lower_bound > 0)
> > > > +        return gen_rtx_VEC_CONCAT (outermode, select, value);
> > > > +      else
> > > > +        return gen_rtx_VEC_CONCAT (outermode, value, select);
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      if (lower_bound > 0)
> > > > +        return gen_rtx_VEC_CONCAT (outermode, value, select);
> > > > +      else
> > > > +        return gen_rtx_VEC_CONCAT (outermode, select, value);
> > > > +    }
> >

Re: [PATCH] Fix crash with constant initializer caused by IPA

On Thu, May 29, 2025 at 11:38 AM Eric Botcazou  wrote:
>
> Hi,
>
> the attached Ada testcase compiled with -O2 -gnatn makes the compiler crash in
> vect_can_force_dr_alignment_p during SLP vectorization:
>
>   if (decl_in_symtab_p (decl)
>       && !symtab_node::get (decl)->can_increase_alignment_p ())
>     return false;
>
> because symtab_node::get (decl) returns a null node.  The phenomenon occurs
> for a pair of twin symbols listed like so in .cgraph:
>
> Opt7_Pkg.T12b/17 (Opt7_Pkg.T12b)
>   Type: variable definition analyzed
>   Visibility: semantic_interposition external public artificial
>   Aux: @0x44d45e0
>   References:
>   Referring: opt7_pkg__enum_name_table/13 (addr) opt7_pkg__enum_name_table/13
> (addr)
>   Availability: not-ready
>   Varpool flags: initialized read-only const-value-known
>
> Opt7_Pkg.T8b/16 (Opt7_Pkg.T8b)
>   Type: variable definition analyzed
>   Visibility: semantic_interposition external public artificial
>   Aux: @0x7f9fda3fff00
>   References:
>   Referring: opt7_pkg__enum_name_table/13 (addr) opt7_pkg__enum_name_table/13
> (addr)
>   Availability: not-ready
>   Varpool flags: initialized read-only const-value-known
>
> with:
>
> opt7_pkg__enum_name_table/13 (Opt7_Pkg.Enum_Name_Table)
>   Type: variable definition analyzed
>   Visibility: semantic_interposition external public
>   Aux: @0x44d45e0
>   References: Opt7_Pkg.T8b/16 (addr) Opt7_Pkg.T8b/16 (addr) Opt7_Pkg.T12b/17
> (addr) Opt7_Pkg.T12b/17 (addr)
>   Referring: opt7_pkg__image/2 (read) opt7_pkg__image/2 (read)
> opt7_pkg__image/2 (read) opt7_pkg__image/2 (read) opt7_pkg__image/2 (read)
> opt7_pkg__image/2 (read) opt7_pkg__image/2 (read) opt7_pkg__image/2 (read)
>   Availability: not-ready
>   Varpool flags: initialized read-only const-value-known
>
> being the crux of the matter.
>
> What happens is that symtab_remove_unreachable_nodes leaves the last symbol in
> kind of a limbo state: in .remove_symbols, we have:
>
> opt7_pkg__enum_name_table/13 (Opt7_Pkg.Enum_Name_Table)
>   Type: variable
>   Body removed by symtab_remove_unreachable_nodes
>   Visibility: externally_visible semantic_interposition external public
>   References:
>   Referring: opt7_pkg__image/2 (read) opt7_pkg__image/2 (read)
>   Availability: not_available
>   Varpool flags: initialized read-only const-value-known
>
> This means that the "body" (DECL_INITIAL) of the symbol has been disregarded
> during reachability analysis, causing the first two symbols to be discarded:
>
> Reclaiming variables: Opt7_Pkg.T12b/17 Opt7_Pkg.T8b/16
>
> but the DECL_INITIAL is explicitly preserved for later constant folding, which
> makes it possible to retrofit the DECLs corresponding to the first two symbols
> in the GIMPLE IR and ultimately leads vect_can_force_dr_alignment_p to crash.
>
>
> The decision to disregard the "body" (DECL_INITIAL) of the symbol is made in
> the first process_references present in ipa.cc:
>
>   if (node->definition && !node->in_other_partition
>       && ((!DECL_EXTERNAL (node->decl) || node->alias)
>           || (possible_inline_candidate_p (node)
>               /* We use variable constructors during late compilation for
>                  constant folding.  Keep references alive so partitioning
>                  knows about potential references.  */
>               || (VAR_P (node->decl)
>                   && (flag_wpa
>                       || flag_incremental_link
>                          == INCREMENTAL_LINK_LTO)
>                   && dyn_cast <varpool_node *> (node)
>                        ->ctor_useable_for_folding_p ()
>
> because neither flag_wpa nor flag_incremental_link == INCREMENTAL_LINK_LTO is
> true, while the decision to ultimately preserve the DECL_INITIAL is made later
> in remove_unreachable_nodes:
>
>   /* Keep body if it may be useful for constant folding. */
>   if ((flag_wpa || flag_incremental_link == INCREMENTAL_LINK_LTO)
>       || ((init = ctor_for_folding (vnode->decl)) == error_mark_node))
>     vnode->remove_initializer ();
>   else
>     DECL_INITIAL (vnode->decl) = init;
>
>
> I think that the testcase shows that the "body" of ctor_useable_for_folding_p
> symbols must always be considered for reachability analysis (which could make
> the above test on ctor_for_folding useless).  But implementing that introduces
> a regression for g++.dg/ipa/devirt-39.C, because the vtable is preserved and
> in turn forces the method to be preserved, hence the special case for vtables.
>
> The patch also renames the first process_references function in ipa.cc to
> clear up the confusion with the second function in the same file.
>
> Bootstrapped/regtested on x86-64/Linux, OK for the mainline?

Ah, I've run into the same issue with IPA PTA recently, unfortunately Honza
seems unresponsive in bugzilla.  IMO the patch looks OK, but let's give Honza
the chance to chime in here - esp. the DECL_VIRTUAL special-casing is
sth I'm not familiar with (wouldn't this apply to all COMDATs?  but only
considering w

Re: [PATCH 1/2] forwprop: Change test in loop of optimize_memcpy_to_memset

On Thu, May 29, 2025 at 11:48 PM Andrew Pinski  wrote:
>
> On Tue, May 27, 2025 at 5:14 AM Richard Biener
>  wrote:
> >
> > On Tue, May 27, 2025 at 5:02 AM Andrew Pinski  
> > wrote:
> > >
> > > > This was noticed in the review of the copy propagation for aggregates
> > > > patch: instead of checking for a NULL or non-SSA-name vuse, we
> > > > should check whether the vuse is a default definition and stop
> > > > then.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu.
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Change check
> > > from NULL/non-ssa name to default name.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/tree-ssa-forwprop.cc | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > > index 4c048a9a298..e457a69ed48 100644
> > > --- a/gcc/tree-ssa-forwprop.cc
> > > +++ b/gcc/tree-ssa-forwprop.cc
> > > @@ -1226,7 +1226,8 @@ optimize_memcpy_to_memset (gimple_stmt_iterator 
> > > *gsip, tree dest, tree src, tree
> > >gimple *defstmt;
> > >unsigned limit = param_sccvn_max_alias_queries_per_access;
> > >do {
> > > -if (vuse == NULL || TREE_CODE (vuse) != SSA_NAME)
> > > +/* If the vuse is the default definition, then there is no stores 
> > > beforhand. */
> > > +if (SSA_NAME_IS_DEFAULT_DEF (vuse))
> >
> > Since forwprop does update_ssa in the end I was wondering whether any
> > bare non-SSA VUSE/VDEFs sneak in - for this the != SSA_NAME check
> > would be useful.  On a GIMPLE stmt gimple_vuse () will return NULL
> > when it's not a load or store (or with a novops call), as you are using
> > gimple_store_p/gimple_assign_load_p there might be a disconnect
> > between those predicates and the presence of a vuse (I hope not, but ...)
> >
> > The patch looks OK to me, the comments above apply to the copy propagation 
> > case.
>
> The copy prop case should be ok too since the vuse/vdef on the
> statement does not change when doing the prop; only the rhs of the
> statement. There is no inserting of a statement.  This is unless we
> remove the statement and then unlink_stmt_vdef will prop the vuse into
> the vdef of the statement which we are removing.
>
> I did test the copy prop using just SSA_NAME_IS_DEFAULT_DEF and there
> were no regressions there either.
>
> When optimize_memcpy_to_memset was part of fold_stmt, a NULL vuse
> and/or a non-SSA vuse was common due to running before ssa. This was
> why there was a check for non-SSA.
> I am not sure why a check for NULLness was there when it was
> part of fold-all-builtins though.
>
> On a side note I think many passes have TODO_update_ssa on them when
> they already keep the ssa up to date now. I wonder if most of that
> dates from the days of VMUST_DEF/VMAY_DEF and multiple names on them
> rather than one virtual name.

Could be.  TODO_update_ssa is cheap when nothing is to be done, but of
course it hides missed SSA updates.  Getting rid of unnecessary ones would
be nice.

Richard.

>
> Thanks,
> Andrew
>
>
> >
> > Thanks,
> > Richard.
> >
> > >return false;
> > >  defstmt = SSA_NAME_DEF_STMT (vuse);
> > >  if (is_a (defstmt))
> > > --
> > > 2.43.0
> > >
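
To make the loop condition discussed in this thread concrete, here is a toy model of walking a virtual use-def chain backwards until the default definition is reached. It is only a sketch: `VDef`, `is_default_def`, `stores_to_target` and `walk_to_store` are hypothetical names and do not correspond to GCC's internal API, and the `limit` argument merely plays the role of the query limit seen in the quoted hunk.

```cpp
#include <cassert>

// Toy stand-in for a virtual SSA name (hypothetical, not a GCC type).
struct VDef
{
  const VDef *prev;      // virtual operand used by the defining statement
  bool is_default_def;   // true for the name that is live on function entry
  bool stores_to_target; // true if the defining statement is the store we want
};

// Walk backwards from VUSE (assumed non-null).  Return the first definition
// that stores to the target, or nullptr if the default definition or the
// query limit is reached first.
const VDef *
walk_to_store (const VDef *vuse, unsigned limit)
{
  while (limit-- > 0)
    {
      // A default definition has no defining store: nothing precedes it.
      if (vuse->is_default_def)
        return nullptr;
      if (vuse->stores_to_target)
        return vuse;
      vuse = vuse->prev;
    }
  return nullptr;
}
```

In this model a single default-definition test is enough to terminate the walk, provided every name on the chain is a proper SSA name, which is the assumption debated above.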

Re: [PATCH] scc_copy: conditional return TODO_cleanup_cfg.

On Fri, May 30, 2025 at 3:53 AM Andrew Pinski  wrote:
>
> Only have cleanup cfg happen if scc copy did some propagation.
> This should be a small compile time improvement by not doing cleanup
> cfg if scc copy does nothing.
>
> Also removes TODO_update_ssa since it should not be needed.

OK.

Richard.

> gcc/ChangeLog:
>
> * gimple-ssa-sccopy.cc (scc_copy_prop::replace_scc_by_value): Return 
> true
> if something was done.
> (scc_copy_prop::propagate): Return true if something was changed.
> (pass_sccopy::execute): Return TODO_cleanup_cfg if a prop happened.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-ssa-sccopy.cc | 20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> index ee2a7fa8a72..c93374572a9 100644
> --- a/gcc/gimple-ssa-sccopy.cc
> +++ b/gcc/gimple-ssa-sccopy.cc
> @@ -464,7 +464,7 @@ class scc_copy_prop
>  public:
>scc_copy_prop ();
>~scc_copy_prop ();
> -  void propagate ();
> +  bool propagate ();
>
>  private:
>/* Bitmap tracking statements which were propagated so that they can be
> @@ -474,15 +474,16 @@ private:
>void visit_op (tree op, hash_set &outer_ops,
> hash_set &scc_set, bool &is_inner,
> tree &last_outer_op);
> -  void replace_scc_by_value (vec scc, tree val);
> +  bool replace_scc_by_value (vec scc, tree val);
>  };
>
>  /* For each statement from given SCC, replace its usages by value
> VAL.  */
>
> -void
> +bool
>  scc_copy_prop::replace_scc_by_value (vec scc, tree val)
>  {
> +  bool didsomething = false;
>for (gimple *stmt : scc)
>  {
>tree name = gimple_get_lhs (stmt);
> @@ -497,10 +498,12 @@ scc_copy_prop::replace_scc_by_value (vec scc, 
> tree val)
> }
>replace_uses_by (name, val);
>bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
> +  didsomething = true;
>  }
>
>if (dump_file)
>  fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
> +  return didsomething;
>  }
>
>  /* Part of 'scc_copy_prop::propagate ()'.  */
> @@ -566,9 +569,10 @@ scc_copy_prop::visit_op (tree op, hash_set 
> &outer_ops,
>   Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
>   Section 3.2.  */
>
> -void
> +bool
>  scc_copy_prop::propagate ()
>  {
> +  bool didsomething = false;
>auto_vec useful_stmts = get_all_stmt_may_generate_copy ();
>scc_discovery discovery;
>
> @@ -636,7 +640,7 @@ scc_copy_prop::propagate ()
> {
>   /* The only operand in outer_ops.  */
>   tree outer_op = last_outer_op;
> - replace_scc_by_value (scc, outer_op);
> + didsomething |= replace_scc_by_value (scc, outer_op);
> }
>else if (outer_ops.elements () > 1)
> {
> @@ -651,6 +655,7 @@ scc_copy_prop::propagate ()
>
>scc.release ();
>  }
> +  return didsomething;
>  }
>
>  scc_copy_prop::scc_copy_prop ()
> @@ -683,7 +688,7 @@ const pass_data pass_data_sccopy =
>0, /* properties_provided */
>0, /* properties_destroyed */
>0, /* todo_flags_start */
> -  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +  0, /* todo_flags_finish */
>  };
>
>  class pass_sccopy : public gimple_opt_pass
> @@ -703,8 +708,7 @@ unsigned
>  pass_sccopy::execute (function *)
>  {
>scc_copy_prop sccopy;
> -  sccopy.propagate ();
> -  return 0;
> +  return sccopy.propagate () ?  TODO_cleanup_cfg : 0;
>  }
>
>  } // anon namespace
> --
> 2.43.0
>

RE: [PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2025-05-29 Thread Yuta Mukai (Fujitsu)

Hi Kyrill-san
Thank you for the review and for pushing.
Yuta

> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Thursday, May 29, 2025 6:45 PM
> To: Mukai, Yuta/向井 優太 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; 
> andre.simoesdiasvie...@arm.com
> Subject: Re: [PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA 
> (-mcpu=fujitsu-monaka) CPU
> 
> 
> 
> > On 28 May 2025, at 13:36, Kyrylo Tkachov  wrote:
> >
> > Hi Yuta-san
> >
> >> On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu)  
> >> wrote:
> >>
> >> Hello,
> >>
> >> We would like to enable features for FUJITSU-MONAKA that were implemented 
> >> in GCC after we added support for
> FUJITSU-MONAKA.
> >> As the features were implemented in GCC15, we also want to backport it to 
> >> GCC15.
> >>
> >> Thanks to Andre Vieira for notifying us.
> >>
> >> Bootstrapped/regtested on aarch64-unknown-linux-gnu.
> >>
> >> We would be grateful if someone could push this on our behalf, as we do 
> >> not have write access.
> >
> > Thanks, this is ok and I’ve pushed it to trunk with an adjusted ChangeLog 
> > entry.
> > I’ll push a backport to the GCC 15 branch next week after some simple smoke 
> > testing.
> 
> I found a bit of time and bootstrapped a backport.
> So pushed to the GCC 15 branch as well.
> Thanks again,
> Kyrill
> 
> 
> >
> > Kyrill
> >
> >   2025-05-23  Yuta Mukai  
> >
> >   gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-cores.def (fujitsu-monaka): Update ISA
> >   features.
> >
> >>
> >> Thanks,
> >> Yuta
> >> --
> >> Yuta Mukai
> >> Fujitsu Limited
> >>
> >> <0001-aarch64-Enable-newly-implemented-features-for-FUJITS.patch>
>

Re: [PATCH 3/3] OpenMP: Handle more cases in user/condition selector

2025-05-29 Thread Sandra Loosemore


On 5/29/25 02:51, Tobias Burnus wrote:

@Jason – The idea is to make semantics.cc's maybe_convert_cond callable
from parser.cc + pt.cc, i.e. to make it a non-static function.
Any reason not to do so?

Sandra Loosemore wrote:


[…] By using the existing front-end
hooks for the implicit conversion to bool in conditional expressions,
we also get free support for using a C++ class object that has a bool
conversion operator in the user/condition selector.


Can you also add type-dependent testcases? They seem to work fine,
but are missing. Like

template<typename T, typename T2>
void f(T x, T2 y) {
  #pragma omp metadirective when(user={condition(x)}, target_device={device_num(y)} : flush)
}

plus calls to them.

* * *

In parser.cc and pt.cc, you don't call maybe_convert_cond (because it is
currently inaccessible) - but by calling some of its ingredients, you
bypass some code.
I discussed with Jakub and the idea is (see top of the page) to make
maybe_convert_cond non-static and use it instead.

Additionally, it seems as if we should add

   if (!processing_template_decl)
     t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);

as we do for the other clauses.



Like the attached V2 patch?

-Sandra

From 802bbefdf57548cee0e5aaab518b95a99aa26593 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Fri, 30 May 2025 03:14:35 +
Subject: [PATCH V2] OpenMP: Handle more cases in user/condition selector

Tobias had noted that the C front end was not treating C23 constexprs
as constant in the user/condition selector property, which led to
missed opportunities to resolve metadirectives at parse time.
Additionally neither C nor C++ was permitting the expression to have
pointer or floating-point type -- the former being a common idiom in
other C/C++ conditional expressions.  By using the existing front-end
hooks for the implicit conversion to bool in conditional expressions,
we also get free support for using a C++ class object that has a bool
conversion operator in the user/condition selector.
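
As a small illustration of the conversions this commit message refers to, the sketch below shows the contextual conversion to bool that C++ applies to a condition, for a pointer, a floating-point value, and a class object with a bool conversion operator. `Ready` and `truth` are hypothetical names chosen for the example; no OpenMP directive is involved, only the language rule that the selector is described as reusing.

```cpp
#include <cassert>

// Hypothetical class with a bool conversion operator.
struct Ready
{
  int pending;
  explicit operator bool () const { return pending == 0; }
};

// Apply the same contextual conversion to bool that a condition would.
template<typename T>
bool
truth (const T &x)
{
  return x ? true : false;
}
```

An explicit conversion operator is enough here because a condition is a contextual conversion, which is why such class objects can appear where a plain bool expression is expected.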

gcc/c/ChangeLog
	* c-parser.cc (c_parser_omp_context_selector): Call
	convert_lvalue_to_rvalue and c_objc_common_truthvalue_conversion
	on the expression for OMP_TRAIT_PROPERTY_BOOL_EXPR.

gcc/cp/ChangeLog
	* cp-tree.h (maybe_convert_cond): Declare.
	* parser.cc (cp_parser_omp_context_selector): Call
	maybe_convert_cond and fold_build_cleanup_point_expr on the
	expression for OMP_TRAIT_PROPERTY_BOOL_EXPR.
	* pt.cc (tsubst_omp_context_selector): Likewise.
	* semantics.cc (maybe_convert_cond): Remove static declaration.

gcc/testsuite/ChangeLog
	* c-c++-common/gomp/declare-variant-2.c: Update expected output.
	* c-c++-common/gomp/metadirective-condition-constexpr.c: New.
	* c-c++-common/gomp/metadirective-condition.c: New.
	* c-c++-common/gomp/metadirective-error-recovery.c: Update expected
	output.
	* g++.dg/gomp/metadirective-condition-class.C: New.
	* g++.dg/gomp/metadirective-condition-template.C: New.
---
 gcc/c/c-parser.cc | 19 ++--
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/parser.cc  | 21 +++--
 gcc/cp/pt.cc  | 30 ++---
 gcc/cp/semantics.cc   |  3 +-
 .../c-c++-common/gomp/declare-variant-2.c |  2 +-
 .../gomp/metadirective-condition-constexpr.c  | 13 ++
 .../gomp/metadirective-condition.c| 25 +++
 .../gomp/metadirective-error-recovery.c   |  9 +++-
 .../gomp/metadirective-condition-class.C  | 43 +++
 .../gomp/metadirective-condition-template.C   | 41 ++
 11 files changed, 188 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-condition-constexpr.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-condition.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/metadirective-condition-class.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/metadirective-condition-template.C

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 4144aa17fde..e11e6034461 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -26865,17 +26865,30 @@ c_parser_omp_context_selector (c_parser *parser, enum omp_tss_code set,
 	  break;
 	case OMP_TRAIT_PROPERTY_DEV_NUM_EXPR:
 	case OMP_TRAIT_PROPERTY_BOOL_EXPR:
-	  t = c_parser_expr_no_commas (parser, NULL).value;
+	  {
+		c_expr texpr = c_parser_expr_no_commas (parser, NULL);
+		texpr = convert_lvalue_to_rvalue (token->location, texpr,
+		  true, true);
+		t = texpr.value;
+	  }
 	  if (t == error_mark_node)
 		return error_mark_node;
 	  mark_exp_read (t);
-	  t = c_fully_fold (t, false, NULL);
-	  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
+	  if (property_kind == OMP_TRAIT_PROPERTY_BOOL_EXPR)
+		{
+		  t = c_objc_common_truthvalue_conversion (token->location,
+			   t,
+			   boolean_type_node);
+		  if (t == error_mark_node)
+		return error_mark_node;
+		}
+	  else if (!INTEGRAL_TYPE_P (TREE

[PATCH] scc_copy: conditional return TODO_cleanup_cfg.

2025-05-29 Thread Andrew Pinski

Only have cleanup cfg happen if scc copy did some propagation.
This should be a small compile time improvement by not doing cleanup
cfg if scc copy does nothing.

Also removes TODO_update_ssa since it should not be needed.

gcc/ChangeLog:

* gimple-ssa-sccopy.cc (scc_copy_prop::replace_scc_by_value): Return 
true
if something was done.
(scc_copy_prop::propagate): Return true if something was changed.
(pass_sccopy::execute): Return TODO_cleanup_cfg if a prop happened.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-ssa-sccopy.cc | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index ee2a7fa8a72..c93374572a9 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -464,7 +464,7 @@ class scc_copy_prop
 public:
   scc_copy_prop ();
   ~scc_copy_prop ();
-  void propagate ();
+  bool propagate ();
 
 private:
   /* Bitmap tracking statements which were propagated so that they can be
@@ -474,15 +474,16 @@ private:
   void visit_op (tree op, hash_set &outer_ops,
hash_set &scc_set, bool &is_inner,
tree &last_outer_op);
-  void replace_scc_by_value (vec scc, tree val);
+  bool replace_scc_by_value (vec scc, tree val);
 };
 
 /* For each statement from given SCC, replace its usages by value
VAL.  */
 
-void
+bool
 scc_copy_prop::replace_scc_by_value (vec scc, tree val)
 {
+  bool didsomething = false;
   for (gimple *stmt : scc)
 {
   tree name = gimple_get_lhs (stmt);
@@ -497,10 +498,12 @@ scc_copy_prop::replace_scc_by_value (vec scc, 
tree val)
}
   replace_uses_by (name, val);
   bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
+  didsomething = true;
 }
 
   if (dump_file)
 fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
+  return didsomething;
 }
 
 /* Part of 'scc_copy_prop::propagate ()'.  */
@@ -566,9 +569,10 @@ scc_copy_prop::visit_op (tree op, hash_set 
&outer_ops,
  Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
  Section 3.2.  */
 
-void
+bool
 scc_copy_prop::propagate ()
 {
+  bool didsomething = false;
   auto_vec useful_stmts = get_all_stmt_may_generate_copy ();
   scc_discovery discovery;
 
@@ -636,7 +640,7 @@ scc_copy_prop::propagate ()
{
  /* The only operand in outer_ops.  */
  tree outer_op = last_outer_op;
- replace_scc_by_value (scc, outer_op);
+ didsomething |= replace_scc_by_value (scc, outer_op);
}
   else if (outer_ops.elements () > 1)
{
@@ -651,6 +655,7 @@ scc_copy_prop::propagate ()
 
   scc.release ();
 }
+  return didsomething;
 }
 
 scc_copy_prop::scc_copy_prop ()
@@ -683,7 +688,7 @@ const pass_data pass_data_sccopy =
   0, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
+  0, /* todo_flags_finish */
 };
 
 class pass_sccopy : public gimple_opt_pass
@@ -703,8 +708,7 @@ unsigned
 pass_sccopy::execute (function *)
 {
   scc_copy_prop sccopy;
-  sccopy.propagate ();
-  return 0;
+  return sccopy.propagate () ?  TODO_cleanup_cfg : 0;
 }
 
 } // anon namespace
-- 
2.43.0
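
The pattern used by the patch above, where the worker reports whether it changed anything and the driver requests follow-up work only in that case, can be sketched in isolation as follows. This is a toy under stated assumptions: `TODO_CLEANUP`, `replace_all` and `execute_pass` are hypothetical names and are not part of GCC.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flag standing in for TODO_cleanup_cfg.
constexpr unsigned TODO_CLEANUP = 1u << 0;

// Replace every occurrence of FROM by TO; report whether anything changed.
bool
replace_all (std::vector<int> &values, int from, int to)
{
  bool didsomething = false;
  for (int &v : values)
    if (v == from)
      {
        v = to;
        didsomething = true;
      }
  return didsomething;
}

// Request the follow-up cleanup only if the transformation did some work.
unsigned
execute_pass (std::vector<int> &values)
{
  return replace_all (values, 1, 2) ? TODO_CLEANUP : 0;
}
```

Running `execute_pass` a second time on the same data returns 0, so the cleanup cost is paid only when there was something to clean up.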

Re: [PATCH v3] libstdc++: Implement stringstream from string_view [PR119741]

2025-05-29 Thread Jonathan Wakely


On 29/05/25 09:50 -0400, Nathan Myers wrote:

Change in V3:
* Comment that p2495 specifies a drive-by constraint omitted as redundant
* Adjust whitespace to fit in 80 columns

Change in V2:
* apply all review comments
* remove redundant drive-by "requires" on ctor from string allocator arg
* check allocators are plumbed through

-- >8 --

Implement PR libstdc++/119741 (P2495R3)
Add constructors to stringbuf, stringstream, istringstream,
and ostringstream, and a matching overload of str(sv) in each,
that take anything convertible to a string_view in places
where the existing functions take a string.

libstdc++-v3/ChangeLog:

PR libstdc++/119741
* include/std/sstream: full implementation, really just
decls, requires clause and plumbing.
* include/bits/version.def, include/bits/version.h:
new preprocessor symbol
__cpp_lib_sstream_from_string_view.
* testsuite/27_io/basic_stringbuf/cons/char/3.cc: New tests.
* testsuite/27_io/basic_istringstream/cons/char/2.cc: New tests.
* testsuite/27_io/basic_ostringstream/cons/char/4.cc: New tests.
* testsuite/27_io/basic_stringstream/cons/char/2.cc: New tests.


Historically we just named most tests as 1.cc, 2.cc, 3.cc but it's not
very helpful when you see "FAIL: .../1.cc" in the test logs, so since
these new tests are for specific new constructors, I think it would
make more sense for them to all be named string_view.cc

That makes it very clear that they're testing construction from
string_view.

Please add some wchar_t tests too, i.e. cons/wchar_t/string_view.cc
That ensures that the impl doesn't accidentally use char_traits<char>
where it should be char_traits<_CharT> or anything like that. If the
tests only use the char specializations then we wouldn't notice. The
wchar_t tests can be copies of the char ones, with char -> wchar_t
substituted.

Or if you want to avoid copy&pasting the whole test, you could use C
instead of char and do this in the char/*.cc versions:

#ifndef C
#define C char
#endif

and then in the wchar_t/*.cc versions do:

#define C wchar_t
#include "../char/string_view.cc"

So that you reuse the char ones.
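
A minimal sketch of what such a character-type-parameterized test body could look like is shown below. The helpers `lit` and `first_word` are hypothetical and not part of the patch or the testsuite, and only the long-standing constructor from a string is used, so the sketch does not depend on the new string_view overloads.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

// The wchar_t variant of the test would define C before including this file.
#ifndef C
# define C char
#endif

// Hypothetical helper: widen a narrow literal to basic_string<C>.
std::basic_string<C>
lit (const char *s)
{
  return std::basic_string<C> (s, s + std::strlen (s));
}

// Round-trip a string through basic_stringstream<C> and read one word back.
std::basic_string<C>
first_word (const std::basic_string<C> &text)
{
  std::basic_stringstream<C> ss (text);
  std::basic_string<C> word;
  ss >> word;
  return word;
}
```

Because every string goes through `lit`, the same source compiles unchanged with `C` defined as `wchar_t`, which is the point of the include trick described above.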


---
libstdc++-v3/include/bits/version.def |  11 +-
libstdc++-v3/include/bits/version.h   |  10 +
libstdc++-v3/include/std/sstream  | 200 --
.../27_io/basic_istringstream/cons/char/2.cc  | 187 
.../27_io/basic_ostringstream/cons/char/4.cc  | 186 
.../27_io/basic_stringbuf/cons/char/3.cc  | 196 +
.../27_io/basic_stringstream/cons/char/2.cc   | 196 +
7 files changed, 967 insertions(+), 19 deletions(-)
create mode 100644 
libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/2.cc
create mode 100644 
libstdc++-v3/testsuite/27_io/basic_ostringstream/cons/char/4.cc
create mode 100644 libstdc++-v3/testsuite/27_io/basic_stringbuf/cons/char/3.cc
create mode 100644 
libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/2.cc

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 282667eabda..8172bcd4e26 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -649,7 +649,7 @@ ftms = {
  };
  values = {
v = 1;
-/* For when there's no gthread.  */
+// For when there is no gthread.
cxxmin = 17;
hosted = yes;
gthread = no;
@@ -1945,6 +1945,15 @@ ftms = {
  };
};

+ftms = {
+  name = sstream_from_string_view;
+  values = {
+v = 202302;


The correct value for this macro is 202306, see [version.syn] in the
working draft, or SD-6:
https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations#__cpp_lib_sstream_from_string_view
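
For context, user code would typically test this macro as in the sketch below, which is why its value matters. The function name `first_token` is hypothetical; the fallback branch assumes only the pre-existing constructor from `std::string`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <string_view>

// Read the first whitespace-delimited token of SV.
std::string
first_token (std::string_view sv)
{
#if defined(__cpp_lib_sstream_from_string_view) \
    && __cpp_lib_sstream_from_string_view >= 202306L
  // P2495R3 constructor: build the stream directly from the string_view.
  std::istringstream in (sv);
#else
  // Fallback for libraries without the feature: copy into a std::string.
  std::istringstream in{std::string (sv)};
#endif
  std::string word;
  in >> word;
  return word;
}
```

A library that advertised 202302 would fail the `>= 202306L` comparison and silently take the fallback branch even though the constructors are available.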


+cxxmin = 26;
+hosted = yes;
+  };
+};
+
// Standard test specifications.
stds[97] = ">= 199711L";
stds[03] = ">= 199711L";
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index bb7c0479c72..b4b487fba92 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -2174,4 +2174,14 @@
#endif /* !defined(__cpp_lib_modules) && defined(__glibcxx_want_modules) */
#undef __glibcxx_want_modules

+#if !defined(__cpp_lib_sstream_from_string_view)
+# if (__cplusplus >  202302L) && _GLIBCXX_HOSTED
+#  define __glibcxx_sstream_from_string_view 202302L
+#  if defined(__glibcxx_want_all) || 
defined(__glibcxx_want_sstream_from_string_view)
+#   define __cpp_lib_sstream_from_string_view 202302L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_sstream_from_string_view) && 
defined(__glibcxx_want_sstream_from_string_view) */
+#undef __glibcxx_want_sstream_from_string_view
+
#undef __glibcxx_want_all
diff --git a/libstdc++-v3/include/std/sstream b/libstdc++-v3/include/std/sstream
index ad0c16a91e8..528756ed631 100644
--- a/libstdc++-v3/include/std/sstream
+++ b/libstdc++-v3/include/std/sstream
@@ -38,9 +38,14 @@
#endif

#include  // iostream
+#include 


As

Re: [PATCH] libstdc++: Compare keys and values separately in flat_map::operator==

On Thu, May 29, 2025 at 3:56 PM Patrick Palka  wrote:

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/15?
>
> -- >8 --
>
> Instead of effectively doing a zipped comparison of the keys and values,
> compare them separately to leverage the underlying containers' optimized
> equality implementations.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/flat_map (_Flat_map_impl::operator==): Compare
> keys and values separately.
> ---
>  libstdc++-v3/include/std/flat_map | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/flat_map
> b/libstdc++-v3/include/std/flat_map
> index c0716d12412a..134307324190 100644
> --- a/libstdc++-v3/include/std/flat_map
> +++ b/libstdc++-v3/include/std/flat_map
> @@ -873,7 +873,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>[[nodiscard]]
>friend bool
>operator==(const _Derived& __x, const _Derived& __y)
> -  { return std::equal(__x.begin(), __x.end(), __y.begin(),
> __y.end()); }
> +  {
> +   return __x._M_cont.keys == __y._M_cont.keys
> + && __x._M_cont.values == __y._M_cont.values;
>
Previously we supported containers that do not provide operator==, by
calling std::equal.
For flat_set we do not compare the containers directly either. I would
suggest either using this in both:
  ranges::equal(x._M_cont)
or using == on the containers in both flat_map and flat_set.

> +  }
>
>template
> [[nodiscard]]
> --
> 2.50.0.rc0
>
>

Re: [AUTOFDO] Fix annotated profile for de-duplicated call

2025-05-29 Thread Jan Hubicka

> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index 7e0e8c66124..8a317d85277 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1129,6 +1129,26 @@ afdo_set_bb_count (basic_block bb, const stmt_set 
> &promoted)
>gimple *stmt = gsi_stmt (gsi);
>if (gimple_clobber_p (stmt) || is_gimple_debug (stmt))
>  continue;
> +  /* If statements are de-duplicated, we will have same stmt executing 
> from
> +  more than one path (by jumping to same statment).  In this case, the
> +  profile we get will be for multiple paths and would make the annotated
> +  profile wrong.  An example of this is:
> +
> +  if (foo () == 4)
> +{
> +  bar ();
> +}
> +  else if (foo () == 5)
> +{
> +  bar ();
> +}
> + In this case, we want to skip the profile count of bar () and calculate
> + the profile from the edge counts.  In case of LBR/BRBE we are
> + profiling branches and GIMPLE_CALL is the important statement
> + here.  */
> +
> +  if (gimple_code (stmt) == GIMPLE_CALL)
> + continue;

I am not quite sure about this.  With this change you will basically
ignore all samples annotated to calls.  We can deduplicate other
statements, not only calls.  Ignoring all samples annotated with calls
seems to be throwing away a good part of the useful information, since
pre-inline there are very many of them.

We can mitigate this particular problem in some cases by deduplicating
early. This would also help the inliner.

There are many later optimizations that will unavoidably lead to AFDO
disruption.  We may have something like -fautofdo-collection which will
disable passes that disturb AFDO a lot (like ICF or deduplication), but
I am not sure that makes a lot of sense either...

> location_t phi_loc
>   = gimple_phi_arg_location_from_edge (phi, tmp_e);
> count_info info;
> -   if (afdo_source_profile->get_count_info (phi_loc, &info)
> -   && info.count != 0)
> +   if (afdo_source_profile->get_count_info (phi_loc, &info))
>   {
> if (info.count > max_count)
>   max_count = info.count;

So the idea is to not mark a BB as annotated if it only has zero-executed
statements, since deduplication effectively makes the other BB contain such
a statement even while it is executed?

> @@ -1217,7 +1236,9 @@ afdo_find_equiv_class (bb_set *annotated_bb)
> && bb1->loop_father == bb->loop_father)
>   {
> bb1->aux = bb;
> -   if (bb1->count > bb->count && is_bb_annotated (bb1, *annotated_bb))
> +   if (bb1->count > bb->count
> +   && !is_bb_annotated (bb, *annotated_bb)
> +   && is_bb_annotated (bb1, *annotated_bb))
>   {
> bb->count = bb1->count;
> set_bb_annotated (bb, annotated_bb);
> @@ -1229,7 +1250,9 @@ afdo_find_equiv_class (bb_set *annotated_bb)
> && bb1->loop_father == bb->loop_father)
>   {
> bb1->aux = bb;
> -   if (bb1->count > bb->count && is_bb_annotated (bb1, *annotated_bb))
> +   if (bb1->count > bb->count
> +   && !is_bb_annotated (bb, *annotated_bb)
> +   && is_bb_annotated (bb1, *annotated_bb))

Why are these two necessary? The code identifies pairs of BBs that
should execute the same number of times (which is visible in the CFG) and
attempts to fix up the counts.  Perhaps the merging should be smarter,
but if we do not make them execute the same number of times, we will only
have more inconsistent profiles...

> @@ -1269,10 +1293,14 @@ afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
>   else
> total_known_count += AFDO_EINFO (e)->get_count ();
>   num_edge++;
> + if (is_bb_annotated (is_succ ? e->dest : e->src, *annotated_bb))
> +   num_annotated++;
> + else
> +   bb_edge_to_annotate = e;
>}
>  
>  /* Be careful not to annotate block with no successor in special cases.  
> */
> -if (num_unknown_edge == 0 && total_known_count > bb->count)
> +if (num_unknown_edge == 0 && total_known_count >= bb->count)
>{
>   bb->count = total_known_count;
>   if (!is_bb_annotated (bb, *annotated_bb))
> @@ -1281,26 +1309,52 @@ afdo_propagate_edge (bool is_succ, bb_set 
> *annotated_bb)
>}
>  else if (num_unknown_edge == 1 && is_bb_annotated (bb, *annotated_bb))
>{
> - if (bb->count > total_known_count)
> -   {
> -   profile_count new_count = bb->count - total_known_count;
> -   AFDO_EINFO(unknown_edge)->set_count(new_count);
> -   if (num_edge == 1)
> - {
> -   basic_block succ_or_pred_bb = is_succ ? unknown_edge->dest : 
> unknown_edge->src;
> -   if (new_count > succ_or_pred_bb->count)
> - {
> -   succ_or_pred_bb->count = new_count;
> -   if (!is_bb_annotated (succ_or_pred_bb, *annotated_bb))
> -

[PATCH v3] libstdc++: Implement stringstream from string_view [PR119741]

2025-05-29 Thread Nathan Myers

Change in V3:
 * Comment that p2495 specifies a drive-by constraint omitted as redundant
 * Adjust whitespace to fit in 80 columns

Change in V2:
 * apply all review comments
 * remove redundant drive-by "requires" on ctor from string allocator arg
 * check allocators are plumbed through

-- >8 --

Implement PR libstdc++/119741 (P2495R3)
Add constructors to stringbuf, stringstream, istringstream,
and ostringstream, and a matching overload of str(sv) in each,
that take anything convertible to a string_view in places
where the existing functions take a string.

libstdc++-v3/ChangeLog:

PR libstdc++/119741
* include/std/sstream: full implementation, really just
decls, requires clause and plumbing.
* include/bits/version.def, include/bits/version.h:
new preprocessor symbol
__cpp_lib_sstream_from_string_view.
* testsuite/27_io/basic_stringbuf/cons/char/3.cc: New tests.
* testsuite/27_io/basic_istringstream/cons/char/2.cc: New tests.
* testsuite/27_io/basic_ostringstream/cons/char/4.cc: New tests.
* testsuite/27_io/basic_stringstream/cons/char/2.cc: New tests.
---
 libstdc++-v3/include/bits/version.def |  11 +-
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/sstream  | 200 --
 .../27_io/basic_istringstream/cons/char/2.cc  | 187 
 .../27_io/basic_ostringstream/cons/char/4.cc  | 186 
 .../27_io/basic_stringbuf/cons/char/3.cc  | 196 +
 .../27_io/basic_stringstream/cons/char/2.cc   | 196 +
 7 files changed, 967 insertions(+), 19 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/2.cc
 create mode 100644 
libstdc++-v3/testsuite/27_io/basic_ostringstream/cons/char/4.cc
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_stringbuf/cons/char/3.cc
 create mode 100644 
libstdc++-v3/testsuite/27_io/basic_stringstream/cons/char/2.cc

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 282667eabda..8172bcd4e26 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -649,7 +649,7 @@ ftms = {
   };
   values = {
 v = 1;
-/* For when there's no gthread.  */
+// For when there is no gthread.
 cxxmin = 17;
 hosted = yes;
 gthread = no;
@@ -1945,6 +1945,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = sstream_from_string_view;
+  values = {
+v = 202302;
+cxxmin = 26;
+hosted = yes;
+  };
+};
+
 // Standard test specifications.
 stds[97] = ">= 199711L";
 stds[03] = ">= 199711L";
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index bb7c0479c72..b4b487fba92 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -2174,4 +2174,14 @@
 #endif /* !defined(__cpp_lib_modules) && defined(__glibcxx_want_modules) */
 #undef __glibcxx_want_modules
 
+#if !defined(__cpp_lib_sstream_from_string_view)
+# if (__cplusplus >  202302L) && _GLIBCXX_HOSTED
+#  define __glibcxx_sstream_from_string_view 202302L
+#  if defined(__glibcxx_want_all) || 
defined(__glibcxx_want_sstream_from_string_view)
+#   define __cpp_lib_sstream_from_string_view 202302L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_sstream_from_string_view) && 
defined(__glibcxx_want_sstream_from_string_view) */
+#undef __glibcxx_want_sstream_from_string_view
+
 #undef __glibcxx_want_all
diff --git a/libstdc++-v3/include/std/sstream b/libstdc++-v3/include/std/sstream
index ad0c16a91e8..528756ed631 100644
--- a/libstdc++-v3/include/std/sstream
+++ b/libstdc++-v3/include/std/sstream
@@ -38,9 +38,14 @@
 #endif
 
 #include  // iostream
+#include 
 
 #include 
 #include 
+#ifdef __cpp_lib_sstream_from_string_view
+# include   // is_convertible_v
+#endif
+
 #include  // allocator_traits, __allocator_like
 
 #if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI
@@ -52,8 +57,6 @@
 # define _GLIBCXX_SSTREAM_ALWAYS_INLINE [[__gnu__::__always_inline__]]
 #endif
 
-
-
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -159,6 +162,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   { __rhs._M_sync(const_cast(__rhs._M_string.data()), 0, 0); }
 
 #if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI
+   // P0408 Efficient access to basic_stringbuf buffer
   explicit
   basic_stringbuf(const allocator_type& __a)
   : basic_stringbuf(ios_base::in | std::ios_base::out, __a)
@@ -197,7 +201,36 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
| ios_base::out)
: basic_stringbuf(__s, __mode, allocator_type{})
{ }
+#endif
+
+#ifdef __cpp_lib_sstream_from_string_view
+  template
+   explicit
+   basic_stringbuf(const _Tp& __t,
+   ios_base::openmode __mode = ios_base::in | ios_base::out)
+ requires (is_convertible_v>)
+   : basic_stringbu

[PATCH] libstdc++: Compare keys and values separately in flat_map::operator==

2025-05-29 Thread Patrick Palka

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/15?

-- >8 --

Instead of effectively doing a zipped comparison of the keys and values,
compare them separately to leverage the underlying containers' optimized
equality implementations.

libstdc++-v3/ChangeLog:

* include/std/flat_map (_Flat_map_impl::operator==): Compare
keys and values separately.
---
 libstdc++-v3/include/std/flat_map | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/flat_map b/libstdc++-v3/include/std/flat_map
index c0716d12412a..134307324190 100644
--- a/libstdc++-v3/include/std/flat_map
+++ b/libstdc++-v3/include/std/flat_map
@@ -873,7 +873,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   [[nodiscard]]
   friend bool
   operator==(const _Derived& __x, const _Derived& __y)
-  { return std::equal(__x.begin(), __x.end(), __y.begin(), __y.end()); }
+  {
+   return __x._M_cont.keys == __y._M_cont.keys
+ && __x._M_cont.values == __y._M_cont.values;
+  }
 
   template
[[nodiscard]]
-- 
2.50.0.rc0
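
The difference between the two comparison strategies in the patch above can be sketched outside libstdc++. This is only an illustration: `Containers`, `equal_zipped`, `equal_separate` and `demo_equal` are hypothetical names, and the struct merely stands in for the `_M_cont.keys` / `_M_cont.values` pair, assuming both containers always have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the pair of underlying containers.
// Assumption: keys.size() == values.size() always holds.
struct Containers {
  std::vector<int> keys;
  std::vector<std::string> values;
};

// Element-by-element ("zipped") comparison, in the spirit of the old code.
bool equal_zipped(const Containers& x, const Containers& y) {
  if (x.keys.size() != y.keys.size())
    return false;
  for (std::size_t i = 0; i < x.keys.size(); ++i)
    if (!(x.keys[i] == y.keys[i] && x.values[i] == y.values[i]))
      return false;
  return true;
}

// Container-wise comparison, in the spirit of the new code: each
// container's own operator== is used.
bool equal_separate(const Containers& x, const Containers& y) {
  return x.keys == y.keys && x.values == y.values;
}

// Both strategies agree for element types that are equality comparable.
bool demo_equal() {
  Containers a{{1, 2, 3}, {"a", "b", "c"}};
  Containers b = a;
  Containers c{{1, 2, 3}, {"a", "b", "x"}};
  return equal_zipped(a, b) && equal_separate(a, b)
      && !equal_zipped(a, c) && !equal_separate(a, c);
}
```

The container-wise form requires the underlying containers themselves to provide operator==, which the zipped form does not.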

Re: [PATCH] libstdc++: Compare keys and values separately in flat_map::operator==

On Thu, May 29, 2025 at 4:37 PM Tomasz Kaminski  wrote:

>
>
> On Thu, May 29, 2025 at 3:56 PM Patrick Palka  wrote:
>
>> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/15?
>>
>> -- >8 --
>>
>> Instead of effectively doing a zipped comparison of the keys and values,
>> compare them separately to leverage the underlying containers' optimized
>> equality implementations.
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/flat_map (_Flat_map_impl::operator==): Compare
>> keys and values separately.
>> ---
>>  libstdc++-v3/include/std/flat_map | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/libstdc++-v3/include/std/flat_map
>> b/libstdc++-v3/include/std/flat_map
>> index c0716d12412a..134307324190 100644
>> --- a/libstdc++-v3/include/std/flat_map
>> +++ b/libstdc++-v3/include/std/flat_map
>> @@ -873,7 +873,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>[[nodiscard]]
>>friend bool
>>operator==(const _Derived& __x, const _Derived& __y)
>> -  { return std::equal(__x.begin(), __x.end(), __y.begin(),
>> __y.end()); }
>> +  {
>> +   return __x._M_cont.keys == __y._M_cont.keys
>> + && __x._M_cont.values == __y._M_cont.values;
>>
> Previously we supported containers that do not have operator==, by calling
> equal.
> For the flat_set we also do not compare the containers. I would suggest
> using in both:
>   ranges::equal(x._M_cont)
> Or using == on containers in both flat_map and flat_set.
>
queue and stack use operator== for the containers, so I think we should
use == on containers in both.

> +  }
>>
>>template
>> [[nodiscard]]
>> --
>> 2.50.0.rc0
>>
>>

Re: [PATCH] libstdc++: Compare keys and values separately in flat_map::operator==

2025-05-29 Thread Jonathan Wakely

On Thu, 29 May 2025 at 15:42, Tomasz Kaminski  wrote:
>
>
>
> On Thu, May 29, 2025 at 3:56 PM Patrick Palka  wrote:
>>
>> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/15?
>>
>> -- >8 --
>>
>> Instead of effectively doing a zipped comparison of the keys and values,
>> compare them separately to leverage the underlying containers' optimized
>> equality implementations.
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/flat_map (_Flat_map_impl::operator==): Compare
>> keys and values separately.
>> ---
>>  libstdc++-v3/include/std/flat_map | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/libstdc++-v3/include/std/flat_map 
>> b/libstdc++-v3/include/std/flat_map
>> index c0716d12412a..134307324190 100644
>> --- a/libstdc++-v3/include/std/flat_map
>> +++ b/libstdc++-v3/include/std/flat_map
>> @@ -873,7 +873,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>[[nodiscard]]
>>friend bool
>>operator==(const _Derived& __x, const _Derived& __y)
>> -  { return std::equal(__x.begin(), __x.end(), __y.begin(), __y.end()); }
>> +  {
>> +   return __x._M_cont.keys == __y._M_cont.keys
>> + && __x._M_cont.values == __y._M_cont.values;
>
> Previously we supported containers that do not have operator==, by calling 
> equal.

Oh, good point.
Using == means the element types of the underlying containers must be
equality comparable, but the original approach of using std::equal on
the zipped values only means those tuples must be equality comparable,
and an evil user could have overloaded:

bool operator==(const tuple&, const tuple&);

so that those comparisons work, but MyVal might not be equality comparable.

> For the flat_set we also do not compare the containers. I would suggest using 
> in both:
>   ranges::equal(x._M_cont)
> Or using == on containers in both flat_map and flat_set.
>>
>> +  }
>>
>>template
>> [[nodiscard]]
>> --
>> 2.50.0.rc0
>>
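
The corner case described above can be sketched as follows. All names here (`user::MyVal`, `user::ZippedRef`, `zipped_elements_equal`) are hypothetical, and the tuple type is only an assumed shape for the zipped element type, which is not spelled out in this thread. The value type has no operator== of its own, yet the tuples of references still compare because of a user-provided overload found by argument-dependent lookup.

```cpp
#include <cassert>
#include <tuple>

namespace user {
  // Deliberately has no operator==, so MyVal is not equality comparable.
  struct MyVal { int v; };

  // Assumed shape of the zipped element type (hypothetical).
  using ZippedRef = std::tuple<const int&, const MyVal&>;

  // The "evil" overload: makes the zipped tuples comparable even though
  // MyVal itself is not. Found via ADL because MyVal lives in namespace user.
  bool operator==(const ZippedRef& a, const ZippedRef& b) {
    return std::get<0>(a) == std::get<0>(b)
        && std::get<1>(a).v == std::get<1>(b).v;
  }
}

// Compares one zipped element against another using the overload above.
bool zipped_elements_equal(const int& k1, const user::MyVal& v1,
                           const int& k2, const user::MyVal& v2) {
  user::ZippedRef a(k1, v1), b(k2, v2);
  return a == b;
}
```

With such a type, a comparison built on the zipped elements compiles, while `values == values` on a container of `MyVal` would not.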

[pushed] c++: C++17 constexpr lambda and goto/static

2025-05-29 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We only want the error for these cases for functions explicitly declared
constexpr, but we still want to set invalid_constexpr on C++17 lambdas so
maybe_save_constexpr_fundef doesn't make them implicitly constexpr.

The potential_constant_expression_1 change isn't necessary for this test,
but still seems correct.
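
As a rough illustration of the situation described above (not the new test itself), the sketch below uses a lambda whose body contains a goto. The lambda is not declared constexpr, so it is valid to define and to call at run time; the point of the change is that it should not be treated as implicitly constexpr either. The name `count_to_three` is hypothetical.

```cpp
#include <cassert>

// A lambda with a goto in its body, called only at run time.
inline int count_to_three() {
  auto f = [] {
    int i = 0;
  again:
    if (++i < 3)
      goto again;
    return i;
  };
  // Using the call as a constant expression, e.g.
  //   constexpr int a = f();
  // is expected to be rejected, because the lambda must not be
  // implicitly constexpr.
  return f();
}
```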

gcc/cp/ChangeLog:

* decl.cc (start_decl): Also set invalid_constexpr
for maybe_constexpr_fn.
* parser.cc (cp_parser_jump_statement): Likewise.
* constexpr.cc (potential_constant_expression_1): Ignore
goto to an artificial label.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-lambda29.C: New test.
---
 gcc/cp/constexpr.cc   |  3 ++
 gcc/cp/decl.cc| 28 +++
 gcc/cp/parser.cc  |  7 +++--
 .../g++.dg/cpp1z/constexpr-lambda29.C | 19 +
 4 files changed, 43 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-lambda29.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fa754b9a176..272fab32896 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -10979,6 +10979,9 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
*jump_target = *target;
return true;
  }
+   if (DECL_ARTIFICIAL (*target))
+ /* The user didn't write this goto, this isn't the problem.  */
+ return true;
if (flags & tf_error)
  constexpr_error (loc, fundef_p, "%<goto%> is not a constant "
   "expression");
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index a9ef28bfd80..ec4b6298b11 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -6198,22 +6198,28 @@ start_decl (const cp_declarator *declarator,
 }
 
   if (current_function_decl && VAR_P (decl)
-  && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
+  && maybe_constexpr_fn (current_function_decl)
   && cxx_dialect < cxx23)
 {
   bool ok = false;
   if (CP_DECL_THREAD_LOCAL_P (decl) && !DECL_REALLY_EXTERN (decl))
-   error_at (DECL_SOURCE_LOCATION (decl),
- "%qD defined %<thread_local%> in %qs function only "
- "available with %<-std=c++23%> or %<-std=gnu++23%>", decl,
- DECL_IMMEDIATE_FUNCTION_P (current_function_decl)
- ? "consteval" : "constexpr");
+   {
+ if (DECL_DECLARED_CONSTEXPR_P (current_function_decl))
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "%qD defined %<thread_local%> in %qs function only "
+ "available with %<-std=c++23%> or %<-std=gnu++23%>", decl,
+ DECL_IMMEDIATE_FUNCTION_P (current_function_decl)
+ ? "consteval" : "constexpr");
+   }
   else if (TREE_STATIC (decl))
-   error_at (DECL_SOURCE_LOCATION (decl),
- "%qD defined %<static%> in %qs function only available "
- "with %<-std=c++23%> or %<-std=gnu++23%>", decl,
- DECL_IMMEDIATE_FUNCTION_P (current_function_decl)
- ? "consteval" : "constexpr");
+   {
+ if (DECL_DECLARED_CONSTEXPR_P (current_function_decl))
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "%qD defined %<static%> in %qs function only available "
+ "with %<-std=c++23%> or %<-std=gnu++23%>", decl,
+ DECL_IMMEDIATE_FUNCTION_P (current_function_decl)
+ ? "consteval" : "constexpr");
+   }
   else
ok = true;
   if (!ok)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 3e39bf33fab..091873cbe3a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -15431,11 +15431,12 @@ cp_parser_jump_statement (cp_parser* parser, tree &std_attrs)
 
 case RID_GOTO:
   if (parser->in_function_body
- && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
+ && maybe_constexpr_fn (current_function_decl)
  && cxx_dialect < cxx23)
{
- error ("%<goto%> in %<constexpr%> function only available with "
-"%<-std=c++23%> or %<-std=gnu++23%>");
+ if (DECL_DECLARED_CONSTEXPR_P (current_function_decl))
+   error ("%<goto%> in %<constexpr%> function only available with "
+  "%<-std=c++23%> or %<-std=gnu++23%>");
  cp_function_chain->invalid_constexpr = true;
}
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda29.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda29.C
new file mode 100644
index 000..9e661b6a55d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda29.C
@@ -0,0 +1,19 @@
+// Test that we don't make lambdas with goto/static implicitly constexpr
+// when an explicitly constexpr function would be ill-formed.
+
+// { dg-do compile { target c++17 } }
+
+int main()
+{
+  constexpr int a = [] {
+return 42;
+goto label;
+  label:
+return 1

Re: [PATCH] RISC-V: Add 'bclr+binv' peephole2 optimization.

2025-05-29 Thread Jeff Law

On 5/28/25 9:05 PM, Jiawei wrote:

This seems like it would be much better as a combine pattern. In
fact, I'm a bit surprised that combine didn't simplify this series of
operations into an IOR. So I'd really like to see the .combine dump
with and without this hunk for the relevant testcase.

Here is the dump log, using
trunk(7fca794e0199baff8f07140a950ba3374c6aa634), more details please see
https://godbolt.org/z/3hfzdz3Ks

===

~/rv/bin/riscv64-unknown-linux-gnu-g++ -march=rv64gc_zba_zbb_zbs -O2 -S
-fdump-rtl-all redundant-bitmap-2.C

before combine in .ext_dce

Thanks! That was helpful.

I was looking for the full .combine dump -- the full dump includes
information about patterns that were tried and failed. That often will
point the way to a better solution.

In particular in the .combine dump we have this nugget:

Trying 15 -> 18:
15: r151:DI=0xfffe<-

I think combine really should have simplified that before querying the
target. That really should have been simplified to a bit insertion idiom
or perhaps a simpler ior.

More generally, the question we should first ask is whether or not the
source should have simplified independent of the target. I think the
answer is yes in this case, which means we should try to fix that
problem first since it'll improve every target rather than just RISC-V.
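
The target-independent identity behind that argument can be checked with a small sketch. It assumes the sequence in question clears a bit and then inverts that same bit, which is how the patch title reads; the helpers `bclr`, `binv` and `bset` are plain C++ models named after the Zbs instructions, not compiler code.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ models of single-bit clear, invert and set.
constexpr std::uint64_t bclr(std::uint64_t x, unsigned n) {
  return x & ~(std::uint64_t{1} << n);
}
constexpr std::uint64_t binv(std::uint64_t x, unsigned n) {
  return x ^ (std::uint64_t{1} << n);
}
constexpr std::uint64_t bset(std::uint64_t x, unsigned n) {
  return x | (std::uint64_t{1} << n);
}

// Clearing bit n and then inverting it always leaves bit n set and every
// other bit untouched, which is exactly an inclusive OR with (1 << n).
bool clear_then_invert_equals_set() {
  const std::uint64_t samples[] = {0u, 1u, 0xfffeu, 0xdeadbeefcafef00dULL,
                                   ~std::uint64_t{0}};
  for (std::uint64_t x : samples)
    for (unsigned n = 0; n < 64; ++n)
      if (binv(bclr(x, n), n) != bset(x, n))
        return false;
  return true;
}
```

Because the identity holds for every input, the simplification does not depend on any particular target.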

When we do find ourselves needing to write new target patterns, a
define_insn will generally be preferable to a define_peephole.

The define_insn will match when there's a data dependency within a basic
block. A define_peephole requires the insns to be consecutive in the
IL. Thus the define_insn will tend to match more often and is therefore
preferable to a define_peephole.

Anyway, to recap, I think the better solution is to improve
simplify_binary_operation or one of its children, or perhaps
simplify_compound_operation or its related functions.

jeff

Re: [EXT] Re: [PATCH v3] rs6000: Adding missed ISA 3.0 atomic memory operation instructions.

2025-05-29 Thread Peter Bergner

On 5/29/25 5:35 AM, Segher Boessenkool wrote:
>
> Add yourself to authors as well?

Agreed.  Just add your name/email address directly under mine, like so:

2025-05-29  Peter Bergner  
Jeevitha Palanisamy  

>> +{   \
>> +  register TYPE _ret asm ("r8");\
>> +  register TYPE _cond asm ("r9") = _COND;   \
>> +  register TYPE _value asm ("r10") = _VALUE; \
>> +  __asm__ __volatile__ (OPCODE " %[ret],%P[addr],%[code]"   \
>> +: [addr] "+Q" (_PTR[0]), [ret] "=r" (_ret)  \
>> +: "r" (_cond), "r" (_value), [code] "n" (FC));  \
>> +  return _ret; \
>> +}
> 
> Naming the operands is an extra indirection, and makes things way less
> readable (which means *understandable*) as well.  Just use %0, %1, %2
> please?  It's a single line, people will not lose track of what is what
> anyway (and if they would, the code is then way too big for extended
> asm, so named asm operands is always a code stench).

I agree that's a little too much syntactic sugar, but we were just
being consistent with the other existing code that uses this syntax.
I suppose you could use %1,%0,%4 here (%2 & %3 are not used directly)
and then clean up the other code similarly as a follow-on cleanup?

>> +#define _AMO_LD_INCREMENT(NAME, TYPE, OPCODE, FC)   \
>> +static __inline__ TYPE \
>> +NAME (TYPE *_PTR)   \
>> +{   \
>> +  TYPE _RET; \
>> +  __asm__ volatile (OPCODE " %[ret],%P[addr],%[code]\n" \
>> +: [addr] "+Q" (_PTR[0]), [ret] "=r" (_RET)  \
>> +: "Q" (*(TYPE (*)[2]) _PTR), [code] "n" (FC));  \
>> +  return _RET; \
>> +}
> 
> I don't understand the [2].  Should it be [1]?  These instructions
> can use the value at mem+s (as the ISA names things) as input, but not
> mem+2*s.

I think 2 is correct here.  This 2 isn't an index like the 0 in _PTR[0],
but it's a size.  This specific use is trying to say we're reading from
memory and we're reading 2 locations, mem(EA,s) and mem(EA+s,s).
Maybe we could use separate mentions of _PTR[0] and _PTR[1] instead???
We don't actually use that "operand" in the instruction, it's just there
to tell the compiler that those memory locations are read.

Ditto for _AMO_LD_DECREMENT usage, which reads mem(EA-s,s) and mem(EA,s).
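
The "size, not index" point can be checked without any PowerPC assembly. In this small sketch (the function name is hypothetical), casting a pointer to pointer-to-array-of-2 and dereferencing it yields an lvalue covering two consecutive elements, whereas `ptr[0]` covers only one.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Shows what the expression *(TYPE (*)[2]) ptr denotes, using long as TYPE.
inline bool array_cast_covers_two_elements() {
  long buf[2] = {10, 20};
  long* ptr = buf;

  // The cast-and-dereference expression is an lvalue of type long[2] ...
  static_assert(std::is_same<decltype(*(long (*)[2]) ptr), long (&)[2]>::value,
                "expression has array-of-2 type");
  // ... so it spans two elements, while ptr[0] spans one.
  static_assert(sizeof(*(long (*)[2]) ptr) == 2 * sizeof(long),
                "covers two elements");
  static_assert(sizeof(ptr[0]) == sizeof(long), "covers one element");

  return sizeof(*(long (*)[2]) ptr) == 2 * sizeof(ptr[0]);
}
```

Used as a "Q" memory input operand, such an expression describes the whole two-element region to the compiler.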

Peter

Re: [PATCH] libstdc++: Compare keys and values separately in flat_map::operator==

On Thu, May 29, 2025 at 4:49 PM Jonathan Wakely  wrote:

> On Thu, 29 May 2025 at 15:48, Jonathan Wakely  wrote:
> >
> > On Thu, 29 May 2025 at 15:42, Tomasz Kaminski 
> wrote:
> > >
> > >
> > >
> > > On Thu, May 29, 2025 at 3:56 PM Patrick Palka 
> wrote:
> > >>
> > >> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/15?
> > >>
> > >> -- >8 --
> > >>
> > >> Instead of effectively doing a zipped comparison of the keys and
> values,
> > >> compare them separately to leverage the underlying containers'
> optimized
> > >> equality implementations.
> > >>
> > >> libstdc++-v3/ChangeLog:
> > >>
> > >> * include/std/flat_map (_Flat_map_impl::operator==): Compare
> > >> keys and values separately.
> > >> ---
> > >>  libstdc++-v3/include/std/flat_map | 5 -
> > >>  1 file changed, 4 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/libstdc++-v3/include/std/flat_map
> b/libstdc++-v3/include/std/flat_map
> > >> index c0716d12412a..134307324190 100644
> > >> --- a/libstdc++-v3/include/std/flat_map
> > >> +++ b/libstdc++-v3/include/std/flat_map
> > >> @@ -873,7 +873,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >>[[nodiscard]]
> > >>friend bool
> > >>operator==(const _Derived& __x, const _Derived& __y)
> > >> -  { return std::equal(__x.begin(), __x.end(), __y.begin(),
> __y.end()); }
> > >> +  {
> > >> +   return __x._M_cont.keys == __y._M_cont.keys
> > >> + && __x._M_cont.values == __y._M_cont.values;
> > >
> > > Previously we supported containers that do not have operator==, by
> calling equal.
> >
> > Oh, good point.
> > Using == means the element types of the underlying containers must be
> > equality comparable, but the original approach of using std::equal on
> > the zipped values only means those tuples must be equality comparable,
> > and an evil user could have overloaded:
> >
> > bool operator==(const tuple&, const tuple&);
>
> Or const tuple& or whatever the zipped type is.
>
Actually, in [container.reqmts] p42 we require that:
  T is equality comparable
Which in our case is std::tuple, but then we are comparing
std::tuple.
So I think just comparing containers is fine.

>
>
> >
> > so that those comparisons work, but MyVal might not be equality
> comparable.
> >
> > > For the flat_set we also do not compare the containers. I would
> suggest using in both:
> > >   ranges::equal(x._M_cont)
> > > Or using == on containers in both flat_map and flat_set.
> > >>
> > >> +  }
> > >>
> > >>template
> > >> [[nodiscard]]
> > >> --
> > >> 2.50.0.rc0
> > >>
>
>

Re: [PATCH v4 0/8] Implement layouts from mdspan.

Sending a bit after the fact, but:
I have finished the review, and most of the commits have really minimal
cosmetic changes.
The only major functional changes I have requested are for the
layout_stride implementation.

On Wed, May 28, 2025 at 4:36 PM Tomasz Kaminski  wrote:

> I have reviewed and posted feedback up to, but not including layout_stride
> today.
> Will try to finish tomorrow.
> Thank you again for continuous work on the patches.
>
> On Tue, May 27, 2025 at 4:40 PM Tomasz Kaminski 
> wrote:
>
>>
>>
>> On Tue, May 27, 2025 at 4:32 PM Luc Grosheintz 
>> wrote:
>>
>>> I believe we're now through the larger questions about
>>> how to implement layouts. If reviewing all three over and over
>>> is too painful, it might now make sense to split the patch into
>>> separate patches, one per layout.
>>>
>> I think we are OK. As you mentioned we are past general discussion,
>> so I need to do a more thorough review, checking against the
>> standard.
>> I will try to book some time for it this week.
>>
>>
>>> On 5/26/25 16:04, Luc Grosheintz wrote:
>>> > This follows up on:
>>> > https://gcc.gnu.org/pipermail/libstdc++/2025-May/061572.html
>>> >
>>> > Note that this patch series can only be applied after merging:
>>> > https://gcc.gnu.org/pipermail/libstdc++/2025-May/061653.html
>>> >
>>> > The important changes since v3 are:
>>> >* Fixed and tested several related overflow issues that occurred in
>>> >  extents of size 0 by using `size_t` to compute products.
>>> >* Fixed and tested default ctors.
>>> >* Add missing code for module support.
>>> >* Documented deviation from standard.
>>> >
>>> > The smaller changes include:
>>> >* Squashed the three small commits that make cosmetic changes to
>>> >  std::extents.
>>> >* Remove layout_left related changes from the layout_stride commit.
>>> >* Remove superfluous `mapping(extents_type(__exts))`.
>>> >* Fix indenting and improve comment in layout_stride.
>>> >* Add an easy check for representable required_span_size to
>>> >  layout_stride.
>>> >* Inline __dynamic_extents_prod
>>> >
>>> > Thank you Tomasz for all the great reviews!
>>> >
>>> > Luc Grosheintz (8):
>>> >libstdc++: Improve naming and whitespace for extents.
>>> >libstdc++: Implement layout_left from mdspan.
>>> >libstdc++: Add tests for layout_left.
>>> >libstdc++: Implement layout_right from mdspan.
>>> >libstdc++: Add tests for layout_right.
>>> >libstdc++: Implement layout_stride from mdspan.
>>> >libstdc++: Add tests for layout_stride.
>>> >libstdc++: Make layout_left(layout_stride) noexcept.
>>> >
>>> >   libstdc++-v3/include/std/mdspan   | 711
>>> +-
>>> >   libstdc++-v3/src/c++23/std.cc.in  |   5 +-
>>> >   .../mdspan/layouts/class_mandate_neg.cc   |  42 ++
>>> >   .../23_containers/mdspan/layouts/ctors.cc | 459 +++
>>> >   .../23_containers/mdspan/layouts/empty.cc |  78 ++
>>> >   .../23_containers/mdspan/layouts/mapping.cc   | 568 ++
>>> >   .../23_containers/mdspan/layouts/stride.cc| 500 
>>> >   7 files changed, 2349 insertions(+), 14 deletions(-)
>>> >   create mode 100644
>>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc
>>> >   create mode 100644
>>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/ctors.cc
>>> >   create mode 100644
>>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/empty.cc
>>> >   create mode 100644
>>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/mapping.cc
>>> >   create mode 100644
>>> libstdc++-v3/testsuite/23_containers/mdspan/layouts/stride.cc
>>> >
>>>
>>>
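
The overflow issue for extents of size 0 mentioned in the v4 change list can be sketched as follows. This is not the libstdc++ implementation: `extents_product` and `extents_demo` are hypothetical names, and a 32-bit signed index type is assumed for illustration. Multiplying in the narrow signed type could overflow on an intermediate product even though the total size is 0; accumulating in `std::size_t` avoids that.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Product of all extents, accumulated in std::size_t rather than in the
// (assumed) 32-bit signed index type. Unsigned arithmetic cannot trigger
// signed-overflow undefined behaviour.
template <std::size_t N>
constexpr std::size_t
extents_product(const std::array<std::int32_t, N>& exts) {
  std::size_t prod = 1;
  for (std::size_t i = 0; i < N; ++i)
    prod *= static_cast<std::size_t>(exts[i]);
  return prod;
}

bool extents_demo() {
  // 65536 * 65536 does not fit in a 32-bit signed integer, but the
  // overall size is still 0 because of the trailing zero extent.
  const std::array<std::int32_t, 3> empty{{65536, 65536, 0}};
  const std::array<std::int32_t, 3> small{{2, 3, 4}};
  return extents_product(empty) == 0 && extents_product(small) == 24;
}
```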

Re: [PATCH] libstdc++: Define flat_set::operator== in terms of ==