date:20250508

[PATCH] testsuite: Skip pr119160 for RISC-V backend.

2025-05-08 Jiawei

The RISC-V backend doesn't support the '-mgeneral-regs-only' option, so skip the test.
https://godbolt.org/z/38M8vPW74

gcc/testsuite/ChangeLog:

* gcc.dg/pr119160.c: Skip for RISC-V backend.

---
 gcc/testsuite/gcc.dg/pr119160.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr119160.c b/gcc/testsuite/gcc.dg/pr119160.c
index b4629a11d9..d96e4de169 100644
--- a/gcc/testsuite/gcc.dg/pr119160.c
+++ b/gcc/testsuite/gcc.dg/pr119160.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -finstrument-functions-once -favoid-store-forwarding -fnon-call-exceptions -fschedule-insns -mgeneral-regs-only -Wno-psabi" } */
+/* { dg-skip-if "" { riscv*-*-* } } */
 
 typedef __attribute__((__vector_size__ (32))) int V;
 
-- 
2.43.0

Re: [PATCH] testsuite: Skip pr119160 for RISC-V backend.

2025-05-08 Andreas Schwab

On May 08 2025, Richard Biener wrote:

> On Thu, May 8, 2025 at 10:02 AM Jiawei  wrote:
>>
>> RISC-V backend don't support '-mgeneral-regs-only' option, skip it.
>> https://godbolt.org/z/38M8vPW74
>
> The test should instead use
>
> /* { dg-additional-options "-mgeneral-regs-only" { target { x86_64-*-* i?86-*-* } } } */

arm and aarch64 support it too, if it matters.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH] testsuite: Skip pr119160 for RISC-V backend.

2025-05-08 Andreas Schwab

On May 08 2025, Jiawei wrote:

> RISC-V backend don't support '-mgeneral-regs-only' option, skip it.

Almost all backends do not support it.  It should be used only on those
few that do.


Re: [PATCH] testsuite: Skip pr119160 for RISC-V backend.

2025-05-08 Philipp Tomsich

+Konstantinos Eleftheriou


On Thu, 8 May 2025 at 10:30, Andreas Schwab  wrote:
>
> On May 08 2025, Richard Biener wrote:
>
> > On Thu, May 8, 2025 at 10:02 AM Jiawei  wrote:
> >>
> >> RISC-V backend don't support '-mgeneral-regs-only' option, skip it.
> >> https://godbolt.org/z/38M8vPW74
> >
> > The test should instead use
> >
> > /* { dg-additional-options "-mgeneral-regs-only" { target { x86_64-*-* i?86-*-* } } } */
>
> arm and aarch64 support it too, if it matters.
>

[PATCH 1/8] RISC-V: Introduce riscv-ext*.def to define extensions

2025-05-08 Kito Cheng

Adding a new ISA extension to RISC-V GCC requires modifying several places:
1. riscv_ext_version_table for the extension version.
2. riscv.opt for the target option and variable.
3. riscv_ext_flag_table to bind the extension to its target option.
4. riscv_combine_info if this extension is just a macro extension.
5. riscv_implied_info if this extension implies other extensions.
6. invoke.texi for documentation (this one is often forgotten - even by me...).
7. riscv-ext-bitmask.def if this extension has been allocated a bitmask in
   `__riscv_feature_bits`.

Now all of this information is integrated into riscv-ext.def, and
(almost) everything is generated from it!

Some of the fields, like URL, are not used yet. They are planned to be updated
later and used for improving the documentation.
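As an illustration of the single-table approach, here is a minimal X-macro sketch in C++. The DEFINE_EXT format, the three entries and the flag masks are hypothetical simplifications; the real DEFINE_RISCV_EXT entries in riscv-ext.def carry many more fields. The point is only that one list is expanded several times to produce what used to be separate, hand-maintained tables.

```cpp
// X-macro sketch of the "one .def table, many generated tables" idea.
// Hypothetical, simplified format: DEFINE_EXT (NAME, MAJOR, MINOR, FLAG).
#include <cstring>

// In GCC the list would live in a separate .def file that is #included
// several times; here it is an inline list macro so the sketch is
// self-contained.  Versions and masks are illustrative values.
#define EXT_LIST(DEFINE_EXT)          \
  DEFINE_EXT (m,     2, 0, 1u << 0)   \
  DEFINE_EXT (zicsr, 2, 0, 1u << 1)   \
  DEFINE_EXT (zmmul, 1, 0, 1u << 2)

struct ext_version { const char *name; int major, minor; };
struct ext_flag    { const char *name; unsigned mask; };

// Expansion 1: a version table (cf. riscv_ext_version_table).
#define AS_VERSION(NAME, MAJ, MIN, FLAG) {#NAME, MAJ, MIN},
static const ext_version version_table[] = { EXT_LIST (AS_VERSION) };
#undef AS_VERSION

// Expansion 2: an option-flag table (cf. riscv_ext_flag_table).
#define AS_FLAG(NAME, MAJ, MIN, FLAG) {#NAME, FLAG},
static const ext_flag flag_table[] = { EXT_LIST (AS_FLAG) };
#undef AS_FLAG

// Look up the flag mask of an extension by name; 0 if it is unknown.
unsigned ext_mask (const char *name)
{
  for (const ext_flag &e : flag_table)
    if (std::strcmp (e.name, name) == 0)
      return e.mask;
  return 0;
}
```

Adding an extension in this sketch means adding one DEFINE_EXT line; both tables pick it up automatically.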

gcc/ChangeLog:

* config/riscv/riscv-ext.def: New file; define extension metadata table.
* config/riscv/riscv-ext-corev.def: New.
* config/riscv/riscv-ext-sifive.def: New.
* config/riscv/riscv-ext-thead.def: New.
* config/riscv/riscv-ext-ventana.def: New.
---
 gcc/config/riscv/riscv-ext-corev.def   |   87 ++
 gcc/config/riscv/riscv-ext-sifive.def  |   87 ++
 gcc/config/riscv/riscv-ext-thead.def   |  191 +++
 gcc/config/riscv/riscv-ext-ventana.def |   35 +
 gcc/config/riscv/riscv-ext.def | 1798 
 5 files changed, 2198 insertions(+)
 create mode 100644 gcc/config/riscv/riscv-ext-corev.def
 create mode 100644 gcc/config/riscv/riscv-ext-sifive.def
 create mode 100644 gcc/config/riscv/riscv-ext-thead.def
 create mode 100644 gcc/config/riscv/riscv-ext-ventana.def
 create mode 100644 gcc/config/riscv/riscv-ext.def

diff --git a/gcc/config/riscv/riscv-ext-corev.def b/gcc/config/riscv/riscv-ext-corev.def
new file mode 100644
index ..eb97399403cd
--- /dev/null
+++ b/gcc/config/riscv/riscv-ext-corev.def
@@ -0,0 +1,87 @@
+/* CORE-V extension definition file for RISC-V.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.
+
+Please run `make riscv-regen` in the build folder to make sure everything is updated.
+
+Format of DEFINE_RISCV_EXT, please refer to riscv-ext.def.  */
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xcvalu,
+  /* UPPERCAE_NAME */ XCVALU,
+  /* FULL_NAME */ "Core-V miscellaneous ALU extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
+  /* FLAG_GROUP */ xcv,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xcvbi,
+  /* UPPERCAE_NAME */ XCVBI,
+  /* FULL_NAME */ "xcvbi extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
+  /* FLAG_GROUP */ xcv,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xcvelw,
+  /* UPPERCAE_NAME */ XCVELW,
+  /* FULL_NAME */ "Core-V event load word extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
+  /* FLAG_GROUP */ xcv,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xcvmac,
+  /* UPPERCAE_NAME */ XCVMAC,
+  /* FULL_NAME */ "Core-V multiply-accumulate extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
+  /* FLAG_GROUP */ xcv,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
+DEFINE_RISCV_EXT(
+  /* NAME */ xcvsimd,
+  /* UPPERCAE_NAME */ XCVSIMD,
+  /* FULL_NAME */ "xcvsimd extension",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({}),
+  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
+  /* FLAG_GROUP */ xcv,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
diff --git a/gcc/config/riscv/riscv-ext-sifive.def b/gcc/config/riscv/riscv-ext-sifive.def
new file mode 100644
index ..c8d79da479ca
--- /dev/null
+++ b/gcc/config/riscv/riscv-ext-sifive.def
@@ -0,0 +1,87 @@
+/* SiFive extension definition file f

[PATCH 6/8] RISC-V: Drop riscv_implied_info and riscv_combine_info in favor of riscv_ext_info_t data

2025-05-08 Kito Cheng

Consolidate implied-extension logic by removing the old `riscv_implied_info`
array and using the `implied_exts` field in the unified riscv_ext_info_t
structures generated from `riscv-ext.def`.
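The following C++ sketch shows the kind of computation the implied-extension data drives: each extension lists the extensions it implies, optionally guarded by a predicate, and implications are applied until nothing changes. The entries are taken from the removed riscv_implied_info table (m, f, d, c); the type and function names (arch_state, implied_rule, expand_implied) are hypothetical and are not GCC's riscv_ext_info_t API.

```cpp
// Sketch of implied-extension expansion with optional predicates.
// Names are hypothetical; entries mirror part of the old riscv_implied_info.
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

struct arch_state
{
  int xlen;                    // 32 or 64
  std::set<std::string> exts;  // extensions enabled so far
};

struct implied_rule
{
  std::string implied_ext;
  // Optional condition; an empty std::function means "always implied".
  std::function<bool (const arch_state &)> predicate;
};

// Per-extension implication lists, similar in spirit to an implied_exts
// field generated from a .def file.
static const std::map<std::string, std::vector<implied_rule>> implied = {
  {"m", {{"zmmul", nullptr}}},
  {"f", {{"zicsr", nullptr}}},
  {"d", {{"f", nullptr}, {"zicsr", nullptr}}},
  {"c", {{"zca", nullptr},
         {"zcf", [] (const arch_state &s)
                 { return s.xlen == 32 && s.exts.count ("f") != 0; }},
         {"zcd", [] (const arch_state &s)
                 { return s.exts.count ("d") != 0; }}}},
};

// Apply implications until a fixpoint is reached, so chains such as
// d -> f -> zicsr are followed and predicates see later additions.
void expand_implied (arch_state &s)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      // Iterate over a copy because s.exts is modified in the loop.
      std::set<std::string> current = s.exts;
      for (const std::string &ext : current)
        {
          auto it = implied.find (ext);
          if (it == implied.end ())
            continue;
          for (const implied_rule &rule : it->second)
            if ((!rule.predicate || rule.predicate (s))
                && s.exts.insert (rule.implied_ext).second)
              changed = true;
        }
    }
}
```

With xlen 32 and an initial set of {c, d}, the first pass adds zca, zcd, f and zicsr but not zcf, because f is not yet present when the c rule is evaluated; the second pass then adds zcf. That is why this sketch iterates to a fixpoint instead of making a single pass.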

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info::riscv_implied_info_t): Remove unused
variant.
(struct riscv_implied_info_t): Remove unused field.
(riscv_implied_info::match): Remove unused variant, and adjust
the logic.
(get_riscv_ext_info): New.
(riscv_implied_info): Remove.
(riscv_ext_info_t::apply_implied_ext): New.
(riscv_combine_info): Remove.
(riscv_subset_list::handle_implied_ext): Use riscv_ext_info_t
rather than riscv_implied_info.
(riscv_subset_list::check_implied_ext): Ditto.
(riscv_subset_list::handle_combine_ext): Use riscv_ext_info_t
rather than riscv_combine_info.
(riscv_minimal_hwprobe_feature_bits): Use riscv_ext_info_t
rather than riscv_implied_info.
---
 gcc/common/config/riscv/riscv-common.cc | 342 +---
 1 file changed, 65 insertions(+), 277 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 72d5d8181d16..faec954e0c15 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -95,32 +95,17 @@ struct riscv_implied_info_t
   constexpr riscv_implied_info_t (const char *implied_ext,
  riscv_implied_predicator_t predicator
  = nullptr)
-: ext (nullptr), implied_ext (implied_ext), predicator (predicator)
+: implied_ext (implied_ext), predicator (predicator)
   {}
 
-  constexpr riscv_implied_info_t (const char *ext, const char *implied_ext,
- riscv_implied_predicator_t predicator
- = nullptr)
-: ext (ext), implied_ext (implied_ext), predicator (predicator){};
-
-  bool match (const riscv_subset_list *subset_list, const char *ext_name) const
+  bool match (const riscv_subset_list *subset_list) const
   {
-if (ext_name && strcmp (ext_name, ext) != 0)
-  return false;
-
 if (predicator && !predicator (subset_list))
   return false;
 
 return true;
   }
 
-  bool match (const riscv_subset_list *subset_list,
- const riscv_subset_t *subset) const
-  {
-return match (subset_list, subset->name.c_str());
-  }
-
-  const char *ext;
   const char *implied_ext;
   riscv_implied_predicator_t predicator;
 };
@@ -230,196 +215,43 @@ static const std::unordered_map riscv_ext_infos
 #undef DEFINE_RISCV_EXT
 };
 
-/* Implied ISA info, must end with NULL sentinel.  */
-static const riscv_implied_info_t riscv_implied_info[] =
+static const riscv_ext_info_t &
+get_riscv_ext_info (const std::string &ext)
 {
-  {"m", "zmmul"},
-
-  {"d", "f"},
-  {"f", "zicsr"},
-  {"d", "zicsr"},
-
-  {"a", "zaamo"},
-  {"a", "zalrsc"},
-
-  {"c", "zca"},
-  {"c", "zcf",
-   [] (const riscv_subset_list *subset_list) -> bool
-   {
- return subset_list->xlen () == 32 && subset_list->lookup ("f");
-   }},
-  {"c", "zcd",
-   [] (const riscv_subset_list *subset_list) -> bool
-   {
- return subset_list->lookup ("d");
-   }},
-
-  {"zabha", "zaamo"},
-  {"zacas", "zaamo"},
-  {"zawrs", "zalrsc"},
-
-  {"zcmop", "zca"},
-
-  {"b", "zba"},
-  {"b", "zbb"},
-  {"b", "zbs"},
-
-  {"zdinx", "zfinx"},
-  {"zfinx", "zicsr"},
-  {"zdinx", "zicsr"},
-
-  {"zicfiss", "zicsr"},
-  {"zicfiss", "zimop"},
-  {"zicfilp", "zicsr"},
-
-  {"zk", "zkn"},
-  {"zk", "zkr"},
-  {"zk", "zkt"},
-  {"zkn", "zbkb"},
-  {"zkn", "zbkc"},
-  {"zkn", "zbkx"},
-  {"zkn", "zkne"},
-  {"zkn", "zknd"},
-  {"zkn", "zknh"},
-  {"zks", "zbkb"},
-  {"zks", "zbkc"},
-  {"zks", "zbkx"},
-  {"zks", "zksed"},
-  {"zks", "zksh"},
-
-  {"v", "zvl128b"},
-  {"v", "zve64d"},
-
-  {"zve32f", "f"},
-  {"zve64f", "f"},
-  {"zve64d", "d"},
-
-  {"zve32x", "zicsr"},
-  {"zve32x", "zvl32b"},
-  {"zve32f", "zve32x"},
-  {"zve32f", "zvl32b"},
-
-  {"zve64x", "zve32x"},
-  {"zve64x", "zvl64b"},
-  {"zve64f", "zve32f"},
-  {"zve64f", "zve64x"},
-  {"zve64f", "zvl64b"},
-  {"zve64d", "zve64f"},
-  {"zve64d", "zvl64b"},
-
-  {"zvl64b", "zvl32b"},
-  {"zvl128b", "zvl64b"},
-  {"zvl256b", "zvl128b"},
-  {"zvl512b", "zvl256b"},
-  {"zvl1024b", "zvl512b"},
-  {"zvl2048b", "zvl1024b"},
-  {"zvl4096b", "zvl2048b"},
-  {"zvl8192b", "zvl4096b"},
-  {"zvl16384b", "zvl8192b"},
-  {"zvl32768b", "zvl16384b"},
-  {"zvl65536b", "zvl32768b"},
-
-  {"zvkn", "zvkned"},
-  {"zvkn", "zvknhb"},
-  {"zvkn", "zvkb"},
-  {"zvkn", "zvkt"},
-  {"zvknc", "zvkn"},
-  {"zvknc", "zvbc"},
-  {"zvkng", "zvkn"},
-  {"zvkng", "zvkg"},
-  {"zvks", "zvksed"},
-  {"zvks", "zvksh"},
-  {"zvks", "zvkb"},
-  {"zvks", "zvkt"},
-  {"zvksc", "zvks"},
-  {"zvksc", "zvbc"},
-  {"zvksg", "zvks"},
-  {"zvksg", "zvkg"},
-  {"zvbb",  "zvkb"},
-  {"zvbc",   "zve64x"},
-  {"zvkb",   "zve

Re: [PATCH] testsuite: Skip pr119160 for RISC-V backend.

2025-05-08 jiawei




On 2025/5/8 16:25, Richard Biener wrote:
> On Thu, May 8, 2025 at 10:02 AM Jiawei  wrote:
>>
>> RISC-V backend don't support '-mgeneral-regs-only' option, skip it.
>> https://godbolt.org/z/38M8vPW74
>
> The test should instead use
>
> /* { dg-additional-options "-mgeneral-regs-only" { target { x86_64-*-* i?86-*-* } } } */
>
> OK with that change.
>
> Richard.

Thanks for your suggestion, will update it in the next version.

BR,

Jiawei


gcc/testsuite/ChangeLog:

 * gcc.dg/pr119160.c: Skip for RISC-V backend.

---
  gcc/testsuite/gcc.dg/pr119160.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr119160.c b/gcc/testsuite/gcc.dg/pr119160.c
index b4629a11d9..d96e4de169 100644
--- a/gcc/testsuite/gcc.dg/pr119160.c
+++ b/gcc/testsuite/gcc.dg/pr119160.c
@@ -1,5 +1,6 @@
  /* { dg-do run } */
  /* { dg-options "-O2 -finstrument-functions-once -favoid-store-forwarding -fnon-call-exceptions -fschedule-insns -mgeneral-regs-only -Wno-psabi" } */
+/* { dg-skip-if "" { riscv*-*-* } } */

  typedef __attribute__((__vector_size__ (32))) int V;

--
2.43.0

[PATCH v1 0/5] Add testcases for another case of vec_duplicate + vadd.vv combine

2025-05-08 pan2.li

From: Pan Li 

We already have testcases for the vec_duplicate + vadd.vv combine case
below, i.e.:

Before:
  ...
  vmv.v.x
L1:
  vadd.vv
  J L1
  ...

After:
  ...
L1:
  vadd.vx
  J L1
  ...

But there is still another case, shown below:

Before:
  ...
L1:
  vmv.v.x
  vadd.vv
  J L1
  ...

After:
  ...
L1:
  vadd.vx
  J L1
  ...

This patch series adds testcases for that case.  However, some of the
test results are not yet tidy, and the vector cost model needs more
tuning.
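To see why the two instruction sequences are interchangeable, the C++ sketch below models the element-wise semantics of vmv.v.x, vadd.vv and vadd.vx on a fixed-length array. The helper names and the vector length of 4 are assumptions for illustration; vl/vtype, masking and tail policy are ignored. Unsigned 32-bit elements are used so that addition wraps modulo 2^32, as the integer vector add does.

```cpp
// Scalar model of the vmv.v.x + vadd.vv -> vadd.vx combine.
// vmv_v_x, vadd_vv and vadd_vx are hypothetical helpers emulating only
// the element-wise results (no vl/vtype, masking or tail handling).
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t VL = 4;  // illustrative vector length
using vec = std::array<uint32_t, VL>;

// vmv.v.x: broadcast scalar x into every element.
vec vmv_v_x (uint32_t x)
{
  vec r;
  r.fill (x);
  return r;
}

// vadd.vv: element-wise add of two vectors (wraps modulo 2^32).
vec vadd_vv (const vec &a, const vec &b)
{
  vec r;
  for (std::size_t i = 0; i < VL; ++i)
    r[i] = a[i] + b[i];
  return r;
}

// vadd.vx: add scalar x to every element, no broadcast register needed.
vec vadd_vx (const vec &a, uint32_t x)
{
  vec r;
  for (std::size_t i = 0; i < VL; ++i)
    r[i] = a[i] + x;
  return r;
}
```

In this model vadd_vv (a, vmv_v_x (x)) and vadd_vx (a, x) always agree; the testcases in this series instead check whether the combine is actually chosen, which depends on the GR2VR cost passed via --param=gpr2vr-cost.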

The following test suites pass with this patch:
* The rv64gcv full regression test.

Pan Li (5):
  RISC-V: Separate the test running of rvv vx_vf
  RISC-V: Rename VX_BINARY test helper to VX_BINARY_CASE_0
  RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 0
  RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 1
  RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 2

 .../riscv/rvv/autovec/vx_vf/vx_binary.h   | 62 ---
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u64.c   |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u8.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-i32.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-i64.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-i8.c|  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-u16.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-u32.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-u64.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-u8.c|  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-5-i16.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-5-i32.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-5-i64.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-5-i8.c|  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-5-u16.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-5-u32.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-5-u64.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-5-u8.c|  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-6-i16.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-6-i32.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-6-i64.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-6-i8.c|  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-6-u16.c   |  9 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-6-u32.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-6-u64.c   |  8 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-6-u8.c|  8 +++
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i16.c |  4 +-
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i32.c |  4 +-
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i64.c |  4 +-
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i8.c  |  4 +-
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u16.c |  4 +-
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u32.c |  4 +-
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u64.c |  4 +-
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u8.c  |  4 +-
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 15 +
 58 files changed, 301 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_

[PATCH 4/8] RISC-V: Adjust riscv_can_inline_p

2025-05-08 Kito Cheng

We no longer hold any extension flags in `target_flags`, so there is no
need to gather an extension flag mask for `target_flags`.
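The reasoning behind dropping the mask can be checked with a small C++ sketch of the comparison before and after the patch. The function names and flag values are hypothetical; only the shape of the expressions mirrors riscv_can_inline_p.

```cpp
// Sketch of the target_flags compatibility check in riscv_can_inline_p.
// Function names and flag values are hypothetical.

// Old check: ignore ISA extension bits when comparing the flags.
bool flags_compatible_old (int caller, int callee, int isa_mask)
{
  return (caller & ~isa_mask) == (callee & ~isa_mask);
}

// New check: no ISA extension bits live in target_flags any more,
// so caller and callee flags are compared directly.
bool flags_compatible_new (int caller, int callee)
{
  return caller == callee;
}
```

When the ISA mask is 0, which is what riscv_x_target_flags_isa_mask ends up returning once no extension flag lives in x_target_flags, the old masked comparison reduces to the plain equality used now.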

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_can_inline_p): Drop
extension flags check from `target_flags`.
* config/riscv/riscv-subset.h (riscv_x_target_flags_isa_mask):
Remove.
* config/riscv/riscv.cc (riscv_x_target_flags_isa_mask): Remove.
---
 gcc/common/config/riscv/riscv-common.cc | 17 -
 gcc/config/riscv/riscv-subset.h |  1 -
 gcc/config/riscv/riscv.cc   |  8 +++-
 3 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index b5a06a46e0e7..5a091e3987b1 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1889,23 +1889,6 @@ riscv_ext_is_subset (struct cl_target_option *opts,
   return true;
 }
 
-/* Return the mask of ISA extension in x_target_flags of gcc_options.  */
-
-int
-riscv_x_target_flags_isa_mask (void)
-{
-  int mask = 0;
-  const riscv_ext_flag_table_t *arch_ext_flag_tab;
-  for (arch_ext_flag_tab = &riscv_ext_flag_table[0];
-   arch_ext_flag_tab->ext;
-   ++arch_ext_flag_tab)
-{
-  if (arch_ext_flag_tab->var_ref == &gcc_options::x_target_flags)
-   mask |= arch_ext_flag_tab->mask;
-}
-  return mask;
-}
-
 /* Get the minimal feature bits in Linux hwprobe of the given ISA string.
 
Used for generating Function Multi-Versioning (FMV) dispatcher for RISC-V.
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index 559e70850161..f5baf5d9c4f9 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -127,6 +127,5 @@ extern bool riscv_minimal_hwprobe_feature_bits (const char *,
location_t);
 extern bool
 riscv_ext_is_subset (struct cl_target_option *, struct cl_target_option *);
-extern int riscv_x_target_flags_isa_mask (void);
 
 #endif /* ! GCC_RISCV_SUBSET_H */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3ee88db24fa5..7bdb0b142600 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7918,11 +7918,9 @@ riscv_can_inline_p (tree caller, tree callee)
   struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
   struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
 
-  int isa_flag_mask = riscv_x_target_flags_isa_mask ();
-
-  /* Callee and caller should have the same target options except for ISA.  */
-  int callee_target_flags = callee_opts->x_target_flags & ~isa_flag_mask;
-  int caller_target_flags = caller_opts->x_target_flags & ~isa_flag_mask;
+  /* Callee and caller should have the same target options.  */
+  int callee_target_flags = callee_opts->x_target_flags;
+  int caller_target_flags = caller_opts->x_target_flags;
 
   if (callee_target_flags != caller_target_flags)
 return false;
-- 
2.34.1

[PATCH v1 1/5] RISC-V: Separate the test running of rvv vx_vf

2025-05-08 pan2.li

From: Pan Li 

The default test runs in rvv.exp pass -fno-vect-cost-model for most
of their option sets.  That is not suitable for the vx_vf tests, which
depend on the cost model.  Thus, run the vx_vf test cases separately,
with their own option sets that omit -fno-vect-cost-model.
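The new vx_vf option list in rvv.exp is the cross product of two -mrvv-vector-bits values (zvl, scalable) and five -mrvv-max-lmul values (m1, m2, m4, m8, dynamic), each combined with -ftree-vectorize -O3 -ffast-math and without -fno-vect-cost-model. The C++ sketch below rebuilds the same ten strings in the same order; it is only an illustration, since rvv.exp spells the list out directly in Tcl, and vx_vf_test_opts is a hypothetical name.

```cpp
// Rebuild the vx_vf option matrix as a cross product (illustration only).
#include <string>
#include <vector>

std::vector<std::string> vx_vf_test_opts ()
{
  const char *vector_bits[] = {"zvl", "scalable"};
  const char *max_lmul[] = {"m1", "m2", "m4", "m8", "dynamic"};
  std::vector<std::string> opts;
  for (const char *bits : vector_bits)
    for (const char *lmul : max_lmul)
      // No -fno-vect-cost-model: the vx_vf tests depend on the cost model.
      opts.push_back (std::string ("-ftree-vectorize -O3 -mrvv-vector-bits=")
                      + bits + " -mrvv-max-lmul=" + lmul + " -ffast-math");
  return opts;
}
```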

The following test suites pass with this patch:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Separate test running of
rvv vx_vf.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 4283d12ccb5..d76a2d7fe74 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -130,6 +130,21 @@ foreach op $AUTOVEC_TEST_OPTS {
 "$op" ""
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/sat/*.\[cS\]]] \
 "$op" ""
+}
+
+# vx_vf tests
+set AUTOVEC_TEST_OPTS [list \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=zvl -mrvv-max-lmul=m1 -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=zvl -mrvv-max-lmul=m2 -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=zvl -mrvv-max-lmul=m4 -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=zvl -mrvv-max-lmul=m8 -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=zvl -mrvv-max-lmul=dynamic -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=scalable -mrvv-max-lmul=m1 -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=scalable -mrvv-max-lmul=m2 -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=scalable -mrvv-max-lmul=m4 -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=scalable -mrvv-max-lmul=m8 -ffast-math} \
+  {-ftree-vectorize -O3 -mrvv-vector-bits=scalable -mrvv-max-lmul=dynamic -ffast-math} ]
+foreach op $AUTOVEC_TEST_OPTS {
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vx_vf/*.\[cS\]]] \
 "$op" ""
 }
-- 
2.43.0

[PATCH v1 4/5] RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 1

2025-05-08 pan2.li

From: Pan Li 

Add asm dump checks for the vec_duplicate + vadd.vv combine case 1 to
vadd.vx, with a GR2VR cost of 1.  The testcases are not yet tidy
according to the results, but we will continue tuning the cost model.

The following test suites pass with this patch:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i16.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i32.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i64.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i8.c | 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u16.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u32.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u64.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u8.c | 8 
 8 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i16.c
new file mode 100644
index 000..e5ec8884fc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i16.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int16_t, +, VX_BINARY_BODY_X8)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i32.c
new file mode 100644
index 000..ed6c22d059f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i32.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int32_t, +, VX_BINARY_BODY_X4)
+
+/* { dg-final { scan-assembler {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i64.c
new file mode 100644
index 000..ef44012e418
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i64.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int64_t, +, VX_BINARY_BODY)
+
+/* { dg-final { scan-assembler {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i8.c
new file mode 100644
index 000..d61f9dfbb2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-i8.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int8_t, +, VX_BINARY_BODY_X16)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u16.c
new file mode 100644
index 000..3d1ba7f0742
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-5-u16.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(uint16_t, +, VX_BINARY_BODY_X8)
+
+/* { dg-final { scan-assembler {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_v

[PATCH v1 2/5] RISC-V: Rename VX_BINARY test helper to VX_BINARY_CASE_0

2025-05-08 pan2.li

From: Pan Li 

This patch renames VX_BINARY with a CASE_0 suffix, as we now have
another case of VX_BINARY test code, i.e. case 1:

L1:
  vmv.v.x
  vadd.vv
  J L1

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Rename VX_BINARY
to VX_BINARY_CASE_0 for underlying case 1.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c: Take the
new name for test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-3-u8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c: Ditto

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx_binary.h| 18 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i16.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i32.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i64.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-i8.c |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u16.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u32.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u64.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-1-u8.c |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i16.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i32.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i64.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-i8.c |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u16.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u32.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u64.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-2-u8.c |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i16.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i32.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i64.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-i8.c |  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u16.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u32.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u64.c|  2 +-
 .../riscv/rvv/autovec/vx_vf/vx_vadd-3-u8.c |  2 +-
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i16.c  |  4 ++--
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i32.c  |  4 ++--
 .../rvv/autovec/vx_vf/vx_vadd-run-1-i64.c  |  4 ++--
 .../riscv/rvv/autovec/vx_vf/vx_vadd-run-1-i8.c |  4 ++--
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u16.c  |  4 ++--
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u32.c  |  4 ++--
 .../rvv/autovec/vx_vf/vx_vadd-run-1-u64.c  |  4 ++--
 .../riscv/rvv/autovec/vx_vf/vx_vadd-run-1-u8.c |  4 ++--
 33 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
index 66654eb9022..de5b70dd04b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
@@ -3,15 +3,15 @@
 
 #include 
 
-#define DEF_VX_BINARY(T, OP)\
-void

[committed] fortran: Add testcases for PR120152, PR120153 and PR120158

2025-05-08 Thread Jakub Jelinek

Hi!

The following patch adds testcase coverage for the 3 recently fixed
libgfortran PRs.
On trunk before those fixes I'm getting with -m32
FAIL: gfortran.dg/pr120152_1.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/pr120152_1.f90   -Os  (test for excess errors)
and with -m64
FAIL: gfortran.dg/pr120152_1.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/pr120152_1.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/pr120152_2.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/pr120152_2.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/pr120153.f90   -Os  (test for excess errors)
FAIL: gfortran.dg/pr120158.f90   -O0  execution test
FAIL: gfortran.dg/pr120158.f90   -O1  execution test
FAIL: gfortran.dg/pr120158.f90   -O2  execution test
FAIL: gfortran.dg/pr120158.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/pr120158.f90   -O3 -g  execution test
FAIL: gfortran.dg/pr120158.f90   -Os  execution test
On latest trunk everything PASSes.

Tested on x86_64-linux -m32/-m64, committed to trunk as obvious.
Sorry for not including it with the individual commits.

2025-05-08  Jakub Jelinek  

PR libfortran/120152
PR libfortran/120153
PR libfortran/120158
* gfortran.dg/pr120152_1.f90: New test.
* gfortran.dg/pr120152_2.f90: New test.
* gfortran.dg/pr120153.f90: New test.
* gfortran.dg/pr120158.f90: New test.
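
The pr120152 tests below call MAXLOC (a, DIM=2, MASK=c, KIND=..., BACK=.true.) on all-zero arrays of different integer kinds. As a reading aid only (this is not libgfortran's code), the standard semantics for one slice along the reduction dimension can be modelled in C++: the result is the 1-based position of the maximum among the selected elements, the last such position when BACK is true, and 0 when the mask selects nothing. The function name maxloc_slice is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reading aid only: models Fortran MAXLOC (ARRAY, DIM, MASK, BACK) for one
// slice along DIM.  Returns the 1-based position of the maximum among the
// elements whose mask is true, or 0 if the mask selects nothing.  With
// back == true the last maximal position wins, otherwise the first.
template <typename T>
long maxloc_slice(const std::vector<T> &a, const std::vector<bool> &mask,
                  bool back) {
  assert(a.size() == mask.size());
  long pos = 0;  // 0 means "no element selected"
  T best{};
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (!mask[i])
      continue;
    // The first selected element always becomes the candidate; after that a
    // larger value replaces it, and with back == true so does an equal one.
    if (pos == 0 || a[i] > best || (back && a[i] == best)) {
      best = a[i];
      pos = static_cast<long>(i) + 1;
    }
  }
  return pos;
}
```

Under this model, an all-zero slice of length 10 with an all-true (or absent) mask gives 10, which is what every element of b should hold in f1 to f4; the scalar .false. mask used in f5 and f6 gives 0.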

--- gcc/testsuite/gfortran.dg/pr120152_1.f90.jj	2025-05-08 10:09:32.020902923 +0200
+++ gcc/testsuite/gfortran.dg/pr120152_1.f90	2025-05-08 10:08:47.465516679 +0200
@@ -0,0 +1,52 @@
+! PR libfortran/120152
+! { dg-do run }
+
+subroutine f1
+  integer(kind=8) :: a (10, 10, 10), b (10, 10)
+  logical :: c (10, 10, 10)
+  a = 0
+  c = .true.
+  b = maxloc (a, 2, c, 8, .true.)
+end
+subroutine f2
+  integer(kind=8) :: a (10, 10, 10)
+  integer(kind=4) :: b (10, 10)
+  logical :: c (10, 10, 10)
+  a = 0
+  c = .true.
+  b = maxloc (a, 2, c, 4, .true.)
+end
+subroutine f3
+  integer(kind=8) :: a (10, 10, 10), b (10, 10)
+  a = 0
+  b = maxloc (a, 2, kind=8, back=.true.)
+end
+subroutine f4
+  integer(kind=8) :: a (10, 10, 10)
+  integer(kind=4) :: b (10, 10)
+  a = 0
+  b = maxloc (a, 2, kind=4, back=.true.)
+end
+subroutine f5
+  integer(kind=8) :: a (10, 10, 10), b (10, 10)
+  logical :: c
+  a = 0
+  c = .false.
+  b = maxloc (a, 2, c, 8, .true.)
+end
+subroutine f6
+  integer(kind=8) :: a (10, 10, 10)
+  integer(kind=4) :: b (10, 10)
+  logical :: c
+  a = 0
+  c = .false.
+  b = maxloc (a, 2, c, 4, .true.)
+end
+program pr120152
+  call f1
+  call f2
+  call f3
+  call f4
+  call f5
+  call f6
+end
--- gcc/testsuite/gfortran.dg/pr120152_2.f90.jj	2025-05-08 10:09:41.793768306 +0200
+++ gcc/testsuite/gfortran.dg/pr120152_2.f90	2025-05-08 10:14:50.646513835 +0200
@@ -0,0 +1,80 @@
+! PR libfortran/120152
+! { dg-do run { target fortran_large_int } }
+
+subroutine f1
+  integer(kind=16) :: a (10, 10, 10)
+  integer(kind=8) :: b (10, 10)
+  logical :: c (10, 10, 10)
+  a = 0
+  c = .true.
+  b = maxloc (a, 2, c, 8, .true.)
+end
+subroutine f2
+  integer(kind=16) :: a (10, 10, 10)
+  integer(kind=4) :: b (10, 10)
+  logical :: c (10, 10, 10)
+  a = 0
+  c = .true.
+  b = maxloc (a, 2, c, 4, .true.)
+end
+subroutine f3
+  integer(kind=16) :: a (10, 10, 10)
+  integer(kind=8) :: b (10, 10)
+  a = 0
+  b = maxloc (a, 2, kind=8, back=.true.)
+end
+subroutine f4
+  integer(kind=16) :: a (10, 10, 10)
+  integer(kind=4) :: b (10, 10)
+  a = 0
+  b = maxloc (a, 2, kind=4, back=.true.)
+end
+subroutine f5
+  integer(kind=16) :: a (10, 10, 10)
+  integer(kind=8) :: b (10, 10)
+  logical :: c
+  a = 0
+  c = .false.
+  b = maxloc (a, 2, c, 8, .true.)
+end
+subroutine f6
+  integer(kind=16) :: a (10, 10, 10)
+  integer(kind=4) :: b (10, 10)
+  logical :: c
+  a = 0
+  c = .false.
+  b = maxloc (a, 2, c, 4, .true.)
+end
+subroutine f7
+  integer(kind=8) :: a (10, 10, 10)
+  integer(kind=16) :: b (10, 10)
+  logical :: c (10, 10, 10)
+  a = 0
+  c = .true.
+  b = maxloc (a, 2, c, 16, .true.)
+end
+subroutine f8
+  integer(kind=8) :: a (10, 10, 10)
+  integer(kind=16) :: b (10, 10)
+  a = 0
+  b = maxloc (a, 2, kind=16, back=.true.)
+end
+subroutine f9
+  integer(kind=8) :: a (10, 10, 10)
+  integer(kind=16) :: b (10, 10)
+  logical :: c
+  a = 0
+  c = .false.
+  b = maxloc (a, 2, c, 16, .true.)
+end
+program pr120152
+  call f1
+  call f2
+  call f3
+  call f4
+  call f5
+  call f6
+  call f7
+  call f8
+  call f9
+end
--- gcc/testsuite/gfortran.dg/pr120153.f90.jj   2025-05-08 10:16:43.88

[PATCH 5/8] RISC-V: Introduce riscv_ext_info_t to hold extension metadata

2025-05-08 Thread Kito Cheng

Define a new riscv_ext_info_t struct to aggregate all ISA extension fields
(name, version, flags, implied extensions, bitmask and extra flags) generated
from riscv-ext.def.

Also adjust riscv_ext_flag_table_t and riscv_implied_info_t to make it
able to not hold extension name, this part will refactor in later
patchs.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_info_t): New
struct.
(opt_var_ref_t): Adjust order.
(cl_opt_var_ref_t): Ditto.
(riscv_ext_flag_table_t): Adjust order, and add a new constructor
that does not take the extension name.
(riscv_version_t): New struct.
(riscv_implied_info_t): Adjust order, and add a new constructor that
does not take the extension name.
(apply_extra_extension_flags): New function.
(riscv_ext_infos): New.
(riscv_implied_info): Adjust.
* config/riscv/riscv-opts.h (EXT_FLAG_MACRO): New macro.
(BITMASK_NOT_YET_ALLOCATED): New macro.
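
The new riscv_ext_flag_table_t keeps a pointer to an int data member of gcc_options (opt_var_ref_t) together with a mask, and set, clean and check apply that mask through `->*`. The standalone sketch below reproduces only that mechanism, under the assumption that a flat struct of int words is a fair stand-in; `options_t`, `flag_entry_t`, `MASK_FOO` and `MASK_BAR` are hypothetical names, not GCC's gcc_options or its generated MASK_* macros.

```cpp
#include <cassert>

// Hypothetical stand-in for gcc_options: a few int words holding flag bits.
struct options_t {
  int x_isa_flags = 0;
  int x_zb_subext = 0;
};

// Pointer to an int member of options_t, analogous to opt_var_ref_t.
typedef int (options_t::*opt_var_ref_t);

// Hypothetical mask values; the real ones come from generated MASK_* macros.
const int MASK_FOO = 1 << 0;
const int MASK_BAR = 1 << 3;

struct flag_entry_t {
  opt_var_ref_t var_ref;  // which word of options_t to touch
  int mask;               // which bit(s) within that word

  void set(options_t *opts) const { opts->*var_ref |= mask; }
  void clean(options_t *opts) const { opts->*var_ref &= ~mask; }
  bool check(const options_t *opts) const {
    return (opts->*var_ref & mask) != 0;
  }
};
```

Because the entry stores which member to modify rather than a copy of its value, one table type can drive flags spread over several option words.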
---
 gcc/common/config/riscv/riscv-common.cc | 191 ++--
 gcc/config/riscv/riscv-opts.h   |   8 +
 2 files changed, 185 insertions(+), 14 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 5a091e3987b1..72d5d8181d16 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include 
 #include 
+#include 
 #include 
 
 #define INCLUDE_STRING
@@ -41,11 +42,62 @@ along with GCC; see the file COPYING3.  If not see
 #define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_ENDIAN)
 #endif
 
+/* Type for pointer to member of gcc_options and cl_target_option.  */
+typedef int (gcc_options::*opt_var_ref_t);
+typedef int (cl_target_option::*cl_opt_var_ref_t);
+
+/* Types for recording extension to internal flag.  */
+struct riscv_ext_flag_table_t
+{
+  riscv_ext_flag_table_t (const char *ext, opt_var_ref_t var_ref,
+ cl_opt_var_ref_t cl_var_ref, int mask)
+   : ext (ext), var_ref (var_ref), cl_var_ref (cl_var_ref), mask (mask)
+  {}
+  riscv_ext_flag_table_t (opt_var_ref_t var_ref,
+ cl_opt_var_ref_t cl_var_ref, int mask)
+   : ext (nullptr), var_ref (var_ref), cl_var_ref (cl_var_ref), mask (mask)
+  {}
+
+  const char *ext;
+  opt_var_ref_t var_ref;
+  cl_opt_var_ref_t cl_var_ref;
+  int mask;
+
+  void clean (gcc_options *opts) const { opts->*var_ref &= ~mask; }
+
+  void set (gcc_options *opts) const { opts->*var_ref |= mask; }
+
+  bool check (cl_target_option *opts) const
+  {
+return (opts->*cl_var_ref & mask);
+  }
+};
+
+/* Type for holding a RISC-V extension version.  */
+struct riscv_version_t
+{
+  riscv_version_t (int major_version, int minor_version,
+  enum riscv_isa_spec_class isa_spec_class
+  = ISA_SPEC_CLASS_NONE)
+: major_version (major_version), minor_version (minor_version),
+  isa_spec_class (isa_spec_class)
+  {}
+  int major_version;
+  int minor_version;
+  enum riscv_isa_spec_class isa_spec_class;
+};
+
 typedef bool (*riscv_implied_predicator_t) (const riscv_subset_list *);
 
 /* Type for implied ISA info.  */
 struct riscv_implied_info_t
 {
+  constexpr riscv_implied_info_t (const char *implied_ext,
+ riscv_implied_predicator_t predicator
+ = nullptr)
+: ext (nullptr), implied_ext (implied_ext), predicator (predicator)
+  {}
+
   constexpr riscv_implied_info_t (const char *ext, const char *implied_ext,
  riscv_implied_predicator_t predicator
  = nullptr)
@@ -53,7 +105,7 @@ struct riscv_implied_info_t
 
   bool match (const riscv_subset_list *subset_list, const char *ext_name) const
   {
-if (strcmp (ext_name, ext) != 0)
+if (ext_name && strcmp (ext_name, ext) != 0)
   return false;
 
 if (predicator && !predicator (subset_list))
@@ -73,6 +125,111 @@ struct riscv_implied_info_t
   riscv_implied_predicator_t predicator;
 };
 
+static void
+apply_extra_extension_flags (const char *ext,
+std::vector &flag_table);
+
+/* Class for holding the extension info.  */
+class riscv_ext_info_t
+{
+public:
+  riscv_ext_info_t (const char *ext,
+   const std::vector &implied_exts,
+   const std::vector &supported_versions,
+   const std::vector &flag_table,
+   int bitmask_group_id, int bitmask_group_bit_pos,
+   unsigned extra_extension_flags)
+: m_ext (ext), m_implied_exts (implied_exts),
+  m_supported_versions (supported_versions), m_flag_table (flag_table),
+  m_bitmask_group_id (bitmask_group_id),
+  m_bitmask_group_bit_pos (bitmask_group_bit_pos),
+  m_extra_extension_flags (extra_extension_flags)
+  {
+apply_extra_extension_flags (ext, m_flag_table);
+  }
+
+  /*

[PATCH v1 5/5] RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 2

2025-05-08 Thread pan2 . li

From: Pan Li 

Add asm dump checks for the vec_duplicate + vadd.vv combine (case 1) into
vadd.vx when the GR2VR cost is 2.  The expected results are not entirely
tidy (vadd.vx is expected for some element types but not others), but we
will continue tuning the cost model for this.

The following test suites pass with this patch:
* The rv64gcv full regression test.
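
vadd.vx adds one scalar register to every vector element, so the combine targets loops in which a scalar is broadcast (vec_duplicate) and added elementwise; without it, the broadcast is materialized in a vector register and vadd.vv is used. The DEF_VX_BINARY_CASE_1 macro body from vx_binary.h is not shown here, so the plain loop below is only an assumed, illustrative shape of such a kernel (vx_add is a hypothetical name), and it says nothing about which instruction GCC will actually pick under --param=gpr2vr-cost=2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative kernel: add the same scalar x to every element of in.
// A vectorizer may broadcast x into a vector register and use vadd.vv,
// or keep x in a scalar register and use vadd.vx; the cost model decides.
template <typename T>
void vx_add(T *out, const T *in, T x, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = in[i] + x;  // result is converted back to T (wraps for unsigned T)
}
```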

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i16.c   | 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i32.c   | 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i64.c   | 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i8.c| 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u16.c   | 9 +
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u32.c   | 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u64.c   | 8 
 .../gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u8.c| 8 
 8 files changed, 65 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i16.c
new file mode 100644
index 000..d80f0c07d55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i16.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=2" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int16_t, +, VX_BINARY_BODY_X8)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i32.c
new file mode 100644
index 000..99f6614eb7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i32.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=2" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int32_t, +, VX_BINARY_BODY_X4)
+
+/* { dg-final { scan-assembler {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i64.c
new file mode 100644
index 000..ab06c51914b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i64.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=2" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int64_t, +, VX_BINARY_BODY)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i8.c
new file mode 100644
index 000..7ead9d09b79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-i8.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=2" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int8_t, +, VX_BINARY_BODY_X16)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u16.c
new file mode 100644
index 000..79b754b934a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-6-u16.c
@@ -0,0 +1,9 @@
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=2" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(uint16_t, +, VX_BINARY_BODY_X8)
+
+/* { dg-final { scan-assembler {vadd.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_va

[PATCH 8/8] RISC-V: Drop riscv_ext_flag_table in favor of riscv_ext_info_t data

2025-05-08 Thread Kito Cheng

Refactor extension flag handling by removing the old riscv_ext_flag_table and
sourcing all flag definitions directly from the flags field of the unified
riscv_ext_info_t structures generated from riscv-ext.def.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_extra_ext_flag_table_t):
New.
(riscv_ext_flag_table): Rename to ...
(riscv_extra_ext_flag_table): this, and drop most of the definitions
that can be obtained from the flags field of the riscv_ext_info_t
structures.
(apply_extra_extension_flags): Use riscv_ext_info_t.
(riscv_ext_is_subset): Ditto.
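
The idea of this patch is that the flags an extension sets are read from the same per-extension record that holds its name, rather than from a separate name-to-flag table. A much-reduced C++ sketch of that data-driven layout follows, using an X-macro list in place of a generated .def file. All names here (`options_t`, `ext_info_t`, `EXAMPLE_EXTS`, `apply_ext_flags`) are hypothetical, and riscv-ext.def's real entry format is not reproduced.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for gcc_options.
struct options_t {
  int x_base_subext = 0;
  int x_zb_subext = 0;
};
typedef int (options_t::*opt_var_ref_t);

struct flag_t {
  opt_var_ref_t var_ref;  // option word to modify
  int mask;               // bit(s) to set in that word
};

// Unified per-extension record: the name and the flags it implies live
// together, so no separate name-to-flag table is needed.
struct ext_info_t {
  const char *name;
  std::vector<flag_t> flags;
};

// Hypothetical X-macro list standing in for a generated .def file.
#define EXAMPLE_EXTS(X)              \
  X ("m",   x_base_subext, 1 << 0)   \
  X ("zba", x_zb_subext,   1 << 0)   \
  X ("zbb", x_zb_subext,   1 << 1)

#define MAKE_INFO(NAME, VAR, MASK) {NAME, {{&options_t::VAR, MASK}}},
static const ext_info_t ext_infos[] = { EXAMPLE_EXTS (MAKE_INFO) };
#undef MAKE_INFO

// Set the flags of every extension whose name appears in ENABLED.
void apply_ext_flags(options_t *opts, const std::vector<const char *> &enabled) {
  for (const char *ext : enabled)
    for (const ext_info_t &info : ext_infos)
      if (std::strcmp(info.name, ext) == 0)
        for (const flag_t &f : info.flags)
          opts->*f.var_ref |= f.mask;
}
```

Adding an extension then means adding one list entry; only flags that do not follow the regular pattern would still need a hand-written side table.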
---
 gcc/common/config/riscv/riscv-common.cc | 213 +++-
 1 file changed, 27 insertions(+), 186 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index e6620600f3d6..e71692354bd3 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -47,21 +47,20 @@ typedef int (gcc_options::*opt_var_ref_t);
 typedef int (cl_target_option::*cl_opt_var_ref_t);
 
 /* Types for recording extension to internal flag.  */
-struct riscv_ext_flag_table_t
+struct riscv_extra_ext_flag_table_t
 {
-  riscv_ext_flag_table_t (const char *ext, opt_var_ref_t var_ref,
- cl_opt_var_ref_t cl_var_ref, int mask)
-   : ext (ext), var_ref (var_ref), cl_var_ref (cl_var_ref), mask (mask)
-  {}
-  riscv_ext_flag_table_t (opt_var_ref_t var_ref,
- cl_opt_var_ref_t cl_var_ref, int mask)
-   : ext (nullptr), var_ref (var_ref), cl_var_ref (cl_var_ref), mask (mask)
-  {}
-
   const char *ext;
   opt_var_ref_t var_ref;
   cl_opt_var_ref_t cl_var_ref;
   int mask;
+};
+
+/* Types for recording extension to internal flag.  */
+struct riscv_ext_flag_table_t
+{
+  opt_var_ref_t var_ref;
+  cl_opt_var_ref_t cl_var_ref;
+  int mask;
 
   void clean (gcc_options *opts) const { opts->*var_ref &= ~mask; }
 
@@ -1353,75 +1352,12 @@ riscv_arch_str (bool version_p)
 #define RISCV_EXT_FLAG_ENTRY(NAME, VAR, MASK) \
   {NAME, &gcc_options::VAR, &cl_target_option::VAR, MASK}
 
-/* Mapping table between extension to internal flag.  */
-static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
+/* Mapping table from extension to internal flag; entries only need to
+   be added here manually when there is a special rule.  */
+static const riscv_extra_ext_flag_table_t riscv_extra_ext_flag_table[] =
 {
-  RISCV_EXT_FLAG_ENTRY ("e", x_riscv_base_subext, MASK_RVE),
-  RISCV_EXT_FLAG_ENTRY ("m", x_riscv_base_subext, MASK_MUL),
-  RISCV_EXT_FLAG_ENTRY ("a", x_riscv_base_subext, MASK_ATOMIC),
-  RISCV_EXT_FLAG_ENTRY ("f", x_riscv_base_subext, MASK_HARD_FLOAT),
-  RISCV_EXT_FLAG_ENTRY ("d", x_riscv_base_subext, MASK_DOUBLE_FLOAT),
-  RISCV_EXT_FLAG_ENTRY ("c", x_riscv_base_subext, MASK_RVC),
-  RISCV_EXT_FLAG_ENTRY ("v", x_riscv_isa_flags, MASK_FULL_V),
-  RISCV_EXT_FLAG_ENTRY ("v", x_riscv_isa_flags, MASK_VECTOR),
-
-  RISCV_EXT_FLAG_ENTRY ("zicsr",x_riscv_zi_subext, MASK_ZICSR),
-  RISCV_EXT_FLAG_ENTRY ("zifencei", x_riscv_zi_subext, MASK_ZIFENCEI),
-  RISCV_EXT_FLAG_ENTRY ("zicond",   x_riscv_zi_subext, MASK_ZICOND),
-
-  RISCV_EXT_FLAG_ENTRY ("za64rs",  x_riscv_za_subext, MASK_ZA64RS),
-  RISCV_EXT_FLAG_ENTRY ("za128rs", x_riscv_za_subext, MASK_ZA128RS),
-  RISCV_EXT_FLAG_ENTRY ("zawrs",   x_riscv_za_subext, MASK_ZAWRS),
-  RISCV_EXT_FLAG_ENTRY ("zaamo",   x_riscv_za_subext, MASK_ZAAMO),
-  RISCV_EXT_FLAG_ENTRY ("zalrsc",  x_riscv_za_subext, MASK_ZALRSC),
-  RISCV_EXT_FLAG_ENTRY ("zabha",   x_riscv_za_subext, MASK_ZABHA),
-  RISCV_EXT_FLAG_ENTRY ("zacas",   x_riscv_za_subext, MASK_ZACAS),
-  RISCV_EXT_FLAG_ENTRY ("zama16b", x_riscv_za_subext, MASK_ZAMA16B),
-
-  RISCV_EXT_FLAG_ENTRY ("zba", x_riscv_zb_subext, MASK_ZBA),
-  RISCV_EXT_FLAG_ENTRY ("zbb", x_riscv_zb_subext, MASK_ZBB),
-  RISCV_EXT_FLAG_ENTRY ("zbc", x_riscv_zb_subext, MASK_ZBC),
-  RISCV_EXT_FLAG_ENTRY ("zbs", x_riscv_zb_subext, MASK_ZBS),
-
-  RISCV_EXT_FLAG_ENTRY ("zfinx",x_riscv_zinx_subext, MASK_ZFINX),
-  RISCV_EXT_FLAG_ENTRY ("zdinx",x_riscv_zinx_subext, MASK_ZDINX),
-  RISCV_EXT_FLAG_ENTRY ("zhinx",x_riscv_zinx_subext, MASK_ZHINX),
-  RISCV_EXT_FLAG_ENTRY ("zhinxmin", x_riscv_zinx_subext, MASK_ZHINXMIN),
-
-  RISCV_EXT_FLAG_ENTRY ("zbkb",  x_riscv_zk_subext, MASK_ZBKB),
-  RISCV_EXT_FLAG_ENTRY ("zbkc",  x_riscv_zk_subext, MASK_ZBKC),
-  RISCV_EXT_FLAG_ENTRY ("zbkx",  x_riscv_zk_subext, MASK_ZBKX),
-  RISCV_EXT_FLAG_ENTRY ("zknd",  x_riscv_zk_subext, MASK_ZKND),
-  RISCV_EXT_FLAG_ENTRY ("zkne",  x_riscv_zk_subext, MASK_ZKNE),
-  RISCV_EXT_FLAG_ENTRY ("zknh",  x_riscv_zk_subext, MASK_ZKNH),
-  RISCV_EXT_FLAG_ENTRY ("zkr",   x_riscv_zk_subext, MASK_ZKR),
-  RISCV_EXT_FLAG_ENTRY ("zksed", x_riscv_zk_subext, MASK_ZKSED),
-  RISCV_EXT_FLAG_ENTRY ("zksh",  x_riscv_zk_subext, MASK_ZKSH),
-  RISCV_EXT_FLAG_ENTRY ("zkt",   x_riscv_zk_subext, MASK_ZKT),
-
-  RISCV_EXT_FLAG_ENTRY ("zihintntl",   x_riscv_zi

Re: [PATCH 7/8] AArch64: precommit test for CMPBR instructions

2025-05-08 Thread Richard Earnshaw (lists)

On 07/05/2025 18:21, Richard Sandiford wrote:
> Richard Earnshaw  writes:
>> On 07/05/2025 17:28, Richard Earnshaw (lists) wrote:
>>> On 07/05/2025 16:54, Richard Sandiford wrote:
 Richard Earnshaw  writes:
> On 07/05/2025 13:57, Richard Sandiford wrote:
>> Kyrylo Tkachov  writes:
 On 7 May 2025, at 12:27, Karl Meakin  wrote:

 Commit the test file `cmpbr.c` before rules for generating the new
 instructions are added, so that the changes in codegen are more obvious
 in the next commit.
>>>
>>> I guess that’s an LLVM best practice.
>>> In GCC since we have the check-function-bodies mechanism we usually 
>>> prefer to include the relevant test together with the patch that adds 
>>> the optimization.
>>> But this is not wrong either.
>>>
>>>

 gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/cmpbr.c: New test.
 ---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1378 ++
 1 file changed, 1378 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

 diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
 new file mode 100644
 index 000..728d6ead91c
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
 @@ -0,0 +1,1378 @@
 +/* Test that the instructions added by FEAT_CMPBR are emitted */
 +/* { dg-do compile } */
 +/* { dg-options "-march=armv9.5-a+cmpbr -O2" } */
 +/* { dg-final { check-function-bodies "**" "" "" } } */
>>>
>>> As you’ll be adding new instructions to the compiler it’d be good to 
>>> have it a dg-do assemble test where possible.
>>
>> Agreed FWIW, but:
>>
>>> For that you’ll need to create a new aarch64_asm_cmpbr_ok target and 
>>> use it like so to fallback to dg-do compile when the assembler is too 
>>> old:
>>> /* { dg-do compile { target aarch64_asm_cmpbr_ok } } */
>>
>> ...dg-do assemble for this one :)
>
> I don't think that works. If the first dg-do fails the test is just 
> skipped.
>
> You need to replicate the test with separate dg-do directives, IIRC.

 Hmm, can you remember the circumstances when you saw that?
 We've been using the construct that Kyrill suggested with apparent
 success in things like aarch64-sve2-acle-asm.exp.  E.g.:
>>>
>>> Well, the implementation of dg-do contains the comment:
>>>
>>> # Note: A previous occurrence of `dg-do' with target/xfail selectors
>>> # is a user mistake.  We clobber previous values here.
>>>  
>>> So one might interpret that as meaning multiple dg-do's are not intended to 
>>> be supported.
>>>
>>> But I might have misremembered the exact scenario I was facing.  I think it 
>>> might have been that a test failed to fall back to the dg-do-default if a 
>>> specific dg-do didn't match.  The scenario I remember was something like 
>>> dg-do-default = compile, then the test was trying to change that to execute 
>>> if HW was available; but that meant that if it wasn't we didn't fall back 
>>> to checking the assembler output.
>>>
>>
>> The comment at the head of the function says:
>>
>> # Multiple instances are supported (since we don't support target and xfail
>> # selectors on one line), though it doesn't make much sense to change the
>> # compile/assemble/link/run field.  Nor does it make any sense to have
>> # multiple lines of target selectors (use one line).
>>
>> So maybe the code is intended to support multiple reasons for skipping the 
>> test (but why not use require-effective-target
>> for that).
>>
>> I'm not sure now what's going on...
> 
> Richard and I discussed this more off-list, and it turns out that the
> above construct started to work after:
> 
> https://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=569f8718b534a2cd9511a7d640352eb0126ff492
> 
> which was first released in 1.6 (9 years ago).  Before that, the "what"
> (compile/assemble/etc.) in the last dg-do won, regardless of whether
> the dg-do was selected or deselected.
> 
> I suppose the question is whether we can reasonably assume that people
> are using dejagnu 1.6+ or whether we need to support older dejagnus.
> 
> It looks like Alex added dg-do-if (in 
> e6f5fadec5f6a719145ed2ed513209ec3e8eeb2f)
> to support older dejagnu, so that's an option.  I.e.:
> 
> /* { dg-do compile } */
> /* { dg-do-if assemble { target aarch64_asm_cmpbr_ok } } */
> 
> Which if it works (haven't tried!) also avoids having to specify the
> selector twice.  Not sure whether it's worth going back and changing
> all existing aarch64 tests to this style though.

If the default action for the directory is "compile", as it often is, you don't 
even need the initial dg-do.

R.

> 
> TIL :)  Thanks Richard for bringing it up.
> 
> Richard

Re: [PATCH] testsuite: Skip pr119160 for RISC-V backend.

2025-05-08 Thread Richard Biener

On Thu, May 8, 2025 at 10:02 AM Jiawei  wrote:
>
> The RISC-V backend doesn't support the '-mgeneral-regs-only' option; skip it.
> https://godbolt.org/z/38M8vPW74

The test should instead use

/* { dg-additional-options "-mgeneral-regs-only" { target { x86_64-*-* i?86-*-* } } } */

OK with that change.

Richard.

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr119160.c: Skip for RISC-V backend.
>
> ---
>  gcc/testsuite/gcc.dg/pr119160.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.dg/pr119160.c b/gcc/testsuite/gcc.dg/pr119160.c
> index b4629a11d9..d96e4de169 100644
> --- a/gcc/testsuite/gcc.dg/pr119160.c
> +++ b/gcc/testsuite/gcc.dg/pr119160.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-options "-O2 -finstrument-functions-once -favoid-store-forwarding -fnon-call-exceptions -fschedule-insns -mgeneral-regs-only -Wno-psabi" } */
> +/* { dg-skip-if "" { riscv*-*-* } } */
>
>  typedef __attribute__((__vector_size__ (32))) int V;
>
> --
> 2.43.0
>

Re: [PATCH] testsuite: Skip pr119160 for RISC-V backend.

2025-05-08 Thread Konstantinos Eleftheriou

Hi,
This should be restricted to arm/aarch64 and x86. So it should be:

/* { dg-additional-options "-mgeneral-regs-only" { target { x86_64-*-* i?86-*-* aarch64*-*-* arm*-*-* } } } */

Konstantinos

On Thu, May 8, 2025 at 11:36 AM jiawei  wrote:
>
>
> On 2025/5/8 16:25, Richard Biener wrote:
> > On Thu, May 8, 2025 at 10:02 AM Jiawei  wrote:
> >> The RISC-V backend doesn't support the '-mgeneral-regs-only' option; skip it.
> >> https://godbolt.org/z/38M8vPW74
> > The test should instead use
> >
> > /* { dg-additional-options "-mgeneral-regs-only" { target { x86_64-*-* i?86-*-* } } } */
> >
> > OK with that change.
> >
> > Richard.
>
> Thanks for your suggestion, will update it in the next version.
>
> BR,
>
> Jiawei
>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.dg/pr119160.c: Skip for RISC-V backend.
> >>
> >> ---
> >>   gcc/testsuite/gcc.dg/pr119160.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/pr119160.c b/gcc/testsuite/gcc.dg/pr119160.c
> >> index b4629a11d9..d96e4de169 100644
> >> --- a/gcc/testsuite/gcc.dg/pr119160.c
> >> +++ b/gcc/testsuite/gcc.dg/pr119160.c
> >> @@ -1,5 +1,6 @@
> >>   /* { dg-do run } */
> >>   /* { dg-options "-O2 -finstrument-functions-once -favoid-store-forwarding -fnon-call-exceptions -fschedule-insns -mgeneral-regs-only -Wno-psabi" } */
> >> +/* { dg-skip-if "" { riscv*-*-* } } */
> >>
> >>   typedef __attribute__((__vector_size__ (32))) int V;
> >>
> >> --
> >> 2.43.0
> >>
>

Re: [PATCH 00/13] arm: Remove iWMMXT code generation

2025-05-08 Thread Richard Earnshaw (lists)

On 08/05/2025 10:21, Kyrylo Tkachov wrote:
> Hi Richard,
> 
>> On 7 May 2025, at 18:15, Richard Earnshaw  wrote:
>>
>>
>> The header file for the Arm implementation of mmintrin.h was changed in GCC-15
>> to disable access to the intrinsics.  This patch removes the internal code
>> as well.
>>
>> We still allow -mcpu/-march options for the wmmx cpus, but they are now treated
>> in exactly the same way as XScale - generating code for an Armv5te architecture.
>>
> 
> Great to see this cleanup.
> 
>> Richard Earnshaw (13):
>>  arm: clarify the logic of SECONDARY_(INPUT/OUTPUT)_RELOAD_CLASS
>>  arm: testsuite: remove iwmmxt tests
>>  arm: treat -mcpu/arch=iwmmxt{,2} like XScale
>>  arm: remove iWMMX builtins support.
>>  arm: Remove iwmmxt patterns.
>>  arm: remove IWMMXT checks from MD files.
>>  arm: remove support for the iwmmxt ABI variant.
>>  arm: Remove iwmmxt support from arm.cc
>>  arm: remove iwmmxt-related attributes from machine description
>>  arm: cleanup iterators.md after removing iwmmxt
>>  arm: remove dead predefines when using WMMX
>>  arm: remove most remaining iwmmxt code.
>>  arm: remove iwmmxt registers from allocator tables
> 
> There’s a few references to iWMMXT remaining in doc/ referring to builtins 
> and constraints that need to be cleaned up.

Good catch.  I'll fix that when I push the changes.

Thanks.

R.

> Thanks,
> Kyrill
> 
>>
>> gcc/config.gcc |2 +-
>> gcc/config/arm/aout.h  |5 -
>> gcc/config/arm/arm-builtins.cc | 1276 +
>> gcc/config/arm/arm-c.cc|7 -
>> gcc/config/arm/arm-cpus.in |   28 +-
>> gcc/config/arm/arm-generic.md  |4 +-
>> gcc/config/arm/arm-opts.h  |1 -
>> gcc/config/arm/arm-protos.h|8 -
>> gcc/config/arm/arm-tables.opt  |6 -
>> gcc/config/arm/arm-tune.md |   53 +-
>> gcc/config/arm/arm.cc  |  401 +-
>> gcc/config/arm/arm.h   |  169 +--
>> gcc/config/arm/arm.md  |   43 +-
>> gcc/config/arm/arm.opt |3 -
>> gcc/config/arm/constraints.md  |   18 +-
>> gcc/config/arm/iterators.md|   20 +-
>> gcc/config/arm/iwmmxt.md   | 1766 
>> gcc/config/arm/iwmmxt2.md  |  903 
>> gcc/config/arm/marvell-f-iwmmxt.md |  189 ---
>> gcc/config/arm/predicates.md   |8 +-
>> gcc/config/arm/t-arm   |3 -
>> gcc/config/arm/thumb2.md   |2 +-
>> gcc/config/arm/types.md|  123 --
>> gcc/config/arm/unspecs.md  |   29 -
>> gcc/config/arm/vec-common.md   |   31 +-
>> gcc/doc/invoke.texi|2 +-
>> gcc/doc/sourcebuild.texi   |4 -
>> gcc/testsuite/gcc.target/arm/ivopts.c  |3 +-
>> gcc/testsuite/gcc.target/arm/mmx-1.c   |   26 -
>> gcc/testsuite/gcc.target/arm/mmx-2.c   |  166 ---
>> gcc/testsuite/gcc.target/arm/pr64208.c |   25 -
>> gcc/testsuite/gcc.target/arm/pr79145.c |   16 -
>> gcc/testsuite/gcc.target/arm/pr99724.c |   31 -
>> gcc/testsuite/gcc.target/arm/pr99786.c |   30 -
>> gcc/testsuite/lib/target-supports.exp  |   13 -
>> 35 files changed, 141 insertions(+), 5273 deletions(-)
>> delete mode 100644 gcc/config/arm/iwmmxt.md
>> delete mode 100644 gcc/config/arm/iwmmxt2.md
>> delete mode 100644 gcc/config/arm/marvell-f-iwmmxt.md
>> delete mode 100644 gcc/testsuite/gcc.target/arm/mmx-1.c
>> delete mode 100644 gcc/testsuite/gcc.target/arm/mmx-2.c
>> delete mode 100644 gcc/testsuite/gcc.target/arm/pr64208.c
>> delete mode 100644 gcc/testsuite/gcc.target/arm/pr79145.c
>> delete mode 100644 gcc/testsuite/gcc.target/arm/pr99724.c
>> delete mode 100644 gcc/testsuite/gcc.target/arm/pr99786.c
>>
>> -- 
>> 2.43.0
>>
>

Re: Unreviewed COBOL patches

2025-05-08 Thread Rainer Orth

Hi Robert,

> Thank you for the reminder, and accept my apologies for the delays.
>
> Jim and I have been distracted by an intense effort to rewrite
> exception/declarative processing.  There has also been a serious family
> health issue that caused us significant delays as well.

no worries at all: I'd just kept the patches local for now.  Only when
Iain tried to verify his program_invocation_short_name patch on Solaris
in the cfarm I was reminded that you cannot build COBOL on Solaris out
of the box.

> I finished a sub-project yesterday, and I will look into these four
> patches today.

Excellent, thanks.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH] libgcobol: Heed --enable-libgcobol

2025-05-08 Thread Rainer Orth

Hi Robert,

>> -Original Message-
>> From: Rainer Orth 
>> Sent: Friday, April 11, 2025 05:26
>> To: gcc-patches@gcc.gnu.org
>> Cc: Robert Dubner ; James K. Lowden
>> 
>> Subject: [PATCH] libgcobol: Heed --enable-libgcobol
>>
>> If some target isn't listed as supported in configure.tgt,
>> --enable-libgcobol cannot override that.  However, it should, just as
>> an explicit --enable-languages=cobol forces the
>> frontend to be built.
>>
>> This patch, shamelessly adapted from libphobos, does just that.
>>
>> Tested on amd64-pc-solaris2.11, sparcv9-sun-solaris2.11, and
>> x86_64-pc-linux-gnu.
>>
>> Ok for trunk?
>
> I was unable to apply this patch.  "git apply ..." results in
>
> :~/repos/gcc-cobol$ git apply libgcobol-enable-libgcobol.patch
> error: patch failed: libgcobol/configure:788
> error: libgcobol/configure: patch does not apply
> error: patch failed: libgcobol/configure.ac:40
> error: libgcobol/configure.ac: patch does not apply
>
> I don't understand the problem, but I don't know much about how diff and 
> apply work.

That's my fault actually: I had rebased the patch locally a few times
since submission due to upstream changes to libgcobol/configure.ac.
However, since the gist of the patch wasn't affected by those, I'd
neglected to post those rebased versions.

> I have no way of checking the solaris part of it; I was just trying to do 
> "due diligence", and check that it didn't adversely affect x86_64-linux-gnu.

You'd have been able to verify it on every target not currently marked
as supported in libgcobol/configure.tgt, actually.  I'd already run
bootstraps with it on Linux/x86_64 just to make sure nothing breaks.

> But since I can't do that, all I can say is, I see no reason for you not to 
> apply a patch you know works.

Thanks.  All four patches have been committed now.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [GCC16, RFC, V2 06/14] opts: doc: aarch64: add new memtag sanitizer

2025-05-08 Thread Richard Sandiford

Indu Bhagat  writes:
>>> [...]
>>> diff --git a/gcc/opts.cc b/gcc/opts.cc
>>> index 86c6691ecec4..00db662c32ef 100644
>>> --- a/gcc/opts.cc
>>> +++ b/gcc/opts.cc
>>> [...]
>>> @@ -2780,6 +2788,13 @@ common_handle_option (struct gcc_options *opts,
>>>   SET_OPTION_IF_UNSET (opts, opts_set,
>>>param_hwasan_instrument_allocas, 0);
>>> }
>>> +  /* Memtag sanitizer implies HWASAN but with tags always generated by the
>>> +hardware randomly.  */
>>> +  if (opts->x_flag_sanitize & SANITIZE_MEMTAG)
>>> +   {
>>> + SET_OPTION_IF_UNSET (opts, opts_set,
>>> +  param_hwasan_random_frame_tag, 1);
>>> +   }
>> 
>> Does this have any effect in practice?  The default seems to be 1,
>> so I would expect this to be a nop.  The pattern elsewhere in the
>> sanitiser code seems to be to use SET_OPTION_IF_UNSET only to turn
>> features off.
>> 
>
> You're right.  This can be removed.
>
> I recall now that what I wanted to achieve was to render users' usage of 
> "--param hwasan-random-frame-tag=0" inconsequential when memtag
> sanitizer is in effect.
>
> Looks like I need to handle in finish_options ().  Something like:
>
>if ((opts->x_flag_sanitize & SANITIZE_MEMTAG_STACK)
>&& opts->x_param_hwasan_random_frame_tag == 0)
>  {
> warning_at (loc, OPT_fsanitize_,
> "%<--param hwasan-random-frame-tag=0%> is not supported with"
> "%<-fsanitize=memtag-stack%>");
> opts->x_param_hwasan_random_frame_tag = 1;
>  }

LGTM, although there's a missing space in the string continuation.

Richard

[PATCH 2/8] RISC-V: Use riscv-ext.def to generate target options and variables

2025-05-08 Thread Kito Cheng

Leverage the centralized riscv-ext.def definitions to auto-generate
the target option parsing and associated internal flags, replacing
manual listings in riscv.opt; the `riscv_ext_flag_table` part will be
removed in a later patch.

gcc/ChangeLog:

* config/riscv/gen-riscv-ext-opt.cc: New.
* config/riscv/riscv.opt: Drop manual entries for target
options, and include riscv-ext.opt.
* config/riscv/riscv-ext.opt: New.
* config/riscv/riscv-ext.opt.urls: New.
* config.gcc: Add riscv-ext.opt to the list of target options files.
* config/riscv/riscv-common.cc (riscv_ext_flag_table): Adjust target
option variable entry.
(riscv_set_arch_by_subset_list): Adjust target option variable.
* config/riscv/riscv-c.cc (riscv_ext_flag_table): Adjust target
option variable entry.
* config/riscv/riscv-vector-builtins.cc (pragma_intrinsic_flags):
Adjust variable name.
(riscv_pragma_intrinsic_flags_pollute): Adjust variable name.
(riscv_pragma_intrinsic_flags_restore): Ditto.
* config/riscv/t-riscv: Add the rule for generating
riscv-ext.opt.
---
 gcc/common/config/riscv/riscv-common.cc   | 102 +++---
 gcc/config.gcc|   1 +
 gcc/config/riscv/gen-riscv-ext-opt.cc | 105 ++
 gcc/config/riscv/riscv-c.cc   |   8 +-
 gcc/config/riscv/riscv-ext.opt| 400 ++
 gcc/config/riscv/riscv-ext.opt.urls   |   0
 gcc/config/riscv/riscv-opts.h |  12 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  20 +-
 gcc/config/riscv/riscv.opt| 313 +
 gcc/config/riscv/t-riscv  |  13 +
 10 files changed, 595 insertions(+), 379 deletions(-)
 create mode 100644 gcc/config/riscv/gen-riscv-ext-opt.cc
 create mode 100644 gcc/config/riscv/riscv-ext.opt
 create mode 100644 gcc/config/riscv/riscv-ext.opt.urls

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index ca14eb96b253..b5a06a46e0e7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1638,14 +1638,14 @@ struct riscv_ext_flag_table_t {
 /* Mapping table between extension to internal flag.  */
 static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
 {
-  RISCV_EXT_FLAG_ENTRY ("e", x_target_flags, MASK_RVE),
-  RISCV_EXT_FLAG_ENTRY ("m", x_target_flags, MASK_MUL),
-  RISCV_EXT_FLAG_ENTRY ("a", x_target_flags, MASK_ATOMIC),
-  RISCV_EXT_FLAG_ENTRY ("f", x_target_flags, MASK_HARD_FLOAT),
-  RISCV_EXT_FLAG_ENTRY ("d", x_target_flags, MASK_DOUBLE_FLOAT),
-  RISCV_EXT_FLAG_ENTRY ("c", x_target_flags, MASK_RVC),
-  RISCV_EXT_FLAG_ENTRY ("v", x_target_flags, MASK_FULL_V),
-  RISCV_EXT_FLAG_ENTRY ("v", x_target_flags, MASK_VECTOR),
+  RISCV_EXT_FLAG_ENTRY ("e", x_riscv_base_subext, MASK_RVE),
+  RISCV_EXT_FLAG_ENTRY ("m", x_riscv_base_subext, MASK_MUL),
+  RISCV_EXT_FLAG_ENTRY ("a", x_riscv_base_subext, MASK_ATOMIC),
+  RISCV_EXT_FLAG_ENTRY ("f", x_riscv_base_subext, MASK_HARD_FLOAT),
+  RISCV_EXT_FLAG_ENTRY ("d", x_riscv_base_subext, MASK_DOUBLE_FLOAT),
+  RISCV_EXT_FLAG_ENTRY ("c", x_riscv_base_subext, MASK_RVC),
+  RISCV_EXT_FLAG_ENTRY ("v", x_riscv_isa_flags, MASK_FULL_V),
+  RISCV_EXT_FLAG_ENTRY ("v", x_riscv_isa_flags, MASK_VECTOR),
 
   RISCV_EXT_FLAG_ENTRY ("zicsr",x_riscv_zi_subext, MASK_ZICSR),
   RISCV_EXT_FLAG_ENTRY ("zifencei", x_riscv_zi_subext, MASK_ZIFENCEI),
@@ -1688,22 +1688,22 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   RISCV_EXT_FLAG_ENTRY ("zicclsm", x_riscv_zi_subext, MASK_ZICCLSM),
   RISCV_EXT_FLAG_ENTRY ("ziccrse", x_riscv_zi_subext, MASK_ZICCRSE),
 
-  RISCV_EXT_FLAG_ENTRY ("zicboz", x_riscv_zicmo_subext, MASK_ZICBOZ),
-  RISCV_EXT_FLAG_ENTRY ("zicbom", x_riscv_zicmo_subext, MASK_ZICBOM),
-  RISCV_EXT_FLAG_ENTRY ("zicbop", x_riscv_zicmo_subext, MASK_ZICBOP),
-  RISCV_EXT_FLAG_ENTRY ("zic64b", x_riscv_zicmo_subext, MASK_ZIC64B),
+  RISCV_EXT_FLAG_ENTRY ("zicboz", x_riscv_zi_subext, MASK_ZICBOZ),
+  RISCV_EXT_FLAG_ENTRY ("zicbom", x_riscv_zi_subext, MASK_ZICBOM),
+  RISCV_EXT_FLAG_ENTRY ("zicbop", x_riscv_zi_subext, MASK_ZICBOP),
+  RISCV_EXT_FLAG_ENTRY ("zic64b", x_riscv_zi_subext, MASK_ZIC64B),
 
   RISCV_EXT_FLAG_ENTRY ("zicfiss", x_riscv_zi_subext, MASK_ZICFISS),
   RISCV_EXT_FLAG_ENTRY ("zicfilp", x_riscv_zi_subext, MASK_ZICFILP),
 
-  RISCV_EXT_FLAG_ENTRY ("zimop", x_riscv_mop_subext, MASK_ZIMOP),
-  RISCV_EXT_FLAG_ENTRY ("zcmop", x_riscv_mop_subext, MASK_ZCMOP),
+  RISCV_EXT_FLAG_ENTRY ("zimop", x_riscv_zi_subext, MASK_ZIMOP),
+  RISCV_EXT_FLAG_ENTRY ("zcmop", x_riscv_zc_subext, MASK_ZCMOP),
 
-  RISCV_EXT_FLAG_ENTRY ("zve32x", x_target_flags, MASK_VECTOR),
-  RISCV_EXT_FLAG_ENTRY ("zve32f", x_target_flags, MASK_VECTOR),
-  RISCV_EXT_FLAG_ENTRY ("zve64x", x_target_flags, MASK_VECTOR),
-  RISCV_EXT_FLAG_ENTRY ("zve64f", x_target_flags, MASK_VECTOR),
-  RISCV_EXT_FLAG_ENTRY ("zve64d", x_target_

[PATCH 7/8] RISC-V: Drop riscv_ext_version_table in favor of riscv_ext_info_t data

2025-05-08 Thread Kito Cheng

This commit drops the riscv_ext_version_table and instead uses the
riscv_ext_info_t data structure to provide the version information
for RISC-V extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Remove.
(standard_extensions_p): Use riscv_ext_info_t.
(get_default_version): Use riscv_ext_info_t.
(riscv_arch_help): Ditto.
---
 gcc/common/config/riscv/riscv-common.cc | 262 ++--
 1 file changed, 22 insertions(+), 240 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index faec954e0c15..e6620600f3d6 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -263,215 +263,6 @@ struct riscv_ext_version
   int minor_version;
 };
 
-/* All standard extensions defined in all supported ISA spec.  */
-static const struct riscv_ext_version riscv_ext_version_table[] =
-{
-  /* name, ISA spec, major version, minor_version.  */
-  {"e", ISA_SPEC_CLASS_20191213, 2, 0},
-  {"e", ISA_SPEC_CLASS_20190608, 2, 0},
-  {"e", ISA_SPEC_CLASS_2P2,  2, 0},
-
-  {"i", ISA_SPEC_CLASS_20191213, 2, 1},
-  {"i", ISA_SPEC_CLASS_20190608, 2, 1},
-  {"i", ISA_SPEC_CLASS_2P2,  2, 0},
-
-  {"m", ISA_SPEC_CLASS_20191213, 2, 0},
-  {"m", ISA_SPEC_CLASS_20190608, 2, 0},
-  {"m", ISA_SPEC_CLASS_2P2,  2, 0},
-
-  {"a", ISA_SPEC_CLASS_20191213, 2, 1},
-  {"a", ISA_SPEC_CLASS_20190608, 2, 0},
-  {"a", ISA_SPEC_CLASS_2P2,  2, 0},
-
-  {"f", ISA_SPEC_CLASS_20191213, 2, 2},
-  {"f", ISA_SPEC_CLASS_20190608, 2, 2},
-  {"f", ISA_SPEC_CLASS_2P2,  2, 0},
-
-  {"d", ISA_SPEC_CLASS_20191213, 2, 2},
-  {"d", ISA_SPEC_CLASS_20190608, 2, 2},
-  {"d", ISA_SPEC_CLASS_2P2,  2, 0},
-
-  {"c", ISA_SPEC_CLASS_20191213, 2, 0},
-  {"c", ISA_SPEC_CLASS_20190608, 2, 0},
-  {"c", ISA_SPEC_CLASS_2P2,  2, 0},
-
-  {"b",   ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"h",   ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"v",   ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zicsr", ISA_SPEC_CLASS_20191213, 2, 0},
-  {"zicsr", ISA_SPEC_CLASS_20190608, 2, 0},
-
-  {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
-  {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
-
-  {"zicond", ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"za64rs",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"za128rs", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zaamo", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zalrsc", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zabha", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zacas", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zama16b", ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zbs", ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zfinx", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zdinx", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zhinx", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zhinxmin", ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zbkb",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zbkc",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zbkx",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zkne",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zknd",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zknh",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zkr",   ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zksed", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zksh",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zihintntl", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zihintpause", ISA_SPEC_CLASS_NONE, 2, 0},
-
-  {"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zic64b",   ISA_SPEC_CLASS_NONE, 1, 0},
-  {"ziccamoa", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"ziccif",   ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zicclsm",  ISA_SPEC_CLASS_NONE, 1, 0},
-  {"ziccrse",  ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zicfiss", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zicfilp", ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zimop", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zcmop", ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zicntr", ISA_SPEC_CLASS_NONE, 2, 0},
-  {"zihpm",  ISA_SPEC_CLASS_NONE, 2, 0},
-
-  {"zk",ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zkn",   ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zks",   ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"ztso",  ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zve32x", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zve32f", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zve64x", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zve64f", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zve64d", ISA_SPEC_CLASS_NONE, 1, 0},
-
-  {"zvbb", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvbc", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvkb", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvkg", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvkned", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvknha", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvknhb", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvksed", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvksh", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvkn", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvknc", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvkng", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvks", ISA_SPEC_CLASS_NONE, 1, 0},
-  {"zvksc", ISA_SPEC_CLASS_NONE, 1, 0},

[PATCH] testsuite: Limit option '-mgeneral-regs-only' backends in pr119160.

2025-05-08 Thread Jiawei

Limit option '-mgeneral-regs-only' to those in supported backends.

Version log:
https://patchwork.sourceware.org/project/gcc/patch/20250508080102.1340059-1-jia...@iscas.ac.cn/

gcc/testsuite/ChangeLog:

* gcc.dg/pr119160.c: Limit backends.

---
 gcc/testsuite/gcc.dg/pr119160.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr119160.c b/gcc/testsuite/gcc.dg/pr119160.c
index b4629a11d9..5743b3b760 100644
--- a/gcc/testsuite/gcc.dg/pr119160.c
+++ b/gcc/testsuite/gcc.dg/pr119160.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -finstrument-functions-once -favoid-store-forwarding -fnon-call-exceptions -fschedule-insns -mgeneral-regs-only -Wno-psabi" } */
+/* { dg-options "-O2 -finstrument-functions-once -favoid-store-forwarding -fnon-call-exceptions -fschedule-insns -Wno-psabi" } */
+/* { dg-additional-options "-mgeneral-regs-only" { target { x86_64-*-* i?86-*-* arm*-*-* aarch64*-*-* } } } */
 
 typedef __attribute__((__vector_size__ (32))) int V;
 
-- 
2.43.0

Re: [PATCH] testsuite: Limit option '-mgeneral-regs-only' backends in pr119160.

2025-05-08 Thread Richard Biener

On Thu, May 8, 2025 at 11:04 AM Jiawei  wrote:
>
> Limit option '-mgeneral-regs-only' to those in supported backends.
>
> Version log:
> https://patchwork.sourceware.org/project/gcc/patch/20250508080102.1340059-1-jia...@iscas.ac.cn/

OK.

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr119160.c: Limit backends.
>
> ---
>  gcc/testsuite/gcc.dg/pr119160.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/pr119160.c b/gcc/testsuite/gcc.dg/pr119160.c
> index b4629a11d9..5743b3b760 100644
> --- a/gcc/testsuite/gcc.dg/pr119160.c
> +++ b/gcc/testsuite/gcc.dg/pr119160.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
> -/* { dg-options "-O2 -finstrument-functions-once -favoid-store-forwarding -fnon-call-exceptions -fschedule-insns -mgeneral-regs-only -Wno-psabi" } */
> +/* { dg-options "-O2 -finstrument-functions-once -favoid-store-forwarding -fnon-call-exceptions -fschedule-insns -Wno-psabi" } */
> +/* { dg-additional-options "-mgeneral-regs-only" { target { x86_64-*-* i?86-*-* arm*-*-* aarch64*-*-* } } } */
>
>  typedef __attribute__((__vector_size__ (32))) int V;
>
> --
> 2.43.0
>

Re: [PATCH 00/13] arm: Remove iWMMXT code generation

2025-05-08 Thread Kyrylo Tkachov

Hi Richard,

> On 7 May 2025, at 18:15, Richard Earnshaw  wrote:
> 
> 
> The header file for the Arm implementation of mmintrin.h was changed in GCC-15
> to disable access to the intrinsics.  This patch removes the internal code
> as well.
> 
> We still allow -mcpu/-march options for the wmmx cpus, but they are now treated
> in exactly the same way as XScale - generating code for an Armv5te architecture.
> 

Great to see this cleanup.

> Richard Earnshaw (13):
>  arm: clarify the logic of SECONDARY_(INPUT/OUTPUT)_RELOAD_CLASS
>  arm: testsuite: remove iwmmxt tests
>  arm: treat -mcpu/arch=iwmmxt{,2} like XScale
>  arm: remove iWMMX builtins support.
>  arm: Remove iwmmxt patterns.
>  arm: remove IWMMXT checks from MD files.
>  arm: remove support for the iwmmxt ABI variant.
>  arm: Remove iwmmxt support from arm.cc
>  arm: remove iwmmxt-related attributes from machine description
>  arm: cleanup iterators.md after removing iwmmxt
>  arm: remove dead predefines when using WMMX
>  arm: remove most remaining iwmmxt code.
>  arm: remove iwmmxt registers from allocator tables

There are a few references to iWMMXT remaining in doc/, referring to builtins and
constraints, that need to be cleaned up.
Thanks,
Kyrill

> 
> gcc/config.gcc |2 +-
> gcc/config/arm/aout.h  |5 -
> gcc/config/arm/arm-builtins.cc | 1276 +
> gcc/config/arm/arm-c.cc|7 -
> gcc/config/arm/arm-cpus.in |   28 +-
> gcc/config/arm/arm-generic.md  |4 +-
> gcc/config/arm/arm-opts.h  |1 -
> gcc/config/arm/arm-protos.h|8 -
> gcc/config/arm/arm-tables.opt  |6 -
> gcc/config/arm/arm-tune.md |   53 +-
> gcc/config/arm/arm.cc  |  401 +-
> gcc/config/arm/arm.h   |  169 +--
> gcc/config/arm/arm.md  |   43 +-
> gcc/config/arm/arm.opt |3 -
> gcc/config/arm/constraints.md  |   18 +-
> gcc/config/arm/iterators.md|   20 +-
> gcc/config/arm/iwmmxt.md   | 1766 
> gcc/config/arm/iwmmxt2.md  |  903 
> gcc/config/arm/marvell-f-iwmmxt.md |  189 ---
> gcc/config/arm/predicates.md   |8 +-
> gcc/config/arm/t-arm   |3 -
> gcc/config/arm/thumb2.md   |2 +-
> gcc/config/arm/types.md|  123 --
> gcc/config/arm/unspecs.md  |   29 -
> gcc/config/arm/vec-common.md   |   31 +-
> gcc/doc/invoke.texi|2 +-
> gcc/doc/sourcebuild.texi   |4 -
> gcc/testsuite/gcc.target/arm/ivopts.c  |3 +-
> gcc/testsuite/gcc.target/arm/mmx-1.c   |   26 -
> gcc/testsuite/gcc.target/arm/mmx-2.c   |  166 ---
> gcc/testsuite/gcc.target/arm/pr64208.c |   25 -
> gcc/testsuite/gcc.target/arm/pr79145.c |   16 -
> gcc/testsuite/gcc.target/arm/pr99724.c |   31 -
> gcc/testsuite/gcc.target/arm/pr99786.c |   30 -
> gcc/testsuite/lib/target-supports.exp  |   13 -
> 35 files changed, 141 insertions(+), 5273 deletions(-)
> delete mode 100644 gcc/config/arm/iwmmxt.md
> delete mode 100644 gcc/config/arm/iwmmxt2.md
> delete mode 100644 gcc/config/arm/marvell-f-iwmmxt.md
> delete mode 100644 gcc/testsuite/gcc.target/arm/mmx-1.c
> delete mode 100644 gcc/testsuite/gcc.target/arm/mmx-2.c
> delete mode 100644 gcc/testsuite/gcc.target/arm/pr64208.c
> delete mode 100644 gcc/testsuite/gcc.target/arm/pr79145.c
> delete mode 100644 gcc/testsuite/gcc.target/arm/pr99724.c
> delete mode 100644 gcc/testsuite/gcc.target/arm/pr99786.c
> 
> -- 
> 2.43.0
>

[PATCH 1/2] aarch64: Fix up commutative and early-clobber markers on compact insns

2025-05-08 Thread Richard Earnshaw

For constraints there are operand modifiers and constraint qualifiers.
Operand modifiers apply to all alternatives and must appear, in
traditional syntax, before the first alternative.  Constraint
qualifiers, on the other hand, must appear in each alternative to which
they apply.

There's no easy way to validate the distinction in the traditional md
format, but when using the new compact format we can enforce some
semantic checking of these characters to avoid some potentially
surprising code generation.

Fortunately, all of these errors are benign, but the two misplaced
early-clobber markers were quite suspicious at first sight - it's only
by luck that the second alternative does not need an early-clobber.

The syntax checking will be added in the following patch, but first of
all, fix up the errors in the aarch64 machine description files.

gcc/
* config/aarch64/aarch64-sve.md (@aarch64_pred_): Move
commutative marker to the cons specification.
(add3): Likewise.
(@aarch64_pred_abd): Likewise.
(@aarch64_pred_): Likewise.
(*cond__z): Likewise.
(3): Likewise.
(@aarch64_pred_): Likewise.
(*aarch64_pred_abd_relaxed): Likewise.
(*aarch64_pred_abd_strict): Likewise.
(@aarch64_pred_): Likewise.
(@aarch64_pred_): Likewise.
(@aarch64_pred_fma): Likewise.
(@aarch64_pred_fnma): Likewise.
(@aarch64_pred_): Likewise.

* config/aarch64/aarch64-sve2.md (@aarch64_sve_clamp): Move
commutative marker to the cons specification.
(*aarch64_sve_clamp_x): Likewise.
(@aarch64_sve_fclamp): Likewise.
(*aarch64_sve_fclamp_x): Likewise.
(*aarch64_sve2_nor): Likewise.
(*aarch64_sve2_nand): Likewise.
(*aarch64_pred_faminmax_fused): Likewise.

* config/aarch64/aarch64.md (*loadwb_pre_pair_): Move the
early-clobber marker to the relevant alternative.
(*storewb_pre_pair_): Likewise.
(*add3_aarch64): Move commutative marker to the cons
specification.
(*addsi3_aarch64_uxtw): Likewise.
(*add3_poly_1): Likewise.
(add3_compare0): Likewise.
(*addsi3_compare0_uxtw): Likewise.
(*add3nr_compare0): Likewise.
(3): Likewise.
(*si3_uxtw): Likewise.
(*and3_compare0): Likewise.
(*andsi3_compare0_uxtw): Likewise.
(@aarch64_and3nr_compare0): Likewise.
---
 gcc/config/aarch64/aarch64-sve.md  |  56 
 gcc/config/aarch64/aarch64-sve2.md |  28 
 gcc/config/aarch64/aarch64.md  | 102 ++---
 3 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index f39af6e24d5..bf0e57df62d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3984,8 +3984,8 @@ (define_insn_and_split "@aarch64_pred_"
 (match_operand:SVE_I_SIMD_DI 3 "aarch64_sve__operand"))]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
-  {@ [ cons: =0 , 1   , 2  , 3 ; attrs: movprfx ]
- [ w, Upl , %0 ,  ; *  ] #
+  {@ [ cons: =0 , 1   , %2 , 3 ; attrs: movprfx ]
+ [ w, Upl , 0  ,  ; *  ] #
  [ w, Upl , 0  , w ; *  ] \t%Z0., %1/m, %Z0., %Z3.
  [ ?&w  , Upl , w  ,  ; yes] #
  [ ?&w  , Upl , w  , w ; yes] movprfx\t%Z0, %Z2\;\t%Z0., %1/m, %Z0., %Z3.
@@ -4114,8 +4114,8 @@ (define_insn "add3"
  (match_operand:SVE_I 1 "register_operand")
  (match_operand:SVE_I 2 "aarch64_sve_add_operand")))]
   "TARGET_SVE"
-  {@ [ cons: =0 , 1  , 2   ; attrs: movprfx ]
- [ w, %0 , vsa ; *  ] add\t%0., %0., #%D2
+  {@ [ cons: =0 , %1 , 2   ; attrs: movprfx ]
+ [ w, 0  , vsa ; *  ] add\t%0., %0., #%D2
  [ w, 0  , vsn ; *  ] sub\t%0., %0., #%N2
  [ w, 0  , vsi ; *  ] << aarch64_output_sve_vector_inc_dec ("%0.", operands[2]);
  [ ?w   , w  , vsa ; yes] movprfx\t%0, %1\;add\t%0., %0., #%D2
@@ -4333,8 +4333,8 @@ (define_insn "@aarch64_pred_abd"
   (match_dup 3))]
UNSPEC_PRED_X)))]
   "TARGET_SVE"
-  {@ [ cons: =0 , 1   , 2  , 3 ; attrs: movprfx ]
- [ w, Upl , %0 , w ; *  ] abd\t%0., %1/m, %0., %3.
+  {@ [ cons: =0 , 1   , %2 , 3 ; attrs: movprfx ]
+ [ w, Upl , 0  , w ; *  ] abd\t%0., %1/m, %0., %3.
  [ ?&w  , Upl , w  , w ; yes] movprfx\t%0, %2\;abd\t%0., %1/m, %0., %3.
   }
 )
@@ -4548,8 +4548,8 @@ (define_insn "@aarch64_pred_"
 MUL_HIGHPART)]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
-  {@ [ cons: =0 , 1   , 2  , 3 ; attrs: movprfx ]
- [ w, Upl , %0 , w ; *  ] mulh\t%0., %1/m, %0., %3.
+  {@ [ cons: =0 , 1   , %2 , 3 ; attrs: movprfx ]
+ [ w, Upl , 0  ,

[PATCH 2/2] gensupport: validate compact constraint modifiers

2025-05-08 Thread Richard Earnshaw

For constraints there are operand modifiers and constraint qualifiers.
Operand modifiers apply to all alternatives and must appear, in
traditional syntax, before the first alternative.  Constraint
qualifiers, on the other hand, must appear in each alternative to which
they apply.

There's no easy way to validate the distinction in the traditional md
format, but when using the new compact format we can enforce some
semantic checking of these characters to avoid some potentially
surprising code generation.

gcc/

* gensupport.cc (conlist::conlist): Pass a location to the constructor.
Only allow skipping of non-alpha-numeric characters when parsing a
number and only allow '=', '+' or '%'.  Add some error checking when
parsing an operand number.
(parse_section_layout): Pass the location to the conlist constructor.
(parse_section): Allow an optional list of forbidden characters.
If specified, reject strings containing them.
(convert_syntax): Reject '=', '+' or '%' in an alternative.
---
 gcc/gensupport.cc | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 80f1976faf1..ac0132860a9 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -656,7 +656,7 @@ public:
  i.e. if rtx is the relevant match_operand or match_scratch then
  [ns..ns + len) should equal itoa (XINT (rtx, 0)), and if set_attr then
  [ns..ns + len) should equal XSTR (rtx, 0).  */
-  conlist (const char *ns, unsigned int len, bool numeric)
+  conlist (const char *ns, unsigned int len, bool numeric, file_location loc)
   {
 /* Trim leading whitespaces.  */
 while (len > 0 && ISBLANK (*ns))
@@ -670,16 +670,26 @@ public:
   if (!ISBLANK (ns[i]))
break;
 
-/* Parse off any modifiers.  */
-while (len > 0 && !ISALNUM (*ns))
-  {
-   con += *(ns++);
-   len--;
-  }
+/* Only numeric values can have modifiers.  */
+if (numeric)
+  /* Parse off any modifiers.  */
+  while (len > 0 && !ISALNUM (*ns))
+   {
+ if (*ns != '=' && *ns != '+' && *ns != '%')
+   error_at (loc, "`%c` is not a valid operand modifier", *ns);
+ con += *(ns++);
+ len--;
+   }
 
 name.assign (ns, len);
 if (numeric)
-  idx = strtol (name.c_str (), (char **)NULL, 10);
+  {
+   char *endstr;
+   /* There should only be a numeric value now... */
+   idx = strtol (name.c_str (), &endstr, 10);
+   if (*endstr != '\0')
+ error_at (loc, "operand number expected, found %s", name.c_str ());
+  }
   }
 
   /* Adds a character to the end of the string.  */
@@ -832,7 +842,7 @@ parse_section_layout (file_location loc, const char **templ, const char *label,
  *templ += len;
  if (val == ',')
(*templ)++;
- list.push_back (conlist (name_start, len, numeric));
+ list.push_back (conlist (name_start, len, numeric, loc));
}
 }
 }
@@ -845,7 +855,8 @@ parse_section_layout (file_location loc, const char **templ, const char *label,
 
 static void
 parse_section (const char **templ, unsigned int n_elems, unsigned int alt_no,
-  vec_conlist &list, file_location loc, const char *name)
+  vec_conlist &list, file_location loc, const char *name,
+  const char *invalid_chars = NULL)
 {
   unsigned int i;
 
@@ -856,6 +867,10 @@ parse_section (const char **templ, unsigned int n_elems, unsigned int alt_no,
   {
if (**templ == 0 || **templ == '\n')
  fatal_at (loc, "missing ']'");
+   if (invalid_chars
+   && strchr (invalid_chars, **templ))
+ error_at (loc, "'%c' is not permitted in an alternative for a %s",
+   **templ, name);
list[i].add (**templ);
if (**templ == ',')
  {
@@ -981,7 +996,7 @@ convert_syntax (rtx x, file_location loc)
  /* Parse the constraint list, then the attribute list.  */
  if (tconvec.size () > 0)
parse_section (&templ, tconvec.size (), alt_no, tconvec, loc,
-  "constraint");
+  "constraint", "=+%");
 
  if (attrvec.size () > 0)
{
-- 
2.43.0

Re: [PATCH] AArch64: Optimize SVE loads/stores with ptrue predicates to unpredicated instructions.

2025-05-08 Thread Richard Sandiford

Sorry for the slow review.

Jennifer Schmitz  writes:
> SVE loads and stores where the predicate is all-true can be optimized to
> unpredicated instructions. For example,
> svuint8_t foo (uint8_t *x)
> {
>   return svld1 (svptrue_b8 (), x);
> }
> was compiled to:
> foo:
>   ptrue   p3.b, all
>   ld1b    z0.b, p3/z, [x0]
>   ret
> but can be compiled to:
> foo:
>   ldr z0, [x0]
>   ret
>
> Late_combine2 had already been trying to do this, but was missing the
> instruction:
> (set (reg/i:VNx16QI 32 v0)
> (unspec:VNx16QI [
> (const_vector:VNx16BI repeat [
> (const_int 1 [0x1])
> ])
> (mem:VNx16QI (reg/f:DI 0 x0 [orig:106 x ] [106])
> [0 MEM  [(unsigned char *)x_2(D)]+0 S[16, 16] A8])
> ] UNSPEC_PRED_X))
>
> This patch adds a new define_insn_and_split that matches the missing
> instruction and splits it to an unpredicated load/store. Because LDR
> offers fewer addressing modes than LD1[BHWD], the pattern is
> guarded under reload_completed to only apply the transform once the
> address modes have been chosen during RA.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>   * config/aarch64/aarch64-sve.md (*aarch64_sve_ptrue_ldr_str):
>   Add define_insn_and_split to fold predicated SVE loads/stores with
>   ptrue predicates to unpredicated instructions.
>
> gcc/testsuite/
>   * gcc.target/aarch64/sve/ptrue_ldr_str.c: New test.
>   * gcc.target/aarch64/sve/cost_model_14.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/cost_model_4.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/cost_model_5.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/cost_model_6.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/cost_model_7.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_mf8.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/peel_ind_2.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/single_1.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/single_2.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/single_3.c: Adjust expected outcome.
>   * gcc.target/aarch64/sve/single_4.c: Adjust expected outcome.
> ---
>  gcc/config/aarch64/aarch64-sve.md | 17 
>  .../aarch64/sve/acle/general/attributes_6.c   |  8 +-
>  .../gcc.target/aarch64/sve/cost_model_14.c|  4 +-
>  .../gcc.target/aarch64/sve/cost_model_4.c |  3 +-
>  .../gcc.target/aarch64/sve/cost_model_5.c |  3 +-
>  .../gcc.target/aarch64/sve/cost_model_6.c |  3 +-
>  .../gcc.target/aarch64/sve/cost_model_7.c |  3 +-
>  .../aarch64/sve/pcs/varargs_2_f16.c   | 93 +--
>  .../aarch64/sve/pcs/varargs_2_f32.c   | 93 +--
>  .../aarch64/sve/pcs/varargs_2_f64.c   | 93 +--
>  .../aarch64/sve/pcs/varargs_2_mf8.c   | 32 +++
>  .../aarch64/sve/pcs/varargs_2_s16.c   | 93 +--
>  .../aarch64/sve/pcs/varargs_2_s32.c   | 93 +--
>  .../aarch64/sve/pcs/varargs_2_s64.c   | 93 +--
>  .../gcc.target/aarch64/sve/pcs/varargs_2_s8.c | 34 +++
>  .../aarch64/sve/pcs/varargs_2_u16.c   | 93 +--
>  .../aarch64/sve/pcs/varargs_2_u32.c   | 93 +--
>  .../aarch64/sve/pcs/varargs_2_u64.c   | 93 +--
>  .../gcc.target/aarch64/sve/pcs/varargs_2_u8.c | 32 +++
>  .../gcc.target/aarch64/sve/peel_ind_2.c   |  4 +-
>  .../gcc.target/aarch64/sve/ptrue_ldr_str.c| 31 +++
>  .../gcc.target/aarch64/sve/single_1.c | 11 ++-
>  .../gcc.target/aarch64/sve/single_2.c | 11 ++-
>  .../gcc.target/aarch64/sve/single_3.c | 11 ++-
>  .../gcc.target/aarch64/sve/single_4.c | 11 ++-
>  25 files changed, 907 insertions(+), 148 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ptrue_ldr_str.c
>
> diff --git a/gcc/confi

Re: [RFC PATCH 0/5] aarch64: Support for user-defined aarch64 tuning parameters in JSON

2025-05-08 Thread Richard Sandiford

Kyrylo Tkachov  writes:
> Hi Richard,
>
>> On 6 May 2025, at 12:34, Richard Sandiford  wrote:
>> 
>>  writes:
>>> From: Soumya AR 
>>> 
>>> Hi,
>>> 
>>> This RFC and subsequent patch series introduce support for printing and
>>> parsing of aarch64 tuning parameters in the form of JSON.
>> 
>> Thanks for doing this.  It looks really useful.  My main question is:
>> rather than write the parsing and printing routines by hand, could we
>> generate the structure definitions, the parsing code, and the printing
>> code from the schema?
>> 
>> The schema would need to provide more information about the structures
>> compared to the current one.  The approach would also presumably need
>> build/*.o versions of the json routines.  But it seems like something
>> that we might want to do elsewhere, so would be worth building a bit
>> of infrastructure around.  And it would reduce the maintenance burden
>> associated with adding a new field or changing an existing one.
>> 
>
> Thanks for your thoughts. I suspected that we may need something like that
> eventually.
> Hypothetically, in the future we'd like to be able to batch up the various
> generic --params in a JSON input file as well, to help superoptimiser tools
> with performance exploration.
> It looks like the parsing and printing code would be easy to autogenerate.
> The structure definitions in aarch64-protos.h may be trickier. As long as
> they are effectively containers of primitive data it should be okay, though
> currently some extend others (like the issue info structs).

I think we could handle that with extra json fields.  But yeah, maybe
generating the structure definitions is going too far.  As Soumya mentioned,
there's no comment syntax, so if we did auto-generate the structure
definitions, we'd either need to add the comments as strings (ick) or
put them in a big block comment somewhere.  Neither of those sounds like
an improvement over writing the definitions separately.

Richard
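One way to keep the struct definition, the printer and (eventually) the parser from drifting apart, without generating C from a JSON schema, is a single X-macro field list kept in a header, where ordinary C comments can sit next to each field. The sketch below is hypothetical: the field names, `TUNING_FIELDS` and `print_tuning_json` are illustrative only, every field is assumed to be an `int`, and nested or derived structs such as the issue info tables are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical single source of truth for a flat tuning struct.
   Comments live next to each entry, unlike in a JSON schema.  */
#define TUNING_FIELDS(X)                                           \
  X (issue_rate)        /* Instructions issued per cycle.  */      \
  X (int_reassoc_width) /* Reassociation width for integers.  */   \
  X (fp_reassoc_width)  /* Reassociation width for floats.  */

struct tuning_params
{
#define DECLARE_FIELD(NAME) int NAME;
  TUNING_FIELDS (DECLARE_FIELD)
#undef DECLARE_FIELD
};

/* Write P as a flat JSON object into BUF of size LEN.  Returns the
   length written, or -1 if snprintf fails or BUF is too small.  */
static int
print_tuning_json (const struct tuning_params *p, char *buf, size_t len)
{
  size_t used = 0;
  const char *sep = "";
  int n;

  assert (p != NULL && buf != NULL);

/* Append formatted text, bailing out on error or truncation.  */
#define APPEND(...)                                        \
  do                                                       \
    {                                                      \
      n = snprintf (buf + used, len - used, __VA_ARGS__);  \
      if (n < 0 || (size_t) n >= len - used)               \
        return -1;                                         \
      used += (size_t) n;                                  \
    }                                                      \
  while (0)

/* Emit one "name": value pair; the key is the stringized NAME.  */
#define PRINT_FIELD(NAME)                                  \
  APPEND ("%s\"%s\": %d", sep, #NAME, p->NAME);            \
  sep = ", ";

  APPEND ("{");
  TUNING_FIELDS (PRINT_FIELD)
  APPEND ("}");

#undef PRINT_FIELD
#undef APPEND
  return (int) used;
}
```

A parser could be produced from the same list, for instance by expanding it into one key comparison per field; adding a field then touches a single line.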

[PATCH][v2] tree-optimization/120043 - bogus conditional store elimination

2025-05-08 Thread Richard Biener

The following fixes conditional store elimination to properly
check for conditional stores to readonly memory, which we
obviously cannot store to unconditionally.  The tree_could_trap_p
predicate used only considers rvalues, so readonly memory is now
checked explicitly; the approach mimics that of loop store motion.
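At the source level the transformation has roughly the shape below. The pass works on GIMPLE, so this C rendering is only an approximation, and both function names are illustrative. The rewritten form stores to `*p` on every path, which is why it requires the target to be writable and, as the `-fallow-store-data-races` option in the testcase suggests, also introduces a store that another thread could observe.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: *P is written only when COND holds.  */
static void
store_if (int *p, bool cond, int value)
{
  if (cond)
    *p = value;
}

/* After (sketch): load the old value on the "else" path and store
   unconditionally.  Equivalent for writable memory, but if P points to
   a const object placed in read-only memory, this store faults even
   when COND is false.  */
static void
store_always (int *p, bool cond, int value)
{
  int tmp = cond ? value : *p;
  *p = tmp;
}
```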

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

The previous idea of changing tree_could_trap_p had too much fallout.

Richard.

PR tree-optimization/120043
* tree-ssa-phiopt.cc (cond_store_replacement): Check
whether the store is to readonly memory.

* gcc.dg/torture/pr120043.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr120043.c | 10 ++
 gcc/tree-ssa-phiopt.cc  |  8 +++-
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr120043.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr120043.c b/gcc/testsuite/gcc.dg/torture/pr120043.c
new file mode 100644
index 000..ae27468d86d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr120043.c
@@ -0,0 +1,10 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fallow-store-data-races" } */
+
+const int a;
+int *b;
+int main()
+{
+  &a != b || (*b = 1);
+  return 0;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 54ecd93495a..4e96cf33a52 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -3546,8 +3546,14 @@ cond_store_replacement (basic_block middle_bb, basic_block join_bb,
   /* If LHS is an access to a local variable without address-taken
 (or when we allow data races) and known not to trap, we could
 always safely move down the store.  */
+  tree base;
   if (ref_can_have_store_data_races (lhs)
- || tree_could_trap_p (lhs))
+ || tree_could_trap_p (lhs)
+ /* tree_could_trap_p is a predicate for rvalues, so check
+for readonly memory explicitly.  */
+ || ((base = get_base_address (lhs))
+ && DECL_P (base)
+ && TREE_READONLY (base)))
return false;
 }
 
-- 
2.43.0

Re: [RFC PATCH 0/2] Add target_clones profile option support

2025-05-08 Thread Richard Sandiford

Yangyu Chen  writes:
>> On 6 May 2025, at 17:49, Alfie Richards  wrote:
>> 
>> On 06/05/2025 09:36, Yangyu Chen wrote:
 On 6 May 2025, at 16:01, Alfie Richards  wrote:
 
 Hello,
 
 I like this idea. I have a couple thoughts to add.
 
 On 05/05/2025 09:46, Yangyu Chen wrote:
>> On 5 May 2025, at 16:34, Kyrylo Tkachov  wrote:
>> 
>>> On 4 May 2025, at 19:19, Yangyu Chen  wrote:
>>> 
>>> Hi everyone,
>>> 
>>> This patch series introduces support for the target_clones profile
>>> option in GCC. This option enables users to specify target_clones
>>> attributes in a separate file, allowing GCC to generate multiple
>>> versions of the function with different ISA extensions based on the
>>> specified profile. This is achieved using the -ftarget-profile
>>> option.
>> 
>> Interesting idea, but the terminology is confusing as is.
>> In GCC “profile” usually refers to an execution profile gathered through 
>> PGO instrumentation or perf.
>> Whereas here I think you use “profile” to mean a “RISC-V profile” which 
>> is something like a set of target architecture extensions?
>> Thanks,
>> Kyrill
>> 
> Sorry for the unclear information. The target profile here refers
> to the target_clones attribute for each function. You can find an
> example in the second patch.
 
 I was also confused by the naming initially. Maybe something like
 "-ffunction-clone-table" instead?
>>> Indeed. I will change that in the next revision.
 
> For instance, we want a function foo to generate default and RISC-V
> vector targets, while a function bar should generate default and
> zba,zbb targets. The corresponding source code could be as follows:
> ```
> __attribute__((target_clones("default","arch=+v")))
> void foo();
> __attribute__((target_clones("default","arch=+zba,+zbb")))
> void bar();
> ```
> But if we have a target profile, we can describe it as follows in
> a separate file:
> ```
> foo:default#arch=+v
> bar:default#arch=+zba,+zbb
> ```
 
 As every function needs the default version it might be nice to make that 
 implicit. We can then avoid representing and subsequently diagnosing 
 invalid files. This could then be:
 
 ```
 foo:arch=+v
 bar:arch=+zba,+zbb#+v
 ```
>>> Great idea! This will make the table simpler.
 
 Additionally, I think ideally the file could express functions disambiguated 
 by file, signature, and namespace.
 I imagine we could use syntax similar to what gdb supports?
 
 For example:
 
 ```
 foo  |arch=+v
 bar(int, char)   |arch=+zba,+zbb
 file.C:baz(char) |arch=+zba,+zbb#arch=+v
 namespace::qux   |arch=+v
 ```
>>> Also a great idea. However, I don't think it's easy to implement
>>> in GCC right now. But I would welcome further feedback if GCC already
>>> has a simple API for this, or if the community implements one.
>>> The motivation behind this idea is that I'm researching auto-generating
>>> target_clones attributes for developers, and accepting only the
>>> assembler name is enough to implement that.
>> 
>> Ah that makes sense, apologies I missed that.
>> 
>> I think accepting the assembler name is good, and solves the overloading 
>> ambiguity issue.
>> 
>> Maybe we can use the pipe '|' instead of ':' in the file format to leave 
>> room for both in future?
>
>
> I will consider using the pipe '|' in the next revision. Thanks for
> the advice.

How about instead using a json file?  There's already a parser built into
the compiler.

That has the advantage of being an established format that generators
can use.  It would also allow other ways of specifying the functions
to be added in future.

Thanks,
Richard
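To make the line format discussed in this thread concrete, here is a minimal parser sketch for the implicit-default variant, where each line is NAME, a separator, then one or more versions joined by '#'. It is hypothetical: `struct clone_entry`, `parse_clone_line` and the fixed `MAX_VERSIONS` limit are illustrative, the separator is a parameter so that either ':' or the suggested '|' works, and the input is assumed to carry no trailing newline. A JSON-based format, as suggested above, would replace this with the JSON parser already in the compiler.

```c
#include <assert.h>
#include <string.h>

#define MAX_VERSIONS 8

struct clone_entry
{
  char *name;                    /* Assembler name of the function.  */
  char *versions[MAX_VERSIONS];  /* Non-default versions, in order.  */
  int n_versions;
};

/* Parse LINE in place: NAME, then SEP, then versions separated by '#'.
   The default version is implicit.  Returns 0 on success, -1 on
   malformed input (LINE may be partially modified in that case).  */
static int
parse_clone_line (char *line, char sep, struct clone_entry *out)
{
  char *split, *cur;

  assert (line != NULL && out != NULL);
  split = strchr (line, sep);
  if (!split || split == line)
    return -1;                   /* Missing separator or empty name.  */
  *split = '\0';
  out->name = line;
  out->n_versions = 0;

  cur = split + 1;
  for (;;)
    {
      char *hash = strchr (cur, '#');
      if (hash)
        *hash = '\0';
      if (*cur == '\0' || out->n_versions == MAX_VERSIONS)
        return -1;               /* Empty version or too many versions.  */
      out->versions[out->n_versions++] = cur;
      if (!hash)
        break;
      cur = hash + 1;
    }
  return 0;
}
```

Commas inside a version such as `arch=+zba,+zbb` are left intact, since only the separator and '#' are significant.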

[PATCH] tree-optimization/116352 - amend previous fix

2025-05-08 Thread Richard Biener

The previous fix restricted external vector builds to defs from
the same basic-block.  That turned out to be too restrictive, so we
mitigate the original issue differently: compressing operands of a
two-operator node is now restricted to the original case, where all
defs are in the same basic-block.
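The new restriction amounts to the check below, modeled outside GCC: remember the block of the first lane's def and reject the group as soon as another lane's def lives elsewhere. The integer block ids and `lanes_share_one_block` are hypothetical stand-ins for `gimple_bb (vect_orig_stmt (stmt_info)->stmt)` in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* BB_OF_LANE[j] is a (hypothetical) id of the basic block holding the
   definition that feeds lane J.  */
static bool
lanes_share_one_block (const int *bb_of_lane, size_t n_lanes)
{
  assert (n_lanes > 0);
  int bb = bb_of_lane[0];          /* Block of the first lane's def...  */
  for (size_t j = 1; j < n_lanes; ++j)
    if (bb_of_lane[j] != bb)       /* ...any lane defined elsewhere fails.  */
      return false;
  return true;
}
```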

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116352
* tree-vect-slp.cc (vect_build_slp_tree_2): When compressing
operands from a two-operator node make sure the resulting
operation does not mix defs from different basic-blocks.
---
 gcc/tree-vect-slp.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 6d5824a97bf..f7c51b6cf68 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2619,13 +2619,14 @@ out:
   if (oprnds_info[0]->def_stmts[0]
  && is_a (oprnds_info[0]->def_stmts[0]->stmt))
code = gimple_assign_rhs_code (oprnds_info[0]->def_stmts[0]->stmt);
+  basic_block bb = nullptr;
 
   for (unsigned j = 0; j < group_size; ++j)
{
  FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
{
  stmt_vec_info stmt_info = oprnd_info->def_stmts[j];
- if (!stmt_info || !stmt_info->stmt
+ if (!stmt_info
  || !is_a (stmt_info->stmt)
  || gimple_assign_rhs_code (stmt_info->stmt) != code
  || skip_args[i])
@@ -2633,6 +2634,14 @@ out:
  success = false;
  break;
}
+ /* Avoid mixing lanes with defs in different basic-blocks.  */
+ if (!bb)
+   bb = gimple_bb (vect_orig_stmt (stmt_info)->stmt);
+ else if (gimple_bb (vect_orig_stmt (stmt_info)->stmt) != bb)
+   {
+ success = false;
+ break;
+   }
 
  bool exists;
  unsigned &stmt_idx
-- 
2.43.0

Re: [PATCH v1 0/5] Add testcases for another case of vec_duplicate + vadd.vv combine

2025-05-08 Thread Robin Dapp


This patch series adds the testcases for this.  However,
some test results are not that tidy, and we need more tuning for
the vector cost model.


The test adjustments LGTM, but what do you mean by not tidy?  I see you're
scanning just for the presence of "vx" instead of an exact number, so is
it just a vector cost model issue where some loops are not profitable
to vectorize?

--
Regards
Robin

[PATCH v3] Consider frequency in cost estimation when converting scalar to vector.

2025-05-08 Thread liuhongt

The only part I changed is the size_cost of sse_to_integer, shown below:

+  /* Under TARGET_SSE4_1, it's vmovd + vpextrd/vpinsrd.
+     W/o it, it's movd + psrlq/unpckldq + movd.  */
+  else if (!TARGET_64BIT && smode != SImode)
+    cost *= TARGET_SSE4_1 ? 2 : 3;
+
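Read as a standalone rule, the size-cost adjustment multiplies the per-move cost when a 32-bit target moves a non-SImode value between SSE and integer registers. In the sketch below the boolean parameters are hypothetical stand-ins for `TARGET_64BIT`, `smode == SImode` and `TARGET_SSE4_1`; it models only the size (`!speed_p`) path, since on the speed path the cost is weighted by block frequency instead.

```c
#include <assert.h>
#include <stdbool.h>

/* BASE is the size cost of a single SSE<->integer move.  */
static int
scaled_size_cost (int base, bool is_64bit, bool is_simode, bool has_sse4_1)
{
  /* 32-bit, non-SImode: vmovd + vpextrd/vpinsrd (2 insns) with SSE4.1,
     movd + psrlq/unpckldq + movd (3 insns) without.  */
  if (!is_64bit && !is_simode)
    return base * (has_sse4_1 ? 2 : 3);
  return base;
}
```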

Ok for trunk?


In some benchmarks, I noticed that STV failed because the conversion was
considered unprofitable: the igain is inside the loop, but the
sse<->integer conversion is outside the loop, and the current cost model
doesn't consider the frequency of those gains/costs.
The patch weights those costs with block frequency.
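A toy model of why the weighting matters: a gain of 2 inside a loop that runs 100 times per function entry outweighs a one-off conversion cost of 10, even though the unweighted comparison (2 vs. 10) says otherwise. The sketch uses `double` instead of GCC's `sreal`, and `conversion_profitable` with its strict "gain > cost" rule is an assumption for illustration, not the patch's exact logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One gain or cost item, tagged with the profile count of its block.  */
struct weighted_item
{
  double bb_count;  /* Execution count of the containing block.  */
  int value;        /* Unweighted gain or conversion cost.  */
};

/* Sum VALUE scaled by the block's frequency relative to function entry.  */
static double
weighted_sum (const struct weighted_item *items, size_t n, double entry_count)
{
  double sum = 0.0;
  assert (entry_count > 0.0);
  for (size_t i = 0; i < n; ++i)
    sum += items[i].value * (items[i].bb_count / entry_count);
  return sum;
}

static bool
conversion_profitable (const struct weighted_item *gains, size_t n_gains,
                       const struct weighted_item *costs, size_t n_costs,
                       double entry_count)
{
  return weighted_sum (gains, n_gains, entry_count)
         > weighted_sum (costs, n_costs, entry_count);
}
```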

The patch regressed gcc.target/i386/minmax-6.c under -m32.  Since the
integer<->sse conversion is placed before the branch while the conversion
to min/max is inside the branch, with the static profile the cost model
no longer finds it profitable, which is exactly what the patch intends.
Since the original testcase guards against an RA issue, restricting it
to ! ia32 should still be OK.

gcc/ChangeLog:

* config/i386/i386-features.cc
(scalar_chain::mark_dual_mode_def): Weight
n_integer_to_sse/n_sse_to_integer with bb frequency.
(general_scalar_chain::compute_convert_gain): Ditto, and
adjust function prototype to return true/false depending on
whether the cost model finds the conversion profitable.
(timode_scalar_chain::compute_convert_gain): Ditto.
(convert_scalars_to_vector): Adjust after the prototypes of the
two functions above changed.
* config/i386/i386-features.h (class scalar_chain): Change
n_integer_to_sse/n_sse_to_integer to cost_sse_integer, and add
weighted_cost_sse_integer.
(class general_scalar_chain): Adjust prototype to return bool
instead of int.
(class timode_scalar_chain): Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/i386/minmax-6.c: Adjust testcase.
---
 gcc/config/i386/i386-features.cc | 216 ---
 gcc/config/i386/i386-features.h  |  11 +-
 gcc/testsuite/gcc.target/i386/minmax-6.c |   2 +-
 3 files changed, 124 insertions(+), 105 deletions(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index c35ac24fd8a..5f21130db58 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -296,8 +296,8 @@ scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
   insns_conv = BITMAP_ALLOC (NULL);
   queue = NULL;
 
-  n_sse_to_integer = 0;
-  n_integer_to_sse = 0;
+  cost_sse_integer = 0;
+  weighted_cost_sse_integer = 0 ;
 
   max_visits = x86_stv_max_visits;
 }
@@ -337,20 +337,40 @@ scalar_chain::mark_dual_mode_def (df_ref def)
   /* Record the def/insn pair so we can later efficiently iterate over
  the defs to convert on insns not in the chain.  */
   bool reg_new = bitmap_set_bit (defs_conv, DF_REF_REGNO (def));
+  basic_block bb = BLOCK_FOR_INSN (DF_REF_INSN (def));
+  profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
+  bool speed_p = optimize_bb_for_speed_p (bb);
+  sreal bb_freq = bb->count.to_sreal_scale (entry_count);
+  int cost = 0;
+
   if (!bitmap_bit_p (insns, DF_REF_INSN_UID (def)))
 {
   if (!bitmap_set_bit (insns_conv, DF_REF_INSN_UID (def))
  && !reg_new)
return;
-  n_integer_to_sse++;
+
+  /* ???  integer_to_sse but we only have that in the RA cost table.
+Assume sse_to_integer/integer_to_sse are the same which they.  */
+  cost = speed_p ? ix86_cost->sse_to_integer
+   : ix86_size_cost.sse_to_integer;
 }
   else
 {
   if (!reg_new)
return;
-  n_sse_to_integer++;
+  cost = speed_p ? ix86_cost->sse_to_integer
+   : ix86_size_cost.sse_to_integer;
 }
 
+  if (speed_p)
+weighted_cost_sse_integer += bb_freq * cost;
+  /* Under TARGET_SSE4_1, it's vmovd + vpextrd/vpinsrd.
+ W/o it, it's movd + psrlq/unpckldq + movd.  */
+  else if (!TARGET_64BIT && smode != SImode)
+cost *= TARGET_SSE4_1 ? 2 : 3;
+
+  cost_sse_integer += cost;
+
   if (dump_file)
 fprintf (dump_file,
 "  Mark r%d def in insn %d as requiring both modes in chain #%d\n",
@@ -529,15 +549,15 @@ general_scalar_chain::vector_const_cost (rtx exp)
   return ix86_cost->sse_load[smode == DImode ? 1 : 0];
 }
 
-/* Compute a gain for chain conversion.  */
+/* Return true if it's cost profitable for chain conversion.  */
 
-int
+bool
 general_scalar_chain::compute_convert_gain ()
 {
   bitmap_iterator bi;
   unsigned insn_uid;
   int gain = 0;
-  int cost = 0;
+  sreal weighted_gain = 0;
 
   if (dump_file)
 fprintf (dump_file, "Computing gain for chain #%d...\n", chain_id);
@@ -556,25 +576,30 @@ general_scalar_chain::compute_convert_gain ()
   rtx src = SET_SRC (def_set);
   rtx dst = SET_DEST (def_set);
   int igain = 0;
+  basic_block bb = BLOCK_FOR_INSN (insn);
+  profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
+  bool speed_p = optimize_bb_for_speed_p (bb);
+  sreal bb_freq = bb->count.to_srea

Re: [PATCH v3 1/2] Rewrite VCEs of integral types [PR116939]

2025-05-08 Thread Richard Biener

On Thu, May 8, 2025 at 6:09 AM Andrew Pinski  wrote:
>
> Like the patch to phiopt (r15-4033-g1f619fe25925a5f7), this adds rewriting
> of VCE to gimple_with_undefined_signed_overflow/rewrite_to_defined_overflow.
> In the case of moving VCE of a bool from being conditional to unconditional,
> it needs to be rewritten to not use a VCE but a normal cast. pr120122-1.c is
> an example of where LIM needs this rewriting. The precision of the outer type
> needs to be less than the inner one.
>
> This also renames gimple_with_undefined_signed_overflow to
> gimple_needing_rewrite_undefined and rewrite_to_defined_overflow to
> rewrite_to_defined_unconditional, as they will be doing more than just
> handling signed overflow.
>
> Changes since v1:
> * v2: rename the functions.
> * v3: Add check for precision to be smaller.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.
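The new condition in gimple_needing_rewrite_undefined can be modeled in isolation: only a narrowing VIEW_CONVERT_EXPR between integral types is flagged. The second helper illustrates why a cast is the safe replacement, assuming that a narrowing integer conversion keeps the low PREC bits: a reinterpreted value can fall outside the range of the narrow type (for example 2 viewed as a 1-bit-precision bool), whereas the truncated value cannot. Both helpers are hypothetical and operate on plain numbers rather than trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag a VCE for rewriting only when both types are integral and the
   result is narrower than the operand.  */
static bool
vce_needs_cast (bool lhs_integral, unsigned lhs_prec,
                bool rhs_integral, unsigned rhs_prec)
{
  return lhs_integral && rhs_integral && lhs_prec < rhs_prec;
}

/* Assumed effect of the replacement cast: keep the low PREC bits, so the
   result is always within the range of the narrower type.  */
static uint64_t
truncate_to_precision (uint64_t value, unsigned prec)
{
  assert (prec > 0 && prec <= 64);
  return prec == 64 ? value : value & ((UINT64_C (1) << prec) - 1);
}
```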

> PR tree-optimization/120122
> PR tree-optimization/116939
>
> gcc/ChangeLog:
>
> * gimple-fold.h (gimple_with_undefined_signed_overflow): Rename to ...
> (gimple_needing_rewrite_undefined): This.
> (rewrite_to_defined_overflow): Rename to ...
> (rewrite_to_defined_unconditional): This.
> * gimple-fold.cc (gimple_with_undefined_signed_overflow): Rename to ...
> (gimple_needing_rewrite_undefined): This. Return true for VCE with
> integral types of smaller precision.
> (rewrite_to_defined_overflow): Rename to ...
> (rewrite_to_defined_unconditional): This. Handle VCE rewriting to a cast.
> * tree-if-conv.cc:
> s/gimple_with_undefined_signed_overflow/gimple_needing_rewrite_undefined/
> s/rewrite_to_defined_overflow/rewrite_to_defined_unconditional/.
> * tree-scalar-evolution.cc: Likewise
> * tree-ssa-ifcombine.cc: Likewise.
> * tree-ssa-loop-im.cc: Likewise.
> * tree-ssa-loop-split.cc: Likewise.
> * tree-ssa-reassoc.cc: Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr120122-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-fold.cc| 56 ++-
>  gcc/gimple-fold.h |  6 +--
>  gcc/testsuite/gcc.dg/torture/pr120122-1.c | 51 +
>  gcc/tree-if-conv.cc   |  6 +--
>  gcc/tree-scalar-evolution.cc  |  4 +-
>  gcc/tree-ssa-ifcombine.cc |  4 +-
>  gcc/tree-ssa-loop-im.cc   |  4 +-
>  gcc/tree-ssa-loop-split.cc|  4 +-
>  gcc/tree-ssa-reassoc.cc   |  4 +-
>  9 files changed, 111 insertions(+), 28 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr120122-1.c
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 7721795b20d..fd52b58905c 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -10592,10 +10592,12 @@ arith_code_with_undefined_signed_overflow (tree_code code)
>
>  /* Return true if STMT has an operation that operates on a signed
> integer types involves undefined behavior on overflow and the
> -   operation can be expressed with unsigned arithmetic.  */
> +   operation can be expressed with unsigned arithmetic.
> +   Also returns true if STMT is a VCE that needs to be rewritten
> +   if moved to be executed unconditionally.   */
>
>  bool
> -gimple_with_undefined_signed_overflow (gimple *stmt)
> +gimple_needing_rewrite_undefined (gimple *stmt)
>  {
>if (!is_gimple_assign (stmt))
>  return false;
> @@ -10606,6 +10608,16 @@ gimple_with_undefined_signed_overflow (gimple *stmt)
>if (!INTEGRAL_TYPE_P (lhs_type)
>&& !POINTER_TYPE_P (lhs_type))
>  return false;
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  /* VCE from integral types to a integral types but with
> + a smaller precision need to be changed into casts
> + to be well defined. */
> +  if (gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR
> +  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (rhs, 0)))
> +  && is_gimple_val (TREE_OPERAND (rhs, 0))
> +  && TYPE_PRECISION (lhs_type)
> + < TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (rhs, 0
> +return true;
>if (!TYPE_OVERFLOW_UNDEFINED (lhs_type))
>  return false;
>if (!arith_code_with_undefined_signed_overflow
> @@ -10625,19 +10637,39 @@ gimple_with_undefined_signed_overflow (gimple *stmt)
> contain a modified form of STMT itself.  */
>
>  static gimple_seq
> -rewrite_to_defined_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
> -bool in_place)
> +rewrite_to_defined_unconditional (gimple_stmt_iterator *gsi, gimple *stmt,
> + bool in_place)
>  {
> +  gcc_assert (gimple_needing_rewrite_undefined (stmt));
>if (dump_file && (dump_flags & TDF_DETAILS))
>  {
> -  fprintf (dump_file, "rewriting stmt with undefined signed "
> -  "overflow ");
> +  fprintf (dump_file, "rewriting stmt for bein

Re: [PATCH v3 2/2] phiopt: Use rewrite_to_defined_overflow in move_stmt [PR116938]

2025-05-08 Thread Richard Biener

On Thu, May 8, 2025 at 6:08 AM Andrew Pinski  wrote:
>
> As mentioned previously the rewrite in move_stmt should be
> using gimple_needing_rewrite_undefined/rewrite_to_defined_unconditional
> instead of just rewriting the VCE.
> This moves move_stmt over to those APIs.
>
> A few testcases needed to be updated due to the ABS_EXPR to ABSU_EXPR rewrite that now happens.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.
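The ABS_EXPR to ABSU_EXPR change in the updated dump scans follows from the same idea: once the statement is executed unconditionally, its signed overflow (the absolute value of INT_MIN) can no longer be ruled out, so it is rewritten to produce an unsigned result. A portable C analogue, as a sketch (the function name is illustrative; GCC builds the GIMPLE form directly):

```c
#include <assert.h>
#include <limits.h>

/* Absolute value with an unsigned result: defined for every int input,
   including INT_MIN, unlike abs ().  */
static unsigned int
absu (int x)
{
  /* Convert first, then negate in unsigned arithmetic, which wraps
     instead of overflowing.  */
  return x < 0 ? 0u - (unsigned int) x : (unsigned int) x;
}
```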

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (move_stmt): Use rewrite_to_defined_unconditional
> instead of manually doing the rewrite of the VCE.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-40.c: Update to expect ABSU_EXPR.
> * gcc.dg/tree-ssa/phi-opt-41.c: Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c |  7 +++---
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c |  4 ++--
>  gcc/tree-ssa-phiopt.cc | 26 +++---
>  3 files changed, 9 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c
> index a9011ce97fb..70629165bb6 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c
> @@ -20,6 +20,7 @@ int f1(int x)
>
>  /* { dg-final { scan-tree-dump-times "if " 1 "phiopt1" } } */
>  /* { dg-final { scan-tree-dump-not "if " "phiopt2" } } */
> -/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 2 "phiopt1" } } */
> -/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 1 "phiopt2" } } */
> -/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 1 "phiopt2" } } */
> +/* The ABS_EXPR in f gets rewritten to ABSU_EXPR as phiopt can't prove it was not undefined when moving it. */
> +/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 1 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 1 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 2 "phiopt2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
> index 9774e283a7b..817d4feb027 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
> @@ -29,6 +29,6 @@ int fge(int a, unsigned char b)
>return a > 0 ? a : -a;
>  }
>
> -
> +/* The ABS_EXPR gets rewritten to ABSU_EXPR as phiopt can't prove it was not undefined when moving it. */
>  /* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
> -/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 4 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 4 "phiopt1" } } */
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 54ecd93495a..efd43d2d77e 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -838,33 +838,13 @@ move_stmt (gimple *stmt, gimple_stmt_iterator *gsi, auto_bitmap &inserted_exprs)
>// Mark the name to be renamed if there is one.
>bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (name));
>gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt);
> -  gsi_move_before (&gsi1, gsi);
> +  gsi_move_before (&gsi1, gsi, GSI_NEW_STMT);
>reset_flow_sensitive_info (name);
>
>/* Rewrite some code which might be undefined when
>   unconditionalized. */
> -  if (gimple_assign_single_p (stmt))
> -{
> -  tree rhs = gimple_assign_rhs1 (stmt);
> -  /* VCE from integral types to another integral types but with
> -different precisions need to be changed into casts
> -to be well defined when unconditional. */
> -  if (gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR
> - && INTEGRAL_TYPE_P (TREE_TYPE (name))
> - && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (rhs, 0
> -   {
> - if (dump_file && (dump_flags & TDF_DETAILS))
> -   {
> - fprintf (dump_file, "rewriting stmt with maybe undefined VCE ");
> - print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> -   }
> - tree new_rhs = TREE_OPERAND (rhs, 0);
> - gcc_assert (is_gimple_val (new_rhs));
> - gimple_assign_set_rhs_code (stmt, NOP_EXPR);
> - gimple_assign_set_rhs1 (stmt, new_rhs);
> - update_stmt (stmt);
> -   }
> -}
> +  if (gimple_needing_rewrite_undefined (stmt))
> +rewrite_to_defined_unconditional (gsi);
>  }
>
>  /* RAII style class to temporarily remove flow sensitive
> --
> 2.34.1
>

Re: [PATCH] testsuite: g++.dg/cpp2a/decomp2.C requires tls_runtime

2025-05-08 Thread Christophe Lyon

Ping?

On Thu, 17 Apr 2025, 11:21, Christophe Lyon wrote:

> Since this test is a 'dg-do run', it requires tls_runtime rather than
> just tls.
>
> This makes the test UNSUPPORTED on targets such as arm-none-eabi,
> instead of FAIL/UNRESOLVED because __aeabi_read_tp is not provided
> (e.g. when GCC is configured with --enable-threads=no).
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp2a/decomp2.C: Require tls_runtime.
> ---
>  gcc/testsuite/g++.dg/cpp2a/decomp2.C | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.dg/cpp2a/decomp2.C b/gcc/testsuite/g++.dg/cpp2a/decomp2.C
> index c2bfe46976d..d13f4243045 100644
> --- a/gcc/testsuite/g++.dg/cpp2a/decomp2.C
> +++ b/gcc/testsuite/g++.dg/cpp2a/decomp2.C
> @@ -1,7 +1,7 @@
>  // P1091R3
>  // { dg-do run { target c++11 } }
>  // { dg-options "" }
> -// { dg-require-effective-target tls }
> +// { dg-require-effective-target tls_runtime }
>  // { dg-add-options tls }
>
>  namespace std {
> --
> 2.34.1
>
>

[PATCH 3/8] RISC-V: Generate extension table in documentation from riscv-ext.def

2025-05-08 Thread Kito Cheng

Automatically build the ISA extension reference table in invoke.texi from
the unified riscv-ext.def metadata, ensuring documentation stays in sync
with extension definitions and reducing manual maintenance.

gcc/ChangeLog:

* doc/invoke.texi: Replace hand‑written extension table with
`@include riscv-ext.texi` to pull in auto‑generated entries.
* doc/riscv-ext.texi: New generated definition file
containing formatted documentation entries for each extension.
* Makefile.in: Add riscv-ext.texi to the list of files to be
processed by the Texinfo generator.
* config/riscv/gen-riscv-ext-texi.cc: New.
* config/riscv/t-riscv: Add rule for generating riscv-ext.texi.
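`print_ext_doc_entry` in the generator relies on `std::set` to order and deduplicate each extension's supported versions before printing the `@tab` column. The same behavior in plain C, as a sketch (`struct version`, `sort_unique_versions` and `format_version_column` are hypothetical names; versions are ordered by major, then minor, as in `version_t::operator<`):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct version { int major, minor; };

/* Order by major, then minor.  */
static int
version_cmp (const void *a, const void *b)
{
  const struct version *x = a, *y = b;
  if (x->major != y->major)
    return x->major < y->major ? -1 : 1;
  if (x->minor != y->minor)
    return x->minor < y->minor ? -1 : 1;
  return 0;
}

/* Sort V in place and drop duplicates; returns the new count.  */
static size_t
sort_unique_versions (struct version *v, size_t n)
{
  size_t out = 0;
  assert (v != NULL || n == 0);
  if (n == 0)
    return 0;
  qsort (v, n, sizeof *v, version_cmp);
  for (size_t i = 1; i < n; ++i)
    if (version_cmp (&v[out], &v[i]) != 0)
      v[++out] = v[i];
  return out + 1;
}

/* Write "@tab 1.0 2.0"-style text into BUF; -1 if BUF is too small.  */
static int
format_version_column (const struct version *v, size_t n, char *buf,
                       size_t len)
{
  size_t used;
  int k = snprintf (buf, len, "@tab");
  if (k < 0 || (size_t) k >= len)
    return -1;
  used = (size_t) k;
  for (size_t i = 0; i < n; ++i)
    {
      k = snprintf (buf + used, len - used, " %d.%d", v[i].major, v[i].minor);
      if (k < 0 || (size_t) k >= len - used)
        return -1;
      used += (size_t) k;
    }
  return (int) used;
}
```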
---
 gcc/Makefile.in|   2 +-
 gcc/config/riscv/gen-riscv-ext-texi.cc |  88 
 gcc/config/riscv/t-riscv   |  34 +-
 gcc/doc/invoke.texi| 495 +--
 gcc/doc/riscv-ext.texi | 629 +
 5 files changed, 751 insertions(+), 497 deletions(-)
 create mode 100644 gcc/config/riscv/gen-riscv-ext-texi.cc
 create mode 100644 gcc/doc/riscv-ext.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 55b4cd7dbed3..0fa5d0c925af 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3702,7 +3702,7 @@ TEXI_GCC_FILES = gcc.texi gcc-common.texi gcc-vers.texi frontends.texi\
 contribute.texi compat.texi funding.texi gnu.texi gpl_v3.texi  \
 fdl.texi contrib.texi cppenv.texi cppopts.texi avr-mmcu.texi   \
 implement-c.texi implement-cxx.texi gcov-tool.texi gcov-dump.texi \
-lto-dump.texi
+lto-dump.texi riscv-ext.texi
 
 # we explicitly use $(srcdir)/doc/tm.texi here to avoid confusion with
 # the generated tm.texi; the latter might have a more recent timestamp,
diff --git a/gcc/config/riscv/gen-riscv-ext-texi.cc b/gcc/config/riscv/gen-riscv-ext-texi.cc
new file mode 100644
index ..e15fdbf36f6e
--- /dev/null
+++ b/gcc/config/riscv/gen-riscv-ext-texi.cc
@@ -0,0 +1,88 @@
+#include 
+#include 
+#include 
+#include 
+#include "riscv-opts.h"
+
+struct version_t
+{
+  int major;
+  int minor;
+  version_t (int major, int minor,
+enum riscv_isa_spec_class spec = ISA_SPEC_CLASS_NONE)
+: major (major), minor (minor)
+  {}
+  bool operator<(const version_t &other) const
+  {
+if (major != other.major)
+  return major < other.major;
+return minor < other.minor;
+  }
+
+  bool operator== (const version_t &other) const
+  {
+return major == other.major && minor == other.minor;
+  }
+};
+
+static void
+print_ext_doc_entry (const std::string &ext_name, const std::string &full_name,
+const std::string &desc,
+const std::vector &supported_versions)
+{
+  // Implementation of the function to print the documentation entry
+  // for the extension.
+  std::set unique_versions;
+  for (const auto &version : supported_versions)
+unique_versions.insert (version);
+  printf ("@item %s\n", ext_name.c_str ());
+  printf ("@tab");
+  for (const auto &version : unique_versions)
+{
+  printf (" %d.%d", version.major, version.minor);
+}
+  printf ("\n");
+  printf ("@tab %s", full_name.c_str ());
+  if (desc.size ())
+printf (", %s", desc.c_str ());
+  printf ("\n\n");
+}
+
+int
+main ()
+{
+  puts ("@c Copyright (C) 2025 Free Software Foundation, Inc.");
+  puts ("@c This is part of the GCC manual.");
+  puts ("@c For copying conditions, see the file gcc/doc/include/fdl.texi.");
+  puts ("");
+  puts ("@c This file is generated automatically using");
+  puts ("@c  gcc/config/riscv/gen-riscv-ext-texi.cc from:");
+  puts ("@c   gcc/config/riscv/riscv-ext.def");
+  puts ("@c   gcc/config/riscv/riscv-opts.h");
+  puts ("");
+  puts ("@c Please *DO NOT* edit manually.");
+  puts ("");
+  puts ("@multitable @columnfractions .10 .10 .80");
+  puts ("@headitem Extension Name @tab Supported Version @tab Description");
+  puts ("");
+
+  /* g extension is a very speical extension that no clear version...  */
+  puts ("@item g");
+  puts ("@tab -");
+  puts (
+"@tab General-purpose computing base extension, @samp{g} will expand to");
+  puts ("@samp{i}, @samp{m}, @samp{a}, @samp{f}, @samp{d}, @samp{zicsr} and");
+  puts ("@samp{zifencei}.");
+  puts ("");
+
+#define DEFINE_RISCV_EXT(NAME, UPPERCAE_NAME, FULL_NAME, DESC, URL, DEP_EXTS,  \
+SUPPORTED_VERSIONS, FLAG_GROUP, BITMASK_GROUP_ID, \
+BITMASK_BIT_POSITION, EXTRA_EXTENSION_FLAGS)  \
+  print_ext_doc_entry (#NAME, FULL_NAME, DESC, \
+  std::vector SUPPORTED_VERSIONS);
+#include "riscv-ext.def"
+#undef DEFINE_RISCV_EXT
+
+  puts ("@end multitable");
+  return 0;
+}
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 9a9fb1bc1b52..e99d6689ba06 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv

[PATCH v1 3/5] RISC-V: Add testcases for vec_duplicate + vadd.vv combine case 1 with GR2VR cost 0

2025-05-08 Thread pan2 . li

From: Pan Li 

Add asm dump checks for combining vec_duplicate + vadd.vv (case 1) into
vadd.vx.  Late-combine takes effect when the GR2VR cost is 0, because the
vmv and the vadd.vx then incur the same GR2VR cost.  That is:

Before:
L1:
  vmv.v.x
  vadd.vv
  J L1

After:
L1:
  vadd.vx
  J L1
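For reference, this is roughly what `DEF_VX_BINARY_CASE_1`, added to vx_binary.h by this patch, expands to for `int32_t` with the two-element `VX_BINARY_BODY` (hand-expanded; the function name is hypothetical). Because `tmp` changes on every iteration, its splat cannot be hoisted out of the loop, which is the pattern these tests exercise. Note that the body writes two elements per iteration, so `n` must be a multiple of the body width.

```c
#include <assert.h>
#include <stdint.h>

static void
vadd_case_1_expanded (int32_t *restrict out, int32_t *restrict in,
                      int32_t x, unsigned n)
{
  unsigned k = 0;
  int32_t tmp = x + 3;

  assert (n % 2 == 0);  /* Otherwise the last iteration writes past N.  */
  while (k < n)
    {
      /* TMP is loop-variant, so the vmv.v.x splat stays in the loop
         unless it is folded into a vadd.vx.  */
      tmp = tmp ^ 0x3f;
      out[k + 0] = in[k + 0] + tmp;
      out[k + 1] = in[k + 1] + tmp;
      k += 2;
    }
}
```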

The following test suites pass with this patch:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u8.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   | 44 +++
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c   |  8 
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-i32.c   |  8 
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-i64.c   |  8 
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-i8.c|  8 
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-u16.c   |  8 
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-u32.c   |  8 
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-u64.c   |  8 
 .../riscv/rvv/autovec/vx_vf/vx_vadd-4-u8.c|  8 
 9 files changed, 108 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
index de5b70dd04b..db802bdefd7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h
@@ -14,4 +14,48 @@ test_vx_binary_case_0 (T * restrict out, T * restrict in, T x, unsigned n) \
 #define RUN_VX_BINARY_CASE_0(out, in, x, n)  test_vx_binary_case_0(out, in, x, n)
 #define RUN_VX_BINARY_CASE_0_WRAP(out, in, x, n) RUN_VX_BINARY_CASE_0(out, in, x, n)
 
+#define VX_BINARY_BODY(op)   \
+  out[k + 0] = in[k + 0] op tmp; \
+  out[k + 1] = in[k + 1] op tmp; \
+  k += 2;
+
+#define VX_BINARY_BODY_X4(op) \
+  VX_BINARY_BODY(op)  \
+  VX_BINARY_BODY(op)
+
+#define VX_BINARY_BODY_X8(op) \
+  VX_BINARY_BODY_X4(op)   \
+  VX_BINARY_BODY_X4(op)
+
+#define VX_BINARY_BODY_X16(op) \
+  VX_BINARY_BODY_X8(op)\
+  VX_BINARY_BODY_X8(op)
+
+#define VX_BINARY_BODY_X32(op) \
+  VX_BINARY_BODY_X16(op)   \
+  VX_BINARY_BODY_X16(op)
+
+#define VX_BINARY_BODY_X64(op) \
+  VX_BINARY_BODY_X32(op)   \
+  VX_BINARY_BODY_X32(op)
+
+#define VX_BINARY_BODY_X128(op) \
+  VX_BINARY_BODY_X64(op)\
+  VX_BINARY_BODY_X64(op)
+
+#define DEF_VX_BINARY_CASE_1(T, OP, BODY)  \
+void   \
+test_vx_binary_case_1 (T * restrict out, T * restrict in, T x, unsigned n) \
+{  \
+  unsigned k = 0;  \
+  T tmp = x + 3;   \
+   \
+  while (k < n)\
+{  \
+  tmp = tmp ^ 0x3f;\
+  BODY(OP) \
+}  \
+}
+#define DEF_VX_BINARY_CASE_1_WRAP(T, OP, BODY) DEF_VX_BINARY_CASE_1(T, OP, BODY)
+
 #endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c
new file mode 100644
index 000..9a26601165e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vadd-4-i16.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-m

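The macro layering in vx_binary.h is easy to lose track of: VX_BINARY_BODY handles two elements, and each _Xn level covers n elements, so VX_BINARY_BODY_X16 processes 16 elements per outer iteration, all with the same scalar tmp. The scalar reference below is a sketch, not part of the patch; the name ref_vx_add_case_1_u8 and its elems_per_iter parameter are hypothetical. It models what DEF_VX_BINARY_CASE_1 (uint8_t, +, VX_BINARY_BODY_X16) computes and could serve as an expected-value oracle if these compile-only tests were ever turned into run tests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for
     DEF_VX_BINARY_CASE_1 (uint8_t, +, VX_BINARY_BODY_X16)
   Each outer iteration updates the scalar tmp once and then adds that
   same tmp to ELEMS_PER_ITER consecutive elements (16 for _X16).  This
   is the vec_duplicate + vadd.vv shape that the combine is meant to
   turn into vadd.vx.  */
static void
ref_vx_add_case_1_u8 (uint8_t *out, const uint8_t *in, uint8_t x,
                      unsigned n, unsigned elems_per_iter)
{
  unsigned k = 0;
  uint8_t tmp = (uint8_t) (x + 3);

  /* The macro-generated loop has no tail handling, so only call this
     with N a multiple of the unroll factor.  */
  assert (elems_per_iter > 0 && n % elems_per_iter == 0);

  while (k < n)
    {
      tmp = (uint8_t) (tmp ^ 0x3f);
      for (unsigned j = 0; j < elems_per_iter; j++, k++)
        out[k] = (uint8_t) (in[k] + tmp);  /* wraps modulo 256 */
    }
}
```

With in[] all zero and x == 0, tmp alternates between 60 and 3, so out[0..15] are 60 and out[16..31] are 3.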
[PATCH] libstdc++: Make dg-require-namedlocale work for more targets [PR65909]

2025-05-08 Thread Jonathan Wakely

As noted in the PR, some embedded targets do not support command-line
arguments, which means that the dg-require-namedlocale check always
fails. Use Sandra's suggestion of hardcoding the argument into the
executable instead of passing it as a command-line argument.

Realistically, those embedded targets probably don't support the named
locales anyway, but at least now the tests will be UNSUPPORTED for the
right reason.

libstdc++-v3/ChangeLog:

PR libstdc++/65909
* testsuite/lib/libstdc++.exp (check_v3_target_namedlocale):
Hardcode the locale name instead of passing it to the
executable. Do not hardcode buffer size for string.
---

Tested x86_64-linux.

 libstdc++-v3/testsuite/lib/libstdc++.exp | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 5e958d159de2..fbc9f7f13e64 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -1034,7 +1034,7 @@ proc check_v3_target_namedlocale { args } {
puts $f "using namespace std;"
puts $f "char *transform_locale(const char *name)"
puts $f "{"
-   puts $f "char *result = new char\[50\];"
+   puts $f "char *result = new char\[strlen(name)+6\];"
puts $f "strcpy(result, name);"
puts $f "#if defined __FreeBSD__ || defined __DragonFly__ || defined __NetBSD__"
puts $f "/* fall-through */"
@@ -1045,14 +1045,9 @@ proc check_v3_target_namedlocale { args } {
puts $f "#endif"
puts $f "return result;"
puts $f "}"
-   puts $f "int main (int argc, char** argv)"
+   puts $f "int main ()"
puts $f "{"
-   puts $f "  if (argc < 2)"
-   puts $f "  {"
-   puts $f "printf(\"locale support test not supported\\n\");"
-   puts $f "return 1;"
-   puts $f "  }"
-   puts $f "  const char *namedloc = transform_locale(*(argv + 1));"
+   puts $f "  const char *namedloc = transform_locale(\"$args\");"
puts $f "  try"
puts $f "  {"
puts $f "locale((const char*)namedloc);"
@@ -1076,7 +1071,7 @@ proc check_v3_target_namedlocale { args } {
  return 0
}
 
-   set result [${tool}_load "./$exe" "$args" ""]
+   set result [${tool}_load "./$exe" "" ""]
set status [lindex $result 0]
 
verbose "check_v3_target_namedlocale <$args>: status is <$status>" 2
-- 
2.49.0

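The idea behind the change, baking the locale name into the probe when it is built instead of reading it from argv, can be shown with a C analogue. This is a sketch under assumptions: it uses setlocale rather than std::locale, and PROBE_LOCALE, named_locale_supported and probe_main are hypothetical names, not part of libstdc++.exp. In the spirit of the patch's strlen-based allocation, it sizes its copy buffer from the input string instead of a fixed 50 bytes.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: the name is fixed when the probe is built (the patch
   does the equivalent by interpolating "$args" into the generated
   source), so targets without command-line arguments can still run it.  */
#ifndef PROBE_LOCALE
#define PROBE_LOCALE "C"
#endif

/* Return 1 if NAME can be installed for LC_ALL, 0 otherwise.  The
   previous setting is restored before returning.  */
static int
named_locale_supported (const char *name)
{
  const char *cur = setlocale (LC_ALL, NULL);
  assert (cur != NULL);

  /* setlocale may overwrite the returned string on the next call, so
     copy it into a buffer sized from its actual length.  */
  size_t len = strlen (cur) + 1;
  char *saved = malloc (len);
  assert (saved != NULL);
  memcpy (saved, cur, len);

  int ok = setlocale (LC_ALL, name) != NULL;

  setlocale (LC_ALL, saved);
  free (saved);
  return ok;
}

/* Exit-status style result (a convention assumed here): 0 means the
   locale is supported, 1 means it is not.  */
static int
probe_main (void)
{
  return named_locale_supported (PROBE_LOCALE) ? 0 : 1;
}
```

Whether a name such as "de_DE.UTF-8" is supported depends on the host, so only the always-available "C" locale is a safe default here.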
Re: [PATCH] libstdc++: Update rows in C++17 status table

2025-05-08 Thread Jonathan Wakely

On Thu, 8 May 2025 at 14:59, Jakub Jelinek  wrote:
>
> On Thu, May 08, 2025 at 02:50:27PM +0100, Jonathan Wakely wrote:
> > Document that std::to_chars and std::from_chars are complete, mentioning
> > the libraries used for floating-point types.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * doc/xml/manual/status_cxx2017.xml: Update status for
> >   std::to_chars and std::from_chars.
> >   * doc/html/manual/*: Regenerate.
> > ---
> >
> > Patrick, please check that what I've added is accurate (see the XML
> > change at the end of the diff).
>
> > + __strfrom128).
>
> s/__strfrom128/strfromf128/
>
> Missing f before 128 and the __ is just the name of the weak alias for it,
> it uses __asm ("strfromf128").
>
> > + __strfrom128).
>
> Ditto.


Thanks, fixed locally.

RE: [PATCH v1 0/5] Add testcases for another case of vec_duplicate + vadd.vv combine

2025-05-08 Thread Li, Pan2

> it's just a vector cost model issue and some loops are not profitable
> to vectorize?

Yes. For example, when gpr2vr is 1, int8_t cannot vectorize while uint8_t can.

+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int8_t, +, VX_BINARY_BODY_X16)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */

+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(uint8_t, +, VX_BINARY_BODY_X16)
+
+/* { dg-final { scan-assembler {vadd.vx} } } */

Another case: int64_t combines when gpr2vr is 1 but fails to combine
when gpr2vr is 2.

+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=1" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int64_t, +, VX_BINARY_BODY)
+
+/* { dg-final { scan-assembler {vadd.vx} } } */

+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128b -mabi=lp64d --param=gpr2vr-cost=2" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_1(int64_t, +, VX_BINARY_BODY)
+
+/* { dg-final { scan-assembler-not {vadd.vx} } } */

Pan

-Original Message-
From: Robin Dapp  
Sent: Thursday, May 8, 2025 8:01 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Chen, Ken ; Liu, Hongtao 
; Robin Dapp 
Subject: Re: [PATCH v1 0/5] Add testcases for another case of vec_duplicate + 
vadd.vv combine

> This patch series would like to add the testcases for this.  However,
> some test results are not that tidy, and we need more tuning for
> the vector cost model.

The test adjustments LGTM but what do you mean by not tidy?  I see you're 
scanning just for the presence of "vx" instead of an exact number so
it's just a vector cost model issue and some loops are not profitable
to vectorize?

-- 
Regards
 Robin

RE: [PATCH ]RISCV :Added MIPS P8700 Subtarget

2025-05-08 Thread Umesh Kalappa

Hi All ,

We have a couple of patch series that enable P8700 tuning for the RISC-V
core, which we would like to upstream to GCC mainline.

It would be good to hear your feedback on the patches.

Thank you in advance
~U



-Original Message-
From: Umesh Kalappa 
Sent: 03 May 2025 11:27
To: Jeff Law ; gcc-patches@gcc.gnu.org; 
pal...@dabbelt.com
Cc: kito.ch...@sifive.com; Jesse Huang ; 
and...@sifive.com
Subject: Re: [PATCH]RISCV :Added MIPS P8700 Subtarget

Hi @Jeff Law and @pal...@dabbelt.com ,

Please review the changes below and help us upstream them.

Thank you
~U

-Original Message-
From: Umesh Kalappa
Sent: 29 April 2025 16:16
To: Umesh Kalappa ; Jeff Law ; 
gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; Jesse Huang ; 
pal...@dabbelt.com; and...@sifive.com
Subject: RE: [EXTERNAL]Re: [PATCH]RISCV :Added MIPS P8700 Subtarget

Hi all,

Here is the updated patch that addresses some of @Jeff Law's comments.

The P8700 doesn't have a vector engine; we support the insn types up to
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.md#L358
and the scheduling model handles those same types.

---
 gcc/config/riscv/mips-p8700.md   | 139 +++
 gcc/config/riscv/riscv-cores.def |   5 ++
 gcc/config/riscv/riscv-opts.h|   3 +-
 gcc/config/riscv/riscv.cc|  22 +
 gcc/config/riscv/riscv.md|   3 +-
 5 files changed, 170 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/mips-p8700.md

diff --git a/gcc/config/riscv/mips-p8700.md b/gcc/config/riscv/mips-p8700.md
new file mode 100644
index 000..11d0b1ca793
--- /dev/null
+++ b/gcc/config/riscv/mips-p8700.md
@@ -0,0 +1,139 @@
+;; DFA-based pipeline description for MIPS P8700.
+;;
+;; Copyright (C) 2025 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "mips_p8700_agen_alq_pipe, mips_p8700_mdu_pipe,
+mips_p8700_fpu_pipe")
+
+;; The address generation queue (AGQ) has AL2, CTISTD and LDSTA pipes
+(define_cpu_unit "mips_p8700_agq, mips_p8700_al2, mips_p8700_ctistd, mips_p8700_lsu"
+"mips_p8700_agen_alq_pipe")
+
+(define_cpu_unit "mips_p8700_gpmul, mips_p8700_gpdiv" 
+"mips_p8700_mdu_pipe")
+
+;; The arithmetic-logic-unit queue (ALQ) has ALU pipe
+(define_cpu_unit "mips_p8700_alq, mips_p8700_alu" "mips_p8700_agen_alq_pipe")
+
+;; The floating-point-unit queue (FPQ) has short and long pipes 
+(define_cpu_unit "mips_p8700_fpu_short, mips_p8700_fpu_long"
+"mips_p8700_fpu_pipe")
+
+;; Long FPU pipeline.
+(define_cpu_unit "mips_p8700_fpu_apu" "mips_p8700_fpu_pipe")
+
+(define_reservation "mips_p8700_agq_al2" "mips_p8700_agq, mips_p8700_al2")
+(define_reservation "mips_p8700_agq_ctistd" "mips_p8700_agq, mips_p8700_ctistd")
+(define_reservation "mips_p8700_agq_lsu" "mips_p8700_agq, mips_p8700_lsu")
+(define_reservation "mips_p8700_alq_alu" "mips_p8700_alq, mips_p8700_alu")
+
+;;
+;; FPU pipe
+;;
+
+(define_insn_reservation "mips_p8700_fpu_fadd" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fadd"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fabs" 2
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fcmp,fmove"))
+  "mips_p8700_fpu_short, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fload" 8
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fpload"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fstore" 1
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fpstore"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fmadd" 8
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fmadd"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fmul" 5
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fmul"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_div" 17
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fdiv,fsqrt"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu*17")
+
+(define_insn_reservation "mips_p8700_fpu_fcvt" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fcvt,fcvt_i2f,fcvt_f2i"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fmtc" 7
+

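To see what a reservation string such as "mips_p8700_fpu_long, mips_p8700_fpu_apu*17" means for throughput, here is a toy model in C. It is a hypothetical sketch, not GCC's generated automaton: it tracks only the shared mips_p8700_fpu_apu unit, takes latencies from the define_insn_reservation entries above, and ignores issue width and every other unit. Under those assumptions a pipelined FP insn frees the APU after one cycle, while fdiv/fsqrt hold it for 17.

```c
#include <assert.h>
#include <string.h>

/* Toy model (hypothetical names) of some FP reservations in
   mips-p8700.md.  Each insn takes fpu_long or fpu_short in its issue
   cycle, then holds mips_p8700_fpu_apu for APU_CYCLES consecutive
   cycles: 17 for fdiv/fsqrt ("mips_p8700_fpu_apu*17"), 1 otherwise.  */
struct p8700_fp_class
{
  const char *type;  /* value of the "type" insn attribute */
  int latency;       /* latency from define_insn_reservation */
  int apu_cycles;    /* cycles the shared APU unit stays reserved */
};

static const struct p8700_fp_class p8700_fp_classes[] = {
  { "fadd", 4, 1 },
  { "fmove", 2, 1 },
  { "fmul", 5, 1 },
  { "fmadd", 8, 1 },
  { "fcvt", 4, 1 },
  { "fdiv", 17, 17 },
  { "fsqrt", 17, 17 },
};

static const struct p8700_fp_class *
p8700_lookup (const char *type)
{
  size_t n = sizeof p8700_fp_classes / sizeof p8700_fp_classes[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (p8700_fp_classes[i].type, type) == 0)
      return &p8700_fp_classes[i];
  return NULL;
}

/* Earliest cycle at which an independent FP insn can issue after an
   insn of type PREV issued at cycle P.  PREV holds the APU for cycles
   P+1 .. P+apu_cycles, and the next insn needs it from its own issue
   cycle + 1, so it can issue at P + apu_cycles (and never before P + 1).  */
static int
p8700_next_fp_issue (const char *prev, int p)
{
  const struct p8700_fp_class *c = p8700_lookup (prev);
  assert (c != NULL);
  return p + (c->apu_cycles > 1 ? c->apu_cycles : 1);
}
```

For example, two independent fadds can issue in consecutive cycles, whereas a second fdiv has to wait 17 cycles behind the first.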
Re: [PATCH 2/2] gensupport: validate compact constraint modifiers

2025-05-08 Thread Richard Sandiford

Richard Earnshaw  writes:
> For constraints there are operand modifiers and constraint qualifiers.
> Operand modifiers apply to all alternatives and must appear, in
> traditional syntax, before the first alternative.  Constraint
> qualifiers, on the other hand, must appear in each alternative to which
> they apply.
>
> There's no easy way to validate the distinction in the traditional md
> format, but when using the new compact format we can enforce some
> semantic checking of these characters to avoid some potentially
> surprising code generation.
>
> gcc/
>
>   * gensupport.cc (conlist::conlist): Pass a location to the constructor.
>   Only allow skipping of non-alpha-numeric characters when parsing a
>   number and only allow '=', '+' or '%'.  Add some error checking when
>   parsing an operand number.
>   (parse_section_layout): Pass the location to the conlist constructor.
>   (parse_section): Allow an optional list of forbidden characters.
>   If specified, reject strings containing them.
>   (convert_syntax): Reject '=', '+' or '%' in an alternative.

OK, thanks.  Patch 1 looks good to me as well.

Richard

> ---
>  gcc/gensupport.cc | 37 ++---
>  1 file changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
> index 80f1976faf1..ac0132860a9 100644
> --- a/gcc/gensupport.cc
> +++ b/gcc/gensupport.cc
> @@ -656,7 +656,7 @@ public:
>   i.e. if rtx is the relevant match_operand or match_scratch then
>   [ns..ns + len) should equal itoa (XINT (rtx, 0)), and if set_attr then
>   [ns..ns + len) should equal XSTR (rtx, 0).  */
> -  conlist (const char *ns, unsigned int len, bool numeric)
> +  conlist (const char *ns, unsigned int len, bool numeric, file_location loc)
>{
>  /* Trim leading whitespaces.  */
>  while (len > 0 && ISBLANK (*ns))
> @@ -670,16 +670,26 @@ public:
>if (!ISBLANK (ns[i]))
>   break;
>  
> -/* Parse off any modifiers.  */
> -while (len > 0 && !ISALNUM (*ns))
> -  {
> - con += *(ns++);
> - len--;
> -  }
> +/* Only numeric values can have modifiers.  */
> +if (numeric)
> +  /* Parse off any modifiers.  */
> +  while (len > 0 && !ISALNUM (*ns))
> + {
> +   if (*ns != '=' && *ns != '+' && *ns != '%')
> + error_at (loc, "`%c` is not a valid operand modifier", *ns);
> +   con += *(ns++);
> +   len--;
> + }
>  
>  name.assign (ns, len);
>  if (numeric)
> -  idx = strtol (name.c_str (), (char **)NULL, 10);
> +  {
> + char *endstr;
> + /* There should only be a numeric value now... */
> + idx = strtol (name.c_str (), &endstr, 10);
> + if (*endstr != '\0')
> +   error_at (loc, "operand number expected, found %s", name.c_str ());
> +  }
>}
>  
>/* Adds a character to the end of the string.  */
> @@ -832,7 +842,7 @@ parse_section_layout (file_location loc, const char **templ, const char *label,
> *templ += len;
> if (val == ',')
>   (*templ)++;
> -   list.push_back (conlist (name_start, len, numeric));
> +   list.push_back (conlist (name_start, len, numeric, loc));
>   }
>  }
>  }
> @@ -845,7 +855,8 @@ parse_section_layout (file_location loc, const char **templ, const char *label,
>  
>  static void
>  parse_section (const char **templ, unsigned int n_elems, unsigned int alt_no,
> -vec_conlist &list, file_location loc, const char *name)
> +vec_conlist &list, file_location loc, const char *name,
> +const char *invalid_chars = NULL)
>  {
>unsigned int i;
>  
> @@ -856,6 +867,10 @@ parse_section (const char **templ, unsigned int n_elems, unsigned int alt_no,
>{
>   if (**templ == 0 || **templ == '\n')
> fatal_at (loc, "missing ']'");
> + if (invalid_chars
> + && strchr (invalid_chars, **templ))
> +   error_at (loc, "'%c' is not permitted in an alternative for a %s",
> + **templ, name);
>   list[i].add (**templ);
>   if (**templ == ',')
> {
> @@ -981,7 +996,7 @@ convert_syntax (rtx x, file_location loc)
> /* Parse the constraint list, then the attribute list.  */
> if (tconvec.size () > 0)
>   parse_section (&templ, tconvec.size (), alt_no, tconvec, loc,
> -"constraint");
> +"constraint", "=+%");
>  
> if (attrvec.size () > 0)
>   {

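The two rules enforced by the patch can be illustrated outside gensupport.cc. The C sketch below is hypothetical: parse_operand_spec and alternative_ok are not GCC functions, it covers numeric operands only, it skips the whitespace trimming and location tracking of the real conlist constructor, and it returns a status where the patch calls error_at. Like the patch, it accepts only '=', '+' and '%' before an operand number, requires strtol to consume the rest of the string, and rejects those three characters inside an alternative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum spec_status { SPEC_OK, SPEC_BAD_MODIFIER, SPEC_NOT_A_NUMBER };

/* Parse a numeric operand spec from the layout section, e.g. "=0" or
   "+1".  Leading non-alphanumeric characters must be one of the
   operand modifiers '=', '+' or '%'; what follows must be a decimal
   operand number with nothing after it.  */
static enum spec_status
parse_operand_spec (const char *spec, long *opno)
{
  while (*spec != '\0' && !isalnum ((unsigned char) *spec))
    {
      if (*spec != '=' && *spec != '+' && *spec != '%')
        return SPEC_BAD_MODIFIER;
      spec++;
    }

  char *end;
  *opno = strtol (spec, &end, 10);
  if (*end != '\0')
    return SPEC_NOT_A_NUMBER;
  return SPEC_OK;
}

/* Operand modifiers apply to every alternative, so an individual
   constraint alternative must not contain them.  Per-alternative
   qualifiers such as '&' remain allowed.  */
static int
alternative_ok (const char *alt)
{
  return strpbrk (alt, "=+%") == NULL;
}
```

So "=0" and "+12" parse cleanly, "&0" is rejected because '&' is a constraint qualifier rather than an operand modifier, and "=r" is rejected as an alternative while "&r" is accepted.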
Re: [PATCH] testsuite: arm: Fix unsigned-extend-2.c [PR116445]

2025-05-08 Thread Christophe Lyon

Ping?

On Fri, 11 Apr 2025 at 18:36, Christophe Lyon  wrote:

> The test was designed to pass with thumb2, but code generation changed
> with the introduction of Low Overhead Loops, so the test can fail if
> one overrides the flags when running the testsuite.
>
> In addition, useless subtract / extension instructions require -O2 to
> remove them (-O is not sufficient), so replace -O with -O2 in
> dg-options.
>
> arm_thumb2_ok_no_arm_v8_1m_lob does not do what the test needs (it can
> fail because some flags conflict, rather than because lob are
> supported, and we do not need to check runtime support in this test
> anyway), so the patch reverts back to arm_thumb2_ok.
>
> Finally, replace the scan-assembler directives with
> check-function-bodies, checking both types of code generation (with
> and without LOL).  Depending on architecture version, the two insns
> and r0, r1, r0, lsr #1
> ands r3, r3, #255
> can be swapped, so accept both orders.
>
> gcc/testsuite/ChangeLog:
>
> PR target/116445
> * gcc.target/arm/unsigned-extend-2.c: Fix dg directives.
> ---
>  .../gcc.target/arm/unsigned-extend-2.c| 33 +++
>  1 file changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c b/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
> index 41ee994c1ec..d9f95a14277 100644
> --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
> +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
> @@ -1,6 +1,31 @@
>  /* { dg-do compile } */
> -/* { dg-require-effective-target arm_thumb2_ok_no_arm_v8_1m_lob } */
> -/* { dg-options "-O" } */
> +/* { dg-require-effective-target arm_thumb2_ok } */
> +/* { dg-options "-O2 -mthumb" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** foo:
> +**	movs	(r[0-9]+), #8
> +** (
> +**	subs	\1, \1, #1
> +**	ands	\1, \1, #255
> +**	and	r0, r1, r0, lsr #1
> +**	bne	.L[0-9]+
> +**	bx	lr
> +** |
> +**	subs	\1, \1, #1
> +**	and	r0, r1, r0, lsr #1
> +**	ands	\1, \1, #255
> +**	bne	.L[0-9]+
> +**	bx	lr
> +** |
> +**	push	{lr}
> +**	dls	lr, \1
> +**	and	r0, r1, r0, lsr #1
> +**	le	lr, .L[0-9]+
> +**	pop	{pc}
> +** )
> +*/
>
>  unsigned short foo (unsigned short x, unsigned short c)
>  {
> @@ -12,7 +37,3 @@ unsigned short foo (unsigned short x, unsigned short c)
>  }
>return x;
>  }
> -
> -/* { dg-final { scan-assembler "ands" } } */
> -/* { dg-final { scan-assembler-not "uxtb" } } */
> -/* { dg-final { scan-assembler-not "cmp" } } */
> --
> 2.34.1
>
>

[PATCH] libstdc++: Update rows in C++17 status table

2025-05-08 Thread Jonathan Wakely

Document that std::to_chars and std::from_chars are complete, mentioning
the libraries used for floating-point types.

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2017.xml: Update status for
std::to_chars and std::from_chars.
* doc/html/manual/*: Regenerate.
---

Patrick, please check that what I've added is accurate (see the XML
change at the end of the diff).

 .../doc/html/manual/source_code_style.html|  2 +-
 libstdc++-v3/doc/html/manual/status.html  | 17 -
 .../doc/xml/manual/status_cxx2017.xml | 25 ---
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/source_code_style.html b/libstdc++-v3/doc/html/manual/source_code_style.html
index b0b22683f67c..a66e3a079471 100644
--- a/libstdc++-v3/doc/html/manual/source_code_style.html
+++ b/libstdc++-v3/doc/html/manual/source_code_style.html
@@ -474,7 +474,7 @@
 
       Examples:  _M_num_elements  _M_initialize ()
 
-      Static data members, constants, and enumerations: _S_.*
+      Static data and function members, constants, and enumerations: _S_.*
 
       Examples: _S_max_elements  _S_default_value
 
diff --git a/libstdc++-v3/doc/html/manual/status.html b/libstdc++-v3/doc/html/manual/status.html
index 3d55e2652729..924b2e3d861e 100644
--- a/libstdc++-v3/doc/html/manual/status.html
+++ b/libstdc++-v3/doc/html/manual/status.html
@@ -927,7 +927,22 @@ since C++14 and the implementation is complete.
23
   
General utilities
-  23.1General  23.2Utility components  23.2.1Header  synopsis  23.2.2OperatorsY
23.2.3swapY
23.2.4exchangeY
23.2.5Forward/move 
helpersY 23.2.6Function template as_constY
23.2.7Function template 
declvalY
23.2.8Primitive numeric 
output conversionPartial
23.2.9Primitive numeric 
input conversionPartial
23.3Compile-time integer 
sequences  23.4PairsY 23.5TuplesY
23.6Optional 
objectsY 23.7VariantsY 23.8Storage 
for any typeY 23.9BitsetsY 23.10MemoryY
23.10.1In general  23.10.2Header  synopsisY 23.10.3Pointer traitsY
23.10.4Pointer 
safetyY 23.10.5AlignY 23.10.6Allocator argument tagY 23.10.7uses_allocatorY 23.10.8Allocator traitsY 23.10.9The default allocatorY 23.10.10Specialized algorithmsY 23.10.11C library memory allocationY 23.11Smart pointers
 23.11.1Class template unique_ptrY 23.11.2Shared-ownership pointersY 23.12Memory resources
 23.12.1Header  
synopsisY 23.12.2Class memory_resourceY 23.12.3Class template polymorphic_allocatorY 23.12.4Access to program-wide memory_resource 
objectsY 23.12.5Pool resource classesY 23.12.6Class monotonic_buffer_resourceY 23.13Class 
template scoped_allocator_adaptorY 23.14Function objects
 23.14.1Header  
synopsis  23.14.2Definitions
 23.14.3Requirements
23.14.4Function template 
invokeY
23.14.5Class template 
reference_wrapperY 23.14.6Arithmetic operationY
23.14.7ComparisonsY 23.14.8Logical operationsY 23.14.9Bitwise operationsY 23.14.10Function template not_fnY
23.14.11Function object 
bindersY 23.14.12Function template mem_fnY
23.14.13Polymorphic 
function wrappersY
23.14.14SearchersY 23.14.15Class template hashY
23.15Metaprogramming and 
type traits  23.15.1Requirements
 23.15.2Header  
synopsisY 23.15.3Helper classesY 23.15.4Unary Type TraitsY 23.15.5Type property queriesY 23.15.6Relationships between typesY 23.15.7Transformations between typesY 23.15.8Logical operator traitsY 23.16Compile-time rational 
arithmeticY 23.17.1In general
 23.17.2Header  synopsis  23.17Time utilities
 23.17.3Clock requirementsY
23.17.4Time-related 
traitsY 23.17.5Class template durationY
23.17.6Class template 
time_pointY 23.17.7ClocksY
23.17.8Header  synopsisY 23.18Class 
type_indexY 23.19Execution policies
23.19.1In general  23.19.2Header  synopsis  23.19.3Execution policy type traitY 23.19.4Sequenced execution policyY 23.19.5Parallel execution policyY 23.19.6Parallel and unsequenced execution policyY 23.19.7Execution policy objectsY
+  23.1General  23.2Utility components  23.2.1Header  synopsis  23.2.2OperatorsY
23.2.3swapY
23.2.4exchangeY
23.2.5Forward/move 
helpersY 23.2.6Function template as_constY
23.2.7Function template 
declvalY
23.2.8Primitive numeric 
output conversionY
+   Floating-point types up to 64-bit are formatted using
+   https://github.com/ulfjack/ryu"; 
target="_top">Ryu.
+   Types with greater precision are formatted using the C library
+   (sprintf and conditionally
+   __strfrom128).
+   For powerpc64le-unknown-linux-gnu __sprintfieee128
+   must be provided by Glibc.
+  23.2.9Primitive 
numeric input conversionY
+   Floating-point types up to 64-bit are parsed using
+   https://github.com/fast_float/fast_float"; 
targ

Re: [PATCH] emit-rtl: Add extra checks for paradoxical hardware subregs [PR119966]

2025-05-08 Thread Richard Sandiford

Dimitar Dimitrov  writes:
> On Tue, May 06, 2025 at 01:17:40PM +0100, Richard Sandiford wrote:
>> Dimitar Dimitrov  writes:
>> > After r16-160-ge6f89d78c1a752, late_combine2 started transforming the
>> > following RTL for pru-unknown-elf:
>> >
>> >   (insn 3949 3948 3951 255 (set (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856])
>> >   (and:QI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855])
>> >   (const_int 3 [0x3])))
>> >(nil))
>> >   ...
>> >   (insn 3961 7067 3962 255 (set (reg:SI 56 r14.b0)
>> >   (zero_extend:SI (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856])))
>> >(nil))
>> >
>> > into:
>> >
>> >   (insn 3961 7067 3962 255 (set (reg:SI 56 r14.b0)
>> >   (and:SI (subreg:SI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855]) 0)
>> >   (const_int 3 [0x3])))
>> >(nil))
>> >
>> > That caused libbacktrace build to break for pru-unknown-elf.  Register
>> > r0.b1 (regno 1) is not valid for SImode, which validate_subreg failed to
>> > reject.
>> >
>> > Fix by calling HARD_REGNO_MODE_OK to ensure that both inner and outer
>> > modes are valid for the hardware subreg.  Remove the premature "success"
>> > return for paradoxical subregs, in order to allow subsequent validity
>> > checks to be executed.
>> >
>> > This patch fixes the broken PRU toolchain build.  It leaves only two
>> > test case regressions for PRU, caused by rnreg pass renaming a valid
>> > paradoxical subreg into an invalid one.
>> >   gcc.c-torture/execute/20040709-1.c
>> >   gcc.c-torture/execute/20040709-2.c
>> > I consider these two a separate issue.
>> >
>> > I ensured that test results with and without this patch for
>> > x86_64-pc-linux-gnu are the same for C and C++.
>> >
>> > Ok for trunk?
>> >
>> >PR target/119966
>> >
>> > gcc/ChangeLog:
>> >
>> >* emit-rtl.cc (validate_subreg): Do not exit immediately for
>> >paradoxical subregs.  Add mode checks for validity of
>> >hardware subregs.
>> >
>> > Signed-off-by: Dimitar Dimitrov 
>> > ---
>> >  gcc/emit-rtl.cc | 9 ++---
>> >  1 file changed, 6 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
>> > index 3e2c4309dee..d63543038bb 100644
>> > --- a/gcc/emit-rtl.cc
>> > +++ b/gcc/emit-rtl.cc
>> > @@ -969,10 +969,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
>> >  }
>> >  
>> >/* Paradoxical subregs must have offset zero.  */
>> > -  if (maybe_gt (osize, isize))
>> > -return known_eq (offset, 0U);
>> > +  if (maybe_gt (osize, isize) && !known_eq (offset, 0U))
>> > +return false;
>> >  
>> > -  /* This is a normal subreg.  Verify that the offset is representable.  */
>> > +  /* Verify that the offset is representable.  */
>> >  
>> >/* For hard registers, we already have most of these rules collected in
>> >   subreg_offset_representable_p.  */
>> 
>> This part seems fine, but...
>
> The Linaro CI bot notified me that this chunk caused an ICE regression for
> armv8l-unknown-linux-gnueabihf.  Combine creates paradoxical MEM
> subregs, which fail the validate_subreg check a few lines below:
>
>   /* Do not allow SUBREG with stricter alignment than the inner MEM.  */
>   else if (reg && MEM_P (reg) && STRICT_ALIGNMENT
>&& MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode))
> return false;
>
> And then the following assert is triggered:
>
>   rtx
>   gen_rtx_SUBREG (machine_mode mode, rtx reg, poly_uint64 offset)
>   {
> gcc_assert (validate_subreg (mode, GET_MODE (reg), reg, offset));

Ah, right, I missed that one.  I think that too should be guarded
by known_le (osize, isize), since it will almost inevitably fail
for paradoxical subregs of mems on strict-alignment targets, and
doesn't make conceptual sense for them.

Thanks,
Richard

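A toy model of the control flow under discussion may help. After the patch, paradoxical subregs still need a zero offset but no longer return early, hard-register subregs must be valid in both modes, and, per the review suggestion, the strict-alignment MEM check applies only when the subreg is not paradoxical. Everything below is hypothetical: plain integers and booleans stand in for poly_int sizes, machine modes and the target hooks, and the real validate_subreg performs further checks that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inputs validate_subreg looks at.  */
struct subreg_query
{
  unsigned osize, isize;   /* outer and inner mode sizes in bytes */
  unsigned offset;         /* subreg byte offset */
  bool inner_is_hard_reg;
  bool hard_reg_ok_outer;  /* stands in for the hard-reg mode check on the outer mode */
  bool hard_reg_ok_inner;  /* ... and on the inner mode */
  bool inner_is_mem;
  bool strict_alignment;
  unsigned mem_align, outer_mode_align;  /* in bits */
};

static bool
toy_validate_subreg (const struct subreg_query *q)
{
  bool paradoxical = q->osize > q->isize;

  /* Paradoxical subregs must have offset zero, but no longer return
     "valid" early, so the checks below still run for them.  */
  if (paradoxical && q->offset != 0)
    return false;

  /* Both modes must be valid for the hard register (the PRU case:
     regno 1 is not valid for SImode).  */
  if (q->inner_is_hard_reg && !(q->hard_reg_ok_outer && q->hard_reg_ok_inner))
    return false;

  /* As suggested in the review: only check MEM alignment for
     non-paradoxical subregs (the armv8l ICE case).  */
  if (!paradoxical && q->inner_is_mem && q->strict_alignment
      && q->mem_align < q->outer_mode_align)
    return false;

  return true;
}
```

With this ordering the PRU subreg is rejected, while the paradoxical MEM subreg that combine creates on arm is no longer caught by the alignment check.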
Re: [PATCH] Fix tree-ssa/pr31261.c testcase after r16-400 [PR120168]

2025-05-08 Thread Richard Biener




> On 08.05.2025 at 18:19, Andrew Pinski wrote:
> 
> AFter r16-400-g5e363ffefaceb9, on targets where char is unsigned by
> default, tree-ssa/pr31261.c testcase started to fail:
> FAIL: gcc.dg/tree-ssa/pr31261.c scan-tree-dump-times original "return (char) -(unsigned char) c & 31;" 1
> 
> This is because the casts are no longer needed as both char and
> unsigned char are the same signedness.
> I was deciding between adding -fsigned-char and changing the testcase
> to explicitly use `signed char`. I went with an explicit
> `signed char`, as that is what would normally be done.
> 
> OK?

Ok

Richard 

>PR testsuite/120168
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.dg/tree-ssa/pr31261.c: Use `signed char` instead
>of plain char.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/testsuite/gcc.dg/tree-ssa/pr31261.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr31261.c b/gcc/testsuite/gcc.dg/tree-ssa/pr31261.c
> index 127300fdd24..dafb4c46c9c 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr31261.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr31261.c
> @@ -14,8 +14,8 @@ f2 (long int b)
>   return (16 + (b & 7)) & 15;
> }
> 
> -char
> -f3 (char c)
> +signed char
> +f3 (signed char c)
> {
>   return -(c & 63) & 31;
> }
> @@ -34,7 +34,7 @@ f5 (int e)
> 
> /* { dg-final { scan-tree-dump-times "return -a \& 7;" 1 "original" } } */
> /* { dg-final { scan-tree-dump-times "return b \& 7;" 1 "original" } } */
> -/* { dg-final { scan-tree-dump-times "return \\(char\\) -\\(unsigned char\\) c \& 31;" 1 "original" } } */
> +/* { dg-final { scan-tree-dump-times "return \\(signed char\\) -\\(unsigned char\\) c \& 31;" 1 "original" } } */
> /* { dg-final { scan-tree-dump-times "return \\(int\\) \\(12 - \\(unsigned int\\) d\\) \& 7;" 1 "original" { target { ! int16 } } } } */
> /* { dg-final { scan-tree-dump-times "return \\(int\\) \\(12 - \\(unsigned short\\) d\\) \& 7;" 1 "original" { target { int16 } } } } */
> /* { dg-final { scan-tree-dump-times "return 12 - \\(e \& 7\\) \& 15;" 1 "original" } } */
> --
> 2.43.0
>

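Why the expected dump depends on the target: plain char has implementation-defined signedness, and when it is unsigned the (char) and (unsigned char) casts in the folded expression coincide and drop out. Spelling the type as signed char makes the source mean the same thing everywhere. A small C illustration, relying on two's-complement arithmetic (the only representation GCC supports), with f3 copied from the updated test and plain_char_is_signed a hypothetical helper:

```c
#include <assert.h>
#include <limits.h>

/* Nonzero when plain char is signed on this target (CHAR_MIN is 0
   when it is unsigned).  */
static int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* f3 from gcc.dg/tree-ssa/pr31261.c after the patch.  c & 63 is in
   [0, 63]; negating and masking with 31 leaves (32 - (c & 31)) % 32,
   which always fits in signed char.  */
static signed char
f3 (signed char c)
{
  return -(c & 63) & 31;
}
```

For instance f3 (1) is 31 and f3 (-1) is 1 on every target, whereas plain_char_is_signed () varies.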
[PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-08 Thread Pengfei Li

This patch folds vector expressions of the form (x + y) >> 1 into
IFN_AVG_FLOOR (x, y), reducing instruction count on platforms that
support averaging operations. For example, it can help improve the
codegen on AArch64 from:
add     v0.4s, v0.4s, v31.4s
ushr    v0.4s, v0.4s, 1
to:
uhadd   v0.4s, v0.4s, v31.4s

As this folding is only valid when the most significant bit of each
element in both x and y is known to be zero, this patch checks leading
zero bits of elements in x and y, and extends get_nonzero_bits_1() to
handle uniform vectors. When the input is a uniform vector, the function
now returns the nonzero bits of its element.
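The precondition can be checked exhaustively on one lane width. The C sketch below is illustrative only (the function names are hypothetical): it models IFN_AVG_FLOOR on uint8_t lanes with the overflow-free identity (a & b) + ((a ^ b) >> 1), compares it with the wrapping add followed by a shift that the unfolded code computes, and confirms that the two agree whenever the top bit of both operands is clear. The pair 200 and 100 shows why the fold needs that check.

```c
#include <assert.h>
#include <stdint.h>

/* floor ((a + b) / 2) without intermediate overflow:
   a + b == 2 * (a & b) + (a ^ b).  */
static uint8_t
avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) ((a & b) + ((a ^ b) >> 1));
}

/* What (x + y) >> 1 computes per lane: the add wraps modulo 256
   before the shift.  */
static uint8_t
add_then_shift_u8 (uint8_t a, uint8_t b)
{
  uint8_t sum = (uint8_t) (a + b);
  return (uint8_t) (sum >> 1);
}

/* The condition the patch derives from get_nonzero_bits: at least one
   leading zero bit in each operand.  */
static int
top_bit_clear_u8 (uint8_t a, uint8_t b)
{
  return ((a | b) & 0x80) == 0;
}

/* Return 1 if the fold is exact for every uint8_t pair satisfying the
   precondition.  */
static int
fold_is_exact_when_top_bits_clear (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      if (top_bit_clear_u8 (a, b)
          && avg_floor_u8 (a, b) != add_then_shift_u8 (a, b))
        return 0;
  return 1;
}
```

For 200 and 100 the true floor average is 150, but the wrapping add gives 300 - 256 = 44, which shifts to 22.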

Additionally, this patch adds more checks to reject vector types in bit
constant propagation (tree-bit-ccp), since tree-bit-ccp was designed for
scalar values only, and the new vector logic in get_nonzero_bits_1()
could lead to incorrect propagation results.

Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Add folding rule for vector average.
* tree-ssa-ccp.cc (get_default_value): Reject vector types.
(evaluate_stmt): Reject vector types.
* tree-ssanames.cc (get_nonzero_bits_1): Extend to handle
uniform vectors.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/uhadd_1.c: New test.
---
 gcc/match.pd  |  9 +
 .../gcc.target/aarch64/acle/uhadd_1.c | 34 +++
 gcc/tree-ssa-ccp.cc   |  8 ++---
 gcc/tree-ssanames.cc  |  8 +
 4 files changed, 55 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ab496d923cc..ddd16a10944 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2177,6 +2177,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (view_convert (rshift (view_convert:ntype @0) @1))
 (convert (rshift (convert:ntype @0) @1))
 
+ /* Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) if x and y are vectors in
+which each element is known to have at least one leading zero bit.  */
+(simplify
+ (rshift (plus:cs @0 @1) integer_onep)
+ (if (VECTOR_TYPE_P (type)
+  && wi::clz (get_nonzero_bits (@0)) > 0
+  && wi::clz (get_nonzero_bits (@1)) > 0)
+  (IFN_AVG_FLOOR @0 @1)))
+
 /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
when profitable.
For bitwise binary operations apply operand conversions to the
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
new file mode 100644
index 000..f1748a199ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/uhadd_1.c
@@ -0,0 +1,34 @@
+/* Test if SIMD fused unsigned halving adds are generated */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include 
+
+#define FUSED_SIMD_UHADD(vectype, q, ts, mask) \
+  vectype simd_uhadd ## q ## _ ## ts ## _1 (vectype a) \
+  { \
+vectype v1 = vand ## q ## _ ## ts (a, vdup ## q ## _n_ ## ts (mask)); \
+vectype v2 = vdup ## q ## _n_ ## ts (mask); \
+return vshr ## q ## _n_ ## ts (vadd ## q ## _ ## ts (v1, v2), 1); \
+  } \
+  \
+  vectype simd_uhadd ## q ## _ ## ts ## _2 (vectype a, vectype b) \
+  { \
+vectype v1 = vand ## q ## _ ## ts (a, vdup ## q ## _n_ ## ts (mask)); \
+vectype v2 = vand ## q ## _ ## ts (b, vdup ## q ## _n_ ## ts (mask)); \
+return vshr ## q ## _n_ ## ts (vadd ## q ## _ ## ts (v1, v2), 1); \
+  }
+
+FUSED_SIMD_UHADD (uint8x8_t, , u8, 0x7f)
+FUSED_SIMD_UHADD (uint8x16_t, q, u8, 0x7f)
+FUSED_SIMD_UHADD (uint16x4_t, , u16, 0x7fff)
+FUSED_SIMD_UHADD (uint16x8_t, q, u16, 0x7fff)
+FUSED_SIMD_UHADD (uint32x2_t, , u32, 0x7fff)
+FUSED_SIMD_UHADD (uint32x4_t, q, u32, 0x7fff)
+
+/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.8b,} 2 } } */
+/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.16b,} 2 } } */
+/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.4h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.8h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.2s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tuhadd\tv[0-9]+\.4s,} 2 } } */
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 8d2cbb384c4..3e0c75cf2be 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -298,7 +298,7 @@ get_default_value (tree var)
{
  val.lattice_val = VARYING;
  val.mask = -1;
- if (flag_tree_bit_ccp)
+ if (flag_tree_bit_ccp && !VECTOR_TYPE_P (TREE_TYPE (var)))
{
  wide_int nonzero_bits = get_nonzero_bits (var);
  tree value;
@@ -2491,11 +2491,11 @@ evaluate_stmt (gimple *stmt)
   is_constant = (val.lattice_val == CONSTANT);
 }
 
+  tree lhs = gimple_get_lhs (stmt);
   if (flag_tree_bit_ccp
+  && lhs && TREE_CODE (lhs) == SSA_NAME && !VECTOR_TYPE_P (TREE_TYPE (lhs))
   && ((is_constant && TREE_C

[PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-08 Thread Pengfei Li

This patch improves the auto-vectorization for loops with known small
trip counts by enabling the use of subvectors - bit fields of original
wider vectors. A subvector must have the same vector element type as the
original vector and enough bits for all vector elements to be processed
in the loop. Using subvectors is beneficial because machine instructions
operating on narrower vectors usually show better performance.

To enable this optimization, this patch introduces a new target hook.
This hook allows the vectorizer to query the backend for a suitable
subvector type given the original vector type and the number of elements
to be processed in the small-trip-count loop. The target hook also has a
could_trap parameter to say if the subvector is allowed to have more
bits than needed.
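
To make the selection rule concrete, the following C sketch models the decision a target might make. It is a hypothetical helper, not the aarch64_find_subvector_type implementation: it returns a bit width rather than a tree type, assumes that 64-bit and 128-bit AdvSIMD registers are the only candidates, and assumes that could_trap == true forbids processing any extra lanes. Like the hook, it starts from data_bits = elem_cnt * scalar_prec.

```c
#include <stdbool.h>

/* Hypothetical sketch: width in bits of a fixed-length subvector that can
   hold ELEM_CNT elements of SCALAR_PREC bits, or 0 if none fits.
   Assumption: when COULD_TRAP is true, the data must fill the subvector
   exactly, since operating on extra lanes might trap.  */
static unsigned find_subvector_bits (unsigned elem_cnt, unsigned scalar_prec,
                                     bool could_trap)
{
  static const unsigned candidates[] = { 64, 128 };  /* D and Q registers */
  unsigned data_bits = elem_cnt * scalar_prec;

  for (unsigned i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    {
      unsigned bits = candidates[i];
      if (data_bits > bits)
        continue;                /* too narrow for all elements */
      if (could_trap && data_bits != bits)
        continue;                /* extra lanes are not allowed */
      return bits;               /* narrowest acceptable candidate */
    }
  return 0;
}
```

For the abs loop that follows, 5 elements of 16 bits need 80 bits, so the sketch picks 128 bits, which matches the v31.8h operand in the optimized code.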

This optimization is currently enabled for AArch64 only. The example below
shows how it uses AdvSIMD vectors as subvectors of SVE vectors for
higher instruction throughput.

Consider this loop operating on an array of 16-bit integers:

for (int i = 0; i < 5; i++) {
  a[i] = a[i] < 0 ? -a[i] : a[i];
}

Before this patch, the generated AArch64 code would be:

ptrue   p7.h, vl5
ptrue   p6.b, all
ld1h    z31.h, p7/z, [x0]
abs     z31.h, p6/m, z31.h
st1h    z31.h, p7, [x0]

After this patch, it is optimized to:

ptrue   p7.h, vl5
ld1h    z31.h, p7/z, [x0]
abs     v31.8h, v31.8h
st1h    z31.h, p7, [x0]

This patch also eliminates the ptrue p6.b, all instruction in this case.

Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_find_subvector_type):
Implement target hook for finding subvectors for AArch64.
* doc/tm.texi: Document the new target hook.
* doc/tm.texi.in: Document the new target hook.
* expmed.cc (extract_bit_field_as_subreg): Support expanding
BIT_FIELD_REF for subvector types to SUBREG in RTL.
* match.pd: Prevent simplification of BIT_FIELD_REF for
subvector types to VIEW_CONVERT.
* target.def: New target hook definition.
* targhooks.cc (default_vectorize_find_subvector_type): Provide
default implementation for the target hook.
* tree-cfg.cc (verify_types_in_gimple_reference): Update GIMPLE
verification for BIT_FIELD_REF used for subvectors.
* tree-vect-stmts.cc (vectorizable_operation): Output vectorized
GIMPLE with subvector types.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/cond_unary_6.c: Adjust loop trip counts
to avoid triggering this new optimization.
* gcc.target/aarch64/vect-subvector-1.c: New test.
* gcc.target/aarch64/vect-subvector-2.c: New test.
---
 gcc/config/aarch64/aarch64.cc | 39 
 gcc/doc/tm.texi   | 12 +++
 gcc/doc/tm.texi.in|  2 +
 gcc/expmed.cc |  5 +-
 gcc/match.pd  |  3 +-
 gcc/target.def| 17 
 gcc/targhooks.cc  |  8 ++
 gcc/targhooks.h   |  3 +
 .../gcc.target/aarch64/sve/cond_unary_6.c |  4 +-
 .../gcc.target/aarch64/vect-subvector-1.c | 28 ++
 .../gcc.target/aarch64/vect-subvector-2.c | 28 ++
 gcc/tree-cfg.cc   |  8 ++
 gcc/tree-vect-stmts.cc| 90 ++-
 13 files changed, 240 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-subvector-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-subvector-2.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index fff8d9da49d..700f1646706 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -17012,6 +17012,42 @@ aarch64_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
 }
 }
 
+/* Implement TARGET_VECTORIZE_FIND_SUBVECTOR_TYPE.  */
+static tree
+aarch64_find_subvector_type (tree vectype, unsigned HOST_WIDE_INT elem_cnt,
+bool could_trap)
+{
+  gcc_assert (VECTOR_TYPE_P (vectype));
+
+  /* AArch64 AdvSIMD vectors are treated as subvectors of SVE for all
+ vectorization preferences except "sve-only".  */
+  if (aarch64_autovec_preference == AARCH64_AUTOVEC_SVE_ONLY)
+return NULL_TREE;
+
+  /* No subvectors for AdvSIMD or partial vectors, since elements in partial
+ vectors could be non-consecutive.  */
+  machine_mode mode = TYPE_MODE (vectype);
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  if ((vec_flags & VEC_ADVSIMD) || (vec_flags & VEC_PARTIAL))
+return NULL_TREE;
+
+  tree innertype = TREE_TYPE (vectype);
+  unsigned int scalar_prec = TYPE_PRECISION (innertype);
+  unsigned int data_bits = elem_cnt * scalar_prec;
+
+  /* If the operation could trap, w

Re: [PATCH v2] libstdc++: Provide ability to query _Sink_iter if writes are discarded.

2025-05-08 Thread Jonathan Wakely

On Tue, 6 May 2025 at 13:30, Tomasz Kamiński  wrote:
>
> This patch provides _M_discarding functiosn for _Sink_iter and _Sink function
> that returns true, if any further writes to the _Sink_iter and underlying 
> _Sink,
> will be discared, and thus can be omitted.
>
> Currently only the _Padding_sink reports discarding mode, if the width of the
> character sequence is greater than _M_maxwidth (precision) or the underlying
> _Sink is discarding characters. The _M_discarding override is a separate
> function from _M_ignoring, which remains annotated with
> [[__gnu__::__always_inline__]].
>
> Despite having a notion of the maximum number of characters to be written
> (_M_max), _Iter_sink never discards characters, as the total number of
> characters that would be written needs to be returned by format_to_n. This
> is documented in-source by providing an _Iter_sink::_M_discarding override
> that always returns false.
>
> The function is currently queried only by the _Padding_sinks, which may be
> stacked, for example when a range is formatted with a padding width specified
> both for the range itself and for its elements. The state of the underlying
> sink is checked during construction and after each write (_M_sync_discarding).
>
> libstdc++-v3/ChangeLog:
>
> * include/std/format (_Sink_iter<_CharT>::_M_discarding)
> (_Sink<_CharT>::_M_discarding, _Iter_sink<_CharT, _OutIter>::_M_discarding)
> (_Padding_sink<_CharT, _Out>::_M_padwidth)
> (_Padding_sink<_CharT, _Out>::_M_maxwidth): Remove const.
> (_Padding_sink<_CharT, _Out>::_M_sync_discarding)
> (_Padding_sink<_CharT, _Out>::_M_discarding): Define.
> (_Padding_sink<_CharT, _Out>::_Padding_sink(_Out, size_t, size_t))
> (_Padding_sink<_CharT, _Out>::_M_force_update):
> (_Padding_sink<_CharT, _Out>::_M_flush): Call _M_sync_discarding.
> (_Padding_sink<_CharT, _Out>::_Padding_sink(_Out, size_t)): Delegate.
> ---
> I have replaced operator==(default_sentinel_t) with an _M_discarding member
> function, and replaced the standard reference with a textual one.
> For the comments on _Iter_sink, I have removed the second sentence:
> +   // format_to_n return total number of characters, that would be written,
> +   // see C++20 [format.functions] p20
> OK for trunk?

OK, thanks.
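
The reason _Iter_sink can never discard is the same as for C's snprintf, which also stores only a bounded prefix but returns the length of the complete output. The C sketch below is an analogy rather than libstdc++ code (formatted_length is a hypothetical name): the return value still counts characters that did not fit, so the formatter has to keep producing them.

```c
#include <stdio.h>

/* Like std::format_to_n, snprintf stores at most N - 1 characters plus a
   terminating NUL, yet returns the length of the full formatted output.
   A sink that must report that length cannot skip later writes: it
   still has to count them.  */
static int formatted_length (char *buf, size_t n, int value)
{
  return snprintf (buf, n, "value=%d", value);
}
```

For example, formatting 12345 into a 4-byte buffer stores "val" but returns 11, the length of "value=12345".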


>
>
>  libstdc++-v3/include/std/format | 64 ++---
>  1 file changed, 52 insertions(+), 12 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
> index 054ce350440..b3192cf2868 100644
> --- a/libstdc++-v3/include/std/format
> +++ b/libstdc++-v3/include/std/format
> @@ -3144,6 +3144,10 @@ namespace __format
>auto
>_M_reserve(size_t __n) const
>{ return _M_sink->_M_reserve(__n); }
> +
> +  bool
> +  _M_discarding() const
> +  { return _M_sink->_M_discarding(); }
>  };
>
>// Abstract base class for type-erased character sinks.
> @@ -3263,6 +3267,11 @@ namespace __format
>_M_bump(size_t __n)
>{ _M_next += __n; }
>
> +  // Returns true if the _Sink is discarding incoming characters.
> +  virtual bool
> +  _M_discarding() const
> +  { return false; }
> +
>  public:
>_Sink(const _Sink&) = delete;
>_Sink& operator=(const _Sink&) = delete;
> @@ -3488,6 +3497,14 @@ namespace __format
> _M_count += __s.size();
>}
>
> +  bool
> +  _M_discarding() const override
> +  {
> +   // format_to_n return total number of characters, that would be written,
> +   // see C++20 [format.functions] p20
> +   return false;
> +  }
> +
>  public:
>[[__gnu__::__always_inline__]]
>explicit
> @@ -3550,6 +3567,14 @@ namespace __format
>   }
>}
>
> +  bool
> +  _M_discarding() const override
> +  {
> +   // format_to_n return total number of characters, that would be written,
> +   // see C++20 [format.functions] p20
> +   return false;
> +  }
> +
>typename _Sink<_CharT>::_Reservation
>_M_reserve(size_t __n) final
>{
> @@ -3636,17 +3661,15 @@ namespace __format
>template
>  class _Padding_sink : public _Str_sink<_CharT>
>  {
> -  const size_t _M_padwidth;
> -  const size_t _M_maxwidth;
> +  size_t _M_padwidth;
> +  size_t _M_maxwidth;
>_Out _M_out;
>size_t _M_printwidth;
>
>[[__gnu__::__always_inline__]]
>bool
>_M_ignoring() const
> -  {
> -   return _M_printwidth >= _M_maxwidth;
> -  }
> +  { return _M_printwidth >= _M_maxwidth; }
>
>[[__gnu__::__always_inline__]]
>bool
> @@ -3659,12 +3682,21 @@ namespace __format
> return false;
>}
>
> +  void
> +  _M_sync_discarding()
> +  {
> +   if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
> + if (_M_out._M_discarding())
> +   _M_maxwidth = _M_printwidth;
> +  }
> +
>void
>

[PATCH] tree-optimization/119960 - failed external SLP promotion

2025-05-08 Thread Richard Biener

The following addresses an overly conservative sanity check of SLP nodes
we want to promote to external.  The issue lies in code generation
for such externals, which relies on get_later_stmt to figure out an
insert location.  But get_later_stmt relies on the ability to
totally order stmts, specifically implementation-wise that they
are all from the same BB, which is what is verified at the moment.

The patch changes this to require stmts to be orderable by
dominance queries.  For simplicity and seemingly enough for the
testcase in PR119960, this handles the case of two distinct BBs.
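
The ordering requirement can be sketched with a toy dominator tree. The C code below is a hypothetical model, not GCC code: blocks are plain integers, idom[b] gives the immediate dominator (-1 for the entry), and dominated_by stands in for dominated_by_p (CDI_DOMINATORS, ...). It deduplicates the block indices of the defs and then applies the same checks as the patch: either direction for exactly two blocks, and a dominance chain in index order for more.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for dominated_by_p: walk the immediate-dominator chain
   from A towards the entry block looking for B.  */
static bool dominated_by (const int *idom, int a, int b)
{
  for (int x = a; x != -1; x = idom[x])
    if (x == b)
      return true;
  return false;
}

static int cmp_int (const void *pa, const void *pb)
{
  int a = *(const int *) pa, b = *(const int *) pb;
  return (a > b) - (a < b);
}

/* Sort BBS (block indices of the defs) in place, drop duplicates and
   check that the remaining blocks can be ordered by dominance.  */
static bool defs_orderable (const int *idom, int *bbs, size_t n)
{
  if (n == 0)
    return true;
  qsort (bbs, n, sizeof *bbs, cmp_int);
  size_t j = 1;
  for (size_t i = 1; i < n; i++)
    if (bbs[i] != bbs[j - 1])
      bbs[j++] = bbs[i];
  if (j == 1)
    return true;                     /* all defs in one block */
  if (j == 2)
    return dominated_by (idom, bbs[0], bbs[1])
           || dominated_by (idom, bbs[1], bbs[0]);
  /* More than two blocks: use index order as the candidate order.  */
  for (size_t i = 1; i < j; i++)
    if (!dominated_by (idom, bbs[i], bbs[i - 1]))
      return false;
  return true;
}
```

In a diamond (block 0 branching to 1 and 2, joining at 3), defs in blocks 0 and 1 can be ordered, which is the shape of the new testcase (src[1] is loaded before the branch, src[0] inside it), while defs in the two arms 1 and 2 cannot.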

Bootstrapped on x86_64-unknown-linux-gnu.

I'm considering this for GCC 15.2 given it's a recent optimization
regression.  It requires some dependences to be backported though.

PR tree-optimization/119960
* tree-vect-slp.cc (vect_slp_can_convert_to_external):
Handle cases where defs from multiple BBs are ordered
by their dominance relation.

* gcc.dg/vect/bb-slp-pr119960-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr119960-1.c | 15 +
 gcc/tree-vect-slp.cc  | 63 ---
 2 files changed, 71 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr119960-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr119960-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr119960-1.c
new file mode 100644
index 000..955fc7e3220
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr119960-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+double foo (double *dst, double *src, int b)
+{
+  double y = src[1];
+  if (b)
+{
+  dst[0] = src[0];
+  dst[1] = y;
+}
+  return y;
+}
+
+/* { dg-final { scan-tree-dump "optimized: basic block part vectorized" "slp2" { target vect_double } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7791fe1f87f..f7c51b6cf68 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7846,21 +7846,70 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
node, node_instance, cost_vec);
 }
 
+static int
+sort_ints (const void *a_, const void *b_)
+{
+  int a = *(const int *)a_;
+  int b = *(const int *)b_;
+  return a - b;
+}
+
 /* Verify if we can externalize a set of internal defs.  */
 
 static bool
 vect_slp_can_convert_to_external (const vec &stmts)
 {
+  /* Constant generation uses get_later_stmt which can only handle
+ defs from the same BB or a set of defs that can be ordered
+ with a dominance query.  */
   basic_block bb = NULL;
+  bool all_same = true;
+  auto_vec bbs;
+  bbs.reserve_exact (stmts.length ());
   for (stmt_vec_info stmt : stmts)
-if (!stmt)
-  return false;
-/* Constant generation uses get_later_stmt which can only handle
-   defs from the same BB.  */
-else if (!bb)
-  bb = gimple_bb (stmt->stmt);
-else if (gimple_bb (stmt->stmt) != bb)
+{
+  if (!stmt)
+   return false;
+  else if (!bb)
+   bb = gimple_bb (stmt->stmt);
+  else if (gimple_bb (stmt->stmt) != bb)
+   all_same = false;
+  bbs.quick_push (gimple_bb (stmt->stmt)->index);
+}
+  if (all_same)
+return true;
+
+  /* Produce a vector of unique BB indexes for the defs.  */
+  bbs.qsort (sort_ints);
+  unsigned i, j;
+  for (i = 1, j = 1; i < bbs.length (); ++i)
+if (bbs[i] != bbs[j-1])
+  bbs[j++] = bbs[i];
+  gcc_assert (j >= 2);
+  bbs.truncate (j);
+
+  if (bbs.length () == 2)
+return (dominated_by_p (CDI_DOMINATORS,
+   BASIC_BLOCK_FOR_FN (cfun, bbs[0]),
+   BASIC_BLOCK_FOR_FN (cfun, bbs[1]))
+   || dominated_by_p (CDI_DOMINATORS,
+  BASIC_BLOCK_FOR_FN (cfun, bbs[1]),
+  BASIC_BLOCK_FOR_FN (cfun, bbs[0])));
+
+  /* ???  For more than two BBs we can sort the vector and verify the
+ result is a total order.  But we can't use vec::qsort with a
+ compare function using a dominance query since there's no way to
+ signal failure and any fallback for an unordered pair would
+ fail qsort_chk later.
+ For now simply hope that ordering after BB index provides the
+ best candidate total order.  If required we can implement our
+ own mergesort or export an entry without checking.  */
+  for (unsigned i = 1; i < bbs.length (); ++i)
+if (!dominated_by_p (CDI_DOMINATORS,
+BASIC_BLOCK_FOR_FN (cfun, bbs[i]),
+BASIC_BLOCK_FOR_FN (cfun, bbs[i-1])))
   return false;
+
   return true;
 }
 
-- 
2.43.0

[14.x PATCH] c: Allow bool and enum null pointer constants [PR112556]

2025-05-08 Thread Sam James

From: Joseph Myers 

As reported in bug 112556, GCC wrongly rejects conversion of null
pointer constants with bool or enum type to pointers in
convert_for_assignment (assignment, initialization, argument passing,
return).  Fix the code there to allow BOOLEAN_TYPE and ENUMERAL_TYPE;
it already allowed INTEGER_TYPE and BITINT_TYPE.

This bug (together with -std=gnu23 meaning false has type bool rather
than int) has in turn resulted in people thinking they need to fix
code using false as a null pointer constant for C23 compatibility.
While such a usage is certainly questionable, it has nothing to do
with C23 compatibility and the right place for warnings about such
usage is -Wzero-as-null-pointer-constant.  I think it would be
appropriate to extend -Wzero-as-null-pointer-constant to cover
BOOLEAN_TYPE, ENUMERAL_TYPE and BITINT_TYPE (in all the various
contexts in which that option generates warnings), though this patch
doesn't do anything about that option.
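
The C23 change mentioned above can be observed directly. This C sketch (false_has_bool_type is a hypothetical name) uses _Generic to report whether false has type bool. The result depends on the -std= mode, which is why an innocent-looking void *p = false; hits the BOOLEAN_TYPE case in convert_for_assignment under C23 but not under C17, where <stdbool.h> defines false as the int constant 0.

```c
#include <stdbool.h>

/* Returns 1 if the constant false has type bool (_Bool), 0 otherwise.
   Before C23, <stdbool.h> defines false as the int constant 0; in C23
   false is a keyword of type bool.  */
static int false_has_bool_type (void)
{
  return _Generic (false, _Bool: 1, default: 0);
}
```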

Bootstrapped with no regressions for x86-64-pc-linux-gnu.

PR c/112556

gcc/c/
* c-typeck.cc (convert_for_assignment): Allow conversion of
ENUMERAL_TYPE and BOOLEAN_TYPE null pointer constants to pointers.

gcc/testsuite/
* gcc.dg/c11-null-pointer-constant-1.c,
gcc.dg/c23-null-pointer-constant-1.c: New tests.

(cherry picked from commit 3d525fce70fa0ffa0b22af6e213643e1ceca5ab5)
---
As discussed on the PR, I feel like this is worth having for 14, as we're
asking upstreams to try to reproduce issues with -std=gnu23 (or -std=c23) if
they don't have access to GCC 15, and this bug may confuse them.

Regtested on x86_64-pc-linux-gnu with no regressions.

OK?

 gcc/c/c-typeck.cc |   2 +
 .../gcc.dg/c11-null-pointer-constant-1.c  |  55 
 .../gcc.dg/c23-null-pointer-constant-1.c  | 120 ++
 3 files changed, 177 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/c11-null-pointer-constant-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-null-pointer-constant-1.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 44d705befdcc..e1c35cdcda12 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -8151,6 +8151,8 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 }
   else if (codel == POINTER_TYPE
   && (coder == INTEGER_TYPE
+  || coder == ENUMERAL_TYPE
+  || coder == BOOLEAN_TYPE
   || coder == NULLPTR_TYPE
   || coder == BITINT_TYPE))
 {
diff --git a/gcc/testsuite/gcc.dg/c11-null-pointer-constant-1.c b/gcc/testsuite/gcc.dg/c11-null-pointer-constant-1.c
new file mode 100644
index ..f463a1a59da3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-null-pointer-constant-1.c
@@ -0,0 +1,55 @@
+/* Test zero with different types as null pointer constant: bug 112556.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -pedantic-errors -Wno-pointer-compare" } */
+
+enum e { ZERO };
+
+void *p1 = 0;
+void *p2 = 0LL;
+void *p3 = (char) 0;
+void *p4 = 0UL;
+void *p5 = (_Bool) 0;
+void *p6 = (enum e) ZERO;
+
+void f (void *);
+
+void *
+g (void)
+{
+  p1 = 0;
+  p2 = 0LL;
+  p3 = (char) 0;
+  p4 = 0UL;
+  p5 = (_Bool) 0;
+  p6 = (enum e) ZERO;
+  f (0);
+  f (0ULL);
+  f (0L);
+  f ((char) 0);
+  f ((_Bool) 0);
+  f ((enum e) ZERO);
+  (1 ? p1 : 0);
+  (1 ? p1 : 0L);
+  (1 ? p1 : 0ULL);
+  (1 ? p1 : (char) 0);
+  (1 ? p1 : (_Bool) 0);
+  (1 ? p1 : (enum e) 0);
+  p1 == 0;
+  p1 == 0LL;
+  p1 == 0U;
+  p1 == (char) 0;
+  p1 == (_Bool) 0;
+  p1 == (enum e) 0;
+  p1 != 0;
+  p1 != 0LL;
+  p1 != 0U;
+  p1 != (char) 0;
+  p1 != (_Bool) 0;
+  p1 != (enum e) 0;
+  return 0;
+  return 0UL;
+  return 0LL;
+  return (char) 0;
+  return (_Bool) 0;
+  return (enum e) 0;
+}
diff --git a/gcc/testsuite/gcc.dg/c23-null-pointer-constant-1.c b/gcc/testsuite/gcc.dg/c23-null-pointer-constant-1.c
new file mode 100644
index ..71b66cc35d6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-null-pointer-constant-1.c
@@ -0,0 +1,120 @@
+/* Test zero with different types as null pointer constant: bug 112556.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c23 -pedantic-errors -Wno-pointer-compare" } */
+
+enum e { ZERO };
+enum e2 : bool { BZERO };
+enum e3 : long { LZERO };
+
+void *p1 = 0;
+void *p2 = 0LL;
+void *p3 = (char) 0;
+void *p4 = 0UL;
+void *p5 = (bool) 0;
+void *p6 = (enum e) ZERO;
+void *p7 = false;
+void *p8 = BZERO;
+void *p9 = (enum e2) 0;
+void *p10 = LZERO;
+void *p11 = (enum e3) 0;
+#ifdef __BITINT_MAXWIDTH__
+void *p12 = 0wb;
+void *p13 = 0uwb;
+#endif
+
+void f (void *);
+
+void *
+g (void)
+{
+  p1 = 0;
+  p2 = 0LL;
+  p3 = (char) 0;
+  p4 = 0UL;
+  p5 = (bool) 0;
+  p6 = (enum e) ZERO;
+  p7 = false;
+  p8 = BZERO;
+  p9 = (enum e2) 0;
+  p10 = LZERO;
+  p11 = (enum e3) 0;
+#ifdef __BITINT_MAXWIDTH__
+  p12 = 0wb;
+  p13 = 0uwb;
+#endif
+  f (0);
+  f (0ULL);
+  f (0L);
+  f ((char) 0);
+  f ((bool) 0);
+  f ((enum e) ZERO);
+  f (false);
+  f (BZERO

Re: [PATCH] libstdc++: Update rows in C++17 status table

2025-05-08 Thread Jakub Jelinek

On Thu, May 08, 2025 at 02:50:27PM +0100, Jonathan Wakely wrote:
> Document that std::to_chars and std::from_chars are complete, mentioning
> the libraries used for floating-point types.
> 
> libstdc++-v3/ChangeLog:
> 
>   * doc/xml/manual/status_cxx2017.xml: Update status for
>   std::to_chars and std::from_chars.
>   * doc/html/manual/*: Regenerate.
> ---
> 
> Patrick, please check that what I've added is accurate (see the XML
> change at the end of the diff).

> + __strfrom128).

s/__strfrom128/strfromf128/

The f before 128 is missing, and the __ prefix is just the name of the
weak alias for it; it uses __asm ("strfromf128").

> + __strfrom128).

Ditto.

Jakub

Re: [PATCH] testsuite: g++.dg/cpp2a/decomp2.C requires tls_runtime

2025-05-08 Thread Jakub Jelinek

On Thu, May 08, 2025 at 03:07:29PM +0200, Christophe Lyon wrote:
> Ping?
> 
> On Thu, 17 Apr 2025 at 11:21, Christophe Lyon wrote:
> 
> > Since this test is a 'dg-do run', it requires tls_runtime rather than
> > just tls.
> >
> > This makes the test UNSUPPORTED on targets such as arm-none-eabi,
> > instead of FAIL/UNRESOLVED because __aeabi_read_tp is not provided
> > (e.g. when GCC is configured with --enable-threads=no).
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/cpp2a/decomp2.C: Require tls_runtime.

LGTM.
> > --- a/gcc/testsuite/g++.dg/cpp2a/decomp2.C
> > +++ b/gcc/testsuite/g++.dg/cpp2a/decomp2.C
> > @@ -1,7 +1,7 @@
> >  // P1091R3
> >  // { dg-do run { target c++11 } }
> >  // { dg-options "" }
> > -// { dg-require-effective-target tls }
> > +// { dg-require-effective-target tls_runtime }
> >  // { dg-add-options tls }
> >
> >  namespace std {

Jakub

Re: [PATCH v2] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-05-08 Thread Richard Sandiford

Konstantinos Eleftheriou  writes:
> During the base register initialization, when we are eliminating the load
> instruction, we were calling `emit_move_insn` on registers of the same
> size but of different mode in some cases, causing an ICE.
>
> This patch uses `lowpart_subreg` for the base register initialization,
> instead of zero-extending it. We had tried this solution before, but
> we were leaving undefined bytes in the upper part of the register.
> This shouldn't be happening as we are supposed to write the whole
> register when the load is eliminated. This was occurring when having
> multiple stores with the same offset as the load, generating a
> register move for all of them, overwriting the bit inserts that
> were inserted before them. With this patch we are generating a register
> move only for the first store of this kind, using a bit insert for the
> rest of them.

That feels wrong though.  If there are multiple stores to the same offset
then it becomes a question of which bytes of which stores survive until
the load.  E.g. for a QI store followed by an HI store followed by an SI
store, the final SI store wins and the previous ones should be ignored.
If it's QI, SI, HI, then for little endian, the low two bytes come from
the HI and the next two bytes come from the SI.  The QI store should
again be ignored.

So I would expect this to depend on which store is widest, with ties
broken by picking later stores (i.e. those earlier in the list).
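
The byte-survival argument can be checked with a small model. This C sketch is hypothetical and independent of the pass: it replays stores at offset 0 into a 4-byte buffer in program order and returns what an SI load would see. Little-endian byte order is modelled explicitly, so the result does not depend on the host.

```c
#include <stdint.h>

/* Hypothetical store at offset 0: SIZE bytes of VAL, least significant
   byte first (little-endian).  */
struct store { unsigned size; uint32_t val; };

/* Replay the stores in program order (later ones overwrite earlier ones)
   and return the 4-byte little-endian value a load would observe.  */
static uint32_t load_after_stores (const struct store *s, unsigned n)
{
  uint8_t mem[4] = { 0 };
  for (unsigned i = 0; i < n; i++)
    for (unsigned b = 0; b < s[i].size; b++)
      mem[b] = (uint8_t) (s[i].val >> (8 * b));
  return (uint32_t) mem[0] | (uint32_t) mem[1] << 8
         | (uint32_t) mem[2] << 16 | (uint32_t) mem[3] << 24;
}
```

With QI, HI, SI the load sees exactly the SI value; with QI, SI, HI the low two bytes come from the HI store and the upper two from the SI store, and the QI value never survives.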

I'm also not sure why this is only a problem with using lowparts.
Wouldn't the same issue apply when using zero_extend?  The bytes are
fully-defined for zero_extend, but not necessarily to the right values.

Did you consider making the main:

  /* Check if we can emit bit insert instructions for all forwarded stores.  */
  FOR_EACH_VEC_ELT (stores, i, it)
{

loop also maintain a bitmap of bytes that still need to be forwarded, so
that we can skip stores that contribute nothing?  Later RTL optimisations
might not be able to work that out in all cases, and in any case it would
be good to avoid redundant operations given that we have the information
to hand.
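
The suggested bitmap could look roughly like the following C sketch. It is an illustration only: struct fwd_store, byte_mask and needed_stores are hypothetical names, offsets and sizes are in bytes relative to the load, and everything is assumed to fit within 16 bytes. It walks the stores from latest to earliest, tracks which bytes of the load are still unaccounted for, and marks a store as needed only if it supplies at least one of them.

```c
#include <stdint.h>

/* Hypothetical description of a store relative to the load, in bytes.  */
struct fwd_store { unsigned offset, size; };

/* One bit per byte in [OFFSET, OFFSET + SIZE).  Assumes
   OFFSET + SIZE <= 16, so the shifts stay well within 32 bits.  */
static uint32_t byte_mask (unsigned offset, unsigned size)
{
  return ((UINT32_C (1) << size) - 1) << offset;
}

/* Returns a mask with bit I set if store I (in program order) still
   supplies at least one byte of a LOAD_SIZE-byte load at offset 0.
   Assumes at most 32 stores.  */
static uint32_t needed_stores (const struct fwd_store *s, unsigned n,
                               unsigned load_size)
{
  uint32_t remaining = byte_mask (0, load_size);
  uint32_t needed = 0;
  for (unsigned i = n; i-- > 0;)    /* latest store first */
    {
      uint32_t bytes = byte_mask (s[i].offset, s[i].size) & remaining;
      if (bytes)
        {
          needed |= UINT32_C (1) << i;
          remaining &= ~bytes;
        }
      if (!remaining)
        break;                       /* every byte is accounted for */
    }
  return needed;
}
```

For QI, HI, SI stores at offset 0 only the SI store is needed; for QI, SI, HI the HI and SI stores are needed and the QI store contributes nothing.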

Thanks,
Richard

>
> Bootstrapped/regtested on AArch64 and x86_64.
>
> PR rtl-optimization/119884
>
> gcc/ChangeLog:
>
> * avoid-store-forwarding.cc (process_store_forwarding):
>   Use `lowpart_subreg` for the base register initialization,
>   only for the first store that has the same offset as the
>   load.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr119884.c: New test.
> ---
>  gcc/avoid-store-forwarding.cc| 28 ++--
>  gcc/testsuite/gcc.target/i386/pr119884.c | 13 +++
>  2 files changed, 34 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr119884.c
>
> diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> index ded8d7e596e0..90e6563e7b26 100644
> --- a/gcc/avoid-store-forwarding.cc
> +++ b/gcc/avoid-store-forwarding.cc
> @@ -225,24 +225,35 @@ process_store_forwarding (vec &stores, rtx_insn *load_insn,
>  
>int move_to_front = -1;
>int total_cost = 0;
> +  unsigned int curr_zero_offset_count = 0;
> +
> +  /* Count the stores with zero offset.  */
> +  unsigned int zero_offset_store_num = 0;
> +  FOR_EACH_VEC_ELT (stores, i, it)
> +{
> +  if (it->offset == 0)
> + zero_offset_store_num++;
> +}
>  
>/* Check if we can emit bit insert instructions for all forwarded stores.  */
>FOR_EACH_VEC_ELT (stores, i, it)
>  {
>it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
>rtx_insn *insns = NULL;
> +  const bool has_zero_offset = it->offset == 0;
> +  const bool is_first_store_in_chain
> + = curr_zero_offset_count == (zero_offset_store_num - 1);
>  
>/* If we're eliminating the load then find the store with zero offset
> -  and use it as the base register to avoid a bit insert if possible.  */
> -  if (load_elim && it->offset == 0)
> +  and use it as the base register to avoid a bit insert if possible.
> +  If there are multiple stores with zero offset, do it for the first
> +  one only (the last in the reversed vector).  */
> +  if (load_elim && has_zero_offset && is_first_store_in_chain)
>   {
> start_sequence ();
>  
> -   machine_mode dest_mode = GET_MODE (dest);
> -   rtx base_reg = it->mov_reg;
> -   if (known_gt (GET_MODE_BITSIZE (dest_mode),
> - GET_MODE_BITSIZE (GET_MODE (it->mov_reg
> - base_reg = gen_rtx_ZERO_EXTEND (dest_mode, it->mov_reg);
> +   rtx base_reg = lowpart_subreg (GET_MODE (dest), it->mov_reg,
> +  GET_MODE (it->mov_reg));
>  
> if (base_reg)
>   {
> @@ -257,6 +268,9 @@ process_store_forwarding (vec &stores, rtx_insn *load_insn,
> end_sequence ();
>   }
>  
> +  if (has_zero_offset)
> + curr_zero_offset_count+

Re: [PATCH] testsuite: g++.dg/cpp2a/constinit16.C requires tls

2025-05-08 Thread Christophe Lyon

Ping?

On Thu, 17 Apr 2025 at 11:21, Christophe Lyon wrote:

> This test is 'dg-do compile', so require tls instead of tls_runtime.
>
> This enables it on targets such as arm-none-eabi configured with
> --enable-threads=no.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp2a/constinit16.C: Require tls.
> ---
>  gcc/testsuite/g++.dg/cpp2a/constinit16.C | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.dg/cpp2a/constinit16.C
> b/gcc/testsuite/g++.dg/cpp2a/constinit16.C
> index dda81d50619..046e9aa7c6a 100644
> --- a/gcc/testsuite/g++.dg/cpp2a/constinit16.C
> +++ b/gcc/testsuite/g++.dg/cpp2a/constinit16.C
> @@ -2,7 +2,7 @@
>  // { dg-do compile { target c++20 } }
>  // { dg-add-options tls }
>  // { dg-require-alias "" }
> -// { dg-require-effective-target tls_runtime }
> +// { dg-require-effective-target tls }
>  // { dg-final { scan-assembler-not "_ZTH17mythreadlocalvar1" } }
>  // { dg-final { scan-assembler "_ZTH17mythreadlocalvar2" } }
>  // { dg-final { scan-assembler-not "_ZTH17mythreadlocalvar3" } }
> --
> 2.34.1
>
>

Re: [RFC PATCH 0/2] Add target_clones profile option support

2025-05-08 Thread Yangyu Chen




> On 8 May 2025, at 18:36, Richard Sandiford  wrote:
> 
> Yangyu Chen  writes:
>>> On 6 May 2025, at 17:49, Alfie Richards  wrote:
>>> 
>>> On 06/05/2025 09:36, Yangyu Chen wrote:
> On 6 May 2025, at 16:01, Alfie Richards  wrote:
> 
> Additionally, I think ideally the file can express functions 
> disambiguated by file, signature, and namespace.
> I imagine we could use similar syntax to gdb supports?
> 
> For example:
> 
> ```
> foo  |arch=+v
> bar(int, char)   |arch=+zba,+zbb
> file.C:baz(char) |arch=+zba,+zbb#arch=+v
> namespace::qux   |arch=+v
> ```
 Also a great idea. However, I think it's not easy to implement this in
 GCC right now. But I would welcome any further feedback if we have
 such a simple API in GCC to do so, or if it will be implemented by the
 community.
 Part of the motivation behind this idea is that I'm researching
 auto-generating target clones attributes for developers. Accepting only
 the ASM name is enough to implement this.
>>> 
>>> Ah that makes sense, apologies I missed that.
>>> 
>>> I think accepting the assembler name is good, and solves the overloading 
>>> ambiguity issue.
>>> 
>>> Maybe we can use the pipe '|' instead of ':' in the file format to leave 
>>> room for both in future?
>> 
>> 
>> I will consider using the pipe '|' in the next revision. Thanks for
>> the advice.
> 
> How about instead using a json file?  There's already a parser built into
> the compiler.
> 
> That has the advantage of being an established format that generators
> can use.  It would also allow other ways of specifying the functions
> to be added in future.
> 

Thanks for this useful information. I also want to extend the current
function multi-versioning to have the ability to set -mtune for
different micro-architectures in the future, and using JSON would make
the format more extensible for that.

Thanks,
Yangyu Chen

Re: [PATCH] testsuite: g++.dg/cpp2a/constinit16.C requires tls

2025-05-08 Thread Jakub Jelinek

On Thu, May 08, 2025 at 03:07:50PM +0200, Christophe Lyon wrote:
> Ping?
> 
> On Thu, 17 Apr 2025 at 11:21, Christophe Lyon wrote:
> 
> > This test is 'dg-do compile', so require tls instead of tls_runtime.
> >
> > This enables it on targets such as arm-none-eabi configured with
> > --enable-threads=no.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/cpp2a/constinit16.C: Require tls.

Ok.

> > --- a/gcc/testsuite/g++.dg/cpp2a/constinit16.C
> > +++ b/gcc/testsuite/g++.dg/cpp2a/constinit16.C
> > @@ -2,7 +2,7 @@
> >  // { dg-do compile { target c++20 } }
> >  // { dg-add-options tls }
> >  // { dg-require-alias "" }
> > -// { dg-require-effective-target tls_runtime }
> > +// { dg-require-effective-target tls }
> >  // { dg-final { scan-assembler-not "_ZTH17mythreadlocalvar1" } }
> >  // { dg-final { scan-assembler "_ZTH17mythreadlocalvar2" } }
> >  // { dg-final { scan-assembler-not "_ZTH17mythreadlocalvar3" } }

Jakub

[PATCH] libstdc++: Update C++23 status table

2025-05-08 Thread Jonathan Wakely

This should have been updated for the GCC 15.1 release.

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2023.xml: Update status of proposals
implemented after GCC 14.2 release.
* doc/html/manual/status.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/status.html  | 40 +
 .../doc/xml/manual/status_cxx2023.xml | 45 ++-
 2 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/status.html b/libstdc++-v3/doc/html/manual/status.html
index 7308d47824f0..71e05a0269a9 100644
--- a/libstdc++-v3/doc/html/manual/status.html
+++ b/libstdc++-v3/doc/html/manual/status.html
@@ -1847,13 +1847,15 @@ or any notes about the implementation.
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2278r4.html"; 
target="_top">
 P2278R4
 
-   13.1  __cpp_lib_ranges_as_const >= 202207L  ranges::to 

+   13.1  __cpp_lib_ranges_as_const >= 202207L  ranges::to 
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1206r7.pdf"; 
target="_top">
 P1206R7
 
-   14.1 (ranges::to 
function) 
-   __cpp_lib_containers_ranges >= 202202L,
-   __cpp_lib_ranges_to_container >= 202202L
+  
+14.1 (ranges::to function)  15.1 (new members 
in containers) 
+  
+   __cpp_lib_ranges_to_container >= 202202L,
+   __cpp_lib_containers_ranges >= 202202L
Ranges iterators as 
inputs to non-Ranges algorithms 
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2408r5.html"; 
target="_top">
 P2408R5
@@ -1893,11 +1895,11 @@ or any notes about the implementation.
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2322r6.html"; 
target="_top">
 P2322R6
 
-   13.1  __cpp_lib_ranges_fold >= 202207L  Relaxing Ranges Just A Smidge
+   13.1  __cpp_lib_ranges_fold >= 202207L  Relaxing Ranges Just A Smidge
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2609r3.html"; 
target="_top">
 P2609R3
 
-__cpp_lib_ranges >= 202302L 
+   14.3  __cpp_lib_ranges >= 202302L 
 Compile-time programming
A proposal for a type trait to detect 
scoped enumerations 
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1048r1.pdf"; 
target="_top">
@@ -1927,11 +1929,11 @@ or any notes about the implementation.
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p0533r9.pdf"; 
target="_top">
 P0533R9
 
-__cpp_lib_constexpr_cmath >= 202202L  Deprecate std::aligned_storage and 
std::aligned_union 
+__cpp_lib_constexpr_cmath >= 202202L  Deprecate std::aligned_storage and std::aligned_union 
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1413r3.pdf"; 
target="_top">
 P1413R3
 
-     A type trait to detect reference binding to temporary 
+   13.1   A type trait to detect reference binding to temporary 
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2255r2.html"; 
target="_top">
 P2255R2
 
@@ -1973,15 +1975,15 @@ or any notes about the implementation.
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2077r3.html"; 
target="_top">
 P2077R3
 
-__cpp_lib_associative_heterogeneous_erasure >= 202110L 
  
+__cpp_lib_associative_heterogeneous_erasure >= 202110L 
  

 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p0429r9.pdf"; 
target="_top">
 P0429R9
 
-__cpp_lib_flat_map >= 202207L   

+   15.1  __cpp_lib_flat_map >= 202207L   
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1222r4.pdf"; 
target="_top">
 P1222R4
 
-__cpp_lib_flat_set >= 202207L  mdspan 
+   15.1  __cpp_lib_flat_set >= 202207L  mdspan 
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p0009r18.html"; 
target="_top">
 P0009R18
 
@@ -2048,27 +2050,29 @@ or any notes about the implementation.
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2508r1.html"; 
target="_top">
 P2508R1
 
-   13.1 (feature test macro not defined) __cpp_lib_format >= 202207L 

+   13.1 (feature test macro not updated until 
15.1)  __cpp_lib_format >= 
202207L 
Clarify handling of encodings in localized formatting of chrono types
   
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2419r2.html"; 
target="_top">
 P2419R2
 
-__cpp_lib_format >= 202207L 
+   15.1  __cpp_lib_format >= 202207L 
Formatting pointers
   
 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2510r3.pdf"; 
target="_top">
 P2510R3
 
-   13.2 (feature test macro not defined) __cpp_lib_format >= 202207L 
 Formatting Ranges 
+   13.2 (feature test macro not upd
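
User code can check the feature-test macros listed in the table. The following is a minimal sketch that assumes a compiler shipping `<version>` (GCC 9 or later). `evens` and `HAVE_RANGES_TO` are hypothetical names. The code uses `std::ranges::to` (P1206R7) only when `__cpp_lib_ranges_to_container` advertises it, and otherwise falls back to a plain loop.

```cpp
#include <cassert>
#include <vector>
#include <version>  // defines the __cpp_lib_* feature-test macros

#if defined(__cpp_lib_ranges_to_container) \
    && __cpp_lib_ranges_to_container >= 202202L
#  include <ranges>
#  define HAVE_RANGES_TO 1
#else
#  define HAVE_RANGES_TO 0
#endif

// Copy the even numbers from `in` into a new vector.
inline std::vector<int> evens(const std::vector<int>& in)
{
#if HAVE_RANGES_TO
  // C++23 path (P1206R7): build the container directly from a view.
  return in | std::views::filter([](int x) { return x % 2 == 0; })
            | std::ranges::to<std::vector>();
#else
  // Fallback when the library does not advertise ranges::to.
  std::vector<int> out;
  for (int x : in)
    if (x % 2 == 0)
      out.push_back(x);
  return out;
#endif
}
```

Both branches return the same result. Only the macro decides which one is compiled.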

[PATCH] Fix tree-ssa/pr31261.c testcase after r16-400 [PR120168]

2025-05-08 Thread Andrew Pinski

After r16-400-g5e363ffefaceb9, on targets where char is unsigned by
default, the tree-ssa/pr31261.c testcase started to fail:
FAIL: gcc.dg/tree-ssa/pr31261.c scan-tree-dump-times original "return (char) -(unsigned char) c & 31;" 1

This is because the casts are no longer needed, as char and
unsigned char have the same signedness on those targets.
I was deciding between adding -fsigned-char and changing the testcase
to use `signed char` explicitly. I went with an explicit
`signed char`, as that would normally be the case.

OK?

PR testsuite/120168

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr31261.c: Use `signed char` instead
of plain char.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/pr31261.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr31261.c b/gcc/testsuite/gcc.dg/tree-ssa/pr31261.c
index 127300fdd24..dafb4c46c9c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr31261.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr31261.c
@@ -14,8 +14,8 @@ f2 (long int b)
   return (16 + (b & 7)) & 15;
 }
 
-char
-f3 (char c)
+signed char
+f3 (signed char c)
 {
   return -(c & 63) & 31;
 }
@@ -34,7 +34,7 @@ f5 (int e)
 
 /* { dg-final { scan-tree-dump-times "return -a \& 7;" 1 "original" } } */
 /* { dg-final { scan-tree-dump-times "return b \& 7;" 1 "original" } } */
-/* { dg-final { scan-tree-dump-times "return \\(char\\) -\\(unsigned char\\) c \& 31;" 1 "original" } } */
+/* { dg-final { scan-tree-dump-times "return \\(signed char\\) -\\(unsigned char\\) c \& 31;" 1 "original" } } */
 /* { dg-final { scan-tree-dump-times "return \\(int\\) \\(12 - \\(unsigned int\\) d\\) \& 7;" 1 "original" { target { ! int16 } } } } */
 /* { dg-final { scan-tree-dump-times "return \\(int\\) \\(12 - \\(unsigned short\\) d\\) \& 7;" 1 "original" { target { int16 } } } } */
 /* { dg-final { scan-tree-dump-times "return 12 - \\(e \& 7\\) \& 15;" 1 "original" } } */
-- 
2.43.0
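
A brute-force check can confirm that the folded expression in the updated scan pattern computes the same value as the source of `f3` for every `signed char`. This sketch models the tree expression by hand in C++, with the negation done in `unsigned char` so that it wraps modulo 256. It does not invoke GCC's folder. `f3_source`, `f3_folded` and `forms_agree` are hypothetical names.

```cpp
#include <cassert>
#include <climits>

// Source form from the testcase: return -(c & 63) & 31;
inline signed char f3_source(signed char c)
{
  return static_cast<signed char>(-(c & 63) & 31);
}

// Folded form matched by the new pattern:
//   return (signed char) -(unsigned char) c & 31;
// The negation is evaluated in unsigned char, i.e. modulo 256.
inline signed char f3_folded(signed char c)
{
  unsigned char neg
    = static_cast<unsigned char>(-static_cast<unsigned char>(c));
  return static_cast<signed char>(static_cast<signed char>(neg) & 31);
}

// Compare both forms over the whole signed char range.
inline bool forms_agree()
{
  for (int v = SCHAR_MIN; v <= SCHAR_MAX; ++v)
    {
      signed char c = static_cast<signed char>(v);
      if (f3_source(c) != f3_folded(c))
        return false;
    }
  return true;
}
```

Both forms keep only the low five bits of `-c`, which is why the fold is valid whatever the signedness of the operand type.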

Re: [PATCH] libstdc++: Update rows in C++17 status table

2025-05-08 Thread Björn Schäpers


On 08.05.2025 at 15:50, Jonathan Wakely wrote:

Document that std::to_chars and std::from_chars are complete, mentioning
the libraries used for floating-point types.
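
For reference, here is a minimal sketch of the two interfaces whose status is being updated. It assumes GCC 11 or later for floating-point support. `to_shortest` and `parse_double` are hypothetical helpers. Calling `std::to_chars` without a format or precision produces the shortest text that parses back to the same value, and `std::from_chars` reads it back. Which backend does the work (Ryu, fast_float or the C library) is not visible through this API.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Shortest round-trip representation of `value`, or "" on error.
inline std::string to_shortest(double value)
{
  char buf[64];
  std::to_chars_result res = std::to_chars(buf, buf + sizeof buf, value);
  if (res.ec != std::errc{})
    return {};
  return std::string(buf, res.ptr);
}

// Parse `text` as a double; succeeds only if the whole string is consumed.
inline bool parse_double(const std::string& text, double& value)
{
  const char* first = text.data();
  const char* last = first + text.size();
  std::from_chars_result res = std::from_chars(first, last, value);
  return res.ec == std::errc{} && res.ptr == last;
}
```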

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2017.xml: Update status for
std::to_chars and std::from_chars.
* doc/html/manual/*: Regenerate.
---

Patrick, please check that what I've added is accurate (see the XML
change at the end of the diff).

  .../doc/html/manual/source_code_style.html|  2 +-
  libstdc++-v3/doc/html/manual/status.html  | 17 -
  .../doc/xml/manual/status_cxx2017.xml | 25 ---
  3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/source_code_style.html b/libstdc++-v3/doc/html/manual/source_code_style.html
index b0b22683f67c..a66e3a079471 100644
--- a/libstdc++-v3/doc/html/manual/source_code_style.html
+++ b/libstdc++-v3/doc/html/manual/source_code_style.html
@@ -474,7 +474,7 @@
  
    Examples:  _M_num_elements  _M_initialize ()
  
-  Static data members, constants, and enumerations: _S_.*
+  Static data and function members, constants, and enumerations: _S_.*
  
    Examples: _S_max_elements  _S_default_value
  
diff --git a/libstdc++-v3/doc/html/manual/status.html b/libstdc++-v3/doc/html/manual/status.html
index 3d55e2652729..924b2e3d861e 100644
--- a/libstdc++-v3/doc/html/manual/status.html
+++ b/libstdc++-v3/doc/html/manual/status.html
@@ -927,7 +927,22 @@ since C++14 and the implementation is complete.
23

General utilities
-  23.1General  23.2Utility components  23.2.1Header  synopsis  23.2.2OperatorsY 23.2.3swapY 23.2.4exchangeY 23.2.5Forward/move helpersY 23.2.6Function template as_constY 23.2.7Function template declvalY 23.2.8Primitive numeric output conversionPartial 23.2.9Primitive numeric input conversionPartial 23.3Compile-time integer sequences  23.4PairsY 23.5TuplesY 23.6Optional objectsY 23.7VariantsY 23.8Storage for any typeY 23.9BitsetsY 23.10MemoryY 23.10.1In general  23.10.2Header  synopsisY 23.10.3Pointer traitsY 23.10.4Pointer safetyY 23.10.5AlignY 23.10.6Allocator argument tagY 23.10.7uses_allocatorY 23.10.8Allocator traitsY 23.10.9The default allocatorY 23.10.10Specialized algorithmsY 23.10.11C library memory allocationY 23.11Smart pointers  23.11.1Class template unique_ptrY 23.11.2Shared-ownership pointersY 23.12Memory resources  23.12.1Header  synopsisY 23.12.2Class memory_resourceY 23.12.3Class template polymorphic_allocatorY 23.12.4Access to program-wide memory_resource objectsY 23.12.5Pool resource 
classesY 23.12.6Class monotonic_buffer_resourceY 23.13Class template scoped_allocator_adaptorY 23.14Function objects  23.14.1Header  synopsis  23.14.2Definitions  23.14.3Requirements  23.14.4Function template invokeY 23.14.5Class template reference_wrapperY 23.14.6Arithmetic operationY 23.14.7ComparisonsY 23.14.8Logical operationsY 23.14.9Bitwise operationsY 23.14.10Function template not_fnY 23.14.11Function object bindersY 23.14.12Function template mem_fnY 23.14.13Polymorphic function wrappersY 23.14.14SearchersY 23.14.15Class template hashY 23.15Metaprogramming and type traits  23.15.1Requirements  23.15.2Header  synopsisY 23.15.3Helper classesY 23.15.4Unary Type TraitsY 23.15.5Type property queriesY 23.15.6Relationships between typesY 23.15.7Transformations between typesY 23.15.8Logical operator traitsY 23.16Compile-time rational arithmeticY 23.17.1In general  23.17.2Header  synopsis  23.17Time utilities  23.17.3Clock requirementsY 23.17.4Time-related traitsY 23.17.5Class template durationY 23.17.6Class template time_pointY 23.17.7ClocksY 23.17.8Header  synopsisY 23.18Class type_indexY 23.19Execution policies  23.19.1In general  23.19.2Header  synopsis  23.19.3Execution policy type traitY 23.19.4Sequenced execution policyY 23.19.5Parallel execution policyY 23.19.6Parallel and unsequenced execution policyY 23.19.7Execution policy objectsY 
+  23.1General  23.2Utility components  23.2.1Header  synopsis  23.2.2OperatorsY 23.2.3swapY 23.2.4exchangeY 23.2.5Forward/move helpersY 23.2.6Function template as_constY 23.2.7Function template declvalY 23.2.8Primitive 
numeric output conversionY
+   Floating-point types up to 64-bit are formatted using
+   https://github.com/ulfjack/ryu"; 
target="_top">Ryu.
+   Types with greater precision are formatted using the C library
+   (sprintf and conditionally
+   __strfrom128).
+   For powerpc64le-unknown-linux-gnu __sprintfieee128
+   must be provided by Glibc.
+  23.2.9Primitive numeric input conversionY
+   Floating-point types up to 64-bit are parsed using
+   https://github.com/fast_float/fast_float"; 
target="_top">fast_float.
+   Types with greater precision are parsed using the C library
+   (strtold).
+   For powerpc64le-unknown-linux-gnu __strtoiee

Re: [PATCH] libstdc++: Use _Padding_sink in __formatter_chrono to produce padded output.

2025-05-08 Thread Jonathan Wakely

On Wed, 7 May 2025 at 12:00, Tomasz Kamiński  wrote:
>
> Formatting code is extracted to the _M_format_to function, which produces
> output to the specified iterator. This function is now invoked either with
> __fc.out() directly (if width is not specified) or with _Padding_sink::out().
>
> This avoids formatting to a temporary string if no padding is requested,
> and minimizes allocations otherwise. For more details see the commit message of
> r16-142-g01e5ef3e8b91288f5d387a27708f9f8979a50edf.
>
> This should not increase the number of instantiations, as the implementation
> only produces basic_format_context with _Sink_iter as iterator, which is also
> the _Padding_sink iterator.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/chrono_io.h (__formatter_chrono::_M_format_to):
> Extracted from _M_format.
> (__formatter_chrono::_M_format): Use _Padding_sink and delegate
> to _M_format_to.
> ---
> I have checked that there are no other calls to out() in this file,
> so _M_format_to uses only __out, and not the iterator from __fc.
> Testing on x86_64-linux. OK for trunk?

OK, thanks.

>
>  libstdc++-v3/include/bits/chrono_io.h | 55 ++-
>  1 file changed, 20 insertions(+), 35 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
> index 620227a9f35..ace8b9f2629 100644
> --- a/libstdc++-v3/include/bits/chrono_io.h
> +++ b/libstdc++-v3/include/bits/chrono_io.h
> @@ -503,9 +503,7 @@ namespace __format
> _M_format(const _Tp& __t, _FormatContext& __fc,
>   bool __is_neg = false) const
> {
> - auto __first = _M_spec._M_chrono_specs.begin();
> - const auto __last = _M_spec._M_chrono_specs.end();
> - if (__first == __last)
> + if (_M_spec._M_chrono_specs.empty())
> return _M_format_to_ostream(__t, __fc, __is_neg);
>
>  #if defined _GLIBCXX_USE_NL_LANGINFO_L && __CHAR_BIT__ == 8
> @@ -525,29 +523,29 @@ namespace __format
> __fc._M_loc =  __with_encoding_conversion(__loc);
> }
>  #endif
> -
> - _Sink_iter<_CharT> __out;
> - __format::_Str_sink<_CharT> __sink;
> - bool __write_direct = false;
> - if constexpr (is_same_v - _Sink_iter<_CharT>>)
> -   {
> - if (_M_spec._M_width_kind == __format::_WP_none)
> -   {
> - __out = __fc.out();
> - __write_direct = true;
> -   }
> - else
> -   __out = __sink.out();
> -   }
> - else
> -   __out = __sink.out();
> -
>   // formatter passes the correct value of __is_neg
>   // for durations but for hh_mm_ss we decide it here.
>   if constexpr (__is_specialization_of<_Tp, chrono::hh_mm_ss>)
> __is_neg = __t.is_negative();
>
> + const size_t __padwidth = _M_spec._M_get_width(__fc);
> + if (__padwidth == 0)
> +   return _M_format_to(__t, __fc.out(), __fc, __is_neg);
> +
> + using _Out = typename _FormatContext::iterator;
> + _Padding_sink<_Out, _CharT> __sink(__fc.out(), __padwidth);
> + _M_format_to(__t, __sink.out(), __fc, __is_neg);
> + return __sink._M_finish(_M_spec._M_align, _M_spec._M_fill);
> +   }
> +
> +  template
> +   _Out
> +   _M_format_to(const _Tp& __t, _Out __out, _FormatContext& __fc,
> +bool __is_neg) const
> +   {
> + auto __first = _M_spec._M_chrono_specs.begin();
> + const auto __last = _M_spec._M_chrono_specs.end();
> +
>   auto __print_sign = [&__is_neg, &__out] {
> if constexpr (chrono::__is_duration_v<_Tp>
> || __is_specialization_of<_Tp, chrono::hh_mm_ss>)
> @@ -699,20 +697,7 @@ namespace __format
> }
> }
>   while (__first != __last);
> -
> - if constexpr (is_same_v - _Sink_iter<_CharT>>)
> -   if (__write_direct)
> - return __out;
> -
> - auto __str = __sink.view();
> - size_t __width;
> - if constexpr (__unicode::__literal_encoding_is_unicode<_CharT>())
> -   __width = __unicode::__field_width(__str);
> - else
> -   __width = __str.size();
> - return __format::__write_padded_as_spec(__str, __width,
> - __fc, _M_spec);
> + return std::move(__out);
> }
>
>_ChronoSpec<_CharT> _M_spec;
> --
> 2.49.0
>
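
The control flow described above can be sketched independently of the chrono machinery. This is a simplified model with hypothetical names, where `format_body` stands in for `_M_format_to`. Output goes straight to the destination iterator when no width is requested. Otherwise it is buffered so that the padding can be computed. The real `_Padding_sink` is described as minimizing allocations (see the commit referenced above). This sketch simply buffers whenever a width is set, and it measures width in `char`s rather than as a Unicode field width.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Stand-in for _M_format_to: writes through whatever iterator it is given.
template<typename Out>
Out format_body(const std::string& value, Out out)
{
  for (char c : value)
    *out++ = c;
  return out;
}

enum class Align { left, right };

template<typename Out>
Out format_padded(const std::string& value, Out out,
                  std::size_t width, Align align, char fill)
{
  // No width requested: no temporary buffer at all.
  if (width == 0)
    return format_body(value, out);

  // Width requested: collect the output first, then pad around it.
  std::string buf;
  format_body(value, std::back_inserter(buf));
  std::size_t pad = buf.size() < width ? width - buf.size() : 0;
  if (align == Align::right)
    for (std::size_t i = 0; i < pad; ++i)
      *out++ = fill;
  for (char c : buf)
    *out++ = c;
  if (align == Align::left)
    for (std::size_t i = 0; i < pad; ++i)
      *out++ = fill;
  return out;
}
```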

Re: [PATCH] libstdc++: Use scope guard for deallocating nodes in deque.

2025-05-08 Thread Jonathan Wakely

On Fri, 18 Apr 2025 at 10:03, Tomasz Kamiński  wrote:
>
> This patch adds a _Guard_nodes scope guard nested to the _Deque_base,
> that deallocates the range of nodes, and replaces __try/__catch block
> with approparietly constructed guard object.

"appropriately"

>
> libstdc++-v3/ChangeLog:
>
> * include/bits/deque.tcc (_Deque_base<_Tp, _Alloc>::_Guard_nodes): 
> Define.

There's no need for the template argument list here, just
"_Deque_base" is unambiguous (there's no partial or explicit
specialization that could be disambiguated with template argument
lists). And just "deque" below.

> (_Deque_base<_Tp, _Alloc>::_M_create_nodes): Moved defintion from 
> stl_deque.h
> and replace __try/__catch with _Guard_nodes scope object.
> (deque<_Tp, _Alloc>::_M_fill_insert, deque<_Tp, 
> _Alloc>::_M_default_append)
> (deque<_Tp, _Alloc>::_M_push_back_aux, deque<_Tp, 
> _Alloc>::_M_push_front_aux)
> (deque<_Tp, _Alloc>::_M_range_prepend, deque<_Tp, 
> _Alloc>::_M_range_append)
> (deque<_Tp, _Alloc>::_M_insert_aux): Replace __try/__catch with 
> _Guard_nodes
> scope object.
> (deque<_Tp, _Alloc>::_M_new_elements_at_back)
> (deque<_Tp, _Alloc>::_M_new_elements_at_back): Use _M_create_nodes.
> * include/bits/stl_deque.h (_Deque_base<_Tp, _Alloc>::_Guard_nodes): 
> Declare.
> (_Deque_base<_Tp, _Alloc)::_M_create_nodes): Move defintion to 
> deque.tcc.
> (deque<_Tp, _Alloc>::_Guard_nodes): Add typedef, so name is found by 
> lookup.
> ---
> Testing x86_64-linux, default test configuration passed.
> OK for trunk?
>
>  libstdc++-v3/include/bits/deque.tcc   | 424 --
>  libstdc++-v3/include/bits/stl_deque.h |  20 +-
>  2 files changed, 196 insertions(+), 248 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/deque.tcc b/libstdc++-v3/include/bits/deque.tcc
> index dabb6ec5365..b70eed69294 100644
> --- a/libstdc++-v3/include/bits/deque.tcc
> +++ b/libstdc++-v3/include/bits/deque.tcc
> @@ -63,6 +63,40 @@ namespace std _GLIBCXX_VISIBILITY(default)
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>
> +  template
> +struct

No new line here, just "struct _Deque_base...".

> +_Deque_base<_Tp, _Alloc>::_Guard_nodes
> +  {
> +   _Guard_nodes(_Deque_base& __self,
> +_Map_pointer __first, _Map_pointer __last)
> +   : _M_self(__self), _M_first(__first), _M_last(__last)
> +   { }
> +
> +   ~_Guard_nodes()
> +   { _M_self._M_destroy_nodes(_M_first, _M_last); }
> +
> +   void _M_disarm()
> +   { _M_first = _M_last; }
> +
> +   _Deque_base& _M_self;
> +   _Map_pointer _M_first;
> +   _Map_pointer _M_last;
> +
> +  private:
> +   _Guard_nodes(_Guard_nodes const&);
> +  };
> +
> +  template
> +void
> +_Deque_base<_Tp, _Alloc>::
> +_M_create_nodes(_Map_pointer __nstart, _Map_pointer __nfinish)
> +{
> +  _Guard_nodes __guard(*this, __nstart, __nstart);
> +  for (_Map_pointer& __cur = __guard._M_last; __cur < __nfinish; ++__cur)
> +   *__cur = this->_M_allocate_node();
> +  __guard._M_disarm();
> +}
> +
>  #if __cplusplus >= 201103L
>template 
>  void
> @@ -310,35 +344,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>if (__pos._M_cur == this->_M_impl._M_start._M_cur)
> {
>   iterator __new_start = _M_reserve_elements_at_front(__n);
> - __try
> -   {
> - std::__uninitialized_fill_a(__new_start, this->_M_impl._M_start,
> - __x, _M_get_Tp_allocator());
> - this->_M_impl._M_start = __new_start;
> -   }
> - __catch(...)
> -   {
> - _M_destroy_nodes(__new_start._M_node,
> -  this->_M_impl._M_start._M_node);
> - __throw_exception_again;
> -   }
> + _Guard_nodes __guard(*this, __new_start._M_node,
> + this->_M_impl._M_start._M_node);
> +
> + std::__uninitialized_fill_a(__new_start, this->_M_impl._M_start,
> + __x, _M_get_Tp_allocator());
> + __guard._M_disarm();
> + this->_M_impl._M_start = __new_start;
> }
>else if (__pos._M_cur == this->_M_impl._M_finish._M_cur)
> {
>   iterator __new_finish = _M_reserve_elements_at_back(__n);
> - __try
> -   {
> - std::__uninitialized_fill_a(this->_M_impl._M_finish,
> - __new_finish, __x,
> - _M_get_Tp_allocator());
> - this->_M_impl._M_finish = __new_finish;
> -   }
> - __catch(...)
> -   {
> - _M_destroy_nodes(this->_M_impl._M_finish._M_node + 1,
> -  __new_finish._M_node + 1);
> - __throw_exception_again;
> -   }
> +
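
A toy allocator is enough to show the scope-guard pattern used for `_Guard_nodes`. This is a sketch under assumptions: `NodePool`, `GuardNodes` and `create_nodes` are hypothetical stand-ins for the deque internals, and a budget counter forces an allocation to fail. As in the patch's `_M_create_nodes`, the loop variable is a reference to the guard's end pointer. The guard therefore always covers exactly the nodes allocated so far, and `disarm()` is called only after the loop succeeds.

```cpp
#include <cassert>
#include <new>

// Toy node allocator that counts live nodes and fails once its budget is spent.
struct NodePool
{
  int live = 0;
  int budget = 0;  // number of allocations allowed before one throws
  int* allocate()
  {
    if (budget-- <= 0)
      throw std::bad_alloc();
    ++live;
    return new int(0);
  }
  void deallocate(int* p) { delete p; --live; }
};

// Frees the nodes in [first, last) on destruction unless disarmed.
struct GuardNodes
{
  NodePool& pool;
  int** first;
  int** last;

  GuardNodes(NodePool& p, int** f, int** l) : pool(p), first(f), last(l) { }
  GuardNodes(const GuardNodes&) = delete;
  GuardNodes& operator=(const GuardNodes&) = delete;
  ~GuardNodes()
  {
    for (int** cur = first; cur != last; ++cur)
      pool.deallocate(*cur);
  }
  void disarm() { first = last; }
};

inline void create_nodes(NodePool& pool, int** start, int** finish)
{
  GuardNodes guard(pool, start, start);
  // `cur` aliases guard.last, so the guarded range grows with each node.
  for (int**& cur = guard.last; cur < finish; ++cur)
    *cur = pool.allocate();
  guard.disarm();  // success: keep the nodes
}
```

Compared with `__try`/`__catch`, the cleanup is written once and runs on any exit path that skips `disarm()`.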

RE: [PATCH ]RISCV :Added MIPS P8700 Subtarget

2025-05-08 Thread Palmer Dabbelt


On Thu, 08 May 2025 08:53:18 PDT (-0700), ukala...@mips.com wrote:

Hi All ,

We have a couple of patch series that enable P8700 tuning for the RISC-V
core, which we would like to upstream to GCC mainline.

It would be good to hear your feedback on the patches.


It's kind of hard to read because your patch is getting mangled by some 
email-related thing.


Can you try using git-send-email to send a clean v2 of the patch?



Thank you in advance
~U



-Original Message-
From: Umesh Kalappa 
Sent: 03 May 2025 11:27

To: Jeff Law ; gcc-patches@gcc.gnu.org; 
pal...@dabbelt.com
Cc: kito.ch...@sifive.com; Jesse Huang ; 
and...@sifive.com
Subject: Re: [PATCH]RISCV :Added MIPS P8700 Subtarget

Hi @Jeff Law and @pal...@dabbelt.com ,

Please review the changes below and help us upstream them.

Thank you
~U

-Original Message-
From: Umesh Kalappa
Sent: 29 April 2025 16:16
To: Umesh Kalappa ; Jeff Law ; 
gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; Jesse Huang ; 
pal...@dabbelt.com; and...@sifive.com
Subject: RE: [EXTERNAL]Re: [PATCH]RISCV :Added MIPS P8700 Subtarget

Hi all,

Here is the updated patch, which addresses some of @Jeff Law's comments.

P8700 doesn't have a vector engine; we support the insn types up to
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.md#L358
and the scheduling model is enabled for the same types.

---
 gcc/config/riscv/mips-p8700.md   | 139 +++
 gcc/config/riscv/riscv-cores.def |   5 ++
 gcc/config/riscv/riscv-opts.h|   3 +-
 gcc/config/riscv/riscv.cc|  22 +
 gcc/config/riscv/riscv.md|   3 +-
 5 files changed, 170 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/mips-p8700.md

diff --git a/gcc/config/riscv/mips-p8700.md b/gcc/config/riscv/mips-p8700.md
new file mode 100644
index 000..11d0b1ca793
--- /dev/null
+++ b/gcc/config/riscv/mips-p8700.md
@@ -0,0 +1,139 @@
+;; DFA-based pipeline description for MIPS P8700.
+;;
+;; Copyright (C) 2025 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "mips_p8700_agen_alq_pipe, mips_p8700_mdu_pipe,
+mips_p8700_fpu_pipe")
+
+;; The address generation queue (AGQ) has AL2, CTISTD and LDSTA pipes
+(define_cpu_unit "mips_p8700_agq, mips_p8700_al2, mips_p8700_ctistd, mips_p8700_lsu"
+  "mips_p8700_agen_alq_pipe")
+
+(define_cpu_unit "mips_p8700_gpmul, mips_p8700_gpdiv" "mips_p8700_mdu_pipe")
+
+;; The arithmetic-logic-unit queue (ALQ) has ALU pipe
+(define_cpu_unit "mips_p8700_alq, mips_p8700_alu" "mips_p8700_agen_alq_pipe")
+
+;; The floating-point-unit queue (FPQ) has short and long pipes
+(define_cpu_unit "mips_p8700_fpu_short, mips_p8700_fpu_long"
+  "mips_p8700_fpu_pipe")
+
+;; Long FPU pipeline.
+(define_cpu_unit "mips_p8700_fpu_apu" "mips_p8700_fpu_pipe")
+
+(define_reservation "mips_p8700_agq_al2" "mips_p8700_agq, mips_p8700_al2")
+(define_reservation "mips_p8700_agq_ctistd" "mips_p8700_agq, mips_p8700_ctistd")
+(define_reservation "mips_p8700_agq_lsu" "mips_p8700_agq, mips_p8700_lsu")
+(define_reservation "mips_p8700_alq_alu" "mips_p8700_alq, mips_p8700_alu")
+
+;;
+;; FPU pipe
+;;
+
+(define_insn_reservation "mips_p8700_fpu_fadd" 4
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fadd"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fabs" 2
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fcmp,fmove"))
+  "mips_p8700_fpu_short, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fload" 8
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fpload"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fstore" 1
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fpstore"))
+  "mips_p8700_agq_lsu")
+
+(define_insn_reservation "mips_p8700_fpu_fmadd" 8
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fmadd"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_fmul" 5
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fmul"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu")
+
+(define_insn_reservation "mips_p8700_fpu_div" 17
+  (and (eq_attr "tune" "mips_p8700")
+   (eq_attr "type" "fdiv,fsqrt"))
+  "mips_p8700_fpu_long, mips_p8700_fpu_apu*17")
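
A toy reservation-table model shows what the reservation strings imply for back-to-back instructions. It is a hypothetical sketch, not GCC's `genautomata`. Each reservation is a list of the unit used in each cycle, and the automaton split is ignored. The latency argument of `define_insn_reservation`, which is the delay before a dependent instruction can use the result, is not modeled. Under these assumptions, `mips_p8700_fpu_apu*17` means a second divide cannot issue until 17 cycles after the first, while two `fadd`s can issue on consecutive cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Unit occupied in each cycle, relative to the issue cycle.
using Reservation = std::vector<std::string>;

class ReservationTable
{
  std::map<std::string, std::set<int>> busy_;  // unit -> busy absolute cycles

public:
  bool fits(const Reservation& r, int issue) const
  {
    for (std::size_t step = 0; step < r.size(); ++step)
      {
        auto it = busy_.find(r[step]);
        if (it != busy_.end() && it->second.count(issue + int(step)))
          return false;
      }
    return true;
  }

  // Issue at the earliest cycle >= `from` without a unit conflict.
  int issue(const Reservation& r, int from)
  {
    int cycle = from;
    while (!fits(r, cycle))
      ++cycle;
    for (std::size_t step = 0; step < r.size(); ++step)
      busy_[r[step]].insert(cycle + int(step));
    return cycle;
  }
};

// "mips_p8700_fpu_long, mips_p8700_fpu_apu*17"
inline Reservation fdiv_reservation()
{
  Reservation r(1, std::string("mips_p8700_fpu_long"));
  r.insert(r.end(), 17, std::string("mips_p8700_fpu_apu"));
  return r;
}

// "mips_p8700_fpu_long, mips_p8700_fpu_apu"
inline Reservation fadd_reservation()
{
  return Reservation{std::string("mips_p8700_fpu_long"),
                     std::string("mips_p8700_fpu_apu")};
}
```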

[PATCH] gimple-fold: Don't replace `{true/false} != false` with `true/false` inside GIMPLE_COND

2025-05-08 Thread Andrew Pinski

This is like the patch where we don't want to replace `bool_name != 0`
with `bool_name`, but for INTEGER_CST instead. The only difference is
that there are a few different forms for always true/always false, so
the replacement is only skipped when the condition is already in the
canonical form. A few new helpers are added for the canonical form
detection.

This also replaces the previous version of the patch, which instead did
an early exit from fold_stmt_1; with this version, a non-canonical form
can still be changed into the canonical one in the end.

gcc/ChangeLog:

* gimple.h (gimple_cond_true_canonical_p): New function.
(gimple_cond_false_canonical_p): New function.
* gimple-fold.cc (replace_stmt_with_simplification): Return
false if replacing the operands of GIMPLE_COND with an INTEGER_CST
and already in canonical form.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc | 15 +--
 gcc/gimple.h   | 30 ++
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index f801e8b6d41..e63fd6f2f2f 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -6258,10 +6258,21 @@ replace_stmt_with_simplification (gimple_stmt_iterator *gsi,
}
   else if (code == INTEGER_CST)
{
+ /* Make into the canonical form `1 != 0` and `0 != 0`.
+If already in the canonical form return false
+saying nothing has been done.  */
  if (integer_zerop (ops[0]))
-   gimple_cond_make_false (cond_stmt);
+   {
+ if (gimple_cond_false_canonical_p (cond_stmt))
+   return false;
+ gimple_cond_make_false (cond_stmt);
+   }
  else
-   gimple_cond_make_true (cond_stmt);
+   {
+ if (gimple_cond_true_canonical_p (cond_stmt))
+   return false;
+ gimple_cond_make_true (cond_stmt);
+   }
}
   else if (!inplace)
{
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 032365f3da2..977ff1c923c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -3875,6 +3875,21 @@ gimple_cond_true_p (const gcond *gs)
   return false;
 }
 
+/* Check if conditional statement GS is in the canonical form of 'if (1 != 0)'.  */
+
+inline bool
+gimple_cond_true_canonical_p (const gcond *gs)
+{
+  tree lhs = gimple_cond_lhs (gs);
+  tree rhs = gimple_cond_rhs (gs);
+  tree_code code = gimple_cond_code (gs);
+  if (code == NE_EXPR
+  && lhs == boolean_true_node
+  && rhs == boolean_false_node)
+return true;
+  return false;
+}
+
 /* Check if conditional statement GS is of the form 'if (1 != 1)',
'if (0 != 0)', 'if (1 == 0)' or 'if (0 == 1)' */
 
@@ -3900,6 +3915,21 @@ gimple_cond_false_p (const gcond *gs)
   return false;
 }
 
+/* Check if conditional statement GS is in the canonical form of 'if (0 != 0)'.  */
+
+inline bool
+gimple_cond_false_canonical_p (const gcond *gs)
+{
+  tree lhs = gimple_cond_lhs (gs);
+  tree rhs = gimple_cond_rhs (gs);
+  tree_code code = gimple_cond_code (gs);
+  if (code == NE_EXPR
+  && lhs == boolean_false_node
+  && rhs == boolean_false_node)
+return true;
+  return false;
+}
+
 /* Set the code, LHS and RHS of GIMPLE_COND STMT from CODE, LHS and RHS.  */
 
 inline void
-- 
2.43.0
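
The effect of returning false for a condition that is already canonical can be modeled outside GCC. This simplified sketch assumes conditions whose operands are the constants 0 and 1. `Cond`, `simplify` and the helper names are hypothetical and only mirror the shape of the patched logic. A non-canonical always-true or always-false condition is rewritten to `1 != 0` or `0 != 0`, and a second call reports that nothing has been done.

```cpp
#include <cassert>

enum class Code { ne, eq };

// Constant condition `lhs CODE rhs`.
struct Cond
{
  Code code;
  int lhs;
  int rhs;
};

inline bool always_true(const Cond& c)
{
  return c.code == Code::ne ? c.lhs != c.rhs : c.lhs == c.rhs;
}

// Canonical spellings: `1 != 0` (true) and `0 != 0` (false).
inline bool true_canonical(const Cond& c)
{ return c.code == Code::ne && c.lhs == 1 && c.rhs == 0; }

inline bool false_canonical(const Cond& c)
{ return c.code == Code::ne && c.lhs == 0 && c.rhs == 0; }

// Returns true only if the condition was actually rewritten.
inline bool simplify(Cond& c)
{
  if (always_true(c))
    {
      if (true_canonical(c))
        return false;  // already canonical: nothing done
      c = {Code::ne, 1, 0};
    }
  else
    {
      if (false_canonical(c))
        return false;  // already canonical: nothing done
      c = {Code::ne, 0, 0};
    }
  return true;
}
```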

Re: [PATCH v1 0/5] Add testcases for another case of vec_duplicate + vadd.vv combine

2025-05-08 Thread Robin Dapp


it's just a vector cost model issue and some loops are not profitable
to vectorize?


Yes. For example, when gpr2vr is 1, int8_t cannot vectorize while uint8_t can.


OK, understood.  I think that's expected given the fine granularity of the 
tests.  IMHO nothing that should block progress.


--
Regards
Robin

Re: [PATCH] libstdc++: Update rows in C++17 status table

2025-05-08 Thread Jonathan Wakely

On Thu, 8 May 2025 at 18:57, Björn Schäpers  wrote:
>
> On 08.05.2025 at 15:50, Jonathan Wakely wrote:
> > Document that std::to_chars and std::from_chars are complete, mentioning
> > the libraries used for floating-point types.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * doc/xml/manual/status_cxx2017.xml: Update status for
> >   std::to_chars and std::from_chars.
> >   * doc/html/manual/*: Regenerate.
> > ---
> >
> > Patrick, please check that what I've added is accurate (see the XML
> > change at the end of the diff).
> >
> >   .../doc/html/manual/source_code_style.html|  2 +-
> >   libstdc++-v3/doc/html/manual/status.html  | 17 -
> >   .../doc/xml/manual/status_cxx2017.xml | 25 ---
> >   3 files changed, 38 insertions(+), 6 deletions(-)
> >
> > diff --git a/libstdc++-v3/doc/html/manual/source_code_style.html 
> > b/libstdc++-v3/doc/html/manual/source_code_style.html
> > index b0b22683f67c..a66e3a079471 100644
> > --- a/libstdc++-v3/doc/html/manual/source_code_style.html
> > +++ b/libstdc++-v3/doc/html/manual/source_code_style.html
> > @@ -474,7 +474,7 @@
> >   
> > Examples:  _M_num_elements  _M_initialize 
> > ()
> >   
> > -  Static data members, constants, and enumerations:  > class="literal">_S_.*
> > +  Static data and function members, constants, and enumerations:  > class="literal">_S_.*
> >   
> > Examples: _S_max_elements  
> > _S_default_value
> >   
> > diff --git a/libstdc++-v3/doc/html/manual/status.html 
> > b/libstdc++-v3/doc/html/manual/status.html
> > index 3d55e2652729..924b2e3d861e 100644
> > --- a/libstdc++-v3/doc/html/manual/status.html
> > +++ b/libstdc++-v3/doc/html/manual/status.html
> > @@ -927,7 +927,22 @@ since C++14 and the implementation is complete.
> >   23
> > 
> >   General utilities
> > -  23.1 > align="left">General  
> > 23.2Utility 
> > components   > align="left">23.2.1Header  > class="code"> synopsis  > align="left"> 23.2.2 > align="left">OperatorsY 
> > 23.2.3 > class="code">swapY 
> > 23.2.4 > class="code">exchangeY 
> > 23.2.5Forward/move 
> > helpersY  > align="left">23.2.6Function template  > class="code">as_constY 
> > 23.2.7Function 
> > template declvalY > align="left"> 23.2.8 > align="left">Primitive numeric output conversion > align="left">Partial  > align="left">23.2.9Primitive numeric input 
> > conversionPartial 
> > 23.3Compile-time 
> > integer sequences  
> > 23.4       Pairs                                          Y
> > 23.5       Tuples                                         Y
> > 23.6       Optional objects                               Y
> > 23.7       Variants                                       Y
> > 23.8       Storage for any type                           Y
> > 23.9       Bitsets                                        Y
> > 23.10      Memory                                         Y
> > 23.10.1    In general
> > 23.10.2    Header  synopsis                               Y
> > 23.10.3    Pointer traits                                 Y
> > 23.10.4    Pointer safety                                 Y
> > 23.10.5    Align                                          Y
> > 23.10.6    Allocator argument tag                         Y
> > 23.10.7    uses_allocator                                 Y
> > 23.10.8    Allocator traits                               Y
> > 23.10.9    The default allocator                          Y
> > 23.10.10   Specialized algorithms                         Y
> > 23.10.11   C library memory allocation                    Y
> > 23.11      Smart pointers
> > 23.11.1    Class template unique_ptr                      Y
> > 23.11.2    Shared-ownership pointers                      Y
> > 23.12      Memory resources
> > 23.12.1    Header  synopsis                               Y
> > 23.12.2    Class memory_resource                          Y
> > 23.12.3    Class template polymorphic_allocator           Y
> > 23.12.4    Access to program-wide memory_resource objects Y
> > 23.12.5    Pool resource classes                          Y
> > 23.12.6    Class monotonic_buffer_resource                Y
> > 23.13      Class template scoped_allocator_adaptor        Y
> > 23.14      Function objects
> > 23.14.1    Header  synopsis
> > 23.14.2    Definitions
> > 23.14.3    Requirements
> > 23.14.4    Function template invoke                       Y
> > 23.14.5    Class template reference_wrapper               Y
> > 23.14.6    Arithmetic operation                           Y
> > 23.14.7    Comparisons                                    Y
> > 23.14.8    Logical operations                             Y
> > 23.14.9    Bitwise operations                             Y
> > 23.14.10   Function template not_fn                       Y
> > 23.14.11   Function object binders                        Y
> > 23.14.12   Function template mem_fn                       Y
> > 23.14.13   Polymorphic function wrappers                  Y
> > 23.14.14   Searchers                                      Y
>

[PATCH 9/9] AArch64: make rules for CBZ/TBZ higher priority

2025-05-08 Thread Karl Meakin

Move the rules for CBZ/TBZ to be above the rules for
CBB/CBH/CB. We want them to have higher priority
because they can express larger displacements.
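To illustrate why the ordering matters, here is a minimal sketch that uses only the displacement ranges recorded in the `BRANCH_LEN_*` constants: for the same branch distance, CBZ/CBNZ frequently fits in a single instruction where CB does not. The helpers `in_range`, `cbz_length` and `cb_length` are hypothetical and are not part of GCC; they simply restate the `length` attribute test.

```c
#include <stdbool.h>

/* Bounds copied from the BRANCH_LEN_* constants in aarch64.md.  */
#define BRANCH_LEN_P_1MiB 1048572L
#define BRANCH_LEN_N_1MiB (-1048576L)
#define BRANCH_LEN_P_1Kib 1020L
#define BRANCH_LEN_N_1Kib (-1024L)

/* Same test as the "length" attribute: NEG <= disp < POS.  */
bool in_range (long disp, long neg, long pos)
{
  return disp >= neg && disp < pos;
}

/* Hypothetical: bytes used by CBZ/CBNZ.  Out of range, the far-branch
   form is an inverted CBZ/CBNZ over an unconditional B (8 bytes).  */
int cbz_length (long disp)
{
  return in_range (disp, BRANCH_LEN_N_1MiB, BRANCH_LEN_P_1MiB) ? 4 : 8;
}

/* Hypothetical: bytes used by CB with its +/- 1KiB range, following the
   length attribute (4 in range, 8 otherwise).  */
int cb_length (long disp)
{
  return in_range (disp, BRANCH_LEN_N_1Kib, BRANCH_LEN_P_1Kib) ? 4 : 8;
}
```

For a branch 4096 bytes ahead, `cbz_length (4096)` is 4 while `cb_length (4096)` is 8, so matching CBZ/CBNZ first avoids a two-instruction far branch.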

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_cbz1): Move
above rules for CBB/CBH/CB.
(*aarch64_tbz1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.
---
 gcc/config/aarch64/aarch64.md| 170 ---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c |  35 ++---
 2 files changed, 110 insertions(+), 95 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 641c3653a40..aa528cd13b4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -697,27 +697,38 @@ (define_insn "jump"
 ;; Maximum PC-relative positive/negative displacements for various branching
 ;; instructions.
 (define_constants
   [
 ;; +/- 128MiB.  Used by B, BL.
 (BRANCH_LEN_P_128MiB  134217724)
 (BRANCH_LEN_N_128MiB -134217728)
 
 ;; +/- 1MiB.  Used by B., CBZ, CBNZ.
 (BRANCH_LEN_P_1MiB  1048572)
 (BRANCH_LEN_N_1MiB -1048576)
 
 ;; +/- 32KiB.  Used by TBZ, TBNZ.
 (BRANCH_LEN_P_32KiB  32764)
 (BRANCH_LEN_N_32KiB -32768)
 
 ;; +/- 1KiB.  Used by CBB, CBH, CB.
 (BRANCH_LEN_P_1Kib  1020)
 (BRANCH_LEN_N_1Kib -1024)
   ]
 )
 
 ;; ---
 ;; Conditional jumps
+;; The order of the rules below is important.
+;; Higher priority rules are preferred because they can express larger
+;; displacements.
+;; 1) EQ/NE comparisons against zero are handled by CBZ/CBNZ.
+;; 2) LT/GE comparisons against zero are handled by TBZ/TBNZ.
+;; 3) When the CMPBR extension is enabled:
+;;   a) Comparisons between two registers are handled by
+;;  CBB/CBH/CB.
+;;   b) Comparisons between a GP register and an immediate in the range 0-63 are
+;;  handled by CB.
+;; 4) Otherwise, emit a CMP+B sequence.
 ;; ---
 
 (define_expand "cbranch4"
@@ -770,63 +781,140 @@ (define_expand "cbranch4"
 (define_expand "cbranchcc4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (match_operand 2 "const0_operand")])
   (label_ref (match_operand 3))
   (pc)))]
   ""
   ""
 )
 
+;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
+(define_insn "aarch64_cbz1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
+(define_insn "*aarch64_tbz1"
+  [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "register_operand" "r")
+(const_int 0))
+  (label_ref (match_operand 1))
+  (pc)))
+   (clobber (reg:CC CC_REGNUM))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  {
+   if (get_attr_far_branch (insn) == FAR_BRANCH_YES)
+ return aarch64_gen_far_branch (operands, 1, "Ltb",
+"\\t%0, , ");
+   else
+ {
+   char buf[64];
+   uint64_t val = ((uint64_t) 1)
+   << (GET_MODE_SIZE (mode) * BITS_PER_UNIT - 1);
+   sprintf (buf, "tst\t%%0, %" PRId64, val);
+   output_asm_insn (buf, operands);
+   return "\t%l1";
+ }
+  }
+else
+  return "\t%0, , %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_32KiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_32KiB)))
+ (const_int 4)
+

[PATCH 7/9] AArch64: precommit test for CMPBR instructions

2025-05-08 Thread Karl Meakin

Commit the test file `cmpbr.c` before rules for generating the new
instructions are added, so that the changes in codegen are more obvious
in the next commit.
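The test uses token pasting to generate one function per (type, comparison, right-hand side) combination, and the generated names are what the `**` check-function-bodies patterns refer to. Below is a cut-down, runnable sketch of the same `COMPARE` macro. The definitions of `taken` and `not_taken` are hypothetical stubs added so the sketch links, since the test only declares them.

```c
#include <stdint.h>

typedef uint8_t u8;

/* Stub definitions; cmpbr.c only declares these.  */
static int taken (void) { return 1; }
static int not_taken (void) { return 0; }

/* Same shape as the COMPARE macro in cmpbr.c.  */
#define COMPARE(ty, name, op, rhs) \
  int ty##_x0_##name##_##rhs (ty x0, ty x1) \
  { \
    return (x0 op rhs) ? taken () : not_taken (); \
  }

/* Defines u8_x0_eq_x1, which compares x0 == x1.  */
COMPARE (u8, eq, ==, x1)

/* Defines u8_x0_ult_42, which compares x0 < 42; x1 is unused.  */
COMPARE (u8, ult, <, 42)
```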

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 1378 ++
 1 file changed, 1378 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

diff --git a/gcc/testsuite/gcc.target/aarch64/cmpbr.c b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
new file mode 100644
index 000..728d6ead91c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpbr.c
@@ -0,0 +1,1378 @@
+/* Test that the instructions added by FEAT_CMPBR are emitted */
+/* { dg-do compile } */
+/* { dg-options "-march=armv9.5-a+cmpbr -O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include 
+
+typedef uint8_t u8;
+typedef int8_t i8;
+
+typedef uint16_t u16;
+typedef int16_t i16;
+
+typedef uint32_t u32;
+typedef int32_t i32;
+
+typedef uint64_t u64;
+typedef int64_t i64;
+
+int taken();
+int not_taken();
+
+#define COMPARE(ty, name, op, rhs) \
+  int ty##_x0_##name##_##rhs(ty x0, ty x1) { \
+    return (x0 op rhs) ? taken() : not_taken(); \
+  }
+
+#define COMPARE_ALL(unsigned_ty, signed_ty, rhs) \
+  COMPARE(unsigned_ty, eq, ==, rhs); \
+  COMPARE(unsigned_ty, ne, !=, rhs); \
+ \
+  COMPARE(unsigned_ty, ult, <, rhs); \
+  COMPARE(unsigned_ty, ule, <=, rhs); \
+  COMPARE(unsigned_ty, ugt, >, rhs); \
+  COMPARE(unsigned_ty, uge, >=, rhs); \
+ \
+  COMPARE(signed_ty, slt, <, rhs); \
+  COMPARE(signed_ty, sle, <=, rhs); \
+  COMPARE(signed_ty, sgt, >, rhs); \
+  COMPARE(signed_ty, sge, >=, rhs);
+
+//  CBB (register) 
+COMPARE_ALL(u8, i8, x1);
+
+//  CBH (register) 
+COMPARE_ALL(u16, i16, x1);
+
+//  CB (register) 
+COMPARE_ALL(u32, i32, x1);
+COMPARE_ALL(u64, i64, x1);
+
+//  CB (immediate) 
+COMPARE_ALL(u32, i32, 42);
+COMPARE_ALL(u64, i64, 42);
+
+//  Special cases 
+// CBB and CBH cannot have immediate operands. Instead we have to do a MOV+CB
+COMPARE_ALL(u8, i8, 42);
+COMPARE_ALL(u16, i16, 42);
+
+// 65 is out of the range for immediate operands (0 to 63).
+// * For 8/16-bit types, use a MOV+CB as above.
+// * For 32/64-bit types, use a CMP+B instead, because
+//   B has a longer range than CB.
+COMPARE_ALL(u8, i8, 65);
+COMPARE_ALL(u16, i16, 65);
+COMPARE_ALL(u32, i32, 65);
+COMPARE_ALL(u64, i64, 65);
+
+// Comparisons against zero can use the wzr/xzr register.
+COMPARE_ALL(u8, i8, 0);
+COMPARE_ALL(u16, i16, 0);
+COMPARE_ALL(u32, i32, 0);
+COMPARE_ALL(u64, i64, 0);
+
+/*
+** u8_x0_eq_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** beq .L4
+** b   not_taken
+** b   taken
+*/
+
+/*
+** u8_x0_ne_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** beq .L6
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u8_x0_ult_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bls .L8
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u8_x0_ule_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bcc .L10
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u8_x0_ugt_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bcs .L12
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u8_x0_uge_x1:
+** and w1, w1, 255
+** cmp w1, w0, uxtb
+** bhi .L14
+** b   taken
+** b   not_taken
+*/
+
+/*
+** i8_x0_slt_x1:
+** sxtb w1, w1
+** cmp w1, w0, sxtb
+** ble .L16
+** b   taken
+** b   not_taken
+*/
+
+/*
+** i8_x0_sle_x1:
+** sxtb w1, w1
+** cmp w1, w0, sxtb
+** blt .L18
+** b   taken
+** b   not_taken
+*/
+
+/*
+** i8_x0_sgt_x1:
+** sxtb w1, w1
+** cmp w1, w0, sxtb
+** bge .L20
+** b   taken
+** b   not_taken
+*/
+
+/*
+** i8_x0_sge_x1:
+** sxtb w1, w1
+** cmp w1, w0, sxtb
+** bgt .L22
+** b   taken
+** b   not_taken
+*/
+
+/*
+** u16_x0_eq_x1:
+** and w1, w1, 65535
+** cmp w1, w0, uxth
+** beq .L25
+** b   not_taken
+** b   taken
+*/
+
+/*
+** u16_x0_ne_x1:
+** and

[PATCH 8/9] AArch64: rules for CMPBR instructions

2025-05-08 Thread Karl Meakin

Add rules for lowering `cbranch4` to CBB/CBH/CB when
the CMPBR extension is enabled.
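Read together with the comments in the `cmpbr.c` test, the intended lowering can be summarised as the rough decision procedure below. This is a hypothetical model for illustration, not GCC code: the enum and function names are invented, and the zero-comparison special cases (CBZ/TBZ and the `wzr`/`xzr` register) are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the sequences discussed in cmpbr.c.  */
enum branch_seq
{
  SEQ_CB_REG,  /* CBB, CBH or CB (register).  */
  SEQ_CB_IMM,  /* CB (immediate).  */
  SEQ_MOV_CB,  /* MOV the constant into a register, then branch.  */
  SEQ_CMP_B    /* CMP followed by a conditional B.  */
};

/* CB (immediate) accepts immediates in the range 0..63.  */
bool cb_immediate_p (int64_t imm)
{
  return imm >= 0 && imm <= 63;
}

/* BITS is the operand width (8, 16, 32 or 64).  RHS_IS_REG says whether
   the second operand is a register; otherwise IMM holds the constant.  */
enum branch_seq
choose_sequence (bool target_cmpbr, int bits, bool rhs_is_reg, int64_t imm)
{
  if (!target_cmpbr)
    return SEQ_CMP_B;
  if (rhs_is_reg)
    return SEQ_CB_REG;
  if (bits >= 32)
    /* Out-of-range immediates use CMP+B, since B has a longer range
       than CB.  */
    return cb_immediate_p (imm) ? SEQ_CB_IMM : SEQ_CMP_B;
  /* CBB and CBH have no immediate form.  */
  return SEQ_MOV_CB;
}
```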

gcc/ChangeLog:

* config/aarch64/aarch64.md (cbranch4): Emit CMPBR
instructions if possible.
(BRANCH_LEN_P_1Kib): New constant.
(BRANCH_LEN_N_1Kib): Likewise.
(cbranch4): New expand rule.
(aarch64_cb): Likewise.
(aarch64_cb): Likewise.
* config/aarch64/iterators.md (cmpbr_suffix): New mode attr.
* config/aarch64/predicates.md (const_0_to_63_operand): New
predicate.
(aarch64_cb_immediate): Likewise.
(aarch64_cb_operand): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.
---
 gcc/config/aarch64/aarch64.md|  87 +++-
 gcc/config/aarch64/iterators.md  |   5 +
 gcc/config/aarch64/predicates.md |  17 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.c | 484 ---
 4 files changed, 275 insertions(+), 318 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 248b0e8644f..641c3653a40 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -697,37 +697,60 @@ (define_insn "jump"
 ;; Maximum PC-relative positive/negative displacements for various branching
 ;; instructions.
 (define_constants
   [
 ;; +/- 128MiB.  Used by B, BL.
 (BRANCH_LEN_P_128MiB  134217724)
 (BRANCH_LEN_N_128MiB -134217728)
 
 ;; +/- 1MiB.  Used by B., CBZ, CBNZ.
 (BRANCH_LEN_P_1MiB  1048572)
 (BRANCH_LEN_N_1MiB -1048576)
 
 ;; +/- 32KiB.  Used by TBZ, TBNZ.
 (BRANCH_LEN_P_32KiB  32764)
 (BRANCH_LEN_N_32KiB -32768)
+
+;; +/- 1KiB.  Used by CBB, CBH, CB.
+(BRANCH_LEN_P_1Kib  1020)
+(BRANCH_LEN_N_1Kib -1024)
   ]
 )
 
 ;; ---
 ;; Conditional jumps
 ;; ---
 
-(define_expand "cbranch4"
+(define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
   (label_ref (match_operand 3))
   (pc)))]
   ""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+  if (TARGET_CMPBR && aarch64_cb_operand (operands[2], mode))
+{
+  emit_jump_insn (gen_aarch64_cb (operands[0], operands[1],
+   operands[2], operands[3]));
+  DONE;
+}
+  else
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
+operands[1], operands[2]);
+  operands[2] = const0_rtx;
+}
+  }
+)
+
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:SHORT 1 "register_operand")
+(match_operand:SHORT 2 "aarch64_cb_short_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  ""
 )
 
 (define_expand "cbranch4"
@@ -747,13 +770,65 @@ (define_expand "cbranch4"
 (define_expand "cbranchcc4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (match_operand 2 "const0_operand")])
   (label_ref (match_operand 3))
   (pc)))]
   ""
   ""
 )
 
+;; Emit a `CB (register)` or `CB (immediate)` instruction.
+(define_insn "aarch64_cb"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:GPI 1 "register_operand")
+(match_operand:GPI 2 "aarch64_cb_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  "cb%m0\\t%1, %2, %l3";
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; Emit a `CBB (register)` or `CBH (register)` instruction.
+(define_insn "aarch64_cb"
+  [(set (pc) (if

[PATCH 2/9] AArch64: reformat branch instruction rules

2025-05-08 Thread Karl Meakin

Make the formatting of the RTL templates in the rules for branch
instructions more consistent with each other.

gcc/ChangeLog:

* config/aarch64/aarch64.md (cbranch4): Reformat.
(cbranchcc4): Likewise.
(condjump): Likewise.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 77 +--
 1 file changed, 38 insertions(+), 39 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4d556d886bc..7d0af5bd700 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -705,229 +705,228 @@ (define_insn "jump"
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
-  (label_ref (match_operand 3 "" ""))
+  (label_ref (match_operand 3))
   (pc)))]
   ""
   "
   operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 operands[2]);
   operands[2] = const0_rtx;
   "
 )
 
 (define_expand "cbranch4"
-  [(set (pc) (if_then_else
-   (match_operator 0 "aarch64_comparison_operator"
-[(match_operand:GPF_F16 1 "register_operand")
- (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
-   (label_ref (match_operand 3 "" ""))
-   (pc)))]
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:GPF_F16 1 "register_operand")
+(match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   ""
-  "
+  {
   operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
 operands[2]);
   operands[2] = const0_rtx;
-  "
+  }
 )
 
 (define_expand "cbranchcc4"
-  [(set (pc) (if_then_else
- (match_operator 0 "aarch64_comparison_operator"
-  [(match_operand 1 "cc_register")
-   (match_operand 2 "const0_operand")])
- (label_ref (match_operand 3 "" ""))
- (pc)))]
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register")
+(match_operand 2 "const0_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   ""
-  "")
+  ""
+)
 
 (define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
-   [(match_operand 1 "cc_register" "") (const_int 0)])
-  (label_ref (match_operand 2 "" ""))
+   [(match_operand 1 "cc_register")
+(const_int 0)])
+  (label_ref (match_operand 2))
   (pc)))]
   ""
   {
 /* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
but the "." is required for SVE conditions.  */
 bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 2, "Lbcond",
 use_dot_p ? "b.%M0\\t" : "b%M0\\t");
 else
   return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
   }
   [(set_attr "type" "branch")
(set (attr "length")
(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
(if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
   (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
  (const_int 0)
  (const_int 1)))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
 ;; and branch sequence from:
 ;; mov x0, #imm1
 ;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
 ;; cmp x1, x0
 ;; b .Label
 ;; into the shorter:
 ;; sub x0, x1, #(CST & 0xfff000)
 ;; subs x0, x0, #(CST & 0x000fff)
 ;; b .Label
 (define_insn_and_split "*compare_condjump"
-  [(set (pc) (if_then_else (EQL
- (match_operand:GPI 0 "register_operand" "r")
- (match_operand:GPI 1 "aarch64_imm24" "n"))
-  (label_ref:P (match_operand 2 "" ""))
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+

[PATCH 1/9] AArch64: place branch instruction rules together

2025-05-08 Thread Karl Meakin

The rules for conditional branches were spread throughout `aarch64.md`.
Group them together so it is easier to understand how `cbranch4`
is lowered to RTL.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Move.
(*compare_condjump): Likewise.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(tbranch_3): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 387 ++
 1 file changed, 201 insertions(+), 186 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c678f7afb1a..4d556d886bc 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -674,6 +674,10 @@ (define_insn "aarch64_write_sysregti"
  "msrr\t%x0, %x1, %H1"
 )
 
+;; ---
+;; Unconditional jumps
+;; ---
+
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:DI 0 "register_operand" "r"))]
   ""
@@ -692,6 +696,12 @@ (define_insn "jump"
   [(set_attr "type" "branch")]
 )
 
+
+
+;; ---
+;; Conditional jumps
+;; ---
+
 (define_expand "cbranch4"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
@@ -731,6 +741,197 @@ (define_expand "cbranchcc4"
   ""
   "")
 
+(define_insn "condjump"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand 1 "cc_register" "") (const_int 0)])
+  (label_ref (match_operand 2 "" ""))
+  (pc)))]
+  ""
+  {
+/* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
+   but the "." is required for SVE conditions.  */
+bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 2, "Lbcond",
+use_dot_p ? "b.%M0\\t" : "b%M0\\t");
+else
+  return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+ (const_int 0)
+ (const_int 1)))]
+)
+
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov x0, #imm1
+;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp x1, x0
+;; b .Label
+;; into the shorter:
+;; sub x0, x1, #(CST & 0xfff000)
+;; subs x0, x0, #(CST & 0x000fff)
+;; b .Label
+(define_insn_and_split "*compare_condjump"
+  [(set (pc) (if_then_else (EQL
+ (match_operand:GPI 0 "register_operand" "r")
+ (match_operand:GPI 1 "aarch64_imm24" "n"))
+  (label_ref:P (match_operand 2 "" ""))
+  (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), mode)
+   && !aarch64_plus_operand (operands[1], mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+rtx tmp = gen_reg_rtx (mode);
+emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm)));
+emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+rtx cmp_rtx = gen_rtx_fmt_ee (, mode,
+ cc_reg, const0_rtx);
+emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+DONE;
+  }
+)
+
+(define_insn "aarch64_cb1"
+  [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
+   (const_int 0))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))]
+  "!aarch64_track_speculation"
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
+else
+  return "\\t%0, %l1";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
+  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minu

[PATCH 5/9] AArch64: make `far_branch` attribute a boolean

2025-05-08 Thread Karl Meakin

The `far_branch` attribute only ever takes the values 0 or 1, so make it
a `no/yes` valued string attribute instead.
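Because `far_branch` now has the value list `no,yes`, genattr produces an enumeration for it, which is why backend C code can write `get_attr_far_branch (insn) == FAR_BRANCH_YES` (as the TBZ pattern in this series does). The sketch below mimics that shape outside GCC; `far_branch_for` is a hypothetical stand-in for the generated accessor and just re-implements the displacement test used by `aarch64_bcond`.

```c
/* Enumeration of the shape genattr emits for
   (define_attr "far_branch" "no,yes" ...).  */
enum attr_far_branch { FAR_BRANCH_NO, FAR_BRANCH_YES };

/* Bounds copied from BRANCH_LEN_N_1MiB and BRANCH_LEN_P_1MiB.  */
#define BRANCH_LEN_P_1MiB 1048572L
#define BRANCH_LEN_N_1MiB (-1048576L)

/* Hypothetical stand-in for get_attr_far_branch on aarch64_bcond.
   DISP is the label address minus the address of the branch.  */
enum attr_far_branch far_branch_for (long disp)
{
  return (disp >= BRANCH_LEN_N_1MiB && disp < BRANCH_LEN_P_1MiB)
         ? FAR_BRANCH_NO : FAR_BRANCH_YES;
}
```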

gcc/ChangeLog:

* config/aarch64/aarch64.md (far_branch): Replace 0/1 with
no/yes.
(aarch64_bcond): Handle rename.
(aarch64_cb1): Likewise.
(*cb1): Likewise.
(@aarch64_tb): Likewise.
---
 gcc/config/aarch64/aarch64.md | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bba3d1c505d..248b0e8644f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -554,16 +554,14 @@ (define_attr "mode_enabled" "false,true"
 ;; Attribute that controls whether an alternative is enabled or not.
 (define_attr "enabled" "no,yes"
   (if_then_else (and (eq_attr "arch_enabled" "yes")
 (eq_attr "mode_enabled" "true"))
(const_string "yes")
(const_string "no")))
 
 ;; Attribute that specifies whether we are dealing with a branch to a
 ;; label that is far away, i.e. further away than the maximum/minimum
 ;; representable in a signed 21-bits number.
-;; 0 :=: no
-;; 1 :=: yes
-(define_attr "far_branch" "" (const_int 0))
+(define_attr "far_branch" "no,yes" (const_string "no"))
 
 ;; Attribute that specifies whether the alternative uses MOVPRFX.
 (define_attr "movprfx" "no,yes" (const_string "no"))
@@ -759,45 +757,45 @@ (define_expand "cbranchcc4"
 ;; Emit `B`, assuming that the condition is already in the CC register.
 (define_insn "aarch64_bcond"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (const_int 0)])
   (label_ref (match_operand 2))
   (pc)))]
   ""
   {
 /* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
but the "." is required for SVE conditions.  */
 bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 2, "Lbcond",
 use_dot_p ? "b.%M0\\t" : "b%M0\\t");
 else
   return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
   }
   [(set_attr "type" "branch")
(set (attr "length")
(if_then_else (and (ge (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
(if_then_else (and (ge (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
 ;; and branch sequence from:
 ;; mov x0, #imm1
 ;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
 ;; cmp x1, x0
 ;; b .Label
 ;; into the shorter:
 ;; sub x0, x1, #(CST & 0xfff000)
 ;; subs x0, x0, #(CST & 0x000fff)
 ;; b .Label
@@ -829,77 +827,77 @@ (define_insn_and_split "*aarch64_bcond_wide_imm"
 ;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
 (define_insn "aarch64_cbz1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
(const_int 0))
   (label_ref (match_operand 1))
   (pc)))]
   "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
 else
   return "\\t%0, %l1";
   }
   [(set_attr "type" "branch")
(set (attr "length")
(if_then_else (and (ge (minus (match_dup 1) (pc))
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 1) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
(if_then_else (and (ge (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_N_1MiB))
   (lt (minus (match_dup 2) (pc))
   (const_int BRANCH_LEN_P_1MiB)))
- (const_int 0)
- (const_int 1)))]
+ (const_string "no")
+ (const_string "yes")))]
 )
 
 ;; For an LT/GE comparison against zero, emit `TBZ`/`TBNZ`
 (define_insn "*aarch64_tbz1"
   [(set (pc) (if_then_else (LTGE (match_operand:ALLI 0 "regi

[PATCH 3/9] AArch64: rename branch instruction rules

2025-05-08 Thread Karl Meakin

Give the `define_insn` rules used in lowering `cbranch4` to RTL
more descriptive and consistent names: from now on, each rule is named
after the AArch64 instruction that it generates. Also add comments to
document each rule.

gcc/ChangeLog:

* config/aarch64/aarch64.md (condjump): Rename to ...
(aarch64_bcond): ...here.
(*compare_condjump): Rename to ...
(*aarch64_bcond_wide_imm): ...here.
(restore_stack_nonlocal): Handle rename.
(stack_protect_combined_test): Likewise.
* config/aarch64/aarch64-simd.md (cbranch4): Likewise.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise.
* config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise.
---
 gcc/config/aarch64/aarch64-simd.md |  2 +-
 gcc/config/aarch64/aarch64-sme.md  |  2 +-
 gcc/config/aarch64/aarch64.cc  |  4 ++--
 gcc/config/aarch64/aarch64.md  | 21 -
 4 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e2afe87e513..197a5f65f34 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3913,41 +3913,41 @@ (define_expand "vcond_mask_"
 (define_expand "cbranch4"
   [(set (pc)
 (if_then_else
   (match_operator 0 "aarch64_equality_operator"
 [(match_operand:VDQ_I 1 "register_operand")
  (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
   (label_ref (match_operand 3 ""))
   (pc)))]
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
   rtx tmp = operands[1];
 
   /* If comparing against a non-zero vector we have to do a comparison first
  so we can have a != 0 comparison with the result.  */
   if (operands[2] != CONST0_RTX (mode))
 {
   tmp = gen_reg_rtx (mode);
   emit_insn (gen_xor3 (tmp, operands[1], operands[2]));
 }
 
   /* For 64-bit vectors we need no reductions.  */
   if (known_eq (128, GET_MODE_BITSIZE (mode)))
 {
   /* Always reduce using a V4SI.  */
   rtx reduc = gen_lowpart (V4SImode, tmp);
   rtx res = gen_reg_rtx (V4SImode);
   emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
   emit_move_insn (tmp, gen_lowpart (mode, res));
 }
 
   rtx val = gen_reg_rtx (DImode);
   emit_move_insn (val, gen_lowpart (DImode, tmp));
 
   rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
   rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  emit_jump_insn (gen_aarch64_bcond (cmp_rtx, cc_reg, operands[3]));
   DONE;
 })
 
 ;; Patterns comparing two vectors to produce a mask.
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
index c49affd0dd3..4e4ac71c5a3 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -366,42 +366,42 @@ (define_insn "aarch64_tpidr2_restore"
 ;; Check whether a lazy save set up by aarch64_save_za was committed
 ;; and restore the saved contents if so.
 ;;
 ;; Operand 0 is the address of the current function's TPIDR2 block.
 (define_insn_and_split "aarch64_restore_za"
   [(set (reg:DI ZA_SAVED_REGNUM)
(unspec:DI [(match_operand 0 "pmode_register_operand" "r")
(reg:DI SME_STATE_REGNUM)
(reg:DI TPIDR2_SETUP_REGNUM)
(reg:DI ZA_SAVED_REGNUM)] UNSPEC_RESTORE_ZA))
(clobber (reg:DI R0_REGNUM))
(clobber (reg:DI R14_REGNUM))
(clobber (reg:DI R15_REGNUM))
(clobber (reg:DI R16_REGNUM))
(clobber (reg:DI R17_REGNUM))
(clobber (reg:DI R18_REGNUM))
(clobber (reg:DI R30_REGNUM))
(clobber (reg:CC CC_REGNUM))]
   ""
   "#"
   "&& epilogue_completed"
   [(const_int 0)]
   {
 auto label = gen_label_rtx ();
 auto tpidr2 = gen_rtx_REG (DImode, R16_REGNUM);
 emit_insn (gen_aarch64_read_tpidr2 (tpidr2));
-auto jump = emit_likely_jump_insn (gen_aarch64_cbnedi1 (tpidr2, label));
+auto jump = emit_likely_jump_insn (gen_aarch64_cbznedi1 (tpidr2, label));
 JUMP_LABEL (jump) = label;
 
 aarch64_restore_za (operands[0]);
 emit_label (label);
 DONE;
   }
 )
 
 ;; This instruction is emitted after asms that alter ZA, in order to model
 ;; the effect on dataflow.  The asm itself can't have ZA as an input or
 ;; an output, since there is no associated data type.  Instead it retains
 ;; the original "za" clobber, which on its own would indicate that ZA
 ;; is dead.
 ;;
 ;; The operand is a unique identifier.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index fff8d9da49d..b5ac6d3f37e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2872,44 +2872,44 @@ static rtx
 aarch64_gen_test_and_branch (rtx_code code, rtx x, int bitnum,
 rtx_code_label *label)
 {
   auto mode = GET_MODE (x);
   if (aarch64_track_speculation)
 {
   auto

[PATCH 4/9] AArch64: add constants for branch displacements

2025-05-08 Thread Karl Meakin

Extract the hardcoded values for the maximum positive and negative
PC-relative branch displacements into named constants and document them.
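The values of these constants follow from the A64 encodings: B and BL hold a signed 26-bit word offset, B.cond and CBZ/CBNZ a signed 19-bit one, and TBZ/TBNZ a signed 14-bit one. The small sketch below derives the byte bounds from the field width; `max_disp` and `min_disp` are hypothetical helper names.

```c
#include <stdint.h>

/* A signed N-bit field counting 4-byte words covers byte displacements
   in [-2^(N-1) * 4, (2^(N-1) - 1) * 4].  */
int64_t max_disp (int imm_bits)
{
  return ((INT64_C (1) << (imm_bits - 1)) - 1) * 4;
}

int64_t min_disp (int imm_bits)
{
  return -(INT64_C (1) << (imm_bits - 1)) * 4;
}
```

For example, `max_disp (19)` and `min_disp (19)` give 1048572 and -1048576, the `BRANCH_LEN_*_1MiB` values. Note that the `length` attributes compare with `lt` against the positive bound, so a branch of exactly the maximum displacement is conservatively treated as far.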

gcc/ChangeLog:

* config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant.
(BRANCH_LEN_N_128MiB): Likewise.
(BRANCH_LEN_P_1MiB): Likewise.
(BRANCH_LEN_N_1MiB): Likewise.
(BRANCH_LEN_P_32KiB): Likewise.
(BRANCH_LEN_N_32KiB): Likewise.
---
 gcc/config/aarch64/aarch64.md | 64 ++-
 1 file changed, 48 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1b1e982d466..bba3d1c505d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -692,12 +692,28 @@ (define_insn "indirect_jump"
 (define_insn "jump"
   [(set (pc) (label_ref (match_operand 0 "" "")))]
   ""
   "b\\t%l0"
   [(set_attr "type" "branch")]
 )
 
+;; Maximum PC-relative positive/negative displacements for various branching
+;; instructions.
+(define_constants
+  [
+;; +/- 128MiB.  Used by B, BL.
+(BRANCH_LEN_P_128MiB  134217724)
+(BRANCH_LEN_N_128MiB -134217728)
+
+;; +/- 1MiB.  Used by B., CBZ, CBNZ.
+(BRANCH_LEN_P_1MiB  1048572)
+(BRANCH_LEN_N_1MiB -1048576)
 
+;; +/- 32KiB.  Used by TBZ, TBNZ.
+(BRANCH_LEN_P_32KiB  32764)
+(BRANCH_LEN_N_32KiB -32768)
+  ]
+)
 
 ;; ---
 ;; Conditional jumps
 ;; ---
@@ -743,41 +759,45 @@ (define_expand "cbranchcc4"
 ;; Emit `B`, assuming that the condition is already in the CC register.
 (define_insn "aarch64_bcond"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand 1 "cc_register")
 (const_int 0)])
   (label_ref (match_operand 2))
   (pc)))]
   ""
   {
 /* GCC's traditional style has been to use "beq" instead of "b.eq", etc.,
but the "." is required for SVE conditions.  */
 bool use_dot_p = GET_MODE (operands[1]) == CC_NZCmode;
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 2, "Lbcond",
 use_dot_p ? "b.%M0\\t" : "b%M0\\t");
 else
   return use_dot_p ? "b.%m0\\t%l2" : "b%m0\\t%l2";
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 2) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 0)
  (const_int 1)))]
 )
 
 ;; For a 24-bit immediate CST we can optimize the compare for equality
 ;; and branch sequence from:
 ;; mov x0, #imm1
 ;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
 ;; cmp x1, x0
 ;; b .Label
 ;; into the shorter:
 ;; sub x0, x1, #(CST & 0xfff000)
 ;; subs x0, x0, #(CST & 0x000fff)
 ;; b .Label
@@ -809,69 +829,77 @@ (define_insn_and_split "*aarch64_bcond_wide_imm"
 ;; For an EQ/NE comparison against zero, emit `CBZ`/`CBNZ`
 (define_insn "aarch64_cbz1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
(const_int 0))
   (label_ref (match_operand 1))
   (pc)))]
   "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
 else
   return "\\t%0, %l1";
   }
   [(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 1) (pc)) (const_int 1048572)))
+   (if_then_else (and (ge (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_N_1MiB))
+  (lt (minus (match_dup 1) (pc))
+  (const_int BRANCH_LEN_P_1MiB)))
  (const_int 4)
  (const_int 8)))
(set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-
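The "length" and "far_branch" attributes above pick the short or far form of a branch by testing the label displacement against the new BRANCH_LEN_* constants. A minimal C sketch of that test follows, assuming the far form of TBZ/TBNZ is 8 bytes like the B.cond and CBZ/CBNZ far forms shown above; the function names are hypothetical and not part of GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Displacement limits, copied from the BRANCH_LEN_* constants in the patch.  */
#define BRANCH_LEN_P_1MiB   1048572
#define BRANCH_LEN_N_1MiB   (-1048576)
#define BRANCH_LEN_P_32KiB  32764
#define BRANCH_LEN_N_32KiB  (-32768)

/* Same test as the (and (ge ...) (lt ...)) in the "length" and "far_branch"
   attributes: DISP is the label address minus the branch's own address.  */
static bool
branch_in_range (int64_t disp, int64_t neg_limit, int64_t pos_limit)
{
  return disp >= neg_limit && disp < pos_limit;
}

/* B.cond, CBZ and CBNZ: 4 bytes when the target is within +/- 1MiB,
   otherwise 8 bytes for the far form.  */
static int
bcond_length (int64_t disp)
{
  return branch_in_range (disp, BRANCH_LEN_N_1MiB, BRANCH_LEN_P_1MiB) ? 4 : 8;
}

/* TBZ and TBNZ: same idea, but only +/- 32KiB of reach.  */
static int
tbz_length (int64_t disp)
{
  return branch_in_range (disp, BRANCH_LEN_N_32KiB, BRANCH_LEN_P_32KiB) ? 4 : 8;
}
```

Because the upper bound uses "lt", a displacement of exactly BRANCH_LEN_P_1MiB (or BRANCH_LEN_P_32KiB) is treated as out of range, both here and in the patterns.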

[PATCH 6/9] AArch64: recognize `+cmpbr` option

2025-05-08 Thread Karl Meakin

Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
extension.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (cmpbr): New
option.
* config/aarch64/aarch64.h (TARGET_CMPBR): New macro.
* doc/invoke.texi (cmpbr): New option.
---
 gcc/config/aarch64/aarch64-option-extensions.def | 2 ++
 gcc/config/aarch64/aarch64.h | 3 +++
 gcc/doc/invoke.texi  | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index dbbb021f05a..1c3e69799f5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -249,6 +249,8 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "mops")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+AARCH64_OPT_EXTENSION("cmpbr", CMPBR, (), (), (), "cmpbr")
+
 AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")
 
 AARCH64_OPT_EXTENSION("d128", D128, (LSE128), (), (), "d128")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e8bd8c73c12..d5c4a42e96d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -202,326 +202,329 @@ constexpr auto AARCH64_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
   = AARCH64_ISA_MODE_SM_OFF;
 constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
   = aarch64_feature_flags (AARCH64_DEFAULT_ISA_MODE);
 
 #endif
 
 /* Macros to test ISA flags.
 
There is intentionally no macro for AARCH64_FL_CRYPTO, since this flag bit
is not always set when its constituent features are present.
Check (TARGET_AES && TARGET_SHA2) instead.  */
 
 #define AARCH64_HAVE_ISA(X) (bool (aarch64_isa_flags & AARCH64_FL_##X))
 
 #define AARCH64_ISA_MODE ((aarch64_isa_flags & AARCH64_FL_ISA_MODES).val[0])
 
 /* The current function is a normal non-streaming function.  */
 #define TARGET_NON_STREAMING AARCH64_HAVE_ISA (SM_OFF)
 
 /* The current function has a streaming body.  */
 #define TARGET_STREAMING AARCH64_HAVE_ISA (SM_ON)
 
 /* The current function has a streaming-compatible body.  */
 #define TARGET_STREAMING_COMPATIBLE \
   ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0)
 
 /* PSTATE.ZA is enabled in the current function body.  */
 #define TARGET_ZA AARCH64_HAVE_ISA (ZA_ON)
 
 /* AdvSIMD is supported in the default configuration, unless disabled by
-mgeneral-regs-only or by the +nosimd extension.  The set of available
instructions is then subdivided into:
 
- the "base" set, available both in SME streaming mode and in
  non-streaming mode
 
- the full set, available only in non-streaming mode.  */
 #define TARGET_BASE_SIMD AARCH64_HAVE_ISA (SIMD)
 #define TARGET_SIMD (TARGET_BASE_SIMD && TARGET_NON_STREAMING)
 #define TARGET_FLOAT AARCH64_HAVE_ISA (FP)
 
 /* AARCH64_FL options necessary for system register implementation.  */
 
 /* Define AARCH64_FL aliases for architectural features which are protected
by -march flags in binutils but which receive no special treatment by GCC.
 
Such flags are inherited from the Binutils definition of system registers
and are mapped to the architecture in which the feature is implemented.  */
 #define AARCH64_FL_RAS      AARCH64_FL_V8A
 #define AARCH64_FL_LOR      AARCH64_FL_V8_1A
 #define AARCH64_FL_PAN      AARCH64_FL_V8_1A
 #define AARCH64_FL_AMU      AARCH64_FL_V8_4A
 #define AARCH64_FL_SCXTNUM  AARCH64_FL_V8_5A
 #define AARCH64_FL_ID_PFR2  AARCH64_FL_V8_5A
 
 /* Armv8.9-A extension feature bits defined in Binutils but absent from GCC,
aliased to their base architecture.  */
 #define AARCH64_FL_AIE          AARCH64_FL_V8_9A
 #define AARCH64_FL_DEBUGv8p9    AARCH64_FL_V8_9A
 #define AARCH64_FL_FGT2         AARCH64_FL_V8_9A
 #define AARCH64_FL_ITE          AARCH64_FL_V8_9A
 #define AARCH64_FL_PFAR         AARCH64_FL_V8_9A
 #define AARCH64_FL_PMUv3_ICNTR  AARCH64_FL_V8_9A
 #define AARCH64_FL_PMUv3_SS     AARCH64_FL_V8_9A
 #define AARCH64_FL_PMUv3p9      AARCH64_FL_V8_9A
 #define AARCH64_FL_RASv2        AARCH64_FL_V8_9A
 #define AARCH64_FL_S1PIE        AARCH64_FL_V8_9A
 #define AARCH64_FL_S1POE        AARCH64_FL_V8_9A
 #define AARCH64_FL_S2PIE        AARCH64_FL_V8_9A
 #define AARCH64_FL_S2POE        AARCH64_FL_V8_9A
 #define AARCH64_FL_SCTLR2       AARCH64_FL_V8_9A
 #define AARCH64_FL_SEBEP        AARCH64_FL_V8_9A
 #define AARCH64_FL_SPE_FDS      AARCH64_FL_V8_9A
 #define AARCH64_FL_TCR2         AARCH64_FL_V8_9A
 
 #define TARGET_V8R AARCH64_HAVE_ISA (V8R)
 #define TARGET_V9A AARCH64_HAVE_ISA (V9A)
 
 
 /* SHA2 is an optional extension to AdvSIMD.  */
 #define TARGET_SHA2 AARCH64_HAVE_ISA (SHA2)
 
 /* SHA3 is an optional extension to AdvSIMD.  */
 #define TARGET_SHA3 AARCH64_HAVE_ISA (SHA3)
 
 /* AES is an optional extension to AdvSIMD.  */
 #define TARGET_AES AARCH64_HAVE_ISA (AES)
 
 /* SM is an optiona
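The TARGET_* macros in aarch64.h are thin wrappers around AARCH64_HAVE_ISA, which token-pastes the feature name onto AARCH64_FL_. The sketch below models that pattern with a plain 32-bit mask, assuming TARGET_CMPBR follows the same shape as the other TARGET_* macros (its definition is cut off in this copy of the patch). The flag values are hypothetical; GCC's real aarch64_feature_flags type is generated from aarch64-option-extensions.def and is not a simple integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only.  */
enum
{
  AARCH64_FL_SIMD   = 1 << 0,
  AARCH64_FL_SM_OFF = 1 << 1,
  AARCH64_FL_SM_ON  = 1 << 2,
  AARCH64_FL_CMPBR  = 1 << 3
};

static uint32_t aarch64_isa_flags;

/* AARCH64_HAVE_ISA (CMPBR) expands to a test of AARCH64_FL_CMPBR.  Note that
   a function-like macro needs no space between its name and "(".  */
#define AARCH64_HAVE_ISA(X) ((aarch64_isa_flags & AARCH64_FL_##X) != 0)

#define TARGET_CMPBR         AARCH64_HAVE_ISA (CMPBR)
#define TARGET_NON_STREAMING AARCH64_HAVE_ISA (SM_OFF)
#define TARGET_BASE_SIMD     AARCH64_HAVE_ISA (SIMD)
/* The full AdvSIMD set needs both the feature and non-streaming mode.  */
#define TARGET_SIMD          (TARGET_BASE_SIMD && TARGET_NON_STREAMING)
```

Enabling +cmpbr would then just set one more bit that patterns can test in their conditions.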

Re: [PATCH v2] RISC-V: Fix missing implied Zicsr from Zve32x

2025-05-08 Thread Nelson Chu

I think this should be sent to gcc-patches@gcc.gnu.org rather than
binut...@sourceware.org, so redirect it to the right place.

Nelson

On Wed, Apr 30, 2025 at 10:30 AM Jerry Zhang Jian <
jerry.zhangj...@sifive.com> wrote:

> The Zve32x extension depends on the Zicsr extension.
> Currently, enabling Zve32x alone does not automatically imply Zicsr in GCC.
>
> gcc/ChangeLog:
> * common/config/riscv/riscv-common.cc: Add Zicsr as an implied extension of Zve32x.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/predef-19.c: Set -march to rv64i_zve32x
>   instead of rv64gc_zve32x so that Zicsr is not implied by G, and add -c
>   to avoid failures on multilibs that are not available at test time.
>
> Signed-off-by: Jerry Zhang Jian 
> ---
>  gcc/common/config/riscv/riscv-common.cc|  1 +
>  gcc/testsuite/gcc.target/riscv/predef-19.c | 34 ++
>  2 files changed, 4 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc
> b/gcc/common/config/riscv/riscv-common.cc
> index 15df22d5377..145a0f2bd95 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -137,6 +137,7 @@ static const riscv_implied_info_t riscv_implied_info[]
> =
>{"zve64f", "f"},
>{"zve64d", "d"},
>
> +  {"zve32x", "zicsr"},
>{"zve32x", "zvl32b"},
>{"zve32f", "zve32x"},
>{"zve32f", "zvl32b"},
> diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c
> b/gcc/testsuite/gcc.target/riscv/predef-19.c
> index 2b90702192b..d1d44fec577 100644
> --- a/gcc/testsuite/gcc.target/riscv/predef-19.c
> +++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -march=rv64gc_zve32x -mabi=lp64d -mcmodel=medlow
> -misa-spec=2.2" } */
> +/* { dg-options "-O2 -march=rv64i_zve32x -mabi=lp64 -c -mcmodel=medlow
> -misa-spec=2.2" } */
>
>  int main () {
>
> @@ -15,40 +15,12 @@ int main () {
>  #error "__riscv_i"
>  #endif
>
> -#if !defined(__riscv_c)
> -#error "__riscv_c"
> -#endif
> -
>  #if defined(__riscv_e)
>  #error "__riscv_e"
>  #endif
>
> -#if !defined(__riscv_a)
> -#error "__riscv_a"
> -#endif
> -
> -#if !defined(__riscv_m)
> -#error "__riscv_m"
> -#endif
> -
> -#if !defined(__riscv_f)
> -#error "__riscv_f"
> -#endif
> -
> -#if !defined(__riscv_d)
> -#error "__riscv_d"
> -#endif
> -
> -#if defined(__riscv_v)
> -#error "__riscv_v"
> -#endif
> -
> -#if defined(__riscv_zvl128b)
> -#error "__riscv_zvl128b"
> -#endif
> -
> -#if defined(__riscv_zvl64b)
> -#error "__riscv_zvl64b"
> +#if !defined(__riscv_zicsr)
> +#error "__riscv_zicsr"
>  #endif
>
>  #if !defined(__riscv_zvl32b)
> --
> 2.49.0
>
>
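The riscv_implied_info[] table is a list of (extension, implied extension) pairs, and enabling one extension must pull in everything it implies, transitively. The following is a simplified fixed-point sketch of that expansion using a few of the pairs from the hunk above. It is not GCC's code (GCC walks the table from its subset-list handling); the names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A few pairs in the style of riscv_implied_info[] (illustrative subset).  */
struct implied_info { const char *ext; const char *implied; };

static const struct implied_info implied_table[] = {
  {"zve32x", "zicsr"},   /* the pair added by this patch */
  {"zve32x", "zvl32b"},
  {"zve32f", "zve32x"},
  {"zve32f", "zvl32b"},
};

#define MAX_EXTS 16

static bool
has_ext (const char *const *set, size_t n, const char *ext)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (set[i], ext) == 0)
      return true;
  return false;
}

/* Keep adding implied extensions until nothing changes, so that
   zve32f -> zve32x -> zicsr is followed.  Returns the new count.  */
static size_t
expand_implied (const char **set, size_t n)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t i = 0; i < sizeof implied_table / sizeof implied_table[0]; i++)
        if (has_ext (set, n, implied_table[i].ext)
            && !has_ext (set, n, implied_table[i].implied)
            && n < MAX_EXTS)
          {
            set[n++] = implied_table[i].implied;
            changed = true;
          }
    }
  return n;
}
```

With the new pair, rv64i_zve32x expands to include zicsr, which is what the updated predef-19.c checks via __riscv_zicsr.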

[AUTOFDO] Fix annotated profile for de-duplicated call

2025-05-08 Thread Kugan Vivekanandarajah

This patch fixes incorrect profile annotation when a call statement is
de-duplicated, i.e., when the same statement can be executed from more
than one path (by jumping to the same statement). The profile recorded
for that statement then covers multiple paths, which makes the annotated
profile wrong. As a fix, we don't annotate the profile from GIMPLE_CALL
statements and instead derive BB counts from edge counts.

Regression tested on aarch64-linux-gnu with no new regressions.
An autoprofiledbootstrap with the relevant patch also completed successfully.

Is this OK for trunk?
Thanks,
Kugan



0001-AUTOFDO-Fix-annotated-profile-for-de-duplicated-call.patch
Description: 0001-AUTOFDO-Fix-annotated-profile-for-de-duplicated-call.patch
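The patch itself is an attachment not reproduced here. As an illustration of the general idea only, deriving a block's count from edge counts relies on flow conservation: a block runs as often as control enters it. The struct and function names below are hypothetical, and the choice of incoming (rather than outgoing) edges is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CFG types for illustration only.  */
struct edge_count { int64_t count; };

struct block
{
  const struct edge_count *preds;   /* incoming edges */
  size_t n_preds;
  int64_t count;                    /* -1: not annotated from samples */
};

/* When the block has no trustworthy sample count (for example, its only
   sampled statement is a de-duplicated call), fall back to the sum of its
   incoming edge counts.  */
static int64_t
block_count_from_edges (const struct block *bb)
{
  if (bb->count >= 0)
    return bb->count;
  int64_t sum = 0;
  for (size_t i = 0; i < bb->n_preds; i++)
    sum += bb->preds[i].count;
  return sum;
}
```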

[AUTOFDO] Merge profiles of clones before annotating

2025-05-08 Thread Kugan Vivekanandarajah

This patch adds support for merging profiles from multiple clones.
When optimized binaries contain clones, such as IPA-CP or SRA clones,
the generated gcov data profiles each clone separately.
Currently we pick one and ignore the rest; this patch fixes that by
merging the profiles.


Regression tested on aarch64-linux-gnu with no new regressions.
An autoprofiledbootstrap with the relevant patch also completed successfully.

Is this OK for trunk?
Thanks,
Kugan



0002-AUTOFDO-Merge-profiles-of-clones-before-annotating.patch
Description: 0002-AUTOFDO-Merge-profiles-of-clones-before-annotating.patch
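Again the patch is an attachment, so this is only a sketch of the idea under stated assumptions: clones are recognized by stripping a suffix such as ".constprop.0" or ".isra.0" (here, naively, everything from the first '.'), and the per-clone counts are summed rather than keeping just one. The types and names are hypothetical; real AutoFDO profiles carry far more than a head count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fn_profile { const char *name; int64_t head_count; };

/* Length of NAME without a clone suffix; assumes the suffix starts at the
   first '.', which is a simplification.  */
static size_t
base_name_len (const char *name)
{
  const char *dot = strchr (name, '.');
  return dot ? (size_t) (dot - name) : strlen (name);
}

/* Sum the head counts of every profile whose base name is BASE, instead of
   picking one clone and ignoring the rest.  */
static int64_t
merged_head_count (const struct fn_profile *p, size_t n, const char *base)
{
  size_t len = strlen (base);
  int64_t total = 0;
  for (size_t i = 0; i < n; i++)
    if (base_name_len (p[i].name) == len
        && strncmp (p[i].name, base, len) == 0)
      total += p[i].head_count;
  return total;
}
```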

[to-be-committed][V2][RISC-V] Synthesize more efficient IOR/XOR sequences

2025-05-08 Thread Jeff Law

Bah!  I hand-edited the patch to fix some missing HOST_WIDE_INT_UC 
macros I saw and botched it.  While I was at it, I fixed various lint 
issues.  No functional changes though.


--

So mvconst_internal's primary benefit is in constant synthesis not 
impacting the combine budget in terms of the number of instructions it 
is willing to combine together at any given time.  The downside is 
mvconst_internal breaks combine's toplevel costing model and as a result 
many other patterns have to be implemented as define_insn_and_splits 
rather than the often more natural define_splits.


This primarily impacts logical operations where we want to see the 
constant operand and potentially simplify the logical with other nearby 
logicals or shifts.


We can reduce our reliance on mvconst_internal and generate better code 
for various cases by generating better initial code for logical operations.


So let's assume we have an inclusive-or of a register with a nontrivial 
constant.  Right now we will load the nontrivial constant into a new 
pseudo (using multiple instructions), then emit a two register source 
ior operation.


For some cases we can just generate the code we want at expansion time. 
Concretely let's take this testcase:



> unsigned long foo(unsigned long src) { return src | 0x8800000000000007; }

Right now we generate this code:

> li      a5,-15
> slli    a5,a5,59
> addi    a5,a5,7
> or      a0,a0,a5

The first three instructions are synthesizing the constant.  The last 
instruction performs the desired operation.  But we can do better:


> ori a0,a0,7
> bseti   a0,a0,59
> bseti   a0,a0,63

Notice how we never even bother to synthesize the constant.

IOR/XOR are pretty simple and this patch focuses exclusively on those. 
We use [x]ori to set whatever low 11 bits we need, then bset/binv for a 
small number of higher bits.  We use the cost of constant synthesis as 
our budget.
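The basic IOR/XOR strategy described above can be modelled as follows: bits 0-10 go in a single ori/xori (they fit the 12-bit signed immediate without sign extension), and each higher set bit gets one bseti/binvi. This sketch counts instructions, whereas the patch compares against a synthesis cost; here the three-instruction li/slli/addi sequence serves as the budget. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum op_kind { OP_ORI, OP_XORI, OP_BSETI, OP_BINVI };
struct insn { enum op_kind kind; unsigned imm; };

/* Plan "x | c" (or "x ^ c" when IS_XOR) without materializing C.
   Returns the instruction count, or -1 if it exceeds BUDGET or MAX_LEN.  */
static int
plan_ior_xor (uint64_t c, bool is_xor, int budget, struct insn *seq,
              int max_len)
{
  uint64_t low = c & 0x7ff;                 /* bits 0-10 */
  uint64_t high = c & ~(uint64_t) 0x7ff;    /* bits 11-63 */
  int needed = low != 0;
  for (uint64_t h = high; h != 0; h &= h - 1)
    needed++;                               /* one bseti/binvi per set bit */
  if (needed > budget || needed > max_len)
    return -1;

  int n = 0;
  if (low != 0)
    seq[n++] = (struct insn) { is_xor ? OP_XORI : OP_ORI, (unsigned) low };
  for (unsigned bit = 11; bit < 64; bit++)
    if ((high >> bit) & 1)
      seq[n++] = (struct insn) { is_xor ? OP_BINVI : OP_BSETI, bit };
  return n;
}

/* Execute a planned sequence on X, to check it against x | c or x ^ c.  */
static uint64_t
run_seq (uint64_t x, const struct insn *seq, int n)
{
  for (int i = 0; i < n; i++)
    switch (seq[i].kind)
      {
      case OP_ORI:   x |= seq[i].imm; break;
      case OP_XORI:  x ^= seq[i].imm; break;
      case OP_BSETI: x |= (uint64_t) 1 << seq[i].imm; break;
      case OP_BINVI: x ^= (uint64_t) 1 << seq[i].imm; break;
      }
  return x;
}
```

For 0x8800000000000007 this yields ori 7, bseti 59, bseti 63, matching the sequence in the email.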


We also support a couple special cases.  First, we might be able to 
rotate the source value such that all the bits we want to manipulate are 
in the low 11 bits.  So we rotate the source, manipulate the bits, then 
rotate things back to where they belong.  I didn't see this trigger in 
SPEC, but I did trivially find a testcase where it was likely faster.
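The rotation special case rests on the identity rotr(rotl(x, r) | rotl(c, r), r) == x | c: if some rotation moves every set bit of C into bits 0-10, one ori suffices between two rotates (rori with 64 - r rotates left by r). A minimal sketch, with hypothetical names, of finding such a rotation and checking the identity:

```c
#include <stdint.h>

static uint64_t
rotl64 (uint64_t x, unsigned r)
{
  r &= 63;
  return r ? (x << r) | (x >> (64 - r)) : x;
}

static uint64_t
rotr64 (uint64_t x, unsigned r)
{
  return rotl64 (x, (64 - (r & 63)) & 63);
}

/* Smallest left-rotation that moves every set bit of C into bits 0-10,
   or -1 if no rotation does.  */
static int
find_rotation (uint64_t c)
{
  for (unsigned r = 0; r < 64; r++)
    if ((rotl64 (c, r) & ~(uint64_t) 0x7ff) == 0)
      return (int) r;
  return -1;
}

/* Models: rori t,src,(64-r); ori t,t,rotl(c,r); rori dst,t,r.
   Rotating both the source and the constant leaves the OR intact.  */
static uint64_t
ior_via_rotate (uint64_t src, uint64_t c, unsigned r)
{
  uint64_t t = rotl64 (src, r);
  t |= rotl64 (c, r);
  return rotr64 (t, r);
}
```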


Second, we can have cases where we want to invert most of the bits, but 
a small number are supposed to be preserved.  We can pre-flip the bits 
we want to preserve with binv, then invert the whole register with not 
(which puts the bits to be preserved back in their original state).
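The inversion trick relies on x ^ c == ~(x ^ ~c). When C inverts almost every bit, ~C has only a few set bits, so flipping those with binvi and then applying a single not gives the result. A sketch with a hypothetical function name, which also reports the instruction count:

```c
#include <stdint.h>

/* Compute x ^ c as: binvi for each bit set in ~c, then not.  */
static uint64_t
xor_via_not (uint64_t x, uint64_t c, int *n_insns)
{
  uint64_t keep = ~c;               /* bits that x ^ c must leave unchanged */
  int n = 0;
  for (unsigned bit = 0; bit < 64; bit++)
    if ((keep >> bit) & 1)
      {
        x ^= (uint64_t) 1 << bit;   /* binvi: pre-flip a preserved bit */
        n++;
      }
  *n_insns = n + 1;                 /* plus the final not */
  return ~x;                        /* not: restores the pre-flipped bits */
}
```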


I suspect there are likely a few more cases that could be improved, but 
the patch should stand on its own now and getting it out of the way 
allows us to focus on logical AND which is far tougher, but also more 
important in the task of removing mvconst_internal.


As we're not removing mvconst_internal yet, this patch is mostly a nop. 
I did look at SPEC before/after and didn't see anything particularly 
interesting.  I also temporarily removed mvconst_internal and looked at 
SPEC before/after to hopefully ensure we weren't missing anything 
obvious in the XOR/IOR cases.  Obviously that latter test showed all 
kinds of regressions with AND.


We're still working through implementation details on the AND case and 
determining what bridge patterns we're going to need to ensure we don't 
regress.   But this XOR/IOR patch is in good enough shape that it can go 
forward now.



Naturally this has been run through my tester (bootstrap & regression 
test is in flight, but won't finish for many more hours).  Obviously I'm 
quite interested in anything spit out by the pre-commit CI system.



Jeff
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 214c20ba7b8..584b345f02c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -262,6 +262,9 @@ (define_code_iterator fix_ops [fix unsigned_fix])
 
 (define_code_attr fix_uns [(fix "fix") (unsigned_fix "fixuns")])
 
+(define_code_attr OPTAB [(ior "IOR")
+ (xor "XOR")])
+
 
 ;; ---
 ;; Code Attributes
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index c9a638cd103..23690792b32 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -380,14 +380,6 @@ (define_predicate "single_bit_mask_operand"
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (UINTVAL (op))")))
 
-;; Register, small constant or single bit constant for use in
-;; bseti/binvi.
-(define_predicate "arith_or_zbs_operand"
-  (ior (match_operand 0 "const_arith_operand")
-   (match_operand 0 "register_operand")
-   (and (match_test "TARGET_ZBS")
-   (match_operand 0 "single_bit_mask_operand"
-
 (define_predicate "not_single_bit_mask_operand"
   (and (match_code "const_int")
(match_test "S

PR target/108958 -- use mtvsrdd to zero extend GPR DImode to VSX TImode

2025-05-08 Thread Michael Meissner

This is an old patch that has been submitted off and on, and I'm resubmitting
it again.

Previously GCC would zero extend a DImode GPR value to TImode by first zero
extending the DImode value into a GPR TImode value, and then doing an MTVSRDD to
move this value to a VSX register.

This patch does the move directly, since when the middle operand of MTVSRDD is
0, the instruction performs the zero extension.

If the DImode value is already in a vector register, it does an XXSPLTIB and an
XXPERMDI to get the value into the bottom 64 bits of the register.
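The GPR and MTVSRDD cases can be modelled in a few lines of C. This is a sketch, not the patch: the struct names are hypothetical, and the MTVSRDD model assumes the ISA behaviour the patch relies on, namely that an RA field of 0 supplies zero for the high doubleword while RB supplies the low doubleword.

```c
#include <stdint.h>

/* TImode value held in a GPR pair: reg[0] is register r, reg[1] is r + 1.  */
struct ti_in_gprs { uint64_t reg[2]; };

/* GPR alternative of the split: the DImode value goes into r + lo and zero
   into r + hi, with lo = BYTES_BIG_ENDIAN ? 1 : 0 as in the patch.  */
static struct ti_in_gprs
zero_extend_di_to_ti_gprs (uint64_t x, int bytes_big_endian)
{
  struct ti_in_gprs t;
  int lo = bytes_big_endian ? 1 : 0;
  int hi = 1 - lo;
  t.reg[lo] = x;
  t.reg[hi] = 0;
  return t;
}

/* 128-bit VSX register split into its high and low doublewords.  */
struct vsx128 { uint64_t hi, lo; };

/* "mtvsrdd xT,0,rB": zero in the high doubleword, rB in the low one,
   which is exactly a DImode -> TImode zero extension.  */
static struct vsx128
mtvsrdd_ra0 (uint64_t rb)
{
  struct vsx128 v = { 0, rb };
  return v;
}
```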

I have built GCC with the patches in this patch set applied on both little and
big endian PowerPC systems and there were no regressions.  Can I apply this
patch to GCC 15?

2025-04-30  Michael Meissner  

gcc/

PR target/108958
* config/rs6000/rs6000.md (zero_extendditi2): New insn.

gcc/testsuite/

PR target/108958
* gcc.target/powerpc/pr108958.c: New test.
---
 gcc/config/rs6000/rs6000.md | 46 +
 gcc/testsuite/gcc.target/powerpc/pr108958.c | 27 
 2 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108958.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 02c31b576b6..4c9e2dc6390 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1026,6 +1026,52 @@ (define_insn_and_split "*zero_extendsi2_dot2"
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
 
+(define_insn_and_split "zero_extendditi2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=r,wa,&wa")
+   (zero_extend:TI
+(match_operand:DI 1 "gpc_reg_operand" "rwa,r,wa")))]
+  "TARGET_P9_VECTOR && TARGET_POWERPC64"
+  "@
+  #
+  mtvsrdd %x0,0,%1
+  #"
+  "&& reload_completed
+   && (int_reg_operand (operands[0], TImode)
+   || vsx_register_operand (operands[1], DImode))"
+  [(set (match_dup 2)
+   (match_dup 3))
+   (set (match_dup 4)
+   (match_dup 5))]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  int r = reg_or_subregno (op0);
+
+  if (int_reg_operand (op0, TImode))
+{
+  int lo = BYTES_BIG_ENDIAN ? 1 : 0;
+  int hi = 1 - lo;
+
+  operands[2] = gen_rtx_REG (DImode, r + lo);
+  operands[3] = op1;
+  operands[4] = gen_rtx_REG (DImode, r + hi);
+  operands[5] = const0_rtx;
+}
+  else
+{
+  rtx op0_di = gen_rtx_REG (DImode, r);
+  rtx op0_v2di = gen_rtx_REG (V2DImode, r);
+  rtx lo = WORDS_BIG_ENDIAN ? op1 : op0_di;
+  rtx hi = WORDS_BIG_ENDIAN ? op0_di : op1;
+
+  operands[2] = op0_v2di;
+  operands[3] = CONST0_RTX (V2DImode);
+  operands[4] = op0_v2di;
+  operands[5] = gen_rtx_VEC_CONCAT (V2DImode, hi, lo);
+}
+}
+  [(set_attr "type" "*,mtvsr,vecperm")
+   (set_attr "length" "8,*,8")])
 
 (define_insn "extendqi2"
   [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,?*v")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108958.c 
b/gcc/testsuite/gcc.target/powerpc/pr108958.c
new file mode 100644
index 000..03eb58d069e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108958.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+
+/* PR target/108958, use mtvsrdd to zero extend gpr to vsx register.  */
+
+void
+gpr_to_vsx (unsigned long long x, __uint128_t *p)
+{
+  /* mtvsrdd vsx,0,gpr.  */
+  __uint128_t y = x;
+  __asm__ (" # %x0" : "+wa" (y));
+  *p = y;
+}
+
+void
+gpr_to_gpr (unsigned long long x, __uint128_t *p)
+{
+  /* mr and li.  */
+  __uint128_t y = x;
+  __asm__ (" # %0" : "+r" (y));
+  *p = y;
+}
+
+/* { dg-final { scan-assembler-times {\mli\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mmtvsrdd .*,0,.*\M} 1 } } */
-- 
2.49.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

2025-05-08 Thread Michael Meissner

This patch was previously submitted during the GCC 15 time frame.

The multibuff.c benchmark attached to PR target/117251, which implements SHA3,
is slower when compiled for Power10 with the current trunk and GCC 14 than
with GCC 11 - GCC 13, due to excessive amounts of spilling.

The main function in the multibuff.c file has 3,747 lines, all of which use
vector unsigned long long.  There are 696 vector rotates (all by constant
amounts), 1,824 vector xors and 600 vector andcs.

Looking at it, the main thing that stands out as the reason for the spilling
and the register moves is the support in fusion.md (generated by genfusion.pl)
that tries to fuse a vec_andc feeding into a vec_xor, and a vec_xor feeding
into another vec_xor.

On the powerpc for power10, there is a special fusion mode that happens if the
machine has a VANDC or VXOR instruction that is adjacent to a VXOR instruction
and the VANDC/VXOR feeds into the 2nd VXOR instruction.

While Power10 has 64 vector registers (which use the XXL-prefixed instructions
for logical operations), the fusion only works with the older Altivec
instruction set (which uses the V prefix).  The Altivec instructions can only
access 32 vector registers (which overlay VSX vector registers 32-63).

Because the combiner patterns fuse_vandc_vxor and fuse_vxor_vxor perform this
fusion, the register allocator faces more pressure on the traditional Altivec
registers instead of being able to use the full set of VSX registers.

In addition, the vector rotates only work on the traditional Altivec
registers, which adds to the Altivec register pressure.

Finally, in addition to doing the explicit xor, andc, and rotates using the
Altivec registers, we also have to load vector constants for the rotate
amounts, and these constants are also allocated to Altivec registers.

Current trunk and GCC 12-14 have more vector spills than GCC 11, but GCC 11 has
many more vector moves than the later compilers.  Thus even though it has far
fewer spills, the vector moves are why GCC 11 has the slowest results.

There is an instruction that was added in power10 (XXEVAL) that does provide
fusion between VSX vectors that includes ANDC->XOR and XOR->XOR fusion.

The latency of XXEVAL is slightly more than the fused VANDC/VXOR or VXOR/VXOR,
so I have written the patch to prefer doing the Altivec instructions if they
don't need a temporary register.
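XXEVAL can stand in for both fused pairs because it evaluates an arbitrary three-input bitwise function chosen by an 8-bit truth-table immediate. The model below shows that idea. The exact bit ordering of the immediate is an assumption here (index a*4 + b*2 + c counted from the most significant bit, following IBM bit numbering); check the Power ISA before relying on specific immediate values. The sketch builds and consumes the table with the same convention, so it is internally consistent either way.

```c
#include <stdint.h>

/* Each result bit is the bit of IMM selected by the corresponding bits
   of A, B and C.  */
static uint64_t
eval3 (uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
  uint64_t r = 0;
  for (unsigned bit = 0; bit < 64; bit++)
    {
      unsigned idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1)
                     | ((c >> bit) & 1);
      r |= (uint64_t) ((imm >> (7 - idx)) & 1) << bit;
    }
  return r;
}

/* Build the truth-table immediate for F under the same convention.  */
static uint8_t
truth_table (int (*f) (int, int, int))
{
  uint8_t imm = 0;
  for (unsigned idx = 0; idx < 8; idx++)
    if (f ((idx >> 2) & 1, (idx >> 1) & 1, idx & 1))
      imm |= (uint8_t) (1u << (7 - idx));
  return imm;
}

/* The two fused operations from the text: (a andc b) xor c, a xor b xor c.  */
static int andc_xor (int a, int b, int c) { return (a & !b) ^ c; }
static int xor_xor (int a, int b, int c) { return a ^ b ^ c; }
```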

Here are the results for adding support for XXEVAL for the multibuff.c
benchmark attached to the PR.  Note that this patch essentially recovers the
speed that was lost with GCC 14 and the current trunk:

                                  XXEVAL   Trunk   GCC14   GCC13   GCC12   GCC11
                                  ------   -----   -----   -----   -----   -----
Benchmark time in seconds           5.53    6.15    6.26    5.57    5.61    9.56

Fuse VANDC -> VXOR                   209     600     600     600     600     600
Fuse VXOR -> VXOR                      0     240     240     120     120     120
XXEVAL to fuse ANDC -> XOR           391       0       0       0       0       0
XXEVAL to fuse XOR -> XOR            240       0       0       0       0       0

Spill vector to stack                 78     364     364     172     184     110
Load spilled vector from stack       431     962     962     713     723     166
Vector moves                          10     100     100      70      72   3,055

Vector rotate right                  696     696     696     696     696     696
XXLANDC or VANDC                     209     600     600     600     600     600
XXLXOR or VXOR                       953   1,824   1,824   1,824   1,824   1,825
XXEVAL                               631       0       0       0       0       0

Load vector rotate constants          24      24      24      24      24      24

Here are the results for adding support for XXEVAL for the singlebuff.c
benchmark attached to the PR.  Note that adding XXEVAL greatly speeds up this
particular benchmark:

                                  XXEVAL   Trunk   GCC14   GCC13   GCC12   GCC11
                                  ------   -----   -----   -----   -----   -----
Benchmark time in seconds           4.46    5.40    5.40    5.35    5.36    7.54

Fuse VANDC -> VXOR                   210     600     600     600     600     600
Fuse VXOR -> VXOR                      0     240     240     120     120     120
XXEVAL to fuse ANDC -> XOR           390       0       0       0       0       0
XXEVAL to fuse XOR -> XOR            240       0       0       0       0       0

Spill vector to stack                113     379     379     382     382      63
Load spilled vector from stack       333     796     796     757     757      68
Vector moves                          34      80      80     119     119   2,409

Vector rotate right                  696     696     696     696     696     696
XXLANDC or VANDC                     210     600     600     600     600     600
XXLXOR or VXOR   9541,824   1,

[pushed] c++: adjust PR99599/CWG2369 workaround

2025-05-08 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

This tweak to CWG2369 has gotten more discussion lately in CWG, including in
P3606.  In those discussions, it occurred to me that having the check depend
on whether a class has been instantiated yet is unstable, and that it should
only check for user-defined conversions.

Also, one commenter was surprised that adding an explicitly-declared default
constructor to a class changed things, so this patch also changes the
aggregate check to more narrowly checking for one-argument constructors
other than the copy/move constructors.

As a result, this early filter resembles how LOOKUP_DEFAULTED rejects any
candidate that would need a UDC: in both cases we want to avoid considering
arbitrary UDCs.  But here, rather than rejecting, we want the early filter
to let the candidate past without considering the conversion.
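The per-constructor decision in type_has_converting_constructor can be restated as a small C model, with each constructor reduced to a few hypothetical boolean properties. It mirrors the order of checks in the patch's loop but omits the TYPE_HAS_USER_CONSTRUCTOR early exit and everything about real tree nodes. It illustrates why an explicitly declared default constructor alone no longer changes the answer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one constructor.  */
struct ctor_info
{
  bool accepts_one_arg;     /* has a first parameter; the rest are defaulted */
  bool is_template;
  bool exactly_one_parm;    /* no further parameters at all */
  bool parm_is_own_class;   /* parameter, minus reference, is the class itself */
  bool constrained;
};

static bool
has_converting_constructor (const struct ctor_info *ctors, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      const struct ctor_info *c = &ctors[i];
      if (!c->accepts_one_arg)
        continue;           /* never considered for a conversion */
      if (c->is_template || !c->exactly_one_parm)
        return true;        /* not a simple single parameter */
      if (!c->parm_is_own_class)
        return true;        /* e.g. S (int) */
      if (c->constrained)
        return true;
    }
  return false;
}
```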

PR c++/99599

gcc/cp/ChangeLog:

* cp-tree.h (type_has_converting_constructor): Declare.
* class.cc (type_has_converting_constructor): New.
* pt.cc (conversion_may_instantiate_p): Don't check completeness.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-recursive-sat4.C: Adjust again.
* g++.dg/cpp2a/concepts-nondep5.C: New test.
---
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/class.cc   | 41 +++
 gcc/cp/pt.cc  | 40 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-nondep5.C | 34 +++
 .../g++.dg/cpp2a/concepts-recursive-sat4.C|  2 +-
 5 files changed, 88 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nondep5.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index a42c07a330b..175ab287490 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7056,6 +7056,7 @@ extern tree in_class_defaulted_default_constructor (tree);
 extern bool user_provided_p(tree);
 extern bool type_has_user_provided_constructor  (tree);
 extern bool type_has_non_user_provided_default_constructor (tree);
+extern bool type_has_converting_constructor(tree);
 extern bool vbase_has_user_provided_move_assign (tree);
 extern tree default_init_uninitialized_part (tree);
 extern bool trivial_default_constructor_is_constexpr (tree);
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 6767ac10358..370bfa35f9e 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -5724,6 +5724,47 @@ type_has_user_provided_constructor (tree t)
   return false;
 }
 
+/* Returns true iff class T has a constructor that accepts a single argument
+   and does not have a single parameter of type reference to T.
+
+   This does not exclude explicit constructors because they are still
+   considered for conversions within { } even though choosing one is
+   ill-formed.  */
+
+bool
+type_has_converting_constructor (tree t)
+{
+  if (!CLASS_TYPE_P (t))
+return false;
+
+  if (!TYPE_HAS_USER_CONSTRUCTOR (t))
+return false;
+
+  for (ovl_iterator iter (CLASSTYPE_CONSTRUCTORS (t)); iter; ++iter)
+{
+  tree fn = *iter;
+  tree parm = FUNCTION_FIRST_USER_PARMTYPE (fn);
+  if (parm == void_list_node
+ || !sufficient_parms_p (TREE_CHAIN (parm)))
+   /* Can't accept a single argument, so won't be considered for
+  conversion.  */
+   continue;
+  if (TREE_CODE (fn) == TEMPLATE_DECL
+ || TREE_CHAIN (parm) != void_list_node)
+   /* Not a simple single parameter.  */
+   return true;
+  if (TYPE_MAIN_VARIANT (non_reference (TREE_VALUE (parm)))
+ != DECL_CONTEXT (fn))
+   /* The single parameter has the wrong type.  */
+   return true;
+  if (get_constraints (fn))
+   /* Constrained.  */
+   return true;
+}
+
+  return false;
+}
+
 /* Returns true iff class T has a user-provided or explicit constructor.  */
 
 bool
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 7b296d14a09..0694c28cde3 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23501,9 +23501,13 @@ maybe_adjust_types_for_deduction (tree tparms,
   return result;
 }
 
-/* Return true if computing a conversion from FROM to TO might induce template
-   instantiation.  Conversely, if this predicate returns false then computing
-   the conversion definitely won't induce template instantiation.  */
+/* Return true if computing a conversion from FROM to TO might consider
+   user-defined conversions, which could lead to arbitrary template
+   instantiations (e.g. g++.dg/cpp2a/concepts-nondep1.C).  If this predicate
+   returns false then computing the conversion definitely won't try UDCs.
+
+   Note that this restriction parallels LOOKUP_DEFAULTED for CWG1092, but in
+   this case we want the early filter to pass instead of fail.  */
 
 static bool
 conversion_may_instantiate_p (tree to, tree from)
@@ -23511,36 +23515,14 @@ conversion_may_instantiate_p (tree to, tree from)
   to = non_reference (to);
   from = non_reference (from);
 
-  bool ptr_c

[PATCH 0/9] AArch64: CMPBR support

2025-05-08 Thread Karl Meakin

This patch series adds support for the CMPBR extension. It includes the
new `+cmpbr` option and rules to generate the new instructions when
lowering conditional branches.

Karl Meakin (9):
  AArch64: place branch instruction rules together
  AArch64: reformat branch instruction rules
  AArch64: rename branch instruction rules
  AArch64: add constants for branch displacements
  AArch64: make `far_branch` attribute a boolean
  AArch64: recognize `+cmpbr` option
  AArch64: precommit test for CMPBR instructions
  AArch64: rules for CMPBR instructions
  AArch64: make rules for CBZ/TBZ higher priority

 .../aarch64/aarch64-option-extensions.def |2 +
 gcc/config/aarch64/aarch64-simd.md|2 +-
 gcc/config/aarch64/aarch64-sme.md |2 +-
 gcc/config/aarch64/aarch64.cc |4 +-
 gcc/config/aarch64/aarch64.h  |3 +
 gcc/config/aarch64/aarch64.md |  564 +---
 gcc/config/aarch64/iterators.md   |5 +
 gcc/config/aarch64/predicates.md  |   17 +
 gcc/doc/invoke.texi   |3 +
 gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 1239 +
 10 files changed, 1623 insertions(+), 218 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

-- 
2.45.2

[PATCH RFC] libstdc++: run testsuite with -Wabi

2025-05-08 Thread Jason Merrill

Tested x86_64-pc-linux-gnu.  Does this make sense for trunk?

-- 8< --

I added this locally to check whether the PR120012 fix affects libstdc++ (it
doesn't) but it seems generally useful to catch whether compiler ABI
changes have library impact.

libstdc++-v3/ChangeLog:

* testsuite/lib/libstdc++.exp: Add -Wabi.
---
 libstdc++-v3/testsuite/lib/libstdc++.exp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 5e958d159de..74e7e5e98eb 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -586,6 +586,7 @@ proc v3_target_compile { source dest type options } {
 global tool
 
 lappend options "additional_flags=-fdiagnostics-plain-output"
+lappend options "additional_flags=-Wabi=20";
 
 if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
lappend options "libs=${gluefile}"

base-commit: abab79397ef97acf7c689c43e27d58d8d7d5c599
-- 
2.49.0

Re: [PATCH 2/3] gimple-fold: Return early for GIMPLE_COND with true/false

2025-05-08 Thread Andrew Pinski

On Wed, Apr 23, 2025 at 2:03 AM Richard Biener
 wrote:
>
> On Wed, Apr 23, 2025 at 5:59 AM Andrew Pinski  
> wrote:
> >
> > To speed up things slightly so not needing to call all the way through
> > to match and simplify, we should return early for true/false on GIMPLE_COND.
>
> I think we'd still canonicalize the various forms matched by
> gimple_cond_true/false_p to a standard one - we should go through
> resimplify2 which should constant fold the compare and in the end
> we do gimple_cond_make_true/false.
>
> I'm also not sure it's worth short-cutting this, it shouldn't be common
> to fold an already canonical if (0) or if (1), no?

Yes, I agree. I posted a new version of the patch which does something
similar to the `bool_name != 0` handling, along the lines of what was
suggested above:
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683018.html

Thanks,
Andrew



>
> Richard.
>
> > gcc/ChangeLog:
> >
> > * gimple-fold.cc (fold_stmt_1): For GIMPLE_COND return early
> > for true/false.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/gimple-fold.cc | 13 ++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> > index 94d5a1ebbd7..2381a82d2b1 100644
> > --- a/gcc/gimple-fold.cc
> > +++ b/gcc/gimple-fold.cc
> > @@ -6646,12 +6646,19 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool 
> > inplace, tree (*valueize) (tree),
> >break;
> >  case GIMPLE_COND:
> >{
> > +   gcond *gc = as_a  (stmt);
> > +   /* If the cond is already true/false, just return false.  */
> > +   if (gimple_cond_true_p (gc)
> > +   || gimple_cond_false_p (gc))
> > + {
> > +   fold_undefer_overflow_warnings (false, stmt, 0);
> > +   return false;
> > + }
> > /* Canonicalize operand order.  */
> > -   tree lhs = gimple_cond_lhs (stmt);
> > -   tree rhs = gimple_cond_rhs (stmt);
> > +   tree lhs = gimple_cond_lhs (gc);
> > +   tree rhs = gimple_cond_rhs (gc);
> > if (tree_swap_operands_p (lhs, rhs))
> >   {
> > -   gcond *gc = as_a  (stmt);
> > gimple_cond_set_lhs (gc, rhs);
> > gimple_cond_set_rhs (gc, lhs);
> > gimple_cond_set_code (gc,
> > --
> > 2.43.0
> >

[to-be-committed][RISC-V] Synthesize more efficient IOR/XOR sequences

2025-05-08 Thread Jeff Law

This is Shreya's next packet of work -- infrastructure for removing 
mvconst_internal which would ultimately make Vineet happier  :-)


--


So mvconst_internal's primary benefit is in constant synthesis not 
impacting the combine budget in terms of the number of instructions it 
is willing to combine together at any given time.  The downside is 
mvconst_internal breaks combine's toplevel costing model and as a result 
many other patterns have to be implemented as define_insn_and_splits 
rather than the often more natural define_splits.


This primarily impacts logical operations where we want to see the 
constant operand and potentially simplify the logical with other nearby 
logicals or shifts.


We can reduce our reliance on mvconst_internal and generate better code 
for various cases by generating better initial code for logical operations.


So let's assume we have an inclusive-or of a register with a nontrivial 
constant.  Right now we will load the nontrivial constant into a new 
pseudo (using multiple instructions), then emit a two register source 
ior operation.


For some cases we can just generate the code we want at expansion time. 
Concretely let's take this testcase:




unsigned long foo(unsigned long src) { return src | 0x8800000000000007; }


Right now we generate this code:


li      a5,-15
slli    a5,a5,59
addi    a5,a5,7
or      a0,a0,a5


The first three instructions are synthesizing the constant.  The last 
instruction performs the desired operation.  But we can do better:



ori a0,a0,7
bseti   a0,a0,59
bseti   a0,a0,63


Notice how we never even bother to synthesize the constant.

IOR/XOR are pretty simple and this patch focuses exclusively on those. 
We use [x]ori to set whatever low 11 bits we need, then bset/binv for a 
small number of higher bits.  We use the cost of constant synthesis as 
our budget.
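To make the decomposition concrete, here is a small host-side C model of it. It is a sketch under stated assumptions, not the patch's synthesize_ior_xor: the helper names (plan_ior_xor, apply_ior_plan, apply_xor_plan, check_plan) are hypothetical, the budget is simply a caller-supplied instruction count, and the low part is limited to 0..0x7ff so the sign-extended 12-bit ori/xori immediate never sets the upper bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plan: one ori/xori immediate plus single-bit operations.  */
struct ior_xor_plan
{
  uint64_t low_imm; /* 0..0x7ff; 0 means no ori/xori is needed.  */
  int bits[64];     /* bit positions handled by bseti (IOR) or binvi (XOR).  */
  int nbits;
};

/* Fill *P for constant C.  Return the instruction count, or -1 if the
   sequence would be longer than BUDGET instructions.  */
static int
plan_ior_xor (uint64_t c, int budget, struct ior_xor_plan *p)
{
  p->low_imm = c & 0x7ff;
  p->nbits = 0;
  for (int i = 11; i < 64; i++)
    if (c & ((uint64_t) 1 << i))
      p->bits[p->nbits++] = i;
  int insns = (p->low_imm != 0) + p->nbits;
  return insns <= budget ? insns : -1;
}

/* Model of the IOR sequence: ori, then one bseti per remaining bit.  */
static uint64_t
apply_ior_plan (uint64_t x, const struct ior_xor_plan *p)
{
  x |= p->low_imm;
  for (int i = 0; i < p->nbits; i++)
    x |= (uint64_t) 1 << p->bits[i];
  return x;
}

/* Model of the XOR sequence: xori, then one binvi per remaining bit.  */
static uint64_t
apply_xor_plan (uint64_t x, const struct ior_xor_plan *p)
{
  x ^= p->low_imm;
  for (int i = 0; i < p->nbits; i++)
    x ^= (uint64_t) 1 << p->bits[i];
  return x;
}

/* The planned sequences must agree with x | c and x ^ c.  */
static void
check_plan (uint64_t x, uint64_t c, const struct ior_xor_plan *p)
{
  assert (apply_ior_plan (x, p) == (x | c));
  assert (apply_xor_plan (x, p) == (x ^ c));
}
```

For the foo() example above, plan_ior_xor (0x8800000000000007, 3, &p) yields low_imm 7 and bits 59 and 63, i.e. the ori/bseti/bseti sequence; a constant with many high bits set exceeds a small budget and the sketch returns -1.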


We also support a couple special cases.  First, we might be able to 
rotate the source value such that all the bits we want to manipulate are 
in the low 11 bits.  So we rotate the source, manipulate the bits, then 
rotate things back to where they belong.  I didn't see this trigger in 
spec, but I did trivially find a testcase where it was likely faster.
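The rotation trick works because rotation distributes over IOR and XOR: rotating x right by R, setting rotr(c, R), and rotating back gives x | c. A sketch, assuming Zbb's rori for both rotates (a left rotate by R being a right rotate by 64 - R); the helper names are hypothetical, and the smallest-R search is just one way to pick the rotate amount.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right with the count reduced modulo 64; n == 0 avoids a shift
   by 64, which would be undefined in C.  */
static uint64_t
rotr64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n ? (v >> n) | (v << (64 - n)) : v;
}

/* Hypothetical search: smallest R such that rotr64 (C, R) fits in the
   positive ori/xori immediate range 0..0x7ff.  R == 0 means C already
   fits; -1 means no rotation works.  */
static int
find_rotation (uint64_t c)
{
  for (unsigned r = 0; r < 64; r++)
    if (rotr64 (c, r) <= 0x7ff)
      return (int) r;
  return -1;
}

/* Model of: rori t,x,R ; ori t,t,IMM ; rori x,t,64-R.
   The second rotate undoes the first, so the result equals x | c.  */
static uint64_t
ior_via_rotation (uint64_t x, uint64_t c, int r)
{
  uint64_t imm = rotr64 (c, (unsigned) r);
  assert (r >= 0 && imm <= 0x7ff);
  return rotr64 (rotr64 (x, (unsigned) r) | imm, 64u - (unsigned) r);
}
```

For example, 0xC000000000000001 (bits 63, 62 and 0) rotates right by 54 into 0x700, so a single ori in the rotated domain sets all three bits.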


Second, we can have cases where we want to invert most of the bits, but 
a small number are supposed to be preserved.  We can pre-flip the bits 
we want to preserve with binv, then invert the whole register with not 
(which puts the bits to be preserved back in their original state).
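The pre-flip-then-invert case rests on the identity ~(x ^ ~c) == x ^ c. A minimal model with hypothetical helper names, counting one binvi per bit that is clear in c plus the final not:

```c
#include <assert.h>
#include <stdint.h>

/* Bits clear in C must be preserved: flip each of them with binvi, then
   invert the whole register with not.  Preserved bits are flipped twice
   (unchanged); every other bit is flipped once, by the not.  */
static uint64_t
xor_via_binv_not (uint64_t x, uint64_t c)
{
  uint64_t keep = ~c;
  for (int i = 0; i < 64; i++)
    if (keep & ((uint64_t) 1 << i))
      x ^= (uint64_t) 1 << i;   /* binvi x,x,i */
  return ~x;                    /* not x,x */
}

/* Instruction count: one binvi per preserved bit plus the final not.  */
static int
binv_not_cost (uint64_t c)
{
  return __builtin_popcountll (~c) + 1;
}

/* The sequence must agree with a plain XOR for any input.  */
static void
check_binv_not (uint64_t x, uint64_t c)
{
  assert (xor_via_binv_not (x, c) == (x ^ c));
}
```

With only two bits to preserve, the whole XOR costs three instructions regardless of how expensive the constant itself would be to synthesize.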


I suspect there are likely a few more cases that could be improved, but 
the patch should stand on its own now and getting it out of the way 
allows us to focus on logical AND which is far tougher, but also more 
important in the task of removing mvconst_internal.


As we're not removing mvconst_internal yet, this patch is mostly a nop. 
I did look at spec before/after and didn't see anything particularly 
interesting.  I also temporarily removed mvconst_internal and looked at 
spec before/after to hopefully ensure we weren't missing anything 
obvious in the XOR/IOR cases.  Obviously that latter test showed all 
kinds of regressions with AND.


We're still working through implementation details on the AND case and 
determining what bridge patterns we're going to need to ensure we don't 
regress.   But this XOR/IOR patch is in good enough shape that it can go 
forward now.



Naturally this has been run through my tester (bootstrap & regression 
test is in flight, but won't finish for many more hours).  Obviously I'm 
quite interested in anything spit out by the pre-commit CI system.



Jeff




gcc/

* config/riscv/iterators.md (OPTAB): New code attribute.
* config/riscv/predicates.md (arith_or_zbs_operand): Remove.
(reg_or_const_int_operand): New predicate.
* config/riscv/riscv-protos.h (synthesize_ior_xor): Prototype.
* config/riscv/riscv.cc (synthesize_ior_xor): New function.
* config/riscv/riscv.md (ior/xor expander): Use synthesize_ior_xor.

gcc/testsuite/

* gcc.target/riscv/ior-synthesis-1.c: New test.
* gcc.target/riscv/ior-synthesis-2.c: New test.
* gcc.target/riscv/xor-synthesis-1.c: New test.
* gcc.target/riscv/xor-synthesis-2.c: New test.
* gcc.target/riscv/xor-synthesis-3.c: New test.

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 214c20ba7b8..584b345f02c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -262,6 +262,9 @@ (define_code_iterator fix_ops [fix unsigned_fix])
 
 (define_code_attr fix_uns [(fix "fix") (unsigned_fix "fixuns")])
 
+(define_code_attr OPTAB [(ior "IOR")
+ (xor "XOR")])
+
 
 ;; ---
 ;; Code Attributes
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index c9a638cd103..23690792b32 100644
--- a/gcc

[PATCH, committed] Fortran: parsing issue with DO CONCURRENT; ENDDO on same line [PR120179]

2025-05-08 Thread Harald Anlauf


Dear all,

the attached patch fixes a 15/16 regression for parsing DO CONCURRENT
when there was another statement following on the same line after a
semicolon, because gfc_match_eos was called twice instead of just once.

The patch was OK'ed by Jerry in the PR, regtested and pushed to mainline
so far as r16-480-g6ce73ad4370c14.

A backport to 15 will follow soon.

Thanks,
Harald

From 4914d9b0ccce843452ab3c921817513441e187ff Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 8 May 2025 22:21:03 +0200
Subject: [PATCH] Fortran: parsing issue with DO CONCURRENT;ENDDO on same line
 [PR120179]

	PR fortran/120179

gcc/fortran/ChangeLog:

	* match.cc (gfc_match_do): Do not attempt to match end-of-statement
	twice.

gcc/testsuite/ChangeLog:

	* gfortran.dg/do_concurrent_basic.f90: Extend testcase.
---
 gcc/fortran/match.cc  | 3 ++-
 gcc/testsuite/gfortran.dg/do_concurrent_basic.f90 | 7 +--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 474ba81b2aa..a99a757bede 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -2892,7 +2892,7 @@ gfc_match_do (void)
 	  locus where = gfc_current_locus;
 
 	  if (gfc_match_eos () == MATCH_YES)
-	break;
+	goto concurr_ok;
 
 	  else if (gfc_match ("local ( ") == MATCH_YES)
 	{
@@ -3141,6 +3141,7 @@ gfc_match_do (void)
   if (gfc_match_eos () != MATCH_YES)
 	goto concurr_cleanup;
 
+concurr_ok:
   if (label != NULL
 	   && !gfc_reference_st_label (label, ST_LABEL_DO_TARGET))
 	goto concurr_cleanup;
diff --git a/gcc/testsuite/gfortran.dg/do_concurrent_basic.f90 b/gcc/testsuite/gfortran.dg/do_concurrent_basic.f90
index fe8723d48b4..bdb6e0e6fe2 100644
--- a/gcc/testsuite/gfortran.dg/do_concurrent_basic.f90
+++ b/gcc/testsuite/gfortran.dg/do_concurrent_basic.f90
@@ -1,4 +1,4 @@
-! { dg-do run }
+! { dg-do compile }
 program basic_do_concurrent
   implicit none
   integer :: i, arr(10)
@@ -7,5 +7,8 @@ program basic_do_concurrent
 arr(i) = i
   end do
 
+  do concurrent (i=1:10);enddo
+  do,concurrent (i=1:10);arr(i)=i;enddo
+
   print *, arr
-end program basic_do_concurrent
\ No newline at end of file
+end program basic_do_concurrent
-- 
2.43.0

RE: [PATCH v1 0/5] Add testcases for another case of vec_duplicate + vadd.vv combine

2025-05-08 Thread Li, Pan2

> OK, understood.  I think that's expected given the fine granularity of the 
> tests.  IMHO nothing that should block progress.

Thanks Robin, then we can move to other vx/vf insns.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Thursday, May 8, 2025 11:44 PM
To: Li, Pan2 ; Robin Dapp ; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken ; Liu, Hongtao ; Robin Dapp 

Subject: Re: [PATCH v1 0/5] Add testcases for another case of vec_duplicate + 
vadd.vv combine

>> it's just a vector cost model issue and some loops are not profitable
>> to vectorize?
>
> Yes. For example, when gpr2vr is 1, int8_t cannot vectorize while uint8_t can.

OK, understood.  I think that's expected given the fine granularity of the 
tests.  IMHO nothing that should block progress.

-- 
Regards
 Robin

[PATCH v2] MIPS: Fix the issue with the '-fpatchable-function-entry=' feature.

2025-05-08 Thread Lulu Cheng

From: ChengLulu 

PR target/99217

gcc/ChangeLog:

* config/mips/mips.cc (mips_start_function_definition):
Implement the functionality of '-fpatchable-function-entry='.
(mips_print_patchable_function_entry): Define empty function.
(TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): Define macro.

gcc/testsuite/ChangeLog:

* gcc.target/mips/pr99217.c: New test.

---
v1 -> v2:
Add testsuite.
---
 gcc/config/mips/mips.cc | 33 +
 gcc/testsuite/gcc.target/mips/pr99217.c | 10 
 2 files changed, 43 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/mips/pr99217.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 24a28dcf817..f4ec59713b4 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -7478,6 +7478,9 @@ static void
 mips_start_function_definition (const char *name, bool mips16_p,
tree decl ATTRIBUTE_UNUSED)
 {
+  unsigned HOST_WIDE_INT patch_area_size = crtl->patch_area_size;
+  unsigned HOST_WIDE_INT patch_area_entry = crtl->patch_area_entry;
+
   if (mips16_p)
 fprintf (asm_out_file, "\t.set\tmips16\n");
   else
@@ -7490,6 +7493,10 @@ mips_start_function_definition (const char *name, bool mips16_p,
 fprintf (asm_out_file, "\t.set\tnomicromips\n");
 #endif
 
+  /* Emit the patching area before the entry label, if any.  */
+  if (patch_area_entry > 0)
+default_print_patchable_function_entry (asm_out_file,
+   patch_area_entry, true);
   if (!flag_inhibit_size_directive)
 {
   fputs ("\t.ent\t", asm_out_file);
@@ -7501,6 +7508,13 @@ mips_start_function_definition (const char *name, bool mips16_p,
 
   /* Start the definition proper.  */
   ASM_OUTPUT_FUNCTION_LABEL (asm_out_file, name, decl);
+
+  /* And the area after the label.  Record it if we haven't done so yet.  */
+  if (patch_area_size > patch_area_entry)
+default_print_patchable_function_entry (asm_out_file,
+   patch_area_size
+   - patch_area_entry,
+   patch_area_entry == 0);
 }
 
 /* End a function definition started by mips_start_function_definition.  */
@@ -23338,6 +23352,21 @@ mips_bit_clear_p (enum machine_mode mode, unsigned HOST_WIDE_INT m)
   return false;
 }
 
+/* Implement TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY.  */
+
+/* On MIPS the function label is printed by mips_start_function_definition,
+   and the patchable area has to be emitted before and/or after that label,
+   so the generic implementation cannot be used.  The area is emitted
+   directly from mips_start_function_definition instead, and this hook
+   does nothing.  */
+
+void
+mips_print_patchable_function_entry (FILE *file ATTRIBUTE_UNUSED,
+unsigned HOST_WIDE_INT
+patch_area_size ATTRIBUTE_UNUSED,
+bool record_p ATTRIBUTE_UNUSED)
+{}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -23651,6 +23680,10 @@ mips_bit_clear_p (enum machine_mode mode, unsigned HOST_WIDE_INT m)
 #undef TARGET_DOCUMENTATION_NAME
 #define TARGET_DOCUMENTATION_NAME "MIPS"
 
+#undef TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY
+#define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY \
+mips_print_patchable_function_entry
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-mips.h"
diff --git a/gcc/testsuite/gcc.target/mips/pr99217.c b/gcc/testsuite/gcc.target/mips/pr99217.c
new file mode 100644
index 000..f5851bb1606
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr99217.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fpatchable-function-entry=1" } */
+/* { dg-final { scan-assembler "foo:*.*.LPFE0:\n\t.set\tnoreorder\n\tnop\n\t.set\treorder" } } */
+
+/* Test the placement of the .LPFE0 label.  */
+
+void
+foo (void)
+{
+}
-- 
2.20.1
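The two default_print_patchable_function_entry calls in the patch split the N nops requested by -fpatchable-function-entry=N,M into M before the function label and N - M after it. A small model of that split (the struct and helper names are hypothetical, and the sketch assumes M <= N), plus GCC's per-function patchable_function_entry attribute making the same kind of request:

```c
#include <assert.h>

/* Hypothetical model of the split done in mips_start_function_definition.  */
struct patch_area_split
{
  unsigned long before_label;
  unsigned long after_label;
};

static struct patch_area_split
split_patch_area (unsigned long size, unsigned long entry)
{
  struct patch_area_split s = { 0, 0 };
  assert (entry <= size);          /* assumption: M <= N */
  if (entry > 0)
    s.before_label = entry;        /* printed before the function label */
  if (size > entry)
    s.after_label = size - entry;  /* printed right after the label */
  return s;
}

/* Per-function form: 3 nops in total, 1 of them before the label.  */
__attribute__ ((patchable_function_entry (3, 1))) void
patched_function (void)
{
}
```

With -fpatchable-function-entry=1, as in the new pr99217.c test, the split is 0 nops before the label and 1 after it, which is why the test looks for the nop following foo's label.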

Re: [PATCH] ctf: emit CTF_K_ARRAY for GNU vector types

2025-05-08 Thread Indu


On 2025-05-01 2:34 p.m., Bruce McCulloch wrote:

Currently, there is a check in gen_ctf_array_type that prevents GNU vectors
generated by the vector attribute from being emitted (e.g. typedef int v8si
__attribute__ ((vector_size (32)));). Because this check happens in
dwarf2ctf.cc, this prevents GNU vectors from being emitted not only in CTF,
but also in BTF. This is a problem, as there are a handful of GNU vectors
present in the kernel that are not being accurately represented in the
vmlinux.{ctfa,btfa}. Additionally, BTF generated by clang emits these vectors
as arrays.

This patch solves the issue by simply removing the check that prevents
these types from being appropriately emitted. Additionally, a new test is
included that checks for the appropriate asm emission when generating CTF.



Hi Bruce,

(CC Nick)

A vector type is different from an array type.  A CTF consumer may want to 
distinguish between the two for various reasons.  I found this useful in 
this regard: https://dwarfstd.org/issues/230413.1.html.
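To illustrate why a consumer might want to tell the two apart: a GNU vector and an array with the same element type and count have the same size, but only the vector behaves as a value with element-wise arithmetic. The int_array8 typedef and the helpers below are hypothetical; v8si is the type from the new test.

```c
#include <assert.h>

typedef int v8si __attribute__ ((vector_size (32)));
typedef int int_array8[8];

/* Same size and element count...  */
_Static_assert (sizeof (v8si) == sizeof (int_array8),
                "v8si and int[8] have the same size");

/* ...but a vector can be passed, returned and added element-wise.  */
static v8si
add_vectors (v8si a, v8si b)
{
  return a + b;
}

/* An array decays to a pointer, so the same operation needs a loop.  */
static void
add_arrays (int *out, const int *a, const int *b)
{
  for (int i = 0; i < 8; i++)
    out[i] = a[i] + b[i];
}

/* Both forms compute the same element values.  */
static void
check_same_result (v8si va, v8si vb)
{
  int a[8], b[8], out[8];
  for (int i = 0; i < 8; i++)
    {
      a[i] = va[i];
      b[i] = vb[i];
    }
  add_arrays (out, a, b);
  v8si vs = add_vectors (va, vb);
  for (int i = 0; i < 8; i++)
    assert (vs[i] == out[i]);
}
```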


If, for the case of BTF, it suffices to emit vectors with kind
BTF_K_ARRAY (although I would expect BTF to care about the distinction
for the same reasons as CTF), we will need to add a new "internal" kind
to CTF, say CTF_K_VECTOR, and not emit them in the output section when
-gctf is in effect.  Any types, vars, etc. referring to the vector type
will continue to be emitted as referring to a CTF_K_UNKNOWN, as CTF V3
has no representation for vector types.




gcc/ChangeLog:

* dwarf2ctf.cc (gen_ctf_array_type): Remove check for DW_AT_GNU_vector.

gcc/testsuite/ChangeLog:

* gcc.dg/debug/ctf/ctf-vector.c: New test.


Signed-off-by: Bruce McCulloch 
---
  gcc/dwarf2ctf.cc|  4 ---
  gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c | 32 +
  2 files changed, 32 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c

diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index fd326b320af..a3497d58504 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -417,10 +417,6 @@ gen_ctf_array_type (ctf_container_ref ctfc,
dw_die_ref first, last, array_elems_type;
ctf_dtdef_ref array_dtd, elem_dtd;
  
-  int vector_type_p = get_AT_flag (array_type, DW_AT_GNU_vector);

-  if (vector_type_p)
-return NULL;
-
/* Find the first and last array dimension DIEs.  */
last = dw_get_die_child (array_type);
first = dw_get_die_sib (last);
diff --git a/gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c b/gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c
new file mode 100644
index 000..368046db214
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/ctf/ctf-vector.c
@@ -0,0 +1,32 @@
+/* Tests for CTF SIMD vector type.
+   - Verify that there is a record of:
+ + int
+ + void
+ + int[8] -> int
+ + v8si -> int[8] -> int.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -gctf -dA" } */
+
+/* Check for presence of strings:  */
+/* { dg-final { scan-assembler-times "ascii \"int.0\"\[\t \]+\[^\n\]*ctf_string" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"void.0\"\[\t \]+\[^\n\]*ctf_string" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"v8si.0\"\[\t \]+\[^\n\]*ctf_string" 1 } } */
+
+/* Check for information about int.  */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x600\[ \t\]+\[^\n\]*# ctt_info" 2 } } */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x4\[ \t\]+\[^\n\]*# ctt_size or ctt_type" 1 } } */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x120\[ \t\]+\[^\n\]*# ctf_encoding_data" 1 } } */
+
+/* Check for information about void.  */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0\[ \t\]+\[^\n\]*# ctt_size or ctt_type" 2 } } */
+
+/* Check for information about int[8] array.  */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x1200\[ \t\]+\[^\n\]*# ctt_info" 1 } } */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x8\[ \t\]+\[^\n\]*# cta_nelems" 1 } } */
+
+/* Check for information about v8si.  */
+/* { dg-final { scan-assembler-times ".long\[ \t\]+0x2a00\[ \t\]+\[^\n\]*# ctt_info" 1 } } */
+
+typedef int v8si __attribute__ ((vector_size (32)));
+v8si foo;

[PATCH] match: Don't allow following statements that can throw internally [PR119903]

2025-05-08 Thread Andrew Pinski

This removes the ability to follow statements that can throw internally.
This was suggested in the bug report as a way to solve the issue here.
The overhead is not that high since without non-call exceptions turned
on, there is an early exit for non-calls.

PR tree-optimization/119903

gcc/ChangeLog:

* gimple-match-head.cc (get_def): Reject statements that can throw
internally.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr119903-1.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-match-head.cc   |  5 -
 gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C | 24 ++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 6b3c5febbea..62ff8e57fbb 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -63,7 +63,10 @@ get_def (tree (*valueize)(tree), tree name)
 {
   if (valueize && ! valueize (name))
 return NULL;
-  return SSA_NAME_DEF_STMT (name);
+  gimple *t = SSA_NAME_DEF_STMT (name);
+  if (stmt_can_throw_internal (cfun, t))
+return nullptr;
+  return t;
 }
 
 /* Routine to determine if the types T1 and T2 are effectively
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C b/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C
new file mode 100644
index 000..605f989a2eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr119903-1.C
@@ -0,0 +1,24 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -fnon-call-exceptions -ftrapping-math -fdump-tree-optimized-eh" }
+
+// PR tree-optimization/119903
+// match and simplify would cause the internal throwable fp comparison
+// to become only external throwable and lose the landing pad.
+
+int f() noexcept;
+int g() noexcept;
+
+int m(double a)
+{
+  try {
+if (a < 1.0)
+  return f();
+return g();
+  }catch(...)
+  {
+return -1;
+  }
+}
+
+// Make sure there is a landing pad for the non-call exception from the comparison.
+// { dg-final { scan-tree-dump "LP " "optimized" } }
-- 
2.43.0

Re: [PATCH] aarch64: Use LDR for first-element loads for Advanced SIMD

2025-05-08 Thread Richard Sandiford

Dhruv Chawla  writes:
> This patch modifies Advanced SIMD assembly generation to emit an LDR
> instruction when a vector is created using a load to the first element with
> the other elements being zero.
>
> This is similar to what *aarch64_combinez already does.
>
> Example:
>
> uint8x16_t foo(uint8_t *x) {
>uint8x16_t r = vdupq_n_u8(0);
>r = vsetq_lane_u8(*x, r, 0);
>return r;
> }
>
> Currently, this generates:
>
> foo:
>   moviv0.4s, 0
>   ld1 {v0.b}[0], [x0]
>   ret
>
> After applying the patch, this generates:
>
> foo:
>   ldr b0, [x0]
>   ret
>
> Bootstrapped and regtested on aarch64-linux-gnu. Tested on
> aarch64_be-unknown-linux-gnu as well.
>
> Signed-off-by: Dhruv Chawla 
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md
>   (*aarch64_simd_vec_set_low): New pattern.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/simd/ldr_first_le.c: New test.
>   * gcc.target/aarch64/simd/ldr_first_be.c: Likewise.
> ---
>   gcc/config/aarch64/aarch64-simd.md|  12 ++
>   .../gcc.target/aarch64/simd/ldr_first_be.c| 140 ++
>   .../gcc.target/aarch64/simd/ldr_first_le.c| 139 +
>   3 files changed, 291 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/ldr_first_be.c
>   create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/ldr_first_le.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index e2afe87e513..7be1c685fcf 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1164,6 +1164,18 @@
> [(set_attr "type" "neon_logic")]
>   )
>   
> +(define_insn "*aarch64_simd_vec_set_low"
> +  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> + (vec_merge:VALL_F16
> + (vec_duplicate:VALL_F16
> + (match_operand: 1 "aarch64_simd_nonimmediate_operand" "m"))

The constraint should be "Utv" rather than "m", since the operand doesn't
accept all addresses that are valid for .  E.g. a normal SImode
memory would allow [reg, #imm], whereas this address doesn't.

> + (match_operand:VALL_F16 3 "aarch64_simd_imm_zero" "i")
> + (match_operand:SI 2 "immediate_operand" "i")))]

I think we should drop the two "i"s here, since the pattern doesn't
accept all immediates.  The predicate on the final operand should be
const_int_operand rather than immediate_operand.

Otherwise it looks good.  But I think we should think about how we
plan to integrate the related optimisation for register inputs.  E.g.:

int32x4_t foo(int32_t x) {
return vsetq_lane_s32(x, vdupq_n_s32(0), 0);
}

generates:

foo:
moviv0.4s, 0
ins v0.s[0], w0
ret

rather than a single UMOV.  Same idea when the input is in an FPR rather
than a GPR, but using FMOV rather than UMOV.

Conventionally, the register and memory forms should be listed as
alternatives in a single pattern, but that's somewhat complex because of
the different instruction availability for 64-bit+32-bit, 16-bit, and
8-bit register operations.

My worry is that if we handle the register case as an entirely separate
patch, it would have to rewrite this one.

The register case is somewhat related to Pengxuan's work on permutations.

Thanks,
Richard

[pushed: r16-487] diagnostics: convert HTML output test plugin to 'experimental-html' sink [PR116792]

2025-05-08 Thread David Malcolm

In r15-3752-g48261bd26df624 I added a test plugin that overrode the
regular output, instead emitting diagnostics in crude HTML form.

In r15-4760-g0b73e9382ab51c I added support for multiple kinds of
diagnostic output simultaneously, adding
 -fdiagnostics-add-output=DIAGNOSTICS-OUTPUT-SPEC
 -fdiagnostics-set-output=DIAGNOSTICS-OUTPUT-SPEC
for adding/changing the kind of diagnostics output, supporting
"text" and "sarif" output schemes.

This patch promotes the HTML output code from the test plugins so
that it is available from "-fdiagnostics-add-output=", using a
new "experimental-html" scheme, to allow simultaneous text, sarif
and html output, and to make it easier to experiment with.  The
patch adds Python-based testing of the emitted HTML.

The patch does not affect the generated HTML, which is still crude, and
not yet ready for end-users.  I hope to improve it in followups.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r16-487-g1a2c62212bd912.

gcc/ChangeLog:
PR other/116792
* Makefile.in (OBJS-libcommon): Add diagnostic-format-html.o.
* diagnostic-format-html.cc: Move here from
testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc.
Simplify includes.  Rename "xhtml" to "html" throughout.
(write_escaped_text): Drop.
(class xhtml_stream_output_format): Drop.
(class html_file_output_format): Reimplement using
diagnostic_output_file.
(diagnostic_output_format_init_xhtml): Drop.
(diagnostic_output_format_init_xhtml_stderr): Drop.
(diagnostic_output_format_init_xhtml_file): Drop.
(diagnostic_output_format_open_html_file): New.
(make_html_sink): New.
(xhtml_format_selftests): Convert to...
(diagnostic_format_html_cc_tests): ...this.
(plugin_is_GPL_compatible): Drop.
(plugin_init): Drop.
* diagnostic-format-html.h: New file.
* doc/invoke.texi (-fdiagnostics-add-output=): Add
"experimental-html" scheme.
* opts-diagnostic.cc: Include "diagnostic-format-html.h".
(class html_scheme_handler): New.
(output_factory::output_factory): Add html_scheme_handler.
(html_scheme_handler::make_sink): New.
* selftest-run-tests.cc (selftest::run_tests): Call the new
selftests.
* selftest.h (selftest::diagnostic_format_html_cc_tests): New
decl.

gcc/testsuite/ChangeLog:
PR other/116792
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc: Move to
gcc/diagnostic-format-html.cc.
* gcc.dg/html-output/html-output.exp: New support script.
* gcc.dg/html-output/missing-semicolon.c: New test.
* gcc.dg/html-output/missing-semicolon.py: New test script.
* gcc.dg/plugin/diagnostic-test-xhtml-1.c: Deleted test.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Drop moved plugin
and its deleted test.
* lib/gcc-dg.exp (load_lib): Add load_lib of scanhtml.exp.
* lib/htmltest.py: New support script.
* lib/scanhtml.exp: New support script, based on scansarif.exp.

libatomic/ChangeLog:
PR other/116792
* testsuite/lib/libatomic.exp: Add load_lib of scanhtml.exp.

libgomp/ChangeLog:
PR other/116792
* testsuite/lib/libgomp.exp: Add load_lib of scanhtml.exp.

libitm/ChangeLog:
PR other/116792
* testsuite/lib/libitm.exp: Add load_lib of scanhtml.exp.

libphobos/ChangeLog:
PR other/116792
* testsuite/lib/libphobos-dg.exp: Add load_lib of scanhtml.exp.

libvtv/ChangeLog:
PR other/116792
* testsuite/lib/libvtv-dg.exp: Add load_lib of scanhtml.exp.
---
 gcc/Makefile.in   |   1 +
 ...ml_format.cc => diagnostic-format-html.cc} | 323 +++---
 gcc/diagnostic-format-html.h  |  37 ++
 gcc/doc/invoke.texi   |  18 +-
 gcc/opts-diagnostic.cc|  60 
 gcc/selftest-run-tests.cc |   1 +
 gcc/selftest.h|   1 +
 .../gcc.dg/html-output/html-output.exp|  31 ++
 .../gcc.dg/html-output/missing-semicolon.c|  13 +
 .../gcc.dg/html-output/missing-semicolon.py   |  84 +
 .../gcc.dg/plugin/diagnostic-test-xhtml-1.c   |  19 --
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   2 -
 gcc/testsuite/lib/gcc-dg.exp  |   1 +
 gcc/testsuite/lib/htmltest.py |   9 +
 gcc/testsuite/lib/scanhtml.exp|  90 +
 libatomic/testsuite/lib/libatomic.exp |   1 +
 libgomp/testsuite/lib/libgomp.exp |   1 +
 libitm/testsuite/lib/libitm.exp   |   1 +
 libphobos/testsuite/lib/libphobos-dg.exp  |   1 +
 libvtv/testsuite/lib/libvtv-dg.exp|   1 +
 20 files changed, 479 insertions(+), 216 deletions(-)
 rename gcc/{testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc => diagnostic-format-html.cc} (72%)

Fix PR 118541, do not generate unordered fp cmoves for IEEE compares

2025-05-08 Thread Michael Meissner

This has been posted previously.  This version fixes some typos that
Bernhard Reutner-Fischer pointed out.

In bug PR target/118541 on power9, power10, and power11 systems, for the
function:

extern double __ieee754_acos (double);

double
__acospi (double x)
{
  double ret = __ieee754_acos (x) / 3.14;
  return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
}

GCC currently generates the following code:

Power9                          Power10 and Power11
======                          ===================
bl __ieee754_acos               bl __ieee754_acos@notoc
nop                             plfd 0,.LC0@pcrel
addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
addi 1,1,32                     addi 1,1,32
lfd 0,.LC2@toc@l(9)             ld 0,16(1)
addis 9,2,.LC0@toc@ha           fdiv 0,1,0
ld 0,16(1)                      mtlr 0
lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
fdiv 0,1,0                      xxsel 1,0,12,1
mtlr 0                          blr
xscmpgtdp 1,0,12
xxsel 1,0,12,1
blr

This is because ifcvt.c optimizes the conditional floating point move to use the
XSCMPGTDP instruction.

However, the XSCMPGTDP instruction will generate an interrupt if one of the
arguments is a signalling NaN and signalling NaNs can generate an interrupt.
The IEEE comparison functions (isgreater, etc.) require that the comparison not
raise an interrupt.
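For reference, the source-level semantics the testcase depends on: __builtin_isgreater is false when either operand is a NaN, so a NaN result passes through the clamp unchanged, and as a quiet comparison it is not supposed to raise an invalid-operation exception for quiet NaNs. A host-side sketch of the clamp with the libm call factored out (clamp_to_one and check_clamp are hypothetical names); it checks only the returned values, not the floating-point exception flags, which would need <fenv.h>'s fetestexcept.

```c
#include <assert.h>

/* The clamp from the __acospi example, minus the __ieee754_acos call
   and the division, so it runs on any host.  */
static double
clamp_to_one (double ret)
{
  /* Quiet comparison: false if RET is a NaN, so a NaN is returned as-is.  */
  return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
}

/* Self-check of the three interesting cases.  */
static void
check_clamp (void)
{
  assert (clamp_to_one (2.0) == 1.0);   /* above the limit: clamped */
  assert (clamp_to_one (0.5) == 0.5);   /* below the limit: unchanged */
  double n = clamp_to_one (__builtin_nan (""));
  assert (n != n);                      /* NaN passes through */
}
```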

The following patch changes the PowerPC back end so that ifcvt.c will not change
the if/then test and move into a conditional move if the comparison is one of
the comparisons that do not raise an error with signalling NaNs and -Ofast is
not used.  If a normal comparison is used or -Ofast is used, GCC will continue
to generate XSCMPGTDP and XXSEL.

For the following code:

double
ordered_compare (double a, double b, double c, double d)
{
  return __builtin_isgreater (a, b) ? c : d;
}

/* Verify normal > does generate xscmpgtdp.  */

double
normal_compare (double a, double b, double c, double d)
{
  return a > b ? c : d;
}

with the following patch, GCC generates the following for power9, power10, and
power11:

ordered_compare:
fcmpu 0,1,2
fmr 1,4
bnglr 0
fmr 1,3
blr

normal_compare:
xscmpgtdp 1,1,2
xxsel 1,4,3,1
blr

I have built bootstrap compilers on big endian power9 systems and little endian
power9/power10 systems and there were no regressions.  Can I check this patch
into the GCC trunk, and after a waiting period, can I check this into the active
older branches?

2025-04-30  Michael Meissner  

gcc/

PR target/118541
* config/rs6000/predicates.md (invert_fpmask_comparison_operator): Do
not allow UNLT and UNLE unless -ffast-math.
* config/rs6000/rs6000-protos.h (enum rev_cond_ordered): New enumeration.
(rs6000_reverse_condition): Add argument.
* config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow
ordered comparisons to be reversed for floating point conditional moves,
but allow ordered comparisons to be reversed on jumps.
(rs6000_emit_sCOND): Adjust rs6000_reverse_condition call.
* config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise.
* config/rs6000/rs6000.md (reverse_branch_comparison): Name insn.
Adjust rs6000_reverse_condition calls.

gcc/testsuite/

PR target/118541
* gcc.target/powerpc/pr118541-1.c: New test.
* gcc.target/powerpc/pr118541-2.c: Likewise.
* gcc.target/powerpc/pr118541-3.c: Likewise.
* gcc.target/powerpc/pr118541-4.c: Likewise.
---
 gcc/config/rs6000/predicates.md   | 10 +++-
 gcc/config/rs6000/rs6000-protos.h | 17 ++-
 gcc/config/rs6000/rs6000.cc   | 46 ++-
 gcc/config/rs6000/rs6000.h| 10 +++-
 gcc/config/rs6000/rs6000.md   | 25 ++
 gcc/testsuite/gcc.target/powerpc/pr118541-1.c | 28 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-2.c | 26 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-3.c | 26 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-4.c | 26 +++
 9 files changed, 190 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-4.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 647e89afb6a..ba8df6a7979 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@

[PATCH] vxworks: undefine TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL

2025-05-08 Thread Alexandre Oliva



config.gcc arranges for vxworks 7r2+ targets to include linux.h,
because of the similarity, but linux.h defines
TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL to a function declared in
linux-protos.h, and defined in linux.cc, neither of which vxworks
targets include.  Undefine it in vxworks.h.

Tested with gcc-14 targeting ppc-vx7r2 and ppc64-vx7r2.  Also tested
with trunk on ppc64le-linux-gnu, and with gcc-14 targeting powerpc-elf.
Ok to install?


for  gcc/ChangeLog

* config/vxworks.h (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL):
Undefine.
---
 gcc/config/vxworks.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/vxworks.h b/gcc/config/vxworks.h
index 204a8e000d405..1ad4c1553ba9b 100644
--- a/gcc/config/vxworks.h
+++ b/gcc/config/vxworks.h
@@ -433,3 +433,6 @@ extern void vxworks_emit_call_builtin___clear_cache (rtx begin, rtx end);
so silence the warning (instead of passing -flinker-output=nolto-rel).  */
 #undef LTO_PLUGIN_SPEC
 #define LTO_PLUGIN_SPEC "%{!mrtp:-plugin-opt=-linker-output-auto-nolto-rel}"
+
+/* Undo the linux.h definition.  */
+#undef TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL


-- 
Alexandre Oliva, happy hackerhttps://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!
