[PATCH, v2] Fortran: implement F2018 intrinsic OUT_OF_RANGE [PR115788]

2025-01-10 Thread Harald Anlauf

Thomas, Steve,

thanks for the swift feedback!

On 10.01.25 at 23:57, Thomas Koenig wrote:

Hello Harald,


Regtested on x86_64-pc-linux-gnu.  OK for mainline?


I just started to run a bootstrap on cfarm120 (because it is
the only machine I can lay my hands on where I can run
"make -j128" without disturbing anybody :-) and I got

../../trunk/gcc/fortran/trans-intrinsic.cc: In function ‘void 
gfc_conv_intrinsic_out_of_range(gfc_se*, gfc_expr*)’:
../../trunk/gcc/fortran/trans-intrinsic.cc:7178:22: error: ‘tmp’ may be 
used uninitialized [-Werror=maybe-uninitialized]

  7178 |   se->expr = convert (gfc_typenode_for_spec (&expr->ts), tmp);
   |  ^~~~
../../trunk/gcc/fortran/trans-intrinsic.cc:7001:8: note: ‘tmp’ was 
declared here

  7001 |   tree tmp, tmp1, tmp2;

(Simply initializing tmp to NULL_TREE could probably be enough).
Could you check?


Thanks for pointing this out!  I've also added a few gcc_unreachable()
to prevent other potential false positives, see attached.
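
As a rough illustration of the warning pattern (not the actual code in trans-intrinsic.cc), the sketch below assigns a variable in every case of a switch. The compiler cannot always prove that such a switch is exhaustive, so either an initializer or an unreachable marker on the remaining path avoids -Wmaybe-uninitialized. The names Kind and limit_for are made up for this example, and __builtin_unreachable() (a GCC/Clang builtin) merely stands in for GCC's internal gcc_unreachable() macro.

```cpp
// Hypothetical stand-in for the type kinds the real function dispatches on.
enum class Kind { Integer, Unsigned, Real };

// Returns a per-kind value.  Without the default case the compiler may
// report that 'tmp' can be used uninitialized, because an enum object can
// hold values other than the listed enumerators.
int limit_for (Kind k)
{
  int tmp;
  switch (k)
    {
    case Kind::Integer:  tmp = 1; break;
    case Kind::Unsigned: tmp = 2; break;
    case Kind::Real:     tmp = 3; break;
    default:
      // Assumption: plays the role of gcc_unreachable () in this sketch.
      __builtin_unreachable ();
    }
  return tmp;
}
```

Initializing tmp at its declaration, as suggested above, would have the same effect on the warning; the unreachable marker additionally documents that no other case is expected.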

I've also removed the "-fno-finite-math-only" option after verifying
that the testsuite does indeed not exercise -Ofast.

Seems like I got lost looking too long at tree and optimized dumps...

Thanks,
Harald


Best regards

 Thomas



From 2ff2308edabbcd412bf137f3e74a6db3e5cea387 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sat, 11 Jan 2025 08:35:44 +0100
Subject: [PATCH] Fortran: implement F2018 intrinsic OUT_OF_RANGE [PR115788]

Implementation of the Fortran 2018 standard intrinsic OUT_OF_RANGE, with
the GNU Fortran extension to unsigned integers.

Runtime code is fully inline expanded.

	PR fortran/115788

gcc/fortran/ChangeLog:

	* check.cc (gfc_check_out_of_range): Check arguments to intrinsic.
	* expr.cc (free_expr0): Fix a memleak with unsigned literals.
	* gfortran.h (enum gfc_isym_id): Define GFC_ISYM_OUT_OF_RANGE.
	* intrinsic.cc (add_functions): Add Fortran prototype.  Break some
	nearby lines with excessive length.
	* intrinsic.h (gfc_check_out_of_range): Add prototypes.
	* intrinsic.texi: Fortran documentation of OUT_OF_RANGE.
	* simplify.cc (gfc_simplify_out_of_range): Compile-time simplification
	of OUT_OF_RANGE.
	* trans-intrinsic.cc (gfc_conv_intrinsic_out_of_range): Generate
	inline expansion of runtime code for OUT_OF_RANGE.
	(gfc_conv_intrinsic_function): Use it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/ieee/out_of_range.f90: New test.
	* gfortran.dg/out_of_range_1.f90: New test.
	* gfortran.dg/out_of_range_2.f90: New test.
	* gfortran.dg/out_of_range_3.f90: New test.
---
 gcc/fortran/check.cc  |  42 
 gcc/fortran/expr.cc   |   1 +
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/intrinsic.cc  |  28 ++-
 gcc/fortran/intrinsic.h   |   2 +
 gcc/fortran/intrinsic.texi|  64 ++
 gcc/fortran/simplify.cc   | 208 ++
 gcc/fortran/trans-intrinsic.cc| 196 +
 .../gfortran.dg/ieee/out_of_range.f90 |  65 ++
 gcc/testsuite/gfortran.dg/out_of_range_1.f90  |  91 
 gcc/testsuite/gfortran.dg/out_of_range_2.f90  | 115 ++
 gcc/testsuite/gfortran.dg/out_of_range_3.f90  |  25 +++
 12 files changed, 831 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/out_of_range.f90
 create mode 100644 gcc/testsuite/gfortran.dg/out_of_range_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/out_of_range_2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/out_of_range_3.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index e29ad398611..35458643835 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -4864,6 +4864,48 @@ gfc_check_null (gfc_expr *mold)
 }
 
 
+bool
+gfc_check_out_of_range (gfc_expr *x, gfc_expr *mold, gfc_expr *round)
+{
+  if (!int_or_real_or_unsigned_check (x, 0))
+return false;
+
+  if (mold == NULL)
+return false;
+
+  if (!int_or_real_or_unsigned_check (mold, 1))
+return false;
+
+  if (!scalar_check (mold, 1))
+return false;
+
+  if (round)
+{
+  if (!type_check (round, 2, BT_LOGICAL))
+	return false;
+
+  if (!scalar_check (round, 2))
+	return false;
+
+  if (x->ts.type != BT_REAL
+	  || (mold->ts.type != BT_INTEGER && mold->ts.type != BT_UNSIGNED))
+	{
+	  gfc_error ("%qs argument of %qs intrinsic at %L shall appear "
+		 "only if %qs is of type REAL and %qs is of type "
+		 "INTEGER or UNSIGNED",
+		 gfc_current_intrinsic_arg[2]->name,
+		 gfc_current_intrinsic, &round->where,
+		 gfc_current_intrinsic_arg[0]->name,
+		 gfc_current_intrinsic_arg[1]->name);
+
+	  return false;
+	}
+}
+
+  return true;
+}
+
+
 bool
 gfc_check_pack (gfc_expr *array, gfc_expr *mask, gfc_expr *vector)
 {
diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 0e40b2493a5..7f3f6c52fb5 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/e

Re: [Patch, fortran] PR108434 - [12/13/14/15 Regression] ICE in class_allocatable, at fortran/expr.cc:5000

2025-01-10 Thread Steve Kargl
On Fri, Jan 10, 2025 at 05:19:34PM +0000, Paul Richard Thomas wrote:
> 
> As of today, Gerhard Steinmetz has no fewer than 33 regressions to his name
> out of a total of 54 for fortran and libgfortran. It's time that some of
> these bugs are swatted, I think :-)
> 

PR 70949 appears to have been fixed at some point
in the past.  The following patch converts Gerhard's
code into testcases.

diff --git a/gcc/testsuite/gfortran.dg/pr70949_1.f90 
b/gcc/testsuite/gfortran.dg/pr70949_1.f90
new file mode 100644
index 00000000000..91cd18069fc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr70949_1.f90
@@ -0,0 +1,27 @@
+!
+! { dg-do run}
+!
+program p
+
+   type t1
+   end type
+
+   type t2
+  type(t1), pointer :: q
+   end type
+
+   type(t1), pointer :: a
+   type(t2) :: c
+
+   allocate(a)
+   c%q => a
+   if (.not. associated(a, f(c))) stop 1
+
+   contains
+
+  function f(x) result (z)
+ type(t2), intent(in) :: x
+ class(t1), pointer :: z
+ z => x%q
+  end function f
+end
diff --git a/gcc/testsuite/gfortran.dg/pr70949_2.f90 
b/gcc/testsuite/gfortran.dg/pr70949_2.f90
new file mode 100644
index 00000000000..eb064b6fa80
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr70949_2.f90
@@ -0,0 +1,27 @@
+!
+! { dg-do run}
+!
+program p
+
+   type t1
+   end type
+
+   type t2
+  type(t1), pointer :: q
+   end type
+
+   type(t1), pointer :: a
+   type(t2) :: c
+
+   allocate(a)
+   c%q => a
+   if (.not. associated(a, f(c))) stop 1
+
+   contains
+
+  function f(x) result (z)
+ type(t2), intent(in) :: x
+ type(t1), pointer :: z
+ z => x%q
+  end function f
+end



-- 
Steve


Re: [Patch, fortran] PR108434 - [12/13/14/15 Regression] ICE in class_allocatable, at fortran/expr.cc:5000

2025-01-10 Thread Steve Kargl
On Fri, Jan 10, 2025 at 05:19:34PM +0000, Paul Richard Thomas wrote:
> 
> As of today, Gerhard Steinmetz has no fewer than 33 regressions to his name
> out of a total of 54 for fortran and libgfortran. It's time that some of
> these bugs are swatted, I think :-)
> 

This patch fixes PR71844.  As the error message indicates,
the source-expr in 'allocate(x, source=null())' cannot
be null().

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index dab0c3af601..538917fe56a 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -8965,6 +8965,13 @@ resolve_allocate_expr (gfc_expr *e, gfc_code *code, bool 
*array_alloc_wo_spec)
   gfc_component *c;
   bool t;
 
+  /* source-expr in either SOURCE= or MOLD= cannot be NULL().  */
+  if (code->expr3 && code->expr3->expr_type == EXPR_NULL)
+{
+  gfc_error ("Source-expr at %L cannot be NULL()", &code->expr3->where);
+  goto failure;
+}
+
   /* Mark the utmost array component as being in allocate to allow DIMEN_STAR
  checking of coarrays.  */
   for (ref = e->ref; ref; ref = ref->next)
diff --git a/gcc/testsuite/gfortran.dg/pr71844.f90 
b/gcc/testsuite/gfortran.dg/pr71844.f90
new file mode 100644
index 00000000000..af990f32fbb
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr71844.f90
@@ -0,0 +1,10 @@
+!
+! { dg-do compile }
+!
+program p
+   class(*), allocatable :: x, y
+   character(:), allocatable :: z
+   allocate (x, source=null())   ! { dg-error "cannot be NULL" }
+   allocate (y, mold=null()) ! { dg-error "cannot be NULL" }
+   allocate (character(*) :: z)  ! { dg-error "Incompatible allocate-object" }
+end



-- 
Steve


Re: [PATCH] Fortran: implement F2018 intrinsic OUT_OF_RANGE [PR115788]

2025-01-10 Thread Steve Kargl
On Fri, Jan 10, 2025 at 09:41:13PM +0000, Harald Anlauf wrote:
> 
> There is one question to the reviewer(s), or those knowing better
> than me how to handle IEEE infinity and NaN: with -Ofast, I needed
> to add "-fno-finite-math-only" to the new testcase
> gfortran.dg/ieee/out_of_range.f90, as the needed finiteness test
> was otherwise optimized to always true and leading to a failure.
> Is there a particular trick to disable a certain optimization
> at the tree level for such checks?
> 

It's been a long time since I've looked at the collection
of options that automatically are used with 'make check-fortran'.
Is -Ofast one of the tested options?

As you have found, +-inf and NaN are incompatible with -Ofast.
That is, if a user uses -Ofast, s/he is telling gfortran that
the code does not encounter/generate exceptional FP values.
If dejagnu uses -Ofast during testing, you have no choice
but to use the -fno-finite-math-only option.

-- 
steve


[PATCH 05/10] libstdc++: Fix race condition in new atomic notify code

2025-01-10 Thread Jonathan Wakely
When using a proxy object for atomic waiting and notifying operations,
we need to ensure that the _M_ver value is always incremented by a
notifying operation, even if we return early without doing the futex
wake syscall. Otherwise we get missed wake-ups because the notifying
thread doesn't modify the value that other threads are doing a futex
wait on.

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__notify_impl): Increment the
proxy value before returning early for the uncontended case.
---
 libstdc++-v3/include/bits/atomic_wait.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 29b83cad6e6c..4a9652ed8f1d 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -392,13 +392,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __wait_args __args{ *__a };
   __waiter_pool_impl* __pool = nullptr;
 
-  if (__args & __wait_flags::__track_contention)
-   {
- __pool = &__waiter_pool_impl::_S_impl_for(__addr);
- if (!__pool->_M_waiting())
-   return;
-   }
-
   const __platform_wait_t* __wait_addr;
   if (__args & __wait_flags::__proxy_wait)
{
@@ -416,6 +409,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   else // Use the atomic variable's own address.
__wait_addr = __addr;
 
+  if (__args & __wait_flags::__track_contention)
+   {
+ __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+ if (!__pool->_M_waiting())
+   return;
+   }
+
 #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
   __platform_notify(__wait_addr, __all);
 #else
-- 
2.47.1
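
The ordering issue described in the commit message above can be modelled in a few lines. This is a deliberately simplified sketch, not the libstdc++ implementation: proxy_slot, its members and notify are hypothetical names, and a plain counter stands in for the futex wake syscall. The point it shows is only that the version is bumped before the early return for the uncontended case.

```cpp
#include <atomic>

// Simplified model of a proxy wait slot shared by several atomics.
struct proxy_slot
{
  std::atomic<unsigned> ver{0};      // value that waiters block on
  std::atomic<unsigned> waiters{0};  // contention tracking
  unsigned wake_calls = 0;           // counts simulated futex wakes
};

// Notify: change the version first, then skip the wake if nobody waits.
void notify (proxy_slot& s)
{
  s.ver.fetch_add (1, std::memory_order_seq_cst);
  if (s.waiters.load (std::memory_order_seq_cst) == 0)
    return;          // early return, but the version has already changed
  ++s.wake_calls;    // stands in for the futex wake syscall
}
```

In this model, a waiter that sampled ver before the notification will observe a different value when it later compares against it, so it does not block on a stale value even though no wake was issued.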



Re: [PATCH] RISC-V: Let strided loads/stores demand proper SEW/LMUL [PR118154].

2025-01-10 Thread Robin Dapp
> Strided load store should demand RATIO instead of SEW and LMUL.
> Is it a VSETVL pass bug? I don't understand why it is configured to depend on SEW + LMUL.

Yeah, you're right, I was looking at indexed loads in the spec...
It's a problem in the vsetvl pass, yes.  Half of it I already fixed but the
other half (phase 3) is still pending.

-- 
Regards
 Robin



[PATCH 08/10] libstdc++: Rename __atomic_compare to __atomic_eq

2025-01-10 Thread Jonathan Wakely
This is an equality comparison rather than a three-way comparison like
memcmp and <=>, so name it more precisely.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h
(__atomic_wait_address_until_v): Replace __atomic_compare with
__atomic_eq.
(__atomic_wait_address_for_v): Likewise.
* include/bits/atomic_wait.h (__atomic_compare): Rename to
__atomic_eq.
(__atomic_wait_address_v): Replace __atomic_compare with
__atomic_eq.
---
 libstdc++-v3/include/bits/atomic_timed_wait.h | 10 ++
 libstdc++-v3/include/bits/atomic_wait.h   |  5 +++--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h 
b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 645b8cfc4a8b..9a60f34c130d 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -304,8 +304,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  const chrono::time_point<_Clock, _Dur>& 
__atime,
  bool __bare_wait = false) noexcept
 {
-   auto __pfn = [&](const _Tp& __val)
-  { return !__detail::__atomic_compare(__old, __val); };
+   auto __pfn = [&](const _Tp& __val) {
+return !__detail::__atomic_eq(__old, __val);
+   };
return __atomic_wait_address_until(__addr, __pfn, 
forward<_ValFn>(__vfn),
  __atime, __bare_wait);
 }
@@ -352,8 +353,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
const chrono::duration<_Rep, _Period>& __rtime,
bool __bare_wait = false) noexcept
 {
-  auto __pfn = [&](const _Tp& __val)
- { return !__detail::__atomic_compare(__old, __val); };
+  auto __pfn = [&](const _Tp& __val) {
+   return !__detail::__atomic_eq(__old, __val);
+  };
   return __atomic_wait_address_for(__addr, __pfn, forward<_ValFn>(__vfn),
   __rtime, __bare_wait);
 }
diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index db4fa031d2cf..0b29b17178e9 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -154,7 +154,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 // return true if equal
 template<typename _Tp>
-  bool __atomic_compare(const _Tp& __a, const _Tp& __b)
+  inline bool
+  __atomic_eq(const _Tp& __a, const _Tp& __b)
   {
// TODO make this do the correct padding bit ignoring comparison
return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
@@ -469,7 +470,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_ValFn __vfn) noexcept
 {
   auto __pfn = [&](const _Tp& __val)
- { return !__detail::__atomic_compare(__old, __val); };
+ { return !__detail::__atomic_eq(__old, __val); };
   __atomic_wait_address(__addr, __pfn, forward<_ValFn>(__vfn));
 }
 
-- 
2.47.1
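
To make the naming argument concrete, the following standalone sketch shows a bytewise equality helper in the spirit of the renamed function. It is an illustration only: bytes_equal is a hypothetical name, and like the TODO in the patch context it ignores the problem of padding bits. The result is a bool, with none of the ordering information that memcmp or <=> would provide.

```cpp
#include <cstring>
#include <type_traits>

// Returns true if the object representations of a and b are identical.
// Caveat: padding bits take part in the comparison, so two values that
// compare equal with == may still differ here.
template<typename T>
bool bytes_equal (const T& a, const T& b) noexcept
{
  static_assert (std::is_trivially_copyable<T>::value,
                 "bytewise comparison needs a trivially copyable type");
  return std::memcmp (&a, &b, sizeof (T)) == 0;
}
```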



[PATCH 06/10] libstdc++: Simplify futex wrapper functions for atomic wait/notify

2025-01-10 Thread Jonathan Wakely
Making these non-templates will allow them to be moved into the library
at some point.

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__platform_wait): Change function
template to a normal function. The parameter is always
__platform_wait_t* which is just int* for this implementation of
the function.
(__platform_notify): Likewise.
---
 libstdc++-v3/include/bits/atomic_wait.h | 40 -
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 4a9652ed8f1d..38a2bd3f95f2 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -108,27 +108,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __bitset_match_any = -1
 };
 
-template<typename _Tp>
-  void
-  __platform_wait(const _Tp* __addr, __platform_wait_t __val) noexcept
-  {
-   auto __e = syscall (SYS_futex, static_cast(__addr),
-   
static_cast(__futex_wait_flags::__wait_private),
-   __val, nullptr);
-   if (!__e || errno == EAGAIN)
- return;
-   if (errno != EINTR)
- __throw_system_error(errno);
-  }
+// If the futex *__addr is equal to __val, wait on the futex until woken.
+inline void
+__platform_wait(const int* __addr, int __val) noexcept
+{
+  auto __e = syscall (SYS_futex, __addr,
+ static_cast(__futex_wait_flags::__wait_private),
+ __val, nullptr);
+  if (!__e || errno == EAGAIN)
+   return;
+  if (errno != EINTR)
+   __throw_system_error(errno);
+}
 
-template<typename _Tp>
-  void
-  __platform_notify(const _Tp* __addr, bool __all) noexcept
-  {
-   syscall (SYS_futex, static_cast(__addr),
-static_cast(__futex_wait_flags::__wake_private),
-__all ? INT_MAX : 1);
-  }
+// Wake threads waiting on the futex *__addr.
+inline void
+__platform_notify(const int* __addr, bool __all) noexcept
+{
+  syscall (SYS_futex, __addr,
+  static_cast(__futex_wait_flags::__wake_private),
+  __all ? INT_MAX : 1);
+}
 #endif
 
 inline void
-- 
2.47.1



[PATCH 09/10] libstdc++: Use safe integer comparisons in std::latch [PR98749]

2025-01-10 Thread Jonathan Wakely
Also add missing precondition check to constructor and fix existing
check in count_down which was duplicated by mistake.

libstdc++-v3/ChangeLog:

PR libstdc++/98749
* include/std/latch (latch::max()): Use std::cmp_less to handle
the case where __platform_wait_t is wider than ptrdiff_t or is
unsigned.
(latch::latch(ptrdiff_t)): Add assertion.
(latch::count_down): Fix copy & pasted duplicate assertion. Use
std::cmp_equal to compare __platform_wait_t and ptrdiff_t
values.
---
 libstdc++-v3/include/std/latch | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index c81a6631d53f..8bdf68f3390a 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -41,6 +41,7 @@
 #ifdef __cpp_lib_latch // C++ >= 20 && atomic_wait
 #include 
 #include 
+#include  // cmp_equal, cmp_less_equal, etc.
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -51,24 +52,34 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   public:
 static constexpr ptrdiff_t
 max() noexcept
-{ return __gnu_cxx::__int_traits<__detail::__platform_wait_t>::__max; }
+{
+  using __gnu_cxx::__int_traits;
+  constexpr auto __max = __int_traits<__detail::__platform_wait_t>::__max;
+  if constexpr (std::cmp_less(__max, __PTRDIFF_MAX__))
+   return __max;
+  return __PTRDIFF_MAX__;
+}
 
-constexpr explicit latch(ptrdiff_t __expected) noexcept
-  : _M_a(__expected) { }
+constexpr explicit
+latch(ptrdiff_t __expected) noexcept
+: _M_a(__expected)
+{ __glibcxx_assert(__expected >= 0 && __expected <= max()); }
 
 ~latch() = default;
+
 latch(const latch&) = delete;
 latch& operator=(const latch&) = delete;
 
 _GLIBCXX_ALWAYS_INLINE void
 count_down(ptrdiff_t __update = 1)
 {
-  __glibcxx_assert(__update >= 0);
-  auto const __old = __atomic_impl::fetch_sub(&_M_a,
-   __update, memory_order::release);
-  __glibcxx_assert(__update >= 0);
-  if (__old == static_cast<__detail::__platform_wait_t>(__update))
+  __glibcxx_assert(__update >= 0 && __update <= max());
+  auto const __old = __atomic_impl::fetch_sub(&_M_a, __update,
+ memory_order::release);
+  if (std::cmp_equal(__old, __update))
__atomic_impl::notify_all(&_M_a);
+  else
+   __glibcxx_assert(std::cmp_less(__update, __old));
 }
 
 _GLIBCXX_ALWAYS_INLINE bool
@@ -97,7 +108,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   private:
 // This alignas is not redundant, it increases the alignment for
 // long long on x86.
-alignas(__alignof__(__detail::__platform_wait_t)) 
__detail::__platform_wait_t _M_a;
+alignas(__alignof__(__detail::__platform_wait_t))
+__detail::__platform_wait_t _M_a;
   };
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
-- 
2.47.1
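
For readers unfamiliar with the C++20 safe integer comparison functions used in this patch, the sketch below shows hand-written equivalents that build as C++17. It is not the libstdc++ code: safe_less and safe_equal are hypothetical names that mimic what std::cmp_less and std::cmp_equal are specified to do, namely compare the mathematical values of two integers even when one is signed and the other unsigned.

```cpp
#include <type_traits>

// Value-based "less than" for integers of possibly different signedness.
template<typename T, typename U>
constexpr bool safe_less (T t, U u) noexcept
{
  if constexpr (std::is_signed_v<T> == std::is_signed_v<U>)
    return t < u;
  else if constexpr (std::is_signed_v<T>)
    // A negative signed value is less than any unsigned value.
    return t < 0 || std::make_unsigned_t<T> (t) < u;
  else
    // An unsigned value is never less than a negative signed value.
    return u >= 0 && t < std::make_unsigned_t<U> (u);
}

// Value-based equality for integers of possibly different signedness.
template<typename T, typename U>
constexpr bool safe_equal (T t, U u) noexcept
{
  if constexpr (std::is_signed_v<T> == std::is_signed_v<U>)
    return t == u;
  else if constexpr (std::is_signed_v<T>)
    return t >= 0 && std::make_unsigned_t<T> (t) == u;
  else
    return u >= 0 && t == std::make_unsigned_t<U> (u);
}
```

With the built-in operators, a comparison such as -1 == 4294967295u converts the signed operand to unsigned first (assuming 32-bit int) and so yields true; the value-based helpers avoid that, which matters when __platform_wait_t and ptrdiff_t differ in width or signedness.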



[PATCH] Fortran: implement F2018 intrinsic OUT_OF_RANGE [PR115788]

2025-01-10 Thread Harald Anlauf
Dear all,

the attached patch is supposed to be a complete implementation of
the F2018 intrinsic OUT_OF_RANGE.  This is mostly straightforward,
with runtime code fully expanded inline.  It is also extended to
support the new UNSIGNED type of gfortran as of current 15-mainline.
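
For reviewers who do not have the standard at hand, here is a small C++ model of the semantics for a REAL argument and a 32-bit INTEGER mold, following my reading of the F2018 description. It is only a sketch: out_of_range_i32 is a hypothetical name, and the code is unrelated to the inline expansion the patch generates.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Models OUT_OF_RANGE (x, 0_int32 [, round]) for a double-precision x.
bool out_of_range_i32 (double x, bool round = false)
{
  // IEEE infinity and NaN are always out of range.
  if (!std::isfinite (x))
    return true;

  // ROUND absent or false: truncate toward zero.
  // ROUND true: nearest integer, ties away from zero (std::round).
  const double v = round ? std::round (x) : std::trunc (x);

  // Both int32 bounds are exactly representable as double.
  const double lo = std::numeric_limits<std::int32_t>::min ();
  const double hi = std::numeric_limits<std::int32_t>::max ();
  return v < lo || v > hi;
}
```

For example, 2147483647.5 is in range when truncated but out of range with rounding, which is why the ROUND argument only makes sense for a REAL x and an INTEGER or UNSIGNED mold.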

The testcases are cross-checked with NAG and Intel, as long as these
"cooperated", meaning that I could get them to reject valid code (Intel)
or crash at runtime (both).

There is one question to the reviewer(s), or those knowing better
than me how to handle IEEE infinity and NaN: with -Ofast, I needed
to add "-fno-finite-math-only" to the new testcase
gfortran.dg/ieee/out_of_range.f90, as the needed finiteness test
was otherwise optimized to always true and leading to a failure.
Is there a particular trick to disable a certain optimization
at the tree level for such checks?

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 39f9632844370eaf7377d9bfa182e824b898 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 10 Jan 2025 22:16:09 +0100
Subject: [PATCH] Fortran: implement F2018 intrinsic OUT_OF_RANGE [PR115788]

Implementation of the Fortran 2018 standard intrinsic OUT_OF_RANGE, with
the GNU Fortran extension to unsigned integers.

Runtime code is fully inline expanded.

	PR fortran/115788

gcc/fortran/ChangeLog:

	* check.cc (gfc_check_out_of_range): Check arguments to intrinsic.
	* expr.cc (free_expr0): Fix a memleak with unsigned literals.
	* gfortran.h (enum gfc_isym_id): Define GFC_ISYM_OUT_OF_RANGE.
	* intrinsic.cc (add_functions): Add Fortran prototype.  Break some
	nearby lines with excessive length.
	* intrinsic.h (gfc_check_out_of_range): Add prototypes.
	* intrinsic.texi: Fortran documentation of OUT_OF_RANGE.
	* simplify.cc (gfc_simplify_out_of_range): Compile-time simplification
	of OUT_OF_RANGE.
	* trans-intrinsic.cc (gfc_conv_intrinsic_out_of_range): Generate
	inline expansion of runtime code for OUT_OF_RANGE.
	(gfc_conv_intrinsic_function): Use it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/ieee/out_of_range.f90: New test.
	* gfortran.dg/out_of_range_1.f90: New test.
	* gfortran.dg/out_of_range_2.f90: New test.
	* gfortran.dg/out_of_range_3.f90: New test.
---
 gcc/fortran/check.cc  |  42 
 gcc/fortran/expr.cc   |   1 +
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/intrinsic.cc  |  28 ++-
 gcc/fortran/intrinsic.h   |   2 +
 gcc/fortran/intrinsic.texi|  64 ++
 gcc/fortran/simplify.cc   | 208 ++
 gcc/fortran/trans-intrinsic.cc| 192 
 .../gfortran.dg/ieee/out_of_range.f90 |  65 ++
 gcc/testsuite/gfortran.dg/out_of_range_1.f90  |  91 
 gcc/testsuite/gfortran.dg/out_of_range_2.f90  | 115 ++
 gcc/testsuite/gfortran.dg/out_of_range_3.f90  |  25 +++
 12 files changed, 827 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/out_of_range.f90
 create mode 100644 gcc/testsuite/gfortran.dg/out_of_range_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/out_of_range_2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/out_of_range_3.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index e29ad398611..35458643835 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -4864,6 +4864,48 @@ gfc_check_null (gfc_expr *mold)
 }


+bool
+gfc_check_out_of_range (gfc_expr *x, gfc_expr *mold, gfc_expr *round)
+{
+  if (!int_or_real_or_unsigned_check (x, 0))
+return false;
+
+  if (mold == NULL)
+return false;
+
+  if (!int_or_real_or_unsigned_check (mold, 1))
+return false;
+
+  if (!scalar_check (mold, 1))
+return false;
+
+  if (round)
+{
+  if (!type_check (round, 2, BT_LOGICAL))
+	return false;
+
+  if (!scalar_check (round, 2))
+	return false;
+
+  if (x->ts.type != BT_REAL
+	  || (mold->ts.type != BT_INTEGER && mold->ts.type != BT_UNSIGNED))
+	{
+	  gfc_error ("%qs argument of %qs intrinsic at %L shall appear "
+		 "only if %qs is of type REAL and %qs is of type "
+		 "INTEGER or UNSIGNED",
+		 gfc_current_intrinsic_arg[2]->name,
+		 gfc_current_intrinsic, &round->where,
+		 gfc_current_intrinsic_arg[0]->name,
+		 gfc_current_intrinsic_arg[1]->name);
+
+	  return false;
+	}
+}
+
+  return true;
+}
+
+
 bool
 gfc_check_pack (gfc_expr *array, gfc_expr *mask, gfc_expr *vector)
 {
diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 0e40b2493a5..7f3f6c52fb5 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -466,6 +466,7 @@ free_expr0 (gfc_expr *e)
   switch (e->ts.type)
 	{
 	case BT_INTEGER:
+	case BT_UNSIGNED:
 	  mpz_clear (e->value.integer);
 	  break;

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index aa495b5487e..6eaf84cea2a 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -626,6 +626,7 @@ enum gfc_isym_id
   GFC

Re: [PATCH V4 0/2] RISC-V: Add intrinsics support and testcases for SiFive Xsfvcp extension.

2025-01-10 Thread Jeff Law




On 1/10/25 12:20 AM, Kito Cheng wrote:

Could you rebase and send the patch set again? I can't apply the patch set:

[kitoc@hsinchu18 gcc]$ git am
/tmp/git-pw8sm7zbop/RISC-V-Add-intrinsics-support-and-testcases-for-SiFive-Xsfvcp-extension..patch
Applying: RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.
error: patch failed: gcc/config/riscv/riscv-vector-builtins-types.def:369
error: gcc/config/riscv/riscv-vector-builtins-types.def: patch does not apply
error: patch failed: gcc/config/riscv/riscv-vector-builtins.cc:3600
error: gcc/config/riscv/riscv-vector-builtins.cc: patch does not apply
error: patch failed: gcc/config/riscv/riscv-vector-builtins.def:729
error: gcc/config/riscv/riscv-vector-builtins.def: patch does not apply
error: patch failed: gcc/config/riscv/riscv-vector-builtins.h:297
error: gcc/config/riscv/riscv-vector-builtins.h: patch does not apply
error: patch failed: gcc/config/riscv/vector-iterators.md:4814
error: gcc/config/riscv/vector-iterators.md: patch does not apply
error: patch failed: gcc/config/riscv/vector.md:56
error: gcc/config/riscv/vector.md: patch does not apply
Patch failed at 0001 RISC-V: Add intrinsics support for SiFive Xsfvcp
extensions.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
[kitoc@hsinchu18 gcc]$

Also note we are well into stage3, nearing stage4, and I think this
patchset came in a month after the stage1 development window closed.


I'd tend to lean towards deferring until gcc-16 development opens in a 
few months.


Jeff


Re: [PATCH] RISC-V: Fix riscv_modes_tieable_p

2025-01-10 Thread Jeff Law




On 1/10/25 4:59 PM, Palmer Dabbelt wrote:

On Fri, 10 Jan 2025 12:21:15 PST (-0800), jeffreya...@gmail.com wrote:



On 1/10/25 12:11 PM, Robin Dapp wrote:

Integer values and floating-point values need to be converted
by fmv series instructions. So if mode1 is MODE_INT and mode2
is MODE_FLOAT, we should return false in riscv_modes_tieable_p,
and vice versa.


I think that's on purpose because we can read and write float values
from/to integer registers.  Maybe it's a cost problem that we spill
at some point rather than access directly?

But even if you spill, as long as loads/stores don't modify the value
then I think we're OK from a correctness standpoint.




If I compile your test case I do see converting moves in the final
assembly - is there something you're concerned about in particular?


Which appears to be the glibc code (or very similar to it), and I don't 
think we've had users reporting incorrect results there.
There's certainly cases in glibc that very much want to just move a blob 
of data from an FP register into a GPR without any kind of interpretation.
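
The distinction drawn here, between moving the bits of a floating-point value and converting its value, can be shown in portable C++. This is just a sketch with hypothetical function names; it says nothing about how the RISC-V backend implements either operation, only what an fmv-style move and an fcvt-style conversion each compute.

```cpp
#include <cstdint>
#include <cstring>

// Raw move: the 32 bits of the float, with no interpretation.
std::uint32_t float_bits (float f)
{
  std::uint32_t u;
  static_assert (sizeof u == sizeof f, "sketch assumes 32-bit float");
  std::memcpy (&u, &f, sizeof u);
  return u;
}

// Value conversion: the numeric value, converted to an integer.
std::uint32_t float_to_uint (float f)
{
  return static_cast<std::uint32_t> (f);
}
```

For an IEEE 754 single-precision 1.0f, the raw move gives 0x3f800000 while the conversion gives 1.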





Which was my general question as well.  Under precisely what
circumstances is this causing a problem?  The secondary question would
be how does this change interact with the finx and related extensions?


FWIW I'm also a bit lost here: I'd expect riscv_hard_regno_mode_ok() to 
be sufficient to handle these X/F register mixing cases, and thus us not 
to need any more special handling in riscv_modes_tieable_p().


(I think we're safe for finx with the current code, as we can access the 
registers safely there.)


So maybe there's something else also needed to trigger this?

I don't think you're lost; Robin and I have concerns similar to yours.
At this point I don't think the patch is correct, but I'm also
open to the possibility that there's something more complex going on that
hasn't been fully explained.  Hence my request for an explanation of the
precise circumstances when Zhijin thinks this is necessary.



Jeff



Re: [pushed][PATCH v2] LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

2025-01-10 Thread Lulu Cheng

Pushed to r15-6817.

On 2025/1/10 at 10:27 AM, mengqinggang wrote:

Generate 0x1010 instead of 0x1010000>>12 for lu12i.w. lu32i.d and lu52i.d use
the same processing.
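
The arithmetic behind this change is small enough to show directly. In the sketch below, is_lu12i_candidate is a simplified, assumed stand-in for the LU12I_INT test (the real macro may differ), and lu12i_field computes the immediate that ends up in the assembly once the shift is done by the compiler instead of being left to the assembler as an expression. The input 0x1010000 is an example value chosen for illustration.

```cpp
#include <cstdint>

// Assumed simplification: a constant that fits in 32 signed bits and
// has its low 12 bits clear can be loaded with a single lu12i.w.
bool is_lu12i_candidate (std::int64_t v)
{
  return (v & 0xfff) == 0 && v >= INT32_MIN && v <= INT32_MAX;
}

// Final immediate for lu12i.w: the constant shifted right by 12 bits.
// (Right-shifting a negative value is arithmetic on GCC targets.)
std::int64_t lu12i_field (std::int64_t v)
{
  return v >> 12;
}
```

For the constant 0x1010000 this yields 0x1010, so the output reads "lu12i.w $rd,0x1010" rather than carrying a ">>12" expression.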

gcc/ChangeLog:

* config/loongarch/lasx.md: Use new loongarch_output_move.
* config/loongarch/loongarch-protos.h (loongarch_output_move):
Change parameters from (rtx, rtx) to (rtx *).
* config/loongarch/loongarch.cc (loongarch_output_move):
Generate final immediate for lu12i.w and lu52i.d.
* config/loongarch/loongarch.md:
Generate final immediate for lu32i.d and lu52i.d.
* config/loongarch/lsx.md: Use new loongarch_output_move.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load.c: Not generate ">>".
---
Changes in v2:
- Change imm-load test to scan-assembler-not >>.

  gcc/config/loongarch/lasx.md  |  2 +-
  gcc/config/loongarch/loongarch-protos.h   |  2 +-
  gcc/config/loongarch/loongarch.cc | 14 ++--
  gcc/config/loongarch/loongarch.md | 34 ---
  gcc/config/loongarch/lsx.md   |  2 +-
  gcc/testsuite/gcc.target/loongarch/imm-load.c |  1 +
  6 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index edaf64eeb95..a37c85a25a4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -723,7 +723,7 @@ (define_insn "mov_lasx"
[(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f")
(match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))]
"ISA_HAS_LASX"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  { return loongarch_output_move (operands); }
[(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert")
 (set_attr "mode" "")
 (set_attr "length" "8,4,4,4,4")])
diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index fb544ad75ca..6601f767dab 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -86,7 +86,7 @@ extern void loongarch_split_move (rtx, rtx);
  extern bool loongarch_addu16i_imm12_operand_p (HOST_WIDE_INT, machine_mode);
  extern void loongarch_split_plus_constant (rtx *, machine_mode);
  extern void loongarch_split_vector_move (rtx, rtx);
-extern const char *loongarch_output_move (rtx, rtx);
+extern const char *loongarch_output_move (rtx *);
  #ifdef RTX_CODE
  extern void loongarch_expand_scc (rtx *);
  extern void loongarch_expand_vec_cmp (rtx *);
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 89237c377e7..f26c1346acc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4721,8 +4721,10 @@ loongarch_split_vector_move (rtx dest, rtx src)
 that SRC is operand 1 and DEST is operand 0.  */
  
  const char *

-loongarch_output_move (rtx dest, rtx src)
+loongarch_output_move (rtx *operands)
  {
+  rtx src = operands[1];
+  rtx dest = operands[0];
enum rtx_code dest_code = GET_CODE (dest);
enum rtx_code src_code = GET_CODE (src);
machine_mode mode = GET_MODE (dest);
@@ -4875,13 +4877,19 @@ loongarch_output_move (rtx dest, rtx src)
if (src_code == CONST_INT)
{
  if (LU12I_INT (src))
-   return "lu12i.w\t%0,%1>>12\t\t\t# %X1";
+   {
+ operands[1] = GEN_INT (INTVAL (operands[1]) >> 12);
+ return "lu12i.w\t%0,%1\t\t\t# %X1";
+   }
  else if (IMM12_INT (src))
return "addi.w\t%0,$r0,%1\t\t\t# %X1";
  else if (IMM12_INT_UNSIGNED (src))
return "ori\t%0,$r0,%1\t\t\t# %X1";
  else if (LU52I_INT (src))
-   return "lu52i.d\t%0,$r0,%X1>>52\t\t\t# %1";
+   {
+ operands[1] = GEN_INT (INTVAL (operands[1]) >> 52);
+ return "lu52i.d\t%0,$r0,%X1\t\t\t# %1";
+   }
  else
gcc_unreachable ();
}
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 3eff4077160..59f45770311 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2209,7 +2209,7 @@ (define_insn_and_split "*movdi_32bit"
"!TARGET_64BIT
 && (register_operand (operands[0], DImode)
 || reg_or_0_operand (operands[1], DImode))"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  { return loongarch_output_move (operands); }
"CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
(operands[0]))"
[(const_int 0)]
@@ -2228,7 +2228,9 @@ (define_insn_and_split "*movdi_64bit"
"TARGET_64BIT
 && (register_operand (operands[0], DImode)
 || reg_or_0_operand (operands[1], DImode))"
-  { return loongarch_output_move (operands[0], operands[1]); }
+  {
+return loongarch_output_move (operands);
+  }
"CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO
(operands[0]))"

Re: [PATCH] PR tree-optimization/88575 - Use relations when simplifying MIN and MAX.

2025-01-10 Thread Jeff Law




On 1/10/25 2:43 PM, Andrew MacLeod wrote:

This should have been done a while ago.
Funny I said kind of the same thing when I did the DOM variant for 
integral types a little while back.




The call to simplify MIN and MAX was guarded by a check for INTEGRAL, so 
I removed that as the code was already generalized to work with any type.


And no attempt was being made to pass in a relation... so I query for a 
relation between op0 and op1, and pass it to fold_range. And then all 
the right things happen.
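
For illustration only (a sketch with made-up function names, not the tests from the patch): with the relation a <= b known on the guarded path, MIN_EXPR <a, b> can be folded to a and MAX_EXPR <a, b> to b without any range information. The floating-point variant assumes the conditional is turned into a MAX_EXPR, which according to the description above needs -ffast-math.

```cpp
#include <cassert>

// Hypothetical example; the names are illustrative only.
// On the guarded path the relation a <= b is known, so a
// relation-aware folder can reduce MIN_EXPR <a, b> to `a`.
int min_under_relation(int a, int b)
{
  if (a <= b)
    return a < b ? a : b;   // MIN_EXPR <a, b>, foldable to `a`
  return 0;
}

// Same shape for MAX_EXPR with a floating-point type.
double max_under_relation(double a, double b)
{
  if (a <= b)
    return a > b ? a : b;   // MAX_EXPR <a, b>, foldable to `b`
  return 0.0;
}
```

Whether the MIN/MAX actually disappears can be checked in the EVRP dump; the functions above only show the source shape that benefits.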


I see Jeff fixed PR 110199 in DOM.   I copied the tests from that and 
created the same tests to test that

   a) EVRP is removing all the MIN_EXPR and MAX_EXPRs
   b) Added a float version of the tests with -ffast-math to show it's 
also working with floats.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?

OK.





Andrew

PS.    The same patch will not work on gcc-14, but the code could be 
ported if we wanted to.  Presumably DOM is getting the integral 
versions, so only the float cases would be newly handled.
I don't think it's worth backporting to gcc-13 or gcc-14.  It's a pretty 
minor missed optimization in my mind.



jeff


[PATCH 03/10] libstdc++: Whitespace fixes in atomic wait/notify code

2025-01-10 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h: Whitespace fixes.
* include/bits/atomic_wait.h: Likewise.
---
 libstdc++-v3/include/bits/atomic_timed_wait.h | 198 +-
 libstdc++-v3/include/bits/atomic_wait.h   |   8 +-
 2 files changed, 100 insertions(+), 106 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h 
b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 4504b1b84bb8..73acea939504 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -101,8 +101,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  if (errno != EINTR && errno != EAGAIN)
__throw_system_error(errno);
}
-   return true;
-  }
+  return true;
+}
 #else
 // define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement 
__platform_wait_until()
 // if there is a more efficient primitive supported by the platform
@@ -115,23 +115,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __cond_wait_until(__condvar& __cv, mutex& __mx,
  const __wait_clock_t::time_point& __atime)
 {
-   auto __s = chrono::time_point_cast(__atime);
-   auto __ns = chrono::duration_cast(__atime - __s);
+  auto __s = chrono::time_point_cast(__atime);
+  auto __ns = chrono::duration_cast(__atime - __s);
 
-   __gthread_time_t __ts =
- {
-   static_cast(__s.time_since_epoch().count()),
-   static_cast(__ns.count())
- };
+  __gthread_time_t __ts =
+   {
+ static_cast(__s.time_since_epoch().count()),
+ static_cast(__ns.count())
+   };
 
 #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-   if constexpr (is_same_v)
- __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
-   else
+  if constexpr (is_same_v)
+   __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
+  else
 #endif
- __cv.wait_until(__mx, __ts);
-   return __wait_clock_t::now() < __atime;
-  }
+   __cv.wait_until(__mx, __ts);
+  return __wait_clock_t::now() < __atime;
+}
 #endif // _GLIBCXX_HAS_GTHREADS
 
 inline __wait_result_type
@@ -146,34 +146,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __platform_wait_t __val;
   auto __now = __wait_clock_t::now();
   for (; __now < __deadline; __now = __wait_clock_t::now())
-  {
-   auto __elapsed = __now - __t0;
+   {
+ auto __elapsed = __now - __t0;
 #ifndef _GLIBCXX_NO_SLEEP
-   if (__elapsed > 128ms)
-   {
- this_thread::sleep_for(64ms);
-   }
-   else if (__elapsed > 64us)
-   {
- this_thread::sleep_for(__elapsed / 2);
-   }
-   else
+ if (__elapsed > 128ms)
+   this_thread::sleep_for(64ms);
+ else if (__elapsed > 64us)
+   this_thread::sleep_for(__elapsed / 2);
+ else
 #endif
-   if (__elapsed > 4us)
-   {
- __thread_yield();
-   }
-   else
-   {
- auto __res = __detail::__spin_impl(__addr, __a);
- if (__res.first)
-   return __res;
-   }
+ if (__elapsed > 4us)
+   __thread_yield();
+ else
+   {
+ auto __res = __detail::__spin_impl(__addr, __a);
+ if (__res.first)
+   return __res;
+   }
 
-   __atomic_load(__addr, &__val, __args._M_order);
-   if (__val != __args._M_old)
-   return make_pair(true, __val);
-  }
+ __atomic_load(__addr, &__val, __args._M_order);
+ if (__val != __args._M_old)
+ return make_pair(true, __val);
+   }
   return make_pair(false, __val);
 }
 
@@ -212,14 +206,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 
   if (!(__args & __wait_flags::__track_contention))
-  {
-   // caller does not externally track contention
+   {
+ // caller does not externally track contention
 #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
-   __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
-: __pool;
+ __pool = (__pool == nullptr) ? 
&__waiter_pool_impl::_S_impl_for(__addr)
+  : __pool;
 #endif
-   __pool->_M_enter_wait();
-  }
+ __pool->_M_enter_wait();
+   }
 
   __wait_result_type __res;
 #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
@@ -232,19 +226,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __atomic_load(__wait_addr, &__val, __args._M_order);
   if (__val == __args._M_old)
{
-  lock_guard __l{ __pool->_M_mtx };
-  __atomic_load(__wait_addr, &__val, __args._M_order);
-  if (__val == __args._M_old &&
-  __cond_wait_until(__pool->_M_cv, __pool->_M_mtx, __atime))
-__res = make_pair(true, __val);
+ lock_guard __l{ __pool->_M_mtx };
+ __atomic_load(__wait_addr, &__val, __args._M_order);
+ if (__val == __args._M_old &&
+ __cond_wait_until(__poo

[PATCH 10/10] libstdc++: Optimise std::latch::arrive_and_wait

2025-01-10 Thread Jonathan Wakely
We don't need to wait if we know the counter has reached zero.

libstdc++-v3/ChangeLog:

* include/std/latch (latch::arrive_and_wait): Optimise.
---

This one's commented out for now, but sending for review anyway.

 libstdc++-v3/include/std/latch | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index 8bdf68f3390a..af24dd081e04 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -101,8 +101,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_ALWAYS_INLINE void
 arrive_and_wait(ptrdiff_t __update = 1) noexcept
 {
+#if 0
+  __glibcxx_assert(__update >= 0 && __update <= max());
+  auto const __old = __atomic_impl::fetch_sub(&_M_a, __update,
+ memory_order::release);
+  if (std::cmp_equal(__old, __update))
+   __atomic_impl::notify_all(&_M_a);
+  else if (std::cmp_greater(__old, __update))
+   wait();
+  else
+   __glibcxx_assert(std::cmp_less_equal(__update, __old));
+#else
   count_down(__update);
   wait();
+#endif
 }
 
   private:
-- 
2.47.1
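
For readers following along, the idea of the disabled branch can be sketched outside libstdc++ roughly as below. This is a simplified model, not the library code: mini_latch and its members are made-up names, waiting is done by polling instead of the atomic wait/notify machinery, and the acq_rel ordering is an assumption of the sketch (the patch uses release).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for std::latch; not the libstdc++ implementation.
class mini_latch
{
  std::atomic<std::ptrdiff_t> counter_;

public:
  explicit mini_latch(std::ptrdiff_t expected) : counter_(expected) {}

  // Block (by polling) until the counter reaches zero.
  void wait() const
  {
    while (counter_.load(std::memory_order_acquire) != 0)
      std::this_thread::yield();
  }

  // Returns true if the caller had to wait, false if it was the last arrival.
  bool arrive_and_wait(std::ptrdiff_t update = 1)
  {
    // acq_rel is an assumption of this sketch, chosen so that the last
    // arrival also observes the writes made by earlier arrivals.
    const std::ptrdiff_t old
      = counter_.fetch_sub(update, std::memory_order_acq_rel);
    assert(old >= update);
    if (old == update)
      return false;   // counter is now zero; a real latch would notify here
    wait();
    return true;
  }
};
```

The point is only the control flow: the thread that brings the counter to zero already knows the result and can skip wait() entirely.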



Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-10 Thread Qing Zhao


> On Jan 10, 2025, at 15:34, Jeff Law  wrote:
> 
> 
> 
> On 1/9/25 1:39 PM, Qing Zhao wrote:
>>> On Jan 9, 2025, at 14:10, Jeff Law  wrote:
>>> 
>>> 
>>> 
>>> On 1/9/25 10:48 AM, Qing Zhao wrote:
>>> 
> 
> I think Jeff's patch is not reasonable since it boils down to not diagnose
> -Warray-bounds but instead remove those stmts.
 If these stmts are dead-code that are generated by compiler optimization 
 (NOT from source code),
 removing them before diagnosis is correct. (To avoid false positive 
 warnings).
>>> But I don't think we generally know if the problematic statements came from 
>>> user code or were generated by the compiler.
>> To help the compiler catch real problems in the source code and avoid 
>> false positive warnings introduced by compiler transformations, we might 
>> need to add flags in the IR to distinguish this?
> This sounds like a path lined with peril -- I just don't see that we're 
> likely to keep this data consistent through the various transformations.
You are right, it's hard to keep such a flag correct through the compiler 
transformations. :)

Qing
> 
> Jeff




Re: [PATCH] Fortran: implement F2018 intrinsic OUT_OF_RANGE [PR115788]

2025-01-10 Thread Thomas Koenig

Hello Harald,


Regtested on x86_64-pc-linux-gnu.  OK for mainline?


I just started to run a bootstrap on cfarm120 (because it is
the only machine I can lay my hands on where I can run
"make -j128" without disturbing anybody :-) and I got

../../trunk/gcc/fortran/trans-intrinsic.cc: In function ‘void 
gfc_conv_intrinsic_out_of_range(gfc_se*, gfc_expr*)’:
../../trunk/gcc/fortran/trans-intrinsic.cc:7178:22: error: ‘tmp’ may be 
used uninitialized [-Werror=maybe-uninitialized]

 7178 |   se->expr = convert (gfc_typenode_for_spec (&expr->ts), tmp);
  |  ^~~~
../../trunk/gcc/fortran/trans-intrinsic.cc:7001:8: note: ‘tmp’ was 
declared here

 7001 |   tree tmp, tmp1, tmp2;

(Simply initializing tmp to NULL_TREE could probably be enough).
Could you check?
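
For illustration, the warning pattern and the suggested fix look roughly like this outside of GCC. This is a sketch with made-up names (kind, select_value); the int stands in for the tree variable, 0 for NULL_TREE, and std::abort () for the gcc_unreachable () calls mentioned in the v2 follow-up.

```cpp
#include <cassert>
#include <cstdlib>

enum class kind { integer, real };   // hypothetical, for illustration only

int select_value(kind k)
{
  // Without an initializer the compiler cannot prove that every path
  // through the switch assigns tmp, hence -Wmaybe-uninitialized.
  int tmp = 0;               // explicit initial value, analogous to NULL_TREE
  switch (k)
    {
    case kind::integer:
      tmp = 1;
      break;
    case kind::real:
      tmp = 2;
      break;
    default:
      std::abort ();         // stands in for gcc_unreachable ()
    }
  return tmp;
}
```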

Best regards

Thomas




Subject: [PATCH] RISC-V: testsuite: Skip test with -flto.

2025-01-10 Thread Robin Dapp
Hi,

the zbb-rol-ror and stack_save_restore tests use the -fno-lto option and
scan the final assembly.  For an invocation like -flto ... -fno-lto the
output file we scan is still something like
  zbb-rol-ror-09.ltrans0.ltrans.s.

Therefore skip the tests when "-flto" is present.  This gets rid
of a few UNRESOLVED tests.

Regtested on rv64gcv_zvl512b.  Going to push if the CI agrees.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/stack_save_restore_1.c: Skip for -flto.
* gcc.target/riscv/stack_save_restore_2.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-04.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-05.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-06.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-07.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-08.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-09.c: Ditto.
---
 gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c   | 4 ++--
 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c   | 4 ++--
 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-06.c   | 4 ++--
 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-07.c   | 4 ++--
 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-08.c   | 4 ++--
 gcc/testsuite/gcc.target/riscv/zbb-rol-ror-09.c   | 4 ++--
 8 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c 
b/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
index d8b0668a820..e0a7c68760a 100644
--- a/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64imafc -mabi=lp64f -msave-restore -O2 
-fno-schedule-insns -fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops 
-fno-lto" } */
+/* { dg-options "-march=rv64imafc -mabi=lp64f -msave-restore -O2 
-fno-schedule-insns -fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 char my_getchar();
diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c 
b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
index 5f0389243b1..aadeaa58230 100644
--- a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32imafc -mabi=ilp32f -msave-restore -O2 
-fno-schedule-insns -fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops 
-fno-lto" } */
+/* { dg-options "-march=rv32imafc -mabi=ilp32f -msave-restore -O2 
-fno-schedule-insns -fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 char my_getchar();
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c 
b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c
index 28350e5e937..b413b10ea93 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-04.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zbb -mabi=lp64d -fno-lto -O2" } */
-/* { dg-skip-if "" { *-*-* } { "-g" } } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64d -O2" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto" } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 /* { dg-final { scan-assembler-not {\mand} } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c 
b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c
index cc44653acfb..179477ed93b 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-05.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32gc_zbb -mabi=ilp32 -fno-lto -O2" } */
-/* { dg-skip-if "" { *-*-* } { "-g" } } */
+/* { dg-options "-march=rv32gc_zbb -mabi=ilp32 -O2" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto" } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 /* { dg-final { scan-assembler-not {\mand} } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-06.c 
b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-06.c
index 7a98a5712bf..b5f0b8b9027 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-06.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-06.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zbb -mabi=lp64d -fno-lto -O2" } */
-/* { dg-skip-if "" { *-*-* } { "-g" } } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64d -O2" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto" } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 /* { dg-final { scan-assembler-not {\mand} } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-07.c 
b/gcc/testsuite/gcc.target/riscv/zbb-rol-ror-07.c
index a08a9eb772e..037230625fb 100644
--- a/gcc/testsuite/gcc.targe

[PATCH] RISC-V: Let strided loads/stores demand proper SEW/LMUL [PR118154].

2025-01-10 Thread Robin Dapp
Hi,

in PR118154 we emit strided stores but the first of those does not
always have the proper VTYPE.  That's because we assume it only
demands an SEW/LMUL ratio rather than the proper SEW and LMUL and
subsequently optimize away the accompanying vsetvl.

This patch corrects the ratio attribute for strided loads and stores.
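
As background on the arithmetic only (a sketch, not part of the patch; vtype and sew_lmul_ratio are made-up names): the ratio attribute is SEW divided by LMUL, so different SEW/LMUL pairs can share a ratio. A vsetvl that is only required to provide a given ratio may therefore leave either configuration active.

```cpp
#include <cassert>

// Hypothetical helper type: LMUL is kept as a fraction so that
// fractional values such as 1/2 or 1/4 can be represented.
struct vtype
{
  int sew;        // element width in bits
  int lmul_num;   // LMUL numerator
  int lmul_den;   // LMUL denominator
};

// SEW/LMUL ratio, e.g. SEW=8 with LMUL=8 gives 1.
constexpr int sew_lmul_ratio(vtype v)
{
  return v.sew * v.lmul_den / v.lmul_num;
}

// Two configurations are interchangeable for a ratio-only demand
// when their ratios agree, even if SEW and LMUL both differ.
constexpr bool same_ratio(vtype a, vtype b)
{
  return sew_lmul_ratio(a) == sew_lmul_ratio(b);
}
```

For example, SEW=8/LMUL=1 and SEW=16/LMUL=2 both have ratio 8, while SEW=8 with LMUL=8, 4 and 2 gives the ratios 1, 2 and 4 that appear for RVVM8QI, RVVM4QI and RVVM2QI in the attribute definition.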

Regtested on rv64gcv_zvl512b.

Regards
 Robin

PR target/118154

gcc/ChangeLog:

* config/riscv/vector.md: Do not return a ratio for strided
loads and stores.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118154-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr118154-2.c: New test.
---
 gcc/config/riscv/vector.md|  5 ++-
 .../gcc.target/riscv/rvv/autovec/pr118154-1.c | 23 ++
 .../gcc.target/riscv/rvv/autovec/pr118154-2.c | 31 +++
 3 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-2.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e78d1090696..05a0aed8add 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -471,7 +471,7 @@ (define_attr "vlmul" ""
 
 ;; It is valid for instruction that require sew/lmul ratio.
 (define_attr "ratio" ""
-  (cond [(eq_attr "type" "vimov,vfmov,vldux,vldox,vstux,vstox,\
+  (cond [(eq_attr "type" "vimov,vfmov,vldux,vldox,vstux,vstox,vsts,\
  vialu,vshift,vicmp,vimul,vidiv,vsalu,\
  vext,viwalu,viwmul,vicalu,vnshift,\
  vimuladd,vimerge,vaalu,vsmul,vsshift,\
@@ -494,6 +494,9 @@ (define_attr "ratio" ""
   vlsegdff,vssegtux,vlsegdox,vlsegdux")
  (match_test "TARGET_XTHEADVECTOR"))
   (const_int INVALID_ATTRIBUTE)
+   (and (eq_attr "type" "vlds")
+  (match_test "VECTOR_MODE_P (GET_MODE (operands[3]))"))
+   (const_int INVALID_ATTRIBUTE)
 (eq_attr "mode" "RVVM8QI,RVVM1BI") (const_int 1)
 (eq_attr "mode" "RVVM4QI,RVVMF2BI") (const_int 2)
 (eq_attr "mode" "RVVM2QI,RVVMF4BI") (const_int 4)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-1.c
new file mode 100644
index 000..55386568a5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99 -Wno-pedantic" } */
+
+long a;
+char b;
+char c[22][484];
+int main() {
+  for (int e = 4; e < 33; e++) {
+for (int f = 0; f < 3; f++)
+  for (int g = 0; g < 18; g++) {
+c[f][g * 22] = 1;
+a = ({ a > 1 ? a : 1; });
+  }
+for (int i = 0; i < 33; i++)
+  for (int h = 0; h < 6; h++)
+for (int j = 0; j < 17; j++)
+  b = ({ b > 17 ? b : 17; });
+  }
+  if (c[1][44] != 1)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-2.c
new file mode 100644
index 000..4172f292994
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-2.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99 -Wno-pedantic" } */
+
+long a;
+signed char b;
+long long d;
+signed char c[22][22][484];
+void m(long long *l, int n) { *l ^= n + (*l >> 2); }
+int main() {
+  signed char l = 35;
+  for (signed char f = 4; f; f++) {
+for (signed g = 0; g < 022; g += 4)
+  for (signed char h = 0; h < 022; h++) {
+c[9][g][h * 22 + h] = l;
+a = ({ a > 4095 ? a : 4095; });
+  }
+for (int i = 0; i < 22; i += 3)
+  for (signed char j = 1; j; j++)
+for (signed char k = 0; k < 022; k++)
+  b = ({ b > 19 ? b : 19; });
+  }
+  for (long f = 0; f < 22; ++f)
+for (long g = 0; g < 22; ++g)
+  for (long h = 0; h < 22; ++h)
+for (long i = 0; i < 22; ++i)
+  m(&d, c[f][g][h * 2 + i]);
+  if (d != 38)
+__builtin_abort ();
+}
-- 
2.47.1




[PATCH] match: Keep conditional in simplification to constant [PR118140].

2025-01-10 Thread Robin Dapp
Hi,

in PR118140 we simplify

  _ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11);

to "1":

  Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1

when _46 == 1.  This happens by removing the conditional and applying
a | 1 = 1.  Normally we re-introduce the conditional and its else value
if needed but that does not happen here as we're not dealing with a
vector type.  For correctness's sake, we must not remove the conditional
even for non-vector types.

This patch re-introduces a COND_EXPR in such cases.  For PR118140 this
results in a non-vectorized loop.
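
To make the problem concrete, here is a scalar model of the situation (a sketch only; the function names are made up and do not exist in GCC). The conditional operation yields a | b when the condition holds and the else value otherwise, so folding a | 1 to 1 is only valid on the "then" side.

```cpp
#include <cassert>

// Scalar model of .COND_IOR (cond, a, b, else_value); hypothetical name.
bool cond_ior(bool cond, bool a, bool b, bool else_value)
{
  return cond ? (a | b) : else_value;
}

// What the faulty simplification amounts to when b is known to be 1:
// the condition is dropped and a | 1 is folded to 1.
bool simplified_without_cond(bool /*cond*/, bool /*a*/, bool /*else_value*/)
{
  return true;
}

// Keeping the condition: COND_EXPR <cond, 1, else_value>.
bool simplified_with_cond(bool cond, bool /*a*/, bool else_value)
{
  return cond ? true : else_value;
}
```

With cond false and an else value of 0, the original operation and the COND_EXPR form both give 0, while the version without the condition gives 1.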

Bootstrapped and regtested on x86 and aarch64.  Regtested on rv64gcv_zvl512b.

Regards
 Robin

PR middle-end/118140

gcc/ChangeLog:

* gimple-match-exports.cc (maybe_resimplify_conditional_op): Add
COND_EXPR when we simplified to a scalar gimple value but still
have an else value.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr118140.c: New test.
* gcc.target/riscv/rvv/pr118140.c: New test.
---
 gcc/gimple-match-exports.cc   | 26 ++---
 gcc/testsuite/gcc.dg/vect/pr118140.c  | 27 +
 .../gcc.target/riscv/rvv/autovec/pr118140.c   | 29 +++
 3 files changed, 72 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr118140.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118140.c

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index e06a8aaa171..ccba046a1d4 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -337,23 +337,29 @@ maybe_resimplify_conditional_op (gimple_seq *seq, 
gimple_match_op *res_op,
 }
 
   /* If the "then" value is a gimple value and the "else" value matters,
- create a VEC_COND_EXPR between them, then see if it can be further
+ create a (VEC_)COND_EXPR between them, then see if it can be further
  simplified.  */
   gimple_match_op new_op;
   if (res_op->cond.else_value
-  && VECTOR_TYPE_P (res_op->type)
   && gimple_simplified_result_is_gimple_val (res_op))
 {
-  tree len = res_op->cond.len;
-  if (!len)
-   new_op.set_op (VEC_COND_EXPR, res_op->type,
-  res_op->cond.cond, res_op->ops[0],
-  res_op->cond.else_value);
+  if (VECTOR_TYPE_P (res_op->type))
+   {
+ tree len = res_op->cond.len;
+ if (!len)
+   new_op.set_op (VEC_COND_EXPR, res_op->type,
+  res_op->cond.cond, res_op->ops[0],
+  res_op->cond.else_value);
+ else
+   new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
+  res_op->cond.cond, res_op->ops[0],
+  res_op->cond.else_value,
+  res_op->cond.len, res_op->cond.bias);
+   }
   else
-   new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
+   new_op.set_op (COND_EXPR, res_op->type,
   res_op->cond.cond, res_op->ops[0],
-  res_op->cond.else_value,
-  res_op->cond.len, res_op->cond.bias);
+  res_op->cond.else_value);
   *res_op = new_op;
   return gimple_resimplify3 (seq, res_op, valueize);
 }
diff --git a/gcc/testsuite/gcc.dg/vect/pr118140.c 
b/gcc/testsuite/gcc.dg/vect/pr118140.c
new file mode 100644
index 000..2dab98bfc91
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr118140.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target { aarch64*-*-* || riscv*-*-* } } } */
+/* { dg-additional-options "-std=gnu99" } */
+
+long long a;
+_Bool d;
+char e;
+_Bool f[17];
+_Bool f_3;
+
+int main() {
+  for (char g = 3; g < 16; g++) {
+  d |= ({
+int h = f[g - 1] ? 2 : 0;
+_Bool t;
+if (f[g - 1])
+  t = f_3;
+else
+  t = 0;
+int i = t;
+h > i;
+  });
+e += f[g + 1];
+  }
+
+  if (d != 0)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118140.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118140.c
new file mode 100644
index 000..31134de7b3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118140.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99 -Wno-pedantic" } */
+
+long long a;
+_Bool d;
+char e;
+_Bool f[17];
+_Bool f_3;
+
+int main() {
+  for (char g = 3; g < 16; g++) {
+  d |= ({
+int h = f[g - 1] ? 2 : 0;
+_Bool t;
+if (f[g - 1])
+  t = f_3;
+else
+  t = 0;
+int i = t;
+h > i;
+  });
+e += f[g + 1];
+  }
+
+  if (d != 0)
+__builtin_abort ();
+}
-- 
2.47.1



Re: [PATCH] RISC-V: Let strided loads/stores demand proper SEW/LMUL [PR118154].

2025-01-10 Thread 钟居哲
Strided loads/stores should demand RATIO instead of SEW and LMUL.
Is this a bug in the VSETVL pass?  I don't understand why they are configured to depend on SEW + LMUL.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2025-01-10 16:42
To: gcc-patches
CC: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; 
jeffreya...@gmail.com; pan2...@intel.com; rdapp@gmail.com
Subject: [PATCH] RISC-V: Let strided loads/stores demand proper SEW/LMUL 
[PR118154].
Hi,
 
in PR118154 we emit strided stores but the first of those does not
always have the proper VTYPE.  That's because we assume it only
demands an SEW/LMUL ratio rather than the proper SEW and LMUL and
subsequently optimize away the accompanying vsetvl.
 
This patch corrects the ratio attribute for strided loads and stores.
 
Regtested on rv64gcv_zvl512b.
 
Regards
Robin
 
PR target/118154
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Do not return a ratio for strided
loads and stores.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr118154-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr118154-2.c: New test.
---
gcc/config/riscv/vector.md|  5 ++-
.../gcc.target/riscv/rvv/autovec/pr118154-1.c | 23 ++
.../gcc.target/riscv/rvv/autovec/pr118154-2.c | 31 +++
3 files changed, 58 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-2.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e78d1090696..05a0aed8add 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -471,7 +471,7 @@ (define_attr "vlmul" ""
;; It is valid for instruction that require sew/lmul ratio.
(define_attr "ratio" ""
-  (cond [(eq_attr "type" "vimov,vfmov,vldux,vldox,vstux,vstox,\
+  (cond [(eq_attr "type" "vimov,vfmov,vldux,vldox,vstux,vstox,vsts,\
  vialu,vshift,vicmp,vimul,vidiv,vsalu,\
  vext,viwalu,viwmul,vicalu,vnshift,\
  vimuladd,vimerge,vaalu,vsmul,vsshift,\
@@ -494,6 +494,9 @@ (define_attr "ratio" ""
   vlsegdff,vssegtux,vlsegdox,vlsegdux")
  (match_test "TARGET_XTHEADVECTOR"))
   (const_int INVALID_ATTRIBUTE)
+ (and (eq_attr "type" "vlds")
+(match_test "VECTOR_MODE_P (GET_MODE (operands[3]))"))
+ (const_int INVALID_ATTRIBUTE)
(eq_attr "mode" "RVVM8QI,RVVM1BI") (const_int 1)
(eq_attr "mode" "RVVM4QI,RVVMF2BI") (const_int 2)
(eq_attr "mode" "RVVM2QI,RVVMF4BI") (const_int 4)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-1.c
new file mode 100644
index 000..55386568a5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-1.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99 -Wno-pedantic" } */
+
+long a;
+char b;
+char c[22][484];
+int main() {
+  for (int e = 4; e < 33; e++) {
+for (int f = 0; f < 3; f++)
+  for (int g = 0; g < 18; g++) {
+c[f][g * 22] = 1;
+a = ({ a > 1 ? a : 1; });
+  }
+for (int i = 0; i < 33; i++)
+  for (int h = 0; h < 6; h++)
+for (int j = 0; j < 17; j++)
+  b = ({ b > 17 ? b : 17; });
+  }
+  if (c[1][44] != 1)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-2.c
new file mode 100644
index 000..4172f292994
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118154-2.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99 -Wno-pedantic" } */
+
+long a;
+signed char b;
+long long d;
+signed char c[22][22][484];
+void m(long long *l, int n) { *l ^= n + (*l >> 2); }
+int main() {
+  signed char l = 35;
+  for (signed char f = 4; f; f++) {
+for (signed g = 0; g < 022; g += 4)
+  for (signed char h = 0; h < 022; h++) {
+c[9][g][h * 22 + h] = l;
+a = ({ a > 4095 ? a : 4095; });
+  }
+for (int i = 0; i < 22; i += 3)
+  for (signed char j = 1; j; j++)
+for (signed char k = 0; k < 022; k++)
+  b = ({ b > 19 ? b : 19; });
+  }
+  for (long f = 0; f < 22; ++f)
+for (long g = 0; g < 22; ++g)
+  for (long h = 0; h < 22; ++h)
+for (long i = 0; i < 22; ++i)
+  m(&d, c[f][g][h * 2 + i]);
+  if (d != 38)
+__builtin_abort ();
+}
-- 
2.47.1
 
 
 


[PATCH] c++: Fix ICE with invalid defaulted operator <=> [PR118387]

2025-01-10 Thread Jakub Jelinek
Hi!

In the following testcase there are 2 issues, one is that B doesn't
have operator<=> and the other is that A's operator<=> has int return
type, i.e. not the standard comparison category.
Because of the int return type, retcat is cc_last; when we first
try to synthesize it, it is therefore with tentative false and complain
tf_none, we find that B doesn't have operator<=> and because retcat isn't
tc_last, don't try to search for other operators in genericize_spaceship.
And then mark the operator deleted.
When trying to explain the use of the deleted operator, tentative is still
false, but complain is tf_error_or_warning.
do_one_comp will first do:
  tree comp = build_new_op (loc, code, flags, lhs, rhs,
NULL_TREE, NULL_TREE, &overload,
tentative ? tf_none : complain);
and because complain isn't tf_none, it will actually diagnose the bug
already, but then (tentative || complain) is true and we call
genericize_spaceship, which has
  if (tag == cc_last && is_auto (type))
{
...
}

  gcc_checking_assert (tag < cc_last);
and because tag is cc_last and type isn't auto, we just ICE on that
assertion.

The following patch fixes it by calling genericize_spaceship only if
tentative or complain with auto return type in which case
genericize_spaceship can deal with that.

Other possibility would be
--- gcc/cp/method.cc.jj 2025-01-08 23:11:23.375456869 +0100
+++ gcc/cp/method.cc2025-01-09 19:06:05.529933600 +0100
@@ -1097,8 +1097,8 @@ genericize_spaceship (location_t loc, tr
   if (type == error_mark_node)
return error_mark_node;
 }
-
-  gcc_checking_assert (tag < cc_last);
+  else if (tag == cc_last)
+return error_mark_node;
 
   tree r;
   bool scalar = SCALAR_TYPE_P (TREE_TYPE (op0));

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, the PR raises another problem.
If on the same testcase the B b; line is removed, we silently synthesize
operator<=> which will crash at runtime due to returning without a return
statement.  That is because the standard says that in that case
it should return static_cast(std::strong_ordering::equal);
but I can't find any wording anywhere which would say that if that isn't
valid, the function is deleted.
https://eel.is/c++draft/class.compare#class.spaceship-2.2
seems to talk just about cases where there are some members and their
comparison is invalid, in which case it is deleted, but here there are
none and it follows
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
So, we synthesize with tf_none, see the static_cast is invalid, silently
don't add an error_mark_node statement, but as the function isn't deleted,
we just silently emit it.
Should the standard be amended to say that the operator should be deleted
even if it has no elements and the static cast from
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
is invalid?

2025-01-10  Jakub Jelinek  

PR c++/118387
* method.cc (do_one_comp): Don't call genericize_spaceship if
!tentative and rettype is not auto.

* g++.dg/cpp2a/spaceship-synth17.C: New test.

--- gcc/cp/method.cc.jj 2025-01-08 23:11:23.375456869 +0100
+++ gcc/cp/method.cc2025-01-09 18:56:03.246302240 +0100
@@ -1399,7 +1399,8 @@ do_one_comp (location_t loc, const comp_
 
   if (comp == error_mark_node)
 {
-  if (overload == NULL_TREE && (tentative || complain))
+  if (overload == NULL_TREE
+ && (tentative || (complain && is_auto (rettype
{
  /* No viable <=>, try using op< and op==.  */
  tree lteq = genericize_spaceship (loc, rettype, lhs, rhs);
--- gcc/testsuite/g++.dg/cpp2a/spaceship-synth17.C.jj   2025-01-09 
19:00:39.416464901 +0100
+++ gcc/testsuite/g++.dg/cpp2a/spaceship-synth17.C  2025-01-09 
19:03:22.803194662 +0100
@@ -0,0 +1,19 @@
+// PR c++/118387
+// { dg-do compile { target c++20 } }
+
+#include 
+
+struct B {};
+
+struct A
+{
+  B b; // { dg-error "no match for 'operator<=>' in 
'\[^\n\r]*' \\\(operand types are 'B' and 'B'\\\)" }
+  int operator<=> (const A &) const = default;
+};
+
+int
+main ()
+{
+  A a;
+  return a <=> a;  // { dg-error "use of deleted function 'constexpr int 
A::operator<=>\\\(const A&\\\) const'" }
+}

Jakub



Re: [PATCH v2] arm: [MVE intrinsics] Fix tuples field name (PR 118332)

2025-01-10 Thread Richard Sandiford
"Richard Earnshaw (lists)"  writes:
> On 09/01/2025 14:50, Christophe Lyon wrote:
>> The previous fix only worked for C, for C++ we need to add more
>> information to the underlying type so that
>> finish_class_member_access_expr accepts it.
>> 
>> We use the same logic as in aarch64's register_tuple_type for AdvSIMD
>> tuples.
>> 
>> This patch makes gcc.target/arm/mve/intrinsics/pr118332.c pass in C++
>> mode.
>> 
>> gcc/ChangeLog:
>> 
>>  PR target/118332
>>  * config/arm/arm-mve-builtins.cc (wrap_type_in_struct): Delete.
>>  (register_type_decl): Delete.
>>  (register_builtin_tuple_types): Use
>>  lang_hooks.types.simulate_record_decl.
>
> Much nicer.
>
> OK, but please give Richard S 24 hours to comment.

Yeah, LGTM too.

Richard


Re: [PATCH]AArch64: correct Cortex-X4 MIDR

2025-01-10 Thread Kyrylo Tkachov


> On 10 Jan 2025, at 00:07, Tamar Christina  wrote:
> 
> Hi All,
> 
> The Parts Num field for the MIDR for Cortex-X4 is wrong.  It's currently the
> parts number for a Cortex-A720 (which does have the right number).
> 
> The correct number can be found in the Cortex-X4 Technical Reference Manual 
> [1]
> on page 382 in Issue Number 5.
> 
> [1] https://developer.arm.com/documentation/102484/latest/
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? and backport to GCC-14?
> 

Ok. I’ve checked that the TRM indeed says 0xd82.
Thanks,
Kyrill


> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Fix cortex-x4 parts
> num.
> 
> ---
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index 
> caf61437d1805254b7453e74ea27d2ca8f55d32b..5ac81332b67c9612acf9dde144aee5b0db8d9f7a
>  100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -193,7 +193,7 @@ AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  
> (SVE2_BITPERM, MEMTAG, I8M
> 
> AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
> I8MM, BF16), neoversev2, 0x41, 0xd4e, -1)
> 
> -AARCH64_CORE("cortex-x4",  cortexx4, cortexa57, V9_2A,  (SVE2_BITPERM, 
> MEMTAG, PROFILE), neoversev3, 0x41, 0xd81, -1)
> +AARCH64_CORE("cortex-x4",  cortexx4, cortexa57, V9_2A,  (SVE2_BITPERM, 
> MEMTAG, PROFILE), neoversev3, 0x41, 0xd82, -1)
> AARCH64_CORE("cortex-x925", cortexx925, cortexa57, V9_2A,  (SVE2_BITPERM, 
> MEMTAG, PROFILE), cortexx925, 0x41, 0xd85, -1)
> 
> AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, 
> SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
> 
> 
> 
> 
> -- 
> 
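For readers unfamiliar with the field being corrected: MIDR_EL1 keeps the
implementer code in bits [31:24] and the primary part number in bits [15:4].
The following is only a sketch of decoding those two fields from a raw
register value.  The helper names and the sample value are made up for
illustration and are not taken from GCC or from the patch; only the 0x41
implementer code and the 0xd81/0xd82 part numbers come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Implementer code: MIDR_EL1 bits [31:24] (0x41 in the cores listed above).  */
unsigned midr_implementer (uint32_t midr)
{
  return (midr >> 24) & 0xffu;
}

/* Primary part number: MIDR_EL1 bits [15:4].  */
unsigned midr_partnum (uint32_t midr)
{
  return (midr >> 4) & 0xfffu;
}

/* Hypothetical sample value: implementer 0x41, variant 0, architecture 0xf,
   part number 0xd82, revision 1.  */
#define SAMPLE_MIDR 0x410fd821u

/* Small self-check of the two helpers against the sample value.  */
void midr_demo (void)
{
  assert (midr_implementer (SAMPLE_MIDR) == 0x41);
  assert (midr_partnum (SAMPLE_MIDR) == 0xd82);
}
```

With the old table entry, a lookup keyed on the part number of such a value
would have landed on 0xd81 instead, which per the message above is the
Cortex-A720 number.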



[PATCH] [ifcombine] drop other misuses of uniform_integer_cst_p

2025-01-10 Thread Alexandre Oliva


As Jakub pointed out in PR118206, the use of uniform_integer_cst_p in
ifcombine makes no sense, we're not dealing with vectors.  Indeed,
I've been misunderstanding and misusing it since I cut&pasted it from
some preexisting match predicate in an earlier version of the ifcombine
field-merge patch.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Drop misuses of
uniform_integer_cst_p.
(fold_truth_andor_for_ifcombine): Likewise.
---
 gcc/gimple-fold.cc |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 20b5024d861db..a3987c4590ae6 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -7577,7 +7577,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
*pbitsize,
   /* Recognize and save a masking operation.  Combine it with an
  incoming mask.  */
   if (pand_mask && gimple_binop_def_p (BIT_AND_EXPR, exp, res_ops)
-  && uniform_integer_cst_p (res_ops[1]))
+  && TREE_CODE (res_ops[1]) == INTEGER_CST)
 {
   loc[1] = gimple_location (SSA_NAME_DEF_STMT (exp));
   exp = res_ops[0];
@@ -7632,7 +7632,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
*pbitsize,
 
   /* Take note of shifts.  */
   if (gimple_binop_def_p (RSHIFT_EXPR, exp, res_ops)
-  && uniform_integer_cst_p (res_ops[1]))
+  && TREE_CODE (res_ops[1]) == INTEGER_CST)
 {
   loc[2] = gimple_location (SSA_NAME_DEF_STMT (exp));
   exp = res_ops[0];
@@ -8092,7 +8092,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   else if ((lcode == LT_EXPR || lcode == GE_EXPR)
   && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
   && TYPE_UNSIGNED (TREE_TYPE (ll_arg))
-  && uniform_integer_cst_p (lr_arg)
+  && TREE_CODE (lr_arg) == INTEGER_CST
   && wi::popcount (wi::to_wide (lr_arg)) == 1)
 {
   ll_and_mask = ~(wi::to_wide (lr_arg) - 1);
@@ -8104,7 +8104,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   else if ((lcode == LE_EXPR || lcode == GT_EXPR)
   && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
   && TYPE_UNSIGNED (TREE_TYPE (ll_arg))
-  && uniform_integer_cst_p (lr_arg)
+  && TREE_CODE (lr_arg) == INTEGER_CST
   && wi::popcount (wi::to_wide (lr_arg) + 1) == 1)
 {
   ll_and_mask = ~wi::to_wide (lr_arg);
@@ -8123,7 +8123,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   else if ((rcode == LT_EXPR || rcode == GE_EXPR)
   && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
   && TYPE_UNSIGNED (TREE_TYPE (rl_arg))
-  && uniform_integer_cst_p (rr_arg)
+  && TREE_CODE (rr_arg) == INTEGER_CST
   && wi::popcount (wi::to_wide (rr_arg)) == 1)
 {
   rl_and_mask = ~(wi::to_wide (rr_arg) - 1);
@@ -8133,7 +8133,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   else if ((rcode == LE_EXPR || rcode == GT_EXPR)
   && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
   && TYPE_UNSIGNED (TREE_TYPE (rl_arg))
-  && uniform_integer_cst_p (rr_arg)
+  && TREE_CODE (rr_arg) == INTEGER_CST
   && wi::popcount (wi::to_wide (rr_arg) + 1) == 1)
 {
   rl_and_mask = ~wi::to_wide (rr_arg);
@@ -8392,7 +8392,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner));
   poly_uint64 ll_end_region = 0;
   if (TYPE_SIZE (TREE_TYPE (ll_inner))
-  && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner))))
+  && tree_fits_poly_uint64_p (TYPE_SIZE (TREE_TYPE (ll_inner))))
 ll_end_region = tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner)));
   if (get_best_mode (end_bit - first_bit, first_bit, 0, ll_end_region,
 ll_align, BITS_PER_WORD, volatilep, &lnmode))
@@ -8585,7 +8585,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   HOST_WIDE_INT lr_align = TYPE_ALIGN (TREE_TYPE (lr_inner));
   poly_uint64 lr_end_region = 0;
   if (TYPE_SIZE (TREE_TYPE (lr_inner))
- && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (lr_inner))))
+ && tree_fits_poly_uint64_p (TYPE_SIZE (TREE_TYPE (lr_inner))))
lr_end_region = tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (lr_inner)));
   if (!get_best_mode (end_bit - first_bit, first_bit, 0, lr_end_region,
  lr_align, BITS_PER_WORD, volatilep, &rnmode))


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive
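Judging only from the mask assignments visible in the hunks above, the
conditions guarded by these INTEGER_CST checks rewrite an unsigned range test
against a constant into a bit-mask test: x < C with popcount (C) == 1 uses the
mask ~(C - 1), and x <= C with popcount (C + 1) == 1 uses the mask ~C.  The
sketch below checks that equivalence on plain 32-bit integers.  It is not GCC
code; the function names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x < c, for unsigned x and c a power of two: no bit at or above c's single
   set bit may be set in x.  */
int lt_pow2_via_mask (uint32_t x, uint32_t c)
{
  return (x & ~(c - 1u)) == 0;
}

/* x <= c, for unsigned x when c + 1 is a power of two (c is a run of low
   one bits): no bit outside c may be set in x.  */
int le_lowmask_via_mask (uint32_t x, uint32_t c)
{
  return (x & ~c) == 0;
}

/* Compare both mask tests against the plain comparisons over a small range.
   Returns 1 if they always agree, 0 otherwise.  */
int masks_agree (void)
{
  static const uint32_t pow2[] = { 1u, 2u, 8u, 256u };
  static const uint32_t low[] = { 0u, 1u, 7u, 255u };
  for (uint32_t x = 0; x < 1024u; ++x)
    for (unsigned i = 0; i < 4; ++i)
      {
	if (lt_pow2_via_mask (x, pow2[i]) != (x < pow2[i]))
	  return 0;
	if (le_lowmask_via_mask (x, low[i]) != (x <= low[i]))
	  return 0;
      }
  return 1;
}
```

The GE/GT forms in the patch share the same masks; presumably only the sense
of the final comparison against zero differs.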


Re: [PATCH] [ifcombine] drop other misuses of uniform_integer_cst_p

2025-01-10 Thread Richard Biener
On Fri, 10 Jan 2025, Alexandre Oliva wrote:

> 
> As Jakub pointed out in PR118206, the use of uniform_integer_cst_p in
> ifcombine makes no sense, we're not dealing with vectors.  Indeed,
> I've been misunderstanding and misusing it since I cut&pasted it from
> some preexisting match predicate in an earlier version of the ifcombine
> field-merge patch.
> 
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

Richard.

> 
> for  gcc/ChangeLog
> 
>   * gimple-fold.cc (decode_field_reference): Drop misuses of
>   uniform_integer_cst_p.
>   (fold_truth_andor_for_ifcombine): Likewise.
> ---
>  gcc/gimple-fold.cc |   16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 20b5024d861db..a3987c4590ae6 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -7577,7 +7577,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
> *pbitsize,
>/* Recognize and save a masking operation.  Combine it with an
>   incoming mask.  */
>if (pand_mask && gimple_binop_def_p (BIT_AND_EXPR, exp, res_ops)
> -  && uniform_integer_cst_p (res_ops[1]))
> +  && TREE_CODE (res_ops[1]) == INTEGER_CST)
>  {
>loc[1] = gimple_location (SSA_NAME_DEF_STMT (exp));
>exp = res_ops[0];
> @@ -7632,7 +7632,7 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
> *pbitsize,
>  
>/* Take note of shifts.  */
>if (gimple_binop_def_p (RSHIFT_EXPR, exp, res_ops)
> -  && uniform_integer_cst_p (res_ops[1]))
> +  && TREE_CODE (res_ops[1]) == INTEGER_CST)
>  {
>loc[2] = gimple_location (SSA_NAME_DEF_STMT (exp));
>exp = res_ops[0];
> @@ -8092,7 +8092,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
> tree truth_type,
>else if ((lcode == LT_EXPR || lcode == GE_EXPR)
>  && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
>  && TYPE_UNSIGNED (TREE_TYPE (ll_arg))
> -&& uniform_integer_cst_p (lr_arg)
> +&& TREE_CODE (lr_arg) == INTEGER_CST
>  && wi::popcount (wi::to_wide (lr_arg)) == 1)
>  {
>ll_and_mask = ~(wi::to_wide (lr_arg) - 1);
> @@ -8104,7 +8104,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
> tree truth_type,
>else if ((lcode == LE_EXPR || lcode == GT_EXPR)
>  && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
>  && TYPE_UNSIGNED (TREE_TYPE (ll_arg))
> -&& uniform_integer_cst_p (lr_arg)
> +&& TREE_CODE (lr_arg) == INTEGER_CST
>  && wi::popcount (wi::to_wide (lr_arg) + 1) == 1)
>  {
>ll_and_mask = ~wi::to_wide (lr_arg);
> @@ -8123,7 +8123,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
> tree truth_type,
>else if ((rcode == LT_EXPR || rcode == GE_EXPR)
>  && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
>  && TYPE_UNSIGNED (TREE_TYPE (rl_arg))
> -&& uniform_integer_cst_p (rr_arg)
> +&& TREE_CODE (rr_arg) == INTEGER_CST
>  && wi::popcount (wi::to_wide (rr_arg)) == 1)
>  {
>rl_and_mask = ~(wi::to_wide (rr_arg) - 1);
> @@ -8133,7 +8133,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
> tree truth_type,
>else if ((rcode == LE_EXPR || rcode == GT_EXPR)
>  && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
>  && TYPE_UNSIGNED (TREE_TYPE (rl_arg))
> -&& uniform_integer_cst_p (rr_arg)
> +&& TREE_CODE (rr_arg) == INTEGER_CST
>  && wi::popcount (wi::to_wide (rr_arg) + 1) == 1)
>  {
>rl_and_mask = ~wi::to_wide (rr_arg);
> @@ -8392,7 +8392,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
> tree truth_type,
>HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner));
>poly_uint64 ll_end_region = 0;
>if (TYPE_SIZE (TREE_TYPE (ll_inner))
> -  && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner))))
> +  && tree_fits_poly_uint64_p (TYPE_SIZE (TREE_TYPE (ll_inner))))
>  ll_end_region = tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner)));
>if (get_best_mode (end_bit - first_bit, first_bit, 0, ll_end_region,
>ll_align, BITS_PER_WORD, volatilep, &lnmode))
> @@ -8585,7 +8585,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
> tree truth_type,
>HOST_WIDE_INT lr_align = TYPE_ALIGN (TREE_TYPE (lr_inner));
>poly_uint64 lr_end_region = 0;
>if (TYPE_SIZE (TREE_TYPE (lr_inner))
> -   && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (lr_inner))))
> +   && tree_fits_poly_uint64_p (TYPE_SIZE (TREE_TYPE (lr_inner))))
>   lr_end_region = tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (lr_inner)));
>if (!get_best_mode (end_bit - first_bit, first_bit, 0, lr_end_region,
> lr_align, BITS_PER_WORD, volatilep, &rnmode))
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] Add warning for non-spec compliant FMV in Aarch64

2025-01-10 Thread Richard Sandiford
 writes:
> This patch adds a warning when FMV is used for Aarch64.
>
> The reasoning for this is the ACLE [1] spec for FMV has diverged
> significantly from the current implementation and we want to prevent
> potential future compatibility issues.
>
> There is a patch for an ACLE compliant version of target_version and
> target_clone in progress but it won't make gcc-15.
>
> This has been bootstrap and regression tested for Aarch64.
> Is this okay for master and backport to gcc-14?
>
> [1] 
> https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc
>   (aarch64_mangle_decl_assembler_name): Add experimental warning.
>   * config/aarch64/aarch64.opt: Add command line option to disable
>   warning.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/aarch64/mv-1.C: Add CLI flag
>   * g++.target/aarch64/mv-symbols1.C: Add CLI flag
>   * g++.target/aarch64/mv-symbols2.C: Add CLI flag
>   * g++.target/aarch64/mv-symbols3.C: Add CLI flag
>   * g++.target/aarch64/mv-symbols4.C: Add CLI flag
>   * g++.target/aarch64/mv-symbols5.C: Add CLI flag
>   * g++.target/aarch64/mvc-symbols1.C: Add CLI flag
>   * g++.target/aarch64/mvc-symbols2.C: Add CLI flag
>   * g++.target/aarch64/mvc-symbols3.C: Add CLI flag
>   * g++.target/aarch64/mvc-symbols4.C: Add CLI flag
>   * g++.target/aarch64/mv-warning1.C: New test.
> ---
>  gcc/config/aarch64/aarch64.cc   |  4 
>  gcc/config/aarch64/aarch64.opt  |  4 
>  gcc/doc/invoke.texi | 11 ++-
>  gcc/testsuite/g++.target/aarch64/mv-1.C |  1 +
>  gcc/testsuite/g++.target/aarch64/mv-symbols1.C  |  1 +
>  gcc/testsuite/g++.target/aarch64/mv-symbols2.C  |  1 +
>  gcc/testsuite/g++.target/aarch64/mv-symbols3.C  |  1 +
>  gcc/testsuite/g++.target/aarch64/mv-symbols4.C  |  1 +
>  gcc/testsuite/g++.target/aarch64/mv-symbols5.C  |  1 +
>  gcc/testsuite/g++.target/aarch64/mv-warning1.C  |  9 +
>  gcc/testsuite/g++.target/aarch64/mvc-symbols1.C |  1 +
>  gcc/testsuite/g++.target/aarch64/mvc-symbols2.C |  1 +
>  gcc/testsuite/g++.target/aarch64/mvc-symbols3.C |  1 +
>  gcc/testsuite/g++.target/aarch64/mvc-symbols4.C |  1 +
>  14 files changed, 37 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.target/aarch64/mv-warning1.C
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 91de13159cb..7d64e99b76b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -20347,6 +20347,10 @@ aarch64_mangle_decl_assembler_name (tree decl, tree 
> id)
>if (TREE_CODE (decl) == FUNCTION_DECL
>&& DECL_FUNCTION_VERSIONED (decl))
>  {
> +  warning_at (DECL_SOURCE_LOCATION(decl),  OPT_Wexperimental_fmv_target,
> +   "Function Multi Versioning support is experimental, and the "
> +   "behavior is likely to change");
> +
>aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version 
> (decl);

Did you consider doing this in aarch64_option_valid_version_attribute_p
instead?  That hook is called directly by the frontend and is something
that already produces diagnostics for invalid versions.

OK with that change from my POV if it works, and if you agree.
Please give others until Monday to comment though.

Thanks,
Richard

>  
>std::string name = IDENTIFIER_POINTER (id);
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 36bc719b822..2a8dd8ea66c 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -431,3 +431,7 @@ handling.  One means we try to form pairs involving one 
> or more existing
>  individual writeback accesses where possible.  A value of two means we
>  also try to opportunistically form writeback opportunities by folding in
>  trailing destructive updates of the base register used by a pair.
> +
> +Wexperimental-fmv-target
> +Target Var(warn_experimental_fmv) Warning Init(1)
> +Warn about usage of experimental Function Multi Versioning.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 51dc871e6bc..bdf9ee1bc0c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -822,7 +822,8 @@ Objective-C and Objective-C++ Dialects}.
>  -moverride=@var{string}  -mverbose-cost-dump
>  -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg}
>  -mstack-protector-guard-offset=@var{offset} -mtrack-speculation
> --moutline-atomics -mearly-ldp-fusion -mlate-ldp-fusion}
> +-moutline-atomics -mearly-ldp-fusion -mlate-ldp-fusion
> +-Wexperimental-fmv-target}
>  
>  @emph{Adapteva Epiphany Options}
>  @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs
> @@ -22087,6 +22088,14 @@ which specify use of that register as a fixed 
> register,
>  and @samp{none}, which means that no register is used for this
>  purpose.  The default is @option{-m1reg

Re: [PATCH v2] testsuite: arm: Use -std=c17 and effective-target arm_arch_v5te_thumb

2025-01-10 Thread Richard Earnshaw (lists)
On 09/01/2025 21:42, Torbjörn SVENSSON wrote:
> Changes since v1:
> 
> - Added dg-add-options arm_arch_v5te_thumb
> - Added -std=c17 to dg-options.
> - Removed -march=armv5te -mfloat-abi=soft -mthumb from dg-options
> - Updated the commit message to reflect the new changes
> 
> Note: This changes from armv5te to armv5te+fp and from soft to softfp.
> Does this matter? If so, I can override it in a new
> dg-additional-options line after the dg-add-options.

No, it's fine.  Firstly it's pure integer code and we're on armv5te which has 
no vector instructions; but secondly, we're generating Thumb (1) instructions, 
so there's no FP extension anyway.

> 
> Ok for trunk?

OK.

R.

> 
> --
> 
> With -std=c23, the following errors are now emitted as the function
> prototype and implementation do not match:
> 
> .../pr59858.c: In function 're_search_internal':
> .../pr59858.c:95:17: error: too many arguments to function 'check_matching'
> .../pr59858.c:75:12: note: declared here
> .../pr59858.c: At top level:
> .../pr59858.c:100:1: error: conflicting types for 'check_matching'; have 
> 'int(re_match_context_t *, int *)'
> .../pr59858.c:75:12: note: previous declaration of 'check_matching' with type 
> 'int(void)'
> .../pr59858.c: In function 'check_matching':
> .../pr59858.c:106:14: error: too many arguments to function 'transit_state'
> .../pr59858.c:77:23: note: declared here
> .../pr59858.c: At top level:
> .../pr59858.c:111:1: error: conflicting types for 'transit_state'; have 
> 're_dfastate_t *(re_match_context_t *, re_dfastate_t *)'
> .../pr59858.c:77:23: note: previous declaration of 'transit_state' with type 
> 're_dfastate_t *(void)'
> .../pr59858.c: In function 'transit_state':
> .../pr59858.c:116:7: error: too many arguments to function 'build_trtable'
> .../pr59858.c:79:12: note: declared here
> .../pr59858.c: At top level:
> .../pr59858.c:121:1: error: conflicting types for 'build_trtable'; have 
> 'int(const re_dfa_t *, re_dfastate_t *)'
> .../pr59858.c:79:12: note: previous declaration of 'build_trtable' with type 
> 'int(void)'
> 
> Adding -std=c17 removes these errors.
> 
> Also, updated test case to use -mcpu=unset/-march=unset feature
> introduced in r15-3606-g7d6c6a0d15c.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/pr59858.c: Use -std=c17 and effective-target
>   arm_arch_v5te_thumb.
> 
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/gcc.target/arm/pr59858.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c 
> b/gcc/testsuite/gcc.target/arm/pr59858.c
> index 9336edfce27..8fc63b57af4 100644
> --- a/gcc/testsuite/gcc.target/arm/pr59858.c
> +++ b/gcc/testsuite/gcc.target/arm/pr59858.c
> @@ -1,8 +1,8 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts 
> -fPIC -w -fpermissive" } */
> +/* { dg-options "-std=c17 -fno-builtin -fno-stack-protector -Os 
> -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w -fpermissive" } */
>  /* { dg-require-effective-target fpic } */
> -/* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft 
> -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
>  /* { dg-require-effective-target arm_arch_v5te_thumb_ok } */
> +/* { dg-add-options arm_arch_v5te_thumb } */
>  
>  typedef enum {
>   REG_ENOSYS = -1,
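The errors quoted above follow from a change in how an empty parameter list is
read: before C23, a declaration such as `int check_matching ();` left the
parameters unspecified, while C23 treats it as `int check_matching (void);`.
The patch sidesteps this by compiling the test with -std=c17.  As a rough
sketch of the alternative, a full prototype compiles in either mode.  Only
the name check_matching comes from the diagnostics; the simplified parameter
types and the other names below are assumptions made for the example, not the
contents of pr59858.c.

```c
#include <assert.h>

/* Full prototype.  The old-style `int check_matching ();` would be accepted
   before C23 but means `int check_matching (void);` in C23, which conflicts
   with the two-argument definition further down.  */
int check_matching (const int *ctx, int *match_last);

/* Hypothetical caller, standing in for re_search_internal in the test.  */
int search_internal (const int *ctx)
{
  int match_last = -1;
  return check_matching (ctx, &match_last) + match_last;
}

/* Definition matching the prototype above.  */
int check_matching (const int *ctx, int *match_last)
{
  *match_last = *ctx;
  return 0;
}

/* Small self-check: the call through the prototype stores *ctx.  */
void prototype_demo (void)
{
  int v = 5;
  assert (search_internal (&v) == 5);
}
```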



Re: [PATCH v2] testsuite: arm: Use -std=c17 and effective-target arm_arch_v5te_thumb

2025-01-10 Thread Torbjorn SVENSSON




On 2025-01-10 11:19, Richard Earnshaw (lists) wrote:

On 09/01/2025 21:42, Torbjörn SVENSSON wrote:

Changes since v1:

- Added dg-add-options arm_arch_v5te_thumb
- Added -std=c17 to dg-options.
- Removed -march=armv5te -mfloat-abi=soft -mthumb from dg-options
- Updated the commit message to reflect the new changes

Note: This changes from armv5te to armv5te+fp and from soft to softfp.
Does this matter? If so, I can override it in a new
dg-additional-options line after the dg-add-options.


No, it's fine.  Firstly it's pure integer code and we're on armv5te which has 
no vector instructions; but secondly, we're generating Thumb (1) instructions, 
so there's no FP extension anyway.



Ok for trunk?


OK.


Pushed as r15-6766-gf447c3c0dff.

Kind regards,
Torbjörn



R.



--

With -std=c23, the following errors are now emitted as the function
prototype and implementation do not match:

.../pr59858.c: In function 're_search_internal':
.../pr59858.c:95:17: error: too many arguments to function 'check_matching'
.../pr59858.c:75:12: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:100:1: error: conflicting types for 'check_matching'; have 
'int(re_match_context_t *, int *)'
.../pr59858.c:75:12: note: previous declaration of 'check_matching' with type 
'int(void)'
.../pr59858.c: In function 'check_matching':
.../pr59858.c:106:14: error: too many arguments to function 'transit_state'
.../pr59858.c:77:23: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:111:1: error: conflicting types for 'transit_state'; have 
're_dfastate_t *(re_match_context_t *, re_dfastate_t *)'
.../pr59858.c:77:23: note: previous declaration of 'transit_state' with type 
're_dfastate_t *(void)'
.../pr59858.c: In function 'transit_state':
.../pr59858.c:116:7: error: too many arguments to function 'build_trtable'
.../pr59858.c:79:12: note: declared here
.../pr59858.c: At top level:
.../pr59858.c:121:1: error: conflicting types for 'build_trtable'; have 
'int(const re_dfa_t *, re_dfastate_t *)'
.../pr59858.c:79:12: note: previous declaration of 'build_trtable' with type 
'int(void)'

Adding -std=c17 removes these errors.

Also, updated test case to use -mcpu=unset/-march=unset feature
introduced in r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr59858.c: Use -std=c17 and effective-target
arm_arch_v5te_thumb.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.target/arm/pr59858.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c 
b/gcc/testsuite/gcc.target/arm/pr59858.c
index 9336edfce27..8fc63b57af4 100644
--- a/gcc/testsuite/gcc.target/arm/pr59858.c
+++ b/gcc/testsuite/gcc.target/arm/pr59858.c
@@ -1,8 +1,8 @@
  /* { dg-do compile } */
-/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
-fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w 
-fpermissive" } */
+/* { dg-options "-std=c17 -fno-builtin -fno-stack-protector -Os 
-fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w -fpermissive" } */
  /* { dg-require-effective-target fpic } */
-/* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft -mfloat-abi=hard" { *-*-* } { 
"-mfloat-abi=hard" } { "" } } */
  /* { dg-require-effective-target arm_arch_v5te_thumb_ok } */
+/* { dg-add-options arm_arch_v5te_thumb } */
  
  typedef enum {

   REG_ENOSYS = -1,






Re: [PATCH] testsuite: arm: Check for short circuit instructions [PR103298]

2025-01-10 Thread Richard Earnshaw (lists)
On 22/12/2024 15:35, Torbjorn SVENSSON wrote:
> 
> 
> On 2024-12-19 12:48, Richard Earnshaw (lists) wrote:
>> On 18/12/2024 16:24, Torbjörn SVENSSON wrote:
>>> Changes since v1:
>>>
>>> - Updated the commit message to reflect the changes (including the subject).
>>> - Replaced the POP/BEQ checks with checks for {cmp,mov,orr,and}{eq,ne}.
>>> - Removed the size check
>>>
>>>
>>> Ok for trunk and releases/gcc-14?
>>> Should I also push this to releases/gcc-13 and releases/gcc-12 as this is a
>>> regression in r12-5301-g04520645038?
>>>
>>> -- 
>>>
>>> Instead of checking that a certain transformation is not used by
>>> counting the number of return instructions and the number of BEQ
>>> instructions, check that none of CMP, MOV, ORR and AND instructions are
>>> suffixed with EQ or NE.
>>> Also removed size check as it's very unstable (depends on optimization
>>> in use).
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR testsuite/103298
>>> * gcc.target/arm/pr43920-2.c: Change to assembler pattern
>>> "(cmp|mov|orr|and)(eq|ne)" for the check. Remove size check.
>>>
>>> Signed-off-by: Torbjörn SVENSSON 
>>
>> OK
> 
> Pushed as r15-6416-g9e1063ca1c8 and r14.2.0-584-ge79105ad8c0.
> Should I also push it to releases/gcc-12 and releases/gcc-13? Or can the 
> bugzilla be closed regardless (regression in gcc12)?

I'm not convinced it's worth the time to validate the patch on those compilers. 
 It's just a testism.

R.
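To make the new check concrete: the pattern "(cmp|mov|orr|and)(eq|ne)" matches
any of the four mnemonics immediately followed by an EQ or NE condition
suffix.  The testsuite applies it through DejaGnu, whose matching is
Tcl-based; the sketch below only approximates that with POSIX extended
regular expressions, and the function name and sample assembler lines are
invented for the illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if LINE contains cmp/mov/orr/and directly followed by eq/ne,
   0 if it does not, and -1 if the pattern fails to compile.  */
int has_conditional_insn (const char *line)
{
  regex_t re;
  if (regcomp (&re, "(cmp|mov|orr|and)(eq|ne)",
	       REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}

/* Small self-check: a conditional move matches, a plain move and a
   compare-and-branch sequence do not.  */
void pattern_demo (void)
{
  assert (has_conditional_insn ("\tmoveq\tr0, #1") == 1);
  assert (has_conditional_insn ("\tmov\tr0, #1") == 0);
  assert (has_conditional_insn ("\tcmp\tr0, #0\n\tbeq\t.L2") == 0);
}
```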



Re: [PATCH] testsuite: arm: Check for short circuit instructions [PR103298]

2025-01-10 Thread Torbjorn SVENSSON




On 2025-01-10 11:27, Richard Earnshaw (lists) wrote:

On 22/12/2024 15:35, Torbjorn SVENSSON wrote:



On 2024-12-19 12:48, Richard Earnshaw (lists) wrote:

On 18/12/2024 16:24, Torbjörn SVENSSON wrote:

Changes since v1:

- Updated the commit message to reflect the changes (including the subject).
- Replaced the POP/BEQ checks with checks for {cmp,mov,orr,and}{eq,ne}.
- Removed the size check


Ok for trunk and releases/gcc-14?
Should I also push this to releases/gcc-13 and releases/gcc-12 as this is a
regression in r12-5301-g04520645038?

--

Instead of checking that a certain transformation is not used by
counting the number of return instructions and the number of BEQ
instructions, check that none of CMP, MOV, ORR and AND instructions are
suffixed with EQ or NE.
Also removed size check as it's very unstable (depends on optimization
in use).

gcc/testsuite/ChangeLog:

 PR testsuite/103298
 * gcc.target/arm/pr43920-2.c: Change to assembler pattern
 "(cmp|mov|orr|and)(eq|ne)" for the check. Remove size check.

Signed-off-by: Torbjörn SVENSSON 


OK


Pushed as r15-6416-g9e1063ca1c8 and r14.2.0-584-ge79105ad8c0.
Should I also push it to releases/gcc-12 and releases/gcc-13? Or can the 
bugzilla be closed regardless (regression in gcc12)?


I'm not convinced it's worth the time to validate the patch on those compilers. 
 It's just a testism.


Ok, I'll resolve the bugzilla without any further backports. Thanks!

Kind regards,
Torbjörn



R.





Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-10 Thread Richard Biener
On Thu, Jan 9, 2025 at 9:39 PM Qing Zhao  wrote:
>
>
>
> > On Jan 9, 2025, at 14:10, Jeff Law  wrote:
> >
> >
> >
> > On 1/9/25 10:48 AM, Qing Zhao wrote:
> >
> >>>
> >>> I think Jeff's patch is not reasonable since it boils down to not diagnose
> >>> -Warray-bounds but instead remove those stmts.
> >> If these stmts are dead-code that are generated by compiler optimization 
> >> (NOT from source code),
> >> removing them before diagnosis is correct. (To avoid false positive 
> >> warnings).
> > But I don't think we generally know if the problematic statements came from 
> > user code or were generated by the compiler.
>
> To help the compiler catch real problems in the source code and avoid false 
> positive warnings introduced by the compiler transformation, we might need to 
> add flags in the IR to distinguish this?

Well, the issue is the problematic statements _are_ in user code, just
-Warray-bounds is too stupid to
look at SCEV for indices and instead relies on weaker value-ranges.

It's a problem we're never going to fully solve.  Some of the
testcases show missed optimizations
which we can work on.  Some show we diagnose IL we later are able to
optimize away, some
simply show that users are not always happy with how we decide on
suppressing a diagnostic.

For the case at hand we should be able to optimize it fully.

But optimizing based on UB is always going to interact with
diagnosing UB, so we have
to be careful.  Our "late" diagnostics are most problematic here and
I'd argue moving those
earlier is the first thing we should try.

Richard.

>
> Qing
> >
> > Jeff
> >
>


[PATCH] c: Fix up expr location for __builtin_stdc_rotate_* [PR118376]

2025-01-10 Thread Jakub Jelinek
Hi!

Seems I forgot to set_c_expr_source_range for the __builtin_stdc_rotate_*
case (the other __builtin_stdc_* cases already have it), which means
the locations in expr are uninitialized, sometimes causing ICEs in linemap
code, at other times just valgrind errors about uninitialized var uses.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-10  Jakub Jelinek  

PR c/118376
* c-parser.cc (c_parser_postfix_expression): Call
set_c_expr_source_range before break in the __builtin_stdc_rotate_*
case.

* gcc.dg/pr118376.c: New test.

--- gcc/c/c-parser.cc.jj	2025-01-06 10:07:33.585493775 +0100
+++ gcc/c/c-parser.cc   2025-01-09 16:12:07.761005082 +0100
@@ -12906,6 +12906,7 @@ c_parser_postfix_expression (c_parser *p
  expr.value = build2_loc (loc, COMPOUND_EXPR,
   TREE_TYPE (expr.value),
   instrument_expr, expr.value);
+   set_c_expr_source_range (&expr, loc, close_paren_loc);
break;
  }
tree barg1 = arg;
--- gcc/testsuite/gcc.dg/pr118376.c.jj  2025-01-09 16:26:19.621072359 +0100
+++ gcc/testsuite/gcc.dg/pr118376.c 2025-01-09 16:26:04.608283459 +0100
@@ -0,0 +1,11 @@
+/* PR c/118376 */
+/* { dg-do compile } */
+/* { dg-options "-Wsign-conversion" } */
+
+unsigned x;
+
+void
+foo ()
+{
+  __builtin_memset (&x, (long long) __builtin_stdc_rotate_right (x, 0), 1);
+} /* { dg-warning "conversion to 'int' from 'long long int' may change the sign of the result" "" { target *-*-* } .-1 } */

Jakub



[COMMITTED 1/5] ada: Reorder syntactic node fields to match the Ada RM grammar

2025-01-10 Thread Marc Poulhiès
From: Piotr Trojanek 

Several AST nodes had their syntactic fields in a different order than
specified by the Ada RM grammar. With the variable-size nodes this no longer
had an impact on the AST memory layout and was making the automatically
generated Nmake routines a bit unintuitive to use.

gcc/ada/ChangeLog:

* exp_ch3.adb (Predef_Spec_Or_Body): Add explicit parameter
associations, because now the Empty_List actual parameter would be
confused as being for the Aspect_Specifications formal parameter.
* gen_il-gen-gen_nodes.adb (Gen_Nodes): Reorder syntactic fields.
* sem_util.adb (Declare_Indirect_Temp): Add explicit parameter
association, because now the parameter will be interpreted as a
subpool handle name.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb  |  5 +-
 gcc/ada/gen_il-gen-gen_nodes.adb | 78 
 gcc/ada/sem_util.adb |  9 ++--
 3 files changed, 48 insertions(+), 44 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 6c69e63b2dd..d95b9178030 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -12399,7 +12399,10 @@ package body Exp_Ch3 is
   --  on the body to add the appropriate stuff.
 
   elsif For_Body then
- return Make_Subprogram_Body (Loc, Spec, Empty_List, Empty);
+ return Make_Subprogram_Body (Loc,
+  Specification  => Spec,
+  Declarations   => Empty_List,
+  Handled_Statement_Sequence => Empty);
 
   --  For the case of an Input attribute predefined for an abstract type,
   --  generate an abstract specification. This will never be called, but we
diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
index c512d85dbb2..ca46bcebdd9 100644
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -377,10 +377,10 @@ begin -- Gen_IL.Gen.Gen_Nodes
 Sm (Is_Qualified_Universal_Literal, Flag)));
 
Cc (N_Quantified_Expression, N_Subexpr,
-   (Sy (Iterator_Specification, Node_Id, Default_Empty),
+   (Sy (All_Present, Flag),
+Sy (Iterator_Specification, Node_Id, Default_Empty),
 Sy (Loop_Parameter_Specification, Node_Id, Default_Empty),
-Sy (Condition, Node_Id, Default_Empty),
-Sy (All_Present, Flag)));
+Sy (Condition, Node_Id, Default_Empty)));
 
Cc (N_Aggregate, N_Subexpr,
(Sy (Expressions, List_Id, Default_No_List),
@@ -395,9 +395,9 @@ begin -- Gen_IL.Gen.Gen_Nodes
 Sm (Has_Self_Reference, Flag)));
 
Cc (N_Allocator, N_Subexpr,
-   (Sy (Expression, Node_Id, Default_Empty),
-Sy (Subpool_Handle_Name, Node_Id, Default_Empty),
+   (Sy (Subpool_Handle_Name, Node_Id, Default_Empty),
 Sy (Null_Exclusion_Present, Flag, Default_False),
+Sy (Expression, Node_Id, Default_Empty),
 Sm (For_Special_Return_Object, Flag),
 Sm (Do_Storage_Check, Flag),
 Sm (Is_Dynamic_Coextension, Flag),
@@ -494,11 +494,11 @@ begin -- Gen_IL.Gen.Gen_Nodes
 Sm (Prev_Ids, Flag)));
 
Cc (N_Entry_Declaration, N_Declaration,
-   (Sy (Defining_Identifier, Node_Id),
+   (Sy (Must_Override, Flag),
+Sy (Must_Not_Override, Flag),
+Sy (Defining_Identifier, Node_Id),
 Sy (Discrete_Subtype_Definition, Node_Id, Default_Empty),
 Sy (Parameter_Specifications, List_Id, Default_No_List),
-Sy (Must_Override, Flag),
-Sy (Must_Not_Override, Flag),
 Sy (Aspect_Specifications, List_Id, Default_No_List),
 Sm (Corresponding_Body, Node_Id)));
 
@@ -513,8 +513,8 @@ begin -- Gen_IL.Gen.Gen_Nodes
 Sy (In_Present, Flag),
 Sy (Out_Present, Flag),
 Sy (Null_Exclusion_Present, Flag, Default_False),
-Sy (Subtype_Mark, Node_Id, Default_Empty),
 Sy (Access_Definition, Node_Id, Default_Empty),
+Sy (Subtype_Mark, Node_Id, Default_Empty),
 Sy (Default_Expression, Node_Id, Default_Empty),
 Sy (Aspect_Specifications, List_Id, Default_No_List),
 Sm (More_Ids, Flag),
@@ -545,17 +545,17 @@ begin -- Gen_IL.Gen.Gen_Nodes
 
Cc (N_Iterator_Specification, N_Declaration,
(Sy (Defining_Identifier, Node_Id),
-Sy (Name, Node_Id, Default_Empty),
-Sy (Reverse_Present, Flag),
+Sy (Subtype_Indication, Node_Id, Default_Empty),
 Sy (Of_Present, Flag),
-Sy (Iterator_Filter, Node_Id, Default_Empty),
-Sy (Subtype_Indication, Node_Id, Default_Empty)));
+Sy (Reverse_Present, Flag),
+Sy (Name, Node_Id, Default_Empty),
+Sy (Iterator_Filter, Node_Id, Default_Empty)));
 
Cc (N_Loop_Parameter_Specification, N_Declaration,
(Sy (Defining_Identifier, Node_Id),
 Sy (Reverse_Present, Flag),
-Sy (Iterator_Filter, Node_Id, Default_Empty),
-Sy (Discrete_Subtype_Definition

[COMMITTED 2/5] ada: Turn Is_Effective_Use_Clause from syntactic to semantic flag

2025-01-10 Thread Marc Poulhiès
From: Piotr Trojanek 

For a USE clause, being effective is a semantic property, not a syntactic one.
AST cleanup; behavior is unaffected.

gcc/ada/ChangeLog:

* gen_il-gen-gen_nodes.adb (Gen_Nodes): Change Is_Effective_Use_Clause
from syntactic to semantic property.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gen_il-gen-gen_nodes.adb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
index ca46bcebdd9..1f5dc6d3803 100644
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -782,7 +782,7 @@ begin -- Gen_IL.Gen.Gen_Nodes
 
Cc (N_Use_Package_Clause, N_Later_Decl_Item,
(Sy (Name, Node_Id, Default_Empty),
-Sy (Is_Effective_Use_Clause, Flag),
+Sm (Is_Effective_Use_Clause, Flag),
 Sm (Entity_Or_Associated_Node, Node_Id), -- just Associated_Node
 Sm (Hidden_By_Use_Clause, Elist_Id),
 Sm (More_Ids, Flag),
@@ -1497,8 +1497,8 @@ begin -- Gen_IL.Gen.Gen_Nodes
 
Cc (N_Use_Type_Clause, Node_Kind,
(Sy (Subtype_Mark, Node_Id, Default_Empty),
-Sy (Is_Effective_Use_Clause, Flag),
 Sy (All_Present, Flag),
+Sm (Is_Effective_Use_Clause, Flag),
 Sm (Hidden_By_Use_Clause, Elist_Id),
 Sm (More_Ids, Flag),
 Sm (Next_Use_Clause, Node_Id),
-- 
2.43.0



[COMMITTED 4/5] ada: Remove empty line.

2025-01-10 Thread Marc Poulhiès
gcc/ada/ChangeLog:

* env.h: Remove last empty line.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/env.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/ada/env.h b/gcc/ada/env.h
index b80b7e9a0fc..58a92b9d7f2 100644
--- a/gcc/ada/env.h
+++ b/gcc/ada/env.h
@@ -33,5 +33,4 @@ extern void __gnat_getenv (char *name, int *len, char 
**value);
 extern void __gnat_setenv (char *name, char *value);
 extern char **__gnat_environ (void);
 extern void __gnat_unsetenv (char *name);
-extern void __gnat_clearenv (void);
-
+extern void __gnat_clearenv (void);
-- 
2.43.0



Re: [PATCH] match: Keep conditional in simplification to constant [PR118140].

2025-01-10 Thread Richard Sandiford
"Robin Dapp"  writes:
> Hi,
>
> in PR118140 we simplify
>
>   _ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11);
>
> to "1":
>
>   Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1
>
> when _46 == 1.  This happens by removing the conditional and applying
> a | 1 = 1.  Normally we re-introduce the conditional and its else value
> if needed but that does not happen here as we're not dealing with a
> vector type.  For correctness's sake, we must not remove the conditional
> even for non-vector types.
>
> This patch re-introduces a COND_EXPR in such cases.  For PR118140 this
> results in a non-vectorized loop.
>
> Bootstrapped and regtested on x86 and aarch64.  Regtested on rv64gcv_zvl512b.
>
> Regards
>  Robin
>
>   PR middle-end/118140
>
> gcc/ChangeLog:
>
>   * gimple-match-exports.cc (maybe_resimplify_conditional_op): Add
>   COND_EXPR when we simplified to a scalar gimple value but still
>   have an else value.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/pr118140.c: New test.
>   * gcc.target/riscv/rvv/autovec/pr118140.c: New test.

OK, thanks.

Richard
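
For readers following along, here is a minimal scalar sketch of why the
conditional has to survive the fold.  It is only a model of the semantics
described in the quoted text: the function names below are hypothetical and
are not part of gimple-match-exports.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Scalar model of .COND_IOR (cond, a, b, else_val): the result is
   a | b when COND is true and ELSE_VAL otherwise.  */
bool
cond_ior (bool cond, bool a, bool b, bool else_val)
{
  return cond ? (bool) (a | b) : else_val;
}

/* Unsound fold for b == 1: a | 1 is 1, and the condition is dropped.  */
bool
fold_dropping_cond (void)
{
  return true;
}

/* Sound fold for b == 1: a | 1 is 1, but the condition and the else
   value are kept, as a scalar COND_EXPR would keep them.  */
bool
fold_keeping_cond (bool cond, bool else_val)
{
  return cond ? true : else_val;
}

/* Compare both folds against the model for every input with b == 1.  */
void
check_cond_ior (void)
{
  for (int cond = 0; cond <= 1; cond++)
    for (int a = 0; a <= 1; a++)
      for (int e = 0; e <= 1; e++)
        assert (fold_keeping_cond (cond, e) == cond_ior (cond, a, true, e));

  /* With a false condition and a false else value the model yields 0,
     while the fold that dropped the condition yields 1.  */
  assert (cond_ior (false, false, true, false) == false);
  assert (fold_dropping_cond () == true);
}
```

Calling check_cond_ior () passes: the condition-preserving fold agrees with
the model on all inputs, and the last two assertions show the one case where
dropping the condition changes the result.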

> ---
>  gcc/gimple-match-exports.cc   | 26 ++---
>  gcc/testsuite/gcc.dg/vect/pr118140.c  | 27 +
>  .../gcc.target/riscv/rvv/autovec/pr118140.c   | 29 +++
>  3 files changed, 72 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr118140.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118140.c
>
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index e06a8aaa171..ccba046a1d4 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -337,23 +337,29 @@ maybe_resimplify_conditional_op (gimple_seq *seq, 
> gimple_match_op *res_op,
>  }
>  
>/* If the "then" value is a gimple value and the "else" value matters,
> - create a VEC_COND_EXPR between them, then see if it can be further
> + create a (VEC_)COND_EXPR between them, then see if it can be further
>   simplified.  */
>gimple_match_op new_op;
>if (res_op->cond.else_value
> -  && VECTOR_TYPE_P (res_op->type)
>&& gimple_simplified_result_is_gimple_val (res_op))
>  {
> -  tree len = res_op->cond.len;
> -  if (!len)
> - new_op.set_op (VEC_COND_EXPR, res_op->type,
> -res_op->cond.cond, res_op->ops[0],
> -res_op->cond.else_value);
> +  if (VECTOR_TYPE_P (res_op->type))
> + {
> +   tree len = res_op->cond.len;
> +   if (!len)
> + new_op.set_op (VEC_COND_EXPR, res_op->type,
> +res_op->cond.cond, res_op->ops[0],
> +res_op->cond.else_value);
> +   else
> + new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> +res_op->cond.cond, res_op->ops[0],
> +res_op->cond.else_value,
> +res_op->cond.len, res_op->cond.bias);
> + }
>else
> - new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> + new_op.set_op (COND_EXPR, res_op->type,
>  res_op->cond.cond, res_op->ops[0],
> -res_op->cond.else_value,
> -res_op->cond.len, res_op->cond.bias);
> +res_op->cond.else_value);
>*res_op = new_op;
>return gimple_resimplify3 (seq, res_op, valueize);
>  }
> diff --git a/gcc/testsuite/gcc.dg/vect/pr118140.c 
> b/gcc/testsuite/gcc.dg/vect/pr118140.c
> new file mode 100644
> index 000..2dab98bfc91
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr118140.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run { target { aarch64*-*-* || riscv*-*-* } } } */
> +/* { dg-additional-options "-std=gnu99" } */
> +
> +long long a;
> +_Bool d;
> +char e;
> +_Bool f[17];
> +_Bool f_3;
> +
> +int main() {
> +  for (char g = 3; g < 16; g++) {
> +  d |= ({
> +int h = f[g - 1] ? 2 : 0;
> +_Bool t;
> +if (f[g - 1])
> +  t = f_3;
> +else
> +  t = 0;
> +int i = t;
> +h > i;
> +  });
> +e += f[g + 1];
> +  }
> +
> +  if (d != 0)
> +__builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118140.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118140.c
> new file mode 100644
> index 000..31134de7b3a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118140.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target riscv_v_ok } */
> +/* { dg-add-options riscv_v } */
> +/* { dg-additional-options "-std=gnu99 -Wno-pedantic" } */
> +
> +long long a;
> +_Bool d;
> +char e;
> +_Bool f[17];
> +_Bool f_3;
> +
> +int main() {
> +  for (char g = 3; g < 16; g++) {
> +  d |= ({
> +int h = f[g - 1] ? 2 : 0;
> +_Bool t;
> +if (f[g - 1])
> +  t = f_3;
> +else
> +  t = 0;
> +int i = t;
> +h > i;

[COMMITTED 3/5] ada: Set syntactic node properties immediately when creating the nodes

2025-01-10 Thread Marc Poulhiès
From: Piotr Trojanek 

When creating a node, we can directly set its syntactic properties.
Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* contracts.adb (Build_Call_Helper_Decl): Tune whitespace.
* exp_attr.adb (Analyze_Attribute): Set Of_Present while
creating the node; reorder setting Subtype_Indication to match the
syntax order.
* exp_ch3.adb (Build_Equivalent_Aggregate): Likewise for Box_Present
and Expression properties.
* sem_ch12.adb (Analyze_Formal_Derived_Type): Set type properties
when creating the nodes.
* sem_ch3.adb (Check_Anonymous_Access_Component): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/contracts.adb |  4 ++--
 gcc/ada/exp_attr.adb  |  8 
 gcc/ada/exp_ch3.adb   |  5 ++---
 gcc/ada/sem_ch12.adb  | 15 +--
 gcc/ada/sem_ch3.adb   | 13 ++---
 5 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/gcc/ada/contracts.adb b/gcc/ada/contracts.adb
index 1c9161b8a37..8b94a67639f 100644
--- a/gcc/ada/contracts.adb
+++ b/gcc/ada/contracts.adb
@@ -4066,8 +4066,8 @@ package body Contracts is
 
  begin
 Spec := Build_Call_Helper_Spec (Helper_Id);
-Set_Must_Override  (Spec, False);
-Set_Must_Not_Override  (Spec, False);
+Set_Must_Override (Spec, False);
+Set_Must_Not_Override (Spec, False);
 Set_Is_Inlined (Helper_Id);
 Set_Is_Public  (Helper_Id);
 
diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index cc42d647060..b896228a70e 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -6422,10 +6422,10 @@ package body Exp_Attr is
begin
   Iter :=
 Make_Iterator_Specification (Loc,
-Defining_Identifier => Elem,
-Name => Relocate_Node (Prefix (N)),
-Subtype_Indication => Empty);
-  Set_Of_Present (Iter);
+  Defining_Identifier => Elem,
+  Subtype_Indication  => Empty,
+  Of_Present  => True,
+  Name=> Relocate_Node (Prefix (N)));
 
   New_Loop := Make_Loop_Statement (Loc,
 Iteration_Scheme =>
diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index d95b9178030..0dfd8102df1 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -1349,9 +1349,8 @@ package body Exp_Ch3 is
 
  Append_To (Component_Associations (Aggr),
Make_Component_Association (Loc,
- Choices=> New_List (Make_Others_Choice (Loc)),
- Expression => Empty));
- Set_Box_Present (Last (Component_Associations (Aggr)));
+ Choices => New_List (Make_Others_Choice (Loc)),
+ Box_Present => True));
 
  if Typ /= Full_Typ then
 Analyze_And_Resolve (Aggr, Full_View (Base_Type (Full_Typ)));
diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 088a9ccfb58..dad8c73729e 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -3097,13 +3097,11 @@ package body Sem_Ch12 is
  Defining_Identifier   => T,
  Discriminant_Specifications   => Discriminant_Specifications (N),
  Unknown_Discriminants_Present => Unk_Disc,
+ Abstract_Present  => Abstract_Present (Def),
+ Limited_Present   => Limited_Present (Def),
  Subtype_Indication=> Subtype_Mark (Def),
+ Synchronized_Present  => Synchronized_Present (Def),
  Interface_List=> Interface_List (Def));
-
- Set_Abstract_Present (New_N, Abstract_Present (Def));
- Set_Limited_Present  (New_N, Limited_Present  (Def));
- Set_Synchronized_Present (New_N, Synchronized_Present (Def));
-
   else
  New_N :=
Make_Full_Type_Declaration (Loc,
@@ -3112,12 +3110,9 @@ package body Sem_Ch12 is
Discriminant_Specifications (Parent (T)),
  Type_Definition =>
Make_Derived_Type_Definition (Loc,
+ Abstract_Present   => Abstract_Present (Def),
+ Limited_Present=> Limited_Present (Def),
  Subtype_Indication => Subtype_Mark (Def)));
-
- Set_Abstract_Present
-   (Type_Definition (New_N), Abstract_Present (Def));
- Set_Limited_Present
-   (Type_Definition (New_N), Limited_Present  (Def));
   end if;
 
   Rewrite (N, New_N);
diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index cf6ab68d4e6..64e3f85c605 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -1409,10 +1409,10 @@ package body Sem_Ch3 is
begin
   Decl :=
 Make_Subtype_Declaration (Loc,
-  

[COMMITTED 5/5] ada: Incorrect accessibility level for library level subprograms

2025-01-10 Thread Marc Poulhiès
From: squirek 

The patch fixes an issue in the compiler whereby accessibility level
calculations for objects declared within library-level subprograms
were done incorrectly - potentially allowing runtime accessibility
checks to spuriously pass.

gcc/ada/ChangeLog:

* accessibility.adb:
(Innermost_Master_Scope_Depth): Add special case for expressions
within library level subprograms.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/accessibility.adb | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/ada/accessibility.adb b/gcc/ada/accessibility.adb
index b808e88b128..8c85173aa34 100644
--- a/gcc/ada/accessibility.adb
+++ b/gcc/ada/accessibility.adb
@@ -187,6 +187,15 @@ package body Accessibility is
  or else (Nkind (Node_Par) = N_Object_Renaming_Declaration
and then Comes_From_Iterator (Node_Par))
then
+  --  Handle the case of expressions within library level
+  --  subprograms here by adding one to the level modifier.
+
+  if Encl_Scop = Standard_Standard
+and then Nkind (Node_Par) = N_Subprogram_Body
+  then
+ Master_Lvl_Modifier := Master_Lvl_Modifier + 1;
+  end if;
+
   --  Note that in some rare cases the scope depth may not be
   --  set, for example, when we are in the middle of analyzing
   --  a type and the enclosing scope is said type. In that case
-- 
2.43.0



Re: rs6000: Add -msplit-patch-nops (PR112980)

2025-01-10 Thread Martin Jambor
Hello,

On Wed, Dec 11 2024, Martin Jambor wrote:
> Hello,
>
> even though it is not my work, I would like to ping this patch.  Having
> it upstream would really help us a lot.
>

Please, pretty please, consider reviewing this in time for GCC 15,
having it upstream would really help us a lot and from what I can tell,
it should no longer be controversial.

Thank you very much in advance,

Martin

>
> On Wed, Nov 13 2024, Michael Matz wrote:
>> Hello,
>>
>> this is essentially 
>>
>>   https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html
>>
>> from Kewen in functionality.  When discussing this with Segher at the 
>> Cauldron he expressed reservations about changing the default 
>> implementation of -fpatchable-function-entry.  So, to move forward, let's 
>> move it under a new target option -msplit-patch-nops (expressing the 
>> important deviation from the default behaviour, namely that all the 
>> patching nops form a consecutive sequence normally).
>>
>> Regstrapping on power9 ppc64le in progress.  Okay if that passes?
>>
>>
>> Ciao,
>> Michael.
>>
>> ---
>>
>> as the bug report details some uses of -fpatchable-function-entry
>> aren't happy with the "before" NOPs being inserted between global and
>> local entry point on powerpc.  We want the before NOPs be in front
>> of the global entry point.  That means that the patching NOPs aren't
>> consecutive for dual entry point functions, but for these usecases
>> that's not the problem.  But let us support both under the control
>> of a new target option: -msplit-patch-nops.
>>
>>  gcc/
>>
>> PR target/112980
>> * config/rs6000/rs6000.opt (msplit-patch-nops): New option.
>> * doc/invoke.texi (RS/6000 and PowerPC Options): Document it.
>> * config/rs6000/rs6000.h (machine_function.stop_patch_area_print):
>> New member.
>> * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>> Emit split nops under control of that one.
>> * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
>> Add handling of split patch nops.
>> ---
>>  gcc/config/rs6000/rs6000-logue.cc | 15 +--
>>  gcc/config/rs6000/rs6000.cc   | 27 +++
>>  gcc/config/rs6000/rs6000.h|  6 ++
>>  gcc/config/rs6000/rs6000.opt  |  4 
>>  gcc/doc/invoke.texi   | 17 +++--
>>  5 files changed, 57 insertions(+), 12 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
>> b/gcc/config/rs6000/rs6000-logue.cc
>> index c87058b435e..aa1e0442f2b 100644
>> --- a/gcc/config/rs6000/rs6000-logue.cc
>> +++ b/gcc/config/rs6000/rs6000-logue.cc
>> @@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file)
>>  
>>unsigned short patch_area_size = crtl->patch_area_size;
>>unsigned short patch_area_entry = crtl->patch_area_entry;
>> -  /* Need to emit the patching area.  */
>> -  if (patch_area_size > 0)
>> +  /* Emit non-split patching area now.  */
>> +  if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0)
>>  {
>>cfun->machine->global_entry_emitted = true;
>>/* As ELFv2 ABI shows, the allowable bytes between the global
>> @@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file)
>> patch_area_entry);
>>rs6000_print_patchable_function_entry (file, patch_area_entry,
>>   true);
>> -  patch_area_size -= patch_area_entry;
>>  }
>>  }
>>  
>> @@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file)
>>assemble_name (file, name);
>>fputs ("\n", file);
>>/* Emit the nops after local entry.  */
>> -  if (patch_area_size > 0)
>> -rs6000_print_patchable_function_entry (file, patch_area_size,
>> -   patch_area_entry == 0);
>> +  if (patch_area_size > patch_area_entry)
>> +{
>> +  patch_area_size -= patch_area_entry;
>> +  cfun->machine->stop_patch_area_print = false;
>> +  rs6000_print_patchable_function_entry (file, patch_area_size,
>> + patch_area_entry == 0);
>> +}
>>  }
>>  
>>else if (rs6000_pcrel_p ())
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 950fd947fda..6427e6913ba 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -15226,11 +15226,25 @@ rs6000_print_patchable_function_entry (FILE *file,
>>  {
>>bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p 
>> ();
>>/* For a function which needs global entry point, we will emit the
>> - patchable area before and after local entry point under the control of
>> - cfun->machine->global_entry_emitted, see the handling in function
>> - rs6000_output_function_prologue.  */
>> -  if (!global_entry_needed_p || cfun->machine->global_entry_emitted)
>> + pa

[PATCH] c++: Reject cdtors and conversion operators with a single * as return type [PR118306]

2025-01-10 Thread Simon Martin
We currently accept the following invalid code (EDG and MSVC do as well)

=== cut here ===
struct A {
  *A ();
};
=== cut here ===

The problem is that we end up in grokdeclarator with a cp_declarator of
kind cdk_pointer but no type, and we happily go through (if we have a
reference instead we eventually error out trying to form a reference to
void).

This patch makes sure that grokdeclarator errors out when processing a
constructor or a conversion operator with no return type specified but
also a declarator representing a pointer or a reference type.

Successfully tested on x86_64-pc-linux-gnu. OK for GCC 16?

PR c++/118306

gcc/cp/ChangeLog:

* decl.cc (check_special_function_return_type): Take declarator
and location as input. Reject return type specifiers with only
a * or &.
(grokdeclarator): Update call to
check_special_function_return_type.

gcc/testsuite/ChangeLog:

* g++.dg/parse/constructor4.C: New test.
* g++.dg/parse/conv_op2.C: New test.
* g++.dg/parse/default_to_int.C: New test.

---
 gcc/cp/decl.cc  | 21 +---
 gcc/testsuite/g++.dg/parse/constructor4.C   | 36 +
 gcc/testsuite/g++.dg/parse/conv_op2.C   |  8 +
 gcc/testsuite/g++.dg/parse/default_to_int.C | 33 +++
 4 files changed, 94 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/constructor4.C
 create mode 100644 gcc/testsuite/g++.dg/parse/conv_op2.C
 create mode 100644 gcc/testsuite/g++.dg/parse/default_to_int.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 5c6a4996a89..b57df261e76 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -101,7 +101,8 @@ static void end_cleanup_fn (void);
 static tree cp_make_fname_decl (location_t, tree, int);
 static void initialize_predefined_identifiers (void);
 static tree check_special_function_return_type
-   (special_function_kind, tree, tree, int, const location_t*);
+   (special_function_kind, tree, tree, int, const cp_declarator*,
+   location_t, const location_t*);
 static tree push_cp_library_fn (enum tree_code, tree, int);
 static tree build_cp_library_fn (tree, enum tree_code, tree, int);
 static void store_parm_decls (tree);
@@ -12349,9 +12350,9 @@ smallest_type_location (const cp_decl_specifier_seq 
*declspecs)
   return smallest_type_location (type_quals, declspecs->locations);
 }
 
-/* Check that it's OK to declare a function with the indicated TYPE
-   and TYPE_QUALS.  SFK indicates the kind of special function (if any)
-   that this function is.  OPTYPE is the type given in a conversion
+/* Check that it's OK to declare a function at ID_LOC with the indicated TYPE,
+   TYPE_QUALS and DECLARATOR.  SFK indicates the kind of special function (if
+   any) that this function is.  OPTYPE is the type given in a conversion
operator declaration, or the class type for a constructor/destructor.
Returns the actual return type of the function; that may be different
than TYPE if an error occurs, or for certain special functions.  */
@@ -12361,8 +12362,19 @@ check_special_function_return_type 
(special_function_kind sfk,
tree type,
tree optype,
int type_quals,
+   const cp_declarator *declarator,
+   location_t id_loc,
const location_t* locations)
 {
+  /* If TYPE is unspecified, DECLARATOR, if set, should not represent a pointer
+ or a reference type.  */
+  if (type == NULL_TREE
+  && declarator
+  && (declarator->kind == cdk_pointer
+ || declarator->kind == cdk_reference))
+error_at (id_loc, "expected unqualified-id before %qs token",
+ declarator->kind == cdk_pointer ? "*" : "&");
+
   switch (sfk)
 {
 case sfk_constructor:
@@ -13089,6 +13101,7 @@ grokdeclarator (const cp_declarator *declarator,
   type = check_special_function_return_type (sfk, type,
 ctor_return_type,
 type_quals,
+declarator, id_loc,
 declspecs->locations);
   type_quals = TYPE_UNQUALIFIED;
 }
diff --git a/gcc/testsuite/g++.dg/parse/constructor4.C 
b/gcc/testsuite/g++.dg/parse/constructor4.C
new file mode 100644
index 000..7d5a8ecaa97
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/constructor4.C
@@ -0,0 +1,36 @@
+// PR c++/118306
+// { dg-do "compile" }
+
+// Constructors.
+struct A {
+  *A ();   // { dg-error "expected unqualified-id" }
+};
+struct B {
+  &B ();   // { dg-error "expected unqualified-id|reference to" }
+};
+struct C {
+  *C (const C&);// { dg-error "expected unqualified-id" }
+};
+struct D {
+  &D (const D&);// { dg-error "expected 

Re: [PATCH] c-pretty-print.cc (pp_c_tree_decl_identifier): Strip private name encoding, PR118303

2025-01-10 Thread Jeff Law




On 1/6/25 5:01 PM, Hans-Peter Nilsson wrote:

Regtested native x86_64-linux.  Also tested mmix-knuth-mmixware,
where it fixes ONE testcase, but one which is a regression on
master.  The PR component is currently ipa, changed from the
original middle-end.  IIUC this bug-fix doesn't fit the ipa
category IMHO, but rather more general tree-optimization or
rather middle-end, to which I'll change the component unless I
see a reason for this fitting ipa stated.

Ok to commit?

-- >8 --
This is a part of PR118303.  It fixes
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c (test for excess errors)
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c inbuf.data (test for warnings, 
line 62)
for targets where the parameter on that line is subject to
TARGET_CALLEE_COPIES being true.

c-family:
PR middle-end/118303
* c-pretty-print.cc (c_pretty_printer::primary_expression) :
Call primary_expression for all SSA_NAME_VAR nodes and instead move the
DECL_ARTIFICIAL private name stripping to...
(pp_c_tree_decl_identifier): ...here.

OK assuming it's successfully gone through the usual regression test
cycle on one of the primary platforms.


jeff



[PATCH] PR tree-optimization/88575 - Use relations when simplifying MIN and MAX.

2025-01-10 Thread Andrew MacLeod

This should have been done a while ago.

The call to simplify MIN and MAX was guarded by a check for INTEGRAL, so 
I removed that as the code was already generalized to work with any type.


And no attempt was being made to pass in a relation... so I query for a 
relation between op0 and op1, and pass it to fold_range. And then all 
the right things happen.
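
As a rough illustration of the idea (not the actual vr-values.cc or ranger
code), the sketch below models how a known relation between op0 and op1
decides which operand a MIN or MAX collapses to.  The enum and function
names are hypothetical; only min1 follows the shape of the testcases in this
patch, and only the integer case is modelled.

```c
#include <assert.h>

/* Hypothetical stand-ins for a relation "op0 REL op1" and for the
   operand a MIN/MAX folds to.  */
enum relation { REL_UNKNOWN, REL_LT, REL_LE, REL_GT, REL_GE, REL_EQ };
enum operand { OP_UNKNOWN, OP_0, OP_1 };

/* Which operand must MIN (op0, op1) equal, given the relation?  */
enum operand
fold_min (enum relation rel)
{
  switch (rel)
    {
    case REL_LT: case REL_LE: case REL_EQ:
      return OP_0;
    case REL_GT: case REL_GE:
      return OP_1;
    default:
      return OP_UNKNOWN;
    }
}

/* Which operand must MAX (op0, op1) equal, given the relation?  */
enum operand
fold_max (enum relation rel)
{
  switch (rel)
    {
    case REL_GT: case REL_GE: case REL_EQ:
      return OP_0;
    case REL_LT: case REL_LE:
      return OP_1;
    default:
      return OP_UNKNOWN;
    }
}

/* Same shape as the min1 testcase: the guard establishes a <= b, so
   the inner conditional is a MIN that can only ever produce a.  */
int
min1 (int a, int b)
{
  if (a <= b)
    return a < b ? a : b;
  return 0;
}

/* Check that the fold matches min1 on a small range of values.  */
void
check_minmax (void)
{
  assert (fold_min (REL_LE) == OP_0);
  for (int a = -3; a <= 3; a++)
    for (int b = a; b <= 3; b++)
      assert (min1 (a, b) == a);
}
```

With no known relation the sketch returns OP_UNKNOWN, which corresponds to
leaving the MIN_EXPR or MAX_EXPR in place.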


I see Jeff fixed PR 110199 in DOM.  I copied the tests from that fix and
adapted them to check that

  a) EVRP removes all the MIN_EXPRs and MAX_EXPRs, and
  b) the same works for floats, using a float version of the tests built
with -ffast-math.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?

Andrew

PS.  The same patch will not work on gcc-14, but the code could be
ported if we wanted to.  Presumably DOM is already handling the integral
versions there, so only the float handling would be new.


gcc-13 is more challenging, but I think the infrastructure is there for
the integer version.  I don't think we have the proper float support yet.
But if DOM is already doing it thanks to Jeff's patch, I'm not sure it matters.


From fd4a9e3a3a9ab8cc4c728a0e64463755ccf39ed8 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 10 Jan 2025 13:33:01 -0500
Subject: [PATCH] Use relations when simplifying MIN and MAX.

Query for known relations between the operands, and pass that to
fold_range to help simplify MIN and MAX relations.
Make it type agnostic as well.

Adapt testcases from DOM to EVRP (e suffix) and test floats (f suffix).

	PR tree-optimization/88575
	gcc/
	* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Query
	relation between op0 and op1 and utilize it.
	(simplify_using_ranges::simplify): Do not eliminate float checks.

	gcc/testsuite/
	gcc.dg/tree-ssa/minmax-27.c: Disable VRP.
	gcc.dg/tree-ssa/minmax-27e.c: New.
	gcc.dg/tree-ssa/minmax-27f.c: New.
	gcc.dg/tree-ssa/minmax-28.c: Disable VRP.
	gcc.dg/tree-ssa/minmax-28e.c: New.
	gcc.dg/tree-ssa/minmax-28f.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/minmax-27.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/minmax-27e.c | 118 +
 gcc/testsuite/gcc.dg/tree-ssa/minmax-27f.c | 118 +
 gcc/testsuite/gcc.dg/tree-ssa/minmax-28.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/minmax-28e.c | 117 
 gcc/testsuite/gcc.dg/tree-ssa/minmax-28f.c | 117 
 gcc/vr-values.cc   |  13 ++-
 7 files changed, 481 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-27e.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-27f.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-28e.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-28f.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-27.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-27.c
index 4b94203b0d0..a99af6eb521 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-27.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-27.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2" } */
+/* { dg-options "-O2 -fdump-tree-dom2 -fno-tree-vrp" } */
 
 
 int min1(int a, int b)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-27e.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-27e.c
new file mode 100644
index 000..8498ffd2017
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-27e.c
@@ -0,0 +1,118 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+
+int min1(int a, int b)
+{
+if (a <= b)
+return a < b ? a : b;
+return 0;
+}
+
+int min2(int a, int b)
+{
+if (a <= b)
+return a > b ? b : a;
+return 0;
+}
+
+int min3(int a, int b)
+{
+if (a < b)
+return a < b ? a : b;
+return 0;
+}
+
+int min4(int a, int b)
+{
+if (a < b)
+return a > b ? b : a;
+return 0;
+}
+
+int min5(int a, int b)
+{
+if (a <= b)
+return a <= b ? a : b;
+return 0;
+}
+
+int min6(int a, int b)
+{
+if (a <= b)
+return a >= b ? b : a;
+return 0;
+}
+
+int min7(int a, int b)
+{
+if (a < b)
+return a <= b ? a : b;
+return 0;
+}
+
+int min8(int a, int b)
+{
+if (b > a)
+return a >= b ? b : a;
+return 0;
+}
+
+int min9(int a, int b)
+{
+if (b >= a)
+return a < b ? a : b;
+return 0;
+}
+
+int min10(int a, int b)
+{
+if (b >= a)
+return a > b ? b : a;
+return 0;
+}
+
+int min11(int a, int b)
+{
+if (b > a)
+return a < b ? a : b;
+return 0;
+}
+
+int min12(int a, int b)
+{
+if (b > a)
+return a > b ? b : a;
+return 0;
+}
+
+int min13(int a, int b)
+{
+if (b >= a)
+return a <= b ? a : b;
+return 0;
+}
+
+int min14(int a, int b)
+{
+if (b >= a)
+return a >= b ? b : a;
+return 0;
+}
+
+int min15(int a, int b)
+{
+if (b > a)
+return a <= b ? a : b;
+return 0;
+}
+
+int min16(int a, int b)
+{
+if (b > a)
+return a >= b ? b : a;
+return 0;
+}
+

Re: [PATCH] RISC-V: Fix riscv_modes_tieable_p

2025-01-10 Thread Palmer Dabbelt

On Fri, 10 Jan 2025 12:21:15 PST (-0800), jeffreya...@gmail.com wrote:



On 1/10/25 12:11 PM, Robin Dapp wrote:

Integer values and floating-point values need to be converted
by fmv series instructions. So if mode1 is MODE_INT and mode2
is MODE_FLOAT, we should return false in riscv_modes_tieable_p,
and vice versa.


I think that's on purpose because we can read and write float values
from/to integer registers.  Maybe it's a cost problem that we spill
at some point rather than access directly?

But even if you spill, as long as loads/stores don't modify the value
then I think we're OK from a correctness standpoint.
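
A C-level analogy of the distinction being discussed, for illustration
only: moving a float's bits into an integer container and back leaves the
value intact, whereas a converting move changes the bit pattern.  This says
nothing about what riscv_modes_tieable_p should return; the helper names are
hypothetical and the constants assume IEEE 754 binary32 floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the bit pattern of F into a 32-bit integer (no conversion).  */
uint32_t
float_to_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Copy a 32-bit pattern back into a float (no conversion).  */
float
bits_to_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

void
check_float_bits (void)
{
  /* A bit-preserving round trip through the integer container.  */
  assert (bits_to_float (float_to_bits (1.5f)) == 1.5f);

  /* Assuming IEEE 754 binary32, 1.0f has the pattern 0x3f800000.  */
  assert (float_to_bits (1.0f) == UINT32_C (0x3f800000));

  /* A value conversion gives a different integer altogether.  */
  assert ((uint32_t) 1.0f == 1u);
}
```

The first assertion corresponds to the "loads/stores don't modify the value"
case above; the last one is what a converting instruction would do instead.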




If I compile your test case I do see converting moves in the final
assembly - is there something you're concerned about in particular?


Which appears to be the glibc code (or very similar to it), and I don't 
think we've had users reporting incorrect results there.



Which was my general question as well.  Under precisely what
circumstances is this causing a problem?  The secondary question would
be how does this change interact with the finx and related extensions?


FWIW I'm also a bit lost here: I'd expect riscv_hard_regno_mode_ok() to 
be sufficient to handle these X/F register mixing cases, and thus us not 
to need any more special handling in riscv_modes_tieable_p().


(I think we're safe for finx with the current code, as we can access the 
registers safely there.)


So maybe there's something else also needed to trigger this?



Jeff


Re: [PATCH] config-ml.in: Fix multi-os-dir search

2025-01-10 Thread Jeff Law




On 1/7/25 7:24 PM, YunQiang Su wrote:

Jeff Law  wrote on Wed, Jan 8, 2025 at 07:06:




On 1/1/25 6:42 PM, YunQiang Su wrote:

Matthias Klose  wrote on Wed, Jan 1, 2025 at 22:37:


in https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641619.html

there are two typos in the patch, compared to the local Debian patch,



Oh, sorry it is not duplicated.


- the subst macro has an additional parameter
- the multilib subdirs are not subdirs in lib, but have
  their multilib suffix attached to lib.



It is not subdirectories, so we use string concat here.
the result will be like
 /usr/lib/../lib32
since the output of
```
x86_64-linux-gnu-gcc -m32 --print-multi-os-directory
```
is
```
../lib32
```


ok for the trunk?

Matthias, given you probably have more insight into the patches Debian
is carrying than anyone in the world, what's the state here?

YunQiang, can you clarify your responses?  It's unclear if you're
objecting or not.



I feel this patch is incorrect, since we need a '/' between
`lib` and `$${libsuffix_}`.
Otherwise we will get something like
`lib../lib32`
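
The difference is easy to reproduce outside the build system.  The sketch
below is plain C string handling, not the config-ml.in logic; join_libdir is
a hypothetical helper and "/usr/lib" is just an example prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append the multi-os directory SUFFIX (e.g. "../lib32") to PREFIX,
   with or without a '/' in between.  Returns what snprintf returns.  */
int
join_libdir (char *out, size_t n, const char *prefix, const char *suffix,
             int with_slash)
{
  if (with_slash)
    return snprintf (out, n, "%s/%s", prefix, suffix);
  return snprintf (out, n, "%s%s", prefix, suffix);
}

void
check_join_libdir (void)
{
  char buf[64];

  /* Plain concatenation glues "lib" and ".." together.  */
  join_libdir (buf, sizeof buf, "/usr/lib", "../lib32", 0);
  assert (strcmp (buf, "/usr/lib../lib32") == 0);

  /* With the separator the path resolves to /usr/lib32.  */
  join_libdir (buf, sizeof buf, "/usr/lib", "../lib32", 1);
  assert (strcmp (buf, "/usr/lib/../lib32") == 0);
}
```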

Yea, I think you're right.

Matthias, can you double check the output of

gcc --print-multi-os-directory

which is used to set libsuffix_.  I get "../lib".  But maybe there's
something else buried in the Debian bits that makes this differ.


jeff


[r15-6807 Regression] FAIL: gcc.target/i386/pr106010-8c.c scan-tree-dump-times vect "(?n)add new stmt:.*MEM " 1 on Linux/x86_64

2025-01-10 Thread haochen.jiang
On Linux/x86_64,

68326d5d1a593dc0bf098c03aac25916168bc5a9 is the first bad commit
commit 68326d5d1a593dc0bf098c03aac25916168bc5a9
Author: Alex Coplan 
Date:   Mon Mar 11 13:09:10 2024 +

vect: Force alignment peeling to vectorize more early break loops [PR118211]

caused

FAIL: gcc.dg/tree-ssa/predcom-8.c scan-tree-dump-not pcom "Invalid sum"
FAIL: gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/vect-tail-nomask-1.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.target/i386/pr106010-8c.c scan-tree-dump-times vect "(?n)add new stmt:.*MEM " 1

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-6807/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/predcom-8.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/predcom-8.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-tail-nomask-1.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-tail-nomask-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-tail-nomask-1.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-tail-nomask-1.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr106010-8c.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr106010-8c.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr106010-8c.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr106010-8c.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[r15-6810 Regression] FAIL: gcc.dg/vect/vect-121.c scan-tree-dump-not optimized "Invalid sum" on Linux/x86_64

2025-01-10 Thread haochen.jiang
On Linux/x86_64,

f4e259b4a66c81c234608056117836e13606e4c8 is the first bad commit
commit f4e259b4a66c81c234608056117836e13606e4c8
Author: Alex Coplan 
Date:   Thu Jul 25 16:34:05 2024 +

vect: Ensure we add vector skip guard even when versioning for aliasing 
[PR118211]

caused

FAIL: gcc.dg/vect/vect-121.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-121.c scan-tree-dump-not optimized "Invalid sum"

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-6810/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-121.c --target_board='unix{-m32}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[pushed 2/2] c++: modules and function attributes

2025-01-10 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

30_threads/stop_token/stop_source/109339.cc was failing because we weren't
representing attribute access on the METHOD_TYPE for _Stop_state_ref.

The modules code expected attributes to appear on tt_variant_type and not
on tt_derived_type, but that's backwards since build_type_attribute_variant
gives a type with attributes its own TYPE_MAIN_VARIANT.

gcc/cp/ChangeLog:

* module.cc (trees_out::type_node): Write attributes for
tt_derived_type, not tt_variant_type.
(trees_in::tree_node): Likewise for reading.

gcc/testsuite/ChangeLog:

* g++.dg/modules/attrib-2_a.C: New test.
* g++.dg/modules/attrib-2_b.C: New test.
---
 gcc/cp/module.cc  | 17 +
 gcc/testsuite/g++.dg/modules/attrib-2_a.C | 12 
 gcc/testsuite/g++.dg/modules/attrib-2_b.C |  9 +
 3 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/attrib-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/attrib-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 321d4164a6a..c932c4d0a90 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -9189,7 +9189,10 @@ trees_out::type_node (tree type)
  tree_node (raises);
}
 
-  tree_node (TYPE_ATTRIBUTES (type));
+  /* build_type_attribute_variant creates a new TYPE_MAIN_VARIANT, so
+variants should all have the same set of attributes.  */
+  gcc_checking_assert (TYPE_ATTRIBUTES (type)
+  == TYPE_ATTRIBUTES (TYPE_MAIN_VARIANT (type)));
 
   if (streaming_p ())
{
@@ -9406,6 +9409,8 @@ trees_out::type_node (tree type)
   break;
 }
 
+  tree_node (TYPE_ATTRIBUTES (type));
+
   /* We may have met the type during emitting the above.  */
   if (ref_node (type) != WK_none)
 {
@@ -10090,6 +10095,13 @@ trees_in::tree_node (bool is_use)
break;
  }
 
+   /* In the exporting TU, a derived type with attributes was built by
+  build_type_attribute_variant as a distinct copy, with itself as
+  TYPE_MAIN_VARIANT.  We repeat that on import to get the version
+  without attributes as TYPE_CANONICAL.  */
+   if (tree attribs = tree_node ())
+ res = cp_build_type_attribute_variant (res, attribs);
+
int tag = i ();
if (!tag)
  {
@@ -10133,9 +10145,6 @@ trees_in::tree_node (bool is_use)
TYPE_USER_ALIGN (res) = true;
  }
 
-   if (tree attribs = tree_node ())
- res = cp_build_type_attribute_variant (res, attribs);
-
int quals = i ();
if (quals >= 0 && !get_overrun ())
  res = cp_build_qualified_type (res, quals);
diff --git a/gcc/testsuite/g++.dg/modules/attrib-2_a.C 
b/gcc/testsuite/g++.dg/modules/attrib-2_a.C
new file mode 100644
index 000..96f667ceec8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/attrib-2_a.C
@@ -0,0 +1,12 @@
+// { dg-additional-options "-fmodules -Wno-global-module" }
+// { dg-module-cmi M }
+
+export module M;
+
+export
+{
+  struct A { int i; };
+
+  __attribute ((access (none, 1)))
+  void f(const A&);
+}
diff --git a/gcc/testsuite/g++.dg/modules/attrib-2_b.C 
b/gcc/testsuite/g++.dg/modules/attrib-2_b.C
new file mode 100644
index 000..c12ad117ce4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/attrib-2_b.C
@@ -0,0 +1,9 @@
+// { dg-additional-options "-fmodules -Wmaybe-uninitialized" }
+
+import M;
+
+int main()
+{
+  A a;
+  f(a);
+}
-- 
2.47.1



[pushed 1/2] c++: modules and class attributes

2025-01-10 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

std/time/traits/is_clock.cc was getting a warning about applying the
deprecated attribute to a variant of auto_ptr, which was wrong because it's
on the primary type.  This turned out to be because we were ignoring the
attributes on the definition of auto_ptr because the forward declaration in
unique_ptr.h has no attributes.  We need to merge attributes as usual in a
redeclaration.

gcc/cp/ChangeLog:

* module.cc (trees_in::decl_value): Merge attributes.

gcc/testsuite/ChangeLog:

* g++.dg/modules/attrib-1_a.C: New test.
* g++.dg/modules/attrib-1_b.C: New test.
---
 gcc/cp/module.cc  |  4 
 gcc/testsuite/g++.dg/modules/attrib-1_a.C | 13 +
 gcc/testsuite/g++.dg/modules/attrib-1_b.C | 10 ++
 3 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/attrib-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/attrib-1_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 4fbe522264b..321d4164a6a 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8637,6 +8637,10 @@ trees_in::decl_value ()
  TYPE_STUB_DECL (type) = stub_decl ? stub_decl : inner;
  if (stub_decl)
TREE_TYPE (stub_decl) = type;
+
+ /* Handle separate declarations with different attributes.  */
+ tree &eattr = TYPE_ATTRIBUTES (TREE_TYPE (existing));
+ eattr = merge_attributes (eattr, TYPE_ATTRIBUTES (type));
}
 
   if (inner_tag)
diff --git a/gcc/testsuite/g++.dg/modules/attrib-1_a.C 
b/gcc/testsuite/g++.dg/modules/attrib-1_a.C
new file mode 100644
index 000..d5f89d0c068
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/attrib-1_a.C
@@ -0,0 +1,13 @@
+// { dg-additional-options "-fmodules -Wno-global-module" }
+// { dg-module-cmi M }
+
+module;
+
+template  struct A {
+  void f() const { }
+} __attribute__ ((deprecated ("y tho")));
+
+export module M;
+
+export template 
+A a;// { dg-warning "deprecated" }
diff --git a/gcc/testsuite/g++.dg/modules/attrib-1_b.C 
b/gcc/testsuite/g++.dg/modules/attrib-1_b.C
new file mode 100644
index 000..48ac751b03d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/attrib-1_b.C
@@ -0,0 +1,10 @@
+// { dg-additional-options -fmodules }
+
+template  struct A;
+
+import M;
+
+int main()
+{
+  a.f();
+}

base-commit: f30423ea8c2152dcee91056e75a4f3736cce6a6e
-- 
2.47.1



Re: [PATCH] c: Fix up expr location for __builtin_stdc_rotate_* [PR118376]

2025-01-10 Thread Marek Polacek
On Fri, Jan 10, 2025 at 10:45:18AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> Seems I forgot to set_c_expr_source_range for the __builtin_stdc_rotate_*
> case (the other __builtin_stdc_* cases already have it), which means
> the locations in expr are uninitialized, sometimes causing ICEs in linemap
> code, at other times just valgrind errors about uninitialized var uses.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.
 
> 2025-01-10  Jakub Jelinek  
> 
>   PR c/118376
>   * c-parser.cc (c_parser_postfix_expression): Call
>   set_c_expr_source_range before break in the __builtin_stdc_rotate_*
>   case.
> 
>   * gcc.dg/pr118376.c: New test.
> 
> --- gcc/c/c-parser.cc.jj  2025-01-06 10:07:33.585493775 +0100
> +++ gcc/c/c-parser.cc 2025-01-09 16:12:07.761005082 +0100
> @@ -12906,6 +12906,7 @@ c_parser_postfix_expression (c_parser *p
> expr.value = build2_loc (loc, COMPOUND_EXPR,
>  TREE_TYPE (expr.value),
>  instrument_expr, expr.value);
> + set_c_expr_source_range (&expr, loc, close_paren_loc);
>   break;
> }
>   tree barg1 = arg;
> --- gcc/testsuite/gcc.dg/pr118376.c.jj2025-01-09 16:26:19.621072359 
> +0100
> +++ gcc/testsuite/gcc.dg/pr118376.c   2025-01-09 16:26:04.608283459 +0100
> @@ -0,0 +1,11 @@
> +/* PR c/118376 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wsign-conversion" } */
> +
> +unsigned x;
> +
> +void
> +foo ()
> +{
> +  __builtin_memset (&x, (long long) __builtin_stdc_rotate_right (x, 0), 1);
> +} /* { dg-warning "conversion to 'int' from 'long long int' may change the 
> sign of the result" "" { target *-*-* } .-1 } */
> 
>   Jakub
> 

Marek



Re: [PATCH v2 2/2] aarch64: Use standard names for SVE saturating arithmetic

2025-01-10 Thread Richard Sandiford
Akram Ahmad  writes:
> Rename the existing SVE unpredicated saturating arithmetic instructions
> to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve.md: Rename insns
>
> gcc/testsuite/ChangeLog:
>
>   * gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc:
>   Template file for auto-vectorizer tests.
>   * gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c:
>   Instantiate 8-bit vector tests.
>   * gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
>   Instantiate 16-bit vector tests.
>   * gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
>   Instantiate 32-bit vector tests.
>   * gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c
>   Instantiate 64-bit vector tests.

OK, thanks.  I'll push it along with patch 1.

Sorry again for the long delay in reviewing this series.

Richard


> ---
>  gcc/config/aarch64/aarch64-sve.md |  4 +-
>  .../aarch64/sve/saturating_arithmetic.inc | 68 +++
>  .../aarch64/sve/saturating_arithmetic_1.c | 60 
>  .../aarch64/sve/saturating_arithmetic_2.c | 60 
>  .../aarch64/sve/saturating_arithmetic_3.c | 62 +
>  .../aarch64/sve/saturating_arithmetic_4.c | 62 +
>  6 files changed, 314 insertions(+), 2 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 06bd3e4bb2c..b987b292b20 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -4379,7 +4379,7 @@
>  ;; -
>  
>  ;; Unpredicated saturating signed addition and subtraction.
> -(define_insn "@aarch64_sve_"
> +(define_insn "s3"
>[(set (match_operand:SVE_FULL_I 0 "register_operand")
>   (SBINQOPS:SVE_FULL_I
> (match_operand:SVE_FULL_I 1 "register_operand")
> @@ -4395,7 +4395,7 @@
>  )
>  
>  ;; Unpredicated saturating unsigned addition and subtraction.
> -(define_insn "@aarch64_sve_"
> +(define_insn "s3"
>[(set (match_operand:SVE_FULL_I 0 "register_operand")
>   (UBINQOPS:SVE_FULL_I
> (match_operand:SVE_FULL_I 1 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc 
> b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
> new file mode 100644
> index 000..0b3ebbcb0d6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
> @@ -0,0 +1,68 @@
> +/* Template file for vector saturating arithmetic validation.
> +
> +   This file defines saturating addition and subtraction functions for a 
> given
> +   scalar type, testing the auto-vectorization of these two operators. This
> +   type, along with the corresponding minimum and maximum values for that 
> type,
> +   must be defined by any test file which includes this template file.  */
> +
> +#ifndef SAT_ARIT_AUTOVEC_INC
> +#define SAT_ARIT_AUTOVEC_INC
> +
> +#include 
> +#include 
> +
> +#ifndef UT
> +#define UT uint32_t
> +#define UMAX UINT_MAX
> +#define UMIN 0
> +#endif
> +
> +void uaddq (UT *out, UT *a, UT *b, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +{
> +  UT sum = a[i] + b[i];
> +  out[i] = sum < a[i] ? UMAX : sum;
> +}
> +}
> +
> +void uaddq2 (UT *out, UT *a, UT *b, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +{
> +  UT sum;
> +  if (!__builtin_add_overflow(a[i], b[i], &sum))
> + out[i] = sum;
> +  else
> + out[i] = UMAX;
> +}
> +}
> +
> +void uaddq_imm (UT *out, UT *a, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +{
> +  UT sum = a[i] + 50;
> +  out[i] = sum < a[i] ? UMAX : sum;
> +}
> +}
> +
> +void usubq (UT *out, UT *a, UT *b, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +{
> +  UT sum = a[i] - b[i];
> +  out[i] = sum > a[i] ? UMIN : sum;
> +}
> +}
> +
> +void usubq_imm (UT *out, UT *a, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +{
> +  UT sum = a[i] - 50;
> +  out[i] = sum > a[i] ? UMIN : sum;
> +}
> +}
> +
> +#endif
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
> new file mode 100644
> index 000..6936e9a2704
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
> @@ -0,0 +1,60 @@
> +/* { dg-do co

[PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Wilco Dijkstra

ILP32 was originally intended to make porting to AArch64 easier.  Support was
never merged in the Linux kernel or GLIBC, so it has been unsupported for many
years.  There isn't a benefit in keeping unsupported features forever, so
deprecate it now (and it could be removed in a future release).

Passes regress & bootstrap, OK for commit?

gcc:
* config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
* doc/invoke.texi: Document -mabi=ilp32 as deprecated.

gcc/testsuite:
* gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
* gcc.target/aarch64/pr100518.c: Likewise.
* gcc.target/aarch64/pr113114.c: Likewise.
* gcc.target/aarch64/pr80295.c: Likewise.
* gcc.target/aarch64/pr94201.c: Likewise.
* gcc.target/aarch64/pr94577.c: Likewise.
* gcc.target/aarch64/sve/pr108603.c: Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
78d2cc4bbe4933c79153d0741bfd8d7b076952d0..02891b0a8ed75eb596df9d0dbff77ccd6a625f11
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19315,6 +19315,8 @@ aarch64_override_options (void)
   if (TARGET_ILP32)
 error ("assembler does not support %<-mabi=ilp32%>");
 #endif
+  if (TARGET_ILP32)
+warning (OPT_Wdeprecated, "%<-mabi=ilp32%> is deprecated.");
 
   /* Convert -msve-vector-bits to a VG count.  */
   aarch64_sve_vg = aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
17fe2c64c1f85ad8db8b61f040aafe5f8212e488..6722ad5281541e499d5b3916179d9a4d1b39097f
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21472,6 +21472,8 @@ The default depends on the specific target 
configuration.  Note that
 the LP64 and ILP32 ABIs are not link-compatible; you must compile your
 entire program with the same ABI, and link with a compatible set of libraries.
 
+@samp{ilp32} is deprecated.
+
 @opindex mbig-endian
 @item -mbig-endian
 Generate big-endian code.  This is the default when GCC is configured for an
diff --git a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c 
b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
index 
fe8414559864db4a8584fd3f5a7145b5e3d1f322..276c10cd0e86ff2c74a5c09ce70f7d76614978ec
 100644
--- a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
+++ b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-finline-stringops -mabi=ilp32 -ftrivial-auto-var-init=zero" 
} */
+/* { dg-options "-finline-stringops -mabi=ilp32 -Wno-deprecated 
-ftrivial-auto-var-init=zero" } */
 
 short m(unsigned k) {
   const unsigned short *n[65];
diff --git a/gcc/testsuite/gcc.target/aarch64/pr100518.c 
b/gcc/testsuite/gcc.target/aarch64/pr100518.c
index 
5ca599f5d2e0e1603456b2eaf2e98866871faad1..177991cfb2289530e4ee3e3633fddde5972e9e28
 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr100518.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr100518.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mabi=ilp32 -mstrict-align -O2" } */
+/* { dg-options "-mabi=ilp32 -Wno-deprecated -mstrict-align -O2" } */
 
 int unsigned_range_min, unsigned_range_max, a11___trans_tmp_1;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113114.c 
b/gcc/testsuite/gcc.target/aarch64/pr113114.c
index 
5b0383c24359ad95c7d333a6f18b98e50383f71b..976e2db71bfafe96e3729e4d4bc333874d98c084
 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr113114.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr113114.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mabi=ilp32 -O -mearly-ldp-fusion -mlate-ldp-fusion" } */
+/* { dg-options "-mabi=ilp32 -Wno-deprecated -O -mearly-ldp-fusion 
-mlate-ldp-fusion" } */
 void foo_n(double *a) {
   int i = 1;
   for (; i < (int)foo_n; i++)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
b/gcc/testsuite/gcc.target/aarch64/pr80295.c
index 
b3866d8d6a9e5688f0eedb2fd7504547c412afa2..c79427517d0e61417dd5c0013f8db04ed91da449
 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr80295.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr80295.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mabi=ilp32" } */
+/* { dg-options "-mabi=ilp32 -Wno-deprecated" } */
 
 void f (void *b) 
 { 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr94201.c 
b/gcc/testsuite/gcc.target/aarch64/pr94201.c
index 
3b9b79059e02b21c652726abb86d124274b6547c..cd21f7c06690219410a78eb824fd140627df3354
 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr94201.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr94201.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mcmodel=tiny -mabi=ilp32 -fPIC" } */
+/* { dg-options "-mcmodel=tiny -mabi=ilp32 -Wno-deprecated -fPIC" } */
 /* { dg-require-effective-target fpic } */
 
 extern int bar (void *);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr94577.c 
b/gcc/testsuite/gcc.target/aarch64/pr94577.c
index 
d51799fb0bb67999ed1374e2d6

Re: [PATCH 2/3] AArch64: Add FULLY_PIPELINED_FMA to tune baseline

2025-01-10 Thread Wilco Dijkstra
ping
 

Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
already enabled for some cores, but benchmarking it shows it is faster on all
modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
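
As a rough sketch of why the per-core entries become redundant once the flag is in the baseline: the tuning flags are bit masks that are OR-ed together, so naming a flag that is already part of the base set changes nothing. The constants below are hypothetical stand-ins, not the values generated from aarch64-tuning-flags.def.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the AARCH64_EXTRA_TUNE_* bit flags.
constexpr std::uint32_t TUNE_CHEAP_SHIFT_EXTEND   = 1u << 0;
constexpr std::uint32_t TUNE_AVOID_CROSS_LOOP_FMA = 1u << 1;
constexpr std::uint32_t TUNE_FULLY_PIPELINED_FMA  = 1u << 2;

// Baseline after the change: the FMA flag is part of the base set.
constexpr std::uint32_t TUNE_BASE =
  TUNE_CHEAP_SHIFT_EXTEND | TUNE_FULLY_PIPELINED_FMA;

// A per-core flag set that still names the FMA flag explicitly...
constexpr std::uint32_t core_flags_old =
  TUNE_BASE | TUNE_AVOID_CROSS_LOOP_FMA | TUNE_FULLY_PIPELINED_FMA;

// ...and the same set with the redundant flag dropped.
constexpr std::uint32_t core_flags_new =
  TUNE_BASE | TUNE_AVOID_CROSS_LOOP_FMA;

// OR-ing in a bit that is already set is a no-op.
static_assert(core_flags_old == core_flags_new,
              "dropping the redundant flag must not change the set");
```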

Passes regress & bootstrap, OK for commit?

gcc/ChangeLog:

    * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): 
Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.   
    * config/aarch64/tuning_models/ampere1b.h: Remove redundant 
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
    * config/aarch64/tuning_models/neoversev2.h: Likewise.

---

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 
ffbff20e29c78c00fc211adbba962c20827370aa..1d8abee1e263706e3930e4d39c59faefef8cfe41
 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -51,6 +51,7 @@ AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", 
FULLY_PIPELINED_FMA)
 AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
 
 /* Baseline tuning settings suitable for all modern cores.  */
-#define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND)
+#define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND \
+    | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
 
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/tuning_models/ampere1b.h 
b/gcc/config/aarch64/tuning_models/ampere1b.h
index 
936fe7ad390edbf70f670d50843bc5caa4fa55e5..340f7b0b47943a43ac57342a464c9267d9912f28
 100644
--- a/gcc/config/aarch64/tuning_models/ampere1b.h
+++ b/gcc/config/aarch64/tuning_models/ampere1b.h
@@ -103,8 +103,7 @@ static const struct tune_params ampere1b_tunings =
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_STRONG,  /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_BASE
-   | AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA
-   | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA), /* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags.  */
   &ampere1b_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALIGNED    /* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h 
b/gcc/config/aarch64/tuning_models/neoversev2.h
index 
40af5f47f4f62757e8e374abbb29cec5d1a8f7f3..43baeafd646bafadb739376160eaaf268d0542a8
 100644
--- a/gcc/config/aarch64/tuning_models/neoversev2.h
+++ b/gcc/config/aarch64/tuning_models/neoversev2.h
@@ -234,8 +234,7 @@ static const struct tune_params neoversev2_tunings =
    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
-   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW
-   | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),  /* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
   &neoversev2_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS   /* stp_policy_model.  */


Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-10 Thread Wilco Dijkstra
ping
 

Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and 
AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
to the baseline tuning since all modern cores use it.  Fix the neoverse512tvb 
tuning to be
like Neoverse V1/V2.

gcc/ChangeLog:

    * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): 
Update.   
    * config/aarch64/tuning_models/cortexx925.h: Update.
    * config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
    * config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
    * config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
    * config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
    * config/aarch64/tuning_models/neoversen2.h: Likewise.
    * config/aarch64/tuning_models/neoversen3.h: Likewise.
    * config/aarch64/tuning_models/neoversev1.h: Likewise.
    * config/aarch64/tuning_models/neoversev2.h: Likewise.
    * config/aarch64/tuning_models/neoversev3.h: Likewise.
    * config/aarch64/tuning_models/neoversev3ae.h: Likewise.

---

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 
1d8abee1e263706e3930e4d39c59faefef8cfe41..94ab968dcab999300ce4a01be939b3d9d0a7d910
 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -52,6 +52,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
 
 /* Baseline tuning settings suitable for all modern cores.  */
 #define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND  \
-    | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
+    | AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA \
+    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS \
+    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT)
 
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h 
b/gcc/config/aarch64/tuning_models/cortexx925.h
index 
b2ff716157a452f4ff0260c5be8ddc0355e1a9e1..ab7504a367ed0b0f8b0e59f3ad0230b172d94fa0
 100644
--- a/gcc/config/aarch64/tuning_models/cortexx925.h
+++ b/gcc/config/aarch64/tuning_models/cortexx925.h
@@ -219,8 +219,6 @@ static const struct tune_params cortexx925_tunings =
   tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_BASE
    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
    | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/fujitsu_monaka.h 
b/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
index 
2d704ecd1100b5ed04a81c297f4d1508089fa78b..feb512811ee7fdb542c8d41578c40267eab0dea5
 100644
--- a/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
+++ b/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
@@ -54,9 +54,7 @@ static const struct tune_params fujitsu_monaka_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_BASE
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),    /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_BASE),   /* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS   /* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
index 
bdd309ab03d7737a38c2b12b16db669424d43b3a..7529848fe1569944be862fdc267c8c5e7f8512a0
 100644
--- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
@@ -183,8 +183,7 @@ static const struct tune_params generic_armv8_a_tunings =
   tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_BASE
    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),    /* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS    /* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h 
b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
index 
a05a9ab92a27e8f24949aa2ffa5b5512c1487518..1ef8bd43e1efb44137f4fa7a85383e85dbd68725
 100644
--- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
@@ -248,9 +248,7 @@ static const struct tune_params generic_armv9_a_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,    /* autoprefe

Re: libstdc++: Optimize std::vector

2025-01-10 Thread Jonathan Wakely
On Sun, 8 Dec 2024 at 15:36, Jan Hubicka  wrote:
>
> Hi,
> std::vector has an independent implementation for bool which has its own
> size/capacity functions.  I updated them to add __builtin_unreachable to
> announce that size is never more than max_size.  However, while testing the
> code I noticed that even construction of an unused copy is not optimized out.
> The main problem is that the vector copying code copies the tail of the vector
> using a loop that copies bit by bit.  We eventually pattern match it to bit
> operations (that surprises me), but we need to unroll it and run it through
> loop optimization, which happens late.
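
A minimal sketch of the __builtin_unreachable hint described above, not the libstdc++ code itself. The function name, the 64-bit word size, and the bound `kMaxBits` are assumptions made for illustration; `__builtin_unreachable` is the GCC/Clang builtin.

```cpp
#include <cstddef>

// Assumed upper bound standing in for max_size().
constexpr std::size_t kMaxBits = static_cast<std::size_t>(-1) / 2;

// Hypothetical miniature of a bit-vector size computation: full 64-bit
// words plus the number of bits used in the last word.
std::size_t bit_count(std::size_t words, std::size_t tail_bits)
{
  std::size_t n = words * 64 + tail_bits;
  // Promise the optimizer that n never exceeds the bound.  If the promise
  // is broken the behaviour is undefined, so callers must uphold it.
  if (n > kMaxBits)
    __builtin_unreachable();
  return n;
}
```

The returned value is unchanged by the hint. Its only effect is to give later passes a range for `n` that they can use to drop overflow and length checks.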
>
> This patch also updates copy_aligned to use bit operations.  However, for the
> = operation we can do better, since it is not necessary to preserve the
> original bits (those are undefined anyway).  So I added copy_aligned_trail
> (a better name would be welcome) that does not care about the trailing bits of
> the copied block.
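
The two tail-copy strategies can be sketched roughly as follows. This is an illustration under stated assumptions (64-bit words, element 0 stored in the least significant bit, hypothetical function names), not the code from stl_bvector.h.

```cpp
#include <cstdint>

using word = std::uint64_t;

// Copy the low `nbits` bits (0 < nbits < 64) of src into dst with a single
// mask operation, leaving the remaining high bits of dst untouched.
word copy_tail_preserving(word dst, word src, unsigned nbits)
{
  const word mask = (word(1) << nbits) - 1;
  return (dst & ~mask) | (src & mask);
}

// Variant for assignment into storage whose trailing bits carry no meaning:
// the whole word may simply be overwritten.
word copy_tail_clobbering(word /*dst*/, word src, unsigned /*nbits*/)
{
  return src;
}
```

The preserving form replaces a bit-by-bit loop with two ANDs and an OR. The clobbering form needs no read of the destination at all, which is what makes an unused copy easier to optimize away.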
>
> As a result the following functions are optimized to empty functions:
> #include <vector>
> bool
> empty(std::vector<bool> src)
> {
> std::vector<bool> data=src;
> return false;
> }
> bool
> empty2(std::vector<bool> src)
> {
> std::vector<bool> data;
> data.reserve(1);
> return data.size ();
> }
> bool
> empty3()
> {
> std::vector<bool> data;
> data.push_back (true);
> return data[0];
> }
>
> Finally I mirrored changes to push_back from normal vectors to bit vectors.
> This involves separating append from insert and breaking out the resizing (and
> cold) path into a separate function.
>
> Here are two little benchmarks on push_back:
>
> #include 
>
> __attribute__ ((noipa))
> std::vector
> test()
> {
> std::vector  t;
> t.push_back (1);
> t.push_back (2);
> return t;
> }
> int
> main()
> {
> for (int i = 0; i < 1; i++)
> {
> test();
> }
> return 0;
> }
>
>                                                     runtime(s)  .text size of a.out
> gcc -O2                                             1.04        1606
> gcc -O2 + patch                                     0.98        1315
> gcc -O3                                             0.98        1249
> gcc -O3 + patch                                     0.96        1138
> gcc -O3 + patch --param max-inline-insns-auto=5000  0.96        1138
> clang -O2                                           1.56        1823
> clang -O3                                           1.56        1839
> clang -O2 + libc++                                  2.31        4272
> clang -O3 + libc++                                  2.76        4262
>
> push_back is still too large to inline fully at -O3.  If the parameters are
> bumped up, this makes it possible to propagate bit positions for small vectors.
> This variant of the benchmark
>
> #include 
>
> __attribute__ ((noipa))
> std::vector
> test()
> {
> std::vector  t;
> t.push_back (1);
> t.push_back (2);
> return t;
> }
> int
> main()
> {
> for (int i = 0; i < 1; i++)
> {
> test();
> }
> return 0;
> }
>
>
>                                                     runtime(s)  .text size of a.out
> gcc -O2                                             1.48        1574
> gcc -O2 + patch                                     1.30        1177
> gcc -O3                                             1.46        1515
> gcc -O3 + patch                                     1.27        1069
> gcc -O3 + patch --param max-inline-insns-auto=5000  1.10        388
> clang -O2                                           1.48        1823
> clang -O3                                           1.46        1711
> clang -O2 + libc++                                  1.20        1001
> clang -O3 + libc++                                  1.17        1001
>
> Note that clang does not support the noipa attribute, so it has some advantage
> here.
> Bootstrapped/regtested x86_64-linux, OK?
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_bvector.h (vector::operator=): Use
> _M_copy_aligned_trail.
> (vector::copy_aglined_trail): New function.

Spelling

> (vector::copy_aglined): Implement copying of tail 
> manually.

Spelling

> (vector::size,vector::capacity): Add check
> that size is at most max_size.
> (vector::size,vector::push_back): Use
> _M_append_aux.
> (vector::size,vector::_M_append_aux): 
> Declare.
> (vector::size,vector _Alloc>::_M_realloc_append_aux): Declare.
> (vector::size,vector _Alloc>::_M_realloc_insert_aux): Declare.
> * include/bits/vector.tcc
> (vector::size,vector::_M_append_aux): 
> Implement.
> (vector::size,vector _Alloc>::_M_realloc_append_aux): Implement.
> (vector::size,vector _Alloc>::_M_realloc_insert_aux): Break out from ...
> (vector::size,vector::_M_insert_aux): ... 
> here.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/bvector-1.C: New test.
>
> diff

Re: [PATCH] AArch64: Remove Cortex-A57 FMA steering pass

2025-01-10 Thread Kyrylo Tkachov


> On 10 Jan 2025, at 15:30, Richard Sandiford  wrote:
> 
> Wilco Dijkstra  writes:
>> As a minor cleanup remove Cortex-A57 FMA steering pass.  Since Cortex-A57 is
>> pretty old, there isn't any benefit of keeping this.
>> 
>> Passes regress & bootstrap, OK for commit?
>> 
>> gcc:
>> * config.gcc (extra_objs): Remove cortex-a57-fma-steering.o.
>> * config/aarch64/aarch64-passes.def: Remove pass_fma_steering. 
>> * config/aarch64/aarch64-protos.h: Remove make_pass_fma_steering.
>> * config/aarch64/aarch64-tuning-flags.def (RENAME_FMA_REGS): Remove. 
>> * config/aarch64/cortex-a57-fma-steering.cc: Delete file.
>> * config/aarch64/t-aarch64: Remove cortex-a57-fma-steering.o rules. 
>> * config/aarch64/tuning_models/cortexa57.h (cortexa57_tunings):
>>Remove RENAME_FMA_REGS tuning.
> 
> This would probably be a reasonable compromise if the pass ever became a
> maintenance burden.  But I'm not sure the pass is a burden at the moment.
> TBH, I don't remember having had to think about it for years :)
> 
> So IMO we should keep the pass, but reconsider removing it if in future
> it requires non-trivial effort to keep working.
> 
> That said, the fact the patch requires no testsuite changes suggests
> that the pass could easily become ineffective without anyone noticing.
> So I won't object if other maintainers are in favour.

We still have Cortex-A57-based products out there that can potentially use
newer compilers.
So I’d be in favor of keeping the pass as long as it’s not too much of a
maintenance burden.

Thanks,
Kyrill

> 
> Thanks,
> Richard
> 
>> 
>> ---
>> 
>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>> index 
>> 55e37146ee0356b67b8a1a09d263eccdf69cd91a..97ef3ae77fb97b347ba43e55f7a05f5438a96a43
>>  100644
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -350,7 +350,7 @@ aarch64*-*-*)
>> c_target_objs="aarch64-c.o"
>> cxx_target_objs="aarch64-c.o"
>> d_target_objs="aarch64-d.o"
>> - extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
>> aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
>> aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
>> cortex-a57-fma-steering.o aarch64-speculation.o aarch-bti-insert.o 
>> aarch64-cc-fusion.o aarch64-early-ra.o aarch64-ldp-fusion.o"
>> + extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
>> aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
>> aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o aarch64-speculation.o 
>> aarch-bti-insert.o aarch64-cc-fusion.o aarch64-early-ra.o 
>> aarch64-ldp-fusion.o"
>> target_gtfiles="\$(srcdir)/config/aarch64/aarch64-protos.h 
>> \$(srcdir)/config/aarch64/aarch64-builtins.h 
>> \$(srcdir)/config/aarch64/aarch64-builtins.cc 
>> \$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
>> \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
>> target_has_targetm_common=yes
>> ;;
>> diff --git a/gcc/config/aarch64/aarch64-passes.def 
>> b/gcc/config/aarch64/aarch64-passes.def
>> index 
>> 9cf9d3e13b2cb0d0bf9c34439785e8ca704230fe..d80b9c6f39c3b56aa6251a81590172f70b31e01e
>>  100644
>> --- a/gcc/config/aarch64/aarch64-passes.def
>> +++ b/gcc/config/aarch64/aarch64-passes.def
>> @@ -19,7 +19,6 @@
>>.  */
>> 
>> INSERT_PASS_BEFORE (pass_sched, 1, pass_aarch64_early_ra);
>> -INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
>> INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
>> INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, 
>> pass_switch_pstate_sm);
>> INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, 
>> pass_late_track_speculation);
>> diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> b/gcc/config/aarch64/aarch64-protos.h
>> index 
>> fa7bc8029be04f6530d2aee2ead4d754ba3b2550..afdf7d01adb3bdafd15d0422c3b2dfd680383bef
>>  100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -1199,7 +1199,6 @@ std::string aarch64_get_extension_string_for_isa_flags 
>> (aarch64_feature_flags,
>> aarch64_feature_flags);
>> 
>> rtl_opt_pass *make_pass_aarch64_early_ra (gcc::context *);
>> -rtl_opt_pass *make_pass_fma_steering (gcc::context *);
>> rtl_opt_pass *make_pass_track_speculation (gcc::context *);
>> rtl_opt_pass *make_pass_late_track_speculation (gcc::context *);
>> rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
>> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
>> b/gcc/config/aarch64/aarch64-tuning-flags.def
>> index 
>> 1feff3beb348f45c254c5a7c346a1a9674dee362..b1ab068f4073f459e829503f59c9de236cffb63a
>>  100644
>> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
>> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
>> @@ -28,8 +28,6 @@
>>  INTERNAL_NAME gives the internal name suitable for appending to
>>  AARCH64_TUNE_ to give an enum name. */
>> 
>> -AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
>> -
>> /* Some of the optional shift to some arthematic instructions are
>>considere

Re: libstdc++: Optimize std::vector

2025-01-10 Thread Jonathan Wakely
On Fri, 10 Jan 2025 at 14:51, Jonathan Wakely  wrote:
>
> On Sun, 8 Dec 2024 at 15:36, Jan Hubicka  wrote:
> >
> > Hi,
> > std::vector has an independent implementation for bool which has its own
> > size/capacity functions.  I updated them to add __builtin_unreachable to
> > announce that size is never more than max_size.  However, while testing the
> > code I noticed that even construction of an unused copy is not optimized out.
> > The main problem is that the vector copying loop copies the tail of the
> > vector using a loop that copies bit by bit.  We eventually pattern match it
> > to bit operations (that surprises me) but we need to unroll it and run
> > through loop optimization that happens late.
> >
> > This patch also updates copy_aglined to use bit operations. However for = 
> > operation
> > we can do better since it is not necessary to preserve original bits (those 
> > are
> > undefined anyway). So I added copy_aglined_trail (better name would be 
> > welcome)
> > that does not care about trailing bits of the copied block.
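
A minimal sketch of the idea in the quoted paragraph, not the libstdc++ code: the last, partially used word of a bit block can be copied with a mask instead of a bit-by-bit loop, and when the destination bits past the end are unspecified anyway, the whole source word can simply be stored. The names `copy_tail_bitwise`, `copy_tail_masked`, and `copy_tail_trail` and the 64-bit word type are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

using word = std::uint64_t;  // hypothetical storage word; the real type may differ

// Reference version: copy the low `nbits` bits one at a time,
// leaving the remaining bits of `dst` untouched.
inline word copy_tail_bitwise(word dst, word src, unsigned nbits)
{
  for (unsigned i = 0; i < nbits; ++i)
    {
      const word bit = word(1) << i;
      if (src & bit)
        dst |= bit;
      else
        dst &= ~bit;
    }
  return dst;
}

// Same result with two masks and no loop.  Requires nbits < 64.
inline word copy_tail_masked(word dst, word src, unsigned nbits)
{
  const word mask = (word(1) << nbits) - 1;
  return (dst & ~mask) | (src & mask);
}

// When the bits past `nbits` carry no meaning, preserving them is
// unnecessary, so the source word can be taken as is.
inline word copy_tail_trail(word /*dst*/, word src, unsigned /*nbits*/)
{
  return src;
}
```

The third variant is the cheapest because it needs no read of the destination at all, which is presumably what makes an unused copy easy for the optimizer to delete.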
> >
> > As a result the following functions are optimized to empty functions:
> > #include <vector>
> > bool
> > empty(std::vector<bool> src)
> > {
> >   std::vector<bool> data=src;
> >   return false;
> > }
> > bool
> > empty2(std::vector<bool> src)
> > {
> >   std::vector<bool> data;
> >   data.reserve(1);
> >   return data.size ();
> > }
> > bool
> > empty3()
> > {
> >   std::vector<bool> data;
> >   data.push_back (true);
> >   return data[0];
> > }
> >
> > Finally I mirrored changes to push_back from normal vectors to bit vectors.
> > This involves separating append from insert and breaking out the resizing
> > (and cold) path to a separate function.
> >
> > Here are two little benchmarks on push_back:
> >
> > #include <vector>
> >
> > __attribute__ ((noipa))
> > std::vector<bool>
> > test()
> > {
> >   std::vector<bool> t;
> >   t.push_back (1);
> >   t.push_back (2);
> >   return t;
> > }
> > int
> > main()
> > {
> >   for (int i = 0; i < 1; i++)
> >     {
> >       test();
> >     }
> >   return 0;
> > }
> >
> >                                                     runtime(s)  .text size of a.out
> > gcc -O2                                             1.04        1606
> > gcc -O2 + patch                                     0.98        1315
> > gcc -O3                                             0.98        1249
> > gcc -O3 + patch                                     0.96        1138
> > gcc -O3 + patch --param max-inline-insns-auto=5000  0.96        1138
> > clang -O2                                           1.56        1823
> > clang -O3                                           1.56        1839
> > clang -O2 + libc++                                  2.31        4272
> > clang -O3 + libc++                                  2.76        4262
> >
> > push_back is still too large to inline fully at -O3.  If parameters are 
> > bumped up
> > for small vectors this makes it possible to propagate bit positions.
> > This variant of benchmark
> >
> > #include <vector>
> >
> > __attribute__ ((noipa))
> > std::vector<bool>
> > test()
> > {
> >   std::vector<bool> t;
> >   t.push_back (1);
> >   t.push_back (2);
> >   return t;
> > }
> > int
> > main()
> > {
> >   for (int i = 0; i < 1; i++)
> >     {
> >       test();
> >     }
> >   return 0;
> > }
> >
> >
> >                                                     runtime(s)  .text size of a.out
> > gcc -O2                                             1.48        1574
> > gcc -O2 + patch                                     1.30        1177
> > gcc -O3                                             1.46        1515
> > gcc -O3 + patch                                     1.27        1069
> > gcc -O3 + patch --param max-inline-insns-auto=5000  1.10        388
> > clang -O2                                           1.48        1823
> > clang -O3                                           1.46        1711
> > clang -O2 + libc++                                  1.20        1001
> > clang -O3 + libc++                                  1.17        1001
> >
> > Note that clang does not support the noipa attribute, so it has some
> > advantage here.
> > Bootstrapped/regtested x86_64-linux, OK?
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/stl_bvector.h (vector::operator=): Use
> > _M_copy_aligned_trail.
> > (vector::copy_aglined_trail): New function.
>
> Spelling
>
> > (vector::copy_aglined): Implement copying of tail 
> > manually.
>
> Spelling
>
> > (vector::size,vector::capacity): Add 
> > check
> > that size is at most max_size.
> > (vector::size,vector::push_back): Use
> > _M_append_aux.
> > (vector::size,vector::_M_append_aux): 
> > Declare.
> > (vector::size,vector > _Alloc>::_M_realloc_append_aux): Declare.
> > (vector::size,vector > _Alloc>::_M_realloc_insert_aux): Declare.
> > * include/bits/vector.t

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-10 Thread Kyrylo Tkachov


> On 10 Jan 2025, at 15:54, Wilco Dijkstra  wrote:
> 
> ping
>  
> 
> Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and 
> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> to the baseline tuning since all modern cores use it.  Fix the neoverse512tvb 
> tuning to be
> like Neoverse V1/V2.

For neoverse512tvb this means adding AARCH64_EXTRA_TUNE_AVOID_PRED_RMW, right?
That’s fine by me.
AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS doesn’t exist anymore (i.e. it’s
implicitly on), so the patch needs to be updated.

Thanks,
Kyrill
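
To make the effect of the proposed change concrete, here is a small sketch of how moving a flag into a shared baseline mask leaves a per-core mask unchanged while letting the per-core definition drop the explicit flag. The enumerator names and bit values below are hypothetical stand-ins, not the definitions generated from aarch64-tuning-flags.def.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the AARCH64_EXTRA_TUNE_* bits.
enum tune_flag : std::uint32_t
{
  TUNE_CHEAP_SHIFT_EXTEND        = 1u << 0,
  TUNE_FULLY_PIPELINED_FMA       = 1u << 1,
  TUNE_MATCHED_VECTOR_THROUGHPUT = 1u << 2,
  TUNE_AVOID_PRED_RMW            = 1u << 3
};

// Baseline before the change: two flags.
constexpr std::uint32_t TUNE_BASE_OLD
  = TUNE_CHEAP_SHIFT_EXTEND | TUNE_FULLY_PIPELINED_FMA;

// Baseline after the change: the vector throughput flag is folded in.
constexpr std::uint32_t TUNE_BASE_NEW
  = TUNE_BASE_OLD | TUNE_MATCHED_VECTOR_THROUGHPUT;

// A core that used to list the flag explicitly...
constexpr std::uint32_t core_old
  = TUNE_BASE_OLD | TUNE_MATCHED_VECTOR_THROUGHPUT | TUNE_AVOID_PRED_RMW;

// ...can drop it once the baseline carries it.
constexpr std::uint32_t core_new = TUNE_BASE_NEW | TUNE_AVOID_PRED_RMW;

static_assert(core_old == core_new, "per-core mask is unchanged");
```

Cores that only used the baseline do change behaviour under this scheme, since they silently gain the new flag; that is the part worth reviewing per tuning model.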


> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): 
> Update.   
> * config/aarch64/tuning_models/cortexx925.h: Update.
> * config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
> * config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
> * config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
> * config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
> * config/aarch64/tuning_models/neoversen2.h: Likewise.
> * config/aarch64/tuning_models/neoversen3.h: Likewise.
> * config/aarch64/tuning_models/neoversev1.h: Likewise.
> * config/aarch64/tuning_models/neoversev2.h: Likewise.
> * config/aarch64/tuning_models/neoversev3.h: Likewise.
> * config/aarch64/tuning_models/neoversev3ae.h: Likewise.
> 
> ---
> 
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 
> 1d8abee1e263706e3930e4d39c59faefef8cfe41..94ab968dcab999300ce4a01be939b3d9d0a7d910
>  100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -52,6 +52,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", 
> AVOID_PRED_RMW)
>  
>  /* Baseline tuning settings suitable for all modern cores.  */
>  #define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND  \
> -| AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
> +| AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA \
> +| AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS \
> +| 
> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT)
>  
>  #undef AARCH64_EXTRA_TUNING_OPTION
> diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h 
> b/gcc/config/aarch64/tuning_models/cortexx925.h
> index 
> b2ff716157a452f4ff0260c5be8ddc0355e1a9e1..ab7504a367ed0b0f8b0e59f3ad0230b172d94fa0
>  100644
> --- a/gcc/config/aarch64/tuning_models/cortexx925.h
> +++ b/gcc/config/aarch64/tuning_models/cortexx925.h
> @@ -219,8 +219,6 @@ static const struct tune_params cortexx925_tunings =
>tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
>(AARCH64_EXTRA_TUNE_BASE
> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> -   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
>&generic_prefetch_tune,
>AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
> diff --git a/gcc/config/aarch64/tuning_models/fujitsu_monaka.h 
> b/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
> index 
> 2d704ecd1100b5ed04a81c297f4d1508089fa78b..feb512811ee7fdb542c8d41578c40267eab0dea5
>  100644
> --- a/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
> +++ b/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
> @@ -54,9 +54,7 @@ static const struct tune_params fujitsu_monaka_tunings =
>2,   /* min_div_recip_mul_df.  */
>0,   /* max_case_values.  */
>tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_BASE
> -   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_BASE),   /* tune_flags.  */
>&generic_prefetch_tune,
>AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
>AARCH64_LDP_STP_POLICY_ALWAYS   /* stp_policy_model.  */
> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> index 
> bdd309ab03d7737a38c2b12b16db669424d43b3a..7529848fe1569944be862fdc267c8c5e7f8512a0
>  100644
> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> @@ -183,8 +183,7 @@ static const struct tune_params generic_armv8_a_tunings =
>tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
>(AARCH64_EXTRA_TUNE_BASE
> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> -   | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags.  */
> +   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
>&generic_prefetch_tune,
>AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
>AARCH64_LDP_STP_POLICY_ALWAYS/* stp_policy_model.  */
> diff --git a/gcc/config/aarch64/tun

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-10 Thread Akram Ahmad

Ah whoops- I didn't see this before sending off V4 just now, my apologies.
I'll try my best to get this implemented before the end of the day so that
it doesn't miss the deadline.

On 09/01/2025 23:04, Richard Sandiford wrote:

Akram Ahmad  writes:

In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.
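
For reference, the scalar semantics that these instruction sequences implement can be sketched as follows. This is an illustration only, not code from the patch: the function names `sat_add32` and `sat_sub32` are hypothetical, and `__builtin_add_overflow` / `__builtin_sub_overflow` are GCC and Clang extensions rather than standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed saturating addition: on overflow, clamp towards the sign of `b`.
inline std::int32_t sat_add32(std::int32_t a, std::int32_t b)
{
  std::int32_t sum;
  if (__builtin_add_overflow(a, b, &sum))
    return b < 0 ? std::numeric_limits<std::int32_t>::min()
                 : std::numeric_limits<std::int32_t>::max();
  return sum;
}

// Signed saturating subtraction: a - b overflows upwards when b is
// negative and downwards otherwise.
inline std::int32_t sat_sub32(std::int32_t a, std::int32_t b)
{
  std::int32_t diff;
  if (__builtin_sub_overflow(a, b, &diff))
    return b < 0 ? std::numeric_limits<std::int32_t>::max()
                 : std::numeric_limits<std::int32_t>::min();
  return diff;
}
```

The saturation limit depends only on the sign of the second operand, which is why the limit can be computed ahead of the flag-setting add or subtract and then selected with a single conditional instruction.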

This can be fixed by changing:

case CT_REGISTER:
  if (REG_P (op) || SUBREG_P (op))
return true;
  break;

to:

case CT_REGISTER:
  if (REG_P (op) || SUBREG_P (op) || GET_CODE (op) == SCRATCH)
return true;
  break;

But I can test & post that as a follow-up if you prefer.

Yes please, if that's not too much trouble- would that have to go into
another patch?

+
  ;; Double vector modes.
  (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF V4BF])
  
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c

new file mode 100644
index 000..2b72be7b0d7
--- /dev/null
+++ 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+/*
+** uadd_lane: { xfail *-*-* }
+** dup\tv([0-9]+).8b, w0
+** uqadd\tb([0-9]+), (?:b\1, b0|b0, b\1)
+** umov\tw0, v\2.b\[0\]
+** ret
+*/

What's the reason behind the xfail?  Is it the early-ra thing, or
something else?  (You might already have covered this, sorry.)

xfailing is fine if it needs further optimisation, was just curious :)

This is because of a missing pattern in match.pd (I've sent another patch
upstream to add the missing pattern, although it may have gotten lost). Once
that pattern is added though, this should be recognised as .SAT_SUB, and the
new instructions will appear.

[...]
diff --git a/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c 
b/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
new file mode 100644
index 000..0fc6804683a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
@@ -0,0 +1,270 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -mearly-ra=none" } */

It'd be worth adding -fno-schedule-insns2 here.  Same for
saturating_arithmetic_1.c and saturating_arithmetic_2.c.  The reason
is that:


+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include 
+#include 
+#include 
+
+/*
+** sadd32:
+** asr w([0-9]+), w1, 31
+** adds    w([0-9]+), (?:w0, w1|w1, w0)
+** eor w\1, w\1, -2147483648
+** csinv   w0, w\2, w\1, vc
+** ret
+*/

...the first two instructions can be in either order, and similarly
for the second and third.

Really nice tests though :)


Thanks! That also makes a lot of sense, I was cautious of assuming the
instructions would always be in that exact order, so it's good to know I can
try and specify that.




[PATCH] c++: Handle RAW_DATA_CST in unify [PR118390]

2025-01-10 Thread Jakub Jelinek
Hi!

The following patch on top of
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673177.html
(in review currently)
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672055.html
(acked but waiting for the former)
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672438.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672496.html
(waiting for review)
uses the new function introduced in the second patch to fix up
unify deduction of array sizes.

Ok for trunk?

2025-01-10  Jakub Jelinek  

PR c++/118390
* cp-tree.h (count_ctor_elements): Declare.
* call.cc (count_ctor_elements): No longer static.
* pt.cc (unify): Use count_ctor_elements instead of
CONSTRUCTOR_NELTS.

* g++.dg/cpp/embed-20.C: New test.
* g++.dg/cpp0x/pr118390.C: New test.

--- gcc/cp/cp-tree.h.jj 2025-01-10 11:47:58.478841366 +0100
+++ gcc/cp/cp-tree.h    2025-01-10 12:40:51.898875583 +0100
@@ -6815,6 +6815,7 @@ extern tree type_decays_to(tree);
 extern tree extract_call_expr  (tree);
 extern tree build_trivial_dtor_call(tree, bool = false);
 extern tristate ref_conv_binds_to_temporary(tree, tree, bool = false);
+extern unsigned HOST_WIDE_INT count_ctor_elements (tree);
 extern tree build_user_type_conversion (tree, tree, int,
 tsubst_flags_t);
 extern tree build_new_function_call(tree, vec **,
--- gcc/cp/call.cc.jj   2025-01-10 11:49:42.155399433 +0100
+++ gcc/cp/call.cc  2025-01-10 12:40:12.906413343 +0100
@@ -4333,7 +4333,7 @@ has_non_trivial_temporaries (tree expr)
 
 /* Return number of initialized elements in CTOR.  */
 
-static unsigned HOST_WIDE_INT
+unsigned HOST_WIDE_INT
 count_ctor_elements (tree ctor)
 {
   unsigned HOST_WIDE_INT len = 0;
--- gcc/cp/pt.cc.jj 2025-01-10 10:32:28.801729684 +0100
+++ gcc/cp/pt.cc    2025-01-10 12:41:19.801491043 +0100
@@ -25064,7 +25064,7 @@ unify (tree tparms, tree targs, tree par
  && deducible_array_bound (TYPE_DOMAIN (parm)))
{
  /* Also deduce from the length of the initializer list.  */
- tree max = size_int (CONSTRUCTOR_NELTS (arg));
+ tree max = size_int (count_ctor_elements (arg));
  tree idx = compute_array_index_type (NULL_TREE, max, tf_none);
  if (idx == error_mark_node)
return unify_invalid (explain_p);
--- gcc/testsuite/g++.dg/cpp/embed-20.C.jj  2025-01-10 12:30:08.578762083 +0100
+++ gcc/testsuite/g++.dg/cpp/embed-20.C 2025-01-10 12:29:29.882296675 +0100
@@ -0,0 +1,14 @@
+// PR c++/118390
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+template
+constexpr int
+foo (const T (&x)[N])
+{
+  return N;
+}
+
+static_assert (foo ({
+  #embed __FILE__ limit (64)
+}) == 64, "");
--- gcc/testsuite/g++.dg/cpp0x/pr118390.C.jj    2025-01-10 12:30:59.748055186 +0100
+++ gcc/testsuite/g++.dg/cpp0x/pr118390.C   2025-01-10 12:31:34.681572583 +0100
@@ -0,0 +1,23 @@
+// PR c++/118390
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+template
+constexpr int
+foo (const T (&x)[N])
+{
+  return N;
+}
+
+static_assert (foo ({
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+}) == 160, "");

Jakub



[PATCH] libstdc++: Fix std::barrier for constant initialization [PR118395]

2025-01-10 Thread Jonathan Wakely
The std::barrier constructor should be constexpr, which means we need to
defer the dynamic allocation if the constructor is called during
constant-initialization. We can defer it to the first call to
barrier::arrive, using compare-and-swap on an atomic (instead of the
unique_ptr currently used).
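
The deferred allocation described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch itself: `state_t`, `lazy_state`, and `get` are hypothetical names, and the real barrier stores per-node ticket arrays rather than a single counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct state_t { int tickets = 0; };  // hypothetical per-node state

class lazy_state
{
  std::atomic<state_t*> state_{nullptr};
  std::size_t count_;

public:
  // No allocation here, so the constructor can be constexpr and the
  // object can be constant-initialized.
  constexpr explicit lazy_state(std::size_t count) : count_(count) {}

  lazy_state(const lazy_state&) = delete;
  lazy_state& operator=(const lazy_state&) = delete;

  ~lazy_state() { delete[] state_.load(std::memory_order_relaxed); }

  // Allocate on first use.  If several threads race, exactly one
  // compare-and-swap succeeds; the losers free their own array and use
  // the winner's pointer, which the failed CAS stores into `p`.
  state_t* get()
  {
    state_t* p = state_.load(std::memory_order_acquire);
    if (p == nullptr)
      {
        state_t* fresh = new state_t[count_];
        if (state_.compare_exchange_strong(p, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire))
          p = fresh;
        else
          delete[] fresh;
      }
    return p;
  }
};
```

The cost of this scheme is one extra atomic load on the hot path and the possibility of a wasted allocation when two threads arrive first at the same time.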

Also add precondition checks to the constructor and arrive member
function. Also implement the proposed resolution of LWG 3898.

libstdc++-v3/ChangeLog:

PR libstdc++/118395
PR libstdc++/108974
PR libstdc++/98749
* include/std/barrier (__tree_barrier): Use default
member-initializers. Change _M_state member from
unique_ptr<__state_t[]> to atomic<__state_t*>. Add
no_unique_address attribute to _M_completion.
(__tree_barrier::_M_arrive): Load value from _M_state.
(__tree_barrier::_M_invoke_completion): New member function to
ensure a throwing completion function will terminate, as
proposed in LWG 3898.
(__tree_barrier::max): Reduce by one to avoid overflow.
(__tree_barrier::__tree_barrier): Add constexpr. Qualify call to
std::move. Remove mem-initializers made unnecessary by default
member-initializers. Add precondition check. Only allocate state
array if not constant evaluated.
(__tree_barrier::arrive): Add precondition check. Do deferred
initialization of _M_state if needed.
(barrier): Add static_assert, as proposed in LWG 3898.
(barrier::barrier): Add constexpr.
* testsuite/30_threads/barrier/cons.cc: New test.
* testsuite/30_threads/barrier/lwg3898.cc: New test.
---

Tested x86_64-linux.

 libstdc++-v3/include/std/barrier  | 57 ++-
 .../testsuite/30_threads/barrier/cons.cc  |  6 ++
 .../testsuite/30_threads/barrier/lwg3898.cc   | 45 +++
 3 files changed, 93 insertions(+), 15 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/cons.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/barrier/lwg3898.cc

diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index 62b03d0223f4..9c1de411f9ce 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -96,11 +96,11 @@ It looks different from literature pseudocode for two main 
reasons:
   };
 
   ptrdiff_t _M_expected;
-  unique_ptr<__state_t[]> _M_state;
-  __atomic_base _M_expected_adjustment;
-  _CompletionF _M_completion;
+  __atomic_base<__state_t*> _M_state{nullptr};
+  __atomic_base _M_expected_adjustment{0};
+  [[no_unique_address]] _CompletionF _M_completion;
 
-  alignas(__phase_alignment) __barrier_phase_t  _M_phase;
+  alignas(__phase_alignment) __barrier_phase_t  _M_phase{};
 
   bool
   _M_arrive(__barrier_phase_t __old_phase, size_t __current)
@@ -114,6 +114,8 @@ It looks different from literature pseudocode for two main 
reasons:
size_t __current_expected = _M_expected;
__current %= ((_M_expected + 1) >> 1);
 
+   __state_t* const __state = _M_state.load(memory_order_relaxed);
+
for (int __round = 0; ; ++__round)
  {
if (__current_expected <= 1)
@@ -125,7 +127,7 @@ It looks different from literature pseudocode for two main 
reasons:
if (__current == __end_node)
  __current = 0;
auto __expect = __old_phase;
-   __atomic_phase_ref_t __phase(_M_state[__current]
+   __atomic_phase_ref_t __phase(__state[__current]
.__tickets[__round]);
if (__current == __last_node && (__current_expected & 1))
  {
@@ -150,36 +152,59 @@ It looks different from literature pseudocode for two 
main reasons:
  }
   }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3898. Possibly unintended preconditions for completion functions
+  void _M_invoke_completion() noexcept { _M_completion(); }
+
 public:
   using arrival_token = __barrier_phase_t;
 
   static constexpr ptrdiff_t
   max() noexcept
-  { return __PTRDIFF_MAX__; }
+  { return __PTRDIFF_MAX__ - 1; }
 
+  constexpr
   __tree_barrier(ptrdiff_t __expected, _CompletionF __completion)
- : _M_expected(__expected), _M_expected_adjustment(0),
-   _M_completion(move(__completion)),
-   _M_phase(static_cast<__barrier_phase_t>(0))
+  : _M_expected(__expected), _M_completion(std::move(__completion))
   {
-   size_t const __count = (_M_expected + 1) >> 1;
+   __glibcxx_assert(__expected >= 0 && __expected <= max());
 
-   _M_state = std::make_unique<__state_t[]>(__count);
+   if (!std::is_constant_evaluated())
+ {
+   size_t const __count = (_M_expected + 1) >> 1;
+   _M_state.store(new __state_t[__count], memory_order_release);
+ }
   }
 
   [[nodiscard]] arr

[PATCH v2] testsuite/118127: Pass fortran tests on ppc64le for IEEE128 long doubles

2025-01-10 Thread Siddhesh Poyarekar
Denormal behaviour is well defined for IEEE128 long doubles, so don't
XFAIL some gfortran tests on ppc64le when configured with the IEEE128
long double ABI.

gcc/testsuite/ChangeLog:

PR testsuite/118127
* lib/target-supports.exp
(check_effective_target_ppc_default_long_double_ibm): New
procedure.
* gfortran.dg/default_format_2.f90: Don't xfail for
ppc_default_long_double_ibm.
* gfortran.dg/default_format_denormal_2.f90: Likewise.
* gfortran.dg/large_real_kind_form_io_2.f90: Likewise.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/testsuite/gfortran.dg/default_format_2.f90 |  2 +-
 .../gfortran.dg/default_format_denormal_2.f90  |  2 +-
 .../gfortran.dg/large_real_kind_form_io_2.f90  |  2 +-
 gcc/testsuite/lib/target-supports.exp  | 18 ++
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/default_format_2.f90 
b/gcc/testsuite/gfortran.dg/default_format_2.f90
index 5ad7b3a6429..6ea324b02ad 100644
--- a/gcc/testsuite/gfortran.dg/default_format_2.f90
+++ b/gcc/testsuite/gfortran.dg/default_format_2.f90
@@ -1,4 +1,4 @@
-! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } }
+! { dg-do run { xfail ppc_default_long_double_ibm } }
 ! { dg-require-effective-target fortran_large_real }
 ! Test XFAILed on these platforms because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685
diff --git a/gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 
b/gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
index e9ccf5e8f61..dca756ff6d8 100644
--- a/gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
+++ b/gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
@@ -1,4 +1,4 @@
-! { dg-do run { xfail powerpc*-*-* } }
+! { dg-do run { xfail ppc_default_long_double_ibm } }
 ! { dg-require-effective-target fortran_large_real }
 ! Test XFAILed on this platform because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685
diff --git a/gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90 
b/gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90
index 34b8aec462c..cb8a7edbb9a 100644
--- a/gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90
+++ b/gcc/testsuite/gfortran.dg/large_real_kind_form_io_2.f90
@@ -1,4 +1,4 @@
-! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } }
+! { dg-do run { xfail ppc_default_long_double_ibm } }
 ! Test XFAILed on these platforms because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685
 ! { dg-require-effective-target fortran_large_real }
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index d550f288a0f..e4b29ad28c2 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1826,6 +1826,24 @@ proc check_effective_target_fortran_integer_16 { } {
 }]
 }
 
+# Check if the PPC target defaults to the IBM long double format.
+
+proc check_effective_target_ppc_default_long_double_ibm { } {
+if { ![istarget powerpc*-*-*] } {
+  return 0
+}
+
+return [check_runtime_nocache ppc_default_long_double_ibm {
+  ! Fortran
+  program default_long_double_ibm
+integer, parameter :: kl = selected_real_kind (precision (0.0_8) + 1)
+if (precision (0.0_kl) != 31) then
+  call exit(1)
+end if
+  end program default_long_double_ibm
+}]
+}
+
 # Return 1 if we can statically link libgfortran, 0 otherwise.
 #
 # When the target name changes, replace the cached result.
-- 
2.47.1



Re: [PATCH v2] Add warning for non-spec compliant FMV in Aarch64

2025-01-10 Thread Alfie Richards

Thank you both for feedback.

On 10/01/2025 10:47, Kyrylo Tkachov wrote:

On 10 Jan 2025, at 11:22, Richard Sandiford  wrote:

 writes:

This patch adds a warning when FMV is used for AArch64.

The reasoning for this is the ACLE [1] spec for FMV has diverged
significantly from the current implementation and we want to prevent
potential future compatibility issues.

There is a patch for an ACLE compliant version of target_version and
target_clone in progress but it won't make gcc-15.

This has been bootstrap and regression tested for AArch64.
Is this okay for master and backport to gcc-14?

[1] 
https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_mangle_decl_assembler_name): Add experimental warning.
* config/aarch64/aarch64.opt: Add command line option to disable
warning.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: Add CLI flag
* g++.target/aarch64/mv-symbols1.C: Add CLI flag
* g++.target/aarch64/mv-symbols2.C: Add CLI flag
* g++.target/aarch64/mv-symbols3.C: Add CLI flag
* g++.target/aarch64/mv-symbols4.C: Add CLI flag
* g++.target/aarch64/mv-symbols5.C: Add CLI flag
* g++.target/aarch64/mvc-symbols1.C: Add CLI flag
* g++.target/aarch64/mvc-symbols2.C: Add CLI flag
* g++.target/aarch64/mvc-symbols3.C: Add CLI flag
* g++.target/aarch64/mvc-symbols4.C: Add CLI flag
* g++.target/aarch64/mv-warning1.C: New test.
---
gcc/config/aarch64/aarch64.cc   |  4 
gcc/config/aarch64/aarch64.opt  |  4 
gcc/doc/invoke.texi | 11 ++-
gcc/testsuite/g++.target/aarch64/mv-1.C |  1 +
gcc/testsuite/g++.target/aarch64/mv-symbols1.C  |  1 +
gcc/testsuite/g++.target/aarch64/mv-symbols2.C  |  1 +
gcc/testsuite/g++.target/aarch64/mv-symbols3.C  |  1 +
gcc/testsuite/g++.target/aarch64/mv-symbols4.C  |  1 +
gcc/testsuite/g++.target/aarch64/mv-symbols5.C  |  1 +
gcc/testsuite/g++.target/aarch64/mv-warning1.C  |  9 +
gcc/testsuite/g++.target/aarch64/mvc-symbols1.C |  1 +
gcc/testsuite/g++.target/aarch64/mvc-symbols2.C |  1 +
gcc/testsuite/g++.target/aarch64/mvc-symbols3.C |  1 +
gcc/testsuite/g++.target/aarch64/mvc-symbols4.C |  1 +
14 files changed, 37 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/g++.target/aarch64/mv-warning1.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 91de13159cb..7d64e99b76b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20347,6 +20347,10 @@ aarch64_mangle_decl_assembler_name (tree decl, tree id)
   if (TREE_CODE (decl) == FUNCTION_DECL
   && DECL_FUNCTION_VERSIONED (decl))
 {
+  warning_at (DECL_SOURCE_LOCATION(decl),  OPT_Wexperimental_fmv_target,
+  "Function Multi Versioning support is experimental, and the "
+  "behavior is likely to change");
+
   aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version 
(decl);

Did you consider doing this in aarch64_option_valid_version_attribute_p
instead?  That hook is called directly by the frontend and is something
that already produces diagnostics for invalid versions.
I have tried this, as it does feel like a more natural fit. However,
when debugging an example program, my breakpoint for
aarch64_option_valid_version_attribute_p is never reached and the
warning is not issued.

OK with that change from POV if it works, and if you agree.
Please give others until Monday to comment though.

Sounds good.

I wonder do we want to make it a once-only warning similar to 
aarch64_report_sve_required ?
If the user has multiple multiversioned functions in their code maybe warning 
only once is okay.
On the other hand, maybe pointing out all of them at once is more helpful.
I don’t feel strongly about it, so happy with either.

I don't have strong feelings about this either.
I’m happy to change it but lean towards leaving it as is for now unless
anyone else has a preference.

Thanks,
Kyrill

Thanks,
Richard


   std::string name = IDENTIFIER_POINTER (id);
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 36bc719b822..2a8dd8ea66c 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -431,3 +431,7 @@ handling.  One means we try to form pairs involving one or 
more existing
individual writeback accesses where possible.  A value of two means we
also try to opportunistically form writeback opportunities by folding in
trailing destructive updates of the base register used by a pair.
+
+Wexperimental-fmv-target
+Target Var(warn_experimental_fmv) Warning Init(1)
+Warn about usage of experimental Function Multi Versioning.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 51dc871e6bc..bdf9ee1bc0c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -822,7 +822,8 @@ Objective-C and Objective-C++ Dialects}.
-moverride=@var{string}  -mverbose-cost-dump
-mstack-protector-guard=@var{guard} -mstack-pr

[Patch] Fortran: Fix location_t in gfc_get_extern_function_decl; support 'omp dispatch interop'

2025-01-10 Thread Tobias Burnus

The first change is a simple, generic Fortran change.

Without it, external declarations have odd locations
(namely their input_location):

gcc/testsuite/gfortran.dg/gomp/dispatch-11.f90:67:46:

   67 |   !$omp dispatch interop(obj2, obj1) device(3)
  |  ^
note: ‘declare variant’ candidate ‘repl2’ declared here


While with the change, i.e. gfc_get_location (&sym->declared_at),
we get:

gcc/testsuite/gfortran.dg/gomp/dispatch-11.f90:25:5:

   25 | subroutine base2 (x, y)
  | ^~~~
note: ‘base2’ declared here

I bet there are several other cases where we could/should
improve the location data ...

* * *

Additionally, this patch adds the 'interop' clause to OpenMP's
'dispatch' directive.

That change is a bit boring until also the 'append_args' clause
of 'declare variant' is implemented, but it is a first step and
already gives the proper middle end diagnostic. Otherwise it is
just a list and the existing diagnostic can be reused.

The only special part is that the list is ordered, which means
that C/C++ and Fortran have to agree on the order to make it
easier in the middle end. Thus, we store the clause arguments in
reverse order, matching how a tree list is trivially constructed.
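
As a rough illustration of why the reverse order falls out naturally, here
is a minimal C++ sketch of building a singly linked list by prepending each
item as it is parsed. The names Node, build_reversed, to_vector and
free_list are hypothetical and are not taken from GCC; the sketch only shows
the general head-insertion technique, not the actual tree-list code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a singly linked clause list; not a GCC type.
struct Node {
  std::string name;
  Node *next;
};

// Prepend each item as it is "parsed".  Head insertion is the cheapest way
// to grow such a list, and it leaves the items in reverse source order.
Node *build_reversed(const std::vector<std::string> &items) {
  Node *head = nullptr;
  for (const std::string &s : items)
    head = new Node{s, head};
  return head;
}

// Walk the list from the head and collect the names in list order.
std::vector<std::string> to_vector(const Node *head) {
  std::vector<std::string> out;
  for (const Node *p = head; p != nullptr; p = p->next)
    out.push_back(p->name);
  return out;
}

// Release all nodes of the list.
void free_list(Node *head) {
  while (head != nullptr) {
    Node *next = head->next;
    delete head;
    head = next;
  }
}
```

With the arguments written as (obj2, obj1) in the source, the list built this
way is walked as obj1, obj2, so any consumer has to know about the reversal.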

* * *

Comments, remarks, suggestions?

Otherwise, I regard the common Fortran code as obvious - and
the OpenMP part covered by my (co)maintainership.

Hence, I intend to commit it later today.

Nonetheless, I am happy about (nearly) any comment - it is
useful if someone proofreads a patch :-)

Thanks,

Tobias

Fortran: Fix location_t in gfc_get_extern_function_decl; support 'omp dispatch interop'

The declaration created by gfc_get_extern_function_decl used input_location
as DECL_SOURCE_LOCATION, which gave rather odd results with 'declared here'
diagnostics.  It is much more useful to use the gfc_symbol's declared_at,
which this commit now does.

Additionally, it adds support for the 'interop' clause of OpenMP's
'dispatch' directive. As the argument order matters,
gfc_match_omp_variable_list gained a 'reverse_order' flag to use the
same order as the C/C++ parser.

gcc/fortran/ChangeLog:

	* gfortran.h: Add OMP_LIST_INTEROP to the unnamed OMP_LIST_ enum.
	* openmp.cc (gfc_match_omp_variable_list): Add reverse_order
	boolean argument, defaulting to false.
	(enum omp_mask2, OMP_DISPATCH_CLAUSES): Add OMP_CLAUSE_INTEROP.
	(gfc_match_omp_clauses, resolve_omp_clauses): Handle dispatch's
	'interop' clause.
	* trans-decl.cc (gfc_get_extern_function_decl): Use sym->declared_at
	instead of input_location as DECL_SOURCE_LOCATION.
	* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_INTEROP.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f: Update
	xfail'ed 'dg-bogus' for the better 'declared here' location.
	* gfortran.dg/gomp/dispatch-11.f90: New test.
	* gfortran.dg/gomp/dispatch-12.f90: New test.

 gcc/fortran/gfortran.h |  1 +
 gcc/fortran/openmp.cc  | 53 +++---
 gcc/fortran/trans-decl.cc  |  2 +-
 gcc/fortran/trans-openmp.cc|  3 +
 .../routine-external-level-of-parallelism-2.f  | 28 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-11.f90 | 85 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-12.f90 | 49 +
 7 files changed, 195 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index aa495b5487e..6293d85778c 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1467,6 +1467,7 @@ enum
   OMP_LIST_INIT,
   OMP_LIST_USE,
   OMP_LIST_DESTROY,
+  OMP_LIST_INTEROP,
   OMP_LIST_ADJUST_ARGS,
   OMP_LIST_NUM /* Must be the last.  */
 };
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 79c0f1b2e62..e00044db7d0 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -408,7 +408,8 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 			 bool allow_sections = false,
 			 bool allow_derived = false,
 			 bool *has_all_memory = NULL,
-			 bool reject_common_vars = false)
+			 bool reject_common_vars = false,
+			 bool reverse_order = false)
 {
   gfc_omp_namelist *head, *tail, *p;
   locus old_loc, cur_loc;
@@ -492,15 +493,20 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  p = gfc_get_omp_namelist ();
 	  if (head == NULL)
 	head = tail = p;
+	  else if (reverse_order)
+	{
+	  p->next = head;
+	  head = p;
+	}
 	  else
 	{
 	  tail->next = p;
 	  tail = tail->next;
 	}
-	  tail->sym = sym;
-	  tail->expr = expr;
-	  tail->where = gfc_get_location_range (NULL, 0, &cur_loc, 1,
-		&gfc_current_locus);
+	  p->sym = sym;
+	  p->expr = expr;
+	  p->where = gfc_get_location_range (NULL, 0, &cur_loc, 1,
+	 &gfc_current_locus);
 	  if (reject_common_vars && sym->attr.in_common)
 	{
 	  g

Re: [PATCH] AArch64: Remove Cortex-A57 FMA steering pass

2025-01-10 Thread Richard Sandiford
Wilco Dijkstra  writes:
> As a minor cleanup remove Cortex-A57 FMA steering pass.  Since Cortex-A57 is
> pretty old, there isn't any benefit of keeping this.
>
> Passes regress & bootstrap, OK for commit?
>
> gcc:
>   * config.gcc (extra_objs): Remove cortex-a57-fma-steering.o.
>   * config/aarch64/aarch64-passes.def: Remove pass_fma_steering.  
>   * config/aarch64/aarch64-protos.h: Remove make_pass_fma_steering.
>   * config/aarch64/aarch64-tuning-flags.def (RENAME_FMA_REGS): Remove.
>   * config/aarch64/cortex-a57-fma-steering.cc: Delete file.
>   * config/aarch64/t-aarch64: Remove cortex-a57-fma-steering.o rules. 
>   * config/aarch64/tuning_models/cortexa57.h (cortexa57_tunings):
> Remove RENAME_FMA_REGS tuning.

This would probably be a reasonable compromise if the pass ever became a
maintenance burden.  But I'm not sure the pass is a burden at the moment.
TBH, I don't remember having had to think about it for years :)

So IMO we should keep the pass, but reconsider removing it if in future
it requires non-trivial effort to keep working.

That said, the fact the patch requires no testsuite changes suggests
that the pass could easily become ineffective without anyone noticing.
So I won't object if other maintainers are in favour.

Thanks,
Richard

>
> ---
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 
> 55e37146ee0356b67b8a1a09d263eccdf69cd91a..97ef3ae77fb97b347ba43e55f7a05f5438a96a43
>  100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -350,7 +350,7 @@ aarch64*-*-*)
>   c_target_objs="aarch64-c.o"
>   cxx_target_objs="aarch64-c.o"
>   d_target_objs="aarch64-d.o"
> - extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
> aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
> aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
> cortex-a57-fma-steering.o aarch64-speculation.o aarch-bti-insert.o 
> aarch64-cc-fusion.o aarch64-early-ra.o aarch64-ldp-fusion.o"
> + extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
> aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
> aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o aarch64-speculation.o 
> aarch-bti-insert.o aarch64-cc-fusion.o aarch64-early-ra.o 
> aarch64-ldp-fusion.o"
>   target_gtfiles="\$(srcdir)/config/aarch64/aarch64-protos.h 
> \$(srcdir)/config/aarch64/aarch64-builtins.h 
> \$(srcdir)/config/aarch64/aarch64-builtins.cc 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
>   target_has_targetm_common=yes
>   ;;
> diff --git a/gcc/config/aarch64/aarch64-passes.def 
> b/gcc/config/aarch64/aarch64-passes.def
> index 
> 9cf9d3e13b2cb0d0bf9c34439785e8ca704230fe..d80b9c6f39c3b56aa6251a81590172f70b31e01e
>  100644
> --- a/gcc/config/aarch64/aarch64-passes.def
> +++ b/gcc/config/aarch64/aarch64-passes.def
> @@ -19,7 +19,6 @@
> .  */
>  
>  INSERT_PASS_BEFORE (pass_sched, 1, pass_aarch64_early_ra);
> -INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
>  INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
>  INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, 
> pass_switch_pstate_sm);
>  INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, 
> pass_late_track_speculation);
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> fa7bc8029be04f6530d2aee2ead4d754ba3b2550..afdf7d01adb3bdafd15d0422c3b2dfd680383bef
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1199,7 +1199,6 @@ std::string aarch64_get_extension_string_for_isa_flags 
> (aarch64_feature_flags,
>   aarch64_feature_flags);
>  
>  rtl_opt_pass *make_pass_aarch64_early_ra (gcc::context *);
> -rtl_opt_pass *make_pass_fma_steering (gcc::context *);
>  rtl_opt_pass *make_pass_track_speculation (gcc::context *);
>  rtl_opt_pass *make_pass_late_track_speculation (gcc::context *);
>  rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 
> 1feff3beb348f45c254c5a7c346a1a9674dee362..b1ab068f4073f459e829503f59c9de236cffb63a
>  100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -28,8 +28,6 @@
>   INTERNAL_NAME gives the internal name suitable for appending to
>   AARCH64_TUNE_ to give an enum name. */
>  
> -AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
> -
>  /* Some of the optional shift to some arthematic instructions are
> considered cheap.  Logical shift left <=4 with or without a
> zero extend are considered cheap.  Sign extend; non logical shift left
> diff --git a/gcc/config/aarch64/cortex-a57-fma-steering.cc 
> b/gcc/config/aarch64/cortex-a57-fma-ste

[PATCH][libstdc++]: backport inline keyword on std::find

2025-01-10 Thread Tamar Christina
Hi All,

This is a backport version of the same patch as
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671618.html

for the release branches.  I'd like to backport this to GCC 14, 13 and 12 where
the first regression showed up.  I am however aware that GCC 12 is going to
get its last release soon and as such a backport to 12 may not be desirable
for a non-correctness fix.

If that is the case I would be happy with just 13 and 14.

I've benchmarked the patch on the branches and see the regressions go away to
what they were in GCC 11.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for GCC-14 and GCC-13?

Thanks,
Tamar

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (find): Add inline keyword.

---
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 834288c747c28e8625d9d8db387e6abe719b6c87..f5f421d2fd3218d827d673cf7dd1ec9cd9495982 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1723,7 +1723,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   typename _ExtractKey, typename _Equal,
 	   typename _Hash, typename _RangeHash, typename _Unused,
 	   typename _RehashPolicy, typename _Traits>
-auto
+auto inline
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	   _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
 find(const key_type& __k)
@@ -1746,7 +1746,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   typename _ExtractKey, typename _Equal,
 	   typename _Hash, typename _RangeHash, typename _Unused,
 	   typename _RehashPolicy, typename _Traits>
-auto
+auto inline
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	   _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
 find(const key_type& __k) const





Re: [PATCH][libstdc++]: backport inline keyword on std::find

2025-01-10 Thread Jonathan Wakely
On Fri, 10 Jan 2025 at 14:32, Tamar Christina  wrote:
>
> Hi All,
>
> This is a backport version of the same patch as
> https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671618.html
>
> for the release branches.  I'd like to backport this to GCC 14,13 and 12 where
> the first regression showed up.  I am however aware that GCC 12 is going to
> get it's last release soon and as such a backport to 12 may not be desirable
> for a non correctness fix.
>
> If that is the case I would be happy with just 13 and 14.
>
> I've benchmarked the patch on the branches and see the regressions go away to
> what they were in GCC 11.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for GCC-14 and GCC-13?

Yes for both.

I'm tempted to say it's OK for 12 as well. Maybe leave it a few weeks
in case we get reports of negative consequences for the 13 and 14
backport?

>
> Thanks,
> Tamar
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/hashtable.h (find): Add inline keyword.
>
> ---
> diff --git a/libstdc++-v3/include/bits/hashtable.h 
> b/libstdc++-v3/include/bits/hashtable.h
> index 
> 834288c747c28e8625d9d8db387e6abe719b6c87..f5f421d2fd3218d827d673cf7dd1ec9cd9495982
>  100644
> --- a/libstdc++-v3/include/bits/hashtable.h
> +++ b/libstdc++-v3/include/bits/hashtable.h
> @@ -1723,7 +1723,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>typename _ExtractKey, typename _Equal,
>typename _Hash, typename _RangeHash, typename _Unused,
>typename _RehashPolicy, typename _Traits>
> -auto
> +auto inline
>  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>_Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>  find(const key_type& __k)
> @@ -1746,7 +1746,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>typename _ExtractKey, typename _Equal,
>typename _Hash, typename _RangeHash, typename _Unused,
>typename _RehashPolicy, typename _Traits>
> -auto
> +auto inline
>  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>_Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>  find(const key_type& __k) const
>
>
>
>
> --



RE: [PATCH][libstdc++]: backport inline keyword on std::find

2025-01-10 Thread Tamar Christina


> -----Original Message-----
> From: Jonathan Wakely 
> Sent: Friday, January 10, 2025 2:36 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; libstd...@gcc.gnu.org
> Subject: Re: [PATCH][libstdc++]: backport inline keyword on std::find
> 
> On Fri, 10 Jan 2025 at 14:32, Tamar Christina  wrote:
> >
> > Hi All,
> >
> > This is a backport version of the same patch as
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671618.html
> >
> > for the release branches.  I'd like to backport this to GCC 14,13 and 12 
> > where
> > the first regression showed up.  I am however aware that GCC 12 is going to
> > get it's last release soon and as such a backport to 12 may not be desirable
> > for a non correctness fix.
> >
> > If that is the case I would be happy with just 13 and 14.
> >
> > I've benchmarked the patch on the branches and see the regressions go away 
> > to
> > what they were in GCC 11.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for GCC-14 and GCC-13?
> 
> Yes for both.
> 
> I'm tempted to say it's OK for 12 as well. Maybe leave it a few weeks
> in case we get reports of negative consequences for the 13 and 14
> backport?
> 

Sure, that works for me!

Thanks!
Tamar

> >
> > Thanks,
> > Tamar
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/hashtable.h (find): Add inline keyword.
> >
> > ---
> > diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-
> v3/include/bits/hashtable.h
> > index
> 834288c747c28e8625d9d8db387e6abe719b6c87..f5f421d2fd3218d827d673cf
> 7dd1ec9cd9495982 100644
> > --- a/libstdc++-v3/include/bits/hashtable.h
> > +++ b/libstdc++-v3/include/bits/hashtable.h
> > @@ -1723,7 +1723,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >typename _ExtractKey, typename _Equal,
> >typename _Hash, typename _RangeHash, typename _Unused,
> >typename _RehashPolicy, typename _Traits>
> > -auto
> > +auto inline
> >  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >_Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >  find(const key_type& __k)
> > @@ -1746,7 +1746,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >typename _ExtractKey, typename _Equal,
> >typename _Hash, typename _RangeHash, typename _Unused,
> >typename _RehashPolicy, typename _Traits>
> > -auto
> > +auto inline
> >  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >_Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >  find(const key_type& __k) const
> >
> >
> >
> >
> > --



Re: [PATCH 10/10] aarch64: Try to avoid passing new flags to assembler

2025-01-10 Thread Andrew Carlotti
On Thu, Jan 09, 2025 at 06:00:34PM +, Richard Sandiford wrote:
> Richard Sandiford  writes:
> > Andrew Carlotti  writes:
> >> On Mon, Nov 25, 2024 at 11:26:39PM +, Richard Sandiford wrote:
> >>> Sorry for the slow review.
> >>> 
> >>> Andrew Carlotti  writes:
> >>> > These new flags (+fcma, +jscvt, +rcpc2, +jscvt, +frintts, +wfxt and +xs)
> >>> > were only recently added to the assembler.  To improve compatibility
> >>> > with older assemblers, we try to avoid passing these new flags to the
> >>> > assembler if we can express the targetted architecture without them. We
> >>> > do so by using an almost-equivalent architecture string with a higher
> >>> > architecture version.
> >>> >
> >>> > This should never reduce the set of instructions accepted by the
> >>> > assembler.  It will make it more lenient in two cases:
> >>> >
> >>> > 1. Many system registers are currently gated behind architecture
> >>> > versions instead of specific feature flags.  Increasing the base
> >>> > architecture version may cause more system register accesses to be
> >>> > accepted.
> >>> >
> >>> > 2. FEAT_XS doesn't have an HWCAP bit or cpuinfo entry.  We still want to
> >>> > avoid passing +wfxt or +noxs to the assembler if possible, so we'll
> >>> > instruct the assembler to accept FEAT_XS instructions as well whenever
> >>> > the rest of the new features are enabled.
> >>> >
> >>> > gcc/ChangeLog:
> >>> >
> >>> > * common/config/aarch64/aarch64-common.cc
> >>> > (aarch64_get_arch_string_for_assembler): New.
> >>> > (aarch64_rewrite_march): New.
> >>> > (aarch64_rewrite_selected_cpu): Call new function.
> >>> > * config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity 
> >>> > mapping.
> >>> > * config/aarch64/aarch64-protos.h
> >>> > (aarch64_get_arch_string_for_assembler): New.
> >>> > * config/aarch64/aarch64.cc
> >>> > (aarch64_declare_function_name): Call new function.
> >>> > (aarch64_start_file): Ditto.
> >>> > * config/aarch64/aarch64.h
> >>> > * config/aarch64/aarch64.h
> >>> > (EXTRA_SPEC_FUNCTIONS): Use new macro name.
> >>> > (MCPU_TO_MARCH_SPEC): Rename to...
> >>> > (MARCH_REWRITE_SPEC): ...this, and add new spec rule.
> >>> > (aarch64_rewrite_march): New declaration.
> >>> > (MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
> >>> > (MARCH_REWRITE_SPEC_FUNCTIONS): ...this, and add new function.
> >>> > (ASM_CPU_SPEC): Use new macro name.
> >>> >
> >>> > gcc/testsuite/ChangeLog:
> >>> >
> >>> > * gcc.target/aarch64/cpunative/native_cpu_21.c: Update check.
> >>> > * gcc.target/aarch64/cpunative/native_cpu_22.c: Update check.
> >>> > * gcc.target/aarch64/cpunative/info_27: New test.
> >>> > * gcc.target/aarch64/cpunative/info_28: New test.
> >>> > * gcc.target/aarch64/cpunative/info_29: New test.
> >>> > * gcc.target/aarch64/cpunative/native_cpu_27.c: New test.
> >>> > * gcc.target/aarch64/cpunative/native_cpu_28.c: New test.
> >>> > * gcc.target/aarch64/cpunative/native_cpu_29.c: New test.
> >>> >
> >>> >
> >>> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> >>> > b/gcc/common/config/aarch64/aarch64-common.cc
> >>> > index 
> >>> > 2bfc597e333b6018970a9ee6e370a66b6d0960ef..717b3238be16f39a6fd1b4143662eb540ccf292d
> >>> >  100644
> >>> > --- a/gcc/common/config/aarch64/aarch64-common.cc
> >>> > +++ b/gcc/common/config/aarch64/aarch64-common.cc
> >>> > @@ -371,6 +371,119 @@ aarch64_get_extension_string_for_isa_flags
> >>> >return outstr;
> >>> >  }
> >>> >  
> >>> > +/* Generate an arch string to be passed to the assembler.
> >>> > +
> >>> > +   Several flags were added retrospectively for features that were 
> >>> > previously
> >>> > +   enabled only by specifying an architecture version.  We want to 
> >>> > avoid
> >>> > +   passing these flags to the assembler if possible, to improve 
> >>> > compatibility
> >>> > +   with older assemblers.  */
> >>> > +
> >>> > +std::string
> >>> > +aarch64_get_arch_string_for_assembler (aarch64_arch arch,
> >>> > +  aarch64_feature_flags flags)
> >>> > +{
> >>> > +  if (!(flags & AARCH64_FL_FCMA) || !(flags & AARCH64_FL_JSCVT))
> >>> > +goto done;
> >>> > +
> >>> > +  if (arch == AARCH64_ARCH_V8A
> >>> > +  || arch == AARCH64_ARCH_V8_1A
> >>> > +  || arch == AARCH64_ARCH_V8_2A)
> >>> > +arch = AARCH64_ARCH_V8_3A;
> >>> > +
> >>> > +  if (!(flags & AARCH64_FL_RCPC2))
> >>> > +goto done;
> >>> > +
> >>> > +  if (arch == AARCH64_ARCH_V8_3A)
> >>> > +arch = AARCH64_ARCH_V8_4A;
> >>> > +
> >>> > +  if (!(flags & AARCH64_FL_FRINTTS) || !(flags & AARCH64_FL_FLAGM2))
> >>> > +goto done;
> >>> > +
> >>> > +  if (arch == AARCH64_ARCH_V8_4A)
> >>> > +arch = AARCH64_ARCH_V8_5A;
> >>> > +
> >>> > +  if (!(flags & AARCH64_FL_WFXT))
> >>> > +goto done;
> >>> > +
> >>> > +  if (arch == AAR

Re: [libstdc++] Optimize std::vector::operator[]

2025-01-10 Thread Jonathan Wakely
On Fri, 27 Dec 2024 at 20:13, Jan Hubicka  wrote:
>
> Hi,
> the following testcase:
>
>   bool f(const std::vector<bool>& v, std::size_t x) {
> return v[x];
>   }
>
> is compiled as:
>
> f(std::vector<bool, std::allocator<bool> > const&, unsigned long):
>         testq   %rsi, %rsi
>         leaq    63(%rsi), %rax
>         movq    (%rdi), %rdx
>         cmovns  %rsi, %rax
>         sarq    $6, %rax
>         leaq    (%rdx,%rax,8), %rdx
>         movq    %rsi, %rax
>         sarq    $63, %rax
>         shrq    $58, %rax
>         addq    %rax, %rsi
>         andl    $63, %esi
>         subq    %rax, %rsi
>         jns     .L2
>         addq    $64, %rsi
>         subq    $8, %rdx
> .L2:
>         movl    $1, %eax
>         shlx    %rsi, %rax, %rax
>         andq    (%rdx), %rax
>         setne   %al
>         ret
>
> which is quite expensive for a simple bit access in a bitmap.  The reason is
> that the bit access is implemented using iterators:
> return begin()[__n];
> which in turn handles the case where __n is negative, yielding the extra
> conditional.
>
> _GLIBCXX20_CONSTEXPR
> void
> _M_incr(ptrdiff_t __i)
> {
>   _M_assume_normalized();
>   difference_type __n = __i + _M_offset;
>   _M_p += __n / int(_S_word_bit);
>   __n = __n % int(_S_word_bit);
>   if (__n < 0)
> {
>   __n += int(_S_word_bit);
>   --_M_p;
> }
>   _M_offset = static_cast<unsigned int>(__n);
> }
>
> We could use __builtin_unreachable to declare that __n is in the range
> 0...max_size (), but I think it is better to implement it directly, since
> the resulting code is shorter and much easier to optimize.

Yeah I think that change makes sense, OK for trunk, thanks!
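
For readers following along, the idea of indexing the word and the bit
directly from an unsigned position can be sketched roughly as below. This is
a simplified illustration only: kWordBits and test_bit are hypothetical
names, and the word storage is a plain std::vector of 64-bit words rather
than the real vector<bool> layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical word size for this sketch; the library uses its own constant.
constexpr std::size_t kWordBits = 64;

// Read bit n of a packed bitmap.  Because n is unsigned, the quotient and
// remainder are never negative, so no fix-up branch is required.
bool test_bit(const std::vector<std::uint64_t> &words, std::size_t n) {
  return (words[n / kWordBits] >> (n % kWordBits)) & 1u;
}
```

The iterator path has to allow for a signed offset, which is where the extra
conditional in the first assembly listing comes from.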

>
> We now produce:
> .LFB1248:
>         .cfi_startproc
>         movq    (%rdi), %rax
>         movq    %rsi, %rdx
>         shrq    $6, %rdx
>         andq    (%rax,%rdx,8), %rsi
>         andl    $63, %esi
>         setne   %al
>         ret
>
> Testcase suggests
>         movq    (%rdi), %rax
>         movl    %esi, %ecx
>         shrq    $5, %rsi        # does still need to be 64-bit
>         movl    (%rax,%rsi,4), %eax
>         btl     %ecx, %eax
>         setb    %al
>         retq
> Which is still one instruction shorter.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> libstdc++-v3/ChangeLog:
>
> PR target/80813
> * include/bits/stl_bvector.h (vector::operator []): Do
> not use iterators.
>
> gcc/testsuite/ChangeLog:
>
> PR target/80813
> * g++.dg/tree-ssa/bvector-3.C: New test.
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/bvector-3.C 
> b/gcc/testsuite/g++.dg/tree-ssa/bvector-3.C
> new file mode 100644
> index 000..feae791b20d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/bvector-3.C
> @@ -0,0 +1,10 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -fdump-tree-optimized"  }
> +// { dg-skip-if "requires hosted libstdc++ for vector" { ! hostedlib } }
> +
> +#include <vector>
> +bool f(const std::vector<bool>& v, std::size_t x) {
> +   return v[x];
> +}
> +// All references to src should be optimized out, so there should be no name 
> for it
> +// { dg-final { scan-tree-dump-not "if \\("  optimized } }
> diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
> b/libstdc++-v3/include/bits/stl_bvector.h
> index 341eee33b21..975857bfdbd 100644
> --- a/libstdc++-v3/include/bits/stl_bvector.h
> +++ b/libstdc++-v3/include/bits/stl_bvector.h
> @@ -1132,7 +1141,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>operator[](size_type __n)
>{
> __glibcxx_requires_subscript(__n);
> -   return begin()[__n];
> +   return _Bit_reference (this->_M_impl._M_start._M_p
> +  + __n / int(_S_word_bit),
> +  1UL << __n % int(_S_word_bit));
>}
>
>_GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
> @@ -1140,7 +1151,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>operator[](size_type __n) const
>{
> __glibcxx_requires_subscript(__n);
> -   return begin()[__n];
> +   return _Bit_reference (this->_M_impl._M_start._M_p
> +  + __n / int(_S_word_bit),
> +  1UL << __n % int(_S_word_bit));
>}
>
>  protected:
>



[PATCH] Fix bootstrap on !HARDREG_PRE_REGNOS targets

2025-01-10 Thread Richard Biener
Pushed as obvious.

* gcse.cc (pass_hardreg_pre::gate): Wrap possibly unused
fun argument.
---
 gcc/gcse.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcse.cc b/gcc/gcse.cc
index 3f3f7fe15b0..4ae19f28430 100644
--- a/gcc/gcse.cc
+++ b/gcc/gcse.cc
@@ -4351,7 +4351,7 @@ public:
 }; // class pass_rtl_pre
 
 bool
-pass_hardreg_pre::gate (function *fun)
+pass_hardreg_pre::gate (function * ARG_UNUSED (fun))
 {
 #ifdef HARDREG_PRE_REGNOS
   return optimize > 0
-- 
2.43.0
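
For context, the kind of breakage this addresses can be sketched as follows.
The sketch is illustrative only: HAS_FEATURE and gate are hypothetical names
standing in for HARDREG_PRE_REGNOS and the pass gate, the body of the #else
branch is an assumption, and the standard C++17 [[maybe_unused]] attribute is
used here in place of GCC's ARG_UNUSED macro.

```cpp
// Hypothetical feature macro; leave it undefined to model a target that
// does not provide the feature.
// #define HAS_FEATURE 1

// The parameter is only read in one preprocessor branch.  Without the
// attribute, builds that take the #else branch can fail under
// -Werror=unused-parameter, which is fatal during bootstrap.
bool gate(int level, [[maybe_unused]] bool has_work) {
#ifdef HAS_FEATURE
  return level > 0 && has_work;
#else
  return false;
#endif
}
```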


[PATCH v4 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-10 Thread Akram Ahmad
Hi Kyrill,

Thanks for the very quick response! V4 of the patch can be found
below the line.

Best wishes,

Akram

---

This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.

The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32 and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV to be generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions
in this circumstance.

Furthermore, this patch introduces a new optimisation for signed 32
and 64-bit scalar saturating arithmetic which uses adds/subs in place
of the NEON instruction.

Addition, before:
        fmov    d0, x0
        fmov    d1, x1
        sqadd   d0, d0, d1
        fmov    x0, d0

Addition, after:
        asr     x2, x1, 63
        adds    x0, x0, x1
        eor     x2, x2, 0x8000
        csinv   x0, x0, x2, vc

In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.
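
As a rough C++ model of the four-instruction addition sequence above, the
sketch below computes a signed 64-bit saturating add the same way. It
assumes the eor constant is the 64-bit sign-bit mask, relies on the
GCC/Clang builtin __builtin_add_overflow, and relies on arithmetic right
shift of negative values (guaranteed from C++20, and what GCC does in
earlier modes). The name sat_add64 is hypothetical.

```cpp
#include <cstdint>

// Signed 64-bit saturating add, step by step as in the asm sequence.
std::int64_t sat_add64(std::int64_t x, std::int64_t y) {
  std::int64_t sum;
  // adds: compute the wrapped sum and record signed overflow (V flag).
  bool overflow = __builtin_add_overflow(x, y, &sum);
  // asr + eor: all-ones if y is negative, else zero, then flip the sign
  // bit.  This yields INT64_MAX for negative y and the bit pattern of
  // INT64_MIN for non-negative y.
  std::uint64_t tmp =
      static_cast<std::uint64_t>(y >> 63) ^ 0x8000000000000000ULL;
  // csinv ..., vc: keep the sum when there is no overflow, otherwise take
  // the bitwise inverse of tmp, which is the correct saturation limit.
  return overflow ? static_cast<std::int64_t>(~tmp) : sum;
}
```

Overflow is only possible when both operands have the same sign, so the
sign of the second operand alone is enough to pick the limit.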

Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere. The signed scalar case is also tested with
an execution test to check the results.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for scalar arithmetic.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
* gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests.
---
 gcc/config/aarch64/aarch64-builtins.cc|  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md| 207 +-
 gcc/config/aarch64/arm_neon.h |  96 +++
 gcc/config/aarch64/iterators.md   |   4 +
 .../saturating_arithmetic_autovect.inc|  58 
 .../saturating_arithmetic_autovect_1.c|  79 +
 .../saturating_arithmetic_autovect_2.c|  79 +
 .../saturating_arithmetic_autovect_3.c|  75 +
 .../saturating_arithmetic_autovect_4.c|  77 +
 .../aarch64/saturating-arithmetic-signed.c| 270 ++
 .../aarch64/saturating_arithmetic.inc |  39 +++
 .../aarch64/saturating_arithmetic_1.c |  36 +++
 .../aarch64/saturating_arithmetic_2.c |  36 +++
 .../aarch64/saturating_arithmetic_3.c |  30 ++
 .../aarch64/saturating_arithmetic_4.c |  30 ++
 16 files changed, 1081 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_

Re: [PATCH] libatomic: Cleanup AArch64 ifunc selection

2025-01-10 Thread Wilco Dijkstra
ping
 

Simplify and clean up the ifunc selection logic.  Since LRCPC3 does
not imply LSE2, has_rcpc3() should also check that LSE2 is enabled.
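
The intended check order can be sketched in plain C as below.  This is
only an illustration: struct demo_caps and the demo_* functions are
hypothetical stand-ins for the HWCAP bits and the MRS read used in
host-config.h, so that the decision logic can be followed without an
AArch64 machine.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the capability bits the real code reads
   from hwcap/hwcap2 and from ID_AA64ISAR1_EL1.  */
struct demo_caps
{
  bool lse2;            /* stands in for HWCAP_USCAT */
  bool hwcap2_valid;    /* stands in for _IFUNC_ARG_HWCAP */
  bool hwcap2_rcpc3;    /* stands in for HWCAP2_LRCPC3 */
  bool cpuid;           /* stands in for HWCAP_CPUID */
  unsigned long isar1;  /* value an MRS of ID_AA64ISAR1_EL1 would give */
};

/* Extract the 4-bit field at bits [23:20], like AT_FEAT_FIELD.  */
static unsigned
demo_feat_field (unsigned long reg)
{
  return (reg >> 20) & 15;
}

bool
demo_has_rcpc3 (const struct demo_caps *c)
{
  /* LSE2 is a prerequisite, so reject early without it.  */
  if (!c->lse2)
    return false;

  /* Trust the kernel's HWCAP2 bit when it is available and set.  */
  if (c->hwcap2_valid && c->hwcap2_rcpc3)
    return true;

  /* Otherwise fall back to the ID register, if it can be read.  */
  if (c->cpuid)
    return demo_feat_field (c->isar1) >= 3;

  return false;
}
```

Putting the LSE2 test first means a set HWCAP2_LRCPC3 bit can no longer
select the RCPC3 path on a system that lacks LSE2.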

Passes regress and bootstrap, OK for commit?

libatomic:
    * config/linux/aarch64/host-config.h (has_lse2): Cleanup.
    (has_lse128): Likewise.
    (has_rcpc3): Add early check for LSE2.

---

diff --git a/libatomic/config/linux/aarch64/host-config.h b/libatomic/config/linux/aarch64/host-config.h
index 93f367d587803ce26b3c9a45881ac2d9b2e37168..d9d9239897c82d2eebff2bf38f6bac3a7c7b23ea 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -91,69 +91,62 @@ has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
   /* Check for LSE2.  */
   if (hwcap & HWCAP_USCAT)
 return true;
-  /* No point checking further for atomic 128-bit load/store if LSE
- prerequisite not met.  */
-  if (!(hwcap & HWCAP_ATOMICS))
-    return false;
-  if (!(hwcap & HWCAP_CPUID))
-    return false;
 
-  unsigned long midr;
-  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+  /* If LSE and CPUID are supported, check MIDR.  */
+  if (hwcap & HWCAP_CPUID && hwcap & HWCAP_ATOMICS)
+    {
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
 
-  /* Neoverse N1 supports atomic 128-bit load/store.  */
-  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM (midr) == 0xd0c)
-    return true;
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  return MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM (midr) == 0xd0c;
+    }
 
   return false;
 }
 
-/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic,
-   bits[23:20].  The expected value is 0b0011.  Check that.  */
+/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic, bits[23:20].
+   The minimum value for LSE128 is 0b0011.  */
 
 #define AT_FEAT_FIELD(isar0)    (((isar0) >> 20) & 15)
 
 static inline bool
 has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
-  if (hwcap & _IFUNC_ARG_HWCAP
-  && features->_hwcap2 & HWCAP2_LSE128)
-    return true;
-  /* A 0 HWCAP2_LSE128 bit may be just as much a sign of missing HWCAP2 bit
- support in older kernels as it is of CPU feature absence.  Try fallback
- method to guarantee LSE128 is not implemented.
-
- In the absence of HWCAP_CPUID, we are unable to check for LSE128.
- If feature check available, check LSE2 prerequisite before proceeding.  */
-  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
- return false;
-
-  unsigned long isar0;
-  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
-  if (AT_FEAT_FIELD (isar0) >= 3)
+  if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LSE128)
 return true;
+
+  /* If LSE2 and CPUID are supported, check for LSE128.  */
+  if (hwcap & HWCAP_CPUID && hwcap & HWCAP_USCAT)
+    {
+  unsigned long isar0;
+  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
+  return AT_FEAT_FIELD (isar0) >= 3;
+    }
+
   return false;
 }
 
-/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].  The
-   expected value is 0b0011.  Check that.  */
+/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].
+   The minimum value for LRCPC3 is 0b0011.  */
 
 static inline bool
 has_rcpc3 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
-  if (hwcap & _IFUNC_ARG_HWCAP
-  && features->_hwcap2 & HWCAP2_LRCPC3)
-    return true;
-  /* Try fallback feature check method to guarantee LRCPC3 is not implemented.
-
- In the absence of HWCAP_CPUID, we are unable to check for RCPC3, return.
- If feature check available, check LSE2 prerequisite before proceeding.  */
-  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
+  /* LSE2 is a prerequisite for atomic LDIAPP/STILP.  */
+  if (!(hwcap & HWCAP_USCAT))
 return false;
-  unsigned long isar1;
-  asm volatile ("mrs %0, ID_AA64ISAR1_EL1" : "=r" (isar1));
-  if (AT_FEAT_FIELD (isar1) >= 3)
+
+  if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LRCPC3)
 return true;
+
+  if (hwcap & HWCAP_CPUID)
+    {
+  unsigned long isar1;
+  asm volatile ("mrs %0, ID_AA64ISAR1_EL1" : "=r" (isar1));
+  return AT_FEAT_FIELD (isar1) >= 3;
+    }
+
   return false;
 }
 

[PATCH] AArch64: Remove Cortex-A57 FMA steering pass

2025-01-10 Thread Wilco Dijkstra

As a minor cleanup, remove the Cortex-A57 FMA steering pass.  Since Cortex-A57
is pretty old, there is no benefit in keeping it.

Passes regress & bootstrap, OK for commit?

gcc:
* config.gcc (extra_objs): Remove cortex-a57-fma-steering.o.
* config/aarch64/aarch64-passes.def: Remove pass_fma_steering.  
* config/aarch64/aarch64-protos.h: Remove make_pass_fma_steering.
* config/aarch64/aarch64-tuning-flags.def (RENAME_FMA_REGS): Remove.
* config/aarch64/cortex-a57-fma-steering.cc: Delete file.
* config/aarch64/t-aarch64: Remove cortex-a57-fma-steering.o rules. 
* config/aarch64/tuning_models/cortexa57.h (cortexa57_tunings):
Remove RENAME_FMA_REGS tuning.

---

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 
55e37146ee0356b67b8a1a09d263eccdf69cd91a..97ef3ae77fb97b347ba43e55f7a05f5438a96a43
 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -350,7 +350,7 @@ aarch64*-*-*)
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
-   extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
cortex-a57-fma-steering.o aarch64-speculation.o aarch-bti-insert.o 
aarch64-cc-fusion.o aarch64-early-ra.o aarch64-ldp-fusion.o"
+   extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o aarch64-speculation.o 
aarch-bti-insert.o aarch64-cc-fusion.o aarch64-early-ra.o aarch64-ldp-fusion.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-protos.h 
\$(srcdir)/config/aarch64/aarch64-builtins.h 
\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-passes.def 
b/gcc/config/aarch64/aarch64-passes.def
index 
9cf9d3e13b2cb0d0bf9c34439785e8ca704230fe..d80b9c6f39c3b56aa6251a81590172f70b31e01e
 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,7 +19,6 @@
.  */
 
 INSERT_PASS_BEFORE (pass_sched, 1, pass_aarch64_early_ra);
-INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
 INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
 INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, 
pass_switch_pstate_sm);
 INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, 
pass_late_track_speculation);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
fa7bc8029be04f6530d2aee2ead4d754ba3b2550..afdf7d01adb3bdafd15d0422c3b2dfd680383bef
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1199,7 +1199,6 @@ std::string aarch64_get_extension_string_for_isa_flags 
(aarch64_feature_flags,
aarch64_feature_flags);
 
 rtl_opt_pass *make_pass_aarch64_early_ra (gcc::context *);
-rtl_opt_pass *make_pass_fma_steering (gcc::context *);
 rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 rtl_opt_pass *make_pass_late_track_speculation (gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 
1feff3beb348f45c254c5a7c346a1a9674dee362..b1ab068f4073f459e829503f59c9de236cffb63a
 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -28,8 +28,6 @@
  INTERNAL_NAME gives the internal name suitable for appending to
  AARCH64_TUNE_ to give an enum name. */
 
-AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-
 /* Some of the optional shift to some arthematic instructions are
considered cheap.  Logical shift left <=4 with or without a
zero extend are considered cheap.  Sign extend; non logical shift left
diff --git a/gcc/config/aarch64/cortex-a57-fma-steering.cc 
b/gcc/config/aarch64/cortex-a57-fma-steering.cc
deleted file mode 100644
index 
fd6da66d855e36cd023b5343c392cd2f7f062d1b..
--- a/gcc/config/aarch64/cortex-a57-fma-steering.cc
+++ /dev/null
@@ -1,1096 +0,0 @@
-/* FMA steering optimization pass for Cortex-A57.
-   Copyright (C) 2015-2025 Free Software Foundation, Inc.
-   Contributed by ARM Ltd.
-
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3, or (at your option)
-   any later version.
-
-   GCC is distributed in the hope that it will be useful, but
-   WITHOUT ANY WARRANTY; without even the imp

Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-10 Thread Richard Biener
On Fri, Jan 10, 2025 at 3:27 PM Qing Zhao  wrote:
>
>
>
> > On Jan 10, 2025, at 03:00, Richard Biener  
> > wrote:
> >
> > On Thu, Jan 9, 2025 at 9:39 PM Qing Zhao  wrote:
> >>
> >>
> >>
> >>> On Jan 9, 2025, at 14:10, Jeff Law  wrote:
> >>>
> >>>
> >>>
> >>> On 1/9/25 10:48 AM, Qing Zhao wrote:
> >>>
> >>>>>
> >>>>> I think Jeff's patch is not reasonable since it boils down to not diagnose
> >>>>> -Warray-bounds but instead remove those stmts.
> >>>> If these stmts are dead-code that are generated by compiler optimization
> >>>> (NOT from source code),
> >>>> removing them before diagnosis is correct. (To avoid false positive
> >>>> warnings).
> >>> But I don't think we generally know if the problematic statements came
> >>> from user code or were generated by the compiler.
> >>
> >> To help the compiler catches real problems in the source code and avoid 
> >> false positive warnings introduced by the compiler transformation, we 
> >> might need to add flags in the IR to distinguish this?
> >
> > Well, the issue is the problematic statements _are_ in user code, just
> > -Warray-bounds is too stupid to
> > look at SCEV for indices and instead relies on weaker value-ranges.
>
> A little confused here: are you saying that the testing case of PR92539 has 
> __conditional__ UB in the source code level?
> If so, could you please clarify this a little bit more? (From my 
> understanding of the source code, I didn’t see
> UB in the source code, do I miss anything obvious?)

static bool eat(char const*& first, char const* last)
{
    if (first != last && ischar(*first)) {
        ++first;
        return true;
    }
    return false;
}

static bool eat_two(char const*& first, char const* last)
{
    auto save = first;
    if (eat(first, last) && eat(first, last))
        return true;

The ++first in the 2nd eat() is the conditional UB stmt.  It's
conditional on the first eat() returning true and on first != last.

The compiler now needs to prove that these conditions
are enough that UB never happens at runtime.  Uninit
analysis, for example, diagnoses only if it cannot prove that,
while -Warray-bounds always diagnoses regardless of the
guarding conditions, though it requires "obvious" UB.
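
To make the guarding conditions concrete, here is a small C rendering of
the two functions, using a pointer-to-pointer in place of the C++
reference.  The demo_* names and the demo_ischar predicate are
hypothetical, and restoring the saved pointer on failure is an
assumption, since the quoted eat_two is cut off before that point.

```c
#include <stdbool.h>

/* Hypothetical predicate; the ischar() from the testcase is not shown.  */
static bool
demo_ischar (char c)
{
  return c >= 'a' && c <= 'z';
}

/* Advance *first by one if it is in range and points at a match.  */
static bool
demo_eat (const char **first, const char *last)
{
  if (*first != last && demo_ischar (**first))
    {
      /* Only reached when *first != last, so the increment can at most
         produce last itself, never a pointer beyond it.  */
      ++*first;
      return true;
    }
  return false;
}

bool
demo_eat_two (const char **first, const char *last)
{
  const char *save = *first;
  /* The second call runs only if the first one succeeded, and it
     re-checks *first != last before touching the pointer.  */
  if (demo_eat (first, last) && demo_eat (first, last))
    return true;
  *first = save;  /* Assumed: roll back on failure.  */
  return false;
}
```

With a one-character input the second demo_eat sees *first == last and
returns false without incrementing, which is the fact the compiler has
to derive from the conditions before it can drop the warning.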

Richard.

>
> thanks.
>
> Qing
> >
> > It's a problem we're never going to fully solve.  Some of the
> > testcases show missed optimizations
> > which we can work on.  Some show we diagnose IL we later are able to
> > optimize away, some
> > simply show that users are not always happy with how we decide on
> > suppressing a diagnostic.
> >
> > For the case at hand we should be able to optimize it fully.
> >
> > But optimizing based on UB is always going to be to interact with
> > diagnosing UB, so we have
> > to be careful.  Our "late" diagnostics are most problematic here and
> > I'd argue moving those
> > earlier is the first thing we should try.
> >
> > Richard.
> >
> >>
> >> Qing
> >>>
> >>> Jeff
>
>


Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-10 Thread Akram Ahmad

On 09/01/2025 23:04, Richard Sandiford wrote:

+   gcc_assert (imm != 0);

The constraints do allow 0, so I'm not sure this assert is safe.
Certainly we shouldn't usually get unfolded instructions, but strange
things can happen with fuzzed options.

Does the code mishandle that case?  It looked like it should be ok.
I accidentally deleted my response when trimming down the quoted text. I
haven't tested this, but it came about from an offline discussion about
the patch with a teammate. It should be fine without the assert, but
I'll test it to make sure.

[PATCH] Fix some memory leaks

2025-01-10 Thread Richard Biener
The following fixes memory leaks found when compiling SPEC CPU 2017 under
valgrind.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* df-core.cc (rest_of_handle_df_finish): Release dflow for
problems without free function (like LR).
* gimple-crc-optimization.cc (crc_optimization::loop_may_calculate_crc):
Release loop_bbs on all exits.
* tree-vectorizer.h (supportable_indirect_convert_operation): Take
converts by reference.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): Use auto_vec for
converts.
(supportable_indirect_convert_operation): Get a reference to
the output vector of converts.
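
The loop_bbs part of the fix adds a free () before each early return.
An alternative shape for the same idea, sketched here in C with
hypothetical names (demo_scan is not a GCC function), is to funnel every
exit through a single cleanup label so that a newly added return path
cannot skip the release.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical scan that needs a heap-allocated scratch array and gives
   up once it has seen two negative values.  */
bool
demo_scan (const int *vals, size_t n)
{
  bool result = false;
  size_t negatives = 0;

  /* Allocate at least one element so n == 0 is not a special case.  */
  int *scratch = malloc ((n ? n : 1) * sizeof *scratch);
  if (!scratch)
    return false;

  for (size_t i = 0; i < n; i++)
    {
      scratch[i] = vals[i];
      /* Early exit: still goes through the cleanup label below.  */
      if (scratch[i] < 0 && ++negatives == 2)
        goto done;
    }
  result = true;

done:
  free (scratch);
  return result;
}
```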
---
 gcc/df-core.cc |  2 ++
 gcc/gimple-crc-optimization.cc |  6 +-
 gcc/tree-vect-generic.cc   |  2 +-
 gcc/tree-vect-stmts.cc | 12 ++--
 gcc/tree-vectorizer.h  |  2 +-
 5 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/gcc/df-core.cc b/gcc/df-core.cc
index a7011decf0b..abfc0e63d35 100644
--- a/gcc/df-core.cc
+++ b/gcc/df-core.cc
@@ -808,6 +808,8 @@ rest_of_handle_df_finish (void)
   struct dataflow *dflow = df->problems_in_order[i];
   if (dflow->problem->free_fun)
dflow->problem->free_fun ();
+  else
+   free (dflow);
 }
 
   free (df->postorder);
diff --git a/gcc/gimple-crc-optimization.cc b/gcc/gimple-crc-optimization.cc
index 0e1f2a99d72..a98cbe6752b 100644
--- a/gcc/gimple-crc-optimization.cc
+++ b/gcc/gimple-crc-optimization.cc
@@ -947,6 +947,7 @@ crc_optimization::loop_may_calculate_crc (class loop *loop)
fprintf (dump_file,
 "The number of conditional "
 "branches in the loop isn't 2.\n");
+  free (loop_bbs);
   return false;
 }
 
@@ -977,8 +978,11 @@ crc_optimization::loop_may_calculate_crc (class loop *loop)
  return true;
}
 
-   if (++checked_xor_count == 2)
+ if (++checked_xor_count == 2)
+   {
+ free (loop_bbs);
  return false;
+   }
}
}
 }
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index fa5e9a54dbf..c2f7a29d539 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1757,7 +1757,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
   auto_vec<std::pair<tree, tree_code> > converts;
   if (supportable_indirect_convert_operation (code,
  ret_type, arg_type,
- &converts,
+ converts,
  arg))
 {
   new_rhs = arg;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c0e38d00246..f5b3608f6b1 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5583,7 +5583,7 @@ vectorizable_conversion (vec_info *vinfo,
   scalar_mode lhs_mode = SCALAR_TYPE_MODE (lhs_type);
   scalar_mode rhs_mode = SCALAR_TYPE_MODE (rhs_type);
   opt_scalar_mode rhs_mode_iter;
-  vec<std::pair<tree, tree_code> > converts = vNULL;
+  auto_vec<std::pair<tree, tree_code> > converts;
 
   /* Supportable by target?  */
   switch (modifier)
@@ -5597,7 +5597,7 @@ vectorizable_conversion (vec_info *vinfo,
   if (supportable_indirect_convert_operation (code,
  vectype_out,
  vectype_in,
- &converts,
+ converts,
  op0))
{
  gcc_assert (converts.length () <= 2);
@@ -15170,7 +15170,7 @@ bool
 supportable_indirect_convert_operation (code_helper code,
tree vectype_out,
tree vectype_in,
-   vec<std::pair<tree, tree_code> > *converts,
+   vec<std::pair<tree, tree_code> > &converts,
tree op0)
 {
   bool found_mode = false;
@@ -15187,7 +15187,7 @@ supportable_indirect_convert_operation (code_helper 
code,
 vectype_in,
 &tc1))
 {
-  converts->safe_push (std::make_pair (vectype_out, tc1));
+  converts.safe_push (std::make_pair (vectype_out, tc1));
   return true;
 }
 
@@ -15278,9 +15278,9 @@ supportable_indirect_convert_operation (code_helper 
code,
 
   if (found_mode)
{
- converts->safe_push (std::make_pair (cvt_type, tc2));
+ converts.safe_push (std::make_pair (cvt_type, tc2));
  if (TYPE_MODE (cvt_type) != TYPE_MODE (vectype_out))
-   converts->safe_push (std::make_pair (vectype_out, tc1));
+   converts.safe_push (std::make_pair (vectype_out, tc1));
  return true;
}
 }
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index d3e

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Richard Sandiford
Wilco Dijkstra  writes:
> ILP32 was originally intended to make porting to AArch64 easier.  Support was
> never merged in the Linux kernel or GLIBC, so it has been unsupported for many
> years.  There isn't a benefit in keeping unsupported features forever, so
> deprecate it now (and it could be removed in a future release).
>
> Passes regress & bootstrap, OK for commit?
>
> gcc:
> * config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
> * doc/invoke.texi: Document -mabi=ilp32 as deprecated.
>
> gcc/testsuite:
> * gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.  
> * gcc.target/aarch64/pr100518.c: Likewise.
> * gcc.target/aarch64/pr113114.c: Likewise.
> * gcc.target/aarch64/pr80295.c: Likewise.
> * gcc.target/aarch64/pr94201.c: Likewise.
> * gcc.target/aarch64/pr94577.c: Likewise.
> * gcc.target/aarch64/sve/pr108603.c: Likewise.

I suggested this on IRC a while back, but unfortunately I forgot to take
a complete log, so I don't have a record of the important bits of the
conversation.  The outcome was that the Apple ecosystem does have an
ILP32 ABI, so that might become relevant when Iain's Darwin work is
merged.  I therefore think we should keep ILP32 for now.

IMO ILP32 is supported for aarch64-elf (and aarch64_be-elf).  But I
agree it can be considered unsupported for GNU/Linux.  Unfortunately,
as far as code maintenance goes, deprecating it for one subtarget is
probably worse than not deprecating it at all.  That means that we'll
occasionally have to deal with bug reports about ILP32 support in
GNU-only code.

For those reasons, I think we should keep ILP32 for now.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 78d2cc4bbe4933c79153d0741bfd8d7b076952d0..02891b0a8ed75eb596df9d0dbff77ccd6a625f11
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19315,6 +19315,8 @@ aarch64_override_options (void)
>if (TARGET_ILP32)
>  error ("assembler does not support %<-mabi=ilp32%>");
>  #endif
> +  if (TARGET_ILP32)
> +warning (OPT_Wdeprecated, "%<-mabi=ilp32%> is deprecated.");
>  
>/* Convert -msve-vector-bits to a VG count.  */
>aarch64_sve_vg = aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 17fe2c64c1f85ad8db8b61f040aafe5f8212e488..6722ad5281541e499d5b3916179d9a4d1b39097f
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21472,6 +21472,8 @@ The default depends on the specific target 
> configuration.  Note that
>  the LP64 and ILP32 ABIs are not link-compatible; you must compile your
>  entire program with the same ABI, and link with a compatible set of 
> libraries.
>  
> +@samp{ilp32} is deprecated.
> +
>  @opindex mbig-endian
>  @item -mbig-endian
>  Generate big-endian code.  This is the default when GCC is configured for an
> diff --git a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c 
> b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> index 
> fe8414559864db4a8584fd3f5a7145b5e3d1f322..276c10cd0e86ff2c74a5c09ce70f7d76614978ec
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> +++ b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-finline-stringops -mabi=ilp32 
> -ftrivial-auto-var-init=zero" } */
> +/* { dg-options "-finline-stringops -mabi=ilp32 -Wno-deprecated 
> -ftrivial-auto-var-init=zero" } */
>  
>  short m(unsigned k) {
>const unsigned short *n[65];
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr100518.c 
> b/gcc/testsuite/gcc.target/aarch64/pr100518.c
> index 
> 5ca599f5d2e0e1603456b2eaf2e98866871faad1..177991cfb2289530e4ee3e3633fddde5972e9e28
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr100518.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr100518.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-mabi=ilp32 -mstrict-align -O2" } */
> +/* { dg-options "-mabi=ilp32 -Wno-deprecated -mstrict-align -O2" } */
>  
>  int unsigned_range_min, unsigned_range_max, a11___trans_tmp_1;
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr113114.c 
> b/gcc/testsuite/gcc.target/aarch64/pr113114.c
> index 
> 5b0383c24359ad95c7d333a6f18b98e50383f71b..976e2db71bfafe96e3729e4d4bc333874d98c084
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr113114.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr113114.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-mabi=ilp32 -O -mearly-ldp-fusion -mlate-ldp-fusion" } */
> +/* { dg-options "-mabi=ilp32 -Wno-deprecated -O -mearly-ldp-fusion 
> -mlate-ldp-fusion" } */
>  void foo_n(double *a) {
>int i = 1;
>for (; i < (int)foo_n; i++)
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
> b/gcc/testsuite/gcc.target/aarch64/pr80295.c
> index 
> b3866d8d6a9e5688f0eedb2fd7504

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Kyrylo Tkachov
Hi Wilco,

> On 10 Jan 2025, at 15:05, Wilco Dijkstra  wrote:
> 
> 
> ILP32 was originally intended to make porting to AArch64 easier.  Support was
> never merged in the Linux kernel or GLIBC, so it has been unsupported for many
> years.  There isn't a benefit in keeping unsupported features forever, so
> deprecate it now (and it could be removed in a future release).
> 
> Passes regress & bootstrap, OK for commit?

I agree on that front for Linux, but I thought using it for bare-metal/embedded 
cases is still supported?
I haven’t tested in a while but the aarch64-none-elf newlib target used to work 
fine with -mabi=ilp32.
Would it make sense to deprecate it for Linux/glibc targets i.e. deprecate the 
aarch64*-linux-gnu_ilp32 platform instead?

Thanks,
Kyrill


> 
> gcc:
>* config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
>* doc/invoke.texi: Document -mabi=ilp32 as deprecated.
> 
> gcc/testsuite:
>* gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated. 
>* gcc.target/aarch64/pr100518.c: Likewise.
>* gcc.target/aarch64/pr113114.c: Likewise.
>* gcc.target/aarch64/pr80295.c: Likewise.
>* gcc.target/aarch64/pr94201.c: Likewise.
>* gcc.target/aarch64/pr94577.c: Likewise.
>* gcc.target/aarch64/sve/pr108603.c: Likewise.
> 
> ---
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 78d2cc4bbe4933c79153d0741bfd8d7b076952d0..02891b0a8ed75eb596df9d0dbff77ccd6a625f11
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19315,6 +19315,8 @@ aarch64_override_options (void)
>   if (TARGET_ILP32)
> error ("assembler does not support %<-mabi=ilp32%>");
> #endif
> +  if (TARGET_ILP32)
> +warning (OPT_Wdeprecated, "%<-mabi=ilp32%> is deprecated.");
> 
>   /* Convert -msve-vector-bits to a VG count.  */
>   aarch64_sve_vg = aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 17fe2c64c1f85ad8db8b61f040aafe5f8212e488..6722ad5281541e499d5b3916179d9a4d1b39097f
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21472,6 +21472,8 @@ The default depends on the specific target 
> configuration.  Note that
> the LP64 and ILP32 ABIs are not link-compatible; you must compile your
> entire program with the same ABI, and link with a compatible set of libraries.
> 
> +@samp{ilp32} is deprecated.
> +
> @opindex mbig-endian
> @item -mbig-endian
> Generate big-endian code.  This is the default when GCC is configured for an
> diff --git a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c 
> b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> index 
> fe8414559864db4a8584fd3f5a7145b5e3d1f322..276c10cd0e86ff2c74a5c09ce70f7d76614978ec
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> +++ b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-finline-stringops -mabi=ilp32 
> -ftrivial-auto-var-init=zero" } */
> +/* { dg-options "-finline-stringops -mabi=ilp32 -Wno-deprecated 
> -ftrivial-auto-var-init=zero" } */
> 
> short m(unsigned k) {
>   const unsigned short *n[65];
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr100518.c 
> b/gcc/testsuite/gcc.target/aarch64/pr100518.c
> index 
> 5ca599f5d2e0e1603456b2eaf2e98866871faad1..177991cfb2289530e4ee3e3633fddde5972e9e28
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr100518.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr100518.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-mabi=ilp32 -mstrict-align -O2" } */
> +/* { dg-options "-mabi=ilp32 -Wno-deprecated -mstrict-align -O2" } */
> 
> int unsigned_range_min, unsigned_range_max, a11___trans_tmp_1;
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr113114.c 
> b/gcc/testsuite/gcc.target/aarch64/pr113114.c
> index 
> 5b0383c24359ad95c7d333a6f18b98e50383f71b..976e2db71bfafe96e3729e4d4bc333874d98c084
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr113114.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr113114.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-mabi=ilp32 -O -mearly-ldp-fusion -mlate-ldp-fusion" } */
> +/* { dg-options "-mabi=ilp32 -Wno-deprecated -O -mearly-ldp-fusion 
> -mlate-ldp-fusion" } */
> void foo_n(double *a) {
>   int i = 1;
>   for (; i < (int)foo_n; i++)
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
> b/gcc/testsuite/gcc.target/aarch64/pr80295.c
> index 
> b3866d8d6a9e5688f0eedb2fd7504547c412afa2..c79427517d0e61417dd5c0013f8db04ed91da449
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr80295.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr80295.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-mabi=ilp32" } */
> +/* { dg-options "-mabi=ilp32 -Wno-deprecated" } */
> 
> void f (void *b) 
> { 
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr94201.c 
> b/gcc/tes

Re: [PATCH] docs: Document new hardreg PRE pass

2025-01-10 Thread Andrew Carlotti
On Tue, Jan 07, 2025 at 06:17:02PM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > I forgot to include this in the earlier patch; is this ok for master (once 
> > the
> > pass is merged, of course)?
> >
> > gcc/ChangeLog:
> >
> > * doc/passes.texi: Document hardreg PRE pass.
> >
> >
> > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> > index 
> > 639f6b325c8be47bffd64269340c4dd8ea0f321c..5c2a174a7495404de48002f54902cce846e62b53
> >  100644
> > --- a/gcc/doc/passes.texi
> > +++ b/gcc/doc/passes.texi
> > @@ -959,6 +959,11 @@ global constant and  copy propagation.
> >  The source file for this pass is @file{gcse.cc}, and the LCM routines
> >  are in @file{lcm.cc}.
> >  
> > +A third version of this pass is run on some targets to optimise 
> > assignments to
> > +specific hard registers.  This can be used in cases where a register has a
> > +single purpose, such as specifying a mode as an extra input for specific
> > +instructions (when these modes cannot be handled in the mode switching 
> > pass).
> 
> LGTM, but how about adding ", @pxref{Mode switching optimization}" after
> "mode switching pass"?  (Untested.)
> 
> OK with that change, or without if it doesn't work.

I've committed the below patch, after inspecting the info, dvi and html output.

> 
> Thanks,
> Richard
> 
> > +
> >  @item Loop optimization
> >  
> >  This pass performs several loop related optimizations.



docs: Document new hardreg PRE pass.

gcc/ChangeLog:

* doc/passes.texi: Document hardreg PRE pass.


diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 
59a143292c78813db21ac3a7c05bca8bf5640e2d..282fc1a6a12b4f514dc7a629e8104f5374b18551
 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -959,6 +959,12 @@ global constant and  copy propagation.
 The source file for this pass is @file{gcse.cc}, and the LCM routines
 are in @file{lcm.cc}.
 
+A third version of this pass is run on some targets to optimise assignments to
+specific hard registers.  This can be used in cases where a register has a
+single purpose, such as specifying a mode as an extra input for specific
+instructions (@pxref{mode switching optimization} for another way of handling
+instruction modes).
+
 @item Loop optimization
 
 This pass performs several loop related optimizations.
@@ -1018,6 +1024,7 @@ combination approaches as well.
 The pass runs twice, once before register allocation and once after
 register allocation.  The code is located in @file{late-combine.cc}.
 
+@anchor{mode switching optimization}
 @item Mode switching optimization
 
 This pass looks for instructions that require the processor to be in a


Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Andreas Schwab
On Jan 10 2025, Wilco Dijkstra wrote:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 17fe2c64c1f85ad8db8b61f040aafe5f8212e488..6722ad5281541e499d5b3916179d9a4d1b39097f
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21472,6 +21472,8 @@ The default depends on the specific target 
> configuration.  Note that
>  the LP64 and ILP32 ABIs are not link-compatible; you must compile your
>  entire program with the same ABI, and link with a compatible set of 
> libraries.
>  
> +@samp{ilp32} is deprecated.

Please avoid starting a sentence with a lower case letter.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [RFC/RFA] [PR tree-optimization/92539] Improve code and avoid Warray-bounds false positive

2025-01-10 Thread Qing Zhao


> On Jan 10, 2025, at 03:00, Richard Biener  wrote:
> 
> On Thu, Jan 9, 2025 at 9:39 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Jan 9, 2025, at 14:10, Jeff Law  wrote:
>>> 
>>> 
>>> 
>>> On 1/9/25 10:48 AM, Qing Zhao wrote:
>>> 
> 
> I think Jeff's patch is not reasonable since it boils down to not diagnosing
> -Warray-bounds but instead removing those stmts.
 If these stmts are dead-code that are generated by compiler optimization 
 (NOT from source code),
 removing them before diagnosis is correct. (To avoid false positive 
 warnings).
>>> But I don't think we generally know if the problematic statements came from 
>>> user code or were generated by the compiler.
>> 
>> To help the compiler catch real problems in the source code and avoid 
>> false positive warnings introduced by compiler transformations, we might 
>> need to add flags in the IR to distinguish this?
> 
> Well, the issue is the problematic statements _are_ in user code, just
> -Warray-bounds is too stupid to
> look at SCEV for indices and instead relies on weaker value-ranges.

A little confused here: are you saying that the test case of PR92539 has 
__conditional__ UB at the source code level? 
If so, could you please clarify this a little more? (From my understanding 
of the source code, I didn't see any UB there; am I missing anything obvious?)

thanks.

Qing
> 
> It's a problem we're never going to fully solve.  Some of the
> testcases show missed optimizations
> which we can work on.  Some show we diagnose IL we later are able to
> optimize away, some
> simply show that users are not always happy with how we decide on
> suppressing a diagnostic.
> 
> For the case at hand we should be able to optimize it fully.
> 
> But optimizing based on UB is always going to interact with
> diagnosing UB, so we have
> to be careful.  Our "late" diagnostics are most problematic here and
> I'd argue moving those
> earlier is the first thing we should try.
> 
> Richard.
> 
>> 
>> Qing
>>> 
>>> Jeff




[pushed][PR118017][LRA]: Fix test for i686

2025-01-10 Thread Vladimir Makarov

The commit message contains an explanation.
commit 94d8de53388793f4d5fc0d0aa00fef32ca4aa870
Author: Vladimir N. Makarov 
Date:   Fri Jan 10 10:36:24 2025 -0500

[PR118017][LRA]: Fix test for i686

My previous patch for PR118017 contains a test which fails on i686.  The patch fixes this.

gcc/testsuite/ChangeLog:

PR target/118017
* gcc.target/i386/pr118017.c: Check target int128.

diff --git a/gcc/testsuite/gcc.target/i386/pr118017.c b/gcc/testsuite/gcc.target/i386/pr118017.c
index c82d71e8d29..28797a0ad73 100644
--- a/gcc/testsuite/gcc.target/i386/pr118017.c
+++ b/gcc/testsuite/gcc.target/i386/pr118017.c
@@ -1,5 +1,5 @@
 /* PR target/118017 */
-/* { dg-do compile } */
+/* { dg-do compile { target int128 } } */
 /* { dg-options "-Og -frounding-math -mno-80387 -mno-mmx -Wno-psabi" } */
 
 typedef __attribute__((__vector_size__ (64))) _Float128 F;


Re: [PATCH] testsuite: arm: Add pattern for armv8-m.base to cmse-15.c test

2025-01-10 Thread Richard Earnshaw (lists)
On 07/01/2025 20:16, Torbjörn SVENSSON wrote:
> Ok for trunk and releases/gcc-14?
> 
> --
> 
> Since armv8-m.base uses thumb1 that does not support sibcall/tailcall,
> a pattern is needed that uses PUSH/BL/POP sequence instead of a single
> B instruction to reuse an already existing function in the compile unit.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/cmse/cmse-15.c: Added pattern for armv8-m.base.
> 
> Signed-off-by: Torbjörn SVENSSON 

OK.

R.



[PATCH] rtl-optimization/117467 - limit ext-dce memory use

2025-01-10 Thread Richard Biener
The following puts in a hard limit on ext-dce because it might end
up requiring memory on the order of the number of basic blocks
times the number of pseudo registers.  The limiting follows what
GCSE based passes do and thus I re-use --param max-gcse-memory here.

This doesn't in any way address the implementation issues of the pass,
but it reduces the memory-use when compiling the
module_first_rk_step_part1.F90 TU from 521.wrf_r from 25GB to 1GB.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I plan to push this later today unless I hear objection.

PR rtl-optimization/117467
PR rtl-optimization/117934
* ext-dce.cc (ext_dce_execute): Do nothing if a memory
allocation estimate exceeds what is allowed by
--param max-gcse-memory.
---
 gcc/ext-dce.cc | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 6cf64187349..e257e3bc873 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "df.h"
 #include "print-rtl.h"
 #include "dbgcnt.h"
+#include "diagnostic-core.h"
 
 /* These should probably move into a C++ class.  */
 static vec livein;
@@ -1110,6 +,21 @@ static bool ext_dce_rd_confluence_n (edge) { return 
true; }
 void
 ext_dce_execute (void)
 {
+  /* Limit the amount of memory we use for livein, with 4 bits per
+ reg per basic-block including overhead that maps to one byte
+ per reg per basic-block.  */
+  uint64_t memory_request
+= (uint64_t)n_basic_blocks_for_fn (cfun) * max_reg_num ();
+  if (memory_request / 1024 > (uint64_t)param_max_gcse_memory)
+{
+  warning (OPT_Wdisabled_optimization,
+  "ext-dce disabled: %d basic blocks and %d registers; "
+  "increase %<--param max-gcse-memory%> above %wu",
+  n_basic_blocks_for_fn (cfun), max_reg_num (),
+  memory_request / 1024);
+  return;
+}
+
   /* Some settings of SUBREG_PROMOTED_VAR_P are actively harmful
  to this pass.  Clear it for those cases.  */
   maybe_clear_subreg_promoted_p ();
-- 
2.43.0
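
For readers skimming the patch above, the size check it adds can be sketched in isolation. This is only a model of the arithmetic: the helper names below are hypothetical, and the limit is passed in as a plain number of KiB rather than read from --param max-gcse-memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimated livein footprint in KiB, assuming roughly
// one byte per register per basic block, as in the patch comment.
inline uint64_t ext_dce_estimate_kib(uint64_t n_basic_blocks, uint64_t n_regs) {
  return n_basic_blocks * n_regs / 1024;
}

// Hypothetical helper: the pass runs only if the estimate does not exceed
// the limit (also in KiB).
inline bool ext_dce_should_run(uint64_t n_basic_blocks, uint64_t n_regs,
                               uint64_t max_kib) {
  return ext_dce_estimate_kib(n_basic_blocks, n_regs) <= max_kib;
}

inline void check_ext_dce_limit() {
  // 1000 blocks x 1000 regs is about 976 KiB: well under a 128 MiB limit.
  assert(ext_dce_should_run(1000, 1000, 131072));
  // 200000 blocks x 150000 regs is about 28 GiB: the pass would be skipped.
  assert(!ext_dce_should_run(200000, 150000, 131072));
}
```

The 131072 KiB figure is just an illustrative limit for the sketch; the effective value is whatever the parameter is set to.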


Re: [PATCH] rtl: Remove invalid compare simplification [PR117186]

2025-01-10 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Jan 6, 2025 at 2:12 PM Richard Sandiford
>  wrote:
>>
>> g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at
>> https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html ,
>> added code to treat:
>>
>>   (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0)))
>>
>> as a nop.  This PR shows that that isn't always correct.
>> The compare in the set above is between two 0/1 booleans (at least
>> on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
>> produced the incoming (reg:CC cc) is unconstrained; it could be between
>> arbitrary integers, or even floats.  The fold is therefore replacing a
>> cc that is valid for both signed and unsigned comparisons with one that
>> is only known to be valid for signed comparisons.
>>
>>   (gt (compare (gt cc 0) (lt cc 0)) 0)
>>
>> does simplify to:
>>
>>   (gt cc 0)
>>
>> but:
>>
>>   (gtu (compare (gt cc 0) (lt cc 0)) 0)
>>
>> does not simplify to:
>>
>>   (gtu cc 0)
>>
>> The optimisation didn't come with a testcase, but it was added for
>> i386's cmpstrsi, now cmpstrnsi.  That probably doesn't matter as much
>> as it once did, since it's now conditional on -minline-all-stringops.
>> But the patch is almost 25 years old, so whatever the original
>> motivation was, it seems likely that other things now rely on it.
>>
>> It therefore seems better to try to preserve the optimisation on rtl
>> rather than get rid of it.  To do that, we need to look at how the
>> result of the outer compare is used.  We'd therefore be looking at four
>> instructions (the gt, the lt, the compare, and the use of the compare),
>> but combine already allows that for 3-instruction combinations thanks
>> to:
>>
>>   /* If the source is a COMPARE, look for the use of the comparison result
>>  and try to simplify it unless we already have used undobuf.other_insn.  
>> */
>>
>> When applied to boolean inputs, a comparison operator is
>> effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
>> simplify_logical_relational_operation already had code to simplify
>> logical operators between two comparison results, but:
>>
>> * It only handled IOR, which doesn't cover all the cases needed here.
>>   The others are easily added.
>>
>> * It treated comparisons of integers as having an ORDERED/UNORDERED result.
>>   Therefore:
>>
>>   * it would not treat "true for LT + EQ + GT" as "always true" for
>> comparisons between integers, because the mask excluded the UNORDERED
>> condition.
>>
>>   * it would try to convert "true for LT + GT" into LTGT even for comparisons
>> between integers.  To prevent an ICE later, the code used:
>>
>>/* Many comparison codes are only valid for certain mode classes.  */
>>if (!comparison_code_valid_for_mode (code, mode))
>>  return 0;
>>
>> However, this used the wrong mode, since "mode" is here the integer
>> result of the comparisons (and the mode of the IOR), not the mode of
>> the things being compared.  Thus the effect was to reject all
>> floating-point-only codes, even when comparing floats.
>>
>>   I think instead the code should detect whether the comparison is between
>>   integer values and remove UNORDERED from consideration if so.  It then
>>   always produces a valid comparison (or an always true/false result),
>>   and so comparison_code_valid_for_mode is not needed.  In particular,
>>   "true for LT + GT" becomes NE for comparisons between integers but
>>   remains LTGT for comparisons between floats.
>>
>> * There was a missing check for whether the comparison inputs had
>>   side effects.
>>
>> While there, it also seemed worth extending
>> simplify_logical_relational_operation to unsigned comparisons, since
>> that makes the testing easier.
>>
>> As far as that testing goes: the patch exhaustively tests all
>> combinations of integer comparisons in:
>>
>>   (cmp1 (cmp2 X Y) (cmp3 X Y))
>>
>> for the 10 integer comparisons, giving 1000 fold attempts in total.
>> It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
>> on the result of the fold, giving 9 checks per fold, or 9000 in total.
>> That's probably more than is typical for self-tests, but it seems to
>> complete in negligible time, even for -O0 builds.
>>
>> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?
>
> OK.

Thanks!
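
As an aside for anyone reading along, the signed/unsigned distinction in the quoted description can be modelled outside of RTL. This is a rough sketch, assuming the incoming cc came from comparing two 32-bit integers; the struct and function names are made up for illustration and are not GCC code.

```cpp
#include <cassert>
#include <cstdint>

// The two 0/1 booleans derived from the incoming flags:
// (gt cc 0) and (lt cc 0), taken here as signed conditions.
struct Flags { unsigned gt; unsigned lt; };

inline Flags signed_flags(int32_t a, int32_t b) {
  return Flags{a > b ? 1u : 0u, a < b ? 1u : 0u};
}

// (gt (compare G L) 0): signed comparison of the two booleans.
inline bool outer_gt(Flags f) {
  return static_cast<int>(f.gt) > static_cast<int>(f.lt);
}

// (gtu (compare G L) 0): unsigned comparison of the two booleans.
inline bool outer_gtu(Flags f) { return f.gt > f.lt; }

// What a user of the original cc would get for gtu if the set were a nop.
inline bool original_gtu(int32_t a, int32_t b) {
  return static_cast<uint32_t>(a) > static_cast<uint32_t>(b);
}

inline void check_signedness() {
  // Signed use: the outer compare agrees with the original a > b.
  assert(outer_gt(signed_flags(1, -1)) == (1 > -1));
  assert(outer_gt(signed_flags(-1, 1)) == (-1 > 1));
  // Unsigned use: for a = -1, b = 1 the two answers differ.
  assert(outer_gtu(signed_flags(-1, 1)) == false);
  assert(original_gtu(-1, 1) == true);
}
```

The last two assertions are the point: the compare of the booleans only carries the signed ordering, so an unsigned user of the result cannot simply be redirected to the original cc.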

>> The patch isn't exactly a spot fix, and the bug is ancient, so I suppose
>> the patch probably isn't suitable for backports.
>
> Maybe for GCC 14, but not without some soaking time of course.

OK, I'll leave the PR open for that.
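
To illustrate the "true for LT + GT becomes NE" reasoning from the patch description, here is a small stand-alone model. It is only a sketch under simplifying assumptions: it covers signed integer comparisons only, encodes each comparison as a mask of outcomes, and uses names (Cmp, fold_ior and so on) invented for the example rather than anything from simplify-rtx.cc.

```cpp
#include <cassert>

// Possible outcomes of comparing two integers, one bit each.
// There is no UNORDERED bit because integers are always ordered.
constexpr unsigned LT = 1u, EQ = 2u, GT = 4u;

// Signed comparison codes, each encoded as the set of outcomes
// for which it is true.
enum class Cmp : unsigned {
  Never = 0u, Lt = LT, Eq = EQ, Le = LT | EQ,
  Gt = GT, Ne = LT | GT, Ge = EQ | GT, Always = LT | EQ | GT
};

inline unsigned mask(Cmp c) { return static_cast<unsigned>(c); }

inline unsigned outcome(int x, int y) {
  return x < y ? LT : (x == y ? EQ : GT);
}

inline bool eval(Cmp c, int x, int y) { return (mask(c) & outcome(x, y)) != 0u; }

// Logical operations on two comparisons of the same operands.
inline Cmp fold_ior(Cmp a, Cmp b) { return static_cast<Cmp>(mask(a) | mask(b)); }
inline Cmp fold_and(Cmp a, Cmp b) { return static_cast<Cmp>(mask(a) & mask(b)); }
inline Cmp fold_xor(Cmp a, Cmp b) { return static_cast<Cmp>(mask(a) ^ mask(b)); }

// Exhaustive check over all code pairs and X, Y in {-1, 0, 1}.
inline void check_folds() {
  for (unsigned a = 0; a < 8; ++a)
    for (unsigned b = 0; b < 8; ++b)
      for (int x = -1; x <= 1; ++x)
        for (int y = -1; y <= 1; ++y) {
          Cmp ca = static_cast<Cmp>(a), cb = static_cast<Cmp>(b);
          bool ra = eval(ca, x, y), rb = eval(cb, x, y);
          assert(eval(fold_ior(ca, cb), x, y) == (ra || rb));
          assert(eval(fold_and(ca, cb), x, y) == (ra && rb));
          assert(eval(fold_xor(ca, cb), x, y) == (ra != rb));
        }
}
```

With this encoding every fold yields a valid integer code or an always-true/always-false result, which is the property the description relies on. Floating-point comparisons would need a fourth UNORDERED bit, which this sketch does not model.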

JTFR, I noticed while reading the patch back that I'd fluffed the
attempt to future-proof the aarch64 test.  The ls and lo conditions in:

>> +/*
>> +** f8:
>> +** (
>> +** cmp w0, w1
>> +** cset w0, hi
>> +** |
>> +** cmp w1, w0
>> +** cset w0, ls
>> +** )
>> +** ret
>> +*/
>> +int
>> +f8 (unsigned int x, unsigned int y)
>> +{
>> +  return (x < y) < (y < 

Re: [ping][PATCH] testsuite/118127: Pass fortran tests on ppc64le for IEEE128 long doubles

2025-01-10 Thread Siddhesh Poyarekar

On 2025-01-06 11:34, Jakub Jelinek wrote:

That looks incorrect to me.
ppc_ieee128_ok just means that one can use the __ieee128 type (and only if
-mfloat128 option is passed).
What the tests care about is whether real(16) is IEEE128 or IBM128.
That is dependent on what glibc gcc has been configured against, with what
configure options and whether -mabi=ieeelongdouble or -mabi=ibmlongdouble
options were used.

The long_double_ibm128 and long_double_ieee128 effective targets would
be slightly closer to what you want, but they actually test whether
that is the case when one uses dg-add-options long_double_ibm128 or
dg-add-options long_double_ieee128.

For the xfail, guess you want a test which will check what is the default...

So, e.g. (for powerpc* only) try to compile
integer function foo ()
   integer, parameter :: kl = selected_real_kind (precision (0.0_8) + 1)
   foo = precision (0.0_kl)
end
and see if it returns 33 (that is IEEE quad) or 31 (IBM double double).
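
The 33 versus 31 figures follow from the Fortran PRECISION formula for a binary model, INT((p - 1) * LOG10(2)), where p is the number of significand bits. A minimal sketch of that arithmetic, assuming 113 bits for IEEE quad and 106 bits for IBM double double (the function name is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Decimal precision of a binary floating-point model with the given number
// of significand bits, following INT((p - 1) * LOG10(2)).
inline int decimal_precision(int significand_bits) {
  return static_cast<int>(
      std::floor((significand_bits - 1) * std::log10(2.0)));
}

inline void check_precision() {
  assert(decimal_precision(113) == 33);  // IEEE binary128
  assert(decimal_precision(106) == 31);  // IBM double double (assumed 106 bits)
  assert(decimal_precision(53) == 15);   // IEEE binary64, i.e. 0.0_8
}
```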


Thanks, I've sent a v2.

Sid


[PATCH] More memory leak fixes

2025-01-10 Thread Richard Biener
The following were found compiling SPEC CPU 2017 with valgrind.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

* tree-vect-slp.cc (vect_analyze_slp): Release saved_stmts
vector.
(vect_build_slp_tree_2): Release new_oprnds_info when not
used.
---
 gcc/tree-vect-slp.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 337506419d9..8188f6b07c5 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2678,6 +2678,8 @@ out:
  nops = 1;
  has_two_operators_perm = true;
}
+  else
+   vect_free_oprnd_info (new_oprnds_info);
 }
 
   auto_vec children;
@@ -4951,8 +4953,8 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size,
   max_tree_size, &limit,
   bst_map, NULL, force_single_lane);
}
- saved_stmts.release ();
}
+ saved_stmts.release ();
}
 
   /* Make sure to vectorize only-live stmts, usually inductions.  */
-- 
2.43.0


Re: [PATCH] Do not call cp_parser_omp_dispatch directly in cp_parser_pragma

2025-01-10 Thread Paul-Antoine Arras

Hi Tobias,

On 07/01/2025 12:13, Tobias Burnus wrote:

Paul-Antoine Arras wrote:

This is a followup to
ed49709acda OpenMP: C++ front-end support for dispatch + adjust_args.

The call to cp_parser_omp_dispatch only belongs in 
cp_parser_omp_construct. In
cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling 
cp_parser_omp_construct.


I think this change is good - but not sufficient. For instance,
the following gives an ICE:

void k();
struct t {
  #pragma omp dispatch
   k();
};

I think that's context == pragma_member.



This amended patch checks the context and adds the suggested testcase.

Thanks,
--
PA

commit 7f91528dc54d260a489c749a8c5ccc004a96bfac
Author: Paul-Antoine Arras 
Date:   Mon Jan 6 16:06:43 2025 +0100

Do not call cp_parser_omp_dispatch directly in cp_parser_pragma

This is a followup to
ed49709acda OpenMP: C++ front-end support for dispatch + adjust_args.

The call to cp_parser_omp_dispatch only belongs in cp_parser_omp_construct. In
cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling cp_parser_omp_construct.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_pragma): Replace call to cp_parser_omp_dispatch
with cp_parser_omp_construct.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/dispatch-8.C: New test.

diff --git gcc/cp/parser.cc gcc/cp/parser.cc
index f548dc31c2b..e28c23141c0 100644
--- gcc/cp/parser.cc
+++ gcc/cp/parser.cc
@@ -53060,7 +53060,9 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
   break;
 
 case PRAGMA_OMP_DISPATCH:
-  cp_parser_omp_dispatch (parser, pragma_tok);
+  if (context != pragma_stmt && context != pragma_compound)
+	goto bad_stmt;
+  cp_parser_omp_construct (parser, pragma_tok, if_p);
   return true;
 
 case PRAGMA_IVDEP:
diff --git gcc/testsuite/g++.dg/gomp/dispatch-8.C gcc/testsuite/g++.dg/gomp/dispatch-8.C
new file mode 100644
index 000..b8e8e73db1f
--- /dev/null
+++ gcc/testsuite/g++.dg/gomp/dispatch-8.C
@@ -0,0 +1,10 @@
+// { dg-do compile }
+
+// Check that an appropriate diagnostic is emitted when a dispatch directive
+// appears in a pragma_member context.
+
+void k();
+struct t {
+ #pragma omp dispatch  // { dg-error "expected declaration specifiers before end of line" }
+  k();  // { dg-error ".*" }
+};


Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-10 Thread Wilco Dijkstra
Hi Kyrill,

>> Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and 
>> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
>> to the baseline tuning since all modern cores use it.  Fix the 
>> neoverse512tvb tuning to be
>> like Neoverse V1/V2.
>
> For neoverse512tvb this means adding AARCH64_EXTRA_TUNE_AVOID_PRED_RMW right?
> That’s fine by me.

Yes that was the intention.

> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS doesn’t exist anymore (i.e. it’s 
> implicitly on) so the patch needs to be updated.

I've rebased it to latest trunk - see v2 below.

Cheers,
Wilco


v2: Rebase to trunk, update neoverse512tvb.

Add AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT to the baseline tuning since 
all modern
cores use it.  Fix the neoverse512tvb extra tune to be like Neoverse V1/V2 by 
adding
AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.

gcc:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): 
Update.
* config/aarch64/tuning_models/cortexx925.h: Update.
* config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
* config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
* config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
* config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
* config/aarch64/tuning_models/neoversen2.h: Likewise.
* config/aarch64/tuning_models/neoversen3.h: Likewise.
* config/aarch64/tuning_models/neoversev1.h: Likewise.
* config/aarch64/tuning_models/neoversev2.h: Likewise.
* config/aarch64/tuning_models/neoversev3.h: Likewise.
* config/aarch64/tuning_models/neoversev3ae.h: Likewise.
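
The effect of moving a flag into the baseline can be sketched with plain bit masks. The bit positions and constant names below are invented for the illustration and do not match the real AARCH64_EXTRA_TUNE_* values; the point is only that a core which already set the flag ends up with the same mask once the flag is part of the base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only.
constexpr uint32_t CHEAP_SHIFT_EXTEND        = 1u << 0;
constexpr uint32_t FULLY_PIPELINED_FMA       = 1u << 1;
constexpr uint32_t MATCHED_VECTOR_THROUGHPUT = 1u << 2;
constexpr uint32_t AVOID_PRED_RMW            = 1u << 3;

// Baseline before and after the change.
constexpr uint32_t BASE_OLD = CHEAP_SHIFT_EXTEND | FULLY_PIPELINED_FMA;
constexpr uint32_t BASE_NEW = BASE_OLD | MATCHED_VECTOR_THROUGHPUT;

inline void check_tune_flags() {
  // A core that listed the flag explicitly keeps the same effective mask.
  assert((BASE_OLD | MATCHED_VECTOR_THROUGHPUT) == BASE_NEW);
  // A core that only adds AVOID_PRED_RMW now also gets the throughput flag.
  assert(((BASE_NEW | AVOID_PRED_RMW) & MATCHED_VECTOR_THROUGHPUT) != 0u);
}
```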

---

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 
60967aac9037abe204ae1d0aabad31c1a3b4311b..1feff3beb348f45c254c5a7c346a1a9674dee362
 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -50,6 +50,7 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
 
 /* Baseline tuning settings suitable for all modern cores.  */
 #define AARCH64_EXTRA_TUNE_BASE (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND \
-| AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
+| AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA \
+| AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT)
 
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h 
b/gcc/config/aarch64/tuning_models/cortexx925.h
index 
7d0162eae54c1823eff7b954d5e1d7564eb31dab..59e8c5f002fbb2d8e372b71575c796ba005e5413
 100644
--- a/gcc/config/aarch64/tuning_models/cortexx925.h
+++ b/gcc/config/aarch64/tuning_models/cortexx925.h
@@ -221,7 +221,6 @@ static const struct tune_params cortexx925_tunings =
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_BASE
| AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
| AARCH64_EXTRA_TUNE_AVOID_PRED_RMW),   /* tune_flags.  */
   &generic_armv9a_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/fujitsu_monaka.h 
b/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
index 
5dc40243fe3846feffb8c54dd98d1797b45b672c..6790cb42be8e99ce37a2f20e440d66b5cbbb316b
 100644
--- a/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
+++ b/gcc/config/aarch64/tuning_models/fujitsu_monaka.h
@@ -54,8 +54,7 @@ static const struct tune_params fujitsu_monaka_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_BASE
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_BASE),   /* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS   /* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
index 
35de3f032963980f48ad05b3bea69c26fc8ac654..d3f5b5d26443ef428c3a5eec189782fbe0a56150
 100644
--- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
@@ -182,8 +182,7 @@ static const struct tune_params generic_armv8_a_tunings =
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_BASE
-   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS), /* tune_flags.  */
   &generic_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALWAYS/* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h 
b/gcc/config/aarch64/tuning_models/generic_armv9_

[pushed] c++: modules and DECL_REPLACEABLE_P

2025-01-10 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We need to remember that the ::operator new is replaceable to avoid a bogus
error about __builtin_operator_new finding a non-replaceable function.

This affected __get_temporary_buffer in stl_tempbuf.h.

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Write replaceable_operator.
(trees_in::core_bools): Read it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/operator-2_a.C: New test.
* g++.dg/modules/operator-2_b.C: New test.
---
 gcc/cp/module.cc|  2 ++
 gcc/testsuite/g++.dg/modules/operator-2_a.C | 14 ++
 gcc/testsuite/g++.dg/modules/operator-2_b.C |  8 
 3 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/operator-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/operator-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 7288c46a7ba..4fbe522264b 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5640,6 +5640,7 @@ trees_out::core_bools (tree t, bits_out& bits)
 
   WB (t->function_decl.has_debug_args_flag);
   WB (t->function_decl.versioned_function);
+  WB (t->function_decl.replaceable_operator);
 
   /* decl_type is a (misnamed) 2 bit discriminator. */
   unsigned kind = t->function_decl.decl_type;
@@ -5796,6 +5797,7 @@ trees_in::core_bools (tree t, bits_in& bits)
 
   RB (t->function_decl.has_debug_args_flag);
   RB (t->function_decl.versioned_function);
+  RB (t->function_decl.replaceable_operator);
 
   /* decl_type is a (misnamed) 2 bit discriminator. */
   unsigned kind = 0;
diff --git a/gcc/testsuite/g++.dg/modules/operator-2_a.C 
b/gcc/testsuite/g++.dg/modules/operator-2_a.C
new file mode 100644
index 000..0b1f6e80422
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/operator-2_a.C
@@ -0,0 +1,14 @@
+// { dg-additional-options -fmodules }
+// { dg-module-cmi M }
+
+module;
+
+#include 
+
+export module M;
+
+export template 
+inline T* alloc (__SIZE_TYPE__ n)
+{
+  return (T*) __builtin_operator_new (n * sizeof (T), std::nothrow_t{});
+};
diff --git a/gcc/testsuite/g++.dg/modules/operator-2_b.C 
b/gcc/testsuite/g++.dg/modules/operator-2_b.C
new file mode 100644
index 000..fb21ccb6d30
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/operator-2_b.C
@@ -0,0 +1,8 @@
+// { dg-additional-options -fmodules }
+
+import M;
+
+int main()
+{
+  int *p = alloc(42);
+}

base-commit: 9193641d1695293006ed0b818bb4161a1b6fbed2
-- 
2.47.1



[PATCH] RISC-V: Fix riscv_modes_tieable_p

2025-01-10 Thread Zhijin Zeng
Integer values and floating-point values need to be converted
by fmv series instructions. So if mode1 is MODE_INT and mode2
is MODE_FLOAT, we should return false in riscv_modes_tieable_p,
and vice versa.

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_modes_tieable_p):

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/fwprop1-modes-tieable.c: New test.
---
 gcc/config/riscv/riscv.cc                     |  5 ++
 .../gcc.target/riscv/fwprop1-modes-tieable.c  | 80 +++
 2 files changed, 85 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/fwprop1-modes-tieable.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 65e09842fde..58b3b8c726c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9753,6 +9753,11 @@ riscv_modes_tieable_p (machine_mode mode1, machine_mode 
mode2)
      E.g. V2SI and DI are not tieable.  */
   if (riscv_v_ext_mode_p (mode1) != riscv_v_ext_mode_p (mode2))
     return false;
+  if ((GET_MODE_CLASS (mode1) == MODE_FLOAT
+       && GET_MODE_CLASS (mode2) == MODE_INT)
+       || (GET_MODE_CLASS (mode2) == MODE_FLOAT
+       && GET_MODE_CLASS (mode1) == MODE_INT))
+    return false;
   return (mode1 == mode2
          || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
               && GET_MODE_CLASS (mode2) == MODE_FLOAT));
diff --git a/gcc/testsuite/gcc.target/riscv/fwprop1-modes-tieable.c 
b/gcc/testsuite/gcc.target/riscv/fwprop1-modes-tieable.c
new file mode 100644
index 000..05d775c31d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fwprop1-modes-tieable.c
@@ -0,0 +1,80 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -fdump-rtl-fwprop1" } */
+/* { dg-skip-if "" { *-*-* } {"-Os" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
+/* { dg-final { scan-rtl-dump-not 
"\\(and:DI^(\s|\n)?$\\(subreg:DI^(\s|\n)?$\\(reg/v:DF" "fwprop1" } } */
+
+#include 
+
+#define EXP_TABLE_BITS 7
+#define EXP_POLY_ORDER 5
+#define EXP2_POLY_ORDER 5
+struct exp_data
+{
+  double invln2N;
+  double shift;
+  double negln2hiN;
+  double negln2loN;
+  double poly[4]; /* Last four coefficients.  */
+  double exp2_shift;
+  double exp2_poly[EXP2_POLY_ORDER];
+  uint64_t tab[2*(1 << EXP_TABLE_BITS)];
+};
+
+extern struct exp_data __exp_data;
+
+#define N (1 << EXP_TABLE_BITS)
+#define InvLn2N __exp_data.invln2N
+#define NegLn2hiN __exp_data.negln2hiN
+#define NegLn2loN __exp_data.negln2loN
+#define Shift __exp_data.shift
+#define T __exp_data.tab
+#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
+#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
+#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
+#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
+
+static inline uint64_t
+asuint64 (double f)
+{
+  union
+  {
+    double f;
+    uint64_t i;
+  } u = {f};
+  return u.i;
+}
+
+static inline double
+asdouble (uint64_t i)
+{
+  union
+  {
+    uint64_t i;
+    double f;
+  } u = {i};
+  return u.f;
+}
+
+double
+__testexp (double x)
+{
+  uint64_t ki, idx, sbits;
+  double kd, z, r, scale, tmp;
+
+  z = InvLn2N * x;
+
+  kd = z + Shift;
+  ki = asuint64 (kd);
+  kd -= Shift;
+
+  r = kd * NegLn2hiN + kd * NegLn2loN;
+
+  idx = (ki % N);
+
+  sbits = T[idx];
+
+  tmp = (r * C3);
+
+  scale = asdouble (sbits);
+  return scale * tmp;
+}
--
2.25.1
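
The decision logic after the patch above can be modelled as follows. This is a simplified sketch, not the riscv.cc code: machine modes are reduced to a hypothetical struct with an id, a class and a vector bit, and only the integer and float classes are represented.

```cpp
#include <cassert>

enum class ModeClass { Int, Float };

// Hypothetical stand-in for machine_mode.
struct Mode { int id; ModeClass cls; bool is_vector; };

constexpr Mode kSI{0, ModeClass::Int, false};
constexpr Mode kDI{1, ModeClass::Int, false};
constexpr Mode kSF{2, ModeClass::Float, false};
constexpr Mode kDF{3, ModeClass::Float, false};
constexpr Mode kV2SI{4, ModeClass::Int, true};

inline bool modes_tieable(const Mode &m1, const Mode &m2) {
  // Vector and non-vector modes are never tieable.
  if (m1.is_vector != m2.is_vector)
    return false;
  // New in the patch: an int/float pair needs an fmv, so do not tie.
  if ((m1.cls == ModeClass::Float && m2.cls == ModeClass::Int)
      || (m2.cls == ModeClass::Float && m1.cls == ModeClass::Int))
    return false;
  // Existing rule: identical modes, or anything but two distinct float modes.
  return m1.id == m2.id
         || !(m1.cls == ModeClass::Float && m2.cls == ModeClass::Float);
}

inline void check_tieable() {
  assert(modes_tieable(kSI, kDI));    // two integer modes
  assert(!modes_tieable(kDI, kDF));   // rejected by the new check
  assert(!modes_tieable(kSF, kDF));   // two different float modes
  assert(modes_tieable(kDF, kDF));    // same mode
  assert(!modes_tieable(kV2SI, kDI)); // vector vs scalar
}
```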




Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-10 Thread Richard Sandiford
Akram Ahmad  writes:
> Ah whoops- I didn't see this before sending off V4 just now, my apologies.
> I'll try my best to get this implemented before the end of the day so that
> it doesn't miss the deadline.

No rush!  The delay here is entirely my fault, so no problem if the
patch lands early stage 4.

> On 09/01/2025 23:04, Richard Sandiford wrote:
>> Akram Ahmad  writes:
>>> In the above example, subtraction replaces the adds with subs and the
>>> csinv with csel. The 32-bit case follows the same approach. Arithmetic
>>> with a constant operand is simplified further by directly storing the
>>> saturating limit in the temporary register, resulting in only three
>>> instructions being used. It is important to note that this only works
>>> when early-ra is disabled due to an early-ra bug which erroneously
>>> assigns FP registers to the operands; if early-ra is enabled, then the
>>> original behaviour (NEON instruction) occurs.
>> This can be fixed by changing:
>>
>>  case CT_REGISTER:
>>if (REG_P (op) || SUBREG_P (op))
>>  return true;
>>break;
>>
>> to:
>>
>>  case CT_REGISTER:
>>if (REG_P (op) || SUBREG_P (op) || GET_CODE (op) == SCRATCH)
>>  return true;
>>break;
>>
>> But I can test & post that as a follow-up if you prefer.
> Yes please, if that's not too much trouble- would that have to go into
> another patch?

Yeah.  But early-ra pessimisations are regressions, since early-ra was
new to GCC 14.  So that can go in during stage 4 as well.

>>> +
>>>   ;; Double vector modes.
>>>   (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF V4BF])
>>>   
>>> diff --git 
>>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>>  
>>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> new file mode 100644
>>> index 000..2b72be7b0d7
>>> --- /dev/null
>>> +++ 
>>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> @@ -0,0 +1,79 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
>>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>>> +
>>> +/*
>>> +** uadd_lane: { xfail *-*-* }
>>> +** dup\tv([0-9]+).8b, w0
>>> +** uqadd\tb([0-9]+), (?:b\1, b0|b0, b\1)
>>> +** umov\tw0, v\2.b\[0\]
>>> +** ret
>>> +*/
>> What's the reason behind the xfail?  Is it the early-ra thing, or
>> something else?  (You might already have covered this, sorry.)
>>
>> xfailing is fine if it needs further optimisation, was just curious :)
> This is because of a missing pattern in match.pd (I've sent another 
> patch upstream
> to add the missing pattern, although it may have gotten lost). Once that 
> pattern is
> added though, this should be recognised as .SAT_SUB, and the new 
> instructions will
> appear.

Ah, great!

Thanks,
Richard
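
For context, the scalar operations this thread is about have the following semantics. This is a minimal sketch of unsigned 32-bit saturating add and subtract in portable C++, with hypothetical function names; it says nothing about which instructions the compiler picks for them.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating add: clamp to the maximum on wrap-around.
inline uint32_t sat_add_u32(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;  // wraps modulo 2^32
  return sum < a ? UINT32_MAX : sum;
}

// Unsigned saturating subtract: clamp to zero instead of wrapping.
inline uint32_t sat_sub_u32(uint32_t a, uint32_t b) {
  return a > b ? a - b : 0u;
}

inline void check_saturation() {
  assert(sat_add_u32(1u, 2u) == 3u);
  assert(sat_add_u32(UINT32_MAX, 1u) == UINT32_MAX);
  assert(sat_sub_u32(5u, 3u) == 2u);
  assert(sat_sub_u32(1u, 2u) == 0u);
}
```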


Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Andrew Pinski
On Fri, Jan 10, 2025 at 6:06 AM Wilco Dijkstra  wrote:
>
>
> ILP32 was originally intended to make porting to AArch64 easier.  Support was
> never merged in the Linux kernel or GLIBC, so it has been unsupported for many
> years.  There isn't a benefit in keeping unsupported features forever, so
> deprecate it now (and it could be removed in a future release).
>
> Passes regress & bootstrap, OK for commit?

Personally I would like this deprecated even for bare-metal. Yes the
iwatch ABI is an ILP32 ABI but I don't see GCC implementing that any
time soon and I suspect it would not be hard to resurrect the code at
that point.
Outside of testing, I have only seen this used inside Samsung and
Huawei, and for the most part neither of them has done any maintenance
to support ILP32. Well, Huawei did file a bug a few months ago about
how the build has been broken for libatomic since the addition of
LSE128 support:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118142; it took them 9
months to notice too.

Thanks,
Andrew

>
> gcc:
> * config/aarch64/aarch64.cc (aarch64_override_options): Add warning.
> * doc/invoke.texi: Document -mabi=ilp32 as deprecated.
>
> gcc/testsuite:
> * gcc.target/aarch64/inline-mem-set-pr112804.c: Add -Wno-deprecated.
> * gcc.target/aarch64/pr100518.c: Likewise.
> * gcc.target/aarch64/pr113114.c: Likewise.
> * gcc.target/aarch64/pr80295.c: Likewise.
> * gcc.target/aarch64/pr94201.c: Likewise.
> * gcc.target/aarch64/pr94577.c: Likewise.
> * gcc.target/aarch64/sve/pr108603.c: Likewise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 78d2cc4bbe4933c79153d0741bfd8d7b076952d0..02891b0a8ed75eb596df9d0dbff77ccd6a625f11
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19315,6 +19315,8 @@ aarch64_override_options (void)
>if (TARGET_ILP32)
>  error ("assembler does not support %<-mabi=ilp32%>");
>  #endif
> +  if (TARGET_ILP32)
> +warning (OPT_Wdeprecated, "%<-mabi=ilp32%> is deprecated.");
>
>/* Convert -msve-vector-bits to a VG count.  */
>aarch64_sve_vg = aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 17fe2c64c1f85ad8db8b61f040aafe5f8212e488..6722ad5281541e499d5b3916179d9a4d1b39097f
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21472,6 +21472,8 @@ The default depends on the specific target 
> configuration.  Note that
>  the LP64 and ILP32 ABIs are not link-compatible; you must compile your
>  entire program with the same ABI, and link with a compatible set of 
> libraries.
>
> +@samp{ilp32} is deprecated.
> +
>  @opindex mbig-endian
>  @item -mbig-endian
>  Generate big-endian code.  This is the default when GCC is configured for an
> diff --git a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c 
> b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> index 
> fe8414559864db4a8584fd3f5a7145b5e3d1f322..276c10cd0e86ff2c74a5c09ce70f7d76614978ec
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> +++ b/gcc/testsuite/gcc.target/aarch64/inline-mem-set-pr112804.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-finline-stringops -mabi=ilp32 
> -ftrivial-auto-var-init=zero" } */
> +/* { dg-options "-finline-stringops -mabi=ilp32 -Wno-deprecated 
> -ftrivial-auto-var-init=zero" } */
>
>  short m(unsigned k) {
>const unsigned short *n[65];
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr100518.c 
> b/gcc/testsuite/gcc.target/aarch64/pr100518.c
> index 
> 5ca599f5d2e0e1603456b2eaf2e98866871faad1..177991cfb2289530e4ee3e3633fddde5972e9e28
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr100518.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr100518.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-mabi=ilp32 -mstrict-align -O2" } */
> +/* { dg-options "-mabi=ilp32 -Wno-deprecated -mstrict-align -O2" } */
>
>  int unsigned_range_min, unsigned_range_max, a11___trans_tmp_1;
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr113114.c 
> b/gcc/testsuite/gcc.target/aarch64/pr113114.c
> index 
> 5b0383c24359ad95c7d333a6f18b98e50383f71b..976e2db71bfafe96e3729e4d4bc333874d98c084
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr113114.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr113114.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-mabi=ilp32 -O -mearly-ldp-fusion -mlate-ldp-fusion" } */
> +/* { dg-options "-mabi=ilp32 -Wno-deprecated -O -mearly-ldp-fusion 
> -mlate-ldp-fusion" } */
>  void foo_n(double *a) {
>int i = 1;
>for (; i < (int)foo_n; i++)
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
> b/gcc/testsuite/gcc.target/aarch64/pr80295.c
> index 
> b3866d8d6a9e5688f0eedb2fd7504547c412afa2..c79427517d0e61417dd5c0013f8db04ed91da449
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr80295.c
> +++ b/gc

[PATCH 07/11] aarch64: Move arch/cpu parsing to aarch64-common.cc

2025-01-10 Thread Andrew Carlotti
Aside from moving the functions, the only changes are to make them
non-static, and to use the existing info arrays within aarch64-common.cc
instead of the info arrays remaining in aarch64.cc.
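
As a rough illustration of the pattern the moved hint functions share,
the sketch below walks a sentinel-terminated info table and collects the
names as candidates.  The names info_entry, demo_table and
collect_candidates are hypothetical stand-ins, and std::vector replaces
GCC's auto_vec so that the example compiles outside the GCC tree.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the arch/core/extension info entries; the
// real tables in aarch64-common.cc carry more fields than a name.
struct info_entry
{
  const char *name;
};

// Illustrative table.  Like the real arrays, it ends with a sentinel
// entry whose name is null.
static const info_entry demo_table[] = {
  { "armv8-a" },
  { "armv8.1-a" },
  { "armv9-a" },
  { nullptr },
};

// Walk TABLE up to (but not including) the sentinel and collect names.
std::vector<std::string>
collect_candidates (const info_entry *table)
{
  assert (table != nullptr);
  std::vector<std::string> candidates;
  for (const info_entry *entry = table; entry->name != nullptr; entry++)
    candidates.push_back (entry->name);
  return candidates;
}
```

Calling collect_candidates (demo_table) yields the three names in table
order; in the patch the collected names are then handed to
candidates_list_and_hint to build the "did you mean" message.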

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_all_extension_candidates): Move within file.
(aarch64_print_hint_for_extensions): Move from aarch64.cc.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_core): Ditto.
(enum aarch_parse_opt_result): Ditto.
(aarch64_parse_arch): Ditto.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_mtune): Ditto.
* config/aarch64/aarch64-protos.h
(aarch64_rewrite_selected_cpu): Move within file.
(aarch64_print_hint_for_extensions): Share function prototype.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_core): Ditto.
(enum aarch_parse_opt_result): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_mtune): Ditto.
(aarch64_get_all_extension_candidates): Unshare prototype.
* config/aarch64/aarch64.cc
(aarch64_parse_arch): Move to aarch64-common.cc.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_print_hint_for_core): Ditto.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_extensions): Ditto.
(aarch64_validate_mcpu): Ditto.
(aarch64_validate_march): Ditto.
(aarch64_validate_mtune): Ditto.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
4f4e363539b7b9311bfcb7a8b30b706000e50352..5cc00cd3b72807ec439c9c72af297d1ff5b2b679
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -205,6 +205,85 @@ static constexpr processor_info all_cores[] =
 };
 
 
+/* Append all architecture extension candidates to the CANDIDATES vector.  */
+
+void
+aarch64_get_all_extension_candidates (auto_vec *candidates)
+{
+  const struct extension_info *opt;
+  for (opt = all_extensions; opt->name != NULL; opt++)
+candidates->safe_push (opt->name);
+}
+
+/* Print a hint with a suggestion for an extension name
+   that most closely resembles what the user passed in STR.  */
+
+void
+aarch64_print_hint_for_extensions (const char *str)
+{
+  auto_vec candidates;
+  aarch64_get_all_extension_candidates (&candidates);
+  char *s;
+  const char *hint = candidates_list_and_hint (str, s, candidates);
+  if (hint)
+inform (input_location, "valid arguments are: %s;"
+" did you mean %qs?", s, hint);
+  else
+inform (input_location, "valid arguments are: %s", s);
+
+  XDELETEVEC (s);
+}
+
+/* Print a hint with a suggestion for an architecture name that most closely
+   resembles what the user passed in STR.  */
+
+void
+aarch64_print_hint_for_arch (const char *str)
+{
+  auto_vec candidates;
+  const struct arch_info *entry = all_architectures;
+  for (; entry->name != NULL; entry++)
+candidates.safe_push (entry->name);
+
+#ifdef HAVE_LOCAL_CPU_DETECT
+  /* Add also "native" as possible value.  */
+  candidates.safe_push ("native");
+#endif
+
+  char *s;
+  const char *hint = candidates_list_and_hint (str, s, candidates);
+  if (hint)
+inform (input_location, "valid arguments are: %s;"
+" did you mean %qs?", s, hint);
+  else
+inform (input_location, "valid arguments are: %s", s);
+
+  XDELETEVEC (s);
+}
+
+/* Print a hint with a suggestion for a core name that most closely resembles
+   what the user passed in STR.  */
+
+void
+aarch64_print_hint_for_core (const char *str)
+{
+  auto_vec candidates;
+  const struct processor_info *entry = all_cores;
+  for (; entry->name != NULL; entry++)
+candidates.safe_push (entry->name);
+
+  char *s;
+  const char *hint = candidates_list_and_hint (str, s, candidates);
+  if (hint)
+inform (input_location, "valid arguments are: %s;"
+" did you mean %qs?", s, hint);
+  else
+inform (input_location, "valid arguments are: %s", s);
+
+  XDELETEVEC (s);
+}
+
+
 /* Parse the architecture extension string STR and update ISA_FLAGS
with the architecture features turned on or off.  Return a
aarch_parse_opt_result describing the result.
@@ -275,16 +354,266 @@ aarch64_parse_extension (const char *str, 
aarch64_feature_flags *isa_flags,
   return AARCH_PARSE_OK;
 }
 
-/* Append all architecture extension candidates to the CANDIDATES vector.  */
+/* Parse the TO_PARSE string and put the architecture that it
+   selects into RES_ARCH and the architectural features into RES_FLAGS.
+   Return an aarch_parse_opt_result describing the parse result.
+   If there is an error parsing, RES_ARCH and RES_FLAGS are lef

[PATCH 08/11] aarch64: Inline aarch64_get_all_extension_candidates

2025-01-10 Thread Andrew Carlotti
gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_all_extension_candidates): Inline into...
(aarch64_print_hint_for_extensions): ...this.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
5cc00cd3b72807ec439c9c72af297d1ff5b2b679..0d0502a72687cb50e1dd66d9e4312386ee6096fe
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -205,16 +205,6 @@ static constexpr processor_info all_cores[] =
 };
 
 
-/* Append all architecture extension candidates to the CANDIDATES vector.  */
-
-void
-aarch64_get_all_extension_candidates (auto_vec *candidates)
-{
-  const struct extension_info *opt;
-  for (opt = all_extensions; opt->name != NULL; opt++)
-candidates->safe_push (opt->name);
-}
-
 /* Print a hint with a suggestion for an extension name
that most closely resembles what the user passed in STR.  */
 
@@ -222,7 +212,10 @@ void
 aarch64_print_hint_for_extensions (const char *str)
 {
   auto_vec candidates;
-  aarch64_get_all_extension_candidates (&candidates);
+  const struct extension_info *opt;
+  for (opt = all_extensions; opt->name != NULL; opt++)
+candidates.safe_push (opt->name);
+
   char *s;
   const char *hint = candidates_list_and_hint (str, s, candidates);
   if (hint)



[PATCH 09/11] aarch64: Rewrite architecture strings for assembler

2025-01-10 Thread Andrew Carlotti
Add infrastructure to allow rewriting the architecture strings passed to
the assembler (either as -march options or .arch directives).  There was
already canonicalisation everywhere except for an -march driver option
passed directly to the compiler; this patch applies the same
canonicalisation there as well.
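
To make the idea of canonicalisation concrete, here is a much-simplified
sketch of turning a base architecture plus a feature mask into a single
string.  It is not the logic of
aarch64_get_extension_string_for_isa_flags, which also handles feature
dependencies; the feature bits, demo_exts and canonical_arch_string are
all hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using feature_flags = std::uint64_t;

// Hypothetical feature bits, for illustration only.
constexpr feature_flags FEAT_FP = feature_flags{1} << 0;
constexpr feature_flags FEAT_SIMD = feature_flags{1} << 1;
constexpr feature_flags FEAT_CRC = feature_flags{1} << 2;

struct ext_entry
{
  const char *name;
  feature_flags flag;
};

// Illustrative extension table with a null-name sentinel.
static const ext_entry demo_exts[] = {
  { "fp", FEAT_FP },
  { "simd", FEAT_SIMD },
  { "crc", FEAT_CRC },
  { nullptr, 0 },
};

// Build "<base>[+ext|+noext]...": emit "+ext" for features enabled
// beyond the base defaults and "+noext" for defaults that are disabled.
std::string
canonical_arch_string (const std::string &base, feature_flags defaults,
		       feature_flags flags)
{
  assert (!base.empty ());
  std::string out = base;
  for (const ext_entry *e = demo_exts; e->name != nullptr; e++)
    {
      bool wanted = (flags & e->flag) != 0;
      bool is_default = (defaults & e->flag) != 0;
      if (wanted && !is_default)
	out += std::string ("+") + e->name;
      else if (!wanted && is_default)
	out += std::string ("+no") + e->name;
    }
  return out;
}
```

Because the output depends only on the base name and the final feature
mask, two differently spelled inputs that enable the same features map
to the same string, which is the property the assembler-facing rewrite
relies on.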

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_arch_string_for_assembler): New.
(aarch64_rewrite_march): New.
(aarch64_rewrite_selected_cpu): Call new function.
* config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity mapping.
* config/aarch64/aarch64-protos.h
(aarch64_get_arch_string_for_assembler): New.
* config/aarch64/aarch64.cc
(aarch64_declare_function_name): Call new function.
(aarch64_start_file): Ditto.
* config/aarch64/aarch64.h
(EXTRA_SPEC_FUNCTIONS): Use new macro name.
(MCPU_TO_MARCH_SPEC): Rename to...
(MARCH_REWRITE_SPEC): ...this, and add new spec rule.
(aarch64_rewrite_march): New declaration.
(MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
(MARCH_REWRITE_SPEC_FUNCTIONS): ...this, and add new function.
(ASM_CPU_SPEC): Use new macro name.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
0d0502a72687cb50e1dd66d9e4312386ee6096fe..297210e3809255d51b1aff4c827501534fae9546
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -697,6 +697,50 @@ aarch64_get_extension_string_for_isa_flags
   return outstr;
 }
 
+/* Generate an arch string to be passed to the assembler.  */
+
+std::string
+aarch64_get_arch_string_for_assembler (aarch64_arch arch,
+  aarch64_feature_flags flags)
+{
+  const struct arch_info *entry;
+  for (entry = all_architectures; entry->arch != aarch64_no_arch; entry++)
+{
+  if (entry->arch == arch)
+   break;
+}
+
+  std::string outstr = entry->name
+   + aarch64_get_extension_string_for_isa_flags (flags, entry->flags);
+
+  return outstr;
+}
+
+/* Called by the driver to rewrite a name passed to the -march
+   argument in preparation to be passed to the assembler.  The
+   names passed from the command line will be in ARGV, we want
+   to use the right-most argument, which should be in
+   ARGV[ARGC - 1].  ARGC should always be greater than 0.  */
+
+const char *
+aarch64_rewrite_march (int argc, const char **argv)
+{
+  gcc_assert (argc);
+  const char *name = argv[argc - 1];
+  aarch64_arch arch;
+  aarch64_feature_flags flags;
+
+  aarch64_validate_march (name, &arch, &flags);
+
+  std::string outstr = aarch64_get_arch_string_for_assembler (arch, flags);
+
+  /* We are going to memory leak here, nobody elsewhere
+ in the callchain is going to clean up after us.  The alternative is
+ to allocate a static buffer, and assert that it is big enough for our
+ modified string, which seems much worse!  */
+  return xstrdup (outstr.c_str ());
+}
+
 /* Attempt to rewrite NAME, which has been passed on the command line
as a -mcpu option to an equivalent -march value.  If we can do so,
return the new string, otherwise return an error.  */
@@ -740,7 +784,7 @@ aarch64_rewrite_selected_cpu (const char *name)
break;
 }
 
-  /* We couldn't find that proceesor name, or the processor name we
+  /* We couldn't find that processor name, or the processor name we
  found does not map to an architecture we understand.  */
   if (p_to_a->arch == aarch64_no_arch
   || a_to_an->arch == aarch64_no_arch)
@@ -749,9 +793,8 @@ aarch64_rewrite_selected_cpu (const char *name)
   aarch64_feature_flags extensions = p_to_a->flags;
   aarch64_parse_extension (extension_str.c_str (), &extensions, NULL);
 
-  std::string outstr = a_to_an->name
-   + aarch64_get_extension_string_for_isa_flags (extensions,
- a_to_an->flags);
+  std::string outstr = aarch64_get_arch_string_for_assembler (a_to_an->arch,
+ extensions);
 
   /* We are going to memory leak here, nobody elsewhere
  in the callchain is going to clean up after us.  The alternative is
diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
index 
b2f13be2dab5931f19d62fc29febceb98baf6fee..f6ebb723715ad0f092f14f06e733eff2b4fe3a1e
 100644
--- a/gcc/config/aarch64/aarch64-elf.h
+++ b/gcc/config/aarch64/aarch64-elf.h
@@ -136,7 +136,6 @@
 #define ASM_SPEC "\
 %{mbig-endian:-EB} \
 %{mlittle-endian:-EL} \
-%{march=*:-march=%*} \
 %(asm_cpu_spec)" \
 ASM_MABI_SPEC
 #endif
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
4114fc9b3b7645b8781257f6f775ddfe7e8c339e..b27da1e25720da06712da0eff1d527e23408a59f
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/co

[PATCH 05/11] aarch64: Adjust option parsing parameter types.

2025-01-10 Thread Andrew Carlotti
Replace `const struct processor *` in output parameters with
`aarch64_arch` or `aarch64_cpu`.

Replace `std::string` parameter in aarch64_print_hint_for_extensions with
`char *`.

Also name the return parameters more clearly and consistently.
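
The shape of the new interface can be sketched outside GCC as follows:
the parser reports an enum identifier and a flag mask through output
parameters, and leaves both untouched on failure.  The enum values, the
demo_arches table and parse_arch are hypothetical and only mirror the
pattern; they are not the real aarch64_parse_arch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical identifiers standing in for aarch64_arch and
// aarch_parse_opt_result.
enum demo_arch { DEMO_ARCH_V8A, DEMO_ARCH_V9A, DEMO_NO_ARCH };
enum parse_result { PARSE_OK, PARSE_INVALID_ARG };

struct arch_entry
{
  const char *name;
  demo_arch arch;
  std::uint64_t flags;
};

// Illustrative table with a null-name sentinel.
static const arch_entry demo_arches[] = {
  { "armv8-a", DEMO_ARCH_V8A, 0x3 },
  { "armv9-a", DEMO_ARCH_V9A, 0x7 },
  { nullptr, DEMO_NO_ARCH, 0 },
};

// Look TO_PARSE up in the table.  On success store the enum identifier
// in *RES_ARCH and the flags in *RES_FLAGS; on failure leave both
// output parameters unchanged.
parse_result
parse_arch (const char *to_parse, demo_arch *res_arch,
	    std::uint64_t *res_flags)
{
  assert (to_parse != nullptr && res_arch != nullptr && res_flags != nullptr);
  for (const arch_entry *e = demo_arches; e->name != nullptr; e++)
    if (std::strcmp (e->name, to_parse) == 0)
      {
	*res_arch = e->arch;
	*res_flags = e->flags;
	return PARSE_OK;
      }
  return PARSE_INVALID_ARG;
}
```

Returning a plain enum rather than a pointer into a table means callers
no longer depend on the layout of the table, which is presumably what
makes it practical to move the parsers to another file later in the
series.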

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_print_hint_for_extensions): Receive string as a char *.
(aarch64_parse_arch): Don't return a const struct processor *.
(aarch64_parse_cpu): Ditto.
(aarch64_parse_tune): Ditto.
(aarch64_validate_mtune): Ditto.
(aarch64_validate_mcpu): Ditto, and use temporary variables for
march/mcpu cross-check.
(aarch64_validate_march): Ditto.
(aarch64_override_options): Adjust for changed parameter types.
(aarch64_handle_attr_arch): Ditto.
(aarch64_handle_attr_cpu): Ditto.
(aarch64_handle_attr_tune): Ditto.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
d8a2916d8230cc25122f21818b88fd347e72693a..9b44d08f3e5fe6b4a7aa8f040e7001e3070b362d
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18217,16 +18217,16 @@ better_main_loop_than_p (const vector_costs 
*uncast_other) const
 
 static void initialize_aarch64_code_model (struct gcc_options *);
 
-/* Parse the TO_PARSE string and put the architecture struct that it
-   selects into RES and the architectural features into ISA_FLAGS.
+/* Parse the TO_PARSE string and put the architecture that it
+   selects into RES_ARCH and the architectural features into RES_FLAGS.
Return an aarch_parse_opt_result describing the parse result.
-   If there is an error parsing, RES and ISA_FLAGS are left unchanged.
+   If there is an error parsing, RES_ARCH and RES_FLAGS are left unchanged.
When the TO_PARSE string contains an invalid extension,
a copy of the string is created and stored to INVALID_EXTENSION.  */
 
 static enum aarch_parse_opt_result
-aarch64_parse_arch (const char *to_parse, const struct processor **res,
-   aarch64_feature_flags *isa_flags,
+aarch64_parse_arch (const char *to_parse, aarch64_arch *res_arch,
+   aarch64_feature_flags *res_flags,
std::string *invalid_extension)
 {
   const char *ext;
@@ -18250,21 +18250,21 @@ aarch64_parse_arch (const char *to_parse, const 
struct processor **res,
   if (strlen (arch->name) == len
  && strncmp (arch->name, to_parse, len) == 0)
{
- auto isa_temp = arch->flags;
+ auto isa_flags = arch->flags;
 
  if (ext != NULL)
{
  /* TO_PARSE string contains at least one extension.  */
  enum aarch_parse_opt_result ext_res
-   = aarch64_parse_extension (ext, &isa_temp, invalid_extension);
+   = aarch64_parse_extension (ext, &isa_flags, invalid_extension);
 
  if (ext_res != AARCH_PARSE_OK)
return ext_res;
}
  /* Extension parsing was successful.  Confirm the result
 arch and ISA flags.  */
- *res = arch;
- *isa_flags = isa_temp;
+ *res_arch = arch->arch;
+ *res_flags = isa_flags;
  return AARCH_PARSE_OK;
}
 }
@@ -18273,16 +18273,16 @@ aarch64_parse_arch (const char *to_parse, const 
struct processor **res,
   return AARCH_PARSE_INVALID_ARG;
 }
 
-/* Parse the TO_PARSE string and put the result tuning in RES and the
-   architecture flags in ISA_FLAGS.  Return an aarch_parse_opt_result
-   describing the parse result.  If there is an error parsing, RES and
-   ISA_FLAGS are left unchanged.
+/* Parse the TO_PARSE string and put the result tuning in RES_CPU and the
+   architecture flags in RES_FLAGS.  Return an aarch_parse_opt_result
+   describing the parse result.  If there is an error parsing, RES_CPU and
+   RES_FLAGS are left unchanged.
When the TO_PARSE string contains an invalid extension,
a copy of the string is created and stored to INVALID_EXTENSION.  */
 
 static enum aarch_parse_opt_result
-aarch64_parse_cpu (const char *to_parse, const struct processor **res,
-  aarch64_feature_flags *isa_flags,
+aarch64_parse_cpu (const char *to_parse, aarch64_cpu *res_cpu,
+  aarch64_feature_flags *res_flags,
   std::string *invalid_extension)
 {
   const char *ext;
@@ -18305,21 +18305,21 @@ aarch64_parse_cpu (const char *to_parse, const struct 
processor **res,
 {
   if (strlen (cpu->name) == len && strncmp (cpu->name, to_parse, len) == 0)
{
- auto isa_temp = cpu->flags;
+ auto isa_flags = cpu->flags;
 
  if (ext != NULL)
{
  /* TO_PARSE string contains at least one extension.  */
  enum aarch_parse_opt_result ext_res
-   = aarch64_parse_extension (ext, &isa_temp, invalid_extension);
+   = aarch64_parse_extension (ext, &isa_flags, invalid_extension);
 
   

[PATCH 01/11] aarch64: Improve mcpu/march conflict check

2025-01-10 Thread Andrew Carlotti
Features from a cpu or base architecture that were explicitly disabled
by a +nofeat option were being incorrectly added back in before checking
for conflicts between -mcpu and -march options.  This patch instead
compares the returned feature masks directly.
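
The difference between the two checks can be shown with plain bit
masks.  In this sketch the feature bits and the concrete masks are
hypothetical; only the two expressions are taken from the diff below
(the old `~full_cpu_flags & full_arch_flags` and the new
`~cpu_isa & arch_isa`).

```cpp
#include <cassert>
#include <cstdint>

using feature_flags = std::uint64_t;

// Hypothetical feature bits, for illustration only.
constexpr feature_flags FEAT_FP = feature_flags{1} << 0;
constexpr feature_flags FEAT_SIMD = feature_flags{1} << 1;
constexpr feature_flags FEAT_CRC = feature_flags{1} << 2;

// Old check: OR the defaults back in before comparing, which undoes
// any "+nofeat" that had cleared a default feature.
constexpr bool
old_conflict (feature_flags cpu_default, feature_flags cpu_isa,
	      feature_flags arch_default, feature_flags arch_isa)
{
  return (~(cpu_default | cpu_isa) & (arch_default | arch_isa)) != 0;
}

// New check: compare the parsed masks directly.
constexpr bool
new_conflict (feature_flags cpu_isa, feature_flags arch_isa)
{
  return (~cpu_isa & arch_isa) != 0;
}

// Worked example modelled loosely on "-mcpu=<cpu>+nofp -march=<arch>".
void
demo_conflict_check ()
{
  const feature_flags cpu_default = FEAT_FP | FEAT_SIMD | FEAT_CRC;
  // Assume "+nofp" cleared FP and the dependent SIMD bit.
  const feature_flags cpu_isa = FEAT_CRC;
  const feature_flags arch_default = FEAT_FP | FEAT_SIMD;
  const feature_flags arch_isa = arch_default;

  // The old check misses the conflict because FP is added back in.
  assert (!old_conflict (cpu_default, cpu_isa, arch_default, arch_isa));
  // The new check sees that the arch wants features the cpu lacks.
  assert (new_conflict (cpu_isa, arch_isa));
}
```

This also suggests why the two target_attr_crypto_ice tests gain a
dg-prune-output line: with "+nofp" on the -mcpu side, the direct
comparison now reports a conflict that the old check masked.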

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options): Compare
returned feature masks directly.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/target_attr_crypto_ice_1.c: Prune warning.
* gcc.target/aarch64/target_attr_crypto_ice_2.c: Ditto.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
ad31e9d255c05dda00c7c2b4755ccec33ae2c83d..330a04c147a97bcd99d6819290d7f82ff5066a44
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19282,13 +19282,10 @@ aarch64_override_options (void)
 cpu features would end up disabling an achitecture feature.  In
 otherwords the cpu features need to be a strict superset of the arch
 features and if so prefer the -march ISA flags.  */
-  auto full_arch_flags = arch->flags | arch_isa;
-  auto full_cpu_flags = cpu->flags | cpu_isa;
-  if (~full_cpu_flags & full_arch_flags)
+  if (~cpu_isa & arch_isa)
{
  std::string ext_diff
-   = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
- full_cpu_flags);
+   = aarch64_get_extension_string_for_isa_flags (arch_isa, cpu_isa);
  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
  "and resulted in options %qs being added",
   aarch64_cpu_string,
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_1.c 
b/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_1.c
index 
3b354c0611092b1fb66e4a9c2098a9806c749825..f13e5e2560cd43aab570ab5d240e4cf1975d2f12
 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mcpu=thunderx+nofp -march=armv8-a" } */
+/* { dg-prune-output "warning: switch .* conflicts" } */
 
 #include "arm_neon.h"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_2.c 
b/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_2.c
index 
d0a62b83351b671d157ec0de083d681394056d79..ab2549228a7fa06aa26592e02d0d2055f6b990ed
 100644
--- a/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/target_attr_crypto_ice_2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mcpu=thunderx+nofp -march=armv8-a" } */
+/* { dg-prune-output "warning: switch .* conflicts" } */
 
 /* Make sure that we don't ICE when dealing with vector parameters
in a simd-tagged function within a non-simd translation unit.  */

