RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-08 Thread Li, Pan2
Thanks Richard for the comments.

> That said - I'd avoid canonicalizing this via match.pd given that
> inevitably will if-convert.

I see. If there are no further concerns, I will revert the simplification merged into match.pd.

> Instead I'd see it as a way to provide a generic .SAT_* expansion
> though one could say we should then simply implement fallback expansion
> for the internal function.  It's also not necessarily your
> responsibility to implement
> this since risc-v does have .SAT_* expanders, so does x86.

Got it, I will have a try after some cleanup/refactoring of the matching
patterns we discussed previously.
As I understand it, it may look like the below.

if (SAT_ADD_SUPPORTED (...))
  return target_implemented_expanders (...);

return fallback_expansion (...);
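For the fallback path, one branchless lowering a generic expander could emit for unsigned .SAT_ADD is the form already matched in this series. A sketch in plain C++ (my illustration of the semantics, not the actual expander code):

```cpp
#include <cstdint>

// Branchless unsigned saturating add.  The sum wraps exactly when
// (x + y) < x, so we build an all-ones mask from that condition and
// OR it into the (possibly wrapped) sum.
static uint32_t
sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;                    // may wrap around
  uint32_t mask = -(uint32_t) (sum < x);   // 0 or 0xffffffff
  return sum | mask;
}
```

This is the same `(x + y) | -((x + y) < x)` shape visible in the power asm dump quoted below in the thread (sradi/or sequence).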

Pan

-----Original Message-----
From: Richard Biener  
Sent: Friday, November 8, 2024 4:03 PM
To: Li, Pan2 
Cc: Jeff Law ; Tamar Christina 
; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD 
into branchless

On Fri, Nov 8, 2024 at 12:34 AM Li, Pan2  wrote:
>
> Thanks Tamar and Jeff for comments.
>
> > I'm not sure it's that simple.  It'll depend on the micro-architecture.
> > So things like strength of the branch predictors, how fetch blocks are
> > handled (can you have embedded not-taken branches, short-forward-branch
> > optimizations, etc).
>
> > After:
> >
> > .L.sat_add_u_1(unsigned int, unsigned int):
> >  add 4,3,4
> >  rldicl 9,4,0,32
> >  subf 3,3,9
> >  sradi 3,3,63
> >  or 3,3,4
> >  rldicl 3,3,0,32
> >  blr
> >
> > and before
> >
> > .L.sat_add_u_1(unsigned int, unsigned int):
> >  add 4,3,4
> >  cmplw 0,4,3
> >  bge 0,.L2
> >  li 4,-1
> > .L2:
> >  rldicl 3,4,0,32
> >  blr
>
> I am not familiar with branch prediction, but the branch should be 50% taken
> and 50% not-taken according to the range of the sat add inputs. Is that the
> worst case for branch prediction? I mean, if we call it 100 times with a
> taken, not-taken, taken, not-taken... sequence, will the branch version
> still be faster? Feel free to correct me if I'm wrong.
>
> Back to these 16 forms of sat add below, is there any suggestion as to which
> one or two form(s) may be cheaper than the others from the perspective of
> gimple IR, independent of whether the backend implements SAT_ADD?
>
> #define DEF_SAT_U_ADD_1(T)   \
> T sat_u_add_##T##_1 (T x, T y)   \
> {\
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
>
> #define DEF_SAT_U_ADD_2(T)  \
> T sat_u_add_##T##_2 (T x, T y)  \
> {   \
>   return (T)(x + y) < x ? -1 : (x + y); \
> }
>
> #define DEF_SAT_U_ADD_3(T)   \
> T sat_u_add_##T##_3 (T x, T y)   \
> {\
>   return x <= (T)(x + y) ? (x + y) : -1; \
> }
>
> #define DEF_SAT_U_ADD_4(T)  \
> T sat_u_add_##T##_4 (T x, T y)  \
> {   \
>   return x > (T)(x + y) ? -1 : (x + y); \
> }
>
> #define DEF_SAT_U_ADD_5(T)  \
> T sat_u_add_##T##_5 (T x, T y)  \
> {   \
>   if ((T)(x + y) >= x)  \
> return x + y;   \
>   else  \
> return -1;  \
> }
>
> #define DEF_SAT_U_ADD_6(T)  \
> T sat_u_add_##T##_6 (T x, T y)  \
> {   \
>   if ((T)(x + y) < x)   \
> return -1;  \
>   else  \
> return x + y;   \
> }
>
> #define DEF_SAT_U_ADD_7(T)  \
> T sat_u_add_##T##_7 (T x, T y)  \
> {   \
>   if (x <= (T)(x + y))  \
> return x + y;   \
>   else  \
> return -1;  \
> }
>
> #define DEF_SAT_U_ADD_8(T)  \
> T sat_u_add_##T##_8 (T x, T y)  \
> {   \
>   if (x > (T)(x + y))   \
> return -1;  \
>   else  \
> return x + y;   \
> }
>
> #define DEF_SAT_U_ADD_9(T) \
> T sat_u_add_##T##_9 (T x, T y) \
> {  \
>   T ret;   \
>   return __builtin_add_overflow (x, y, &ret) == 0 ? ret : - 1; \
> }
>
> #define DEF_SAT_U_ADD_10(T)\
> T sat_u_add_##T##_10 (T x, T y)\
> {  \
>   T ret;   \
>   return !__builtin_add_overflow (x, y, &ret) ? ret : - 1; \
> }
>
> #define DEF_SAT_U_ADD_11(T) \
> T sat_u_add_##T##_11 (T x, T y)

[PATCH 08/12] libstdc++: Remove _Insert base class from _Hashtable

2024-11-08 Thread Jonathan Wakely
There's no reason to have a separate base class defining the insert
member functions now. They can all be moved into the _Hashtable class,
which simplifies them slightly.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Remove inheritance from
__detail::_Insert and move its members into _Hashtable.
* include/bits/hashtable_policy.h (__detail::_Insert): Remove.

Reviewed-by: François Dumont 
---
 libstdc++-v3/include/bits/hashtable.h| 164 ++--
 libstdc++-v3/include/bits/hashtable_policy.h | 195 ---
 2 files changed, 144 insertions(+), 215 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index a46a94e2ecd..9db568a1f63 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -169,7 +169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
*  Functionality is implemented by decomposition into base classes,
*  where the derived _Hashtable class is used in _Map_base,
-   *  _Insert, _Rehash_base, and _Equality base classes to access the
+   *  _Rehash_base, and _Equality base classes to access the
*  "this" pointer. _Hashtable_base is used in the base classes as a
*  non-recursive, fully-completed-type so that detailed nested type
*  information, such as iterator type and node type, can be
@@ -180,7 +180,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  Base class templates are:
*- __detail::_Hashtable_base
*- __detail::_Map_base
-   *- __detail::_Insert
*- __detail::_Rehash_base
*- __detail::_Equality
*/
@@ -194,9 +193,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   public __detail::_Map_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 _Hash, _RangeHash, _Unused,
 _RehashPolicy, _Traits>,
-  public __detail::_Insert<_Key, _Value, _Alloc, _ExtractKey, _Equal,
-  _Hash, _RangeHash, _Unused,
-  _RehashPolicy, _Traits>,
   public __detail::_Rehash_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
_Hash, _RangeHash, _Unused,
_RehashPolicy, _Traits>,
@@ -237,10 +233,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __node_base_ptr = typename __hashtable_alloc::__node_base_ptr;
   using __buckets_ptr = typename __hashtable_alloc::__buckets_ptr;
 
-  using __insert_base = __detail::_Insert<_Key, _Value, _Alloc, 
_ExtractKey,
- _Equal, _Hash,
- _RangeHash, _Unused,
- _RehashPolicy, _Traits>;
   using __enable_default_ctor
= _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
   using __rehash_guard_t
@@ -259,9 +251,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef value_type&  reference;
   typedef const value_type&
const_reference;
 
-  using iterator = typename __insert_base::iterator;
+  using iterator
+   = __detail::_Node_iterator<_Value, __constant_iterators::value,
+  __hash_cached::value>;
 
-  using const_iterator = typename __insert_base::const_iterator;
+  using const_iterator
+   = __detail::_Node_const_iterator<_Value, __constant_iterators::value,
+__hash_cached::value>;
 
   using local_iterator = __detail::_Local_iterator,
+iterator>;
 
   using __map_base = __detail::_Map_base<_Key, _Value, _Alloc, _ExtractKey,
 _Equal, _Hash, _RangeHash, _Unused,
@@ -355,12 +353,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool _Unique_keysa>
friend struct __detail::_Map_base;
 
-  template
-   friend struct __detail::_Insert;
-
   template
+   void
+   _M_insert_range_multi(_InputIterator __first, _InputIterator __last);
+
 public:
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
@@ -980,9 +976,107 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  else
return _M_emplace_multi(__hint, std::forward<_Args>(__args)...);
}
-#pragma GCC diagnostic pop
 
-  // Insert member functions via inheritance.
+  // Insert
+  __ireturn_type
+  insert(const value_type& __v)
+  {
+   if constexpr (__unique_keys::value)
+ return _M_emplace_uniq(__v);
+   else
+ return _M_emplace_multi(cend(), __v);
+  }
+
+  iterator
+  insert(const_iterator __hint, const value_type& __v)
+  {
+   if constexpr (__unique_keys::value)
+ return _M_emplace_uniq(__v).first;
+   else
+ return _M_emplace_multi(__hint, __v);
+  }
+
+

[PATCH 04/12] libstdc++: Refactor Hashtable erasure

2024-11-08 Thread Jonathan Wakely
This reworks the internal member functions for erasure from
unordered containers, similarly to the earlier commit doing it for
insertion.

Instead of multiple overloads of _M_erase which are selected via tag
dispatching, the erase(const key_type&) member can use 'if constexpr' to
choose an appropriate implementation (returning after erasing a single
element for unique keys, or continuing to erase all equivalent elements
for non-unique keys).
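The tag-dispatch-to-`if constexpr` refactoring can be sketched in miniature (hypothetical names, not the libstdc++ code): one member function selects the unique-key or multi-key behaviour at compile time instead of forwarding to two overloads chosen by a `true_type`/`false_type` tag.

```cpp
#include <cstddef>

// Miniature of the pattern: a single erase-by-key member using
// `if constexpr` rather than two tag-dispatched _M_erase overloads.
template<bool UniqueKeys>
struct MiniTable
{
  std::size_t count = 3;   // pretend three equivalent elements exist

  std::size_t
  erase_key ()
  {
    if constexpr (UniqueKeys)
      {
        // Unique keys: at most one element can match, so erase one
        // and return immediately.
        --count;
        return 1;
      }
    else
      {
        // Non-unique keys: keep erasing the whole equivalent range.
        std::size_t n = count;
        count = 0;
        return n;
      }
  }
};
```

Both branches are type-checked only when instantiated, which is what lets the shared prologue (bucket lookup, etc.) be written once.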

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::_M_erase): Remove
overloads for erasing by key, moving logic to ...
(_Hashtable::erase): ... here.

Reviewed-by: François Dumont 
---
 libstdc++-v3/include/bits/hashtable.h | 109 +-
 1 file changed, 37 insertions(+), 72 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index eeffa15e525..23484f711cc 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -955,12 +955,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
iterator
_M_emplace_multi(const_iterator, _Args&&... __args);
 
-  size_type
-  _M_erase(true_type __uks, const key_type&);
-
-  size_type
-  _M_erase(false_type __uks, const key_type&);
-
   iterator
   _M_erase(size_type __bkt, __node_base_ptr __prev_n, __node_ptr __n);
 
@@ -1002,8 +996,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return erase(const_iterator(__it)); }
 
   size_type
-  erase(const key_type& __k)
-  { return _M_erase(__unique_keys{}, __k); }
+  erase(const key_type& __k);
 
   iterator
   erase(const_iterator, const_iterator);
@@ -2372,6 +2365,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __result;
 }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
   template::
-_M_erase(true_type /* __uks */, const key_type& __k)
+erase(const key_type& __k)
 -> size_type
 {
   __node_base_ptr __prev_n;
@@ -2409,77 +2404,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __n = static_cast<__node_ptr>(__prev_n->_M_nxt);
}
 
-  _M_erase(__bkt, __prev_n, __n);
-  return 1;
-}
-
-  template
-auto
-_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
-  _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-_M_erase(false_type /* __uks */, const key_type& __k)
--> size_type
-{
-  std::size_t __bkt;
-  __node_base_ptr __prev_n;
-  __node_ptr __n;
-  if (size() <= __small_size_threshold())
+  if constexpr (__unique_keys::value)
{
- __prev_n = _M_find_before_node(__k);
- if (!__prev_n)
-   return 0;
-
- // We found a matching node, erase it.
- __n = static_cast<__node_ptr>(__prev_n->_M_nxt);
- __bkt = _M_bucket_index(*__n);
+ _M_erase(__bkt, __prev_n, __n);
+ return 1;
}
   else
{
- __hash_code __code = this->_M_hash_code(__k);
- __bkt = _M_bucket_index(__code);
+ // _GLIBCXX_RESOLVE_LIB_DEFECTS
+ // 526. Is it undefined if a function in the standard changes
+ // in parameters?
+ // We use one loop to find all matching nodes and another to
+ // deallocate them so that the key stays valid during the first loop.
+ // It might be invalidated indirectly when destroying nodes.
+ __node_ptr __n_last = __n->_M_next();
+ while (__n_last && this->_M_node_equals(*__n, *__n_last))
+   __n_last = __n_last->_M_next();
 
- // Look for the node before the first matching node.
- __prev_n = _M_find_before_node(__bkt, __k, __code);
- if (!__prev_n)
-   return 0;
+ std::size_t __n_last_bkt
+   = __n_last ? _M_bucket_index(*__n_last) : __bkt;
 
- __n = static_cast<__node_ptr>(__prev_n->_M_nxt);
+ // Deallocate nodes.
+ size_type __result = 0;
+ do
+   {
+ __node_ptr __p = __n->_M_next();
+ this->_M_deallocate_node(__n);
+ __n = __p;
+ ++__result;
+   }
+ while (__n != __n_last);
+
+ _M_element_count -= __result;
+ if (__prev_n == _M_buckets[__bkt])
+   _M_remove_bucket_begin(__bkt, __n_last, __n_last_bkt);
+ else if (__n_last_bkt != __bkt)
+   _M_buckets[__n_last_bkt] = __prev_n;
+ __prev_n->_M_nxt = __n_last;
+ return __result;
}
-
-  // _GLIBCXX_RESOLVE_LIB_DEFECTS
-  // 526. Is it undefined if a function in the standard changes
-  // in parameters?
-  // We use one loop to find all matching nodes and another to deallocate
-  // them so that the key stays valid during the first loop. It might be
-  // invalidated indirectly when destroying nodes.
-  __node_ptr __n_last = __n->_M_next();
-  while (__n_last && this->_M_node_equals(*__n, *__n_last))
- 

[PATCH 09/12] libstdc++: Remove _Equality base class from _Hashtable

2024-11-08 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Remove _Equality base
class.
(_Hashtable::_M_equal): Define equality comparison here instead
of in _Equality::_M_equal.
* include/bits/hashtable_policy.h (_Equality): Remove.
---
 libstdc++-v3/include/bits/hashtable.h| 111 +++---
 libstdc++-v3/include/bits/hashtable_policy.h | 147 ---
 2 files changed, 94 insertions(+), 164 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 9db568a1f63..7b0a684a2d2 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -168,8 +168,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  not throw and this is enforced by a static assertion.
*
*  Functionality is implemented by decomposition into base classes,
-   *  where the derived _Hashtable class is used in _Map_base,
-   *  _Rehash_base, and _Equality base classes to access the
+   *  where the derived _Hashtable class is used in _Map_base and
+   *  _Rehash_base base classes to access the
*  "this" pointer. _Hashtable_base is used in the base classes as a
*  non-recursive, fully-completed-type so that detailed nested type
*  information, such as iterator type and node type, can be
@@ -181,7 +181,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*- __detail::_Hashtable_base
*- __detail::_Map_base
*- __detail::_Rehash_base
-   *- __detail::_Equality
*/
   template,
-  public __detail::_Equality<_Key, _Value, _Alloc, _ExtractKey, _Equal,
-_Hash, _RangeHash, _Unused,
-_RehashPolicy, _Traits>,
   private __detail::_Hashtable_alloc<
__alloc_rebind<_Alloc,
   __detail::_Hash_node<_Value,
@@ -293,10 +289,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Hash, _RangeHash, _Unused,
   _RehashPolicy, _Traits>;
 
-  using __eq_base = __detail::_Equality<_Key, _Value, _Alloc, _ExtractKey,
-   _Equal, _Hash, _RangeHash, _Unused,
-   _RehashPolicy, _Traits>;
-
   using __node_builder_t = __detail::_NodeBuilder<_ExtractKey>;
 
   // Simple RAII type for managing a node containing an element
@@ -353,13 +345,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool _Unique_keysa>
friend struct __detail::_Map_base;
 
-  template
-   friend struct __detail::_Equality;
-
 public:
   using size_type = typename __hashtable_base::size_type;
   using difference_type = typename __hashtable_base::difference_type;
@@ -1300,6 +1285,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 #endif // C++17 __glibcxx_node_extract
 
+  bool
+  _M_equal(const _Hashtable& __other) const;
+
 private:
   // Helper rehash method used when keys are unique.
   void _M_rehash(size_type __bkt_count, true_type __uks);
@@ -2798,6 +2786,95 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_buckets = __new_buckets;
 }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
+
+  // This is for implementing equality comparison for unordered containers,
+  // per N3068, by John Lakos and Pablo Halpern.
+  // Algorithmically, we follow closely the reference implementations therein.
+  template
+bool
+_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
+  _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
+_M_equal(const _Hashtable& __other) const
+{
+  if (size() != __other.size())
+   return false;
+
+  if constexpr (__unique_keys::value)
+   for (auto __x_n = _M_begin(); __x_n; __x_n = __x_n->_M_next())
+ {
+   std::size_t __ybkt = __other._M_bucket_index(*__x_n);
+   auto __prev_n = __other._M_buckets[__ybkt];
+   if (!__prev_n)
+ return false;
+
+   for (__node_ptr __n = static_cast<__node_ptr>(__prev_n->_M_nxt);;
+__n = __n->_M_next())
+ {
+   if (__n->_M_v() == __x_n->_M_v())
+ break;
+
+   if (!__n->_M_nxt
+   || __other._M_bucket_index(*__n->_M_next()) != __ybkt)
+ return false;
+ }
+ }
+  else // non-unique keys
+   for (auto __x_n = _M_begin(); __x_n;)
+ {
+   std::size_t __x_count = 1;
+   auto __x_n_end = __x_n->_M_next();
+   for (; __x_n_end
+  && key_eq()(_ExtractKey{}(__x_n->_M_v()),
+  _ExtractKey{}(__x_n_end->_M_v()));
+__x_n_end = __x_n_end->_M_next())
+ ++__x_count;
+
+   std::size_t __ybkt = __other._M_bucket_index(*__x_n);
+   auto __y_prev_n = __other._M_buckets[__ybkt];
+  

Re: [PATCH] Add push/pop_function_decl

2024-11-08 Thread Jakub Jelinek
On Fri, Nov 08, 2024 at 05:44:48PM +, Richard Sandiford wrote:
> It's for https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667499.html ,
> which needs to switch to the simd clone's chosen target (SVE) in order
> to construct the correct types.  Currently the patch uses:
> 
> +  cl_target_option_save (&cur_target, &global_options, 
> &global_options_set);
> +  tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
> +  cl_target_option_restore (&global_options, &global_options_set,
> + TREE_TARGET_OPTION (new_target));
> +  aarch64_override_options_internal (&global_options);
> +  memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
> +   sizeof (have_regs_of_mode));
> +  for (int i = 0; i < NUM_MACHINE_MODES; ++i)
> + if (aarch64_sve_mode_p ((machine_mode) i))
> +   have_regs_of_mode[i] = true;
> 
> to switch in and:
> 
> +  /* Restore current options.  */
> +  cl_target_option_restore (&global_options, &global_options_set, 
> &cur_target);
> +  aarch64_override_options_internal (&global_options);
> +  memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
> +   sizeof (have_regs_of_mode));
> 
> to switch back, but the idea is to replace that with:
> 
>   push_function_decl (node->decl);
> 
>   ...
> 
>   pop_function_decl ();

Why do you need that?  Can't you just use opt_for_fn?

Jakub



[committed] hppa: Don't allow mode size 32 in hard registers

2024-11-08 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

hppa: Don't allow mode size 32 in hard registers

2024-11-08  John David Anglin  

gcc/ChangeLog:

PR target/117238
* config/pa/pa64-regs.h (PA_HARD_REGNO_MODE_OK): Don't allow
mode size 32.

diff --git a/gcc/config/pa/pa64-regs.h b/gcc/config/pa/pa64-regs.h
index 3b9273c2867..90762e119dc 100644
--- a/gcc/config/pa/pa64-regs.h
+++ b/gcc/config/pa/pa64-regs.h
@@ -157,13 +157,10 @@ along with GCC; see the file COPYING3.  If not see
: FP_REGNO_P (REGNO)
\
  ? (VALID_FP_MODE_P (MODE) \
&& (GET_MODE_SIZE (MODE) <= 8   \
-   || (GET_MODE_SIZE (MODE) == 16 && ((REGNO) & 1) == 0)   \
-   || (GET_MODE_SIZE (MODE) == 32 && ((REGNO) & 3) == 0))) \
+   || (GET_MODE_SIZE (MODE) == 16 && ((REGNO) & 1) == 0))) \
: (GET_MODE_SIZE (MODE) <= UNITS_PER_WORD   \
   || (GET_MODE_SIZE (MODE) == 2 * UNITS_PER_WORD   \
- && REGNO) & 1) == 1 && (REGNO) <= 25) || (REGNO) == 28))  \
-  || (GET_MODE_SIZE (MODE) == 4 * UNITS_PER_WORD   \
- && ((REGNO) & 3) == 3 && (REGNO) <= 23)))
+ && REGNO) & 1) == 1 && (REGNO) <= 25) || (REGNO) == 28
 
 /* How to renumber registers for gdb.
 


signature.asc
Description: PGP signature


[committed] hppa: Don't use '%' operator in base14_operand

2024-11-08 Thread John David Anglin
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
to trunk and gcc-14.

Dave
---

hppa: Don't use '%' operator in base14_operand

Division is slow on hppa and mode sizes are powers of 2, so we can use
the '&' operator to check displacement alignment.
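The equivalence relies on the mode size being a power of two: for such n, `x % n == 0` exactly when `(x & (n - 1)) == 0`, including for negative two's-complement displacements. A quick illustration:

```cpp
#include <cstdint>

// For a power-of-two n, the bits of x below n are its remainder mod n,
// so an alignment test needs only a mask instead of a division.
static bool
aligned_mod (int64_t x, int64_t n)  { return (x % n) == 0; }

static bool
aligned_mask (int64_t x, int64_t n) { return (x & (n - 1)) == 0; }
```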

2024-11-08  John David Anglin  

gcc/ChangeLog:

* config/pa/predicates.md (base14_operand): Use '&' operator
instead of '%' to check displacement alignment.

diff --git a/gcc/config/pa/predicates.md b/gcc/config/pa/predicates.md
index 0defd2282fb..a27b2b1c78d 100644
--- a/gcc/config/pa/predicates.md
+++ b/gcc/config/pa/predicates.md
@@ -285,7 +285,7 @@
   return false;
 
 default:
-  return (INTVAL (op) % GET_MODE_SIZE (mode)) == 0;
+  return (INTVAL (op) & (GET_MODE_SIZE (mode) - 1)) == 0;
 }
 
   return false;


signature.asc
Description: PGP signature


Re: [PATCH v3] C: Support Function multiversionsing in the C front end

2024-11-08 Thread Joseph Myers
I should also add: the ACLE specification for the details of how function 
multiversioning is supposed to work in terms of interactions of 
declarations for different versions in the same or different scopes and 
what happens regarding forming composite types seems rather vague.  So 
maybe it would be a good idea to clarify the specification for what the 
intended semantics actually are in such cases.

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH] testsuite: arm: Check that a far jump is used in thumb1-far-jump-2.c

2024-11-08 Thread Torbjörn SVENSSON
Ok for trunk?

--

With the changes in r15-1579-g792f97b44ff, the code used as "padding" in
the test case is optimized away. Prevent this optimization by forcing a
read of the volatile memory.
Also, validate that there is a far jump in the generated assembler.

Without this patch, the generated assembler is reduced to:
f3:
cmp r0, #0
beq .L1
ldr r4, .L6
.L1:
bx  lr
.L7:
.align  2
.L6:
.word   g_0_1

With the patch, the generated assembler is:
f3:
push{lr}
cmp r0, #0
bne .LCB7
bl  .L1 @far jump
.LCB7:
ldr r3, .L6
ldr r3, [r3]
...
ldr r3, .L9+976
ldr r4, [r3]
b   .L10
.L11:
.align  2
.L9:
.word   g_0_3_7_5
...
.word   g_0_1
.L10:
.L1:
pop {pc}

gcc/testsuite/ChangeLog:

* gcc.target/arm/thumb1-far-jump-2.c: Force a read of volatile
memory in macro to avoid optimization.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c 
b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c
index 78fcafaaf7d..1cf7a0a86e8 100644
--- a/gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c
+++ b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c
@@ -10,7 +10,7 @@ void f3(int i)
 {
 #define GO(n) \
   extern volatile int g_##n; \
-  r4=(int)&g_##n;
+  r4=(int)g_##n;
 
 #define GO8(n) \
   GO(n##_0) \
@@ -54,4 +54,5 @@ void f3(int i)
   }
 }
 
-/* { dg-final { scan-assembler "push.*lr" } } */
+/* { dg-final { scan-assembler "\tpush.*lr" } } */
+/* { dg-final { scan-assembler "\tbl\t\\.L\[0-9\]+\t@far jump" } } */
-- 
2.25.1



Re: [PATCH] testsuite: arm: Check that a far jump is used in thumb1-far-jump-2.c

2024-11-08 Thread Christophe Lyon
On Fri, 8 Nov 2024 at 19:20, Torbjörn SVENSSON
 wrote:
>
> Ok for trunk?
>
> --
>
> With the changes in r15-1579-g792f97b44ff, the code used as "padding" in
> the test case is optimized away. Prevent this optimization by forcing a
> read of the volatile memory.
> Also, validate that there is a far jump in the generated assembler.
>
> Without this patch, the generated assembler is reduced to:
> f3:
> cmp r0, #0
> beq .L1
> ldr r4, .L6
> .L1:
> bx  lr
> .L7:
> .align  2
> .L6:
> .word   g_0_1
>
> With the patch, the generated assembler is:
> f3:
> push{lr}
> cmp r0, #0
> bne .LCB7
> bl  .L1 @far jump
> .LCB7:
> ldr r3, .L6
> ldr r3, [r3]
> ...
> ldr r3, .L9+976
> ldr r4, [r3]
> b   .L10
> .L11:
> .align  2
> .L9:
> .word   g_0_3_7_5
> ...
> .word   g_0_1
> .L10:
> .L1:
> pop {pc}
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/thumb1-far-jump-2.c: Force a read of volatile
> memory in macro to avoid optimization.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c 
> b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c
> index 78fcafaaf7d..1cf7a0a86e8 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-2.c
> @@ -10,7 +10,7 @@ void f3(int i)
>  {
>  #define GO(n) \
>extern volatile int g_##n; \
> -  r4=(int)&g_##n;
> +  r4=(int)g_##n;
>

It really seems to me that this was a typo in the original submission:
this volatile was probably here to prevent optimization and make sure
we have a large function. But taking the address of a volatile
variable does not have the intended effect, you need to actually
access the variable.
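The distinction Christophe describes can be shown in a small example (my illustration): taking the address of a volatile object is not a volatile access and may be folded away, whereas a read through the volatile lvalue must be emitted.

```cpp
#include <cstdint>

volatile int g_counter = 7;

// Only computes the address: no volatile access occurs, so the compiler
// is free to fold or hoist this -- it does not keep the function "large".
static intptr_t
take_address ()
{
  return (intptr_t) &g_counter;
}

// Reads through the volatile lvalue: this load must actually happen.
static int
read_value ()
{
  return g_counter;
}
```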

LGTM, but wait for Richard's approval.

Thanks,

Christophe

>  #define GO8(n) \
>GO(n##_0) \
> @@ -54,4 +54,5 @@ void f3(int i)
>}
>  }
>
> -/* { dg-final { scan-assembler "push.*lr" } } */
> +/* { dg-final { scan-assembler "\tpush.*lr" } } */
> +/* { dg-final { scan-assembler "\tbl\t\\.L\[0-9\]+\t@far jump" } } */
> --
> 2.25.1
>


Re: [PATCH] Add push/pop_function_decl

2024-11-08 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Fri, Nov 08, 2024 at 05:44:48PM +, Richard Sandiford wrote:
>> It's for https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667499.html 
>> ,
>> which needs to switch to the simd clone's chosen target (SVE) in order
>> to construct the correct types.  Currently the patch uses:
>> 
>> +  cl_target_option_save (&cur_target, &global_options, 
>> &global_options_set);
>> +  tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
>> +  cl_target_option_restore (&global_options, &global_options_set,
>> +TREE_TARGET_OPTION (new_target));
>> +  aarch64_override_options_internal (&global_options);
>> +  memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
>> +  sizeof (have_regs_of_mode));
>> +  for (int i = 0; i < NUM_MACHINE_MODES; ++i)
>> +if (aarch64_sve_mode_p ((machine_mode) i))
>> +  have_regs_of_mode[i] = true;
>> 
>> to switch in and:
>> 
>> +  /* Restore current options.  */
>> +  cl_target_option_restore (&global_options, &global_options_set, 
>> &cur_target);
>> +  aarch64_override_options_internal (&global_options);
>> +  memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
>> +  sizeof (have_regs_of_mode));
>> 
>> to switch back, but the idea is to replace that with:
>> 
>>   push_function_decl (node->decl);
>> 
>>   ...
>> 
>>   pop_function_decl ();
>
> Why do you need that?  Can't you just use opt_for_fn?

The function doesn't need to query the optimisations or target directly,
since it directly controls the things that it cares about.

The problem instead is that we need the SVE vector modes to be enabled
when doing the call to simd_clone_adjust_sve_vector_type.  (There might
be other reasons as well.  That's just the one I'm aware of.)

Richard


[PATCH V2 3/11] Do not allow -mvsx to boost processor to power7.

2024-11-08 Thread Michael Meissner
This patch restructures the code so that -mvsx, for example, will not silently
convert the processor to power7.  The user must now use -mcpu=power7 or higher.
This means that if the user passes -mvsx and the default processor does not
have VSX support, it is an error.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/rs6000.cc (report_architecture_mismatch): New function.
Report an error if the user used an option such as -mvsx when the
default processor would not allow the option.
(rs6000_option_override_internal): Move some ISA checking code into
report_architecture_mismatch.
---
 gcc/config/rs6000/rs6000.cc | 129 ++--
 1 file changed, 79 insertions(+), 50 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 8388542b721..a944ffde28a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1173,6 +1173,7 @@ const int INSN_NOT_AVAILABLE = -1;
 static void rs6000_print_isa_options (FILE *, int, const char *,
  HOST_WIDE_INT, HOST_WIDE_INT);
 static HOST_WIDE_INT rs6000_disable_incompatible_switches (void);
+static void report_architecture_mismatch (void);
 
 static enum rs6000_reg_type register_to_reg_type (rtx, bool *);
 static bool rs6000_secondary_reload_move (enum rs6000_reg_type,
@@ -3695,7 +3696,6 @@ rs6000_option_override_internal (bool global_init_p)
   bool ret = true;
 
   HOST_WIDE_INT set_masks;
-  HOST_WIDE_INT ignore_masks;
   int cpu_index = -1;
   int tune_index;
   struct cl_target_option *main_target_opt
@@ -3964,59 +3964,13 @@ rs6000_option_override_internal (bool global_init_p)
 dwarf_offset_size = POINTER_SIZE_UNITS;
 #endif
 
-  /* Handle explicit -mno-{altivec,vsx} and turn off all of
- the options that depend on those flags.  */
-  ignore_masks = rs6000_disable_incompatible_switches ();
-
-  /* For the newer switches (vsx, dfp, etc.) set some of the older options,
- unless the user explicitly used the -mno- to disable the code.  */
-  if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_MISC)
-rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_P9_MINMAX)
-{
-  if (cpu_index >= 0)
-   {
- if (cpu_index == PROCESSOR_POWER9)
-   {
- /* legacy behavior: allow -mcpu=power9 with certain
-capabilities explicitly disabled.  */
- rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks);
-   }
- else
-   error ("power9 target option is incompatible with %<%s=%> "
-  "for  less than power9", "-mcpu");
-   }
-  else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit)
-  != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags
-  & rs6000_isa_flags_explicit))
-   /* Enforce that none of the ISA_3_0_MASKS_SERVER flags
-  were explicitly cleared.  */
-   error ("%qs incompatible with explicitly disabled options",
-  "-mpower9-minmax");
-  else
-   rs6000_isa_flags |= ISA_3_0_MASKS_SERVER;
-}
-  else if (TARGET_P8_VECTOR || TARGET_POWER8 || TARGET_CRYPTO)
-rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_VSX)
-rs6000_isa_flags |= (ISA_2_6_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_POPCNTD)
-rs6000_isa_flags |= (ISA_2_6_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_DFP)
-rs6000_isa_flags |= (ISA_2_5_MASKS_SERVER & ~ignore_masks);
-  else if (TARGET_CMPB)
-rs6000_isa_flags |= (ISA_2_5_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_FPRND)
-rs6000_isa_flags |= (ISA_2_4_MASKS & ~ignore_masks);
-  else if (TARGET_POPCNTB)
-rs6000_isa_flags |= (ISA_2_2_MASKS & ~ignore_masks);
-  else if (TARGET_ALTIVEC)
-rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~ignore_masks);
+  /* Report trying to use things like -mmodulo to imply -mcpu=power9.  */
+  report_architecture_mismatch ();
 
   /* Disable VSX and Altivec silently if the user switched cpus to power7 in a
  target attribute or pragma which automatically enables both options,
  unless the altivec ABI was set.  This is set by default for 64-bit, but
- not for 32-bit.  Don't move this before the above code using ignore_masks,
+ not for 32-bit.  Don't move this before report_architecture_mismatch
  since it can reset the cleared VSX/ALTIVEC flag again.  */

[PATCH V2 4/11] Change TARGET_POPCNTB to TARGET_POWER5

2024-11-08 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of
TARGET_POPCNTB to TARGET_POWER5.  The POPCNTB instruction was added in ISA 2.02
(power5).
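For reference, popcntb counts the set bits within each byte of the doubleword independently; a portable sketch of that semantics (my illustration, not rs6000 backend code):

```cpp
#include <cstdint>

// popcntb-like semantics: each byte of the result holds the population
// count (0..8) of the corresponding byte of the input.
static uint64_t
popcntb (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; ++i)
    {
      uint8_t b = (x >> (8 * i)) & 0xff;
      uint8_t c = 0;
      while (b)            // count bits in this byte
        {
          c += b & 1;
          b >>= 1;
        }
      r |= (uint64_t) c << (8 * i);
    }
  return r;
}
```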

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER5 instead of TARGET_POPCNTB.
* config/rs6000/rs6000.h (TARGET_EXTRA_BUILTINS): Use TARGET_POWER5
instead of TARGET_POPCNTB.  Eliminate TARGET_CMPB and TARGET_POPCNTD
tests since TARGET_POWER5 will always be true for those tests.
(TARGET_FRE): Use TARGET_POWER5 instead of TARGET_POPCNTB.
(TARGET_FRSQRTES): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
(popcount<mode>2): Use TARGET_POWER5 instead of TARGET_POPCNTB.  Drop
the test for TARGET_POPCNTD (i.e. power7), since TARGET_POPCNTB will
always be set if TARGET_POPCNTD is set.
(popcntb<mode>2): Use TARGET_POWER5 instead of TARGET_POPCNTB.
(parity<mode>2): Likewise.
(parity<mode>2_cmpb): Remove the TARGET_POPCNTB test, since it will
always be true when TARGET_CMPB (i.e. power6) is set.
---
 gcc/config/rs6000/rs6000-builtin.cc |  2 +-
 gcc/config/rs6000/rs6000.h  |  8 +++-
 gcc/config/rs6000/rs6000.md | 10 +-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 9bdbae1ecf9..98a0545030c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -155,7 +155,7 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
 case ENB_ALWAYS:
   return true;
 case ENB_P5:
-  return TARGET_POPCNTB;
+  return TARGET_POWER5;
 case ENB_P6:
   return TARGET_CMPB;
 case ENB_P6_64:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 7ad8baca177..4500724d895 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -547,9 +547,7 @@ extern int rs6000_vector_align[];
 
 #define TARGET_EXTRA_BUILTINS  (TARGET_POWERPC64\
 || TARGET_PPC_GPOPT /* 970/power4 */\
-|| TARGET_POPCNTB   /* ISA 2.02 */  \
-|| TARGET_CMPB  /* ISA 2.05 */  \
-|| TARGET_POPCNTD   /* ISA 2.06 */  \
+|| TARGET_POWER5/* ISA 2.02 & above */ \
 || TARGET_ALTIVEC   \
 || TARGET_VSX   \
 || TARGET_HARD_FLOAT)
@@ -563,9 +561,9 @@ extern int rs6000_vector_align[];
 #define TARGET_FRES(TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT)
 
 #define TARGET_FRE (TARGET_HARD_FLOAT \
-&& (TARGET_POPCNTB || VECTOR_UNIT_VSX_P (DFmode)))
+&& (TARGET_POWER5 || VECTOR_UNIT_VSX_P (DFmode)))
 
-#define TARGET_FRSQRTES(TARGET_HARD_FLOAT && TARGET_POPCNTB \
+#define TARGET_FRSQRTES(TARGET_HARD_FLOAT && TARGET_POWER5 \
 && TARGET_PPC_GFXOPT)
 
 #define TARGET_FRSQRTE (TARGET_HARD_FLOAT \
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8eda2f7bb0d..10d13bf812d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -379,7 +379,7 @@ (define_attr "enabled" ""
  (const_int 1)
 
  (and (eq_attr "isa" "p5")
- (match_test "TARGET_POPCNTB"))
+ (match_test "TARGET_POWER5"))
  (const_int 1)
 
  (and (eq_attr "isa" "p6")
@@ -2510,7 +2510,7 @@ (define_expand "ffs<mode>2"
(define_expand "popcount<mode>2"
   [(set (match_operand:GPR 0 "gpc_reg_operand")
(popcount:GPR (match_operand:GPR 1 "gpc_reg_operand")))]
-  "TARGET_POPCNTB || TARGET_POPCNTD"
+  "TARGET_POWER5"
 {
   rs6000_emit_popcount (operands[0], operands[1]);
   DONE;
@@ -2520,7 +2520,7 @@ (define_insn "popcntb<mode>2"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")]
UNSPEC_POPCNTB))]
-  "TARGET_POPCNTB"
+  "TARGET_POWER5"
   "popcntb %0,%1"
   [(set_attr "type" "popcnt")])
 
@@ -2535,7 +2535,7 @@ (define_insn "popcntd<mode>2"
(define_expand "parity<mode>2"
   [(set (match_operand:GPR 0 "gpc_reg_operand")
(parity:GPR (match_operand:GPR 1 "gpc_reg_operand")))]
-  "TARGE

[PATCH V2 5/11] Change TARGET_FPRND to TARGET_POWER5X

2024-11-08 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of
TARGET_FPRND to TARGET_POWER5X.  The FPRND instruction was added in power5+.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/rs6000.cc (report_architecture_mismatch): Use
TARGET_POWER5X instead of TARGET_FPRND.
* config/rs6000/rs6000.md (fmod<mode>3): Use TARGET_POWER5X instead of
TARGET_FPRND.
(remainder<mode>3): Likewise.
(fctiwuz_<mode>): Likewise.
(btrunc<mode>2): Likewise.
(ceil<mode>2): Likewise.
(floor<mode>2): Likewise.
(round<mode>2): Likewise.
---
 gcc/config/rs6000/rs6000.cc |  2 +-
 gcc/config/rs6000/rs6000.md | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a944ffde28a..dd51d75c495 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -25428,7 +25428,7 @@ report_architecture_mismatch (void)
 rs6000_isa_flags |= (ISA_2_5_MASKS_SERVER & ~ignore_masks);
   else if (TARGET_CMPB)
 rs6000_isa_flags |= (ISA_2_5_MASKS_EMBEDDED & ~ignore_masks);
-  else if (TARGET_FPRND)
+  else if (TARGET_POWER5X)
 rs6000_isa_flags |= (ISA_2_4_MASKS & ~ignore_masks);
   else if (TARGET_POPCNTB)
 rs6000_isa_flags |= (ISA_2_2_MASKS & ~ignore_masks);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 10d13bf812d..7f9fe609a03 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5171,7 +5171,7 @@ (define_expand "fmod<mode>3"
(use (match_operand:SFDF 1 "gpc_reg_operand"))
(use (match_operand:SFDF 2 "gpc_reg_operand"))]
   "TARGET_HARD_FLOAT
-   && TARGET_FPRND
+   && TARGET_POWER5X
&& flag_unsafe_math_optimizations"
 {
   rtx div = gen_reg_rtx (mode);
@@ -5189,7 +5189,7 @@ (define_expand "remainder<mode>3"
(use (match_operand:SFDF 1 "gpc_reg_operand"))
(use (match_operand:SFDF 2 "gpc_reg_operand"))]
   "TARGET_HARD_FLOAT
-   && TARGET_FPRND
+   && TARGET_POWER5X
&& flag_unsafe_math_optimizations"
 {
   rtx div = gen_reg_rtx (mode);
@@ -6687,7 +6687,7 @@ (define_insn "fctiwuz_<mode>"
 (define_insn "*friz"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=d,wa")
(float:DF (fix:DI (match_operand:DF 1 "gpc_reg_operand" "d,wa"))))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND
+  "TARGET_HARD_FLOAT && TARGET_POWER5X
&& flag_unsafe_math_optimizations && !flag_trapping_math && TARGET_FRIZ"
   "@
friz %0,%1
@@ -6815,7 +6815,7 @@ (define_insn "btrunc<mode>2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIZ))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
friz %0,%1
xsrdpiz %x0,%x1"
@@ -6825,7 +6825,7 @@ (define_insn "ceil<mode>2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIP))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
frip %0,%1
xsrdpip %x0,%x1"
@@ -6835,7 +6835,7 @@ (define_insn "floor<mode>2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
 UNSPEC_FRIM))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "@
frim %0,%1
xsrdpim %x0,%x1"
@@ -6846,7 +6846,7 @@ (define_insn "round<mode>2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "")]
 UNSPEC_FRIN))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT && TARGET_POWER5X"
   "frin %0,%1"
   [(set_attr "type" "fp")])
 
-- 
2.47.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH V2 8/11] Change TARGET_MODULO to TARGET_POWER9

2024-11-08 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of
TARGET_MODULO to TARGET_POWER9.  The modulo instructions were added in power9
(ISA 3.0).  Note, I did not change the uses of TARGET_MODULO where it was explicitly
generating different code if the machine had a modulo instruction.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Use
TARGET_POWER9 instead of TARGET_MODULO.
* config/rs6000/rs6000.h (TARGET_CTZ): Likewise.
(TARGET_EXTSWSLI): Likewise.
(TARGET_MADDLD): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
---
 gcc/config/rs6000/rs6000-builtin.cc | 4 ++--
 gcc/config/rs6000/rs6000.h  | 6 +++---
 gcc/config/rs6000/rs6000.md | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index dae43b672ea..b6093b3cb64 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -169,9 +169,9 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
 case ENB_P8V:
   return TARGET_P8_VECTOR;
 case ENB_P9:
-  return TARGET_MODULO;
+  return TARGET_POWER9;
 case ENB_P9_64:
-  return TARGET_MODULO && TARGET_POWERPC64;
+  return TARGET_POWER9 && TARGET_POWERPC64;
 case ENB_P9V:
   return TARGET_P9_VECTOR;
 case ENB_P10:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3a03c32f222..89ca1bad80f 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -461,9 +461,9 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIWUZ TARGET_POWER7
 /* Only powerpc64 and powerpc476 support fctid.  */
 #define TARGET_FCTID   (TARGET_POWERPC64 || rs6000_cpu == PROCESSOR_PPC476)
-#define TARGET_CTZ TARGET_MODULO
-#define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
-#define TARGET_MADDLD  TARGET_MODULO
+#define TARGET_CTZ TARGET_POWER9
+#define TARGET_EXTSWSLI(TARGET_POWER9 && TARGET_POWERPC64)
+#define TARGET_MADDLD  TARGET_POWER9
 
/* TARGET_DIRECT_MOVE is redundant to TARGET_P8_VECTOR, so alias it to that.  */
 #define TARGET_DIRECT_MOVE TARGET_P8_VECTOR
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bff898a4eff..fc0d454e9a4 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -403,7 +403,7 @@ (define_attr "enabled" ""
  (const_int 1)
 
  (and (eq_attr "isa" "p9")
- (match_test "TARGET_MODULO"))
+ (match_test "TARGET_POWER9"))
  (const_int 1)
 
  (and (eq_attr "isa" "p9v")
-- 
2.47.0




[PATCH V2 9/11] Update tests to work with architecture flags changes.

2024-11-08 Thread Michael Meissner
Two tests used -mvsx to raise the processor level to at least power7.  These
tests were rewritten to add cpu=power7 support.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

gcc/testsuite/

* gcc.target/powerpc/ppc-target-4.c: Rewrite the test to add cpu=power7
when we need to add VSX support.  Add test for adding cpu=power7 no-vsx
to generate only Altivec instructions.
* gcc.target/powerpc/pr115688.c: Add cpu=power7 when requesting VSX
instructions.
---
 .../gcc.target/powerpc/ppc-target-4.c | 38 ++-
 gcc/testsuite/gcc.target/powerpc/pr115688.c   |  3 +-
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
index feef76db461..5e2ecf34f24 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
@@ -2,7 +2,7 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_fprs } */
/* { dg-options "-O2 -ffast-math -mdejagnu-cpu=power5 -mno-altivec -mabi=altivec -fno-unroll-loops" } */
-/* { dg-final { scan-assembler-times "vaddfp" 1 } } */
+/* { dg-final { scan-assembler-times "vaddfp" 2 } } */
 /* { dg-final { scan-assembler-times "xvaddsp" 1 } } */
 /* { dg-final { scan-assembler-times "fadds" 1 } } */
 
@@ -18,10 +18,6 @@
 #error "__VSX__ should not be defined."
 #endif
 
-#pragma GCC target("altivec,vsx")
-#include 
-#pragma GCC reset_options
-
 #pragma GCC push_options
 #pragma GCC target("altivec,no-vsx")
 
@@ -33,6 +29,7 @@
 #error "__VSX__ should not be defined."
 #endif
 
+/* Altivec build, generate vaddfp.  */
 void
 av_add (vector float *a, vector float *b, vector float *c)
 {
@@ -40,10 +37,11 @@ av_add (vector float *a, vector float *b, vector float *c)
   unsigned long n = SIZE / 4;
 
   for (i = 0; i < n; i++)
-a[i] = vec_add (b[i], c[i]);
+a[i] = b[i] + c[i];
 }
 
-#pragma GCC target("vsx")
+/* cpu=power7 must be used to enable VSX.  */
+#pragma GCC target("cpu=power7,vsx")
 
 #ifndef __ALTIVEC__
 #error "__ALTIVEC__ should be defined."
@@ -53,6 +51,7 @@ av_add (vector float *a, vector float *b, vector float *c)
 #error "__VSX__ should be defined."
 #endif
 
+/* VSX build on power7, generate xvaddsp.  */
 void
 vsx_add (vector float *a, vector float *b, vector float *c)
 {
@@ -60,11 +59,31 @@ vsx_add (vector float *a, vector float *b, vector float *c)
   unsigned long n = SIZE / 4;
 
   for (i = 0; i < n; i++)
-a[i] = vec_add (b[i], c[i]);
+a[i] = b[i] + c[i];
+}
+
+#pragma GCC target("cpu=power7,no-vsx")
+
+#ifndef __ALTIVEC__
+#error "__ALTIVEC__ should be defined."
+#endif
+
+#ifdef __VSX__
+#error "__VSX__ should not be defined."
+#endif
+
+/* Altivec build on power7 with no VSX, generate vaddfp.  */
+void
+av2_add (vector float *a, vector float *b, vector float *c)
+{
+  unsigned long i;
+  unsigned long n = SIZE / 4;
+
+  for (i = 0; i < n; i++)
+a[i] = b[i] + c[i];
 }
 
 #pragma GCC pop_options
-#pragma GCC target("no-vsx,no-altivec")
 
 #ifdef __ALTIVEC__
 #error "__ALTIVEC__ should not be defined."
@@ -74,6 +93,7 @@ vsx_add (vector float *a, vector float *b, vector float *c)
 #error "__VSX__ should not be defined."
 #endif
 
+/* Default power5 build, generate scalar fadds.  */
 void
 norm_add (float *a, float *b, float *c)
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr115688.c b/gcc/testsuite/gcc.target/powerpc/pr115688.c
index 5222e66ef17..00c7c301436 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr115688.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr115688.c
@@ -7,7 +7,8 @@
 
 /* Verify there is no ICE under 32 bit env.  */
 
-__attribute__((target("vsx")))
+/* cpu=power7 must be used to enable VSX.  */
+__attribute__((target("cpu=power7,vsx")))
 int test (void)
 {
   return 0;
-- 
2.47.0




[PATCH V2 10/11] Add support for -mcpu=future

2024-11-08 Thread Michael Meissner
This patch adds the support that can be used in developing GCC support for
future PowerPC processors.

2024-11-06  Michael Meissner  

* config.gcc (powerpc*-*-*): Add support for --with-cpu=future.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/rs6000-arch.def: Add future cpu.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): If
-mcpu=future, define _ARCH_FUTURE.
* config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
(future cpu): Define.
* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_FUTURE.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (power10_cost): Update comment.
(get_arch_flags): Add support for future processor.
(rs6000_option_override_internal): Likewise.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
(TARGET_POWER11): New macro.
* config/rs6000/rs6000.md (cpu attribute): Likewise.
---
 gcc/config.gcc  |  4 ++--
 gcc/config/rs6000/aix71.h   |  1 +
 gcc/config/rs6000/aix72.h   |  1 +
 gcc/config/rs6000/aix73.h   |  1 +
 gcc/config/rs6000/driver-rs6000.cc  |  2 ++
 gcc/config/rs6000/rs6000-arch.def   |  1 +
 gcc/config/rs6000/rs6000-c.cc   |  2 ++
 gcc/config/rs6000/rs6000-cpus.def   |  3 +++
 gcc/config/rs6000/rs6000-opts.h |  1 +
 gcc/config/rs6000/rs6000-tables.opt | 11 ++
 gcc/config/rs6000/rs6000.cc | 34 ++---
 gcc/config/rs6000/rs6000.h  |  2 ++
 gcc/config/rs6000/rs6000.md |  2 +-
 13 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fd848228722..d552d01b439 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -539,7 +539,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
-   xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
+   xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500|xfuture)
cpu_is_64bit=yes
;;
esac
@@ -5647,7 +5647,7 @@ case "${target}" in
tm_defines="${tm_defines} CONFIG_PPC405CR"
eval "with_$which=405"
;;
-   "" | common | native \
+   "" | common | native | future \
| power[3456789] | power1[01] | power5+ | power6x \
| powerpc | powerpc64 | powerpc64le \
| rs64 \
diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 41037b3852d..570ddcc451d 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -79,6 +79,7 @@ do {  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/aix72.h b/gcc/config/rs6000/aix72.h
index fe59f8319b4..242ca94bd06 100644
--- a/gcc/config/rs6000/aix72.h
+++ b/gcc/config/rs6000/aix72.h
@@ -79,6 +79,7 @@ do {  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/aix73.h b/gcc/config/rs6000/aix73.h
index 1318b0b3662..2bd6b4bb3c4 100644
--- a/gcc/config/rs6000/aix73.h
+++ b/gcc/config/rs6000/aix73.h
@@ -79,6 +79,7 @@ do {  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/driver-rs6000.cc b/gcc/config/rs6000/driver-rs6000.cc
index f4900724b98..c47e8b70c93 100644
--- a/gcc/config/rs6000/driver-rs6000.cc
+++ b/gcc/config/rs6000/driver-rs6000.cc
@@ -452,6 +452,7 @@ static const struct asm_name asm_names[] = {
   { "power9",  "-mpwr9" },

[PATCH V2 11/11] Add -mcpu=future tuning support.

2024-11-08 Thread Michael Meissner
This patch makes -mtune=future use the same tuning decision as -mtune=power11.

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/power10.md (all reservations): Add future as an
alternative to power10 and power11.
---
 gcc/config/rs6000/power10.md | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index 2310c460345..e42b057dc45 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -1,4 +1,4 @@
-;; Scheduling description for the IBM Power10 and Power11 processors.
+;; Scheduling description for the IBM Power10, Power11, and Future processors.
 ;; Copyright (C) 2020-2024 Free Software Foundation, Inc.
 ;;
 ;; Contributed by Pat Haugen (pthau...@us.ibm.com).
@@ -97,12 +97,12 @@ (define_insn_reservation "power10-load" 4
(eq_attr "update" "no")
(eq_attr "size" "!128")
(eq_attr "prefixed" "no")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-fused-load" 4
   (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-load" 4
@@ -110,13 +110,13 @@ (define_insn_reservation "power10-prefixed-load" 4
(eq_attr "update" "no")
(eq_attr "size" "!128")
(eq_attr "prefixed" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-load-update" 4
   (and (eq_attr "type" "load")
(eq_attr "update" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-fpload-double" 4
@@ -124,7 +124,7 @@ (define_insn_reservation "power10-fpload-double" 4
(eq_attr "update" "no")
(eq_attr "size" "64")
(eq_attr "prefixed" "no")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 (define_insn_reservation "power10-prefixed-fpload-double" 4
@@ -132,14 +132,14 @@ (define_insn_reservation "power10-prefixed-fpload-double" 4
(eq_attr "update" "no")
(eq_attr "size" "64")
(eq_attr "prefixed" "yes")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-double" 4
   (and (eq_attr "type" "fpload")
(eq_attr "update" "yes")
(eq_attr "size" "64")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 ; SFmode loads are cracked and have additional 3 cycles over DFmode
@@ -148,27 +148,27 @@ (define_insn_reservation "power10-fpload-single" 7
   (and (eq_attr "type" "fpload")
(eq_attr "update" "no")
(eq_attr "size" "32")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10")
 
 (define_insn_reservation "power10-fpload-update-single" 7
   (and (eq_attr "type" "fpload")
(eq_attr "update" "yes")
(eq_attr "size" "32")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 (define_insn_reservation "power10-vecload" 4
   (and (eq_attr "type" "vecload")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,LU_power10")
 
 ; lxvp
 (define_insn_reservation "power10-vecload-pair" 4
   (and (eq_attr "type" "vecload")
(eq_attr "size" "256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,LU_power10+SXU_power10")
 
 ; Store Unit
@@ -178,12 +178,12 @@ (define_insn_reservation "power10-store" 0
(eq_attr "prefixed" "no")
(eq_attr "size" "!128")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_any_power10,STU_power10")
 
 (define_insn_reservation "power10-fused-store" 0
   (and (eq_attr "type" "fused_store_store")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
   "DU_even_power10,STU_power10")
 
 (define_insn_reservation "power10-prefixed-store" 0
@@ -191,52 +191,52 @@ (define_insn_reservation "power10-prefixed-store" 0
(eq_attr "prefixed" "yes")
(eq_attr "size" "!128")
(eq_attr "size" "!256")
-   (eq_attr "cpu" "power10,power11"))
+   (eq_attr "cpu" "power10,power11,future"))
 

[PATCH 0/12] libstdc++: Refactor _Hashtable class

2024-11-08 Thread Jonathan Wakely
This patch series attempts to remove some unnecessary complexity in the
internals of std::unordered_xxx containers. There is a lot of overloading, tag
dispatching, and inheritance that can be removed by using modern C++ features
(with appropriate pragmas to disable warnings for older -std modes).

Most of these commits were already pushed to
https://forge.sourceware.org/gcc/gcc-TEST/pulls/15 which was linked to from
https://inbox.sourceware.org/libstdc++/zyvhtyt8kquki...@zen.kayari.org/ but
this is the first time I've posted most of the patches to gcc-patches.

Since first linking to the forge pull request a week ago, I've pushed a couple
of the smaller, independent changes that didn't depend on the rest of the
series, made changes to the 03/12 "Refactor _Hashtable insertion" commit, and
added the 12/12 "Add _Hashtable::_M_locate(const key_type&)" commit.

N.B. Patch 11/12 was previously posted as
https://inbox.sourceware.org/gcc-patches/cacb0b4nmy3fxujhjbkhiu4innxzn_vzmrmm1gzzqdz4gte5...@mail.gmail.com/T/#t
but I've included it again in this series, because otherwise the last patch in
the series won't apply cleanly and CI checks will fail.

Tested powerpc64le-linux and x86_64-linux.

Diffstat for the headers (where reducing the amount of code is good):

 libstdc++-v3/include/bits/hashtable.h| 1119 ++
 libstdc++-v3/include/bits/hashtable_policy.h |  504 +-
 libstdc++-v3/include/bits/unordered_map.h|   19 +-
 libstdc++-v3/include/bits/unordered_set.h|   19 +-
 4 files changed, 722 insertions(+), 939 deletions(-)

Diffstat for the tests (where adding code is good):

 libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc                      |  21 +++
 libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/merge.cc            | 130 +++
 libstdc++-v3/testsuite/23_containers/unordered_multimap/modifiers/merge.cc       | 119 +
 libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/move_assign.cc |   5 +--
 libstdc++-v3/testsuite/23_containers/unordered_multiset/modifiers/merge.cc       | 121 ++
 libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc                      |  14 
 libstdc++-v3/testsuite/23_containers/unordered_set/allocator/move_assign.cc      |  10 +++---
 libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/merge.cc            | 128 +
 8 files changed, 528 insertions(+), 20 deletions(-)






[PATCH 11/12] libstdc++: Simplify _Hashtable merge functions

2024-11-08 Thread Jonathan Wakely
I realised that _M_merge_unique and _M_merge_multi call extract(iter)
which then has to call _M_get_previous_node to iterate through the
bucket to find the node before the one iter points to. Since the merge
function is already iterating over the entire container, we had the
previous node a moment ago. Walking the whole bucket to find it again is
wasteful. We could just rewrite the loop in terms of node pointers
instead of iterators, and then call _M_extract_node directly. However,
this is only possible when the source container is the same type as the
destination, because otherwise we can't access the source's private
members (_M_before_begin, _M_begin, _M_extract_node etc.)

Add overloads of _M_merge_unique and _M_merge_multi that work with
source containers of the same type, to enable this optimization.

For both overloads of _M_merge_unique we can also remove the conditional
modifications to __n_elt and just consistently decrement it for every
element processed. Use a multiplier of one or zero that dictates whether
__n_elt is passed to _M_insert_unique_node or not. We can also remove
the repeated calls to size() and just keep track of the size in a local
variable.

Although _M_merge_unique and _M_merge_multi should be safe for
"self-merge", i.e. when doing c.merge(c), it's wasteful to search/insert
every element when we don't need to do anything. Add 'this == &source'
checks to the overloads taking an lvalue of the container's own type.
Because those checks aren't needed for the rvalue overloads, change
those to call the underlying _M_merge_xxx function directly instead of
going through the lvalue overload that checks the address.

I've also added more extensive tests for better coverage of the new
overloads added in this commit.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_M_merge_unique): Add overload for
merging from same type.
(_M_merge_unique): Simplify size tracking. Add
comment.
(_M_merge_multi): Add overload for merging from same type.
(_M_merge_multi): Add comment.
* include/bits/unordered_map.h (unordered_map::merge): Check for
self-merge in the lvalue overload. Call _M_merge_unique directly
for the rvalue overload.
(unordered_multimap::merge): Likewise.
* include/bits/unordered_set.h (unordered_set::merge): Likewise.
(unordered_multiset::merge): Likewise.
* testsuite/23_containers/unordered_map/modifiers/merge.cc:
Add more tests.
* testsuite/23_containers/unordered_multimap/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_set/modifiers/merge.cc:
Likewise.
---
 libstdc++-v3/include/bits/hashtable.h | 118 
 libstdc++-v3/include/bits/unordered_map.h |  19 ++-
 libstdc++-v3/include/bits/unordered_set.h |  19 ++-
 .../unordered_map/modifiers/merge.cc  | 130 ++
 .../unordered_multimap/modifiers/merge.cc | 119 
 .../unordered_multiset/modifiers/merge.cc | 121 
 .../unordered_set/modifiers/merge.cc  | 128 +
 7 files changed, 626 insertions(+), 28 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 7b0a684a2d2..aca431ae216 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1212,6 +1212,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return __nh;
   }
 
+  /// Merge from another container of the same type.
+  void
+  _M_merge_unique(_Hashtable& __src)
+  {
+   __glibcxx_assert(get_allocator() == __src.get_allocator());
+
+   auto __size = size();
+   auto __n_elt = __src.size();
+   size_type __first = 1;
+   // For a container of identical type we can use its private members.
+   auto __p = static_cast<__node_ptr>(&__src._M_before_begin);
+   while (__n_elt--)
+ {
+   const auto __prev = __p;
+   __p = __p->_M_next();
+   const auto& __node = *__p;
+   const key_type& __k = _ExtractKey{}(__node._M_v());
+   if (__size <= __small_size_threshold())
+ {
+   auto __n = _M_begin();
+   for (; __n; __n = __n->_M_next())
+ if (this->_M_key_equals(__k, *__n))
+   break;
+   if (__n)
+ continue;
+ }
+
+   __hash_code __code
+ = _M_src_hash_code(__src.hash_function(), __k, __node);
+   size_type __bkt = _M_bucket_index(__code);
+   if (__size > __small_size_threshold())
+ if (_M_find_node(__bkt, __k, __code) != nullptr)
+   continue;
+
+   __hash_code __src_code = __src.hash_function()(__k);
+   size_type __src_bkt = __src._M_bucket_index(__src_code);
+   

[PATCH] AArch64: Remove duplicated addr_cost tables

2024-11-08 Thread Wilco Dijkstra

Remove duplicated addr_cost tables - use generic_armv9_a_addrcost_table for
Armv9-a cores and generic_armv8_a_addrcost_table for recent Armv8-a cores.
No changes in generated code.

OK for commit?

gcc/ChangeLog:

* config/aarch64/tuning_models/cortexx925.h 
(cortexx925_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversen1.h: Use 
generic_armv8_a_addrcost_table.
* config/aarch64/tuning_models/neoversen2.h 
(neoversen2_addrcost_table): Remove. 
* config/aarch64/tuning_models/neoversen3.h 
(neoversen3_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversev2.h 
(neoversev2_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversev3.h 
(neoversev3_addrcost_table): Remove.
* config/aarch64/tuning_models/neoversev3ae.h 
(neoversev3ae_addrcost_table): Remove.

---

diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h 
b/gcc/config/aarch64/tuning_models/cortexx925.h
index 
89aa353669937f1a5e8cffae7c3d49044562cfd7..eb9b89984b0472858bc08dba924c962ec4ba53bd
 100644
--- a/gcc/config/aarch64/tuning_models/cortexx925.h
+++ b/gcc/config/aarch64/tuning_models/cortexx925.h
@@ -22,24 +22,6 @@
 
 #include "generic.h"
 
-static const struct cpu_addrcost_table cortexx925_addrcost_table =
-{
-{
-  1, /* hi  */
-  0, /* si  */
-  0, /* di  */
-  1, /* ti  */
-},
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  2, /* post_modify_ld3_st3  */
-  2, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
 static const struct cpu_regmove_cost cortexx925_regmove_cost =
 {
   3, /* GP2GP  */
@@ -209,7 +191,7 @@ static const struct cpu_vector_cost cortexx925_vector_cost =
 static const struct tune_params cortexx925_tunings =
 {
   &cortexa76_extra_costs,
-  &cortexx925_addrcost_table,
+  &generic_armv9_a_addrcost_table,
   &cortexx925_regmove_cost,
   &cortexx925_vector_cost,
   &generic_branch_cost,
diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h 
b/gcc/config/aarch64/tuning_models/neoversen1.h
index 
a09b684fcdb0e558c87e3f6c17c6c4f359cca51c..82def6b2736df8162d9b606440d260c951f3ef99
 100644
--- a/gcc/config/aarch64/tuning_models/neoversen1.h
+++ b/gcc/config/aarch64/tuning_models/neoversen1.h
@@ -25,7 +25,7 @@
 static const struct tune_params neoversen1_tunings =
 {
   &cortexa76_extra_costs,
-  &generic_addrcost_table,
+  &generic_armv8_a_addrcost_table,
   &generic_regmove_cost,
   &cortexa57_vector_cost,
   &generic_branch_cost,
diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h 
b/gcc/config/aarch64/tuning_models/neoversen2.h
index 
dd175b75557b28c485b3e27d7a50c50600f367a5..18199ac206c6cbfcef8695b497401b78a8f77f38
 100644
--- a/gcc/config/aarch64/tuning_models/neoversen2.h
+++ b/gcc/config/aarch64/tuning_models/neoversen2.h
@@ -22,24 +22,6 @@
 
 #include "generic.h"
 
-static const struct cpu_addrcost_table neoversen2_addrcost_table =
-{
-{
-  1, /* hi  */
-  0, /* si  */
-  0, /* di  */
-  1, /* ti  */
-},
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  2, /* post_modify_ld3_st3  */
-  2, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
 static const struct cpu_regmove_cost neoversen2_regmove_cost =
 {
   1, /* GP2GP  */
@@ -209,7 +191,7 @@ static const struct cpu_vector_cost neoversen2_vector_cost =
 static const struct tune_params neoversen2_tunings =
 {
   &cortexa76_extra_costs,
-  &neoversen2_addrcost_table,
+  &generic_armv9_a_addrcost_table,
   &neoversen2_regmove_cost,
   &neoversen2_vector_cost,
   &generic_branch_cost,
diff --git a/gcc/config/aarch64/tuning_models/neoversen3.h 
b/gcc/config/aarch64/tuning_models/neoversen3.h
index 
e510c8f09f781b9fafb59088e90cfd5dea43cc75..4da85cfac0d185a5d59439f6d19d90ace0354e8f
 100644
--- a/gcc/config/aarch64/tuning_models/neoversen3.h
+++ b/gcc/config/aarch64/tuning_models/neoversen3.h
@@ -22,24 +22,6 @@
 
 #include "generic.h"
 
-static const struct cpu_addrcost_table neoversen3_addrcost_table =
-{
-{
-  1, /* hi  */
-  0, /* si  */
-  0, /* di  */
-  1, /* ti  */
-},
-  0, /* pre_modify  */
-  0, /* post_modify  */
-  2, /* post_modify_ld3_st3  */
-  2, /* post_modify_ld4_st4  */
-  0, /* register_offset  */
-  0, /* register_sextend  */
-  0, /* register_zextend  */
-  0 /* imm_offset  */
-};
-
 static const struct cpu_regmove_cost neoversen3_regmove_cost =
 {
   3, /* GP2GP  */
@@ -209,7 +191,7 @@ static const struct cpu_vector_cost neoversen3_vector_cost =
 static const struct tune_params neoversen3_tunings =
 {
   &cortexa76_extra_costs,
-  &neoversen3_addrcost_table,
+  &generic_armv9_a_addrcost_table,
   &neoversen3_regmove_cost,
   &neoversen3_vector_cost,
   &generic_branch_cost,
diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h 
b/gcc/config/aarch64/tuning_models/neoversev2.h
index 
b2aca79b9cef

Re: [PATCH v2] arm: Don't ICE on arm_mve.h pragma without MVE types [PR117408]

2024-11-08 Thread Christophe Lyon
On Thu, 7 Nov 2024 at 18:05, Torbjörn SVENSSON
 wrote:
>
> Changes since v1:
>
> - Updated the error message to mention that arm_mve_types.h needs to be
>   included.
> - Corrected some spelling errors in commit message.
>
> As the warning for pure functions returning void is not related to this
> patch, I'll leave it for you, Christophe, to look into. :)
>
> Ok for trunk and releases/gcc-14?
>
OK,

Thanks

Christophe

> --
>
> Starting with r14-435-g00d97bf3b5a, doing `#pragma arm "arm_mve.h"
> false` or `#pragma arm "arm_mve.h" true` without first doing
> `#pragma arm "arm_mve_types.h"` causes GCC to ICE.
>
> gcc/ChangeLog:
>
>   PR target/117408
>   * config/arm/arm-mve-builtins.cc (handle_arm_mve_h): Detect if MVE
>   types are missing and if so, report an error.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/117408
>   * gcc.target/arm/mve/pr117408-1.c: New test.
>   * gcc.target/arm/mve/pr117408-2.c: Likewise.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/config/arm/arm-mve-builtins.cc| 7 +++
>  gcc/testsuite/gcc.target/arm/mve/pr117408-1.c | 7 +++
>  gcc/testsuite/gcc.target/arm/mve/pr117408-2.c | 7 +++
>  3 files changed, 21 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr117408-2.c
>
> diff --git a/gcc/config/arm/arm-mve-builtins.cc 
> b/gcc/config/arm/arm-mve-builtins.cc
> index af1908691b6..ed3d6000641 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -535,6 +535,13 @@ handle_arm_mve_h (bool preserve_user_namespace)
>return;
>  }
>
> +  if (!handle_arm_mve_types_p)
> +{
> +  error ("this definition requires MVE types, please include %qs",
> +"arm_mve_types.h");
> +  return;
> +}
> +
>/* Define MVE functions.  */
>function_table = new hash_table (1023);
>function_builder builder;
> diff --git a/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c 
> b/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
> new file mode 100644
> index 000..25eaf67e297
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/pr117408-1.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +
> +/* It doesn't really matter if this produces errors missing types,
> +  but it mustn't trigger an ICE.  */
> +#pragma GCC arm "arm_mve.h" false /* { dg-error "this definition requires 
> MVE types, please include 'arm_mve_types.h'" } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c 
> b/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c
> new file mode 100644
> index 000..c3a0af25f77
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/pr117408-2.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +
> +/* It doesn't really matter if this produces errors missing types,
> +  but it mustn't trigger an ICE.  */
> +#pragma GCC arm "arm_mve.h" true /* { dg-error "this definition requires MVE 
> types, please include 'arm_mve_types.h'" } */
> --
> 2.25.1
>


Re: [PATCH] Add push/pop_function_decl

2024-11-08 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 08/11/2024 12:25, Richard Sandiford wrote:
>> For the aarch64 simd clones patches, it would be useful to be able to
>> push a function declaration onto the cfun stack, even though it has no
>> function body associated with it.  That is, we want cfun to be null,
>> current_function_decl to be the decl itself, and the target and
>> optimisation flags to reflect the declaration.
>
> What do the simd_clone patches do? Just curious.

It's for https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667499.html ,
which needs to switch to the simd clone's chosen target (SVE) in order
to construct the correct types.  Currently the patch uses:

+  cl_target_option_save (&cur_target, &global_options, 
&global_options_set);
+  tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
+  cl_target_option_restore (&global_options, &global_options_set,
+   TREE_TARGET_OPTION (new_target));
+  aarch64_override_options_internal (&global_options);
+  memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
+ sizeof (have_regs_of_mode));
+  for (int i = 0; i < NUM_MACHINE_MODES; ++i)
+   if (aarch64_sve_mode_p ((machine_mode) i))
+ have_regs_of_mode[i] = true;

to switch in and:

+  /* Restore current options.  */
+  cl_target_option_restore (&global_options, &global_options_set, 
&cur_target);
+  aarch64_override_options_internal (&global_options);
+  memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
+ sizeof (have_regs_of_mode));

to switch back, but the idea is to replace that with:

  push_function_decl (node->decl);

  ...

  pop_function_decl ();

Richard


[PATCH] testsuite: arm: Use effective-target for pr68674.c test

2024-11-08 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

--

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr68674.c: Use effective-target arm_arch_v7a
and arm_libc_fp_abi.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/pr68674.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr68674.c 
b/gcc/testsuite/gcc.target/arm/pr68674.c
index 0b3237458fe..3fd562d0518 100644
--- a/gcc/testsuite/gcc.target/arm/pr68674.c
+++ b/gcc/testsuite/gcc.target/arm/pr68674.c
@@ -1,9 +1,10 @@
 /* PR target/68674 */
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_neon_ok } */
-/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-require-effective-target arm_arch_v7a_ok } */
+/* { dg-require-effective-target arm_libc_fp_abi_ok } */
 /* { dg-options "-O2" } */
-/* { dg-add-options arm_fp } */
+/* { dg-add-options arm_arch_v7a } */
+/* { dg-add-options arm_libc_fp_abi } */
 
 #pragma GCC target ("fpu=vfp")
 
-- 
2.25.1



Re: [PATCH] testsuite: arm: Update expected asm in no-literal-pool-m0.c

2024-11-08 Thread Christophe Lyon
On Fri, 8 Nov 2024 at 15:30, Richard Earnshaw (lists)
 wrote:
>
> On 14/10/2024 16:28, Christophe Lyon wrote:
> >
> >
> > On 10/14/24 16:40, Torbjorn SVENSSON wrote:
> >> Hi Christophe,
> >>
> >> On 2024-10-14 14:16, Christophe Lyon wrote:
> >>> Hi Torbjörn,
> >>>
> >>>
> >>> On 10/13/24 19:37, Torbjörn SVENSSON wrote:
>  Ok for trunk?
> 
>  --
> 
>  With the changes in r15-1579-g792f97b44ff, the constants have been
>  updated. This patch aligns the constants in the test cases with the
>  updated implementation.
> 
> >>>
> >>> Could you share a bit more details? In particular, IIUC, since the
> >>> constants used now are different, why do you introduce new
> >>> alternatives, rather than replace the old expected constants?
> >>>
> >>> Do we now generate both old & new versions, depending on some flags?
> >>
> >> The constants depend on whether some optimization step is included or not.
> >> I don't understand how this works, but the changeset that Richard
> >> merged in the above commit id does cause this need.
> >> I think the constants switch with -O2, but I don't know exactly what
> >> part enables it.
> >>
> >>  From my point of view, the mov instruction could always use the higher
> >> value and then do fewer shifts, but I don't know if that's
> >> easy to achieve or not.
> >>
> > Indeed, these tests are exercised with several optimization levels from
> > -O0 to -O3 and -Os, and late-combine is only enabled at -O2 and above,
> > including -Os. So I guess the previous value is still used at -O0 and -O1.
> >
> > Thanks for the clarification.
> >
> > LGTM.
> >
> > Christophe
> >
> >> Kind regards,
> >> Torbjörn
> >>
> >>>
> >>> FTR, this is also tracked by
> >>> https://linaro.atlassian.net/browse/GNU-1269
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Christophe
> >>>
>  gcc/testsuite/ChangeLog:
> 
>  * gcc.target/arm/pure-code/no-literal-pool-m0.c: Update expected
>  asm.
> 
>  Signed-off-by: Torbjörn SVENSSON 
>  ---
>    .../arm/pure-code/no-literal-pool-m0.c| 29
>  ++-
>    1 file changed, 21 insertions(+), 8 deletions(-)
> 
>  diff --git a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-
>  m0.c b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
>  index bd6f4af183b..effaf8b60b6 100644
>  --- a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
>  +++ b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
>  @@ -95,8 +95,13 @@ test_65536 ()
>    /*
>    ** test_0x123456:
>    **...
>  +** (
>    **movsr[0-3], #18
>    **lslsr[0-3], r[0-3], #8
>  +** |
>  +**movsr[0-3], #144
>  +**lslsr[0-3], r[0-3], #5
>  +** )
>    **addsr[0-3], r[0-3], #52
>    **lslsr[0-3], r[0-3], #8
>    **addsr[0-3], r[0-3], #86
>  @@ -125,18 +130,16 @@ test_0x1123456 ()
>  return 0x1123456;
>    }
>  -/* With -Os, we generate:
>  -   movs r0, #16
>  -   lsls r0, r0, r0
>  -   With the other optimization levels, we generate:
>  -   movs r0, #16
>  -   lsls r0, r0, #16
>  -   hence the two alternatives.  */
>    /*
>    ** test_0x110:
>    **...
>  +** (
>    **movsr[0-3], #16
>  -**lslsr[0-3], r[0-3], (#16|r[0-3])
>  +**lslsr[0-3], r[0-3], #16
>  +** |
>  +**movsr[0-3], #128
>  +**lslsr[0-3], r[0-3], #13
>  +** )
>    **addsr[0-3], r[0-3], #1
>    **lslsr[0-3], r[0-3], #4
>    **...
>  @@ -150,8 +153,13 @@ test_0x110 ()
>    /*
>    ** test_0x111:
>    **...
>  +** (
>    **movsr[0-3], #1
>    **lslsr[0-3], r[0-3], #24
>  +** |
>  +**movsr[0-3], #128
>  +**lslsr[0-3], r[0-3], #17
>  +** )
>    **addsr[0-3], r[0-3], #17
>    **...
>    */
>  @@ -164,8 +172,13 @@ test_0x111 ()
>    /*
>    ** test_m8192:
>    **...
>  +** (
>    **movsr[0-3], #1
>    **lslsr[0-3], r[0-3], #13
>  +** |
>  +**movsr[0-3], #128
>  +**lslsr[0-3], r[0-3], #6
>  +** )
>    **rsbsr[0-3], r[0-3], #0
>    **...
>    */
> >>
>
> Do we really need to care about the precise assembler output at all the
> different optimization levels?  Isn't it enough to simply check that we
> don't get a literal pool entry?  Eg to scan for
>   ldr   r[0-7], .L
>
> *not* being in the generated code.
>

Good point.
That would be less fragile indeed.

I guess at the time I added that test, I wanted to check that we generate
the expected code sequence, while other tests checked other
conditions, so that we had full coverage of the -mpure-code code paths
with respect to such constants.

Christophe

> R.
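
A sketch of that negative-scan style (the directives, options, and regex below are illustrative assumptions, not a tested replacement for the existing file):

```c
/* { dg-do compile } */
/* { dg-options "-mpure-code" } */

int
test_0x123456 (void)
{
  return 0x123456;
}

/* Fail only if a literal-pool load reappears, regardless of the exact
   mov/lsls sequence chosen at each optimization level.  */
/* { dg-final { scan-assembler-not {ldr\tr[0-7], \.L} } } */
```

Because the check is purely negative, it is insensitive to which constant-synthesis sequence late-combine picks, which is what makes it less fragile.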


[PATCH V2 7/11] Change TARGET_POPCNTD to TARGET_POWER7

2024-11-08 Thread Michael Meissner
As part of the architecture flags patches, this patch changes the use of
TARGET_POPCNTD to TARGET_POWER7.  The POPCNTD instruction was added in power7
(ISA 2.06).

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

* config/rs6000/dfp.md (floatdidd2): Change TARGET_POPCNTD to
TARGET_POWER7.
* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported):
Likewise.
* config/rs6000/rs6000-string.cc (expand_block_compare_gpr): Likewise.
* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
Likewise.
(rs6000_rtx_costs): Likewise.
(rs6000_emit_popcount): Likewise.
* config/rs6000/rs6000.h (TARGET_LDBRX): Likewise.
(TARGET_LFIWZX): Likewise.
(TARGET_FCFIDS): Likewise.
(TARGET_FCFIDU): Likewise.
(TARGET_FCFIDUS): Likewise.
(TARGET_FCTIDUZ): Likewise.
(TARGET_FCTIWUZ): Likewise.
(CTZ_DEFINED_VALUE_AT_ZERO): Likewise.
* config/rs6000/rs6000.md (enabled attribute): Likewise.
(ctz2): Likewise.
(popcntd2): Likewise.
(lrintsi2): Likewise.
(lrintsi): Likewise.
(lrintsi_di): Likewise.
(cmpmemsi): Likewise.
(bpermd_"): Likewise.
(addg6s): Likewise.
(cdtbcd): Likewise.
(cbcdtd): Likewise.
(div_): Likewise.
---
 gcc/config/rs6000/dfp.md|  2 +-
 gcc/config/rs6000/rs6000-builtin.cc |  4 ++--
 gcc/config/rs6000/rs6000-string.cc  |  4 ++--
 gcc/config/rs6000/rs6000.cc |  6 +++---
 gcc/config/rs6000/rs6000.h  | 16 
 gcc/config/rs6000/rs6000.md | 24 
 6 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index fa9d7dd45dd..b8189390d41 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -214,7 +214,7 @@ (define_insn "*cmp_internal1"
 (define_insn "floatdidd2"
   [(set (match_operand:DD 0 "gpc_reg_operand" "=d")
(float:DD (match_operand:DI 1 "gpc_reg_operand" "d")))]
-  "TARGET_DFP && TARGET_POPCNTD"
+  "TARGET_DFP && TARGET_POWER7"
   "dcffix %0,%1"
   [(set_attr "type" "dfp")])
 
diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 76421bd1de0..dae43b672ea 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -161,9 +161,9 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
fncode)
 case ENB_P6_64:
   return TARGET_POWER6 && TARGET_POWERPC64;
 case ENB_P7:
-  return TARGET_POPCNTD;
+  return TARGET_POWER7;
 case ENB_P7_64:
-  return TARGET_POPCNTD && TARGET_POWERPC64;
+  return TARGET_POWER7 && TARGET_POWERPC64;
 case ENB_P8:
   return TARGET_POWER8;
 case ENB_P8V:
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 55b4133b1a3..3674c4bd984 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1948,8 +1948,8 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
-  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
-  gcc_assert (TARGET_POPCNTD);
+  /* TARGET_POWER7 is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POWER7);
 
   /* For P8, this case is complicated to handle because the subtract
  with carry instructions do not generate the 64-bit carry and so
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index dd51d75c495..7d20e757c7c 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1999,7 +1999,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, 
machine_mode mode)
  if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
return 1;
 
- if (TARGET_POPCNTD && mode == SImode)
+ if (TARGET_POWER7 && mode == SImode)
return 1;
 
  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
@@ -22473,7 +22473,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   return false;
 
 case POPCOUNT:
-  *total = COSTS_N_INSNS (TARGET_POPCNTD ? 1 : 6);
+  *total = COSTS_N_INSNS (TARGET_POWER7 ? 1 : 6);
   return false;
 
 case PARITY:
@@ -23260,7 +23260,7 @@ rs6000_emit_popcount (rtx dst, rtx src)
   rtx tmp1, tmp2;
 
   /* Use the PPC ISA 2.06 popcnt{w,d} instruction if we can.  */
-  

[committed] hppa: Fix handling of secondary reloads involving a SUBREG

2024-11-08 Thread John David Anglin
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
to trunk and gcc-14.

Dave
---

hppa: Fix handling of secondary reloads involving a SUBREG

This is fairly subtle.

When handling spills for SUBREG arguments in pa_emit_move_sequence,
alter_subreg may be called.  It in turn calls adjust_address_1 and
change_address_1.  change_address_1 calls pa_legitimate_address_p
to validate the new spill address.  change_address_1 generates an
internal compiler error if the address is not valid.  We need to
allow 14-bit displacements for all modes when reload_in_progress
is true and strict is false to prevent the internal compiler error.

SUBREGs are only used with the general registers, so the spill
should result in an integer access.  14-bit displacements are okay
for integer loads and stores but not for floating-point loads and
stores.

Potentially, the change could break the handling of spills for the
floating-point registers, but I believe these are handled separately
in pa_emit_move_sequence.

This change fixes the build of symmetrica-3.0.1+ds.

2024-11-08  John David Anglin  

gcc/ChangeLog:

PR target/117443
* config/pa/pa.cc (pa_legitimate_address_p): Allow any
14-bit displacement when reload is in progress and strict
is false.

diff --git a/gcc/config/pa/pa.cc b/gcc/config/pa/pa.cc
index 94ee7dbfa8e..941ef3a7128 100644
--- a/gcc/config/pa/pa.cc
+++ b/gcc/config/pa/pa.cc
@@ -11009,6 +11009,7 @@ pa_legitimate_address_p (machine_mode mode, rtx x, bool 
strict, code_helper)
  /* Long 14-bit displacements always okay for these cases.  */
  if (INT14_OK_STRICT
  || reload_completed
+ || (reload_in_progress && !strict)
  || mode == QImode
  || mode == HImode)
return true;


signature.asc
Description: PGP signature


Re: [PATCH v3] C: Support Function multiversionsing in the C front end

2024-11-08 Thread Joseph Myers
On Mon, 4 Nov 2024, alfie.richa...@arm.com wrote:

>  /* Subroutine of duplicate_decls.  Compare NEWDECL to OLDDECL.
> Returns true if the caller should proceed to merge the two, false
> if OLDDECL should simply be discarded.  As a side effect, issues
> @@ -3365,11 +3382,53 @@ pushdecl (tree x)
>   b->inner_comp = false;
>b_use = b;
>b_ext = b;
> +
> +  /* Check if the function is part of a group of multiversioned 
> functions.
> + If so, we should check for conflicts or duplicates between any with
> + the same version. If there are no conflicts, should just add the new
> + binding.  */
> +  if (b && TREE_CODE (x) == FUNCTION_DECL
> +   && TREE_CODE (b->decl) == FUNCTION_DECL
> +   && (DECL_FUNCTION_VERSIONED (b->decl)
> +   || targetm.target_option.function_versions (x, b->decl)))
> + {
> +   maybe_mark_function_versioned (x);
> +   for (; b_use; b_use = b_use->shadowed)
> + {
> +   if (!comptypes (type, TREE_TYPE (b_use->decl))
> +   || TREE_CODE (b_use->decl) != FUNCTION_DECL)
> + {
> +   /* Found a conflicting declaration.
> +  duplicate_decls will create teh diagnostics forthis. */

Note typos in this comment: "teh", "forthis".

> +   b = b_use;
> +   break;

I don't follow the logic you're using here.  This loop is going through 
all bindings for the name in containing scopes, including completely 
unrelated ones such as object declarations with block scope and automatic 
storage duration, which looks like it would do the wrong thing.

// File scope
__attribute__ ((target_version ("default"))) int foo ();
{
  // Block scope
  int foo;
  {
 __attribute__ ((target_version ("default"))) int foo ();
 __attribute__ ((target_version ("default"))) int foo ();

As I read this code, if foo is a multiversioned function then it would 
find the variable declaration in the intermediate scope and treat it as 
conflicting.  Compare the existing logic looping through scopes for 
external declarations, which is careful to ignore any declarations not 
satisfying DECL_FILE_SCOPE_P.

There is a lot of complicated logic in pushdecl that it seems a bad idea 
to bypass for multiversioned functions.  For example, everything related 
to putting declarations with external linkage in the "external scope" in 
addition to their main scope.

I think a patch like this needs to come with a clearly stated semantic 
model for how the multiversioned functions are meant to be represented 
inside the front end, and how that translates into consequences for 
declaration processing, against which the patch can be tested.  I suggest 
some principles like this:

* For the purposes of *checking compatible types*, all declarations of the 
same name - different versions and the dispatcher - should be compared and 
required to be consistent.  Based on the tests you have, it appears at 
least some of this is taking place - for example, when two declarations 
have incompatible types.

* For the purposes of *combining information from multiple declarations*, 
at least some of that should happen for different versions as well.  For 
example, if one version has a parameter int (*x)[] and another has a 
parameter int (*x)[2], the composite type should be formed and used for 
both versions.

* For the purposes of *merging declarations and discarding the new one*, 
however, only declarations of the same version should be merged.

So all effects of diagnose_mismatched_decls should apply between different 
versions.  But only a subset of merge_decls is relevant (such as forming 
composite types) - and that subset would be relevant in both directions, 
because both declarations remain in use afterwards.  Or if you don't want 
the complications there, you could define the semantics not to e.g. form 
composite types between different versions.

Given these principles, pretty much all existing logic in pushdecl is 
relevant for multiversioned functions - twice over.  It's fully relevant 
for matching and merging with other declarations for the same version of 
the function (but with all the logic that looks for bindings for a given 
symbol name needing to be adjusted to consider only those for the same 
version as potentially matching / needing to be merged).  And it's mostly 
relevant for declarations for other versions of the function, to the 
extent that type compatibility needs to be checked and maybe composite 
types formed.

But maybe we need to start the conceptual design at an earlier stage.  
In the presence of multiple declarations of different versions of a 
function, (a) what DECL or DECLs should be recorded as bindings for the 
function name in the scope in which the declaration appears, and (b) what 
DECL or DECLs should be present in the IR at all?  Note that e.g. 
lookup_name definitely expects only one binding for a given name (as an 
ordinary identifier) i

Re: [PATCH v3] c: Implement C2y N3356, if declarations [PR117019]

2024-11-08 Thread Joseph Myers
On Fri, 8 Nov 2024, Marek Polacek wrote:

> OK, I've reworded the comment to
> 
>   /* The call above already performed convert_lvalue_to_rvalue, but
>  if it parsed an expression, read_p was false.  Make sure we mark
>  the expression as read.  */
>  
> though it's questionable whether it's useful at all.
> 
> > I think there's another case of invalid code it would be good to add tests 
> > for.  You have tests in c2y-if-decls-6.c of valid code with a VLA in in 
> > if/switch declaration.  It would be good to test also the invalid case 
> > where there is a jump into the scope of such an identifier with VM type 
> > (whether with goto, or with an outer switch around an if statement with 
> > such a declaration), to verify that the checks for this do work in both 
> > the declaration and the simple-declaration cases of if and switch.
> 
> OK, I've extended c2y-if-decls-7.c to cover those cases as well.  Otherwise
> no changes.

This version is OK.

-- 
Joseph S. Myers
josmy...@redhat.com



[committed] hppa: Don't allow large modes in hard registers

2024-11-08 Thread John David Anglin
Tested on hppa-unknown-linux-gnu.  Committed to trunk.

Dave
---

hppa: Don't allow large modes in hard registers

LRA has problems handling spills for OI and TI modes.  There are
issues with SUBREG support as well.

This change fixes gcc.c-torture/compile/pr92618.c with LRA.

2024-11-08  John David Anglin  

gcc/ChangeLog:

PR target/117238
* config/pa/pa32-regs.h (PA_HARD_REGNO_MODE_OK): Don't allow
mode size 32.  Limit mode size 16 in general registers to
complex modes.

diff --git a/gcc/config/pa/pa32-regs.h b/gcc/config/pa/pa32-regs.h
index 3467e03afed..c9a27ef1658 100644
--- a/gcc/config/pa/pa32-regs.h
+++ b/gcc/config/pa/pa32-regs.h
@@ -187,10 +187,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
that includes the incoming arguments and the return value.  We specify a
set with no overlaps so that we don't have to specify that the destination
register is an early clobber in patterns using this mode.  Except for the
-   return value, the starting registers are odd.  For 128 and 256 bit modes,
-   we similarly specify non-overlapping sets of cpu registers.  However,
-   there aren't any patterns defined for modes larger than 64 bits at the
-   moment.
+   return value, the starting registers are odd.  Except for complex modes,
+   we don't allow modes larger than 64 bits in the general registers as there
+   are issues with copies, spills and SUBREG support.
 
We limit the modes allowed in the floating point registers to the
set of modes used in the machine definition.  In addition, we allow
@@ -217,15 +216,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  ? (VALID_FP_MODE_P (MODE) \
&& (GET_MODE_SIZE (MODE) <= 4   \
|| (GET_MODE_SIZE (MODE) == 8 && ((REGNO) & 1) == 0)\
-   || (GET_MODE_SIZE (MODE) == 16 && ((REGNO) & 3) == 0)   \
-   || (GET_MODE_SIZE (MODE) == 32 && ((REGNO) & 7) == 0))) \
+   || (GET_MODE_SIZE (MODE) == 16 && ((REGNO) & 3) == 0))) \
: (GET_MODE_SIZE (MODE) <= UNITS_PER_WORD   \
   || (GET_MODE_SIZE (MODE) == 2 * UNITS_PER_WORD   \
  && REGNO) & 1) == 1 && (REGNO) <= 25) || (REGNO) == 28))  \
   || (GET_MODE_SIZE (MODE) == 4 * UNITS_PER_WORD   \
- && ((REGNO) & 3) == 3 && (REGNO) <= 23)   \
-  || (GET_MODE_SIZE (MODE) == 8 * UNITS_PER_WORD   \
- && ((REGNO) & 7) == 3 && (REGNO) <= 19)))
+ && COMPLEX_MODE_P (MODE)  \
+ && ((REGNO) & 3) == 3 && (REGNO) <= 23)))
 
 /* How to renumber registers for gdb.
 


signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-11-08 Thread Joseph Myers
On Fri, 8 Nov 2024, Alejandro Colomar wrote:

> Hi Joseph,
> 
> This is a gentle ping about this patch set, 10 days before the start of
> stage 3.

It's obviously not ready to include in its current form (using a name 
different from that actually accepted into C2Y).  Since it requires 
significant work to get it into a form corresponding to the actual C2Y 
feature and you've said you won't do that work, it's a very low priority 
for any kind of review at all compared to any submissions whose authors 
are willing to adapt them following review to be ready for inclusion.  
Name changes could be considered in a later development stage if there 
were a decision to change the name in Graz, but that requires a suitable 
submission while new features are under consideration for GCC 15.

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH v2 4/4] aarch64: add svcvt* FP8 intrinsics

2024-11-08 Thread Claudio Bantaloukas

This patch adds the following intrinsics:
- svcvt1_bf16[_mf8]_fpm
- svcvt1_f16[_mf8]_fpm
- svcvt2_bf16[_mf8]_fpm
- svcvt2_f16[_mf8]_fpm
- svcvtlt1_bf16[_mf8]_fpm
- svcvtlt1_f16[_mf8]_fpm
- svcvtlt2_bf16[_mf8]_fpm
- svcvtlt2_f16[_mf8]_fpm
- svcvtn_mf8[_f16_x2]_fpm (unpredicated)
- svcvtnb_mf8[_f32_x2]_fpm
- svcvtnt_mf8[_f32_x2]_fpm

The underlying instructions are only available when SVE2 is enabled and the PE
is not in streaming SVE mode. They are also available when SME2 is enabled and
the PE is in streaming SVE mode.

gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_signature): Add an fpm_t (uint64_t) argument to functions that
set the fpm register.
(unary_convert_narrowxn_fpm_def): New class.
(unary_convert_narrowxn_fpm): New shape.
(unary_convertxn_fpm_def): New class.
(unary_convertxn_fpm): New shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(unary_convert_narrowxn_fpm): Declare.
(unary_convertxn_fpm): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svcvt_fp8_impl): New class.
(svcvtn_impl): Handle fp8 cases.
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new FUNCTION.
(svcvtnb): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new DEF_SVE_FUNCTION_GS.
(svcvtn): Likewise.
(svcvtnb, svcvtnt): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svcvt1, svcvt2, svcvtlt1, svcvtlt2, svcvtnb, svcvtnt): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_cvt_mf8, TYPES_cvtn_mf8, TYPES_cvtnx_mf8): Add new types arrays.
(function_builder::get_name): Append _fpm to functions that set fpmr.
(function_resolver::check_gp_argument): Deal with the fpm_t argument.
(function_expander::use_exact_insn): Set the fpm register before
calling the insn if the function warrants it.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_fp8_cvt): Add new.
(@aarch64_sve2_fp8_cvtn): Likewise.
(@aarch64_sve2_fp8_cvtnb): Likewise.
(@aarch64_sve_cvtnt): Likewise.
* config/aarch64/aarch64.h (TARGET_SSVE_FP8): Add new.
* config/aarch64/iterators.md
(VNx8SF_ONLY, SVE_FULL_HFx2): New mode iterators.
(UNSPEC_F1CVT, UNSPEC_F1CVTLT, UNSPEC_F2CVT, UNSPEC_F2CVTLT): Add new.
(UNSPEC_FCVTNB, UNSPEC_FCVTNT): Likewise.
(UNSPEC_FP8FCVTN): Likewise.
(FP8CVT_UNS, fp8_cvt_uns_op): Likewise.

gcc/testsuite/

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
(TEST_DUAL_Z): Add fpm0 argument
* gcc.target/aarch64/sve/acle/general-c/unary_convert_narrowxn_fpm_1.c:
Add new tests.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_fpm_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtlt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtn_mf8.c: Likewise.
* lib/target-supports.exp: Add aarch64_asm_fp8_ok check.
---
 .../aarch64/aarch64-sve-builtins-shapes.cc| 74 +++
 .../aarch64/aarch64-sve-builtins-shapes.h |  2 +
 .../aarch64/aarch64-sve-builtins-sve2.cc  | 28 ++-
 .../aarch64/aarch64-sve-builtins-sve2.def | 12 +++
 .../aarch64/aarch64-sve-builtins-sve2.h   |  6 ++
 gcc/config/aarch64/aarch64-sve-builtins.cc| 30 +++-
 gcc/config/aarch64/aarch64-sve2.md| 52 +
 gcc/config/aarch64/aarch64.h  |  5 ++
 gcc/config/aarch64/iterators.md   | 25 +++
 .../aarch64/sve/acle/asm/test_sve_acle.h  |  2 +-
 .../general-c/unary_convert_narrowxn_fpm_1.c  | 38 ++
 .../acle/general-c/unary_convertxn_fpm_1.c| 60 +++
 .../aarch64/sve2/acle/asm/cvt_mf8.c   | 48 
 .../aarch64/sve2/acle/asm/cvtlt_mf8.c | 47 
 .../aarch64/sve2/acle/asm/cvtn_mf8.c  | 59 +++
 gcc/testsuite/lib/target-supports.exp |  2 +-
 16 files changed, 485 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/unary_convert_narrowxn_fpm_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/unary_convertxn_fpm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvtlt_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvtn_mf8.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 51f7cfdf96f..f08c377f5e4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -325,6 +325,8 @@ parse_signature (const function_instance &instance, const char *format,
 	argument_types.quick_push (argumen

[PATCH v2 0/4] aarch64: Add fp8 sve foundation

2024-11-08 Thread Claudio Bantaloukas


The ACLE defines a new set of fp8 vector types and intrinsics that operate on
these, some of them operating on the vectors as if they were bags of bits and
some requiring an additional argument of type fpm_t.

The following patches introduce:
- the types
- intrinsics that operate without the fpm_t type
- foundational changes that will be used to implement intrinsics requiring an
  fpm_t argument at the end
- conversion intrinsics

Compared to v1, this version of the patch series adds:
- A fix for returning scalar fp8 values in FP registers
- Tests for sve<->simd conversions
- Support for svcvt* intrinsics along with supporting shapes

Is this ok for master? I do not have commit rights yet; if it is ok, could
someone commit it on my behalf?

Regression tested on aarch64-unknown-linux-gnu.

Thanks,
Claudio Bantaloukas


Claudio Bantaloukas (4):
  aarch64: return scalar fp8 values in fp registers
  aarch64: Add basic svmfloat8_t support to arm_sve.h
  aarch64: specify fpm mode in function instances and groups
  aarch64: add svcvt* FP8 intrinsics

 .../aarch64/aarch64-sve-builtins-base.cc  |  15 +-
 .../aarch64/aarch64-sve-builtins-base.def |   3 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc|  77 -
 .../aarch64/aarch64-sve-builtins-shapes.h |   2 +
 .../aarch64/aarch64-sve-builtins-sme.def  | 130 
 .../aarch64/aarch64-sve-builtins-sve2.cc  |  48 ++-
 .../aarch64/aarch64-sve-builtins-sve2.def | 108 ---
 .../aarch64/aarch64-sve-builtins-sve2.h   |   6 +
 gcc/config/aarch64/aarch64-sve-builtins.cc|  61 +++-
 gcc/config/aarch64/aarch64-sve-builtins.def   |   7 +-
 gcc/config/aarch64/aarch64-sve-builtins.h |  26 +-
 gcc/config/aarch64/aarch64-sve2.md|  52 +++
 gcc/config/aarch64/aarch64.cc |   3 +-
 gcc/config/aarch64/aarch64.h  |   5 +
 gcc/config/aarch64/iterators.md   |  25 ++
 .../aarch64/sve/acle/general-c++/mangle_1.C   |   2 +
 .../aarch64/sve/acle/general-c++/mangle_2.C   |   2 +
 .../gcc.target/aarch64/fp8_scalar_1.c |   4 +-
 .../aarch64/sve/acle/asm/clasta_mf8.c |  52 +++
 .../aarch64/sve/acle/asm/clastb_mf8.c |  52 +++
 .../aarch64/sve/acle/asm/create2_1.c  |  15 +
 .../aarch64/sve/acle/asm/create3_1.c  |  11 +
 .../aarch64/sve/acle/asm/create4_1.c  |  12 +
 .../aarch64/sve/acle/asm/dup_lane_mf8.c   | 124 
 .../gcc.target/aarch64/sve/acle/asm/dup_mf8.c |  31 ++
 .../aarch64/sve/acle/asm/dup_neonq_mf8.c  |  30 ++
 .../aarch64/sve/acle/asm/dupq_lane_mf8.c  |  48 +++
 .../gcc.target/aarch64/sve/acle/asm/ext_mf8.c |  73 +
 .../aarch64/sve/acle/asm/get2_mf8.c   |  55 
 .../aarch64/sve/acle/asm/get3_mf8.c   | 108 +++
 .../aarch64/sve/acle/asm/get4_mf8.c   | 179 +++
 .../aarch64/sve/acle/asm/get_neonq_mf8.c  |  33 ++
 .../aarch64/sve/acle/asm/insr_mf8.c   |  22 ++
 .../aarch64/sve/acle/asm/lasta_mf8.c  |  12 +
 .../aarch64/sve/acle/asm/lastb_mf8.c  |  12 +
 .../gcc.target/aarch64/sve/acle/asm/ld1_mf8.c | 162 ++
 .../aarch64/sve/acle/asm/ld1ro_mf8.c  | 121 +++
 .../aarch64/sve/acle/asm/ld1rq_mf8.c  | 137 
 .../gcc.target/aarch64/sve/acle/asm/ld2_mf8.c | 204 
 .../gcc.target/aarch64/sve/acle/asm/ld3_mf8.c | 246 +++
 .../gcc.target/aarch64/sve/acle/asm/ld4_mf8.c | 290 +
 .../aarch64/sve/acle/asm/ldff1_mf8.c  |  91 ++
 .../aarch64/sve/acle/asm/ldnf1_mf8.c  | 155 +
 .../aarch64/sve/acle/asm/ldnt1_mf8.c  | 162 ++
 .../gcc.target/aarch64/sve/acle/asm/len_mf8.c |  12 +
 .../aarch64/sve/acle/asm/reinterpret_bf16.c   |  17 +
 .../aarch64/sve/acle/asm/reinterpret_f16.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_f32.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_f64.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_mf8.c| 297 ++
 .../aarch64/sve/acle/asm/reinterpret_s16.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_s32.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_s64.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_s8.c |  17 +
 .../aarch64/sve/acle/asm/reinterpret_u16.c|  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u32.c|  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u64.c|  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u8.c |  28 ++
 .../gcc.target/aarch64/sve/acle/asm/rev_mf8.c |  21 ++
 .../gcc.target/aarch64/sve/acle/asm/sel_mf8.c |  30 ++
 .../aarch64/sve/acle/asm/set2_mf8.c   |  41 +++
 .../aarch64/sve/acle/asm/set3_mf8.c   |  63 
 .../aarch64/sve/acle/asm/set4_mf8.c   |  87 +
 .../aarch64/sve/acle/asm/set_neonq_mf8.c  |  23 ++
 .../aarch64/sve/acle/asm/splice_mf8.c |  33 ++
 .../gcc.target/aarch64/sve/acle/asm/st1_mf8.c | 162 ++
 .../gcc.target/aarch64/sve/acle/asm/st2_mf8.c | 204 
 .../gcc.target/aarch64/sve/acle/asm/st3_mf8.c | 246 +

[PATCH] AArch64: Cleanup fusion defines

2024-11-08 Thread Wilco Dijkstra

Cleanup the fusion defines by introducing AARCH64_FUSE_BASE as a common base
level of fusion supported by almost all cores.  Add AARCH64_FUSE_MOVK as a
shortcut for all MOVK fusion.  In most cases there is no change.  It enables
AARCH64_FUSE_CMP_BRANCH for a few older cores since it has no measurable
effect if a core doesn't support it.  Also it may have been accidentally
left out on some cores that support all other types of branch fusion.

In the future we could add fusion types to AARCH64_FUSE_BASE if beneficial.

Passes regress & bootstrap, OK for commit?

gcc/ChangeLog:

* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_BASE): New 
define.
(AARCH64_FUSE_MOVK): Likewise.
* config/aarch64/tuning_models/a64fx.h: Update.
* config/aarch64/tuning_models/ampere1.h: Likewise.
* config/aarch64/tuning_models/ampere1a.h: Likewise.
* config/aarch64/tuning_models/ampere1b.h: Likewise.
* config/aarch64/tuning_models/cortexa35.h: Likewise.
* config/aarch64/tuning_models/cortexa53.h: Likewise.
* config/aarch64/tuning_models/cortexa57.h: Likewise.
* config/aarch64/tuning_models/cortexa72.h: Likewise.
* config/aarch64/tuning_models/cortexa73.h: Likewise.
* config/aarch64/tuning_models/cortexx925.h: Likewise.
* config/aarch64/tuning_models/exynosm1.h: Likewise.
* config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
* config/aarch64/tuning_models/generic.h: Likewise.
* config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
* config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
* config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
* config/aarch64/tuning_models/neoversen1.h: Likewise.
* config/aarch64/tuning_models/neoversen2.h: Likewise.
* config/aarch64/tuning_models/neoversen3.h: Likewise.
* config/aarch64/tuning_models/neoversev1.h: Likewise.
* config/aarch64/tuning_models/neoversev2.h: Likewise.
* config/aarch64/tuning_models/neoversev3.h: Likewise.
* config/aarch64/tuning_models/neoversev3ae.h: Likewise.
* config/aarch64/tuning_models/qdf24xx.h: Likewise.
* config/aarch64/tuning_models/saphira.h: Likewise.
* config/aarch64/tuning_models/thunderx2t99.h: Likewise.
* config/aarch64/tuning_models/thunderx3t110.h: Likewise.
* config/aarch64/tuning_models/tsv110.h: Likewise.

---

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index 
bf5e85ba8fe128721521505bd6b73b38c25d9f65..f8413ab0c802c28290ebcc171bfd131622cb33be
 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -41,3 +41,8 @@ AARCH64_FUSION_PAIR ("cmp+csel", CMP_CSEL)
 AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
 
 #undef AARCH64_FUSION_PAIR
+
+/* Baseline fusion settings suitable for all cores.  */
+#define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC)
+
+#define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK)
diff --git a/gcc/config/aarch64/tuning_models/a64fx.h 
b/gcc/config/aarch64/tuning_models/a64fx.h
index 
378a1b3889ee265859786c1ff6525fce2305b615..2de96190b2d668f7f8e09b48fba418788d726ccf
 100644
--- a/gcc/config/aarch64/tuning_models/a64fx.h
+++ b/gcc/config/aarch64/tuning_models/a64fx.h
@@ -150,7 +150,7 @@ static const struct tune_params a64fx_tunings =
 4 /* store_pred.  */
   }, /* memmov_cost.  */
   7, /* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
+  AARCH64_FUSE_BASE, /* fusible_ops  */
   "32",/* function_align.  */
   "16",/* jump_align.  */
   "32",/* loop_align.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1.h 
b/gcc/config/aarch64/tuning_models/ampere1.h
index 
ace9bf49f7593d3713ed0bc61494c3915749a9a8..b2b376699ae64c3089896491baa6d8dcd948ef87
 100644
--- a/gcc/config/aarch64/tuning_models/ampere1.h
+++ b/gcc/config/aarch64/tuning_models/ampere1.h
@@ -88,11 +88,8 @@ static const struct tune_params ampere1_tunings =
 4 /* store_pred.  */
   }, /* memmov_cost.  */
   4, /* issue_rate  */
-  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
-   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
-   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
-   AARCH64_FUSE_CMP_BRANCH),
-  /* fusible_ops  */
+  (AARCH64_FUSE_BASE | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK
+   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
   "32",/* function_align.  */
   "4", /* jump_align.  */
   "32:16", /* loop_align.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h 
b/gcc/config/aarch64/tuning_models/ampere1a.h
index 
7fd7c9fca27b3ab873b47390e83b3db6b3404050..d2f114c13d07248512df5787c06ff47c53b10686
 100644
--- a/gcc/config/aarch64/tuning_models/ampere1a.h
+++ b/gcc/config/aarch64/tuning_models/ampere1a.h
@@ -39,12 +39,9 @@ 

[PATCH] c++: Add __builtin_operator_{new,delete} support

2024-11-08 Thread Jakub Jelinek
Hi!

clang++ adds __builtin_operator_{new,delete} builtins which, as documented,
work similarly to ::operator {new,delete}, except that it is an error
if the called ::operator {new,delete} is not a replaceable global operator.
They also permit the optimizations which C++ normally allows only when those
operators are invoked from new/delete expressions
(https://eel.is/c++draft/expr.new#14), even when the builtins are called
directly.

For GCC we note that in the CALL_FROM_NEW_OR_DELETE_P flag on CALL_EXPRs.
The following patch implements it as a C++ FE keyword (because passing
references through ... changes the argument, so a BUILT_IN_FRONTEND
builtin can't be used); it simply attempts to call the ::operator {new,delete}
and diagnoses it if it isn't replaceable.

So far lightly tested, ok for trunk if it passes bootstrap/regtest
(note, libstdc++ already uses the builtin)?

2024-11-08  Jakub Jelinek  

gcc/c-family/
* c-common.h (enum rid): Add RID_BUILTIN_OPERATOR_NEW
and RID_BUILTIN_OPERATOR_DELETE.
(names_builtin_p): Change return type from bool to int.
* c-common.cc (c_common_reswords): Add __builtin_operator_new
and __builtin_operator_delete.
gcc/c/
* c-decl.cc (names_builtin_p): Change return type from
bool to int, adjust return statements.
gcc/cp/
* parser.cc (cp_parser_postfix_expression): Handle
RID_BUILTIN_OPERATOR_NEW and RID_BUILTIN_OPERATOR_DELETE.
* cp-objcp-common.cc (names_builtin_p): Change return type from
bool to int, adjust return statements.  Handle
RID_BUILTIN_OPERATOR_NEW and RID_BUILTIN_OPERATOR_DELETE.
* pt.cc (tsubst_expr) : Handle
CALL_FROM_NEW_OR_DELETE_P.
gcc/
* doc/extend.texi (New/Delete Builtins): Document
__builtin_operator_new and __builtin_operator_delete.
gcc/testsuite/
* g++.dg/ext/builtin-operator-new-1.C: New test.
* g++.dg/ext/builtin-operator-new-2.C: New test.
* g++.dg/ext/builtin-operator-new-3.C: New test.

--- gcc/c-family/c-common.h.jj  2024-10-25 10:00:29.314770060 +0200
+++ gcc/c-family/c-common.h 2024-11-08 16:53:01.198538630 +0100
@@ -168,6 +168,7 @@ enum rid
   RID_ADDRESSOF,
   RID_BUILTIN_LAUNDER,
   RID_BUILTIN_BIT_CAST,
+  RID_BUILTIN_OPERATOR_NEW, RID_BUILTIN_OPERATOR_DELETE,
 
   /* C++11 */
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
@@ -840,7 +841,7 @@ extern bool in_late_binary_op;
 extern const char *c_addr_space_name (addr_space_t as);
 extern tree identifier_global_value (tree);
 extern tree identifier_global_tag (tree);
-extern bool names_builtin_p (const char *);
+extern int names_builtin_p (const char *);
 extern tree c_linkage_bindings (tree);
 extern void record_builtin_type (enum rid, const char *, tree);
 extern void start_fname_decls (void);
--- gcc/c-family/c-common.cc.jj 2024-10-30 07:59:36.347587874 +0100
+++ gcc/c-family/c-common.cc2024-11-08 15:33:51.733170328 +0100
@@ -434,6 +434,8 @@ const struct c_common_resword c_common_r
   { "__builtin_counted_by_ref", RID_BUILTIN_COUNTED_BY_REF, D_CONLY },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
+  { "__builtin_operator_new", RID_BUILTIN_OPERATOR_NEW, D_CXXONLY },
+  { "__builtin_operator_delete", RID_BUILTIN_OPERATOR_DELETE, D_CXXONLY },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
   { "__builtin_stdc_bit_ceil", RID_BUILTIN_STDC, D_CONLY },
--- gcc/c/c-decl.cc.jj  2024-10-31 21:17:06.857021211 +0100
+++ gcc/c/c-decl.cc 2024-11-08 16:56:56.153195293 +0100
@@ -11751,10 +11751,10 @@ identifier_global_tag (tree t)
   return NULL_TREE;
 }
 
-/* Returns true if NAME refers to a built-in function or function-like
-   operator.  */
+/* Returns non-zero (result of __has_builtin) if NAME refers to a built-in
+   function or function-like operator.  */
 
-bool
+int
 names_builtin_p (const char *name)
 {
   tree id = get_identifier (name);
@@ -11775,12 +11775,12 @@ names_builtin_p (const char *name)
 case RID_CHOOSE_EXPR:
 case RID_OFFSETOF:
 case RID_TYPES_COMPATIBLE_P:
-  return true;
+  return 1;
 default:
   break;
 }
 
-  return false;
+  return 0;
 }
 
 /* In C, the only C-linkage public declaration is at file scope.  */
--- gcc/cp/parser.cc.jj 2024-11-06 18:53:10.815844090 +0100
+++ gcc/cp/parser.cc2024-11-08 17:52:04.332208389 +0100
@@ -7733,6 +7733,8 @@ cp_parser_postfix_expression (cp_parser
 case RID_BUILTIN_SHUFFLEVECTOR:
 case RID_BUILTIN_LAUNDER:
 case RID_BUILTIN_ASSOC_BARRIER:
+case RID_BUILTIN_OPERATOR_NEW:
+case RID_BUILTIN_OPERATOR_DELETE:
   {
vec *vec;
 
@@ -7819,6 +7821,39 @@ cp_parser_postfix_expression (cp_parser
  }
break;
 
+ case RID_BUILTIN_OPERATOR_NEW:
+ case RID_BUILTIN_OPERATOR_DELETE:
+   tree fn

[PATCH v3] c: Implement C2y N3356, if declarations [PR117019]

2024-11-08 Thread Marek Polacek
On Fri, Nov 08, 2024 at 08:43:39PM +, Joseph Myers wrote:
> On Thu, 7 Nov 2024, Marek Polacek wrote:
> 
> > @@ -8355,7 +8492,9 @@ c_parser_switch_statement (c_parser *parser, bool 
> > *if_p, tree before_labels)
> >if (c_parser_next_token_is (parser, CPP_OPEN_PAREN)
> >   && c_token_starts_typename (c_parser_peek_2nd_token (parser)))
> > explicit_cast_p = true;
> > -  ce = c_parser_expression (parser);
> > +  ce = c_parser_selection_header (parser, /*switch_p=*/true);
> > +  /* The call above already performed convert_lvalue_to_rvalue, but 
> > with
> > +read_p=false.  */
> >ce = convert_lvalue_to_rvalue (switch_cond_loc, ce, true, true);
> 
> That comment only seems accurate in the case of an expression; in the case 
> of a simple-declaration, read_p=true for the previous call.

OK, I've reworded the comment to

  /* The call above already performed convert_lvalue_to_rvalue, but
 if it parsed an expression, read_p was false.  Make sure we mark
 the expression as read.  */
 
though it's questionable whether it's useful at all.

> I think there's another case of invalid code it would be good to add tests 
> for.  You have tests in c2y-if-decls-6.c of valid code with a VLA in in 
> if/switch declaration.  It would be good to test also the invalid case 
> where there is a jump into the scope of such an identifier with VM type 
> (whether with goto, or with an outer switch around an if statement with 
> such a declaration), to verify that the checks for this do work in both 
> the declaration and the simple-declaration cases of if and switch.

OK, I've extended c2y-if-decls-7.c to cover those cases as well.  Otherwise
no changes.

Thanks,

-- >8 --
This patch implements C2y N3356, if declarations as described at
.

This feature is cognate with C++17 Selection statements with initializer
,
but they are not the same yet.  For example, C++17 allows

  if (lock (); int i = getval ())

whereas C2y does not.

The proposal adds new grammar productions.  selection-header is handled
in c_parser_selection_header which is the gist of the patch.
simple-declaration is handled by c_parser_declaration_or_fndef, which
gets a new parameter.

PR c/117019

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Adjust declaration.
(c_parser_external_declaration): Adjust a call to
c_parser_declaration_or_fndef.
(c_parser_declaration_or_fndef): New bool parameter.  Return a tree
instead of void.  Adjust for N3356.  Adjust a call to
c_parser_declaration_or_fndef.
(c_parser_compound_statement_nostart): Adjust calls to
c_parser_declaration_or_fndef.
(c_parser_selection_header): New.
(c_parser_paren_selection_header): New.
(c_parser_if_statement): Call c_parser_paren_selection_header
instead of c_parser_paren_condition.
(c_parser_switch_statement): Call c_parser_selection_header instead of
c_parser_expression.
(c_parser_for_statement): Adjust calls to c_parser_declaration_or_fndef.
(c_parser_objc_methodprotolist): Likewise.
(c_parser_oacc_routine): Likewise.
(c_parser_omp_loop_nest): Likewise.
(c_parser_omp_declare_simd): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/c23-if-decls-1.c: New test.
* gcc.dg/c23-if-decls-2.c: New test.
* gcc.dg/c2y-if-decls-1.c: New test.
* gcc.dg/c2y-if-decls-2.c: New test.
* gcc.dg/c2y-if-decls-3.c: New test.
* gcc.dg/c2y-if-decls-4.c: New test.
* gcc.dg/c2y-if-decls-5.c: New test.
* gcc.dg/c2y-if-decls-6.c: New test.
* gcc.dg/c2y-if-decls-7.c: New test.
* gcc.dg/c2y-if-decls-8.c: New test.
* gcc.dg/c2y-if-decls-9.c: New test.
* gcc.dg/c2y-if-decls-10.c: New test.
* gcc.dg/c2y-if-decls-11.c: New test.
* gcc.dg/gnu2y-if-decls-1.c: New test.
* gcc.dg/gnu99-if-decls-1.c: New test.
* gcc.dg/gnu99-if-decls-2.c: New test.
---
 gcc/c/c-parser.cc   | 253 ++--
 gcc/testsuite/gcc.dg/c23-if-decls-1.c   |  15 ++
 gcc/testsuite/gcc.dg/c23-if-decls-2.c   |   6 +
 gcc/testsuite/gcc.dg/c2y-if-decls-1.c   | 168 
 gcc/testsuite/gcc.dg/c2y-if-decls-10.c  |  38 
 gcc/testsuite/gcc.dg/c2y-if-decls-11.c  | 199 +++
 gcc/testsuite/gcc.dg/c2y-if-decls-2.c   |  35 
 gcc/testsuite/gcc.dg/c2y-if-decls-3.c   |  39 
 gcc/testsuite/gcc.dg/c2y-if-decls-4.c   | 199 +++
 gcc/testsuite/gcc.dg/c2y-if-decls-5.c   |  35 
 gcc/testsuite/gcc.dg/c2y-if-decls-6.c   |  27 +++
 gcc/testsuite/gcc.dg/c2y-if-decls-7.c   |  96 +
 gcc/testsuite/gcc.dg/c2y-if-decls-8.c   | 168 
 gcc/testsuite/gcc.dg/c2y-if-decls-9.c   |  35 
 gcc/testsuite/g

[PATCH v2] c: minor fixes related to arrays of unspecified size [PR116284,PR117391]

2024-11-08 Thread Martin Uecker


This version of the already approved patch only adds the missing
word "size" to the commit message and a missing "-std=gnu23" to 
the first test.  If there are no new comments, I will commit this
once the pre-commit CI tests are complete.


Bootstrapped and regression tested on x86_64.

Martin



c: minor fixes related to arrays of unspecified size

The patch for PR117145 and PR117245 also fixed PR100420 and PR116284 which
are bugs related to arrays of unspecified size.  Those are now represented
as variable size arrays with size (0, 0).  There are still some loose ends,
which are resolved here by

1. adding a testcase for PR116284,
2. moving code related to creation and detection of arrays of unspecified
sizes into their own functions,
3. preferring a specified size over an unspecified size when forming
a composite type as required by C99 (PR117391)
4. removing useless code in comptypes_internal and composite_type_internal.

PR c/116284
PR c/117391

gcc/ChangeLog:
* c/c-tree.h (c_type_unspecified_p): New inline function.
* c/c-typeck.cc (c_build_array_type_unspecified): New function.
(comptypes_internal): Remove useless code.
(composite_type_internal): Update.
* c/c-decl.cc (grokdeclarator): Revise.

gcc/testsuite/ChangeLog:
* gcc.dg/pr116284.c: New test.
* gcc.dg/pr117391.c: New test.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 42d329e4fd5..ac47ef24a3d 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -7501,10 +7501,6 @@ grokdeclarator (const struct c_declarator *declarator,
/* C99 6.7.5.2p4 */
if (decl_context == TYPENAME)
  warning (0, "%<[*]%> not in a declaration");
-   /* Array of unspecified size.  */
-   tree upper = build2 (COMPOUND_EXPR, TREE_TYPE (size_zero_node),
-integer_zero_node, size_zero_node);
-   itype = build_index_type (upper);
size_varies = true;
  }
 
@@ -7540,7 +7536,10 @@ grokdeclarator (const struct c_declarator *declarator,
if (!ADDR_SPACE_GENERIC_P (as) && as != TYPE_ADDR_SPACE (type))
  type = c_build_qualified_type (type,
 ENCODE_QUAL_ADDR_SPACE (as));
-   type = c_build_array_type (type, itype);
+   if (array_parm_vla_unspec_p)
+ type = c_build_array_type_unspecified (type);
+   else
+ type = c_build_array_type (type, itype);
  }
 
if (type != error_mark_node)
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index c8e9731bfc4..f6bcbabb9d5 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -776,12 +776,22 @@ extern struct c_switch *c_switch_stack;
 extern bool null_pointer_constant_p (const_tree);
 
 
-inline
-bool c_type_variably_modified_p (tree t)
+inline bool
+c_type_variably_modified_p (tree t)
 {
   return error_mark_node != t && C_TYPE_VARIABLY_MODIFIED (t);
 }
 
+inline bool
+c_type_unspecified_p (tree t)
+{
+  return error_mark_node != t
+&& C_TYPE_VARIABLE_SIZE (t) && TREE_CODE (t) == ARRAY_TYPE
+&& TYPE_DOMAIN (t) && TYPE_MAX_VALUE (TYPE_DOMAIN (t))
+&& TREE_CODE (TYPE_MAX_VALUE (TYPE_DOMAIN (t))) == COMPOUND_EXPR
+&& integer_zerop (TREE_OPERAND (TYPE_MAX_VALUE (TYPE_DOMAIN (t)), 0))
+&& integer_zerop (TREE_OPERAND (TYPE_MAX_VALUE (TYPE_DOMAIN (t)), 1));
+}
 
 extern bool char_type_p (tree);
 extern tree c_objc_common_truthvalue_conversion (location_t, tree,
@@ -883,10 +893,10 @@ extern tree c_reconstruct_complex_type (tree, tree);
 extern tree c_build_type_attribute_variant (tree ntype, tree attrs);
 extern tree c_build_pointer_type (tree type);
 extern tree c_build_array_type (tree type, tree domain);
+extern tree c_build_array_type_unspecified (tree type);
 extern tree c_build_function_type (tree type, tree args, bool no = false);
 extern tree c_build_pointer_type_for_mode (tree type, machine_mode mode, bool 
m);
 
-
 /* Set to 0 at beginning of a function definition, set to 1 if
a return statement that specifies a return value is seen.  */
 
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 6673cbf7294..201d75d2e9c 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -358,7 +358,7 @@ qualify_type (tree type, tree like)
 }
 
 
-/* Check consistency of type TYP.E  For derived types, we test that
+/* Check consistency of type TYPE.  For derived types, we test that
C_TYPE_VARIABLE_SIZE and C_TYPE_VARIABLY_MODIFIED are consistent with
the requirements of the base type.  We also check that arrays with a
non-constant length are marked with C_TYPE_VARIABLE_SIZE. If any
@@ -490,6 +490,17 @@ c_build_array_type (tree type, tree domain)
   return c_set_type_bits (ret, type);
 }
 
+
+/* Build an array type

Re: [PATCH] testsuite: arm: Update expected asm in no-literal-pool-m0.c

2024-11-08 Thread Torbjorn SVENSSON




On 2024-11-08 15:30, Richard Earnshaw (lists) wrote:

On 14/10/2024 16:28, Christophe Lyon wrote:



On 10/14/24 16:40, Torbjorn SVENSSON wrote:

Hi Christophe,

On 2024-10-14 14:16, Christophe Lyon wrote:

Hi Torbjörn,


On 10/13/24 19:37, Torbjörn SVENSSON wrote:

Ok for trunk?

--

With the changes in r15-1579-g792f97b44ff, the constants have been
updated. This patch aligns the constants in the test cases with the
updated implementation.



Could you share a bit more details? In particular, IIUC, since the 
constants used now are different, why do you introduce new 
alternatives, rather than replace the old expected constants?


Do we now generate both old & new versions, depending on some flags?


The constants depend on whether some optimization step is included or 
not. I don't understand how this works, but the changeset that 
Richard merged in the above commit id does cause this need.
I think the constants switch with -O2, but I don't know exactly what 
part enables it.


 From my point of view, the mov instruction could always use the 
higher value and then do a smaller number of shifts, but I don't know if 
that's easy to achieve or not.


Indeed, these tests are exercised with several optimization levels 
from -O0 to -O3 and -Os, and late-combine is only enabled at -O2 and 
above, including -Os. So I guess the previous value is still used at -O0
and -O1.


Thanks for the clarification.

LGTM.

Christophe


Kind regards,
Torbjörn



FTR, this is also tracked by https://linaro.atlassian.net/browse/GNU-1269



Thanks,

Christophe


gcc/testsuite/ChangeLog:

* gcc.target/arm/pure-code/no-literal-pool-m0.c: Update expected
asm.

Signed-off-by: Torbjörn SVENSSON 
---
  .../arm/pure-code/no-literal-pool-m0.c    | 29 +++++++++++++++++++++--------

  1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c

index bd6f4af183b..effaf8b60b6 100644
--- a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
+++ b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
@@ -95,8 +95,13 @@ test_65536 ()
  /*
  ** test_0x123456:
  **    ...
+** (
  **    movs    r[0-3], #18
  **    lsls    r[0-3], r[0-3], #8
+** |
+**    movs    r[0-3], #144
+**    lsls    r[0-3], r[0-3], #5
+** )
  **    adds    r[0-3], r[0-3], #52
  **    lsls    r[0-3], r[0-3], #8
  **    adds    r[0-3], r[0-3], #86
@@ -125,18 +130,16 @@ test_0x1123456 ()
    return 0x1123456;
  }
-/* With -Os, we generate:
-   movs r0, #16
-   lsls r0, r0, r0
-   With the other optimization levels, we generate:
-   movs r0, #16
-   lsls r0, r0, #16
-   hence the two alternatives.  */
  /*
  ** test_0x110:
  **    ...
+** (
  **    movs    r[0-3], #16
-**    lsls    r[0-3], r[0-3], (#16|r[0-3])
+**    lsls    r[0-3], r[0-3], #16
+** |
+**    movs    r[0-3], #128
+**    lsls    r[0-3], r[0-3], #13
+** )
  **    adds    r[0-3], r[0-3], #1
  **    lsls    r[0-3], r[0-3], #4
  **    ...
@@ -150,8 +153,13 @@ test_0x110 ()
  /*
  ** test_0x111:
  **    ...
+** (
  **    movs    r[0-3], #1
  **    lsls    r[0-3], r[0-3], #24
+** |
+**    movs    r[0-3], #128
+**    lsls    r[0-3], r[0-3], #17
+** )
  **    adds    r[0-3], r[0-3], #17
  **    ...
  */
@@ -164,8 +172,13 @@ test_0x111 ()
  /*
  ** test_m8192:
  **    ...
+** (
  **    movs    r[0-3], #1
  **    lsls    r[0-3], r[0-3], #13
+** |
+**    movs    r[0-3], #128
+**    lsls    r[0-3], r[0-3], #6
+** )
  **    rsbs    r[0-3], r[0-3], #0
  **    ...
  */




Do we really need to care about the precise assembler output at all the 
different optimization levels?  Isn't it enough to simply check that we 
don't get a literal pool entry?  Eg to scan for

  ldr    r[0-7], .L

*not* being in the generated code.


Do you want me to remove all the function-bodies and just add

/* { dg-final { scan-assembler-not "\tldr\tr[0-9]+, \\.L[0-9]+" } } */

or, do you want me to only change the functions that I am touching in my 
patch? (I've not tested the statement above, I will do that prior to 
sending it...).


Kind regards,
Torbjörn



R.




Re: [PATCH] Add push/pop_function_decl

2024-11-08 Thread Andrew Stubbs

On 08/11/2024 12:25, Richard Sandiford wrote:

For the aarch64 simd clones patches, it would be useful to be able to
push a function declaration onto the cfun stack, even though it has no
function body associated with it.  That is, we want cfun to be null,
current_function_decl to be the decl itself, and the target and
optimisation flags to reflect the declaration.


What do the simd_clone patches do? Just curious.

Andrew


[PATCH V2 0/11] Separate PowerPC architecture bits from ISA flags that use command line options

2024-11-08 Thread Michael Meissner
These patches are a clean-up in the PowerPC port to move architecture bits that
are not user ISA options from rs6000_isa_flags to a new target variable
rs6000_arch_flags.  The intention is to remove switches that are currently ISA
options but that the user should not be setting directly.  For example,
we want users to use -mcpu=power10 and not just -mpower10.

This is version 2 of my patches.  The difference from the previous version is
that if you configure a GCC compiler on a little endian server without using
--with-cpu, the previous patch would set the .machine option to powerpc.  This
patch now sets the .machine option to power8, which is the minimum ISA level
for little endian systems.

The previous patches were at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666529.html

There are 11 patches in this series.  I have tested these patches on both
little endian and big endian systems and there are no regressions.  Can I apply
these patches to the trunk?  I don't see the need to backport these changes to
the earlier branches, but if desired I can do that.

The patches are:

Patch #1: This patch sets up the infrastructure for a separate set of
architecture flags.  It moves the target_clones attribute to use the new
architecture flags.  The generation of ".machine" now also uses this table.

Patch #2: For newer PowerPC architectures, the architecture flags are used for
defining the '_ARCH_PWR*' macros instead of the ISA flags.  The -mpower10 and -mpower11
options are removed.

Patch #3: The code is restructured so that -mvsx does not convert the processor
to power7.  Thus using -mvsx is not allowed unless the user uses -mcpu=power7
or later.

Patch #4: Change uses of TARGET_POPCNTB to TARGET_POWER5.

Patch #5: Change uses of TARGET_FPRND to TARGET_POWER5X.

Patch #6: Change uses of TARGET_CMPB to TARGET_POWER6.

Patch #7: Change uses of TARGET_POPCNTD to TARGET_POWER7.

Patch #8: Change uses of TARGET_MODULO to TARGET_POWER9.

Patch #9: Rework tests that use -mvsx to raise the cpu to power7 to explicitly
add an appropriate #pragma to force the code generation to a power7.

Patch #10: Add support for a -mcpu=future option.

Patch #11: Make -mtune=future (and -mcpu=future without an explicit -mtune=
option) automatically schedule insns like -mtune=power10 or -mtune=power11.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] testsuite: arm: Require 16-bit float support

2024-11-08 Thread Torbjorn SVENSSON




On 2024-11-08 12:24, Richard Earnshaw (lists) wrote:

On 05/11/2024 20:06, Torbjörn SVENSSON wrote:

Based on how these functions are used in test cases, I think it's correct
to require 16-bit float support in both functions.

Without this change, the checks pass for armv8-m and armv8.1-m, but the
test cases that use them fail due to the incorrect -mfpu option.

Ok for trunk and releases/gcc-14?


Can you expand on the issue you're trying to address with this change?


If dejagnu is started with a specified FPU, the function 
arm_v8_2a_fp16_scalar_ok will check if 
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined, but it will not ensure 
that the FPU supports 16-bit floats.
The result is that with the given FPU, GCC might report that 
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC is supported, but 16-bit floats are 
not.


With -march and -mfpu:
.../bin/arm-none-eabi-gcc -E -dM - -mthumb -march=armv8-m.main+fp 
-mfloat-abi=hard -mfpu=fpv5-sp-d16 -fdiagnostics-plain-output -O2 
-mcpu=unset -march=armv8.2-a+fp16 | grep -e '__ARM_FP ' -e __ARM_FEATURE_FP16_SCALAR_ARITHMETIC

#define __ARM_FP 4
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1


Same as above, but with -mfpu=auto appended:
.../bin/arm-none-eabi-gcc -E -dM - -mthumb -march=armv8-m.main+fp 
-mfloat-abi=hard -mfpu=fpv5-sp-d16 -fdiagnostics-plain-output -O2 
-mcpu=unset -march=armv8.2-a+fp16 -mfpu=auto | grep -e '__ARM_FP ' -e __ARM_FEATURE_FP16_SCALAR_ARITHMETIC

#define __ARM_FP 14
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1


So, adding the __ARM_FP validation ensures that the empty set of flags 
is never accepted for this scenario.



For check_effective_target_arm_v8_2a_fp16_neon_ok_nocache, it's the same 
thing but here we also assume that neon is available without checking it.



Looking through other failing tests, I also noticed that
check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache is 
essentially a copy of 
check_effective_target_arm_v8_2a_fp16_neon_ok_nocache, but with a 
different architecture and define, so I'll add a fix for that too.



With all this said, I see that there is an error in this patch, so a v2 
will be sent as soon as my current test run completes and there is no 
regression.



Kind regards,
Torbjörn




R.



--

In both functions, it's assumed that 16-bit float support is available,
but it's not checked.
In addition, check_effective_target_arm_v8_2a_fp16_neon_ok also assumes
that neon is available, but it's not checked.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): Check
that 16-bit float is supported.
(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): Check
that neon is used and that 16-bit float is supported.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/lib/target-supports.exp | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 75703ddca60..19a9981d9cd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6360,6 +6360,12 @@ proc check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache { } {

 "-mfpu=fp-armv8 -mfloat-abi=softfp"} {
  if { [check_no_compiler_messages_nocache \
    arm_v8_2a_fp16_scalar_ok object {
+    #if !defined (__ARM_FP)
+    #error "__ARM_FP not defined"
+    #endif
+    #if ((__ARM_FP & 1) == 0)
+    #error "__ARM_FP indicates that 16-bit is not supported"
+    #endif
  #if !defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
  #error "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined"
  #endif
@@ -6395,6 +6401,15 @@ proc check_effective_target_arm_v8_2a_fp16_neon_ok_nocache { } {

 "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
  if { [check_no_compiler_messages_nocache \
    arm_v8_2a_fp16_neon_ok object {
+    #if !defined (__ARM_FP)
+    #error "__ARM_FP not defined"
+    #endif
+    #if ((__ARM_FP & 1) == 0)
+    #error "__ARM_FP indicates that 16-bit is not supported"
+    #endif
+    #if !defined (__ARM_NEON__)
+    #error "__ARM_NEON__ not defined"
+    #endif
  #if !defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  #error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined"
  #endif






[PATCH V2 1/11] Add rs6000 architecture masks.

2024-11-08 Thread Michael Meissner
This patch begins the journey of moving architecture bits that are not user ISA
options from rs6000_isa_flags to a new target variable, rs6000_arch_flags.  The
intention is to remove switches that are currently ISA options but that the
user should not be setting directly.  For example, we want users to use
-mcpu=power10 and not just -mpower10.

This patch also changes the target_clones support to use an architecture mask
instead of isa bits.

This patch also switches the handling of .machine to use architecture masks if
they exist (power4 through power11).  All of the other PowerPCs will continue to
use the existing code for setting the .machine option.
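As an aside for readers, the level-implies-earlier-levels scheme described above can be sketched in a few lines of Python (the names below are illustrative, not the actual rs6000 macros or option-handling code):

```python
# Hypothetical sketch: cumulative architecture levels derived once from
# -mcpu=, kept separate from user-togglable ISA option bits.
ARCH_MASK_POWER8, ARCH_MASK_POWER9, ARCH_MASK_POWER10 = 1 << 0, 1 << 1, 1 << 2

def arch_flags_for_cpu(power_level):
    """Each -mcpu= level implies all earlier architecture levels."""
    flags = 0
    if power_level >= 8:
        flags |= ARCH_MASK_POWER8
    if power_level >= 9:
        flags |= ARCH_MASK_POWER9
    if power_level >= 10:
        flags |= ARCH_MASK_POWER10
    return flags

# -mcpu=power10 implies the power8 and power9 levels as well.
assert arch_flags_for_cpu(10) & ARCH_MASK_POWER8
assert not (arch_flags_for_cpu(9) & ARCH_MASK_POWER10)
```

Because the levels are cumulative, a `_ARCH_PWR*` define can be tested with a single mask check instead of combining several independent ISA option bits.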

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/default64.h (TARGET_CPU_DEFAULT): Set default cpu name.
* config/rs6000/rs6000-arch.def: New file.
* config/rs6000/rs6000.cc (struct clone_map): Switch to using
architecture masks instead of ISA masks.
(rs6000_clone_map): Likewise.
(rs6000_print_isa_options): Add an architecture flags argument, change
all callers.
(get_arch_flag): New function.
(rs6000_debug_reg_global): Update rs6000_print_isa_options calls.
(rs6000_option_override_internal): Likewise.
(rs6000_machine_from_flags): Switch to using architecture masks instead
of ISA masks.
(struct rs6000_arch_mask): New structure.
(rs6000_arch_masks): New table of architecture masks and names.
(rs6000_function_specific_save): Save architecture flags.
(rs6000_function_specific_restore): Restore architecture flags.
(rs6000_function_specific_print): Update rs6000_print_isa_options calls.
(rs6000_print_options_internal): Add architecture flags options.
(rs6000_clone_priority): Switch to using architecture masks instead of
ISA masks.
(rs6000_can_inline_p): Don't allow inlining if the callee requires a newer
architecture than the caller.
* config/rs6000/rs6000.h: Use rs6000-arch.def to create the architecture
masks.
* config/rs6000/rs6000.opt (rs6000_arch_flags): New target variable.
(x_rs6000_arch_flags): New save/restore field for rs6000_arch_flags.
---
 gcc/config/rs6000/default64.h |  11 ++
 gcc/config/rs6000/rs6000-arch.def |  48 +++
 gcc/config/rs6000/rs6000.cc   | 215 +-
 gcc/config/rs6000/rs6000.h|  24 
 gcc/config/rs6000/rs6000.opt  |   8 ++
 5 files changed, 270 insertions(+), 36 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-arch.def

diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index 10e3dec78ac..afa6542e040 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define RS6000_CPU(NAME, CPU, FLAGS)
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
+#undef TARGET_CPU_DEFAULT
 
 #if (TARGET_DEFAULT & MASK_LITTLE_ENDIAN)
 #undef TARGET_DEFAULT
@@ -28,10 +29,20 @@ along with GCC; see the file COPYING3.  If not see
| MASK_LITTLE_ENDIAN)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower8"
+#define TARGET_CPU_DEFAULT "power8"
+
 #else
 #undef TARGET_DEFAULT
 #define TARGET_DEFAULT (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT \
| OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower4"
+
+#if (TARGET_DEFAULT & MASK_POWERPC64)
+#define TARGET_CPU_DEFAULT "powerpc64"
+
+#else
+#define TARGET_CPU_DEFAULT "powerpc"
+#endif
+
 #endif
diff --git a/gcc/config/rs6000/rs6000-arch.def b/gcc/config/rs6000/rs6000-arch.def
new file mode 100644
index 000..e5b6e958133
--- /dev/null
+++ b/gcc/config/rs6000/rs6000-arch.def
@@ -0,0 +1,48 @@
+/* IBM RS/6000 CPU architecture features by processor type.
+   Copyright (C) 1991-2024 Free Software Foundation, Inc.
+   Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu)
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS

[PATCH V2 2/11] Use architecture flags for defining _ARCH_PWR macros.

2024-11-08 Thread Michael Meissner
For the newer architectures, this patch changes GCC to define the _ARCH_PWR
macros using the new architecture flags instead of relying on isa options like
-mpower10.

The -mpower8-internal, -mpower10, and -mpower11 options were removed.  The
-mpower11 option was removed completely, since it was just added in GCC 15.  The
other two options were marked as WarnRemoved, and the various ISA bits were
removed.

TARGET_POWER8 and TARGET_POWER10 were re-defined to use the architecture bits
instead of the ISA bits.

There are other internal isa bits that aren't removed with this patch because
the built-in function support uses those bits.

I have built both big endian and little endian bootstrap compilers and there
were no regressions.

In addition, I constructed a test case that used every architecture define (like
_ARCH_PWR4, etc.) and I also looked at the .machine directive generated.  I ran
this test for all supported combinations of -mcpu, big/little endian, and 32/64
bit support.  Every single instance generated exactly the same code with the
patches installed compared to the compiler before installing the patches.

Can I install this patch on the GCC 15 trunk?

2024-11-06  Michael Meissner  

gcc/

* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add support to
use architecture flags instead of ISA flags for setting most of the
_ARCH_PWR* macros.
(rs6000_cpu_cpp_builtins): Update rs6000_target_modify_macros call.
* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Remove
OPTION_MASK_POWER8.
(ISA_3_1_MASKS_SERVER): Remove OPTION_MASK_POWER10.
(POWER11_MASKS_SERVER): Remove OPTION_MASK_POWER11.
(POWERPC_MASKS): Remove OPTION_MASK_POWER8, OPTION_MASK_POWER10, and
OPTION_MASK_POWER11.
* config/rs6000/rs6000-protos.h (rs6000_target_modify_macros): Update
declaration.
(rs6000_target_modify_macros_ptr): Likewise.
* config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Likewise.
(rs6000_option_override_internal): Use architecture flags instead of ISA
flags.
(rs6000_opt_masks): Remove -mpower10 and -mpower11, which are no longer
in the ISA flags.
(rs6000_pragma_target_parse): Use architecture flags as well as ISA
flags.
* config/rs6000/rs6000.h (TARGET_POWER4): New macro.
(TARGET_POWER5): Likewise.
(TARGET_POWER5X): Likewise.
(TARGET_POWER6): Likewise.
(TARGET_POWER7): Likewise.
(TARGET_POWER8): Likewise.
(TARGET_POWER9): Likewise.
(TARGET_POWER10): Likewise.
(TARGET_POWER11): Likewise.
* config/rs6000/rs6000.opt (-mpower8-internal): Remove ISA flag bits.
(-mpower10): Likewise.
(-mpower11): Likewise.
---
 gcc/config/rs6000/rs6000-c.cc | 27 +++
 gcc/config/rs6000/rs6000-cpus.def |  8 +---
 gcc/config/rs6000/rs6000-protos.h |  5 +++--
 gcc/config/rs6000/rs6000.cc   | 19 +++
 gcc/config/rs6000/rs6000.h| 20 
 gcc/config/rs6000/rs6000.opt  | 11 ++-
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 04882c396bf..c8f33289fa3 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -338,7 +338,8 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
#pragma GCC target, we need to adjust the macros dynamically.  */
 
 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
+HOST_WIDE_INT arch_flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
 fprintf (stderr,
@@ -411,7 +412,7 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
summary of the flags associated with particular cpu
definitions.  */
 
-  /* rs6000_isa_flags based options.  */
+  /* rs6000_isa_flags and rs6000_arch_flags based options.  */
   rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC");
   if ((flags & OPTION_MASK_PPC_GPOPT) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPCSQ");
@@ -419,23 +420,25 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPCGR");
   if ((flags & OPTION_MASK_POWERPC64) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC64");
-  if ((flags & OPTION_MASK_MFCRF) != 0)
+  if ((flags & OPTION_MASK_POWERPC64) != 0)
+rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC64");
+  if ((arch_flags & ARCH_MASK_POWER4) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR4");
-  if ((flags & OPTION_MASK_POPCNTB) != 0)
+  if ((arch_flags & ARCH_MASK_POWER5) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR5");
-  if ((flags & OPTION_MASK_FPRND) != 0)
+  if ((arch_flags & ARCH_

[PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2024-11-08 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

--

A long time ago, this test forced -march=armv6.

With -marm, the generated assembler is:
foo:
sub r0, r0, #48
cmp r0, #9
movhi   r0, #0
movls   r0, #1
bx  lr

With -mthumb, the generated assembler is:
foo:
subsr0, r0, #48
movsr2, #9
uxtbr3, r0
movsr0, #0
cmp r2, r3
adcsr0, r0, r0
uxtbr0, r0
bx  lr

Require effective-target arm_arm_ok to skip the test for thumb-only
targets (Cortex-M).

gcc/testsuite/ChangeLog:

* gcc.target/arm/unsigned-extend-1.c: Use effective-target
arm_arm_ok.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
index 3b4ab048fb0..73f2e1a556d 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
 /* { dg-options "-O2" } */
 
 unsigned char foo (unsigned char c)
-- 
2.25.1



Re: [PATCH 0/11] Separate PowerPC architecture bits from ISA flags that use command line options

2024-11-08 Thread Michael Meissner
I have posted a new version of the patches at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668177.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH V2 1/11] Add rs6000 architecture masks.

2024-11-08 Thread Peter Bergner
On 11/8/24 1:44 PM, Michael Meissner wrote:
> diff --git a/gcc/config/rs6000/rs6000-arch.def 
> b/gcc/config/rs6000/rs6000-arch.def
> new file mode 100644
> index 000..e5b6e958133
> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-arch.def
> @@ -0,0 +1,48 @@
> +/* IBM RS/6000 CPU architecture features by processor type.
> +   Copyright (C) 1991-2024 Free Software Foundation, Inc.
> +   Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu)

This doesn't make any sense to me.  This is a new file with new
code not copied from anywhere else, so why the Contributed by
Richard Kenner line?  Cut/paste error?

Peter




[PATCH] VN: Don't recurse on for the same value of `a | b` [PR117496]

2024-11-08 Thread Andrew Pinski
After adding vn_valueize to handle the `a | b ==/!= 0` case
of insert_predicates_for_cond, it could go into an infinite loop,
as the value number of either a or b could be the same as that of
the whole expression. This avoids that recursion so there is
no infinite loop here.
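A toy model of the recursion the guard stops (illustrative Python, not the actual GCC code; all names are invented): if an operand of `a | b` valueizes back to the OR's own value number, only the `nlhs != lhs` check keeps the walk finite.

```python
def valueize(name, table):
    """Toy valueization: chase the table to a fixed point."""
    while name in table:
        name = table[name]
    return name

def insert_predicates(lhs, table, defs, depth=0):
    """Record an `x == 0` predicate for lhs and its OR operands.

    The `nlhs != lhs` check mirrors the guard added by the patch: without
    it, an operand that valueizes back to lhs would recurse forever."""
    assert depth < 100, "runaway recursion"
    recorded = [(lhs, 0)]
    if lhs in defs:
        for op in defs[lhs]:
            nlhs = valueize(op, table)
            if nlhs != lhs:  # the PR117496 guard
                recorded += insert_predicates(nlhs, table, defs, depth + 1)
    return recorded

# `t` stands for `tracks | wm`; both operands valueize back to `t`,
# as in the testcase, so the guard is what makes this terminate.
assert insert_predicates("t", {"tracks": "t", "wm": "t"},
                         {"t": ("tracks", "wm")}) == [("t", 0)]
```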

Bootstrapped and tested on x86_64-linux.

PR tree-optimization/117496

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the
valueization for the new lhs is the same as the old one,
don't recurse.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117496-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr117496-1.c | 25 +++
 gcc/tree-ssa-sccvn.cc | 11 --
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr117496-1.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr117496-1.c b/gcc/testsuite/gcc.dg/torture/pr117496-1.c
new file mode 100644
index 000..f35d13dfa85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117496-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+
+
+/* PR tree-optimization/117496 */
+/* This would go into an infinite loop into VN while recording
+   the predicates for the `tracks == 0 && wm == 0` GIMPLE_COND.
+   As wm_N and tracks_N would valueize back to `tracks | wm`.  */
+
+int main_argc, gargs_preemp, gargs_nopreemp;
+static void gargs();
+void main_argv() {
+  int tracks = 0;
+  gargs(main_argc, main_argv, &tracks);
+}
+void gargs(int, char *, int *tracksp) {
+  int tracks = *tracksp, wm;
+  for (;;) {
+if (tracks == 0)
+  wm |= 4;
+if (gargs_nopreemp)
+  gargs_preemp = 0;
+if (tracks == 0 && wm == 0)
+  tracks++;
+  }
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 16299662b95..e93acb44200 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -7900,6 +7900,7 @@ insert_related_predicates_on_edge (enum tree_code code, 
tree *ops, edge pred_e)
 
 /* Insert on the TRUE_E true and FALSE_E false predicates
derived from LHS CODE RHS.  */
+
 static void
 insert_predicates_for_cond (tree_code code, tree lhs, tree rhs,
edge true_e, edge false_e)
@@ -7977,10 +7978,16 @@ insert_predicates_for_cond (tree_code code, tree lhs, 
tree rhs,
  tree nlhs;
 
  nlhs = vn_valueize (gimple_assign_rhs1 (def_stmt));
- insert_predicates_for_cond (EQ_EXPR, nlhs, rhs, e, nullptr);
+ /* A valueization of the `a` might return the old lhs
+which is already handled above. */
+ if (nlhs != lhs)
+   insert_predicates_for_cond (EQ_EXPR, nlhs, rhs, e, nullptr);
 
+ /* A valueization of the `b` might return the old lhs
+which is already handled above. */
  nlhs = vn_valueize (gimple_assign_rhs2 (def_stmt));
- insert_predicates_for_cond (EQ_EXPR, nlhs, rhs, e, nullptr);
+ if (nlhs != lhs)
+   insert_predicates_for_cond (EQ_EXPR, nlhs, rhs, e, nullptr);
}
 }
 }
-- 
2.43.0



RE: [EXTERNAL] [PATCH] Enable autofdo bootstrap for lto/fortran

2024-11-08 Thread Eugene Rozenfeld
This line in gcc/fortran/Make-lang.in looks wrong (copy/paste?):

+f95.fda: create_fdas_for_lto1

There are no invocations of $(CREATE_GCOV) in gcc/fortran/Make-lang.in so this 
is incomplete.

-Original Message-
From: Andi Kleen  
Sent: Thursday, October 31, 2024 4:19 PM
To: gcc-patches@gcc.gnu.org
Cc: Eugene Rozenfeld ; Andi Kleen 

Subject: [EXTERNAL] [PATCH] Enable autofdo bootstrap for lto/fortran

From: Andi Kleen 

When autofdo bootstrap support was originally implemented, there were issues 
with the LTO bootstrap, which is why it wasn't enabled there. I retested this 
now and it works on x86_64-linux.

Fortran was also missing, not sure why. Also enabled now.

gcc/fortran/ChangeLog:

* Make-lang.in: Enable autofdo.

gcc/lto/ChangeLog:

* Make-lang.in: Enable autofdo.
---
 gcc/fortran/Make-lang.in | 13 +++--
 gcc/lto/Make-lang.in | 14 +-
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in
index 0be3c6b654b1..7295118185fd 100644
--- a/gcc/fortran/Make-lang.in
+++ b/gcc/fortran/Make-lang.in
@@ -69,6 +69,15 @@ F95_OBJS = $(F95_PARSER_OBJS) $(FORTRAN_TARGET_OBJS) \
 
 fortran_OBJS = $(F95_OBJS) fortran/gfortranspec.o
 
+ifeq ($(if $(wildcard ../stage_current),$(shell cat \
+  ../stage_current)),stageautofeedback)
+$(fortran_OBJS): CFLAGS += -fauto-profile=f95.fda
+$(fortran_OBJS): f95.fda
+endif
+
+f95.fda: create_fdas_for_lto1
+	$(PROFILE_MERGER) $(shell ls -ha f95_*.fda) --output_file f95.fda -gcov_version 2
+
 #

 # Define the names for selecting gfortran in LANGUAGES.
 fortran: f951$(exeext)
@@ -272,7 +281,7 @@ fortran.install-info: $(DESTDIR)$(infodir)/gfortran.info
 fortran.install-man: $(DESTDIR)$(man1dir)/$(GFORTRAN_INSTALL_NAME)$(man1ext)
 
 $(DESTDIR)$(man1dir)/$(GFORTRAN_INSTALL_NAME)$(man1ext): doc/gfortran.1 \
-   installdirs
+   installdirs f95*.fda
-rm -f $@
-$(INSTALL_DATA) $< $@
-chmod a-x $@
@@ -293,7 +302,7 @@ fortran.uninstall:
 # We just have to delete files specific to us.
 
 fortran.mostlyclean:
-   -rm -f gfortran$(exeext) gfortran-cross$(exeext) f951$(exeext)
+   -rm -f gfortran$(exeext) gfortran-cross$(exeext) f951$(exeext) f95*.fda
-rm -f fortran/*.o
 
 fortran.clean:
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index b62ddcbe0dc9..4f9f21cdfc9e 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -29,15 +29,11 @@ lto_OBJS = $(LTO_OBJS)
 LTO_DUMP_OBJS = lto/lto-lang.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-dump.o lto/lto-common.o
 lto_dump_OBJS = $(LTO_DUMP_OBJS)
 
-# this is only useful in a LTO bootstrap, but this does not work right
-# now. Should reenable after this is fixed, but only when LTO bootstrap
-# is enabled.
-
-#ifeq ($(if $(wildcard ../stage_current),$(shell cat \
-#  ../stage_current)),stageautofeedback)
-#$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda
-#$(LTO_OBJS): lto1.fda
-#endif
+ifeq ($(if $(wildcard ../stage_current),$(shell cat \
+  ../stage_current)),stageautofeedback)
+$(LTO_OBJS): CFLAGS += -fauto-profile=lto1.fda
+$(LTO_OBJS): lto1.fda
+endif
 
 # Rules
 
--
2.46.2



RE: [EXTERNAL] Re: [PATCH] PR117350: Keep assembler name for abstract decls for autofdo

2024-11-08 Thread Eugene Rozenfeld
The patch looks good to me.

-Original Message-
From: Richard Biener  
Sent: Wednesday, November 6, 2024 12:01 AM
To: Andi Kleen 
Cc: Jason Merrill ; Andi Kleen ; 
gcc-patches@gcc.gnu.org; Eugene Rozenfeld ; 
pins...@gmail.com; Andi Kleen 
Subject: [EXTERNAL] Re: [PATCH] PR117350: Keep assembler name for abstract 
decls for autofdo

On Tue, Nov 5, 2024 at 6:08 PM Andi Kleen  wrote:
>
> On Tue, Nov 05, 2024 at 09:47:17AM +0100, Richard Biener wrote:
> > On Tue, Nov 5, 2024 at 2:02 AM Jason Merrill  wrote:
> > >
> > > On 10/31/24 4:40 PM, Andi Kleen wrote:
> > > > From: Andi Kleen 
> > > >
> > > > autofdo looks up inline stacks and tries to match them with the 
> > > > profile data using their symbol name. Make sure all decls that 
> > > > can be in an inline stack have a valid assembler name.
> > > >
> > > > This fixes a bootstrap problem with autoprofiledbootstrap and LTO.
> > >
> > > OK in a week if no other comments.
> >
> > Hmm, but DECL_ABSTRACT_P should be only set on entities that generate no 
> > code.
> >
> > How does autofdo look them up?  Are you sure it's the abstract decl 
autofdo wants to look up?  Or is autofdo processing not serializing 
> > the compilation and thus it affects code generation on parts that 
> > have not been processed yet?
>
>
> autofdo tries to match inlines to an inline stack derived from dwarf.
> So if something appears in dwarf it has to be in its stack. For the 
> test case the abstract entity is in the dwarf stack.
>
> For the lookup inside GCC, it walks the BLOCK_SUPERCONTEXT links, 
> looks at BLOCK_ABSTRACT_ORIGIN, and ignores everything that has an 
> unknown location.
>
> Maybe there could be some filtering there, but it would need to be on 
> both sides, which would be a version break for the file format.

Ah, OK - I wasn't aware it uses/requires debug info.  It's still a bit odd that 
only the abstract decl has the correct symbol name.  If it's abstract shouldn't 
it have the same symbol name as an actual entry?  Maybe the symbol name noted 
in the debug info is wrong?  I think the only DECL_ABSTRACT_P decls we have are 
from C++ CTOR/DTOR stuff.

In any case, I withdraw my objection.

Thanks for explaining,
Richard.

> -Andi
>


[PATCH] fold: Remove (rrotate (rrotate A CST) CST) folding [PR117492]

2024-11-08 Thread Andrew Pinski
This removes a (broken) simplification from fold which is already handled in
match.
It was broken because of the use of wi::to_wide on the RHS of the rotates,
which could be two different types even though the LHS type was the same.
Since it is already handled in match (by the patterns for
`Turn (a OP c1) OP c2 into a OP (c1+c2).`), it can be removed without losing
any optimizations.
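For reference, the identity the removed code relied on is easy to check: two rotates compose into a single rotate by the summed count modulo the precision, so counts summing to a multiple of the precision are a no-op (a standalone Python sketch, not GCC code):

```python
def rotr32(x, c):
    """32-bit rotate right by c (count taken modulo the precision)."""
    c &= 31
    if c == 0:
        return x & 0xFFFFFFFF
    return ((x >> c) | (x << (32 - c))) & 0xFFFFFFFF

x = 0xDEADBEEF
# 12 + 20 == 32 == precision: the double rotate is the identity.
assert rotr32(rotr32(x, 12), 20) == x
# 7 + 9 == 16: equivalent to a single rotate by the summed count.
assert rotr32(rotr32(x, 7), 9) == rotr32(x, 16)
```

The PR arises only when the two rotate counts have different types, which is why summing them with wi::to_wide in fold-const was unsafe while the match.pd pattern is not.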

Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/117492

gcc/ChangeLog:

* fold-const.cc (fold_binary_loc): Remove the `Two consecutive rotates
adding up to the some integer` simplification.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117492-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/fold-const.cc | 10 --
 gcc/testsuite/gcc.dg/torture/pr117492-1.c | 16 
 2 files changed, 16 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr117492-1.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 0e374294c01..1e8ae1ab493 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -12530,16 +12530,6 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
   arg01, arg1));
}
 
-  /* Two consecutive rotates adding up to the some integer
-multiple of the precision of the type can be ignored.  */
-  if (code == RROTATE_EXPR && TREE_CODE (arg1) == INTEGER_CST
- && TREE_CODE (arg0) == RROTATE_EXPR
- && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
- && wi::umod_trunc (wi::to_wide (arg1)
-+ wi::to_wide (TREE_OPERAND (arg0, 1)),
-prec) == 0)
-   return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
-
   return NULL_TREE;
 
 case MIN_EXPR:
diff --git a/gcc/testsuite/gcc.dg/torture/pr117492-1.c b/gcc/testsuite/gcc.dg/torture/pr117492-1.c
new file mode 100644
index 000..543d8b7e709
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117492-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* PR middle-end/117492 */
+
+/* This code would ICE in fold due to code which was using wi::to_wide with 
different types
+   and adding them. */
+
+typedef unsigned u;
+
+u
+foo(u x)
+{
+  return
+__builtin_stdc_rotate_left((unsigned)
+  __builtin_stdc_rotate_right(x, 0x10001ull),
+  1);
+}
-- 
2.43.0



RE: [EXTERNAL] [PATCH] Update gcc-auto-profile / gen_autofdo_event.py

2024-11-08 Thread Eugene Rozenfeld
The patch looks good to me. Thank you for fixing this, Andi.

-Original Message-
From: Andi Kleen  
Sent: Thursday, October 31, 2024 4:37 PM
To: gcc-patches@gcc.gnu.org
Cc: Eugene Rozenfeld ; Andi Kleen 

Subject: [EXTERNAL] [PATCH] Update gcc-auto-profile / gen_autofdo_event.py

From: Andi Kleen 

- Fix warnings with newer Python versions about bad escapes by making all the
Python strings raw.
- Add a fallback that uses the builtin perf event list if the CPU model number
is unknown.
- Regenerate the shipped gcc-auto-profile with the changes.
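For readers unfamiliar with the escape-warning change, a small generic Python illustration (independent of the generator script): a raw literal keeps the backslash and compiles silently, while the equivalent non-raw literal carries an invalid escape sequence that newer Pythons warn about at compile time.

```python
import warnings

# A raw literal keeps the backslash exactly as written and never warns.
raw = r"model*:\ %s"
assert raw == "model*:\\ %s"

# The same literal written as a NON-raw string contains the invalid
# escape "\ ", which is flagged when compiling (DeprecationWarning
# before Python 3.12, SyntaxWarning from 3.12 on).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile('s = "model*:\\ %s"', "<example>", "exec")
assert len(caught) >= 1
```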

contrib/ChangeLog:

* gen_autofdo_event.py: Convert strings to raw.
Add fallback to using builtin perf event list.

gcc/ChangeLog:

* config/i386/gcc-auto-profile: Regenerate.
---
 contrib/gen_autofdo_event.py | 36 ++--
 gcc/config/i386/gcc-auto-profile | 21 ---
 2 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
index 4c201943b5c7..4e58a5320fff 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -112,7 +112,7 @@ for j in u:
 u.close()
 
 if args.script:
-print('''#!/bin/sh
+print(r'''#!/bin/sh
 # Profile workload for gcc profile feedback (autofdo) using Linux perf.
 # Auto generated. To regenerate for new CPUs run
 # contrib/gen_autofdo_event.py --script --all in gcc source
@@ -152,22 +152,26 @@ case `grep -E -q "^cpu family\s*: 6" /proc/cpuinfo &&
 for event, mod in eventmap.items():
 for m in mod[:-1]:
 print("model*:\ %s|\\" % m)
-print('model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event))
-print('''*)
+print(r'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event))
+print(r'''*)
+if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
+E=br_inst_retired.near_taken:p
+else
echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
-   exit 1 ;;''')
-print("esac")
-print("set -x")
-print('if ! perf record -e $E -b "$@" ; then')
-print('  # PEBS may not actually be working even if the processor supports it')
-print('  # (e.g., in a virtual machine). Trying to run without /p.')
-print('  set +x')
-print('  echo >&2 "Retrying without /p."')
-print('  E="$(echo "${E}" | sed -e \'s/\/p/\//\')"')
-print('  set -x')
-print('  exec perf record -e $E -b "$@"')
-print(' set +x')
-print('fi')
+ exit 1
+fi ;;''')
+print(r"esac")
+print(r"set -x")
+print(r'if ! perf record -e $E -b "$@" ; then')
+print(r'  # PEBS may not actually be working even if the processor supports it')
+print(r'  # (e.g., in a virtual machine). Trying to run without /p.')
+print(r'  set +x')
+print(r'  echo >&2 "Retrying without /p."')
+print(r'  E="$(echo "${E}" | sed -e \'s/\/p/\//\ -e s/:p//)"')
+print(r'  set -x')
+print(r'  exec perf record -e $E -b "$@"')
+print(r' set +x')
+print(r'fi')
 
 if cpufound == 0 and not args.all:
 sys.exit('CPU %s not found' % cpu)
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 04f7d35dcc51..528b34e42400 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -82,17 +82,24 @@ model*:\ 126|\
 model*:\ 167|\
 model*:\ 140|\
 model*:\ 141|\
-model*:\ 143|\
-model*:\ 207|\
 model*:\ 106|\
-model*:\ 108) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
+model*:\ 108|\
+model*:\ 173|\
+model*:\ 174) E="cpu/event=0xc4,umask=0x20/$FLAGS" ;;
 model*:\ 134|\
 model*:\ 150|\
-model*:\ 156|\
-model*:\ 190) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
+model*:\ 156) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
+model*:\ 143|\
+model*:\ 207) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
+model*:\ 190) E="cpu/event=0xc4,umask=0xc0/$FLAGS" ;;
+model*:\ 190) E="cpu/event=0xc4,umask=0xfe/$FLAGS" ;;
 *)
+if perf list br_inst_retired | grep -q br_inst_retired.near_taken ; then
+E=br_inst_retired.near_taken:p
+else
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
-   exit 1 ;;
+ exit 1
+fi ;;
 esac
 set -x
 if ! perf record -e $E -b "$@" ; then
@@ -100,7 +107,7 @@ if ! perf record -e $E -b "$@" ; then
   # (e.g., in a virtual machine). Trying to run without /p.
   set +x
   echo >&2 "Retrying without /p."
-  E="$(echo "${E}" | sed -e 's/\/p/\//')"
+  E="$(echo "${E}" | sed -e \'s/\/p/\//\ -e s/:p//)"
   set -x
   exec perf record -e $E -b "$@"
  set +x
--
2.46.2



Re: [PATCH] VN: Don't recurse on for the same value of `a | b` [PR117496]

2024-11-08 Thread Richard Biener



> Am 09.11.2024 um 02:36 schrieb Andrew Pinski :
> 
> After adding vn_valueize to handle the `a | b ==/!= 0` case
> of insert_predicates_for_cond, it would go into an infinite loop
> because the value number for either a or b could be the same as the
> value number of the whole expression. This avoids that recursion so
> there is no infinite loop here.
> 
> Bootstrapped and tested on x86_64-linux.

Ok

Richard 

>PR tree-optimization/117496
> 
> gcc/ChangeLog:
> 
>* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the
>valueization for the new lhs is the same as the old one,
>don't recurse.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.dg/torture/pr117496-1.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/testsuite/gcc.dg/torture/pr117496-1.c | 25 +++
> gcc/tree-ssa-sccvn.cc | 11 --
> 2 files changed, 34 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr117496-1.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr117496-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr117496-1.c
> new file mode 100644
> index 000..f35d13dfa85
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr117496-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +
> +
> +/* PR tree-optimization/117496 */
> +/* This would go into an infinite loop into VN while recording
> +   the predicates for the `tracks == 0 && wm == 0` GIMPLE_COND.
> +   As wm_N and tracks_N would valueize back to `tracks | wm`.  */
> +
> +int main_argc, gargs_preemp, gargs_nopreemp;
> +static void gargs();
> +void main_argv() {
> +  int tracks = 0;
> +  gargs(main_argc, main_argv, &tracks);
> +}
> +void gargs(int, char *, int *tracksp) {
> +  int tracks = *tracksp, wm;
> +  for (;;) {
> +if (tracks == 0)
> +  wm |= 4;
> +if (gargs_nopreemp)
> +  gargs_preemp = 0;
> +if (tracks == 0 && wm == 0)
> +  tracks++;
> +  }
> +}
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 16299662b95..e93acb44200 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -7900,6 +7900,7 @@ insert_related_predicates_on_edge (enum tree_code code, 
> tree *ops, edge pred_e)
> 
> /* Insert on the TRUE_E true and FALSE_E false predicates
>derived from LHS CODE RHS.  */
> +
> static void
> insert_predicates_for_cond (tree_code code, tree lhs, tree rhs,
>edge true_e, edge false_e)
> @@ -7977,10 +7978,16 @@ insert_predicates_for_cond (tree_code code, tree lhs, 
> tree rhs,
>  tree nlhs;
> 
>  nlhs = vn_valueize (gimple_assign_rhs1 (def_stmt));
> -  insert_predicates_for_cond (EQ_EXPR, nlhs, rhs, e, nullptr);
> +  /* A valueization of the `a` might return the old lhs
> + which is already handled above. */
> +  if (nlhs != lhs)
> +insert_predicates_for_cond (EQ_EXPR, nlhs, rhs, e, nullptr);
> 
> +  /* A valueization of the `b` might return the old lhs
> + which is already handled above. */
>  nlhs = vn_valueize (gimple_assign_rhs2 (def_stmt));
> -  insert_predicates_for_cond (EQ_EXPR, nlhs, rhs, e, nullptr);
> +  if (nlhs != lhs)
> +insert_predicates_for_cond (EQ_EXPR, nlhs, rhs, e, nullptr);
>}
> }
> }
> --
> 2.43.0
> 


Re: [PATCH] fold: Remove (rrotate (rrotate A CST) CST) folding [PR117492]

2024-11-08 Thread Richard Biener



> Am 09.11.2024 um 05:00 schrieb Andrew Pinski :
> 
> This removes a (broken) simplification from fold which is already
> handled in match.
> It was broken because it used wi::to_wide on the RHS of both rotates,
> and the two rotate counts could have two different types even though
> the LHS was the same type.
> Since match already handles this case (via the patterns for
> `Turn (a OP c1) OP c2 into a OP (c1+c2).`), it can be removed without
> losing any optimizations.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard 

>PR middle-end/117492
> 
> gcc/ChangeLog:
> 
>* fold-const.cc (fold_binary_loc): Remove `Two consecutive rotates
>adding up to the some integer` simplification.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.dg/torture/pr117492-1.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/fold-const.cc | 10 --
> gcc/testsuite/gcc.dg/torture/pr117492-1.c | 16 
> 2 files changed, 16 insertions(+), 10 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr117492-1.c
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 0e374294c01..1e8ae1ab493 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -12530,16 +12530,6 @@ fold_binary_loc (location_t loc, enum tree_code 
> code, tree type,
>   arg01, arg1));
>}
> 
> -  /* Two consecutive rotates adding up to the some integer
> - multiple of the precision of the type can be ignored.  */
> -  if (code == RROTATE_EXPR && TREE_CODE (arg1) == INTEGER_CST
> -  && TREE_CODE (arg0) == RROTATE_EXPR
> -  && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
> -  && wi::umod_trunc (wi::to_wide (arg1)
> - + wi::to_wide (TREE_OPERAND (arg0, 1)),
> - prec) == 0)
> -return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
> -
>   return NULL_TREE;
> 
> case MIN_EXPR:
> diff --git a/gcc/testsuite/gcc.dg/torture/pr117492-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr117492-1.c
> new file mode 100644
> index 000..543d8b7e709
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr117492-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* PR middle-end/117492 */
> +
> +/* This code would ICE in fold due to code which was using wi::to_wide with 
> different types
> +   and adding them. */
> +
> +typedef unsigned u;
> +
> +u
> +foo(u x)
> +{
> +  return
> +__builtin_stdc_rotate_left((unsigned)
> +  __builtin_stdc_rotate_right(x, 0x10001ull),
> +  1);
> +}
> --
> 2.43.0
> 


Re: [PATCH] AArch64: Block combine_and_move from creating FP literal loads

2024-11-08 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard,
>
>> That's because, once an instruction matches, the instruction should
>> continue to match.  It should always be possible to set the INSN_CODE of
>> an existing instruction to -1, rerun recog, and get the same instruction
>> code back.
>>
>> Because of that, insn conditions shouldn't depend on can_create_pseudo_p.
>
> We should never get into that state since it would be incorrect. If say we
> created a movdf after regalloc that needs a split or a literal load, it cannot
> match any alternative. So recog would fail.

But, as this series shows, it's possible to match new instructions after
split1 and before can_create_pseudo_p returns false.  And in general, we
shouldn't rely on split1 for correctness.

>> Yeah, I realise it's done by the split pass at the moment.  My question was:
>> why do we need to wait till then?  Why can't we do it in expand instead?
>
> We could split at a different time. But why would that make a difference?
> As long as we allow all FP immediates at all times in movsf/df, we end up
> with the same issue.

The idea was that, if we did the split during expand, the movsf/df
define_insns would then only accept the immediates that their
constraints can handle.

>> Are there cases where we expect to discover new FP constants during RTL
>> optimisation that weren't present in gimple?  And if so, which cases are
>> they?  Where do the constants come from?
>
> These constants are created by undoing the previous split (using REG_EQUIV
> to just create a new movsf/movdf instruction). When the split happened is not
> relevant. Even using UNSPEC would not work as long as there is a REG_EQUIV
> somewhere.

Yeah, I realise that, which is why I think the expand approach would
be more robust.  It would in some ways be similar to what we do for
symbolic constants.

An alternative would be to add a constraint for the kind of constants that
the split handles, and then use it in a new alternative of the move patterns.
The split could then happen at any time, before or after reload.

Thanks,
Richard


[PATCH 07/12] libstdc++: Use RAII in _Hashtable

2024-11-08 Thread Jonathan Wakely
Use scoped guard types to clean up if an exception is thrown. This
allows some try-catch blocks to be removed.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (operator=(const _Hashtable&)): Use
RAII instead of try-catch.
(_M_assign(_Ht&&, _NodeGenerator&)): Likewise.

Reviewed-by: François Dumont 
---

I think this one could be pushed independently too.

 libstdc++-v3/include/bits/hashtable.h | 101 ++
 1 file changed, 56 insertions(+), 45 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index f1c30896bcb..a46a94e2ecd 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1307,17 +1307,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_bucket_count = __ht._M_bucket_count;
  _M_element_count = __ht._M_element_count;
  _M_rehash_policy = __ht._M_rehash_policy;
- __try
-   {
- _M_assign(__ht);
-   }
- __catch(...)
-   {
- // _M_assign took care of deallocating all memory. Now we
- // must make sure this instance remains in a usable state.
- _M_reset();
- __throw_exception_again;
-   }
+
+ struct _Guard
+ {
+   ~_Guard() { if (_M_ht) _M_ht->_M_reset(); }
+   _Hashtable* _M_ht;
+ };
+ // If _M_assign exits via an exception it will have deallocated
+ // all memory. This guard will ensure *this is in a usable state.
+ _Guard __guard{this};
+ _M_assign(__ht);
+ __guard._M_ht = nullptr;
  return *this;
}
  std::__alloc_on_copy(__this_alloc, __that_alloc);
@@ -1390,46 +1390,57 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
   _M_assign(_Ht&& __ht, _NodeGenerator& __node_gen)
   {
-   __buckets_ptr __buckets = nullptr;
-   if (!_M_buckets)
- _M_buckets = __buckets = _M_allocate_buckets(_M_bucket_count);
-
-   __try
+   struct _Guard
+   {
+ ~_Guard()
  {
-   if (!__ht._M_before_begin._M_nxt)
- return;
-
-   using _FromVal = __conditional_t<is_lvalue_reference<_Ht>::value,
-const value_type&, value_type&&>;
-
-   // First deal with the special first node pointed to by
-   // _M_before_begin.
-   __node_ptr __ht_n = __ht._M_begin();
-   __node_ptr __this_n
- = __node_gen(static_cast<_FromVal>(__ht_n->_M_v()));
-   this->_M_copy_code(*__this_n, *__ht_n);
-   _M_update_bbegin(__this_n);
-
-   // Then deal with other nodes.
-   __node_ptr __prev_n = __this_n;
-   for (__ht_n = __ht_n->_M_next(); __ht_n; __ht_n = __ht_n->_M_next())
+   if (_M_ht)
  {
-   __this_n = __node_gen(static_cast<_FromVal>(__ht_n->_M_v()));
-   __prev_n->_M_nxt = __this_n;
-   this->_M_copy_code(*__this_n, *__ht_n);
-   size_type __bkt = _M_bucket_index(*__this_n);
-   if (!_M_buckets[__bkt])
- _M_buckets[__bkt] = __prev_n;
-   __prev_n = __this_n;
+   _M_ht->clear();
+   if (_M_dealloc_buckets)
+ _M_ht->_M_deallocate_buckets();
  }
  }
-   __catch(...)
+ _Hashtable* _M_ht = nullptr;
+ bool _M_dealloc_buckets = false;
+   };
+   _Guard __guard;
+
+   if (!_M_buckets)
  {
-   clear();
-   if (__buckets)
- _M_deallocate_buckets();
-   __throw_exception_again;
+   _M_buckets = _M_allocate_buckets(_M_bucket_count);
+   __guard._M_dealloc_buckets = true;
  }
+
+   if (!__ht._M_before_begin._M_nxt)
+ return;
+
+   __guard._M_ht = this;
+
+   using _FromVal = __conditional_t<is_lvalue_reference<_Ht>::value,
+const value_type&, value_type&&>;
+
+   // First deal with the special first node pointed to by
+   // _M_before_begin.
+   __node_ptr __ht_n = __ht._M_begin();
+   __node_ptr __this_n
+ = __node_gen(static_cast<_FromVal>(__ht_n->_M_v()));
+   this->_M_copy_code(*__this_n, *__ht_n);
+   _M_update_bbegin(__this_n);
+
+   // Then deal with other nodes.
+   __node_ptr __prev_n = __this_n;
+   for (__ht_n = __ht_n->_M_next(); __ht_n; __ht_n = __ht_n->_M_next())
+ {
+   __this_n = __node_gen(static_cast<_FromVal>(__ht_n->_M_v()));
+   __prev_n->_M_nxt = __this_n;
+   this->_M_copy_code(*__this_n, *__ht_n);
+   size_type __bkt = _M_bucket_index(*__this_n);
+   if (!_M_buckets[__bkt])
+ _M_buckets[__bkt] = __prev_n;
+   __prev_n = __thi

[PATCH 02/12] libstdc++: Allow unordered_set assignment to assign to existing nodes

2024-11-08 Thread Jonathan Wakely
Currently the _ReuseOrAllocNode::operator()(Args&&...) function always
destroys the value stored in recycled nodes and constructs a new value.

The _ReuseOrAllocNode type is only ever used for implementing
assignment, either from another unordered container of the same type, or
from std::initializer_list. Consequently, the parameter pack
Args only ever consists of a single parameter of type const value_type&
or value_type&&.  We can replace the variadic parameter pack with a single
forwarding reference parameter, and when the value_type is assignable
from that type we can use assignment instead of destroying the existing
value and then constructing a new one.

Using assignment is typically only possible for sets, because for maps
the value_type is std::pair<const key_type, mapped_type> and in most
cases std::is_assignable_v is false for that type.

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_ReuseOrAllocNode::operator()):
Replace parameter pack with a single parameter. Assign to
existing value when possible.
* testsuite/23_containers/unordered_multiset/allocator/move_assign.cc:
Adjust expected count of operations.
* testsuite/23_containers/unordered_set/allocator/move_assign.cc:
Likewise.

Reviewed-by: François Dumont 
---
 libstdc++-v3/include/bits/hashtable_policy.h  | 37 +--
 .../allocator/move_assign.cc  |  5 ++-
 .../unordered_set/allocator/move_assign.cc| 10 +++--
 3 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h
index b5f837e6061..7a3c66c37fd 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -172,24 +172,39 @@ namespace __detail
   ~_ReuseOrAllocNode()
   { _M_h._M_deallocate_nodes(_M_nodes); }
 
-  template<typename... _Args>
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
+  template<typename _Arg>
__node_ptr
-   operator()(_Args&&... __args)
+   operator()(_Arg&& __arg)
{
  if (!_M_nodes)
-   return _M_h._M_allocate_node(std::forward<_Args>(__args)...);
+   return _M_h._M_allocate_node(std::forward<_Arg>(__arg));
+
+ using value_type = typename _NodeAlloc::value_type::value_type;
 
  __node_ptr __node = _M_nodes;
- _M_nodes = _M_nodes->_M_next();
- __node->_M_nxt = nullptr;
- auto& __a = _M_h._M_node_allocator();
- __node_alloc_traits::destroy(__a, __node->_M_valptr());
- _NodePtrGuard<__hashtable_alloc, __node_ptr> __guard { _M_h, __node };
- __node_alloc_traits::construct(__a, __node->_M_valptr(),
-std::forward<_Args>(__args)...);
- __guard._M_ptr = nullptr;
+ if constexpr (is_assignable::value)
+   {
+ __node->_M_v() = std::forward<_Arg>(__arg);
+ _M_nodes = _M_nodes->_M_next();
+ __node->_M_nxt = nullptr;
+   }
+ else
+   {
+ _M_nodes = _M_nodes->_M_next();
+ __node->_M_nxt = nullptr;
+ auto& __a = _M_h._M_node_allocator();
+ __node_alloc_traits::destroy(__a, __node->_M_valptr());
+ _NodePtrGuard<__hashtable_alloc, __node_ptr>
+   __guard{ _M_h, __node };
+ __node_alloc_traits::construct(__a, __node->_M_valptr(),
+std::forward<_Arg>(__arg));
+ __guard._M_ptr = nullptr;
+   }
  return __node;
}
+#pragma GCC diagnostic pop
 
 private:
   __node_ptr _M_nodes;
diff --git 
a/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/move_assign.cc
 
b/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/move_assign.cc
index 50608ec443f..6d00354902e 100644
--- 
a/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/move_assign.cc
+++ 
b/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/move_assign.cc
@@ -46,8 +46,9 @@ void test01()
   VERIFY( 1 == v1.get_allocator().get_personality() );
   VERIFY( 2 == v2.get_allocator().get_personality() );
 
-  VERIFY( counter_type::move_count == 1  );
-  VERIFY( counter_type::destructor_count == 2 );
+  VERIFY( counter_type::move_count == 0  );
+  // 1 element in v1 destroyed.
+  VERIFY( counter_type::destructor_count == 1 );
 }
 
 void test02()
diff --git 
a/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/move_assign.cc 
b/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/move_assign.cc
index 677ea67d0ea..6be70022705 100644
--- 
a/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/move_assign.cc
+++ 
b/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/move_assign.cc
@@ -52,8 +52,9 @@ void test01()
 VERIFY( 1 == v1.get_allocator().get_personality() );
 VERIFY( 2 == v2.get_allocator().get_personality() 

[PATCH 06/12] libstdc++: Replace _Hashtable::__fwd_value_for with cast

2024-11-08 Thread Jonathan Wakely
We can just use a cast to the appropriate type instead of calling a
function to do it. This gives the compiler less work to compile and
optimize, and at -O0 it avoids a function call per element.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::__fwd_value_for):
Remove.
(_Hashtable::_M_assign): Use static_cast instead of
__fwd_value_for.

Reviewed-by: François Dumont 
---

This one can be isolated from the rest of the series, and so I suppose
it could be pushed independently (and even backported, but I don't
think that's necessary).

 libstdc++-v3/include/bits/hashtable.h | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index bf6eed7c1c6..f1c30896bcb 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -325,13 +325,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__node_ptr _M_node;
   };
 
-  template<typename _Ht>
-   static constexpr
-   __conditional_t<is_lvalue_reference<_Ht>::value,
-   const value_type&, value_type&&>
-   __fwd_value_for(value_type& __val) noexcept
-   { return std::move(__val); }
-
   // Compile-time diagnostics.
 
   // _Hash_code_base has everything protected, so use this derived type to
@@ -1406,11 +1399,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (!__ht._M_before_begin._M_nxt)
  return;
 
+   using _FromVal = __conditional_t<is_lvalue_reference<_Ht>::value,
+const value_type&, value_type&&>;
+
// First deal with the special first node pointed to by
// _M_before_begin.
__node_ptr __ht_n = __ht._M_begin();
__node_ptr __this_n
- = __node_gen(__fwd_value_for<_Ht>(__ht_n->_M_v()));
+ = __node_gen(static_cast<_FromVal>(__ht_n->_M_v()));
this->_M_copy_code(*__this_n, *__ht_n);
_M_update_bbegin(__this_n);
 
@@ -1418,7 +1414,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__node_ptr __prev_n = __this_n;
for (__ht_n = __ht_n->_M_next(); __ht_n; __ht_n = __ht_n->_M_next())
  {
-   __this_n = __node_gen(__fwd_value_for<_Ht>(__ht_n->_M_v()));
+   __this_n = __node_gen(static_cast<_FromVal>(__ht_n->_M_v()));
__prev_n->_M_nxt = __this_n;
this->_M_copy_code(*__this_n, *__ht_n);
size_type __bkt = _M_bucket_index(*__this_n);
-- 
2.47.0



[committed] libstdc++: Make some _Hashtable members inline

2024-11-08 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Add 'inline' to some
one-line constructors.

Reviewed-by: François Dumont 
---
Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/hashtable.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 6bcba2de368..b36142b358a 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1264,6 +1264,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typename _Hash, typename _RangeHash, typename _Unused,
   typename _RehashPolicy, typename _Traits>
 template<typename _InputIterator>
+  inline
   _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
   _Hashtable(_InputIterator __f, _InputIterator __l,
@@ -1527,6 +1528,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typename _ExtractKey, typename _Equal,
   typename _Hash, typename _RangeHash, typename _Unused,
   typename _RehashPolicy, typename _Traits>
+inline
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
   _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
 _Hashtable(const _Hashtable& __ht)
@@ -1582,6 +1584,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typename _ExtractKey, typename _Equal,
   typename _Hash, typename _RangeHash, typename _Unused,
   typename _RehashPolicy, typename _Traits>
+inline
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
   _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
 _Hashtable(const _Hashtable& __ht, const allocator_type& __a)
-- 
2.47.0



[PATCH 05/12] libstdc++: Add _Hashtable::_M_assign for the common case

2024-11-08 Thread Jonathan Wakely
This adds a convenient _M_assign overload for the common case where the
node generator is the _AllocNode type. Only two places need to call
_M_assign with a _ReuseOrAllocNode node generator, so all the other
calls to _M_assign can use the new overload instead of manually
constructing a node generator.

The _AllocNode::operator()(Args&&...) function doesn't need to be a
variadic template. It is only ever called with a single argument of type
const value_type& or value_type&&, so it could be simplified. That isn't
done in this commit.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Remove typedefs for
node generators.
(_Hashtable::_M_assign(_Ht&&)): Add new overload.
(_Hashtable::operator=(initializer_list)): Add local
typedef for node generator.
(_Hashtable::_M_assign_elements): Likewise.
(_Hashtable::operator=(const _Hashtable&)): Use new _M_assign
overload.
(_Hashtable(const _Hashtable&)): Likewise.
(_Hashtable(const _Hashtable&, const allocator_type&)):
Likewise.
(_Hashtable(_Hashtable&&, __node_alloc_type&&, false_type)):
Likewise.
* include/bits/hashtable_policy.h (_Insert): Remove typedef for
node generator.

Reviewed-by: François Dumont 
---

Most of this patch series *removes* overloaded names, making overload
resolution simpler. This one adds an overload of _M_assign where there
was previously only one function template with that name. I think the
increased usability for the common case is worth it.

 libstdc++-v3/include/bits/hashtable.h| 34 +++-
 libstdc++-v3/include/bits/hashtable_policy.h |  1 -
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 23484f711cc..bf6eed7c1c6 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -299,12 +299,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Equal, _Hash, _RangeHash, _Unused,
_RehashPolicy, _Traits>;
 
-  using __reuse_or_alloc_node_gen_t =
-   __detail::_ReuseOrAllocNode<__node_alloc_type>;
-  using __alloc_node_gen_t =
-   __detail::_AllocNode<__node_alloc_type>;
-  using __node_builder_t =
-   __detail::_NodeBuilder<_ExtractKey>;
+  using __node_builder_t = __detail::_NodeBuilder<_ExtractKey>;
 
   // Simple RAII type for managing a node containing an element
   struct _Scoped_node
@@ -480,6 +475,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
void
_M_assign_elements(_Ht&&);
 
+  template<typename _Ht>
+   void
+   _M_assign(_Ht&& __ht)
+   {
+ __detail::_AllocNode<__node_alloc_type> __alloc_node_gen(*this);
+ _M_assign(std::forward<_Ht>(__ht), __alloc_node_gen);
+   }
+
   template<typename _Ht, typename _NodeGenerator>
void
_M_assign(_Ht&&, _NodeGenerator&);
@@ -608,6 +611,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Hashtable&
   operator=(initializer_list __l)
   {
+   using __reuse_or_alloc_node_gen_t =
+ __detail::_ReuseOrAllocNode<__node_alloc_type>;
+
__reuse_or_alloc_node_gen_t __roan(_M_begin(), *this);
_M_before_begin._M_nxt = nullptr;
clear();
@@ -1308,10 +1314,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_bucket_count = __ht._M_bucket_count;
  _M_element_count = __ht._M_element_count;
  _M_rehash_policy = __ht._M_rehash_policy;
- __alloc_node_gen_t __alloc_node_gen(*this);
  __try
{
- _M_assign(__ht, __alloc_node_gen);
+ _M_assign(__ht);
}
  __catch(...)
{
@@ -1340,6 +1345,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
   _M_assign_elements(_Ht&& __ht)
   {
+   using __reuse_or_alloc_node_gen_t =
+ __detail::_ReuseOrAllocNode<__node_alloc_type>;
+
__buckets_ptr __former_buckets = nullptr;
std::size_t __former_bucket_count = _M_bucket_count;
__rehash_guard_t __rehash_guard(_M_rehash_policy);
@@ -1517,8 +1525,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_element_count(__ht._M_element_count),
   _M_rehash_policy(__ht._M_rehash_policy)
 {
-  __alloc_node_gen_t __alloc_node_gen(*this);
-  _M_assign(__ht, __alloc_node_gen);
+  _M_assign(__ht);
 }
 
   template::value,
const _Hashtable&, _Hashtable&&>;
- _M_assign(std::forward<_Fwd_Ht>(__ht), __alloc_gen);
+ _M_assign(std::forward<_Fwd_Ht>(__ht));
  __ht.clear();
}
 }
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h
index caedb0258ef..cf97e571c1e 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h

[PATCH 10/12] libstdc++: Remove _Hashtable_base::_S_equals

2024-11-08 Thread Jonathan Wakely
This removes the overloaded _S_equals and _S_node_equals functions,
replacing them with 'if constexpr' in the handful of places they're
used.

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Hashtable_base::_S_equals):
Remove.
(_Hashtable_base::_S_node_equals): Remove.
(_Hashtable_base::_M_key_equals_tr): Fix inaccurate
static_assert string.
(_Hashtable_base::_M_equals, _Hashtable_base::_M_equals_tr): Use
'if constexpr' instead of _S_equals.
(_Hashtable_base::_M_node_equals): Use 'if constexpr' instead of
_S_node_equals.
---

This one could be pushed independently of the rest of the series.

 libstdc++-v3/include/bits/hashtable_policy.h | 48 ++--
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h
index c3d89a1101c..ad0dfd55c3f 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1487,24 +1487,6 @@ namespace __detail
 private:
   using _EqualEBO = _Hashtable_ebo_helper<0, _Equal>;
 
-  static bool
-  _S_equals(__hash_code, const _Hash_node_code_cache&)
-  { return true; }
-
-  static bool
-  _S_node_equals(const _Hash_node_code_cache&,
-const _Hash_node_code_cache&)
-  { return true; }
-
-  static bool
-  _S_equals(__hash_code __c, const _Hash_node_code_cache& __n)
-  { return __c == __n._M_hash_code; }
-
-  static bool
-  _S_node_equals(const _Hash_node_code_cache& __lhn,
-const _Hash_node_code_cache& __rhn)
-  { return __lhn._M_hash_code == __rhn._M_hash_code; }
-
 protected:
   _Hashtable_base() = default;
 
@@ -1531,31 +1513,49 @@ namespace __detail
{
  static_assert(
__is_invocable{},
-   "key equality predicate must be invocable with two arguments of "
-   "key type");
+   "key equality predicate must be invocable with the argument type "
+   "and the key type");
  return _M_eq()(__k, _ExtractKey{}(__n._M_v()));
}
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
   bool
   _M_equals(const _Key& __k, __hash_code __c,
const _Hash_node_value<_Value, __hash_cached::value>& __n) const
-  { return _S_equals(__c, __n) && _M_key_equals(__k, __n); }
+  {
+   if constexpr (__hash_cached::value)
+ if (__c != __n._M_hash_code)
+   return false;
+
+   return _M_key_equals(__k, __n);
+  }
 
   template
bool
_M_equals_tr(const _Kt& __k, __hash_code __c,
 const _Hash_node_value<_Value,
__hash_cached::value>& __n) const
-   { return _S_equals(__c, __n) && _M_key_equals_tr(__k, __n); }
+   {
+ if constexpr (__hash_cached::value)
+   if (__c != __n._M_hash_code)
+ return false;
+
+ return _M_key_equals_tr(__k, __n);
+   }
 
   bool
   _M_node_equals(
const _Hash_node_value<_Value, __hash_cached::value>& __lhn,
const _Hash_node_value<_Value, __hash_cached::value>& __rhn) const
   {
-   return _S_node_equals(__lhn, __rhn)
- && _M_key_equals(_ExtractKey{}(__lhn._M_v()), __rhn);
+   if constexpr (__hash_cached::value)
+ if (__lhn._M_hash_code != __rhn._M_hash_code)
+   return false;
+
+   return _M_key_equals(_ExtractKey{}(__lhn._M_v()), __rhn);
   }
+#pragma GCC diagnostic pop
 
   void
   _M_swap(_Hashtable_base& __x)
-- 
2.47.0



[PATCH 01/12] libstdc++: Refactor _Hashtable::operator=(initializer_list)

2024-11-08 Thread Jonathan Wakely
This replaces a call to _M_insert_range with open coding the loop. This
will allow removing the node generator parameter from _M_insert_range in
a later commit.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (operator=(initializer_list)):
Refactor to not use _M_insert_range.

Reviewed-by: François Dumont 
---
 libstdc++-v3/include/bits/hashtable.h | 35 ---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index b36142b358a..872fcac22d0 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -610,6 +610,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *this;
   }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
   _Hashtable&
   operator=(initializer_list __l)
   {
@@ -617,16 +619,43 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_before_begin._M_nxt = nullptr;
clear();
 
-   // We consider that all elements of __l are going to be inserted.
+   // We assume that all elements of __l are likely to be inserted.
auto __l_bkt_count = _M_rehash_policy._M_bkt_for_elements(__l.size());
 
-   // Do not shrink to keep potential user reservation.
+   // Excess buckets might have been intentionally reserved by the user,
+   // so rehash if we need to grow, but don't shrink.
if (_M_bucket_count < __l_bkt_count)
  rehash(__l_bkt_count);
 
-   this->_M_insert_range(__l.begin(), __l.end(), __roan, __unique_keys{});
+   _ExtractKey __ex;
+   for (auto& __e : __l)
+ {
+   const key_type& __k = __ex(__e);
+
+   if constexpr (__unique_keys::value)
+ if (this->size() <= __small_size_threshold())
+   {
+ auto __it = _M_begin();
+ for (; __it; __it = __it->_M_next())
+   if (this->_M_key_equals(__k, *__it))
+ break;
+ if (__it)
+   continue; // Found existing element with equivalent key
+   }
+
+   __hash_code __code = this->_M_hash_code(__k);
+   size_type __bkt = _M_bucket_index(__code);
+
+   if constexpr (__unique_keys::value)
+ if (_M_find_node(__bkt, __k, __code))
+   continue; // Found existing element with equivalent key
+
+   _M_insert_unique_node(__bkt, __code, __roan(__e));
+ }
+
return *this;
   }
+#pragma GCC diagnostic pop
 
   ~_Hashtable() noexcept;
 
-- 
2.47.0



[PATCH v2 1/4] aarch64: return scalar fp8 values in fp registers

2024-11-08 Thread Claudio Bantaloukas

According to the aapcs64: If the argument is an 8-bit (...) precision
Floating-point or short vector type and the NSRN is less than 8, then the
argument is allocated to the least significant bits of register v[NSRN].

gcc/
* config/aarch64/aarch64.cc
(aarch64_vfp_is_call_or_return_candidate): Use fp registers to
return svmfloat8_t parameters.

gcc/testsuite/
* gcc.target/aarch64/fp8_scalar_1.c:
---
 gcc/config/aarch64/aarch64.cc   | 3 ++-
 gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f2b53475adb..0e2f9ef0ca1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22276,7 +22276,8 @@ aarch64_vfp_is_call_or_return_candidate (machine_mode mode,
 
   if ((!composite_p
&& (GET_MODE_CLASS (mode) == MODE_FLOAT
-	   || GET_MODE_CLASS (mode) == MODE_DECIMAL_FLOAT))
+	   || GET_MODE_CLASS (mode) == MODE_DECIMAL_FLOAT
+	   || (type && TYPE_MAIN_VARIANT (type) == aarch64_mfp8_type_node)))
   || aarch64_short_vector_p (type, mode))
 {
   *count = 1;
diff --git a/gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c b/gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c
index 1bc2ac26b2a..61edf06401b 100644
--- a/gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fp8_scalar_1.c
@@ -7,10 +7,10 @@
 
 /*
 **stacktest1:
+**	umov	w0, v0.b\[0\]
 **	sub	sp, sp, #16
-**	and	w0, w0, 255
 **	strb	w0, \[sp, 15\]
-**	ldrb	w0, \[sp, 15\]
+**	ldr	b0, \[sp, 15\]
 **	add	sp, sp, 16
 **	ret
 */


[PATCH v2 3/4] aarch64: specify fpm mode in function instances and groups

2024-11-08 Thread Claudio Bantaloukas

Some intrinsics require setting the fpm register before calling the
specific asm opcode required.
In order to simplify review, this patch:
- adds the fpm_mode_index attribute to function_group_info and
  function_instance objects
- updates existing initialisations and call sites.
- updates equality and hash operations

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svdiv_impl): Specify FPM_unused when folding.
(svmul_impl): Likewise.
* config/aarch64/aarch64-sve-builtins-base.def
(svreinterpret): Specify FPM_unused mode
* config/aarch64/aarch64-sve-builtins-shapes.cc
(build_one): Use the group fpm_mode when creating function instances.
* config/aarch64/aarch64-sve-builtins-sme.def
(DEF_SME_FUNCTION): Specify FPM_unset mode.
(DEF_SME_ZA_FUNCTION_GS): Allow specifying fpm mode.
(DEF_SME_ZA_FUNCTION): Specify FPM_unset mode.
(svadd,svadd_write,svdot, svdot_lane, svluti2_lane_zt, svluti4_lane_zt,
svmla, svmla_lane, svmls, svmls_lane, svread, svread_hor, svread_ver,
svsub, svsub_write, svsudot, svsudot_lane, svsuvdot_lane, svusdot,
svusdot_lane, svusvdot_lane, svvdot_lane, svwrite, svwrite_hor,
svwrite_ver): Likewise
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svaba_impl, svqrshl_impl, svqshl_impl,svrshl_impl, svsra_impl):
Specify FPM_unused when folding.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svadd, svclamp, svcvt, svcvtn, svld1, svldnt1, svmax, svmaxnm, svmin,
svminnm, svpext_lane, svqcvt, svqcvtn, svqdmulh, svqrshr, svqrshrn,
svqrshru, svqrshrun, svrinta, svrintm, svrintn, svrintp, svrshl, svsel,
svst1, svstnt1, svunpk, svuzp, svuzpq, svwhilege, svwhilegt, svwhilele,
svwhilelt, svzip, svzipq): Likewise
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Set
fpm_mode on all elements.
(neon_sve_function_groups, sme_function_groups): Likewise.
(function_instance::hash): Include fpm_mode in hash.
(function_builder::add_overloaded_functions): Use the group fpm mode.
(function_resolver::lookup_form): Use the function instance fpm_mode
when looking up a function.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_FUNCTION_GS): Add argument.
(DEF_SVE_FUNCTION): Specify FPM_unset mode.
* config/aarch64/aarch64-sve-builtins.h (fpm_mode_index): New.
(function_group_info): Add fpm_mode.
(function_instance): Likewise.
(function_instance::operator==): Handle fpm_mode.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |  15 +-
 .../aarch64/aarch64-sve-builtins-base.def |   3 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc|   3 +-
 .../aarch64/aarch64-sve-builtins-sme.def  | 130 ++
 .../aarch64/aarch64-sve-builtins-sve2.cc  |  20 ++-
 .../aarch64/aarch64-sve-builtins-sve2.def |  96 +++--
 gcc/config/aarch64/aarch64-sve-builtins.cc|  23 ++--
 gcc/config/aarch64/aarch64-sve-builtins.def   |   4 +-
 gcc/config/aarch64/aarch64-sve-builtins.h |  25 +++-
 9 files changed, 188 insertions(+), 131 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 1c9f515a52c..893ecb5f080 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -774,7 +774,8 @@ public:
   {
 	function_instance instance ("svneg", functions::svneg,
 shapes::unary, MODE_none,
-f.type_suffix_ids, GROUP_none, f.pred);
+f.type_suffix_ids, GROUP_none, f.pred,
+FPM_unused);
 	gcall *call = f.redirect_call (instance);
 	unsigned offset_index = 0;
 	if (f.pred == PRED_m)
@@ -802,7 +803,8 @@ public:
   {
 	function_instance instance ("svlsr", functions::svlsr,
 shapes::binary_uint_opt_n, MODE_n,
-f.type_suffix_ids, GROUP_none, f.pred);
+f.type_suffix_ids, GROUP_none, f.pred,
+FPM_unused);
 	call = f.redirect_call (instance);
 	tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : op2_cst;
 	new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
@@ -815,7 +817,8 @@ public:
 
 	function_instance instance ("svasrd", functions::svasrd,
 shapes::shift_right_imm, MODE_n,
-f.type_suffix_ids, GROUP_none, f.pred);
+f.type_suffix_ids, GROUP_none, f.pred,
+FPM_unused);
 	call = f.redirect_call (instance);
 	new_divisor = wide_int_to_tree (scalar_types[VECTOR_TYPE_svuint64_t],
 	tree_log2 (op2_cst));
@@ -2092,7 +2095,8 @@ public:
   {
 	function_instance instance ("svneg", functions::svneg,
 shapes::unary, MODE_none,
-f.type_suffix_ids, GROUP_none, f.pred);
+f.type_suffix_ids, GROUP_none, f.pred,
+FPM_unused);
 	gcall *call = f.redirect_call (instance);
 	unsigned offset_index = 0;
 	if (f.pred == PRED_m)
@@ -2133,7 +2137,8 

[PATCH 03/12] libstdc++: Refactor Hashtable insertion [PR115285]

2024-11-08 Thread Jonathan Wakely
This completely reworks the internal member functions for insertion into
unordered containers. Currently we use a mixture of tag dispatching (for
unique vs non-unique keys) and template specialization (for maps vs
sets) to correctly implement insert and emplace members.

This removes a lot of complexity and indirection by using 'if constexpr'
to select the appropriate member function to call.

Previously there were four overloads of _M_emplace, for unique keys and
non-unique keys, and for hinted insertion and non-hinted. However two of
those were redundant, because we always ignore the hint for unique keys
and always use a hint for non-unique keys. Those four overloads have
been replaced by two new non-overloaded function templates:
_M_emplace_uniq and _M_emplace_multi. The former is for unique keys and
doesn't take a hint, and the latter is for non-unique keys and takes a
hint.

In the body of _M_emplace_uniq there are special cases to handle
emplacing values from which a key_type can be extracted directly. This
means we don't need to allocate a node and construct a value_type that
might be discarded if an equivalent key is already present. The special
case applies when emplacing the key_type into std::unordered_set, or
when emplacing std::pair into std::unordered_map, or
when emplacing two values into std::unordered_map where the first has
type cv key_type. For the std::unordered_set case, obviously if we're
inserting something that's already the key_type, we can look it up
directly. For the std::unordered_map cases, we know that the inserted
std::pair<const key_type, mapped_type> would have its first element
initialized from the first member of a std::pair value, or from the
first of two values, so if that is a key_type, we can look that up directly.

All the _M_insert overloads used a node generator parameter, but apart
from the one case where _M_insert_range was called from
_Hashtable::operator=(initializer_list<value_type>), that parameter was
always the _AllocNode type, never the _ReuseOrAllocNode type. Because
operator=(initializer_list<value_type>) was rewritten in an earlier
commit, all calls to _M_insert now use _AllocNode, so there's no reason
to pass the generator as a template parameter when inserting.

The multiple overloads of _Hashtable::_M_insert can all be removed now,
because the _Insert_base::insert members now call either _M_emplace_uniq
or _M_emplace_multi directly, only passing a hint to the latter. Which
one to call is decided using 'if constexpr (__unique_keys::value)' so
there is no unnecessary code instantiation, and overload resolution is
much simpler.
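
In toy form, the 'if constexpr' dispatch style described above can be
sketched like this (a minimal sketch with hypothetical names, not the
libstdc++ code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy sketch of the dispatch style: a single insert() chooses the
// unique-keys or multi-keys path at compile time, so only the selected
// branch is instantiated and no tag-dispatch overloads are needed.
template<bool UniqueKeys>
struct toy_table
{
  std::vector<int> data;

  bool insert (int key)
  {
    if constexpr (UniqueKeys)
      {
	// Unique keys: reject duplicates; a hint would always be ignored.
	if (std::find (data.begin (), data.end (), key) != data.end ())
	  return false;
	data.push_back (key);
	return true;
      }
    else
      {
	// Non-unique keys: always insert; a hint could be honoured here.
	data.push_back (key);
	return true;
      }
  }
};
```

Like the real _Hashtable, only the branch selected by the unique-keys
predicate is instantiated, so overload resolution never has to pick
between tag-dispatched helpers.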

The partial specializations of the _Insert class template can be
entirely removed, moving the minor differences in 'insert' member
functions into the common _Insert_base base class. The different
behaviour for maps and sets can be implemented using enable_if
constraints and 'if constexpr'. With the _Insert class template no
longer needed, the _Insert_base class template can be renamed to
_Insert. This is a minor simplification for the complex inheritance
hierarchy used by _Hashtable, removing one base class. It also means
one less class template instantiation, and no need to match the right
partial specialization of _Insert. The _Insert base class could be
removed entirely by moving all its 'insert' members into _Hashtable,
because without any variation in specializations of _Insert there is no
reason to use a base class to define those members. That is left for a
later commit.

Consistently using _M_emplace_uniq or _M_emplace_multi for insertion
means we no longer attempt to avoid constructing a value_type object to
find its key, removing the PR libstdc++/96088 optimizations. This fixes
the bugs caused by those optimizations, such as PR libstdc++/115285, but
causes regressions in the expected number of allocations and temporary
objects constructed for the PR 96088 tests.  It should be noted that the
"regressions" in the 96088 tests put us exactly level with the number of
allocations done by libc++ for those same tests.

To mitigate this to some extent, _M_emplace_uniq detects when the
emplace arguments already contain a key_type (either as the sole
argument, for unordered_set, or as the first part of a pair of
arguments, for unordered_map). In that specific case we don't need to
allocate a node and construct a value type to check for an existing
element with equivalent key.
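
That key-detection special case can be sketched roughly as follows (the
emplace_uniq helper and costly_value type here are hypothetical; the real
_M_emplace_uniq differs in detail): when the first emplace argument
already is the key_type, probe first and construct the value only on a
miss.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

static int constructions = 0;

struct costly_value
{
  std::string s;
  costly_value (const char *p) : s (p) { ++constructions; }
};

// If the arguments start with a ready-made key_type, look the key up
// before constructing the value, so a duplicate key never pays for a
// value construction.
template<class Map, class K, class... Args>
std::pair<typename Map::iterator, bool>
emplace_uniq (Map &m, K &&key, Args &&...args)
{
  using key_type = typename Map::key_type;
  if constexpr (std::is_same_v<std::decay_t<K>, key_type>)
    if (auto it = m.find (key); it != m.end ())
      return {it, false};            // equivalent key present: no value built
  return m.emplace (std::forward<K> (key),
		    costly_value (std::forward<Args> (args)...));
}
```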

The remaining regressions in the number of allocations and temporaries
should be addressed separately, with more conservative optimizations
specific to std::string. That is not part of this commit.

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_M_emplace): Replace
with _M_emplace_uniq and _M_emplace_multi.
(_Hashtable::_S_forward_key, _Hashtable::_M_insert_unique)
(_Hashtable::_M_insert_unique_aux, _Hashtable::_M_insert):
Remove.
* include/bits/hashtable_policy.h (_ConvertToValueType):
Remove.
(_Insert_

gcc-patches@gcc.gnu.org

2024-11-08 Thread Jonathan Wakely
We have two overloads of _M_find_before_node but they have quite
different performance characteristics, which isn't necessarily obvious.

The original version, _M_find_before_node(bucket, key, hash_code), looks
only in the specified bucket, doing a linear search within that bucket
for an element that compares equal to the key. This is the typical fast
lookup for hash containers, assuming the load factor is low so that each
bucket isn't too large.

The newer _M_find_before_node(key) was added in r12-6272-ge3ef832a9e8d6a
and could be naively assumed to calculate the hash code and bucket for
key and then call the efficient _M_find_before_node(bkt, key, code)
function. But in fact it does a linear search of the entire container.
This is potentially very slow and should only be used for a suitably
small container, as determined by the __small_size_threshold() function.
We don't even have a comment pointing out this O(N) performance of the
newer overload.

Additionally, the newer overload is only ever used in exactly one place,
which would suggest it could just be removed. However there are several
places that do the linear search of the whole container with an explicit
loop each time.

This adds a new member function, _M_locate, and uses it to replace most
uses of _M_find_node and the loops doing linear searches. This new
member function does both forms of lookup, the linear search for small
sizes and the _M_find_node(bkt, key, code) lookup within a single
bucket. The new function returns a __location_type which is a struct
that contains a pointer to the first node matching the key (if such a
node is present), or the hash code and bucket index for the key. The
hash code and bucket index allow the caller to know where a new node
with that key should be inserted, for the cases where the lookup didn't
find a matching node.

The result struct actually contains a pointer to the node *before* the
one that was located, as that is needed for it to be useful in erase and
extract members. There is a member function that returns the found node,
i.e. _M_before->_M_nxt downcast to __node_ptr, which should be used in
most cases.

This new function greatly simplifies the functions that currently have
to do two kinds of lookup and explicitly check the current size against
the small size threshold.
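
The shape of that combined lookup can be sketched like so (all names
here are illustrative, not the real libstdc++ ones):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

// One lookup result that either points at the matching element or
// carries the hash code and bucket where a new element would go.
struct location
{
  const int *node = nullptr;    // matching element, if one exists
  std::size_t hash_code = 0;    // hash of the key we looked for
  std::size_t bucket = 0;       // bucket that hash selects
  explicit operator bool () const { return node != nullptr; }
};

struct toy_hashtable
{
  std::vector<std::list<int>> buckets;
  std::size_t count = 0;
  static constexpr std::size_t small_threshold = 4;

  toy_hashtable () : buckets (8) {}

  location locate (int key) const
  {
    location loc;
    if (count <= small_threshold)
      // Small container: linear scan of everything, O(N).
      for (const auto &b : buckets)
	for (const int &e : b)
	  if (e == key)
	    {
	      loc.node = &e;
	      return loc;
	    }
    // Otherwise (or on a small-size miss) hash and scan one bucket.
    loc.hash_code = std::hash<int>{}(key);
    loc.bucket = loc.hash_code % buckets.size ();
    for (const int &e : buckets[loc.bucket])
      if (e == key)
	{
	  loc.node = &e;
	  break;
	}
    return loc;
  }

  bool insert (int key)
  {
    if (auto loc = locate (key))
      return false;                           // existing equivalent key
    else
      {
	buckets[loc.bucket].push_back (key);  // reuse the computed bucket
	++count;
	return true;
      }
  }
};
```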

Additionally, now that try_emplace is defined directly in _Hashtable
(not in _Insert_base) we can use _M_locate in there too, to speed up
some try_emplace calls. Previously it did not do the small-size linear
search.

It would be possible to add a function to get a __location_type from an
iterator, and then rewrite some functions like _M_erase and
_M_extract_node to take a __location_type parameter. While that might be
conceptually nice, it wouldn't really make the code any simpler or more
readable than it is now. That isn't done in this change.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (__location_type): New struct.
(_M_locate): New member function.
(_M_find_before_node(const key_type&)): Remove.
(_M_find_node): Move variable initialization into condition.
(_M_find_node_tr): Likewise.
(operator=(initializer_list<value_type>), try_emplace, _M_reinsert_node)
(_M_merge_unique, find, erase(const key_type&)): Use _M_locate
for lookup.
---
 libstdc++-v3/include/bits/hashtable.h | 333 +++---
 1 file changed, 145 insertions(+), 188 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index aca431ae216..6a2da121ab9 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -596,28 +596,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (_M_bucket_count < __l_bkt_count)
  rehash(__l_bkt_count);
 
-   _ExtractKey __ex;
+   __hash_code __code;
+   size_type __bkt;
for (auto& __e : __l)
  {
-   const key_type& __k = __ex(__e);
+   const key_type& __k = _ExtractKey{}(__e);
 
if constexpr (__unique_keys::value)
- if (this->size() <= __small_size_threshold())
-   {
- auto __it = _M_begin();
- for (; __it; __it = __it->_M_next())
-   if (this->_M_key_equals(__k, *__it))
- break;
- if (__it)
-   continue; // Found existing element with equivalent key
-   }
-
-   __hash_code __code = this->_M_hash_code(__k);
-   size_type __bkt = _M_bucket_index(__code);
-
-   if constexpr (__unique_keys::value)
- if (_M_find_node(__bkt, __k, __code))
-   continue; // Found existing element with equivalent key
+ {
+   if (auto __loc = _M_locate(__k))
+ continue; // Found existing element with equivalent key
+   else
+ {
+   __code = __loc._M_hash_code;
+ 

Re: [PATCH] Add COBOL to gcc

2024-11-08 Thread James K. Lowden
On Fri, 8 Nov 2024 13:50:45 +0100
Jakub Jelinek  wrote:

> > * gcc-changelog/git_commit.py (default_changelog_locations):
> > New entry for gcc/cobol.  New entry for libgcobol.
> 
> Dunno if your mailer ate the tabs at the start of the above 2 lines.
> That is required so that it can be committed.

Don't know how that happened.  New attempt follows.

> Otherwise LGTM, but please wait with it until everything else is
> approved (unless that already happened).

We have steering committee approval as of March 2023.  Is there
something else? 

> Then this needs to be committed, you need to talk to us best on IRC
> or worst case via mail

Hmm, using rcirc in emacs, I'm connected to #g...@irc.oftc.net but
didn't see any traffic today.  Probably pibkac.  

--jkl

[snip]
From 95d8508ce8dabebbabfe14b9621fca45e2a397dddir.patch 4 Oct 2024 12:01:22 -0400
From: "James K. Lowden" 
Date: Sat 02 Nov 2024 12:51:49 PM EDT
Subject: [PATCH]  Add 'cobol' to 1 file

contrib/ChangeLog
* gcc-changelog/git_commit.py (default_changelog_locations):
New entry for gcc/cobol.  New entry for libgcobol.

---
contrib/gcc-changelog/git_commit.py | ++
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 87ecb9e1a17..e9393012865 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -39,6 +39,7 @@ default_changelog_locations = {
 'gcc/c-family',
 'gcc',
 'gcc/cp',
+'gcc/cobol',
 'gcc/d',
 'gcc/fortran',
 'gcc/go',
@@ -66,6 +67,7 @@ default_changelog_locations = {
 'libgcc',
 'libgcc/config/avr/libf7',
 'libgcc/config/libbid',
+'libgcobol',
 'libgfortran',
 'libgm2',
 'libgomp',
[pins]


Re: [PATCH v2] c: Implement C2y N3356, if declarations [PR117019]

2024-11-08 Thread Joseph Myers
On Thu, 7 Nov 2024, Marek Polacek wrote:

> @@ -8355,7 +8492,9 @@ c_parser_switch_statement (c_parser *parser, bool 
> *if_p, tree before_labels)
>if (c_parser_next_token_is (parser, CPP_OPEN_PAREN)
> && c_token_starts_typename (c_parser_peek_2nd_token (parser)))
>   explicit_cast_p = true;
> -  ce = c_parser_expression (parser);
> +  ce = c_parser_selection_header (parser, /*switch_p=*/true);
> +  /* The call above already performed convert_lvalue_to_rvalue, but with
> +  read_p=false.  */
>ce = convert_lvalue_to_rvalue (switch_cond_loc, ce, true, true);

That comment only seems accurate in the case of an expression; in the case 
of a simple-declaration, read_p=true for the previous call.

I think there's another case of invalid code it would be good to add tests 
for.  You have tests in c2y-if-decls-6.c of valid code with a VLA in an
if/switch declaration.  It would be good to test also the invalid case 
where there is a jump into the scope of such an identifier with VM type 
(whether with goto, or with an outer switch around an if statement with 
such a declaration), to verify that the checks for this do work in both 
the declaration and the simple-declaration cases of if and switch.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] AArch64: Cleanup fusion defines

2024-11-08 Thread Andrew Pinski
On Fri, Nov 8, 2024 at 8:56 AM Wilco Dijkstra  wrote:
>
>
> Cleanup the fusion defines by introducing AARCH64_FUSE_BASE as a common base
> level of fusion supported by almost all cores.  Add AARCH64_FUSE_MOVK as a
> shortcut for all MOVK fusion.  In most cases there is no change.  It enables
> AARCH64_FUSE_CMP_BRANCH for a few older cores since it has no measurable
> effect if a core doesn't support it.  Also it may have been accidentally
> left out on some cores that support all other types of branch fusion.
>
> In the future we could add fusion types to AARCH64_FUSE_BASE if beneficial.
>
> Passes regress & bootstrap, OK for commit?
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_BASE): New 
> define.
> (AARCH64_FUSE_MOVK): Likewise.
> * config/aarch64/tuning_models/a64fx.h: Update.
> * config/aarch64/tuning_models/ampere1.h: Likewise.
> * config/aarch64/tuning_models/ampere1a.h: Likewise.
> * config/aarch64/tuning_models/ampere1b.h: Likewise.
> * config/aarch64/tuning_models/cortexa35.h: Likewise.
> * config/aarch64/tuning_models/cortexa53.h: Likewise.
> * config/aarch64/tuning_models/cortexa57.h: Likewise.
> * config/aarch64/tuning_models/cortexa72.h: Likewise.
> * config/aarch64/tuning_models/cortexa73.h: Likewise.
> * config/aarch64/tuning_models/cortexx925.h: Likewise.
> * config/aarch64/tuning_models/exynosm1.h: Likewise.
> * config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
> * config/aarch64/tuning_models/generic.h: Likewise.
> * config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
> * config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
> * config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
> * config/aarch64/tuning_models/neoversen1.h: Likewise.
> * config/aarch64/tuning_models/neoversen2.h: Likewise.
> * config/aarch64/tuning_models/neoversen3.h: Likewise.
> * config/aarch64/tuning_models/neoversev1.h: Likewise.
> * config/aarch64/tuning_models/neoversev2.h: Likewise.
> * config/aarch64/tuning_models/neoversev3.h: Likewise.
> * config/aarch64/tuning_models/neoversev3ae.h: Likewise.
> * config/aarch64/tuning_models/qdf24xx.h: Likewise.
> * config/aarch64/tuning_models/saphira.h: Likewise.

qdf24xx.h and saphira.h changes look correct to me.

> * config/aarch64/tuning_models/thunderx2t99.h: Likewise.
> * config/aarch64/tuning_models/thunderx3t110.h: Likewise.

thunderx2t99.h and thunderx3t110.h changes look decent (though I am
not at Marvell any more).

Thanks,
Andrew

> * config/aarch64/tuning_models/tsv110.h: Likewise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
> b/gcc/config/aarch64/aarch64-fusion-pairs.def
> index 
> bf5e85ba8fe128721521505bd6b73b38c25d9f65..f8413ab0c802c28290ebcc171bfd131622cb33be
>  100644
> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
> @@ -41,3 +41,8 @@ AARCH64_FUSION_PAIR ("cmp+csel", CMP_CSEL)
>  AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
>
>  #undef AARCH64_FUSION_PAIR
> +
> +/* Baseline fusion settings suitable for all cores.  */
> +#define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC)
> +
> +#define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK)
> diff --git a/gcc/config/aarch64/tuning_models/a64fx.h 
> b/gcc/config/aarch64/tuning_models/a64fx.h
> index 
> 378a1b3889ee265859786c1ff6525fce2305b615..2de96190b2d668f7f8e09b48fba418788d726ccf
>  100644
> --- a/gcc/config/aarch64/tuning_models/a64fx.h
> +++ b/gcc/config/aarch64/tuning_models/a64fx.h
> @@ -150,7 +150,7 @@ static const struct tune_params a64fx_tunings =
>  4 /* store_pred.  */
>}, /* memmov_cost.  */
>7, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
> +  AARCH64_FUSE_BASE, /* fusible_ops  */
>"32",/* function_align.  */
>"16",/* jump_align.  */
>"32",/* loop_align.  */
> diff --git a/gcc/config/aarch64/tuning_models/ampere1.h 
> b/gcc/config/aarch64/tuning_models/ampere1.h
> index 
> ace9bf49f7593d3713ed0bc61494c3915749a9a8..b2b376699ae64c3089896491baa6d8dcd948ef87
>  100644
> --- a/gcc/config/aarch64/tuning_models/ampere1.h
> +++ b/gcc/config/aarch64/tuning_models/ampere1.h
> @@ -88,11 +88,8 @@ static const struct tune_params ampere1_tunings =
>  4 /* store_pred.  */
>}, /* memmov_cost.  */
>4, /* issue_rate  */
> -  (AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_AES_AESMC |
> -   AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK |
> -   AARCH64_FUSE_ALU_BRANCH /* adds, ands, bics, ccmp, ccmn */ |
> -   AARCH64_FUSE_CMP_BRANCH),
> -  /* fusible_ops  */
> +  (AARCH64_FUSE_BASE | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK
> +   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
>"32", 

Re: [PATCH] Add COBOL to gcc

2024-11-08 Thread James K. Lowden
On Fri, 8 Nov 2024 13:52:55 +0100
Jakub Jelinek  wrote:

> Rather than a diff from /dev/null,
> > it's a blob with the exact file contents.  I hope it is correct in
> > this form.
> 
> That is just how the web git viewer presents new file commits.
> On gcc-patches those should be posted as normal patches.

Below is hopefully a well formed patch.  It adds ChangeLogs for the
COBOL front end.  

[snip]
From 304f3678dbade1f60abdadb9ddd2baffae88013dpre.patch 4 Oct 2024 12:01:22 -0400
From: "James K. Lowden" 
Date: Fri 08 Nov 2024 03:30:08 PM EST
Subject: [PATCH]  Add 'cobol' to 2 files

gcc/cobol/ChangeLog [new file with mode: 0644]  
libgcobol/ChangeLog [new file with mode: 0644]

---
gcc/cobol/ChangeLog | ++-
libgcobol/ChangeLog | ++
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/gcc/cobol/ChangeLog b/gcc/cobol/ChangeLog
new file mode 100644
index 000..2988f44a1f1
--- /dev/null
+++ b/gcc/cobol/ChangeLog
@@ -0,0 +1,6 @@
+^L
+Copyright (C) 2022 Free Software Foundation, Inc.
+
+Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.
diff --git a/libgcobol/ChangeLog b/libgcobol/ChangeLog
new file mode 100644
index 000..2988f44a1f1
--- /dev/null
+++ b/libgcobol/ChangeLog
@@ -0,0 +1,6 @@
+^L
+Copyright (C) 2022 Free Software Foundation, Inc.
+
+Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.
[pins]


Re: [PATCH V2 1/11] Add rs6000 architecture masks.

2024-11-08 Thread Segher Boessenkool
On Fri, Nov 08, 2024 at 02:28:11PM -0600, Peter Bergner wrote:
> On 11/8/24 1:44 PM, Michael Meissner wrote:
> > diff --git a/gcc/config/rs6000/rs6000-arch.def 
> > b/gcc/config/rs6000/rs6000-arch.def
> > new file mode 100644
> > index 000..e5b6e958133
> > --- /dev/null
> > +++ b/gcc/config/rs6000/rs6000-arch.def
> > @@ -0,0 +1,48 @@
> > +/* IBM RS/6000 CPU architecture features by processor type.
> > +   Copyright (C) 1991-2024 Free Software Foundation, Inc.
> > +   Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu)
> 
> This doesn't make any sense to me.  This is a new file with new
> code not copied from anywhere else, so why the Contributed by
> Richard Kenner line?  Cut/paste error?

If a file is largely copied from some other file, it makes sense to keep
attribution.  If not, not.

Same for that "1991-2024" btw.  Dates should be factual.


Segher


Re: [PATCH] c++: Small initial fixes for zeroing of padding bits [PR117256]

2024-11-08 Thread Jakub Jelinek
On Fri, Nov 08, 2024 at 10:29:09AM +0100, Jakub Jelinek wrote:
> I think we need far more than that, but am not sure where exactly
> to implement that.
> In particular, I think __builtin_bit_cast should take it into account
> during constant evaluation, if the padding bits in something are guaranteed
> to be zero, then I'd think std::bit_cast out of it and testing those
> bits in there should be well defined.
> But if we do that, the flag needs to be maintained precisely, not just
> conservatively, so e.g. any place where some object is copied into another
> one (except bitcast?) which would be element-wise copy, the bit should
> be cleared (or preserved from the earlier state?  I'd hope
> element-wise copying invalidates even the padding bits, but then what
> about just stores into some members, do those invalidate the padding bits
> in the rest of the object?).  But if it is an elided copy, it shouldn't.
> And am not really sure what happens e.g. with non-automatic constexpr
> variables.  If it is constructed by something that doesn't guarantee
> the zeroing of the padding bits (so similarly constructed constexpr automatic
> variable would have undefined state of the padding bits), are those padding
> bits well defined because it isn't automatic variable?

And there is another thing I'm unsure about, what happens in a union which
is zero-initialized and has padding bits if the active member is changed
(especially in constant evaluation).  Are all padding bits invalidated
through that, or something else?

Jakub



Re: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-11-08 Thread Richard Biener
On Fri, Nov 8, 2024 at 12:34 AM Li, Pan2  wrote:
>
> Thanks Tamar and Jeff for comments.
>
> > I'm not sure it's that simple.  It'll depend on the micro-architecture.
> > So things like strength of the branch predictors, how fetch blocks are
> > handled (can you have embedded not-taken branches, short-forward-branch
> > optimizations, etc).
>
> > After:
> >
> > .L.sat_add_u_1(unsigned int, unsigned int):
> >  add 4,3,4
> >  rldicl 9,4,0,32
> >  subf 3,3,9
> >  sradi 3,3,63
> >  or 3,3,4
> >  rldicl 3,3,0,32
> >  blr
> >
> > and before
> >
> > .L.sat_add_u_1(unsigned int, unsigned int):
> >  add 4,3,4
> >  cmplw 0,4,3
> >  bge 0,.L2
> >  li 4,-1
> > .L2:
> >  rldicl 3,4,0,32
> >  blr
>
> I am not familiar with branch prediction, but the branch should be 50% taken
> and 50% not-taken according to the range of the sat add input. Is that the
> worst case for branch prediction? I mean, if we call it 100 times with a
> taken, not-taken, taken, not-taken... sequence, will the branch version
> still be faster?
> Feel free to correct me if I'm wrong.
>
> Back to these 16 forms of sat add as below, is there any suggestion which one 
> or two form(s) may be
> cheaper than others from the perspective of gimple IR? Independent with the 
> backend implemented SAT_ADD or not.
>
> #define DEF_SAT_U_ADD_1(T)   \
> T sat_u_add_##T##_1 (T x, T y)   \
> {\
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
>
> #define DEF_SAT_U_ADD_2(T)  \
> T sat_u_add_##T##_2 (T x, T y)  \
> {   \
>   return (T)(x + y) < x ? -1 : (x + y); \
> }
>
> #define DEF_SAT_U_ADD_3(T)   \
> T sat_u_add_##T##_3 (T x, T y)   \
> {\
>   return x <= (T)(x + y) ? (x + y) : -1; \
> }
>
> #define DEF_SAT_U_ADD_4(T)  \
> T sat_u_add_##T##_4 (T x, T y)  \
> {   \
>   return x > (T)(x + y) ? -1 : (x + y); \
> }
>
> #define DEF_SAT_U_ADD_5(T)  \
> T sat_u_add_##T##_5 (T x, T y)  \
> {   \
>   if ((T)(x + y) >= x)  \
> return x + y;   \
>   else  \
> return -1;  \
> }
>
> #define DEF_SAT_U_ADD_6(T)  \
> T sat_u_add_##T##_6 (T x, T y)  \
> {   \
>   if ((T)(x + y) < x)   \
> return -1;  \
>   else  \
> return x + y;   \
> }
>
> #define DEF_SAT_U_ADD_7(T)  \
> T sat_u_add_##T##_7 (T x, T y)  \
> {   \
>   if (x <= (T)(x + y))  \
> return x + y;   \
>   else  \
> return -1;  \
> }
>
> #define DEF_SAT_U_ADD_8(T)  \
> T sat_u_add_##T##_8 (T x, T y)  \
> {   \
>   if (x > (T)(x + y))   \
> return -1;  \
>   else  \
> return x + y;   \
> }
>
> #define DEF_SAT_U_ADD_9(T) \
> T sat_u_add_##T##_9 (T x, T y) \
> {  \
>   T ret;   \
>   return __builtin_add_overflow (x, y, &ret) == 0 ? ret : - 1; \
> }
>
> #define DEF_SAT_U_ADD_10(T)\
> T sat_u_add_##T##_10 (T x, T y)\
> {  \
>   T ret;   \
>   return !__builtin_add_overflow (x, y, &ret) ? ret : - 1; \
> }
>
> #define DEF_SAT_U_ADD_11(T) \
> T sat_u_add_##T##_11 (T x, T y) \
> {   \
>   T ret;\
>   if (__builtin_add_overflow (x, y, &ret) == 0) \
> return ret; \
>   else  \
> return -1;  \
> }
>
> #define DEF_SAT_U_ADD_12(T) \
> T sat_u_add_##T##_12 (T x, T y) \
> {   \
>   T ret;\
>   if (!__builtin_add_overflow (x, y, &ret)) \
> return ret; \
>   else  \
> return -1;  \
> }
>
> #define DEF_SAT_U_ADD_13(T)   \
> T sat_u_add_##T##_13 (T x, T y)   \
> { \
>   T ret;  \
>   return __builtin_add_overflow (x, y, &ret) != 0 ? -1 : ret; \
> }
>
> #define DEF_SAT_U_ADD_14(T)   

[PATCH] tree-optimization/117484 - issue with SLP discovery of permuted .MASK_LOAD

2024-11-08 Thread Richard Biener
When we do SLP discovery of a .MASK_LOAD for a dataref group with gaps
the discovery for the mask will have gaps as well and this was
unexpected in a few places.  The following re-organizes things
slightly to accomodate for this.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

PR tree-optimization/117484
* tree-vect-slp.cc (vect_build_slp_tree_2): Handle gaps in
mask discovery.

* gcc.dg/vect/pr117484-1.c: New testcase.
* gcc.dg/vect/pr117484-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr117484-1.c | 13 +
 gcc/testsuite/gcc.dg/vect/pr117484-2.c | 16 
 gcc/tree-vect-slp.cc   | 17 -
 3 files changed, 37 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr117484-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr117484-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr117484-1.c 
b/gcc/testsuite/gcc.dg/vect/pr117484-1.c
new file mode 100644
index 000..453556c50f9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr117484-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+extern int a;
+extern short b[];
+extern signed char c[], d[];
+int main()
+{
+  for (long j = 3; j < 1024; j += 3)
+if (c[j] ? b[j] : 0) {
+  b[j] = d[j - 2];
+  a = d[j];
+}
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr117484-2.c 
b/gcc/testsuite/gcc.dg/vect/pr117484-2.c
new file mode 100644
index 000..baffe7597ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr117484-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+int a;
+extern int d[];
+extern int b[];
+extern _Bool c[];
+extern char h[];
+int main()
+{
+  for (int i = 0; i < 1024; i += 4)
+if (h[i] || c[i])
+  {
+   a = d[i];
+   b[i] = d[i - 3];
+  }
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 97c362d24f8..38f4c41b8db 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2004,14 +2004,15 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
= STMT_VINFO_GROUPED_ACCESS (stmt_info)
  ? DR_GROUP_FIRST_ELEMENT (stmt_info) : stmt_info;
  bool any_permute = false;
- bool any_null = false;
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info)
{
  int load_place;
  if (! load_info)
{
- load_place = j;
- any_null = true;
+ if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+   load_place = j;
+ else
+   load_place = 0;
}
  else if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
load_place = vect_get_place_in_interleaving_chain
@@ -2022,11 +2023,6 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  any_permute |= load_place != j;
  load_permutation.quick_push (load_place);
}
- if (any_null)
-   {
- gcc_assert (!any_permute);
- load_permutation.release ();
-   }
 
  if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
{
@@ -2081,6 +2077,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 followed by 'node' being the desired final permutation.  */
  if (unperm_load)
{
+ gcc_assert
+   (!SLP_TREE_LOAD_PERMUTATION (unperm_load).exists ());
  lane_permutation_t lperm;
  lperm.create (group_size);
  for (unsigned j = 0; j < load_permutation.length (); ++j)
@@ -2675,7 +2673,8 @@ out:
  tree op0;
  tree uniform_val = op0 = oprnd_info->ops[0];
  for (j = 1; j < oprnd_info->ops.length (); ++j)
-   if (!operand_equal_p (uniform_val, oprnd_info->ops[j]))
+   if (oprnd_info->ops[j]
+   && !operand_equal_p (uniform_val, oprnd_info->ops[j]))
  {
uniform_val = NULL_TREE;
break;
-- 
2.43.0


Re: [PATCH v2] testsuite: arm: Use effective-target arm_libc_fp_abi for pr68620.c test

2024-11-08 Thread Richard Earnshaw (lists)

On 07/11/2024 17:48, Torbjörn SVENSSON wrote:

Changes since v1:

- Switch to arm_libc_fp_abi from arm_fp

@Christophe, can you test this patch in the linaro farm to ensure that
it does not fail again?

Ok for trunk and releases/gcc-14?

--

This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1407.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr68620.c: Use effective-target
arm_libc_fp_abi.
* lib/target-supports.exp: Define effective-target
arm_libc_fp_abi.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Richard Earnshaw 


OK.

R.


---
  gcc/testsuite/gcc.target/arm/pr68620.c |  4 ++-
  gcc/testsuite/lib/target-supports.exp  | 35 ++
  2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr68620.c 
b/gcc/testsuite/gcc.target/arm/pr68620.c
index 6e38671752f..3ffaa5c5a9c 100644
--- a/gcc/testsuite/gcc.target/arm/pr68620.c
+++ b/gcc/testsuite/gcc.target/arm/pr68620.c
@@ -1,8 +1,10 @@
  /* { dg-do compile } */
  /* { dg-skip-if "-mpure-code supports M-profile without Neon only" { *-*-* } { 
"-mpure-code" } } */
  /* { dg-require-effective-target arm_arch_v7a_ok } */
-/* { dg-options "-mfp16-format=ieee -mfpu=auto -mfloat-abi=softfp" } */
+/* { dg-require-effective-target arm_libc_fp_abi_ok } */
+/* { dg-options "-mfp16-format=ieee -mfpu=auto" } */
  /* { dg-add-options arm_arch_v7a } */
+/* { dg-add-options arm_libc_fp_abi } */
  
  #include "arm_neon.h"
  
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp

index 75703ddca60..0c2fd83f45c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4950,6 +4950,41 @@ proc add_options_for_arm_fp { flags } {
  return "$flags $et_arm_fp_flags"
  }
  
+# Some libc headers will only compile correctly if the correct ABI flags

+# are picked for the target environment.  Try to find an ABI setting
+# that works.  Glibc falls into this category.  This test is intended
+# to enable FP as far as possible, so does not try -mfloat-abi=soft.
+proc check_effective_target_arm_libc_fp_abi_ok_nocache { } {
+global et_arm_libc_fp_abi_flags
+set et_arm_libc_fp_abi_flags ""
+if { [check_effective_target_arm32] } {
+   foreach flags {"-mfloat-abi=hard" "-mfloat-abi=softfp"} {
+   if { [check_no_compiler_messages_nocache arm_libc_fp_abi_ok object {
+   #include 
+   } "$flags"] } {
+   set et_arm_libc_fp_abi_flags $flags
+   return 1
+   }
+   }
+}
+return 0
+}
+
+proc  check_effective_target_arm_libc_fp_abi_ok { } {
+return [check_cached_effective_target arm_libc_fp_abi_ok \
+   check_effective_target_arm_libc_fp_abi_ok_nocache]
+}
+
+# Add flags that pick the right ABI for the supported libc headers on
+# this platform.
+proc add_options_for_arm_libc_fp_abi { flags } {
+if { ! [check_effective_target_arm_libc_fp_abi_ok] } {
+   return "$flags"
+}
+global et_arm_libc_fp_abi_flags
+return "$flags $et_arm_libc_fp_abi_flags"
+}
+
  # Return 1 if this is an ARM target defining __ARM_FP with
  # double-precision support. We may need -mfloat-abi=softfp or
  # equivalent options.  Some multilibs may be incompatible with these




Re: [PATCH] testsuite: arm: Allow vst1.32 instruction in pr40457-2.c

2024-11-08 Thread Richard Earnshaw (lists)

On 07/11/2024 17:15, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

--

When building the test case with neon, the 'vst1.32' instruction is used
instead of 'strd'. Allow both variants to make the test pass.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr40457-2.c: Add vst1.32 as an allowed
instruction.

Signed-off-by: Torbjörn SVENSSON 


OK.

R.



Re: [PATCH 04/10] gimple: Disallow sizeless types in BIT_FIELD_REFs.

2024-11-08 Thread Tejas Belagod

On 11/8/24 1:19 PM, Richard Biener wrote:

On Fri, Nov 8, 2024 at 7:30 AM Tejas Belagod  wrote:


On 11/7/24 5:52 PM, Richard Biener wrote:

On Thu, Nov 7, 2024 at 11:13 AM Tejas Belagod  wrote:


On 11/7/24 2:36 PM, Richard Biener wrote:

On Thu, Nov 7, 2024 at 8:25 AM Tejas Belagod  wrote:


On 11/6/24 6:02 PM, Richard Biener wrote:

On Wed, Nov 6, 2024 at 12:49 PM Tejas Belagod  wrote:


Ensure sizeless types don't end up trying to be canonicalised to BIT_FIELD_REFs.


You mean variable-sized?  But don't we know, when there's a constant
array index,
that the size is at least so this indexing is OK?  So what's wrong with a
fixed position, fixed size BIT_FIELD_REF extraction of a VLA object?

Richard.



Ah! The code and comment/description don't match, sorry. This change
started out as gating out all canonicalizations of VLA vectors when I
had limited understanding of how this worked, but eventually was
simplified to gate in only those offsets that were known_le, but missed
out fixing the comment/description. So, for eg.

int foo (svint32_t v) { return v[3]; }

canonicalises to a BIT_FIELD_REF 

but something like:

int foo (svint32_t v) { return v[4]; }


So this is possibly out-of-bounds?


reduces to a VEC_EXTRACT <>


But if out-of-bounds a VEC_EXTRACT isn't any better than a BIT_FIELD_REF, no?


Someone may have code protecting accesses like so:

/* svcntw () returns num of 32-bit elements in a vec */
if (svcntw () >= 8)
  return v[4];

So I didn't error or warn (-Warray-bounds) for this or for that matter
make it UB as it will be spurious. So technically, it may not be OOB access.

Therefore BIT_FIELD_REFs are generated for anything within the range of
a Adv SIMD register and anything beyond is left to be vec_extracted with
SVE instructions.


You still didn't state the technical reason why BIT_FIELD_REF is worse than
.VEC_EXTRACT (which is introduced quite late only btw).

I'm mostly questioning that we have two different canonicalizations that oddly
depend on the constant index.  I'd rather always go .VEC_EXTRACT or
always BIT_FIELD_REF (prefer that one) instead of having a mix for VLA vectors.



Sorry, I misunderstood your question. The choice of canonicalization
based on index range wasn't by design - just happened to be a
side-effect of my trying to accommodate VLA poly sizes in place of
constants. When I checked that potentially-out-of-bounds VLA indices
were taking the VEC_EXTRACT route, I didn't think about using
BIT_FIELD_REFs for them too - frankly I didn't know we could even do
that for access outside the minimum vector size.

When I now try to canonicalize all constant VLA indices to
BIT_FIELD_REFs I encounter ICE and gimple verification does not seem to
be happy about potentially accessing outside object size range. If we
have to make BIT_FIELD_REF work for potentially OOB constant VLA
indices, wouldn't this be quite a fundamental assumption we might have
to change?


I see BIT_FIELD_REF verification uses maybe_gt, it could as well use
known_gt.  How does .VEC_EXTRACT end up handling "maybe_gt" constant
indices?  I'm not familiar enough with VLA regs handling to decide here.

That said, I'd prefer if you either avoid any canonicalization to BIT_FIELD_REF
or make all of them "work".  As said we introduce .VEC_EXTRACT only very
late during ISEL IIRC.



Allowing OOB constants in BIT_FIELD_REF in the gimple-verifier seems to 
work ok, but the extraction happens via memory in the generated code 
which isn't the most optimal - need to fix it up to expand 
BIT_FIELD_REFs more optimally - looking into it now...


Thanks,
Tejas.


Richard.



Thanks,
Tejas.


Richard.



Thanks,
Tejas.





I'll fix the comment/description.

Thanks,
Tejas.


gcc/ChangeLog:

* gimple-fold.cc (maybe_canonicalize_mem_ref_addr): Disallow 
sizeless
types in BIT_FIELD_REFs.
---
 gcc/gimple-fold.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c19dac0dbfd..dd45d9f7348 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -6281,6 +6281,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = 
false)
   && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (*t, 0), 
0
 {
   tree vtype = TREE_TYPE (TREE_OPERAND (TREE_OPERAND (*t, 0), 0));
+  /* BIT_FIELD_REF can only happen on constant-size vectors.  */
   if (VECTOR_TYPE_P (vtype))
{
  tree low = array_ref_low_bound (*t);
@@ -6294,7 +6295,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = 
false)
 (TYPE_SIZE (TREE_TYPE (*t;
  widest_int ext
= wi::add (idx, wi::to_widest (TYPE_SIZE (TREE_TYPE 
(*t;
- if (wi::les_p (ext, wi::to_widest (TYPE_SIZE (vtype
+ if (known_le (ext, wi::to_poly_widest (TYPE_SIZE (vtype
{

Re: [PATCH] i386: Disallow long address mode in the x32 mode. [PR 117418]

2024-11-08 Thread Hongtao Liu
On Fri, Nov 8, 2024 at 3:18 PM Uros Bizjak  wrote:
>
> On Fri, Nov 8, 2024 at 6:52 AM Hongtao Liu  wrote:
>
> > > > > PR target/117418
> > > > > * config/i386/i386-options.cc 
> > > > > (ix86_option_override_internal): Raise an
> > > > > error with option -mx32 -maddress-mode=long.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR target/117418
> > > > > * gcc.target/i386/pr117418-1.c: New test.
> > > > > ---
> > > > >  gcc/config/i386/i386-options.cc|  4 
> > > > >  gcc/testsuite/gcc.target/i386/pr117418-1.c | 13 +
> > > > >  2 files changed, 17 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr117418-1.c
> > > > >
> > > > > diff --git a/gcc/config/i386/i386-options.cc 
> > > > > b/gcc/config/i386/i386-options.cc
> > > > > index 239269ecbdd..ba1abea2537 100644
> > > > > --- a/gcc/config/i386/i386-options.cc
> > > > > +++ b/gcc/config/i386/i386-options.cc
> > > > > @@ -2190,6 +2190,10 @@ ix86_option_override_internal (bool 
> > > > > main_args_p,
> > > > > error ("address mode %qs not supported in the %s bit mode",
> > > > >TARGET_64BIT_P (opts->x_ix86_isa_flags) ? "short" : 
> > > > > "long",
> > > > >TARGET_64BIT_P (opts->x_ix86_isa_flags) ? "64" : "32");
> > > > > +
> > > > > +  if (TARGET_X32_P (opts->x_ix86_isa_flags)
> > > > > + && opts_set->x_ix86_pmode == PMODE_DI)
> > > > > +   error ("address mode 'long' not supported in the x32 ABI");
> > > >
> > > > This looks wrong.   Try the encoded patch.
> > > >
> > > So it means -maddress-mode=long will override x32 to use 64-bit pointer?
> > No, answered by myself.
> > The upper 32-bit is zero, so it's still 32-bit memory space although
> > it uses a 64-bit register as a pointer.
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82267
>
> Yes, ptr_mode and Pmode are two different things. ptr_mode is ABI
> mandated pointer mode, while Pmode is an implementation detail. As
> most of the targets, x32 used Pmode == ptr_mode, but as evident from
> the quoted PR, it resulted in many 0x67 prefixes due to address size
> overrides to handle SImode ptr_mode ABI requirements.
>
> -maddress-mode=long was added just for x32 to mitigate this problem.
> It was beneficial for some applications, thus the request to make it
> the default, but it remained as it is.
>
> Because this option is not the default one, it has (some) tendency to
> bitrot, when some new committed code assumes ptr_mode == Pmode. IIRC,
> HJ had a tester that exercised -maddress-mode=long for -mx32 to hunt
> middle-end and back-end issues with ptr_mode != Pmode.
Thanks for explaining the background and reasons for it
>
> Uros.



-- 
BR,
Hongtao


[PATCH v2] testsuite: arm: Use check-function-bodies in epilog-1.c test

2024-11-08 Thread Torbjörn SVENSSON
Changes since v1:

- Added generated assembler in commit message.
- Added comments in test case when each block is relevant.

Ok for trunk and releases/gcc-14?

--

Update test case for armv8.1-m.main that supports conditional
arithmetic.

armv7-m:
        push    {r4, lr}
        ldr     r4, .L6
        ldr     r4, [r4]
        lsls    r4, r4, #29
        it      mi
        addmi   r2, r2, #1
        bl      bar
        movs    r0, #0
        pop     {r4, pc}

armv8.1-m.main:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        tst     r5, #4
        csinc   r2, r2, r2, eq
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/epilog-1.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/epilog-1.c 
b/gcc/testsuite/gcc.target/arm/epilog-1.c
index f97f1ebeaaf..a1516456460 100644
--- a/gcc/testsuite/gcc.target/arm/epilog-1.c
+++ b/gcc/testsuite/gcc.target/arm/epilog-1.c
@@ -2,16 +2,34 @@
 /* { dg-do compile } */
 /* { dg-options "-mthumb -Os" } */
 /* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 volatile int g_k;
 extern void bar(int, int, int, int);
 
+/*
+** foo:
+** ...
+** (
+
+Below block is for non-armv8.1-m.main
+** lsls    r[0-9]+, r[0-9]+, #29
+** it  mi
+** addmi   r2, r2, #1
+
+** |
+
+Below block is for armv8.1-m.main
+** tst r[0-9]+, #4
+** csinc   r2, r2, r2, eq
+
+** )
+** bl  bar
+** ...
+*/
 int foo(int a, int b, int c, int d)
 {
   if (g_k & 4) c++;
   bar (a, b, c, d);
   return 0;
 }
-
-/* { dg-final { scan-assembler-times "lsls.*#29" 1 } } */
-/* { dg-final { scan-assembler-not "tst" } } */
-- 
2.25.1



Re: [PATCH v4 4/8] vect: Add maskload else value support.

2024-11-08 Thread Richard Biener
On Thu, 7 Nov 2024, Robin Dapp wrote:

> From: Robin Dapp 
> 
> This patch adds an else operand to vectorized masked load calls.
> The current implementation adds else-value arguments to the respective
> target-querying functions that are used to supply the vectorizer with the
> proper else value.
> 
> We query the target for its supported else operand and uses that for the
> maskload call.  If necessary, i.e. if the mode has padding bits and if
> the else operand is nonzero, a VEC_COND enforcing a zero else value is
> emitted.
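For readers following the archive, the else-operand semantics described above can be modeled as a scalar loop (an illustrative sketch only; `mask_load` is a made-up name, not the internal function):

```c
#include <stddef.h>

/* Scalar model of a masked load with an explicit else value:
   active lanes read from src, inactive lanes take else_val
   instead of being left undefined.  */
static void
mask_load (int *dst, const int *src, const unsigned char *mask,
           size_t n, int else_val)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? src[i] : else_val;
}
```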

LGTM.

Richard.

> gcc/ChangeLog:
> 
>   * optabs-query.cc (supports_vec_convert_optab_p): Return icode.
>   (get_supported_else_val): Return supported else value for
>   optab's operand at index.
>   (supports_vec_gather_load_p): Add else argument.
>   (supports_vec_scatter_store_p): Ditto.
>   * optabs-query.h (supports_vec_gather_load_p): Ditto.
>   (get_supported_else_val): Ditto.
>   * optabs-tree.cc (target_supports_mask_load_store_p): Ditto.
>   (can_vec_mask_load_store_p): Ditto.
>   (target_supports_len_load_store_p): Ditto.
>   (get_len_load_store_mode): Ditto.
>   * optabs-tree.h (target_supports_mask_load_store_p): Ditto.
>   (can_vec_mask_load_store_p): Ditto.
>   * tree-vect-data-refs.cc (vect_lanes_optab_supported_p): Ditto.
>   (vect_gather_scatter_fn_p): Ditto.
>   (vect_check_gather_scatter): Ditto.
>   (vect_load_lanes_supported): Ditto.
>   * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern):
>   Ditto.
>   * tree-vect-slp.cc (vect_get_operand_map): Adjust indices for
>   else operand.
>   (vect_slp_analyze_node_operations): Skip undefined else operand.
>   * tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p):
>   Add else operand handling.
>   (vect_get_vec_defs_for_operand): Handle undefined else operand.
>   (check_load_store_for_partial_vectors): Add else argument.
>   (vect_truncate_gather_scatter_offset): Ditto.
>   (vect_use_strided_gather_scatters_p): Ditto.
>   (get_group_load_store_type): Ditto.
>   (get_load_store_type): Ditto.
>   (vect_get_mask_load_else): Ditto.
>   (vect_get_else_val_from_tree): Ditto.
>   (vect_build_one_gather_load_call): Add zero else operand.
>   (vectorizable_load): Use else operand.
>   * tree-vectorizer.h (vect_gather_scatter_fn_p): Add else
>   argument.
>   (vect_load_lanes_supported): Ditto.
>   (vect_get_mask_load_else): Ditto.
>   (vect_get_else_val_from_tree): Ditto.
> 
> vect
> ---
>  gcc/optabs-query.cc|  70 +---
>  gcc/optabs-query.h |   3 +-
>  gcc/optabs-tree.cc |  66 ++--
>  gcc/optabs-tree.h  |   8 +-
>  gcc/tree-vect-data-refs.cc |  74 ++---
>  gcc/tree-vect-patterns.cc  |  12 +-
>  gcc/tree-vect-slp.cc   |  25 ++-
>  gcc/tree-vect-stmts.cc | 326 +++--
>  gcc/tree-vectorizer.h  |  10 +-
>  9 files changed, 468 insertions(+), 126 deletions(-)
> 
> diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
> index cc52bc0f5ea..c1f3558af92 100644
> --- a/gcc/optabs-query.cc
> +++ b/gcc/optabs-query.cc
> @@ -29,6 +29,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "rtl.h"
>  #include "recog.h"
>  #include "vec-perm-indices.h"
> +#include "internal-fn.h"
> +#include "memmodel.h"
> +#include "optabs.h"
>  
>  struct target_optabs default_target_optabs;
>  struct target_optabs *this_fn_optabs = &default_target_optabs;
> @@ -672,34 +675,57 @@ lshift_cheap_p (bool speed_p)
> that mode, given that the second mode is always an integer vector.
> If MODE is VOIDmode, return true if OP supports any vector mode.  */
>  
> -static bool
> -supports_vec_convert_optab_p (optab op, machine_mode mode)
> +static enum insn_code
> +supported_vec_convert_optab (optab op, machine_mode mode)
>  {
>int start = mode == VOIDmode ? 0 : mode;
>int end = mode == VOIDmode ? MAX_MACHINE_MODE - 1 : mode;
> +  enum insn_code icode = CODE_FOR_nothing;
>for (int i = start; i <= end; ++i)
>  if (VECTOR_MODE_P ((machine_mode) i))
>for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j)
> - if (convert_optab_handler (op, (machine_mode) i,
> -(machine_mode) j) != CODE_FOR_nothing)
> -   return true;
> + {
> +   if ((icode
> += convert_optab_handler (op, (machine_mode) i,
> + (machine_mode) j)) != CODE_FOR_nothing)
> + return icode;
> + }
>  
> -  return false;
> +  return icode;
>  }
>  
>  /* If MODE is not VOIDmode, return true if vec_gather_load is available for
> that mode.  If MODE is VOIDmode, return true if gather_load is available
> -   for at least one vector mode.  */
> +   for at least one vector mode.
> +   In that case, and if ELSVALS is nonzero, store the supported else values
> +   into the vector it points to.  */
>  

[PATCH] c++: Small initial fixes for zeroing of padding bits [PR117256]

2024-11-08 Thread Jakub Jelinek
Hi!

https://eel.is/c++draft/dcl.init#general-6
says that even padding bits are supposed to be zeroed during
zero-initialization.
The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665565.html
patch attempts to implement that, though only for the easy
cases so far, in particular marks the CONSTRUCTOR created during
zero-initialization (or zero-initialization done during the
value-initialization) as having padding bits cleared and for
constexpr evaluation attempts to preserve that bit on a new CONSTRUCTOR
created for CONSTRUCTOR_ZERO_PADDING_BITS lhs.

I think we need far more than that, but am not sure where exactly
to implement that.
In particular, I think __builtin_bitcast should take it into account
during constant evaluation, if the padding bits in something are guaranteed
to be zero, then I'd think std::bitcast out of it and testing those
bits in there should be well defined.
But if we do that, the flag needs to be maintained precisely, not just
conservatively, so e.g. any place where some object is copied into another
one (except bitcast?) which would be element-wise copy, the bit should
be cleared (or preserved from the earlier state?  I'd hope
element-wise copying invalidates even the padding bits, but then what
about just stores into some members, do those invalidate the padding bits
in the rest of the object?).  But if it is an elided copy, it shouldn't.
And am not really sure what happens e.g. with non-automatic constexpr
variables.  If it is constructed by something that doesn't guarantee
the zeroing of the padding bits (so similarly constructed constexpr automatic
variable would have undefined state of the padding bits), are those padding
bits well defined because it isn't automatic variable?

Anyway, I hope the following patch is at least a small step in the right
direction.
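As a concrete baseline for what "padding bits are zero" means at the byte level, the one case that is unambiguous today is an explicit byte-wise clear (a hedged illustration; it models the object state the quoted [dcl.init] rule asks zero-initialization to produce, not the patch itself):

```c
#include <string.h>

struct S
{
  char c;   /* typically followed by 3 padding bytes */
  int i;
};

/* Clear every byte of the object, padding included.  */
static void
clear_object (struct S *s)
{
  memset (s, 0, sizeof *s);
}
```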

Bootstrapped/regtested on x86_64-linux and i686-linux, it caused
g++.dg/tm/pr45940-3.C and g++.dg/tm/pr45940-4.C regressions like
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++11 (internal compiler error: in 
create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++11 (test for excess errors)
but that seems to be a trans-mem.cc preexisting bug which I'm going
to post a patch for separately, ok for trunk (of course, provided the
above mentioned patch is acked too)?

2024-11-07  Jakub Jelinek  

PR c++/78620
PR c++/117256
* init.cc (build_zero_init_1): Set CONSTRUCTOR_ZERO_PADDING_BITS.
(build_value_init_noctor): Likewise.
* constexpr.cc (cxx_eval_store_expression): Propagate
CONSTRUCTOR_ZERO_PADDING_BITS flag.

--- gcc/cp/init.cc.jj   2024-11-06 10:19:11.435260625 +0100
+++ gcc/cp/init.cc  2024-11-07 17:23:13.335275180 +0100
@@ -249,6 +249,7 @@ build_zero_init_1 (tree type, tree nelts
 
   /* Build a constructor to contain the initializations.  */
   init = build_constructor (type, v);
+  CONSTRUCTOR_ZERO_PADDING_BITS (init) = 1;
 }
   else if (TREE_CODE (type) == ARRAY_TYPE)
 {
@@ -467,7 +468,9 @@ build_value_init_noctor (tree type, tsub
}
 
  /* Build a constructor to contain the zero- initializations.  */
- return build_constructor (type, v);
+ tree ret = build_constructor (type, v);
+ CONSTRUCTOR_ZERO_PADDING_BITS (ret) = 1;
+ return ret;
}
 }
   else if (TREE_CODE (type) == ARRAY_TYPE)
--- gcc/cp/constexpr.cc.jj  2024-11-05 08:58:25.144845731 +0100
+++ gcc/cp/constexpr.cc 2024-11-07 17:59:54.053170842 +0100
@@ -6421,6 +6421,7 @@ cxx_eval_store_expression (const constex
 
   type = TREE_TYPE (object);
   bool no_zero_init = true;
+  bool zero_padding_bits = false;
 
   auto_vec ctors;
   releasing_vec indexes;
@@ -6433,6 +6434,7 @@ cxx_eval_store_expression (const constex
{
  *valp = build_constructor (type, NULL);
  CONSTRUCTOR_NO_CLEARING (*valp) = no_zero_init;
+ CONSTRUCTOR_ZERO_PADDING_BITS (*valp) = zero_padding_bits;
}
   else if (STRIP_ANY_LOCATION_WRAPPER (*valp),
   TREE_CODE (*valp) == STRING_CST)
@@ -6492,8 +6494,10 @@ cxx_eval_store_expression (const constex
}
 
   /* If the value of object is already zero-initialized, any new ctors for
-subobjects will also be zero-initialized.  */
+subobjects will also be zero-initialized.  Similarly with zeroing of
+padding bits.  */
   no_zero_init = CONSTRUCTOR_NO_CLEARING (*valp);
+  zero_padding_bits = CONSTRUCTOR_ZERO_PADDING_BITS (*valp);
 
   if (code == RECORD_TYPE && is_empty_field (index))
/* Don't build a sub-CONSTRUCTOR for an empty base or field, as they
@@ -6678,6 +6682,7 @@ cxx_eval_store_expression (const constex
{
  *valp = build_constructor (type, NULL);
  CONSTRUCTOR_NO_CLEARING (*valp) = no_zero_init;
+ CONSTRUCTOR_ZERO_PADDING_BITS (*valp) = zero_padding_bits;
}
   new_ctx.ctor = empty_base ? NULL_TREE : 

Re: [PATCH] testsuite: arm: Use effective-target for nomve_fp_1 test

2024-11-08 Thread Torbjorn SVENSSON




On 2024-11-07 23:19, Christophe Lyon wrote:

On Thu, 7 Nov 2024 at 18:33, Torbjorn SVENSSON
 wrote:




On 2024-11-07 11:40, Christophe Lyon wrote:

Hi Torbjörn,

On Thu, 31 Oct 2024 at 19:34, Torbjörn SVENSSON
 wrote:


Ok for trunk and releases/gcc-14?

--

Test uses MVE, so add effective-target arm_fp requirement.

gcc/testsuite/ChangeLog:

  * g++.target/arm/mve/general-c++/nomve_fp_1.c: Use
  effective-target arm_fp.


I see I made a similar change to the corresponding "C" test:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624404.html

Is your patch fixing the same issue?


Yes, it looks like it's the same issue (and resolution).


Thanks for confirming. The patch is OK.


Pushed as r15-5035-ge8886406fac and r14.2.0-372-gef771933842.

Kind regards,
Torbjörn



Thanks,

Christophe


Kind regards,
Torbjörn



Thanks,

Christophe


Signed-off-by: Torbjörn SVENSSON 
---
   gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c 
b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
index e0692ceb8c8..a2069d353cf 100644
--- a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
+++ b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
@@ -1,9 +1,11 @@
   /* { dg-do compile } */
+/* { dg-require-effective-target arm_fp_ok } */
   /* { dg-require-effective-target arm_v8_1m_mve_ok } */
   /* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
  which could imply mve+fp depending on the user settings. We want to make
  sure the '+fp' extension is not enabled.  */
   /* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
+/* { dg-add-options arm_fp } */

   #include 

--
2.25.1







[PATCH] trans-mem: Fix ICE caused by expand_assign_tm

2024-11-08 Thread Jakub Jelinek
Hi!

My https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668065.html
patch regressed
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++11 (internal compiler error: in 
create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++11 (test for excess errors)
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++14 (internal compiler error: in 
create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++14 (test for excess errors)
...
+FAIL: g++.dg/tm/pr45940-4.C  -std=gnu++26 (internal compiler error: in 
create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-4.C  -std=gnu++26 (test for excess errors)
+FAIL: g++.dg/tm/pr45940-4.C  -std=gnu++98 (internal compiler error: in 
create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-4.C  -std=gnu++98 (test for excess errors)
tests, but it turns out it is a preexisting bug.
If I modify the pr45940-3.C testcase
--- gcc/testsuite/g++.dg/tm/pr45940-3.C 2020-01-12 11:54:37.258400660 +0100
+++ gcc/testsuite/g++.dg/tm/pr45940-3.C 2024-11-08 10:35:11.918390743 +0100
@@ -16,6 +16,7 @@ class sp_counted_base
 {
 protected:
 int use_count_;// #shared
+int a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, 
x, y, z, aa, ab, ac, ad, ae, af;
 public:
 __attribute__((transaction_safe))
 virtual void dispose() = 0; // nothrow
then it ICEs already on vanilla trunk.

The problem is that expand_assign_tm just wants to force it into
TM memcpy argument, if is_gimple_reg (reg), then it creates a temporary,
stores the value there and takes temporary address, otherwise it takes
address of rhs.  That doesn't work if rhs is an empty CONSTRUCTOR with
C++ non-POD type (TREE_ADDRESSABLE type), we ICE trying to create a temporary,
because we shouldn't be creating a temporary.
Now before my patch with the CONSTRUCTOR only having a vtable pointer
(64bit) and 32-bit field, we gimplified the zero initialization just
as storing of 0s to the 2 fields, but as we are supposed to also clear
padding bits, we now gimplify it as MEM[...] = {}; to make sure
even the padding bits are cleared.  With the adjusted testcase,
we gimplified it even before as MEM[...] = {}; because it was simply
too large and clearing everything looked beneficial.

The following patch fixes this ICE by using TM memset: it is both
wasteful to force a zero constructor into a temporary just to TM memcpy
it into the lhs, and in C++ cases like this it is invalid.
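In effect, for a transactional `lhs = {};` the code now emits a TM memset over the lhs instead of materializing an addressable temporary for the rhs; a scalar model of that choice (illustrative only, with plain `memset` standing in for `BUILT_IN_TM_MEMSET`):

```c
#include <string.h>

/* Scalar model: assigning an empty CONSTRUCTOR under TM becomes a
   byte-wise clear of the destination, so no addressable temporary
   for the {} rhs is needed.  */
static void
tm_assign_empty_ctor (void *lhs, size_t size)
{
  memset (lhs, 0, size);
}
```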

Ok for trunk if it passes bootstrap/regtest?

2024-11-08  Jakub Jelinek  

* trans-mem.cc (expand_assign_tm): Don't take address
of empty CONSTRUCTOR, instead use BUILT_IN_TM_MEMSET
to clear lhs in that case.  Formatting fixes.

--- gcc/trans-mem.cc.jj 2024-10-25 10:00:29.527767013 +0200
+++ gcc/trans-mem.cc2024-11-08 09:55:08.963557301 +0100
@@ -2442,26 +2442,33 @@ expand_assign_tm (struct tm_region *regi
  gcall = gimple_build_assign (rtmp, rhs);
  gsi_insert_before (gsi, gcall, GSI_SAME_STMT);
}
+  else if (TREE_CODE (rhs) == CONSTRUCTOR
+  && CONSTRUCTOR_NELTS (rhs) == 0)
+   {
+ /* Don't take address of an empty CONSTRUCTOR, it might not
+work for C++ non-POD constructors at all and otherwise
+would be inefficient.  Use tm memset to clear lhs.  */
+ gcc_assert (!load_p && store_p);
+ rhs_addr = integer_zero_node;
+   }
   else
rhs_addr = gimplify_addr (gsi, rhs);
 
   // Choose the appropriate memory transfer function.
-  if (load_p && store_p)
-   {
- // ??? Figure out if there's any possible overlap between
- // the LHS and the RHS and if not, use MEMCPY.
- copy_fn = builtin_decl_explicit (BUILT_IN_TM_MEMMOVE);
-   }
+  if (store_p
+ && TREE_CODE (rhs) == CONSTRUCTOR
+ && CONSTRUCTOR_NELTS (rhs) == 0)
+   copy_fn = builtin_decl_explicit (BUILT_IN_TM_MEMSET);
+  else if (load_p && store_p)
+   // ??? Figure out if there's any possible overlap between
+   // the LHS and the RHS and if not, use MEMCPY.
+   copy_fn = builtin_decl_explicit (BUILT_IN_TM_MEMMOVE);
   else if (load_p)
-   {
- // Note that the store is non-transactional and cannot overlap.
- copy_fn = builtin_decl_explicit (BUILT_IN_TM_MEMCPY_RTWN);
-   }
+   // Note that the store is non-transactional and cannot overlap.
+   copy_fn = builtin_decl_explicit (BUILT_IN_TM_MEMCPY_RTWN);
   else
-   {
- // Note that the load is non-transactional and cannot overlap.
- copy_fn = builtin_decl_explicit (BUILT_IN_TM_MEMCPY_RNWT);
-   }
+   // Note that the load is non-transactional and cannot overlap.
+   copy_fn = builtin_decl_explicit (BUILT_IN_TM_MEMCPY_RNWT);
 
   gcall = gimple_build_call (copy_fn, 3, lhs_addr, rhs_addr,
 TYPE_SIZE_UNIT (TREE_TYPE (lhs)));

Jakub



Re: [PATCH] testsuite: arm: Use effective-target for pr84556.cc test

2024-11-08 Thread Richard Earnshaw (lists)

On 06/11/2024 09:39, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

--

Using "dg-do run" with a selector breaks testing arm-none-eabi for any
architecture when check_effective_target_arm_neon_hw returns 0.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-require-effective-target with the same
selector.

Signed-off-by: Torbjörn SVENSSON 


Ah, this is because it overrides the default selector set by vect.exp 
that picks between dg-do run and dg-do compile based on the target's 
support for simd operations.  I think that should be made clearer in the 
commit message as otherwise others may be as confused as I was :)


R.


---
  gcc/testsuite/g++.dg/vect/pr84556.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/vect/pr84556.cc 
b/gcc/testsuite/g++.dg/vect/pr84556.cc
index 6b1c9cec515..c7e331628a8 100644
--- a/gcc/testsuite/g++.dg/vect/pr84556.cc
+++ b/gcc/testsuite/g++.dg/vect/pr84556.cc
@@ -1,5 +1,5 @@
  // PR c++/84556
-// { dg-do run { target c++11 } }
+// { dg-require-effective-target c++11 }
  // { dg-additional-options "-O2 -fopenmp-simd" }
  // { dg-additional-options "-mavx" { target avx_runtime } }
  




Re: [PATCH] testsuite: arm: Require 16-bit float support

2024-11-08 Thread Richard Earnshaw (lists)

On 05/11/2024 20:06, Torbjörn SVENSSON wrote:

Based on how these functions are used in test cases, I think it's correct
to require 16-bit float support in both functions.

Without this change, the checks pass for armv8-m and armv8.1-m, but the
test cases that use them fail due to the incorrect -mfpu option.

Ok for trunk and releases/gcc-14?


Can you expand on the issue you're trying to address with this change?

R.



--

In both functions, it's assumed that 16-bit float support is available,
but it's not checked.
In addition, check_effective_target_arm_v8_2a_fp16_neon_ok also assumes
that neon is used, but it's not checked.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): Check
that 16-bit float is supported.
(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): Check
that neon is used and that 16-bit float is supported.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/lib/target-supports.exp | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 75703ddca60..19a9981d9cd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6360,6 +6360,12 @@ proc 
check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache { } {
   "-mfpu=fp-armv8 -mfloat-abi=softfp"} {
if { [check_no_compiler_messages_nocache \
  arm_v8_2a_fp16_scalar_ok object {
+   #if !defined (__ARM_FP)
+   #error "__ARM_FP not defined"
+   #endif
+   #if ((__ARM_FP & 1) == 0)
+   #error "__ARM_FP indicates that 16-bit is not supported"
+   #endif
#if !defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
#error "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined"
#endif
@@ -6395,6 +6401,15 @@ proc 
check_effective_target_arm_v8_2a_fp16_neon_ok_nocache { } {
   "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
if { [check_no_compiler_messages_nocache \
  arm_v8_2a_fp16_neon_ok object {
+   #if !defined (__ARM_FP)
+   #error "__ARM_FP not defined"
+   #endif
+   #if ((__ARM_FP & 1) == 0)
+   #error "__ARM_FP indicates that 16-bit is not supported"
+   #endif
+   #if !defined (__ARM_NEON__)
+   #error "__ARM_NEON__ not defined"
+   #endif
#if !defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined"
#endif




Re: [PATCH 0/4] libsanitizer: merge from upstream

2024-11-08 Thread Jakub Jelinek
On Thu, Nov 07, 2024 at 02:35:34PM +0800, Kito Cheng wrote:
> The patch set aims to update libsanitizer from upstream. The motivation is 
> that
> RISC-V is changing the shadow offset for AddressSanitizer, and I also plan to
> submit another patch set to add dynamic shadow offset support for GCC.
> 
> This is my first time updating it, so I used my laptop and an AArch64 server
> from the cfarm to run the regression tests. I tested on x86_64/Linux (Ubuntu 
> 22.04)
> and AArch64/Linux (Rocky 9.4), both with --with-build-config=bootstrap-asan 
> and
> performed a standard 3-stage build. No new regressions were introduced.
> 
> NOTE: I tried to run regression tests with
> --with-build-config=bootstrap-asan, but I received warnings from
> LeakSanitizer due to the pretty printer, which makes the test results
> unusable...

If pretty-printer now massively leaks, we should file a PR and get it fixed
for GCC 15.

> Kito Cheng (4):
>   libsanitizer: merge from upstream (61a6439f35b6de28)
>   libsanitizer: Apply local patches
>   libsanitizer: Improve FrameIsInternal
>   libsanitizer: update test

Ok for trunk, but please mention in LOCAL_PATCHES all the 2/4, 3/4 and 4/4
commits, not just 2/4 (and commit that as the last patch, separately
from the series so that you don't need to repeat it any time git rebase
is needed before pushing).

Jakub



Re: [PATCH 0/4] libsanitizer: merge from upstream

2024-11-08 Thread Xi Ruoyao
On Fri, 2024-11-08 at 12:35 +0100, Jakub Jelinek wrote:
> On Thu, Nov 07, 2024 at 02:35:34PM +0800, Kito Cheng wrote:
> > The patch set aims to update libsanitizer from upstream. The motivation is 
> > that
> > RISC-V is changing the shadow offset for AddressSanitizer, and I also plan 
> > to
> > submit another patch set to add dynamic shadow offset support for GCC.
> > 
> > This is my first time updating it, so I used my laptop and an AArch64 server
> > from the cfarm to run the regression tests. I tested on x86_64/Linux (Ubuntu 
> > 22.04)
> > and AArch64/Linux (Rocky 9.4), both with --with-build-config=bootstrap-asan 
> > and
> > performed a standard 3-stage build. No new regressions were introduced.
> > 
> > NOTE: I tried to run regression tests with
> > --with-build-config=bootstrap-asan, but I received warnings from
> > LeakSanitizer due to the pretty printer, which makes the test results
> > unusable...
> 
> If pretty-printer now massively leaks, we should file a PR and get it fixed
> for GCC 15.
> 
> > Kito Cheng (4):
> >   libsanitizer: merge from upstream (61a6439f35b6de28)
> >   libsanitizer: Apply local patches
> >   libsanitizer: Improve FrameIsInternal
> >   libsanitizer: update test
> 
> Ok for trunk, but please mention in LOCAL_PATCHES all the 2/4, 3/4 and 4/4
> commits, not just 2/4 (and commit that as the last patch, separately
> from the series so that you don't need to repeat it any time git rebase
> is needed before pushing).

IIUC 4/4 shouldn't be in LOCAL_PATCHES?  It modifies our own test case,
not from the upstream.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v2] testsuite: arm: Use effective-target for pr84556.cc test

2024-11-08 Thread Torbjörn SVENSSON
Changes since v1:

- Clarified the commit message to include where the decision is taken
  and why it's a bad idea to use "dg-do run" in a test case.
  Note: This does not only fix it for arm-none-eabi. I see the same
  kind of construct used by, for example, sparc.

Sorry for the confusion Richard, I hope it's more clear why this is
needed now. :)

Ok for trunk and releases/gcc-14?

--

Using "dg-do run" with a selector overrides the default selector set by
vect.exp that picks between "dg-do run" and "dg-do compile" based on the
target's support for simd operations for Arm targets.
The actual selection of default operation is performed in
check_vect_support_and_set_flags.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-require-effective-target with the same
selector.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/g++.dg/vect/pr84556.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/vect/pr84556.cc 
b/gcc/testsuite/g++.dg/vect/pr84556.cc
index 6b1c9cec515..c7e331628a8 100644
--- a/gcc/testsuite/g++.dg/vect/pr84556.cc
+++ b/gcc/testsuite/g++.dg/vect/pr84556.cc
@@ -1,5 +1,5 @@
 // PR c++/84556
-// { dg-do run { target c++11 } }
+// { dg-require-effective-target c++11 }
 // { dg-additional-options "-O2 -fopenmp-simd" }
 // { dg-additional-options "-mavx" { target avx_runtime } }
 
-- 
2.25.1



Re: [PATCH v2] testsuite: arm: Use effective-target for pr84556.cc test

2024-11-08 Thread Richard Earnshaw (lists)

On 08/11/2024 11:48, Torbjörn SVENSSON wrote:

Changes since v1:

- Clarified the commit message to include where the decision is taken
   and why it's a bad idea to use "dg-do run" in a test case.
   Note: This does not only fix it for arm-none-eabi. I see the same
   kind of construct used by, for example, sparc.

Sorry for the confusion Richard, I hope it's more clear why this is
needed now. :)

Ok for trunk and releases/gcc-14?

--

Using "dg-do run" with a selector overrides the default selector set by
vect.exp that picks between "dg-do run" and "dg-do compile" based on the
target's support for simd operations for Arm targets.
The actual selection of default operation is performed in
check_vect_support_and_set_flags.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-require-effective-target with the same
selector.

Signed-off-by: Torbjörn SVENSSON 


OK

R.



Re: [PATCH] AArch64: Block combine_and_move from creating FP literal loads

2024-11-08 Thread Wilco Dijkstra
Hi Richard,

> It's ok for instructions to require properties that are false during
> early RTL passes and then transition to true.  But they can't require
> properties that go from true to false, since that would mean that
> existing instructions become unrecognisable at certain points during
> the compilation process.

Only invalid cases are rejected - can_create_pseudo_p is used to reject
instructions that cannot be split after regalloc. This is basically a small
extension to that: we always split the aarch64_float_const_rtx_p case before
regalloc, and thus no such instructions should exist or be created during IRA.

So this simply blocks any code that tries to undo the split.

> Also, why are the conditions tighter for aarch64_float_const_rtx_p
> (which we can split) but not for the general case (which we can't,
> and presumably need to force to memory)?  I.e. for what cases do we want
> the final return to be (sometimes) true?  If it's going to be forced
> into memory anyway then wouldn't we get better optimisation by exposing
> that early?

This is the only case that is always split before regalloc. The forced into 
memory
case works exactly like it does now for all other FP immediates.

> Would it be possible to handle the split during expand instead?
> Or do we expect to discover new FP constants during RTL optimisation?
> If so, where do they come from?

The split is done during the split passes. The issue is that combine_and_move
undoes this split during IRA and creates new FP constants that then need to
be split, but they aren't because we don't run split passes during IRA.

Cheers,
Wilco


[PATCH] Enable gcc.dg/vect/vect-early-break_21.c on x86_64

2024-11-08 Thread Richard Biener
The following also enables the testcase on x86 as it now has the
required cbranch.

tested on x86_64, pushed.

* gcc.dg/vect/vect-early-break_21.c: Remove disabling of
x86_64 and i?86.
---
 gcc/testsuite/gcc.dg/vect/vect-early-break_21.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
index dbe3f826511..f73f3c2eb86 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
@@ -5,7 +5,7 @@
 
 /* { dg-additional-options "-Ofast" } */
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! 
"x86_64-*-* i?86-*-*" } } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
 
 #include 
 
-- 
2.43.0


Re: [PATCH 0/4] libsanitizer: merge from upstream

2024-11-08 Thread Jakub Jelinek
On Fri, Nov 08, 2024 at 07:38:13PM +0800, Xi Ruoyao wrote:
> IIUC 4/4 shouldn't be in LOCAL_PATCHES?  It modifies our own test case,
> not from the upstream.

Sure, sorry.

Jakub



[PATCH]AArch64 backport Neoverse and Cortex CPU definitions

2024-11-08 Thread Tamar Christina
Hi All,

This is a conservative backport of a few CPU definitions, adding only the
core definitions and mapping them to the closest cost models that already
exist on the branches.

Bootstrapped Regtested on aarch64-none-linux-gnu on branches and no issues.

Ok for GCC 13 and 14?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (cortex-a725, cortex-x925,
neoverse-n3, neoverse-v3, neoverse-v3ae): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document them.

---
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
f5536388f61182af3fad4eefc0d67e3e848581b1..5ee4ffc62111ff957f83cb62a46036b09139cf99
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -178,6 +178,7 @@ AARCH64_CORE("cortex-a710",  cortexa710, cortexa57, V9A,  
(SVE2_BITPERM, MEMTAG,
 AARCH64_CORE("cortex-a715",  cortexa715, cortexa57, V9A,  (SVE2_BITPERM, 
MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd4d, -1)
 
 AARCH64_CORE("cortex-a720",  cortexa720, cortexa57, V9_2A,  (SVE2_BITPERM, 
MEMTAG, PROFILE), neoversen2, 0x41, 0xd81, -1)
+AARCH64_CORE("cortex-a725",  cortexa725, cortexa57, V9_2A, (SVE2_BITPERM, 
MEMTAG, PROFILE), neoversen2, 0x41, 0xd87, -1)
 
 AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2, 0x41, 0xd48, -1)
 
@@ -185,11 +186,16 @@ AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  
(SVE2_BITPERM, MEMTAG, I8M
 
 AARCH64_CORE("cortex-x4",  cortexx4, cortexa57, V9_2A,  (SVE2_BITPERM, MEMTAG, 
PROFILE), neoversen2, 0x41, 0xd81, -1)
 
+AARCH64_CORE("cortex-x925", cortexx925, cortexa57, V9_2A,  (SVE2_BITPERM, 
MEMTAG, PROFILE), neoversen2, 0x41, 0xd85, -1)
+
 AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
 AARCH64_CORE("cobalt-100",   cobalt100, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x6d, 0xd49, -1)
+AARCH64_CORE("neoverse-n3", neoversen3, cortexa57, V9_2A, (SVE2_BITPERM, RNG, 
MEMTAG, PROFILE), neoversen2, 0x41, 0xd8e, -1)
 
 AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 AARCH64_CORE("grace", grace, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
SVE2_AES, SVE2_SHA3, SVE2_SM4, PROFILE), neoversev2, 0x41, 0xd4f, -1)
+AARCH64_CORE("neoverse-v3", neoversev3, cortexa57, V9_2A, (SVE2_BITPERM, RNG, 
LS64, MEMTAG, PROFILE), neoversev2, 0x41, 0xd84, -1)
+AARCH64_CORE("neoverse-v3ae", neoversev3ae, cortexa57, V9_2A, (SVE2_BITPERM, 
RNG, LS64, MEMTAG, PROFILE), neoversev2, 0x41, 0xd83, -1)
 
 AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
80254836e0efa6fd78da91366c4a683ec3f36e26..ed18b9b1447ff6732183825f4169911197065b66
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,grace,demeter,generic,generic_armv8_a,generic_armv9_a"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexa725,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt100,neoversen3,neoversev2,grace,neoversev3,neoversev3ae,de

[PATCH] Add push/pop_function_decl

2024-11-08 Thread Richard Sandiford
For the aarch64 simd clones patches, it would be useful to be able to
push a function declaration onto the cfun stack, even though it has no
function body associated with it.  That is, we want cfun to be null,
current_function_decl to be the decl itself, and the target and
optimisation flags to reflect the declaration.

This patch adds a push/pop_function_decl pair to do that.

I think the more direct way of doing what I want to do under the
existing interface would have been:

  push_cfun (nullptr);
  invoke_set_current_function_hook (fndecl);
  pop_cfun ();

where invoke_set_current_function_hook would need to become public.
But it seemed safer to use the higher-level routines, since it makes
sure that the target/optimisation changes are synchronised with the
function changes.  In particular, if cfun was null before the
sequence above, the pop_cfun would leave the flags unchanged,
rather than restore them to the state before the push_cfun.

I realise this might seem a bit clunky though, so please let me know
if you think there's a cleaner way.  I did briefly consider trying
to clean up all the cfun/current_function_decl stuff, but I don't think
I'll have time for that before stage 1.

Bootstrapped & regression-tested on aarch64-linux-gnu.  OK for trunk?

Thanks,
Richard


gcc/
* function.h (push_function_decl, pop_function_decl): Declare.
* function.cc (set_function_decl): New function, extracted from...
(set_cfun): ...here.
(push_function_decl): New function, extracted from...
(push_cfun): ...here.
(pop_cfun_1): New function, extracted from...
(pop_cfun): ...here.
(pop_function_decl): New function.
---
 gcc/function.cc | 80 +
 gcc/function.h  |  2 ++
 2 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/gcc/function.cc b/gcc/function.cc
index 73490f0da10..bf74e1ea208 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -4707,40 +4707,74 @@ invoke_set_current_function_hook (tree fndecl)
 }
 }
 
-/* cfun should never be set directly; use this function.  */
+/* Set cfun to NEW_CFUN and switch to the optimization and target options
+   associated with NEW_FNDECL.
 
-void
-set_cfun (struct function *new_cfun, bool force)
+   FORCE says whether we should do the switch even if NEW_CFUN is the current
+   function, e.g. because there has been a change in optimization or target
+   options.  */
+
+static void
+set_function_decl (function *new_cfun, tree new_fndecl, bool force)
 {
   if (cfun != new_cfun || force)
 {
   cfun = new_cfun;
-  invoke_set_current_function_hook (new_cfun ? new_cfun->decl : NULL_TREE);
+  invoke_set_current_function_hook (new_fndecl);
   redirect_edge_var_map_empty ();
 }
 }
 
+/* cfun should never be set directly; use this function.  */
+
+void
+set_cfun (struct function *new_cfun, bool force)
+{
+  set_function_decl (new_cfun, new_cfun ? new_cfun->decl : NULL_TREE, force);
+}
+
 /* Initialized with NOGC, making this poisonous to the garbage collector.  */
 
 static vec cfun_stack;
 
-/* Push the current cfun onto the stack, and set cfun to new_cfun.  Also set
-   current_function_decl accordingly.  */
+/* Push the current cfun onto the stack, then switch to function NEW_CFUN
+   and FUNCTION_DECL NEW_FNDECL.  FORCE is as for set_function_decl.  */
 
-void
-push_cfun (struct function *new_cfun)
+static void
+push_function_decl (function *new_cfun, tree new_fndecl, bool force)
 {
   gcc_assert ((!cfun && !current_function_decl)
  || (cfun && current_function_decl == cfun->decl));
   cfun_stack.safe_push (cfun);
-  current_function_decl = new_cfun ? new_cfun->decl : NULL_TREE;
-  set_cfun (new_cfun);
+  current_function_decl = new_fndecl;
+  set_function_decl (new_cfun, new_fndecl, force);
 }
 
-/* Pop cfun from the stack.  Also set current_function_decl accordingly.  */
+/* Push the current cfun onto the stack and switch to function declaration
+   NEW_FNDECL, which might or might not have a function body.  FORCE is as for
+   set_function_decl.  */
 
 void
-pop_cfun (void)
+push_function_decl (tree new_fndecl, bool force)
+{
+  force |= current_function_decl != new_fndecl;
+  push_function_decl (DECL_STRUCT_FUNCTION (new_fndecl), new_fndecl, force);
+}
+
+/* Push the current cfun onto the stack, and set cfun to new_cfun.  Also set
+   current_function_decl accordingly.  */
+
+void
+push_cfun (struct function *new_cfun)
+{
+  push_function_decl (new_cfun, new_cfun ? new_cfun->decl : NULL_TREE, false);
+}
+
+/* A common subroutine for pop_cfun and pop_function_decl.  FORCE is as
+   for set_function_decl.  */
+
+static void
+pop_cfun_1 (bool force)
 {
   struct function *new_cfun = cfun_stack.pop ();
   /* When in_dummy_function, we do have a cfun but current_function_decl is
@@ -4750,10 +4784,30 @@ pop_cfun (void)
   gcc_checking_assert (in_dummy_function
   || !cfun
   || current_func

[PATCH] Add missing SLP discovery for CFN[_MASK][_LEN]_SCATTER_STORE

2024-11-08 Thread Richard Biener
This was responsible for a bunch of SVE FAILs with --param vect-force-slp=1

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-slp.cc (arg1_arg3_map): New.
(arg1_arg3_arg4_map): Likewise.
(vect_get_operand_map): Handle IFN_SCATTER_STORE,
IFN_MASK_SCATTER_STORE and IFN_MASK_LEN_SCATTER_STORE.
(vect_build_slp_tree_1): Likewise.
---
 gcc/tree-vect-slp.cc | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 1240dac3d62..4073868e7ab 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -512,7 +512,9 @@ static const int no_arg_map[] = { 0 };
 static const int arg0_map[] = { 1, 0 };
 static const int arg1_map[] = { 1, 1 };
 static const int arg2_map[] = { 1, 2 };
+static const int arg1_arg3_map[] = { 2, 1, 3 };
 static const int arg1_arg4_map[] = { 2, 1, 4 };
+static const int arg1_arg3_arg4_map[] = { 3, 1, 3, 4 };
 static const int arg3_arg2_map[] = { 2, 3, 2 };
 static const int op1_op0_map[] = { 2, 1, 0 };
 static const int off_map[] = { 1, -3 };
@@ -573,6 +575,13 @@ vect_get_operand_map (const gimple *stmt, bool 
gather_scatter_p = false,
  case IFN_MASK_LEN_GATHER_LOAD:
return arg1_arg4_map;
 
+ case IFN_SCATTER_STORE:
+   return arg1_arg3_map;
+
+ case IFN_MASK_SCATTER_STORE:
+ case IFN_MASK_LEN_SCATTER_STORE:
+   return arg1_arg3_arg4_map;
+
  case IFN_MASK_STORE:
return gather_scatter_p ? off_arg3_arg2_map : arg3_arg2_map;
 
@@ -1187,7 +1196,10 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  if (cfn == CFN_MASK_LOAD
  || cfn == CFN_GATHER_LOAD
  || cfn == CFN_MASK_GATHER_LOAD
- || cfn == CFN_MASK_LEN_GATHER_LOAD)
+ || cfn == CFN_MASK_LEN_GATHER_LOAD
+ || cfn == CFN_SCATTER_STORE
+ || cfn == CFN_MASK_SCATTER_STORE
+ || cfn == CFN_MASK_LEN_SCATTER_STORE)
ldst_p = true;
  else if (cfn == CFN_MASK_STORE)
{
@@ -1473,6 +1485,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  && rhs_code != CFN_GATHER_LOAD
  && rhs_code != CFN_MASK_GATHER_LOAD
  && rhs_code != CFN_MASK_LEN_GATHER_LOAD
+ && rhs_code != CFN_SCATTER_STORE
+ && rhs_code != CFN_MASK_SCATTER_STORE
+ && rhs_code != CFN_MASK_LEN_SCATTER_STORE
  && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)
  /* Not grouped loads are handled as externals for BB
 vectorization.  For loop vectorization we can handle
-- 
2.43.0


Re: [PATCH] Add push/pop_function_decl

2024-11-08 Thread Richard Biener
On Fri, 8 Nov 2024, Richard Sandiford wrote:

> For the aarch64 simd clones patches, it would be useful to be able to
> push a function declaration onto the cfun stack, even though it has no
> function body associated with it.  That is, we want cfun to be null,
> current_function_decl to be the decl itself, and the target and
> optimisation flags to reflect the declaration.
> 
> This patch adds a push/pop_function_decl pair to do that.
> 
> I think the more direct way of doing what I want to do under the
> existing interface would have been:
> 
>   push_cfun (nullptr);
>   invoke_set_current_function_hook (fndecl);
>   pop_cfun ();
> 
> where invoke_set_current_function_hook would need to become public.
> But it seemed safer to use the higher-level routines, since it makes
> sure that the target/optimisation changes are synchronised with the
> function changes.  In particular, if cfun was null before the
> sequence above, the pop_cfun would leave the flags unchanged,
> rather than restore them to the state before the push_cfun.
> 
> I realise this might seem a bit clunky though, so please let me know
> if you think there's a cleaner way.  I did briefly consider trying
> to clean up all the cfun/current_function_decl stuff, but I don't think
> I'll have time for that before stage 1.
> 
> Bootstrapped & regression-tested on aarch64-linux-gnu.  OK for trunk?

I think this is reasonable, thus OK unless you hear otherwise over
the weekend.

Richard.

> Thanks,
> Richard
> 
> 
> gcc/
>   * function.h (push_function_decl, pop_function_decl): Declare.
>   * function.cc (set_function_decl): New function, extracted from...
>   (set_cfun): ...here.
>   (push_function_decl): New function, extracted from...
>   (push_cfun): ...here.
>   (pop_cfun_1): New function, extracted from...
>   (pop_cfun): ...here.
>   (pop_function_decl): New function.
> ---
>  gcc/function.cc | 80 +
>  gcc/function.h  |  2 ++
>  2 files changed, 69 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/function.cc b/gcc/function.cc
> index 73490f0da10..bf74e1ea208 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -4707,40 +4707,74 @@ invoke_set_current_function_hook (tree fndecl)
>  }
>  }
>  
> -/* cfun should never be set directly; use this function.  */
> +/* Set cfun to NEW_CFUN and switch to the optimization and target options
> +   associated with NEW_FNDECL.
>  
> -void
> -set_cfun (struct function *new_cfun, bool force)
> +   FORCE says whether we should do the switch even if NEW_CFUN is the current
> +   function, e.g. because there has been a change in optimization or target
> +   options.  */
> +
> +static void
> +set_function_decl (function *new_cfun, tree new_fndecl, bool force)
>  {
>if (cfun != new_cfun || force)
>  {
>cfun = new_cfun;
> -  invoke_set_current_function_hook (new_cfun ? new_cfun->decl : 
> NULL_TREE);
> +  invoke_set_current_function_hook (new_fndecl);
>redirect_edge_var_map_empty ();
>  }
>  }
>  
> +/* cfun should never be set directly; use this function.  */
> +
> +void
> +set_cfun (struct function *new_cfun, bool force)
> +{
> +  set_function_decl (new_cfun, new_cfun ? new_cfun->decl : NULL_TREE, force);
> +}
> +
>  /* Initialized with NOGC, making this poisonous to the garbage collector.  */
>  
>  static vec cfun_stack;
>  
> -/* Push the current cfun onto the stack, and set cfun to new_cfun.  Also set
> -   current_function_decl accordingly.  */
> +/* Push the current cfun onto the stack, then switch to function NEW_CFUN
> +   and FUNCTION_DECL NEW_FNDECL.  FORCE is as for set_function_decl.  */
>  
> -void
> -push_cfun (struct function *new_cfun)
> +static void
> +push_function_decl (function *new_cfun, tree new_fndecl, bool force)
>  {
>gcc_assert ((!cfun && !current_function_decl)
> || (cfun && current_function_decl == cfun->decl));
>cfun_stack.safe_push (cfun);
> -  current_function_decl = new_cfun ? new_cfun->decl : NULL_TREE;
> -  set_cfun (new_cfun);
> +  current_function_decl = new_fndecl;
> +  set_function_decl (new_cfun, new_fndecl, force);
>  }
>  
> -/* Pop cfun from the stack.  Also set current_function_decl accordingly.  */
> +/* Push the current cfun onto the stack and switch to function declaration
> +   NEW_FNDECL, which might or might not have a function body.  FORCE is as 
> for
> +   set_function_decl.  */
>  
>  void
> -pop_cfun (void)
> +push_function_decl (tree new_fndecl, bool force)
> +{
> +  force |= current_function_decl != new_fndecl;
> +  push_function_decl (DECL_STRUCT_FUNCTION (new_fndecl), new_fndecl, force);
> +}
> +
> +/* Push the current cfun onto the stack, and set cfun to new_cfun.  Also set
> +   current_function_decl accordingly.  */
> +
> +void
> +push_cfun (struct function *new_cfun)
> +{
> +  push_function_decl (new_cfun, new_cfun ? new_cfun->decl : NULL_TREE, 
> false);
> +}
> +
> +/* A common subroutine for pop_cfun a

Re: [PATCH 14/22] aarch64: Add GCS support to the unwinder


2024-11-08 Thread Yury Khrustalev
Hi Richard,

On Thu, Oct 24, 2024 at 05:27:24PM +0100, Richard Sandiford wrote:
> Yury Khrustalev  writes:
> > From: Szabolcs Nagy 
> 
> Could you explain these testsuite changes in more detail?  It seems
> on the face of it that they're changing the tests to test something
> other than the original intention.

After reviewing, I agree that this change to the tests is not needed in
the context of the GCS work. I will remove this from the patch.

> 
> Having new tests alongside the same lines would be fine though.

Agree, but I think it's better to submit these test changes separately.

> > +/* On signal entry the OS places a token on the GCS that can be used to
> > +   verify the integrity of the GCS pointer on signal return.  It also
> > +   places the signal handler return address (the restorer that calls the
> > +   signal return syscall) on the GCS so the handler can return.
> > +   Because of this token, each stack frame visited during unwinding has
> > +   exactly one corresponding entry on the GCS, so the frame count is
> > +   the number of entries that will have to be popped at EH return time.
> > +
> > +   Note: This depends on the GCS signal ABI of the OS.
> > +
> > +   When unwinding across a stack frame for each frame the corresponding
> > +   entry is checked on the GCS against the computed return address from
> > +   the normal stack.  If they don't match then _URC_FATAL_PHASE2_ERROR
> > +   is returned.  This check is omitted if
> > +
> > +   1. GCS is disabled. Note: asynchronous GCS disable is supported here
> > +  if GCSPR and the GCS remains readable.
> > +   2. Non-catchable exception where exception_class == 0.  Note: the
> > +  pthread cancellation implementation in glibc sets exception_class
> > +  to 0 when the unwinder is used for cancellation cleanup handling,
> > +  so this allows the GCS to get out of sync during cancellation.
> > +  This weakens security but avoids an ABI break in glibc.
> > +   3. Zero return address which marks the outermost stack frame.
> 
> I suppose this is a question for the x86 implementation too, but:
> doesn't this weaken the checks somewhat?  I would imagine zero is the
> easiest value to force.  Would it be possible to add some extra sanity
> check that this really is the outermost frame?  E.g. IIUC, the entry
> above the outermost frame's would be a cap, or at least would not be
> a valid procedure return record, so could we check for that?

Zero might indeed be easier to forge, but in practice it would mean that
execution could not return anywhere, so one could speculate that this would
be hard to exploit in practice.

When we are in the middle of unwinding and hit a zero return address (as seen
by the unwinder), we could theoretically check whether the current GCS entry
is non-zero (which would mean that the stack has been corrupted and we need
to raise an error). However, if we really are at the outermost frame and the
GCS has been unwound properly, we would not be able to read from the shadow
stack, as we have already reached the top of its allocation. This is why we
need to check _Unwind_GetIP (context) == 0 before we access the GCS pointer
via __builtin_aarch64_gcspr ().
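To make that ordering constraint concrete, here is a simplified, self-contained model of the macro's logic. This is not the actual unwinder code and the values are hypothetical; the point is only that the ip == 0 test comes before any read of the GCS, so the outermost frame never dereferences a possibly unreadable shadow stack.

```cpp
#include <cstdint>

enum result { OK, FATAL };

// Simplified model of the _Unwind_Frames_Increment check.  'gcs' stands in
// for the shadow stack as seen through GCSPR; entry [frames] is compared
// against the return address computed from the normal stack.
result check_frame(const std::uint64_t *gcs, std::uint64_t ip,
                   bool signal_frame, bool gcs_enabled,
                   std::uint64_t exception_class, unsigned &frames)
{
  ++frames;
  // The ip == 0 case (outermost frame) bails out before 'gcs' is touched.
  if (!gcs_enabled || exception_class == 0 || ip == 0)
    return OK;
  if (signal_frame)
    // Signal frames carry an OS-specific token with the top bit set.
    return (gcs[frames] >> 63) != 0 ? OK : FATAL;
  // Ordinary frames must match the computed return address exactly.
  return gcs[frames] == ip ? OK : FATAL;
}
```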

So I will leave this as implemented by Szabolcs.

Kind regards,
Yury

> 
> Thanks,
> Richard
> 
> > +   4. Signal stack frame, the GCS entry is an OS specific token then
> > +  with the top bit set.
> > + */
> > +#undef _Unwind_Frames_Increment
> > +#define _Unwind_Frames_Increment(exc, context, frames) \
> > +  do   \
> > +{  \
> > +  frames++;\
> > +  if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) != 0 \
> > + || exc->exception_class == 0  \
> > + || _Unwind_GetIP (context) == 0)  \
> > +   break;  \
> > +  const _Unwind_Word *gcs = __builtin_aarch64_gcspr (); \
> > +  if (_Unwind_IsSignalFrame (context)) \
> > +   {   \
> > + if (gcs[frames] >> 63 == 0)   \
> > +   return _URC_FATAL_PHASE2_ERROR; \
> > +   }   \
> > +  else \
> > +   {   \
> > + if (gcs[frames] != _Unwind_GetIP (context))   \
> > +   return _URC_FATAL_PHASE2_ERROR; \
> > +   }   \
> >  }  \
> >while (0)


[committed] libstdc++: Simplify __detail::__distance_fw using 'if constexpr'

2024-11-08 Thread Jonathan Wakely
This uses 'if constexpr' instead of tag dispatching, removing the need
for a second call using that tag, and simplifying the overload set that
needs to be resolved for calls to __distance_fw.

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (__distance_fw): Replace tag
dispatching with 'if constexpr'.
---
Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/hashtable_policy.h | 24 
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h
index e5ad85ed9f1..ecf50313d09 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -62,25 +62,21 @@ namespace __detail
   typename _Unused, typename _Traits>
 struct _Hashtable_base;
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // if constexpr
   // Helper function: return distance(first, last) for forward
   // iterators, or 0/1 for input iterators.
-  template
-inline typename std::iterator_traits<_Iterator>::difference_type
-__distance_fw(_Iterator __first, _Iterator __last,
- std::input_iterator_tag)
-{ return __first != __last ? 1 : 0; }
-
-  template
-inline typename std::iterator_traits<_Iterator>::difference_type
-__distance_fw(_Iterator __first, _Iterator __last,
- std::forward_iterator_tag)
-{ return std::distance(__first, __last); }
-
   template
 inline typename std::iterator_traits<_Iterator>::difference_type
 __distance_fw(_Iterator __first, _Iterator __last)
-{ return __distance_fw(__first, __last,
-  std::__iterator_category(__first)); }
+{
+  using _Cat = typename std::iterator_traits<_Iterator>::iterator_category;
+  if constexpr (is_convertible<_Cat, forward_iterator_tag>::value)
+   return std::distance(__first, __last);
+  else
+   return __first != __last ? 1 : 0;
+}
+#pragma GCC diagnostic pop
 
   struct _Identity
   {
-- 
2.47.0



Re: [PATCH v2] testsuite: arm: Use effective-target for pr84556.cc test

2024-11-08 Thread Torbjorn SVENSSON




On 2024-11-08 12:57, Richard Earnshaw (lists) wrote:

On 08/11/2024 11:48, Torbjörn SVENSSON wrote:

Changes since v1:

- Clarified the commit message to include where the decision is taken
   and why it's a bad idea to use "dg-do run" in a test case.
   Note: This does not only fix it for arm-none-eabi. I see the same
   kind of construct used by, for example, sparc.

Sorry for the confusion Richard, I hope it's more clear why this is
needed now. :)

Ok for trunk and releases/gcc-14?

--

Using "dg-do run" with a selector overrides the default selector set by
vect.exp that picks between "dg-do run" and "dg-do compile" based on the
target's support for simd operations for Arm targets.
The actual selection of default operation is performed in
check_vect_support_and_set_flags.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-require-effective-target with the same
selector.

Signed-off-by: Torbjörn SVENSSON 


OK


Pushed as r15-5039-g85c3d944800 and r14.2.0-373-g8cf9b265704.

Kind regards,
Torbjörn



R.





[PATCH] tree-optimization/117502 - VMAT_STRIDED_SLP vs VMAT_ELEMENTWISE when considering gather

2024-11-08 Thread Richard Biener
The following treats both the same when considering whether to use gather or
scatter for single-element interleaving accesses.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/117502
* tree-vect-stmts.cc (get_group_load_store_type): Also consider
VMAT_STRIDED_SLP when checking to use gather/scatter for
single-element interleaving access.
---
 gcc/tree-vect-stmts.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 9a2c2ea753e..28bfd8f4e28 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2274,7 +2274,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
  on nearby locations.  Or, even if it's a win over scalar code,
  it might not be a win over vectorizing at a lower VF, if that
  allows us to use contiguous accesses.  */
-  if (*memory_access_type == VMAT_ELEMENTWISE
+  if ((*memory_access_type == VMAT_ELEMENTWISE
+   || *memory_access_type == VMAT_STRIDED_SLP)
   && single_element_p
   && loop_vinfo
   && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
-- 
2.43.0
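For reference, a hypothetical example of the access shape this patch is about: a single-element interleaved (strided) load, where only one element of each group is used. Depending on the target and grouping analysis, the vectorizer may classify such an access as VMAT_ELEMENTWISE or VMAT_STRIDED_SLP; with the patch, both classifications are considered for gather/scatter.

```cpp
#include <cstddef>

constexpr std::size_t STRIDE = 4;  // illustrative group width

// Only element 0 of each STRIDE-wide group of 'b' is touched, so the
// load is a candidate for a strided gather rather than contiguous
// vector loads.
void strided_copy(const int *b, int *a, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    a[i] = b[i * STRIDE];
}
```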


Re: [PATCH] testsuite: arm: Allow vst1.32 instruction in pr40457-2.c

2024-11-08 Thread Torbjorn SVENSSON




On 2024-11-08 12:02, Richard Earnshaw (lists) wrote:

On 07/11/2024 17:15, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

--

When building the test case with neon, the 'vst1.32' instruction is used
instead of 'strd'. Allow both variants to make the test pass.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr40457-2.c: Add vst1.32 as an allowed
instruction.

Signed-off-by: Torbjörn SVENSSON 


OK.


Pushed as r15-5040-g636b8aeacd1 and r14.2.0-374-g82191dec727.

Kind regards,
Torbjörn



R.




