date:20240912

[PATCH] Better recover from SLP reassociation fails during discovery

2024-09-12 Richard Biener

When we decide to not process an association chain of size two and
that would also mismatch with a different chain size on another lane,
we shouldn't fail discovery hard at this point.  Instead let the
regular discovery figure out matching lanes so the parent can
decide to perform operand swapping or we can split groups at better
points rather than forcefully splitting away the first single lane.

For example on gcc.dg/vect/vect-strided-u8-i8.c we now see two
groups of size 4 feeding the store instead of groups of size 1,
3, 2, 1 and 1.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

* tree-vect-slp.cc (vect_build_slp_tree_2): On reassociation
chain length mismatch do not fail discovery of the node
but try without re-associating to compute a better matches[].
Provide a reassociation failure hint in the dump.
(vect_slp_analyze_node_operations): Avoid stray failure
dumping.
(vectorizable_slp_permutation_1): Dump the address of the
SLP node representing the permutation.
---
 gcc/tree-vect-slp.cc | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 0fb17340bd3..2c296bc1926 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2143,19 +2143,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  if (chain.length () == 2)
{
  /* In a chain of just two elements resort to the regular
-operand swapping scheme.  If we run into a length
-mismatch still hard-FAIL.  */
- if (chain_len == 0)
-   hard_fail = false;
- else
-   {
- matches[lane] = false;
- /* ???  We might want to process the other lanes, but
-make sure to not give false matching hints to the
-caller for lanes we did not process.  */
- if (lane != group_size - 1)
-   matches[0] = false;
-   }
+operand swapping scheme.  Likewise if we run into a
+length mismatch process regularly as well as we did not
+process the other lanes we cannot report a good hint what
+lanes to try swapping in the parent.  */
+ hard_fail = false;
  break;
}
  else if (chain_len == 0)
@@ -2428,6 +2420,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  return node;
}
 out:
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"failed to line up SLP graph by re-associating "
+"operations in lanes%s\n",
+!hard_fail ? " trying regular discovery" : "");
   while (!children.is_empty ())
vect_free_slp_tree (children.pop ());
   while (!chains.is_empty ())
@@ -7553,7 +7550,9 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
   /* We're having difficulties scheduling nodes with just constant
  operands and no scalar stmts since we then cannot compute a stmt
  insertion place.  */
-  if (!seen_non_constant_child && SLP_TREE_SCALAR_STMTS (node).is_empty ())
+  if (res
+  && !seen_non_constant_child
+  && SLP_TREE_SCALAR_STMTS (node).is_empty ())
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
@@ -10279,7 +10278,7 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
   if (dump_p)
 {
   dump_printf_loc (MSG_NOTE, vect_location,
-  "vectorizing permutation");
+  "vectorizing permutation %p", (void *)node);
   for (unsigned i = 0; i < perm.length (); ++i)
dump_printf (MSG_NOTE, " op%u[%u]", perm[i].first, perm[i].second);
   if (repeating_p)
-- 
2.43.0

Re: [PATCH v1][GCC] aarch64: Add GCS build attributes support.

2024-09-12 Kyrylo Tkachov

Hi Srinath,
Not a full review, just some things that popped out to me.

> On 11 Sep 2024, at 17:50, Srinath Parvathaneni wrote:
> 
> This patch adds support for aarch64 gcs build attributes. This support
> includes generating two new assembler directives .aeabi_subsection and
> .aeabi_attribute. These directives are generated as per the syntax
> mentioned in spec "Build Attributes for the Arm® 64-bit
> Architecture (AArch64)" available at [1].
> 
> To check whether the assembler being used to build the toolchain
> supports these directives, a new gcc configure check is added in
> gcc/configure.ac.
> 
> If the assembler supports these directives, .aeabi_subsection and
> .aeabi_attribute directives are emitted in the generated assembly,
> when -mbranch-protection=gcs is passed.
> 
> If the assembler does not support these directives,
> .note.gnu.property section will emit the relevant gcs information
> in the generated assembly, when -mbranch-protection=gcs is passed.
> 
> This patch needs to be applied on top of GCC gcs patch series [2].
> 
> Bootstrapped on aarch64-none-linux-gnu and regression tested on
> aarch64-none-elf, no issues.
> 
> Ok for master?
> 
> Regards,
> Srinath.
> 
> [1]: https://github.com/ARM-software/abi-aa/pull/230
> [2]: https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/ARM/heads/gcs
> 
> gcc/ChangeLog:
> 
> 2024-09-11  Srinath Parvathaneni  
> 
>* config.in: Regenerated
>* config/aarch64/aarch64.cc (aarch64_emit_aeabi_attribute): New
>function declaration.
>(aarch64_emit_aeabi_subsection): Likewise.
>(aarch64_start_file): Emit gcs build attributes.
>(aarch64_file_end_indicate_exec_stack): Update gcs bit in
>note.gnu.property section.
>* configure: Regenerated.
>* configure.ac: Add gcc configure check.
> 
> gcc/testsuite/ChangeLog:
> 
> 2024-09-11  Srinath Parvathaneni  
> 
>* gcc.target/aarch64/build-attribute-gcs.c: New test.
> ---
> gcc/config.in |   6 +++
> gcc/config/aarch64/a.out  | Bin 0 -> 656 bytes

This binary artifact shouldn’t be in the patch.



> gcc/config/aarch64/aarch64.cc |  43 ++
> gcc/configure |  35 ++
> gcc/configure.ac  |   7 +++
> .../gcc.target/aarch64/build-attribute-gcs.c  |  24 ++
> 6 files changed, 115 insertions(+)
> create mode 100644 gcc/config/aarch64/a.out
> create mode 100644 gcc/testsuite/gcc.target/aarch64/build-attribute-gcs.c

diff --git a/gcc/testsuite/gcc.target/aarch64/build-attribute-gcs.c b/gcc/testsuite/gcc.target/aarch64/build-attribute-gcs.c
new file mode 100644
index 000..eb15772757e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/build-attribute-gcs.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target aarch64*-*-linux* } } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-options "-mbranch-protection=gcs" } */
+/* { dg-final { scan-assembler-not "\.aeabi_subsection \.aeabi-feature-and-bits, 1, 0" } } */
+/* { dg-final { scan-assembler-not "\.aeabi_attribute 3, 1\t\/\/ Tag_Feature_GCS" } } */
+/* { dg-final { scan-assembler ".note.gnu.property" } } */
+
+/* { dg-options "-mbranch-protection=bti" } */
+/* { dg-final { scan-assembler ".note.gnu.property" } } */
+
+
+/* { dg-options "-mbranch-protection=pac-ret" } */
+/* { dg-final { scan-assembler ".note.gnu.property" } } */
+
+
+/* { dg-options "-mbranch-protection=standard" } */
+/* { dg-final { scan-assembler-not "\.aeabi_subsection \.aeabi-feature-and-bits, 1, 0" } } */
+/* { dg-final { scan-assembler-not "\.aeabi_attribute 3, 1\t\/\/ Tag_Feature_GCS" } } */
+/* { dg-final { scan-assembler ".note.gnu.property" } } */


These scans should be in different tests compiled with different options.
You can’t have multiple dg-options directives in a single test and scanning for 
“.note.gnu.property” multiple times in a single test is redundant too.

Thanks,
Kyrill

Re: [PATCH] c++: Don't ICE to build private access error message [PR116323]

2024-09-12 Simon Martin

On 11 Sep 2024, at 20:57, Jason Merrill wrote:

> On 9/11/24 7:26 AM, Simon Martin wrote:
>> We currently ICE upon the following code while building the "[...] is
>> private within this context" error message
>>
>> === cut here ===
>> class A { enum Enum{}; };
>> template class Alloc>
>> class B : private Alloc, private A {};
>> template class Alloc>
>> int B::foo (Enum m) { return 42; }
>> === cut here ===
>>
>> The problem is that since r11-6880, after detecting that Enum cannot be
>> accessed in B, enforce_access will access the TYPE_BINFO of all the
>> bases of B, which ICEs for any that is a BOUND_TEMPLATE_TEMPLATE_PARM.
>> This patch simply skips such bases.
>>
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>>  PR c++/116323
>>
>> gcc/cp/ChangeLog:
>>
>>  * search.cc (get_parent_with_private_access): Only call access_in_type
>>  for RECORD_OR_UNION_TYPE_P base BINFOs.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/template/access43.C: New test.
>>
>> ---
>>   gcc/cp/search.cc |  4 +++-
>>   gcc/testsuite/g++.dg/template/access43.C | 11 +++
>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>   create mode 100644 gcc/testsuite/g++.dg/template/access43.C
>>
>> diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
>> index 60c30ecb881..a810cf70d6a 100644
>> --- a/gcc/cp/search.cc
>> +++ b/gcc/cp/search.cc
>> @@ -163,9 +163,11 @@ get_parent_with_private_access (tree decl, tree binfo)
>> /* Iterate through immediate parent classes.  */
>> for (int i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
>>   {
>> +  tree base_binfo_type = BINFO_TYPE (base_binfo);
>> /* This parent had private access.  Therefore that's why BINFO can't
>>access DECL.  */
>> -  if (access_in_type (BINFO_TYPE (base_binfo), decl) == ak_private)
>> +  if (RECORD_OR_UNION_TYPE_P (base_binfo_type)
>
> You might add to the comment to explain that in a template the base 
> list can also contain WILDCARD_TYPE_P types.  OK either way.
Thanks Jason. Pushed with the suggested extra comment as r15-3598.

Simon

[PATCH 1/2] c++: Make __builtin_launder reject invalid types [PR116673]

2024-09-12 Jonathan Wakely

Tested x86_64-linux. OK for trunk?

-- >8 --

The standard says that std::launder is ill-formed for function pointers
and cv void pointers, so there's no reason for __builtin_launder to
accept them. This change allows implementations of std::launder to defer
to the built-in for error checking, although libstdc++ will continue to
diagnose it directly for more user-friendly diagnostics.
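The rule above reduces to "the pointee must be an object type", which is what the object_type_p check in the diff below tests. As a rough illustration only, the same condition can be expressed with a standard trait. is_launderable_pointee is a hypothetical name found in neither GCC nor libstdc++, and the spot checks loosely mirror the new launder10.C test:

```cpp
#include <type_traits>

// Hypothetical trait (not part of GCC or libstdc++): a pointer to T may be
// laundered only if T is an object type.  std::is_object is false for
// function types and for (possibly cv-qualified) void, which are exactly
// the pointees that std::launder must reject.
template<typename T>
struct is_launderable_pointee
  : std::integral_constant<bool, std::is_object<T>::value>
{ };

// Compile-time spot checks, loosely mirroring launder10.C.
static_assert (is_launderable_pointee<int>::value, "int* is accepted");
static_assert (!is_launderable_pointee<void>::value, "void* is rejected");
static_assert (!is_launderable_pointee<const volatile void>::value,
               "cv void* is rejected");
static_assert (!is_launderable_pointee<void (void *)>::value,
               "function pointers are rejected");
```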

PR c++/116673

gcc/cp/ChangeLog:

* semantics.cc (finish_builtin_launder): Diagnose function
pointers and cv void pointers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/launder10.C: New test.
---
 gcc/cp/semantics.cc| 17 +
 gcc/testsuite/g++.dg/cpp1z/launder10.C | 15 +++
 2 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/launder10.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 63212afafb3..b194b01f865 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -13482,11 +13482,20 @@ finish_builtin_launder (location_t loc, tree arg, tsubst_flags_t complain)
 arg = decay_conversion (arg, complain);
   if (error_operand_p (arg))
 return error_mark_node;
-  if (!type_dependent_expression_p (arg)
-  && !TYPE_PTR_P (TREE_TYPE (arg)))
+  if (!type_dependent_expression_p (arg))
 {
-  error_at (loc, "non-pointer argument to %<__builtin_launder%>");
-  return error_mark_node;
+  tree type = TREE_TYPE (arg);
+  if (!TYPE_PTR_P (type))
+   {
+ error_at (loc, "non-pointer argument to %<__builtin_launder%>");
+ return error_mark_node;
+   }
+  else if (!object_type_p (TREE_TYPE (type)))
+   {
+ // std::launder is ill-formed for function and cv void pointers.
+ error_at (loc, "invalid argument to %<__builtin_launder%>");
+ return error_mark_node;
+   }
 }
   if (processing_template_decl)
 arg = orig_arg;
diff --git a/gcc/testsuite/g++.dg/cpp1z/launder10.C b/gcc/testsuite/g++.dg/cpp1z/launder10.C
new file mode 100644
index 000..7c15eeb891f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/launder10.C
@@ -0,0 +1,15 @@
+// PR c++/116673
+// { dg-do compile }
+
+void
+bar (void *p)
+{
+  __builtin_launder (bar); // { dg-error {invalid argument to '__builtin_launder'} }
+  __builtin_launder (p);   // { dg-error {invalid argument to '__builtin_launder'} }
+  const void* cp = p;
+  __builtin_launder (cp);  // { dg-error {invalid argument to '__builtin_launder'} }
+  volatile void* vp = p;
+  __builtin_launder (vp);  // { dg-error {invalid argument to '__builtin_launder'} }
+  const volatile void* cvp = p;
+  __builtin_launder (cvp); // { dg-error {invalid argument to '__builtin_launder'} }
+}
-- 
2.46.0

[PATCH 2/2] libstdc++: Simplify std::launder definition

2024-09-12 Jonathan Wakely

Tested x86_64-linux.

Both GCC and Clang support the __is_function built-in, so we should get
the static_assert here. For a compiler that doesn't support it, we rely
on __builtin_launder to diagnose function pointers. That's true for
Clang, and PATCH 1/2 makes it true for G++.

We might want to consider splitting a frequently-used subset of
 into a new  header that we can include in .
Nearly every header outside of libsupc++ includes , but
they don't all need all of it.

-- >8 --

A single static assert is a much simpler way to implement the
compile-time preconditions on std::launder than an overload set of
deleted functions and function templates. The only difficulty is that
 doesn't include  so we can't use std::is_function and
std::is_void for the checks. That can be worked around though, by using
the __is_same and __is_function built-ins. If the __is_function built-in
isn't supported then the __builtin_launder built-in will give an error
anyway, since the commit preceding this one.

We can also remove the redundant __cplusplus >= 201703L check around the
definitions of std::launder and the interference constants, which are
already guarded by the appropriate feature test macros.
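For comparison, here is a minimal sketch of the same single-static_assert structure, written for code that is allowed to include <type_traits> (which, as explained above, <new> cannot do). my_launder is a hypothetical name. This is not the libstdc++ definition, which uses the __is_same and __is_function built-ins instead:

```cpp
#include <new>
#include <type_traits>

// Hypothetical stand-in for std::launder, illustrating the if-constexpr /
// static_assert shape from the patch.  Only the branch selected for T is
// instantiated, so a bad argument triggers exactly one static_assert.
template<typename T>
[[nodiscard]] constexpr T *
my_launder (T *p) noexcept
{
  if constexpr (std::is_void<T>::value)        // also true for cv void
    static_assert (!std::is_void<T>::value,
                   "my_launder argument must not be a void pointer");
  else if constexpr (std::is_function<T>::value)
    static_assert (!std::is_function<T>::value,
                   "my_launder argument must not be a function pointer");
  else
    return std::launder (p);
  return nullptr;  // mirrors the patch: keeps every path returning a value
}
```

Each static_assert sits in an if constexpr branch whose condition depends on T, so it is only evaluated in the instantiation that selects that branch. For a void or function pointer the else branch is discarded, so the std::launder call is never instantiated, and the user sees just the one targeted message.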

libstdc++-v3/ChangeLog:

* libsupc++/new (launder): Add static_assert and remove deleted
overloads.
* testsuite/18_support/launder/requirements_neg.cc: Adjust
expected diagnostics.
---
 libstdc++-v3/libsupc++/new| 36 ---
 .../18_support/launder/requirements_neg.cc| 15 
 2 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 2e2038e1a82..af5c7690bb9 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -198,7 +198,6 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
 //@}
 } // extern "C++"
 
-#if __cplusplus >= 201703L
 namespace std
 {
 #ifdef __cpp_lib_launder // C++ >= 17 && HAVE_BUILTIN_LAUNDER
@@ -206,33 +205,28 @@ namespace std
   template
 [[nodiscard]] constexpr _Tp*
 launder(_Tp* __p) noexcept
-{ return __builtin_launder(__p); }
-
-  // The program is ill-formed if T is a function type or
-  // (possibly cv-qualified) void.
-
-  template
-void launder(_Ret (*)(_Args...) _GLIBCXX_NOEXCEPT_QUAL) = delete;
-  template
-void launder(_Ret (*)(_Args..) _GLIBCXX_NOEXCEPT_QUAL) = delete;
-
-  void launder(void*) = delete;
-  void launder(const void*) = delete;
-  void launder(volatile void*) = delete;
-  void launder(const volatile void*) = delete;
+{
+  if constexpr (__is_same(const volatile _Tp, const volatile void))
+   static_assert(!__is_same(const volatile _Tp, const volatile void),
+ "std::launder argument must not be a void pointer");
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function)
+  else if constexpr (__is_function(_Tp))
+   static_assert(!__is_function(_Tp),
+ "std::launder argument must not be a function pointer");
+#endif
+  else
+   return __builtin_launder(__p);
+  return nullptr;
+}
 #endif // __cpp_lib_launder
 
 #ifdef __cpp_lib_hardware_interference_size // C++ >= 17 && defined(gcc_dest_sz)
   inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
   inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
 #endif // __cpp_lib_hardware_interference_size
-}
-#endif // C++17
 
 // Emitted despite the FTM potentially being undefined.
-#if __cplusplus > 201703L
-namespace std
-{
+#if __cplusplus >= 202002L
   /// Tag type used to declare a class-specific operator delete that can
   /// invoke the destructor before deallocating the memory.
   struct destroying_delete_t
@@ -241,8 +235,8 @@ namespace std
   };
   /// Tag variable of type destroying_delete_t.
   inline constexpr destroying_delete_t destroying_delete{};
-}
 #endif // C++20
+}
 
 #pragma GCC visibility pop
 
diff --git a/libstdc++-v3/testsuite/18_support/launder/requirements_neg.cc b/libstdc++-v3/testsuite/18_support/launder/requirements_neg.cc
index 2808ebf614d..82ce0b35a8c 100644
--- a/libstdc++-v3/testsuite/18_support/launder/requirements_neg.cc
+++ b/libstdc++-v3/testsuite/18_support/launder/requirements_neg.cc
@@ -25,14 +25,17 @@ int f2(const char*, ...);
 void
 test01()
 {
-  std::launder( &f1 ); // { dg-error "deleted function" }
-  std::launder( &f2 ); // { dg-error "deleted function" }
+  std::launder( &f1 ); // { dg-error "here" }
+  std::launder( &f2 ); // { dg-error "here" }
   void* p = nullptr;
-  std::launder( p );  // { dg-error "deleted function" }
+  std::launder( p );   // { dg-error "here" }
   const void* cp = nullptr;
-  std::launder( cp );  // { dg-error "deleted function" }
+  std::launder( cp );  // { dg-error "here" }
   volatile void* vp = nullptr;
-  std::launder( vp );  // { dg-error "deleted function" }
+  std::launder( vp );  // { dg-error "here" }
   const vo

[PATCH] libcpp, v5: Add support for gnu::base64 #embed parameter

2024-09-12 Jakub Jelinek

On Wed, Sep 11, 2024 at 10:23:20PM +, Joseph Myers wrote:
> On Fri, 30 Aug 2024, Jakub Jelinek wrote:
> 
> > +should be no newlines in the string literal and because this parameter
> > +is meant namely for use by the preprocessor itself, there is no support
> > +for any escape sequences in the string literal argument.  If 
> > @code{gnu::base64}
> 
> Given the "no escape sequences" rule, I think there should be a test for 
> that - testing rejection of a string that would be valid if escape 
> sequences were processed (for example, valid base64 but with the 
> individual characters encoded using \x), but is not valid because they are 
> not processed.  As far as I can see, the existing tests with escape 
> sequences are invalid for other reasons (they use \n as the escape 
> sequence).

Thanks.

Here is an updated patch and before that just the incremental diff from
the previous patch.

--- gcc/testsuite/c-c++-common/cpp/embed-18.c   2024-08-30 18:43:18.132097274 +0200
+++ gcc/testsuite/c-c++-common/cpp/embed-18.c   2024-09-12 12:29:25.919231207 +0200
@@ -17,6 +17,12 @@
 #embed "." gnu::base64("") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
 #embed "." gnu::base64("a===") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
 #embed "." gnu::base64("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdCwg\nc2VkIGRvIGVpdXNtb2QgdGVtcG9yIGluY2lkaWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu\nYSBhbGlxdWEuCg==") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
+#embed "." gnu::base64("\x53\x41\x3d\x3d") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
+#embed "." gnu::base64("\123\101\075\075") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
+#embed "." gnu::base64("\u0053\u0041\u003d\u003d") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
+#embed "." gnu::base64("\u{53}\u{41}\u{3d}\u{3d}") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
+#embed "." gnu::base64("\U0053\U0041\U003d\U003d") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
+#embed "." gnu::base64("\N{LATIN CAPITAL LETTER S}\N{LATIN CAPITAL LETTER A}\N{LATIN CAPITAL LETTER A}\N{LATIN CAPITAL LETTER A}") /* { dg-error "'gnu::base64' argument not valid base64 encoded string" } */
 #embed "embed-18.c" gnu::base64("SA==") /* { dg-error "'gnu::base64' parameter can be only used with \\\".\\\"" } */
 #embed  gnu::base64("SA==") /* { dg-error "'gnu::base64' parameter can be only used with \\\".\\\"" } */
 #embed <.> gnu::base64("SA==") /* { dg-error "'gnu::base64' parameter can be only used with \\\".\\\"" } */

Tested on x86_64-linux.
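To make the point about escape sequences concrete: in ordinary C or C++ source, "\x53\x41\x3d\x3d" is just another spelling of "SA==", which is valid base64. Because gnu::base64 does not process escape sequences in its argument, however, the argument actually contains backslashes and must be rejected. The following hypothetical validating decoder treats its input the same way. base64_decode_sketch is not libcpp's base64_dec, and its exact acceptance rules are assumptions chosen for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Map one base64 alphabet character to its 6-bit value, or -1.
static int
b64_value (char c)
{
  if (c >= 'A' && c <= 'Z')
    return c - 'A';
  if (c >= 'a' && c <= 'z')
    return c - 'a' + 26;
  if (c >= '0' && c <= '9')
    return c - '0' + 52;
  if (c == '+')
    return 62;
  if (c == '/')
    return 63;
  return -1;
}

// Hypothetical decoder: input bytes are taken as-is, so a backslash is just
// a character outside the alphabet, never the start of an escape.  Requires
// a non-empty length that is a multiple of 4 and allows '=' padding only in
// the last two positions of the final group.  It does not check that the
// unused bits before the padding are zero.
bool
base64_decode_sketch (const std::string &in, std::vector<unsigned char> &out)
{
  out.clear ();
  if (in.empty () || in.size () % 4 != 0)
    return false;
  for (std::size_t i = 0; i < in.size (); i += 4)
    {
      bool last_group = i + 4 == in.size ();
      unsigned bits = 0;
      int pad = 0;
      for (int j = 0; j < 4; ++j)
        {
          char c = in[i + j];
          int v = 0;
          if (c == '=' && last_group && j >= 2)
            ++pad;                        // padding contributes zero bits
          else if (pad != 0)
            return false;                 // data after padding
          else if ((v = b64_value (c)) < 0)
            return false;                 // not in the base64 alphabet
          bits = (bits << 6) | static_cast<unsigned> (v);
        }
      out.push_back (static_cast<unsigned char> (bits >> 16));
      if (pad < 2)
        out.push_back (static_cast<unsigned char> (bits >> 8));
      if (pad < 1)
        out.push_back (static_cast<unsigned char> (bits));
    }
  return true;
}
```

With this sketch, "\\x53\\x41\\x3d\\x3d" (backslashes kept) is rejected, while "\x53\x41\x3d\x3d" (escapes already processed by the C++ compiler) decodes to "H". The added embed-18.c cases exercise the same distinction from the preprocessor side.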

2024-09-12  Jakub Jelinek  

libcpp/
* internal.h (struct cpp_embed_params): Add base64 member.
(_cpp_free_embed_params_tokens): Declare.
* directives.cc (DIRECTIVE_TABLE): Add IN_I flag to T_EMBED.
(save_token_for_embed, _cpp_free_embed_params_tokens): New functions.
(EMBED_PARAMS): Add gnu::base64 entry.
(_cpp_parse_embed_params): Parse gnu::base64 parameter.  If
-fpreprocessed without -fdirectives-only, require #embed to have
gnu::base64 parameter.  Diagnose conflict between gnu::base64 and
limit or gnu::offset parameters.
(do_embed): Use _cpp_free_embed_params_tokens.
* files.cc (finish_embed, base64_dec_fn): New functions.
(base64_dec): New array.
(B64D0, B64D1, B64D2, B64D3): Define.
(finish_base64_embed): New function.
(_cpp_stack_embed): Use finish_embed.  Handle params->base64
using finish_base64_embed.
* macro.cc (builtin_has_embed): Call _cpp_free_embed_params_tokens.
gcc/
* doc/cpp.texi (Binary Resource Inclusion): Document gnu::base64
parameter.
gcc/testsuite/
* c-c++-common/cpp/embed-17.c: New test.
* c-c++-common/cpp/embed-18.c: New test.
* c-c++-common/cpp/embed-19.c: New test.
* c-c++-common/cpp/embed-27.c: New test.
* gcc.dg/cpp/embed-6.c: New test.
* gcc.dg/cpp/embed-7.c: New test.

--- libcpp/internal.h.jj 2024-09-12 11:33:53.173949013 +0200
+++ libcpp/internal.h   2024-09-12 11:45:39.455488076 +0200
@@ -638,7 +638,7 @@ struct cpp_embed_params
   location_t loc;
   bool has_embed;
   cpp_num_part limit, offset;
-  cpp_embed_params_tokens prefix, suffix, if_empty;
+  cpp_embed_params_tokens prefix, suffix, if_empty, base64;
 };
 
 /* Character classes.  Based on the more primitive macros in safe-ctype.h.
@@ -812,6 +812,7 @@ extern void _cpp_restore_pragma_names (c
 extern int _cpp_do__Pragma (cpp_reader *, location_t);
 extern void _cpp_init_directives (cpp_reader *);
 extern void _cpp_init_internal_pragmas (cpp_reader *);
+extern void _cpp_free_embed_params_tokens (cpp_embed_params_tokens *);
 extern bool _cpp_parse_embed_params

Re: [PATCH] JSON dumping for GENERIC trees

2024-09-12 David Malcolm

On Wed, 2024-09-11 at 20:49 -0500, tcpreimesber...@gmail.com wrote:
> From: Thor C Preimesberger 
> 
> This patch allows the compiler to dump GENERIC trees as JSON objects.
> 
> The dump flag -fdump-tree-original-json dumps each fndecl node in the
> C frontend's gimplifier as a JSON object and traverses related nodes 
> in a manner analogous to raw dumping.

Thanks for posting this patch.

Are you able to upload somewhere some examples of what the dumps look
like?

Some high level thoughts:

* the patch uses "dummy" throughout as a variable name.  To me the name
"dummy" suggests something unimportant that we had to give a name to,
or something that we'd prefer didn't exist but had to create.  However
in most (all?) cases "dummy" seems to refer to the json object being
created or having properties added to it, and thus the most interesting
thing in the function.  I suspect that renaming "dummy" to "js_obj" or
"json_obj" throughout would be an improvement in readability in terms
of capturing the intent of the code (assuming that all of them are
indeed json objects).

* I think the code is leaking memory for all of the json values created
- there are lots of uses of "naked new" in this code, but I don't see
any uses of "delete".  For example, in 

> +void
> +dump_node_json (const_tree t, dump_flags_t flags, FILE *stream)
> +{
> +  struct dump_info di;
> +  dump_queue_p dq;
> +  dump_queue_p next_dq;
> +  pretty_printer pp;
> +  /* Initialize the dump-information structure.  */
> +  di.stream = stream;
> +  di.index = 0;
> +  di.column = 0;
> +  di.queue = 0;
> +  di.queue_end = 0;
> +  di.free_list = 0;
> +  di.flags = flags;
> +  di.node = t;
> +  di.nodes = splay_tree_new (splay_tree_compare_pointers, 0,
> +  splay_tree_delete_pointers);
> +  di.json_dump = new json::array ();
    ^^
 allocated with naked new here

> +  /* Queue up the first node.  */
> +  queue (&di, t);
> +
> +  /* Until the queue is empty, keep dumping nodes.  */
> +  while (di.queue)
> +dequeue_and_dump (&di);
> +
> +  di.json_dump->dump(stream, true);
> +  fputs("\n", stream);
> +  /* Now, clean up.  */
> +  for (dq = di.free_list; dq; dq = next_dq)
> +{
> +  next_dq = dq->next;
> +  free (dq);
> +}
> +  splay_tree_delete (di.nodes);

and di.json_dump goes out of scope here and is leaked, I think.  So I
*think* all of the json values being created during dumping are being
leaked.

> +}

Similarly, in:

> +DEBUG_FUNCTION void
> +debug_tree_json (tree t)
> +{
> +  json::object* _x = node_emit_json(t);
> +  _x->dump(stderr, true);
> +  fprintf(stderr, "\n");
> +}

if I'm reading things right, node_emit_json doesn't "emit" json so much
as create a new json::object on the heap via "new", and when "_x" goes
out of scope, it's leaked.

The pattern in the code seems to be that node_emit_json creates a new
json::object and populates it with properties (sometimes recursively).

Given that, and that we can use C++11, I recommend using
std::unique_ptr for it, to capture the intent that this
is a heap-allocated pointer with responsibility for being "delete"-d at
some point.

That way, rather than:

  json::object* 
  node_emit_json(tree t)
  {
tree op0, op1, type;
enum tree_code code;
expanded_location xloc;
json::object *dummy;
json::array* holder;
char address_buffer[sizeof(&t)] = {"\0"};

dummy = new json::object ();
holder = new json::array ();

[...snip...]

return dummy;
  }

we could have (perhaps renaming to "node_to_json"):

  std::unique_ptr
  node_to_json(tree t)
  {
tree op0, op1, type;
enum tree_code code;
expanded_location xloc;
char address_buffer[sizeof(&t)] = {"\0"};

auto js_obj = ::make_unique (); // to implicitly use std::unique_ptr
auto holder = ::make_unique ();  // likewise std::unique_ptr

[...snip...]

return js_obj;
  }

...assuming that I'm correctly understanding the ownership of the json
values in the patch.  Note that we have to use ::make_unique from our
make-unique.h, rather than std::make_unique from  since the
latter was only added in C++14.
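Here is a minimal, self-contained sketch of the ownership pattern suggested above. It uses hypothetical types rather than GCC's json:: classes. make_unique_compat plays the role of ::make_unique from make-unique.h (its body here is an assumption, not a copy of that header). node_to_json_sketch returns a std::unique_ptr, so the caller visibly takes ownership and the whole tree is freed when the root goes out of scope:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// C++11-compatible stand-in for std::make_unique (added in C++14).
template<typename T, typename... Args>
std::unique_ptr<T>
make_unique_compat (Args &&... args)
{
  return std::unique_ptr<T> (new T (std::forward<Args> (args)...));
}

// Hypothetical JSON-like node: a name plus owned children.  Destroying
// the root destroys every child, so nothing is leaked.
struct js_node
{
  std::string name;
  std::vector<std::unique_ptr<js_node>> children;

  explicit js_node (std::string n) : name (std::move (n)) {}

  void append (std::unique_ptr<js_node> child)
  {
    children.push_back (std::move (child));  // ownership moves to the parent
  }
};

// Hypothetical analogue of node_to_json: the return type documents that
// the caller receives ownership of a heap-allocated object.
std::unique_ptr<js_node>
node_to_json_sketch (const std::string &name, int n_children)
{
  auto js_obj = make_unique_compat<js_node> (name);
  for (int i = 0; i < n_children; ++i)
    js_obj->append (make_unique_compat<js_node> (name + "." + std::to_string (i)));
  return js_obj;  // implicitly moved out; no explicit delete anywhere
}
```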

Many of our data structures don't properly handle objects with
destructors, and I suspect splay_tree is one of these.  You can use
js_obj.release () to transfer ownership to such data structures, and
will (probably) need to manually use "delete" on the pointers in the
right places.
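When such an object has to be stored in a container that does not run C++ destructors, the hand-off could look like the following sketch. raw_slot_table and payload are hypothetical stand-ins (for something like splay_tree and a json value), not real GCC APIs:

```cpp
#include <memory>
#include <vector>

// Hypothetical container that stores untyped raw pointers and never runs
// destructors on them, standing in for C-style structures.
struct raw_slot_table
{
  std::vector<void *> slots;
};

struct payload
{
  int value;
};

// Transfer ownership into the table: after release () the unique_ptr is
// empty and the table holds the only pointer to the object.
void
table_adopt (raw_slot_table &t, std::unique_ptr<payload> p)
{
  t.slots.push_back (p.release ());
}

// The table does not own its entries in the C++ sense, so each adopted
// pointer must be deleted by hand exactly once.
void
table_destroy (raw_slot_table &t)
{
  for (void *s : t.slots)
    delete static_cast<payload *> (s);
  t.slots.clear ();
}
```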

What happens to "holder" in that function?  It seems to get populated
with json objects for the various tree nodes found recursively, but
then it seems to simply be leaked (or populated then auto-deleted, if
we use std::unique_ptr>.  Or am I missing something?

In case it's helpful, a couple of months ago I converted the SARIF
output code from using "naked" json pointers to using std::unique_ptr
in:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658204.html
and I found it helped a *lot* with documenting ownership and avoiding
leaks.

[PATCH] Abort loop SLP analysis quicker

2024-09-12 Richard Biener

As we can't cope with removed SLP instances during analysis there's
no point in doing that or even continuing analysis of SLP instances
after a failure.  The following makes us abort early.
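The new control flow can be modelled, very loosely, as shown below. The sketch assumes an SLP instance can be reduced to a pass/fail flag. instance_model and analyze_instances_sketch are hypothetical, and they leave out everything else vect_slp_analyze_operations does (costing, the visited set, dumping):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of an SLP instance: only whether analysis succeeds.
struct instance_model
{
  int id;
  bool supported;
};

// Loop vectorization: the first unsupported instance fails the whole
// analysis and the instance list is left untouched.
// BB vectorization: unsupported instances are removed in order and the
// remaining ones are still analyzed.
bool
analyze_instances_sketch (std::vector<instance_model> &insts, bool loop_mode)
{
  for (std::size_t i = 0; i < insts.size (); )
    {
      if (insts[i].supported)
        {
          ++i;
          continue;
        }
      if (loop_mode)
        return false;                     // abort at the first failure
      insts.erase (insts.begin () + i);   // like ordered_remove (i)
    }
  return true;
}
```

Failing fast in the loop case avoids invalidating the SLP kind detection and possibly the VF, which is the reason given in the new comment in vect_analyze_loop_2.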

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_slp_analyze_operations): When
doing loop analysis fail after the first failed SLP
instance.  Only remove instances when doing BB vectorization.
* tree-vect-loop.cc (vect_analyze_loop_2): Check whether
vect_slp_analyze_operations failed instead of checking
the number of SLP instances remaining.
---
 gcc/tree-vect-loop.cc | 10 --
 gcc/tree-vect-slp.cc  | 10 +-
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 65d7ed51067..cc15492f6a0 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2947,12 +2947,10 @@ start_over:
 
   if (slp)
 {
-  /* Analyze operations in the SLP instances.  Note this may
-remove unsupported SLP instances which makes the above
-SLP kind detection invalid.  */
-  unsigned old_size = LOOP_VINFO_SLP_INSTANCES (loop_vinfo).length ();
-  vect_slp_analyze_operations (loop_vinfo);
-  if (LOOP_VINFO_SLP_INSTANCES (loop_vinfo).length () != old_size)
+  /* Analyze operations in the SLP instances.  We can't simply
+remove unsupported SLP instances as this makes the above
+SLP kind detection invalid and might also affect the VF.  */
+  if (! vect_slp_analyze_operations (loop_vinfo))
{
  ok = opt_result::failure_at (vect_location,
   "unsupported SLP instances\n");
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 975949ccbd1..4fcb9e2fa2b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7976,19 +7976,27 @@ vect_slp_analyze_operations (vec_info *vinfo)
  || (SLP_INSTANCE_KIND (instance) == slp_inst_kind_bb_reduc
  && !vectorizable_bb_reduc_epilogue (instance, &cost_vec)))
 {
+ cost_vec.release ();
  slp_tree node = SLP_INSTANCE_TREE (instance);
  stmt_vec_info stmt_info;
  if (!SLP_INSTANCE_ROOT_STMTS (instance).is_empty ())
stmt_info = SLP_INSTANCE_ROOT_STMTS (instance)[0];
  else
stmt_info = SLP_TREE_SCALAR_STMTS (node)[0];
+ if (is_a  (vinfo))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"unsupported SLP instance starting from: %G",
+stmt_info->stmt);
+ return false;
+   }
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "removing SLP instance operations starting from: %G",
 stmt_info->stmt);
  vect_free_slp_instance (instance);
   vinfo->slp_instances.ordered_remove (i);
- cost_vec.release ();
  while (!visited_vec.is_empty ())
visited.remove (visited_vec.pop ());
}
-- 
2.43.0

[PATCH v2] c++: Don't crash when mangling member with anonymous union or template types [PR100632, PR109790]

2024-09-12 Simon Martin

Hi,

While looking at more open PRs, I have discovered that the problem 
reported in PR109790 is very similar to that in PR100632, so I’m 
combining both in a single patch attached here. The fix is similar to
the one I initially submitted, only more general and I believe better.

Successfully tested on x86_64-pc-linux-gnu. OK for trunk?

Thanks, Simon

On 10 Sep 2024, at 20:06, Simon Martin wrote:

> We currently crash upon the following valid code (the case from the
> PR, invalid, can be made valid by simply adding a definition for f at
> line 2)
>
> === cut here ===
> struct B { const int *p; };
> template void f() {}
> struct Nested { union { int k; }; } nested;
> template void f();
> === cut here ===
>
> The problem is that because of the anonymous union, nested.k is
> represented as nested.$(decl_of_anon_union).k, and we run into an
> assert in write_member_name just before calling
> write_unqualified_name, because DECL_NAME ($decl_of_anon_union) is 0.
>
> This patch fixes this by relaxing the assert to also accept members
> with an ANON_AGGR_TYPE_P type, that are handled by
> write_unqualified_name just fine.
>
> Successfully tested on x86_64-pc-linux-gnu.
>
>   PR c++/100632
>
> gcc/cp/ChangeLog:
>
>   * mangle.cc (write_member_name): Relax assert to accept anonymous
>   unions.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/cpp2a/nontype-class67.C: New test.
>
> ---
>  gcc/cp/mangle.cc | 3 ++-
>  gcc/testsuite/g++.dg/cpp2a/nontype-class67.C | 9 +
>  2 files changed, 11 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class67.C
>
> diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
> index 46dc6923add..11dc66c8d16 100644
> --- a/gcc/cp/mangle.cc
> +++ b/gcc/cp/mangle.cc
> @@ -3255,7 +3255,8 @@ write_member_name (tree member)
>  }
>else if (DECL_P (member))
>  {
> -  gcc_assert (!DECL_OVERLOADED_OPERATOR_P (member));
> +  gcc_assert (ANON_AGGR_TYPE_P (TREE_TYPE (member))
> +   || !DECL_OVERLOADED_OPERATOR_P (member));
>write_unqualified_name (member);
>  }
>else if (TREE_CODE (member) == TEMPLATE_ID_EXPR)
> diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class67.C 
> b/gcc/testsuite/g++.dg/cpp2a/nontype-class67.C
> new file mode 100644
> index 000..accf4284883
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class67.C
> @@ -0,0 +1,9 @@
> +// PR c++/100632
> +// { dg-do compile { target c++20 } }
> +
> +struct B { const int* p; };
> +template void f() {}
> +
> +struct Nested { union { int k; }; } nested;
> +
> +template void f();
> -- 
> 2.44.0
From 3ce65d06310e694bd6a3918d87049523951c0762 Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Mon, 9 Sep 2024 09:31:10 +0200
Subject: [PATCH] c++: Don't crash when mangling member with anonymous union or template type [PR100632, PR109790]

We currently crash upon mangling members that have an anonymous union
or a template type.

The problem is that before calling write_unqualified_name,
write_member_name has an assert that assumes that it has an
IDENTIFIER_NODE in its hand. However, that assumption is incorrect: it
has an anonymous union in PR100632 and a template in PR109790.

This patch fixes this by relaxing the assert to accept members that are
not identifiers, which write_unqualified_name handles just fine.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/109790
PR c++/100632

gcc/cp/ChangeLog:

* mangle.cc (write_member_name): Relax assert to accept members
that are not identifiers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype83.C: New test.
* g++.dg/cpp1y/lambda-ice3.C: New test.
* g++.dg/cpp2a/nontype-class67.C: New test.

---
 gcc/cp/mangle.cc |  3 ++-
 gcc/testsuite/g++.dg/cpp0x/decltype83.C  | 13 +
 gcc/testsuite/g++.dg/cpp1y/lambda-ice3.C | 12 
 gcc/testsuite/g++.dg/cpp2a/nontype-class67.C |  9 +
 4 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype83.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-ice3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class67.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 46dc6923add..a63ae9f7ac6 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -3255,7 +3255,8 @@ write_member_name (tree member)
 }
   else if (DECL_P (member))
 {
-  gcc_assert (!DECL_OVERLOADED_OPERATOR_P (member));
+  gcc_assert (!identifier_p (member)
+ || !DECL_OVERLOADED_OPERATOR_P (member));
   write_unqualified_name (member);
 }
   else if (TREE_CODE (member) == TEMPLATE_ID_EXPR)
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype83.C b/gcc/testsuite/g++.dg/cpp0x/decltype83.C
new file mode 100644
index 000..db104f333aa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype83.C
@@ -0,0 +1,13 @@
+// PR c++/109790
+// { dg-do

Re: [PATCH] JSON dumping for GENERIC trees

2024-09-12 Thread Richard Biener

On Thu, Sep 12, 2024 at 4:04 AM Andrew Pinski  wrote:
>
> On Wed, Sep 11, 2024 at 6:51 PM  wrote:
> >
> > From: Thor C Preimesberger 
> >
> > This patch allows the compiler to dump GENERIC trees as JSON objects.
> >
> > The dump flag -fdump-tree-original-json dumps each fndecl node in the
> > C frontend's gimplifier as a JSON object and traverses related nodes
> > in a manner analogous to raw dumping.
> >
> > Some JSON parsers expect for there to be a single JSON value per file -
> > the following shell command makes the output conformant:
> >
> >   tr -d '\n ' < out.json | sed -e 's/\]\[/,/g' | sed -e 's/}{/},{/g'
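For anyone reproducing this normalization without a shell, the sketch below mirrors the quoted `tr`/`sed` pipeline in C. It is an illustration, not code from the patch, and `join_json_values` is a hypothetical name. Like `tr -d '\n '`, it deletes every space, including those inside string values, so a key such as "tree code" comes out as "treecode".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the patch) mirroring
     tr -d '\n ' | sed -e 's/\]\[/,/g' | sed -e 's/}{/},{/g'
   Returns a newly allocated string, or NULL if allocation fails.  */
static char *
join_json_values (const char *in)
{
  size_t len = strlen (in);
  /* Each "}{" (2 bytes) grows to "},{" (3 bytes), so 1.5 * LEN + 1
     bytes are always enough.  */
  char *out = malloc (len + len / 2 + 1);
  char *tmp = malloc (len + 1);
  if (out == NULL || tmp == NULL)
    {
      free (out);
      free (tmp);
      return NULL;
    }

  /* tr -d '\n ': drop every newline and space, even inside strings.  */
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if (in[i] != '\n' && in[i] != ' ')
      tmp[n++] = in[i];
  tmp[n] = '\0';

  /* The two sed substitutions.  Their patterns share no characters and
     the replacement "," matches neither, so one left-to-right scan gives
     the same result as running them one after the other.  */
  size_t o = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (tmp[i] == ']' && tmp[i + 1] == '[')
        {
          out[o++] = ',';
          i++;  /* Skip the '['.  */
        }
      else if (tmp[i] == '}' && tmp[i + 1] == '{')
        {
          out[o++] = '}';
          out[o++] = ',';
          out[o++] = '{';
          i++;  /* Skip the '{'.  */
        }
      else
        out[o++] = tmp[i];
    }
  out[o] = '\0';
  free (tmp);
  return out;
}
```

Two per-function dumps such as `[{"a": 1}]` and `[{"b": 2}]` on separate lines become the single array `[{"a":1},{"b":2}]`.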
> >
> > There is also a debug function that simply prints a node as formatted
> > JSON to stdout.
> >
> > The information in the dumped JSON is meant to be an amalgamation of
> > tree-pretty-print.cc's dump_generic_node and print-tree.cc's debug_tree.
>
> I don't think this is a good idea and there is no obvious use case.
> GIMPLE yes but not GENERIC.

The project idea was mine, inspired by
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646295.html
which explains the use-case.

I don't think it's constructive to question a GSoC project when it's almost
finished.

Richard.

> Can you explain what the use case is for dumping generic as json. Also
> you only hooked up the C and C++ family set of front-ends. Why not
> hook up Fortran, Ada, Rust and go too? Why have it done in the
> gimplifier?
>
> Thanks,
> Andrew
>
> >
> > Bootstrapped and tested on x86_64-pc-linux-gnu without issue.
> >
> > ChangeLog:
> > * gcc/Makefile.in: Link tree-emit-json.o to c-gimplify.o
> > * gcc/c-family/c-gimplify.cc (c_genericize): Hook for
> > -fdump-tree-original-json
> > * gcc/dumpfile.cc: Include tree-emit-json.h to expose
> > node_emit_json and debug_tree_json. Also new headers needed for
> > json.h being implicitly exposed
> > * gcc/dumpfile.h (dump_flag): New dump flag TDF_JSON
> > * gcc/tree-emit-json.cc: Logic for converting a tree to JSON
> >   and dumping.
> > * gcc/tree-emit-json.h: Ditto
>
> A few comments about the changelog entry here.
> It should be something like:
> gcc/ChangeLog:
>  * Makefile.in: ...
>
> gcc/c-family/ChangeLog:
>   * c-gimplify.cc ...
>
> Also there is no testcase or indication of how you tested it.
>
>
> >
> > Signed-off-by: Thor C Preimesberger 
> >
> > ---
> >  gcc/Makefile.in|2 +
> >  gcc/c-family/c-gimplify.cc |   30 +-
> >  gcc/cp/dump.cc |1 +
> >  gcc/dumpfile.cc|3 +
> >  gcc/dumpfile.h |6 +
> >  gcc/tree-emit-json.cc  | 3155 
> >  gcc/tree-emit-json.h   |   82 +
> >  7 files changed, 3268 insertions(+), 11 deletions(-)
> >  create mode 100644 gcc/tree-emit-json.cc
> >  create mode 100644 gcc/tree-emit-json.h
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 68fda1a7591..b65cc7f0ad5 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1042,6 +1042,7 @@ OPTS_H = $(INPUT_H) $(VEC_H) opts.h $(OBSTACK_H)
> >  SYMTAB_H = $(srcdir)/../libcpp/include/symtab.h $(OBSTACK_H)
> >  CPP_INTERNAL_H = $(srcdir)/../libcpp/internal.h
> >  TREE_DUMP_H = tree-dump.h $(SPLAY_TREE_H) $(DUMPFILE_H)
> > +TREE_EMIT_JSON_H = tree-emit-json.h $(SPLAY_TREE_H) $(DUMPFILE_H) json.h
> >  TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H)
> >  TREE_SSA_H = tree-ssa.h tree-ssa-operands.h \
> > $(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \
> > @@ -1709,6 +1710,7 @@ OBJS = \
> > tree-diagnostic.o \
> > tree-diagnostic-client-data-hooks.o \
> > tree-dump.o \
> > +   tree-emit-json.o \
> > tree-eh.o \
> > tree-emutls.o \
> > tree-if-conv.o \
> > diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> > index 3e29766e092..8b0c80f4f75 100644
> > --- a/gcc/c-family/c-gimplify.cc
> > +++ b/gcc/c-family/c-gimplify.cc
> > @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
> >  .  */
> >
> >  #include "config.h"
> > +#define INCLUDE_MEMORY
> >  #include "system.h"
> >  #include "coretypes.h"
> >  #include "tm.h"
> > @@ -43,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "context.h"
> >  #include "tree-pass.h"
> >  #include "internal-fn.h"
> > +#include "tree-emit-json.h"
> >
> >  /*  The gimplification pass converts the language-dependent trees
> >  (ld-trees) emitted by the parser into language-independent trees
> > @@ -629,20 +631,26 @@ c_genericize (tree fndecl)
> >local_dump_flags = dfi->pflags;
> >if (dump_orig)
> >  {
> > -  fprintf (dump_orig, "\n;; Function %s",
> > -  lang_hooks.decl_printable_name (fndecl, 2));
> > -  fprintf (dump_orig, " (%s)\n",
> > -  (!DECL_ASSEMBLER_NAME_SET_P (fndecl) ? "null"
> > -   : IDENTIFIER_POINTER (DECL_ASSEMBLE

[r15-3596 Regression] FAIL: gcc.target/i386/part-vect-vec_cmpbf.c (test for excess errors) on Linux/x86_64

2024-09-12 Thread haochen.jiang

On Linux/x86_64,

89d50c45048e5d7230ddde9afc8fbc83143e34cb is the first bad commit
commit 89d50c45048e5d7230ddde9afc8fbc83143e34cb
Author: Levy Hsu 
Date:   Wed Sep 4 16:34:04 2024 +0930

i386: Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

caused

FAIL: gcc.target/i386/part-vect-vec_cmpbf.c (test for excess errors)

with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3596/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check \
RUNTESTFLAGS="i386.exp=gcc.target/i386/part-vect-vec_cmpbf.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there is no potential problems with AVX512.)

Re: [PATCH] JSON dumping for GENERIC trees

2024-09-12 Thread Richard Biener

On Thu, Sep 12, 2024 at 12:51 PM David Malcolm  wrote:
>
> On Wed, 2024-09-11 at 20:49 -0500, tcpreimesber...@gmail.com wrote:
> > From: Thor C Preimesberger 
> >
> > This patch allows the compiler to dump GENERIC trees as JSON objects.
> >
> > The dump flag -fdump-tree-original-json dumps each fndecl node in the
> > C frontend's gimplifier as a JSON object and traverses related nodes
> > in an analagous manner as to raw-dumping.
>
> Thanks for posting this patch.
>
> Are you able to upload somewhere some examples of what the dumps look
> like?

I found https://renhongl.github.io/json-editor/, which seems to accept
the output of -fdump-tree-original-json and visualizes the raw JSON
structure when the input is from a single function.  With two functions,
as in

struct S { int i; int j; } s;
int bar ()
{
  return s.i + s.j;
}
int main()
{
  return bar ();
}

it no longer recognizes the output.  I would guess we'd need to produce an
outer "file level" JSON node.  Simply wrapping the file in [{ ... }] didn't
work, even with a comma separating the two functions.

The JSON for bar looks like (sorry for the long paste)

[{"addr": "0x7f8256bda3c0",
  "expr_loc": [{"file": "t.c", "line": 3, "column": 1}],
  "start_loc": [{"file": "t.c", "line": 3, "column": 1}],
  "finish_loc": [{"file": "t.c", "line": 3, "column": 1}],
  "tree code": "bind_expr",
  "bind_expr_body": {"addr": "0x7f8256bab860",
    "expr_loc": [{"file": "t.c", "line": 4, "column": 14}],
    "start_loc": [{"file": "t.c", "line": 4, "column": 10}],
    "finish_loc": [{"file": "t.c", "line": 4, "column": 18}],
    "tree code": "return_expr",
    "return_expr": {"addr": "0x7f8256bb1b40",
      "expr_loc": [{"file": "t.c", "line": 4, "column": 14}],
      "start_loc": [{"file": "t.c", "line": 4, "column": 10}],
      "finish_loc": [{"file": "t.c", "line": 4, "column": 18}],
      "tree code": "plus_expr",
      "bin_operator": "+",
      "operands": [{"addr": "0x7f8256bda360",
          "expr_loc": [{"file": "t.c", "line": 4, "column": 11}],
          "start_loc": [{"file": "t.c", "line": 4, "column": 10}],
          "finish_loc": [{"file": "t.c", "line": 4, "column": 12}],
          "tree code": "component_ref",
          "expr": {"addr": "0x7f8256a10c60",
            "decl_loc": [{"file": "t.c", "line": 1, "column": 28}],
            "tree code": "var_decl",
            "used": true,
            "public": true,
            " static": true,
            "read": true,
            "mode": "DI",
            "defer-output": true,
            "id_to_locale": "s",
            "id_point": "s"},
          "field": {"addr": "0x7f8256a30688",
            "decl_loc": [{"file": "t.c", "line": 1, "column": 16}],
            "tree code": "field_decl",
            "mode": "SI",
            "id_to_locale": "i",
            "id_point": "i"}},
         {"addr": "0x7f8256bda390",
          "expr_loc": [{"file": "t.c", "line": 4, "colum

Re: [PATCH] aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for SVE instructions

2024-09-12 Thread Richard Sandiford

Soumya AR  writes:
> On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
> instructions like SHL have a throughput of 2. We can lean on that to emit code
> like:
>  add  z31.b, z31.b, z31.b
> instead of:
>  lsl  z31.b, z31.b, #1
>
> The implementation of this change for SVE vectors is similar to a prior patch
> that adds the above functionality for Neon vectors.
>
> Here, the machine description pattern is split up to separately accommodate
> left and right shifts, so we can specifically emit an add for all left shifts
> by 1.

Thanks for doing this.
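Why the substitution is safe: per lane, LSL #1 and an ADD of a register to itself yield identical bits, wraparound included, for signed and unsigned element types alike. The exhaustive C check below illustrates this for 8-bit and 16-bit lanes. It is not part of the patch or the review; it models lane arithmetic with masked unsigned integers (an assumption standing in for SVE semantics), and `lsl1_matches_add` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check (not from the patch): for every value of a BITS-wide
   lane, a left shift by one and an add of the lane to itself produce the
   same bits modulo 2^BITS.  Signed and unsigned lanes share these bit
   patterns, so one unsigned check covers both.  */
static bool
lsl1_matches_add (unsigned bits)
{
  assert (bits >= 1 && bits <= 16);  /* Keep the exhaustive loop small.  */
  const uint32_t mask = (UINT32_C (1) << bits) - 1;
  for (uint32_t v = 0; v <= mask; v++)
    if (((v << 1) & mask) != ((v + v) & mask))
      return false;
  return true;
}
```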

> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Soumya AR 
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve.md (*post_ra_v3): Split pattern to
>   accommodate left and right shifts separately.
>   (*post_ra_v_ashl3): Matches left shifts with additional constraint to
>   check for shifts by 1.
>   (*post_ra_v_3): Matches right shifts.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of lsl-1
>   with corresponding add.
>   * gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise. 
>   * gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
>   * gcc.target/aarch64/sve/adr_1.c: Likewise.
>   * gcc.target/aarch64/sve/adr_6.c: Likewise.
>   * gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
>   * gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
>   * gcc.target/aarch64/sve/shift_2.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
>   * gcc.target/aarch64/sve/sve_shl_add.c: New test.
>
> From 94e9cbee44d42c60e94fe89e6ce57526206c13aa Mon Sep 17 00:00:00 2001
> From: Soumya AR 
> Date: Tue, 10 Sep 2024 14:18:44 +0530
> Subject: [PATCH] aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for SVE
>  instructions.

Re: [PATCH] aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for SVE instructions

2024-09-12 Thread Richard Biener

On Thu, Sep 12, 2024 at 2:35 PM Richard Sandiford
 wrote:
>
> Soumya AR  writes:
> > On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
> > instructions like SHL have a throughput of 2. We can lean on that to emit
> > code like:
> >  add  z31.b, z31.b, z31.b
> > instead of:
> >  lsl  z31.b, z31.b, #1
> >
> > The implementation of this change for SVE vectors is similar to a prior
> > patch that adds the above functionality for Neon vectors.
> >
> > Here, the machine description pattern is split up to separately
> > accommodate left and right shifts, so we can specifically emit an add
> > for all left shifts by 1.
>
> Thanks for doing this.

I do wonder whether our scheduling infrastructure has the ability to "mutate"
instructions in cases like this one, where either adds or shifts exceed their
available resources but a resource is readily available for an alternate
instruction form?

Richard.


[patch,avr] Rework avr_out_compare

2024-09-12 Thread Georg-Johann Lay


This patch reworks avr_out_compare:

Use new convenient helper functions that may be useful in
other output functions, too.

Generalized some special cases that only work for EQ and NE
comparisons.  For example, with the patch

;; R24:SI == -1 (unused after)
adiw r26,1
sbci r25,hi8(-1)
sbci r24,lo8(-1)

;; R18:SI == -1
cpi r18,-1
cpc r19,r18
cpc r20,r18
cpc r21,r18

Without the patch, we had:

;; R24:SI == -1 (unused after)
cpi r24,-1
sbci r25,-1
sbci r26,-1
sbci r27,-1

;; R18:SI == -1
cpi r18,-1
ldi r24,-1
cpc r19,r24
cpc r20,r24
cpc r21,r24
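To see why the new R18:SI == -1 sequence is correct: CPI sets Z only if R18 is 0xFF, after which R18 itself holds the constant that each CPC compares against, and CPC can only clear Z, never set it. The C model below is a hypothetical sketch, not avr.cc code; it encodes just the Z and C rules of CPI and CPC as given in the AVR instruction set manual and checks that the final Z flag equals x == -1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of the two AVR status flags these sequences use.  */
struct avr_flags { bool z, c; };

/* CPI Rd,K: computes Rd - K.  Z is set iff the result is zero; C is set
   iff K > Rd (unsigned), i.e. the subtraction borrows.  */
static void
avr_cpi (struct avr_flags *f, uint8_t rd, uint8_t k)
{
  f->z = (uint8_t) (rd - k) == 0;
  f->c = k > rd;
}

/* CPC Rd,Rr: computes Rd - Rr - C.  Z is cleared if the result is nonzero
   and otherwise left unchanged; C is set iff Rr + C > Rd (unsigned).  */
static void
avr_cpc (struct avr_flags *f, uint8_t rd, uint8_t rr)
{
  unsigned sub = (unsigned) rr + f->c;
  if ((uint8_t) (rd - sub) != 0)
    f->z = false;
  f->c = sub > rd;
}

/* Hypothetical model of the post-patch sequence
     cpi r18,-1 ; cpc r19,r18 ; cpc r20,r18 ; cpc r21,r18
   returning the final Z flag.  Byte 0 of X lives in r18 (little endian).  */
static bool
eq_minus1_via_cpc (uint32_t x)
{
  uint8_t r18 = x & 0xff, r19 = (x >> 8) & 0xff;
  uint8_t r20 = (x >> 16) & 0xff, r21 = (x >> 24) & 0xff;
  struct avr_flags f = { false, false };
  avr_cpi (&f, r18, 0xff);
  avr_cpc (&f, r19, r18);
  avr_cpc (&f, r20, r18);
  avr_cpc (&f, r21, r18);
  return f.z;
}
```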

Ok for trunk?

This patch requires "Tweak 32-bit comparisons".

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662738.html

Johann

--

AVR: Rework avr_out_compare.

16-bit comparisons like R25:24 == -1 are currently performed like
cpi R24, -1
cpc R25, R24
The same is possible for wider modes.  ADIW can be used like SBIW when
the compare code is EQ or NE because such comparisons are just about
(propagating) the Z flag.  The patch adds helper functions like avr_byte()
that may be useful in functions other than avr_out_compare().

gcc/
* config/avr/avr.cc (avr_chunk, avr_byte, avr_word)
(avr_int8, avr_uint8, avr_int16): New helper functions.
(avr_out_compare): Overhaul.
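The new helpers are built on simplify_gen_subreg. For a plain CONST_INT on a little-endian target such as AVR, subreg byte N holds bits 8N to 8N+7, so their results can be mirrored with shifts. The host-side C analog below is a hypothetical sketch covering only integer constants (no CONST_FIXED or CONST_DOUBLE); the const_* names are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analog of avr_uint8 for an integer constant X: byte N,
   where N = 0 is the least significant byte.  */
static uint8_t
const_uint8 (int64_t x, int n)
{
  assert (n >= 0 && n < 8);
  return (uint8_t) ((uint64_t) x >> (8 * n));
}

/* Hypothetical analog of avr_int8: the same byte read as signed.  */
static int8_t
const_int8 (int64_t x, int n)
{
  uint8_t b = const_uint8 (x, n);
  return b < 0x80 ? (int8_t) b : (int8_t) (b - 0x100);
}

/* Hypothetical analog of avr_int16: the signed 16-bit word starting at
   byte N.  N must be even, mirroring the assertion in avr_chunk.  */
static int16_t
const_int16 (int64_t x, int n)
{
  assert (n >= 0 && n < 8 && n % 2 == 0);
  uint16_t w = (uint16_t) ((uint64_t) x >> (8 * n));
  return w < 0x8000 ? (int16_t) w : (int16_t) ((int32_t) w - 0x10000);
}
```

For example, every byte of -1 reads back as -1 (0xFF unsigned), while byte 3 of 0x12345678 is 0x12.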

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 99657911171..1cfbfe6ec3b 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -322,6 +322,68 @@ avr_to_int_mode (rtx x)
 }
 
 
+/* Return chunk of mode MODE of X as an rtx.  N specifies the subreg
+   byte at which the chunk starts.  N must be an integral multiple
+   of the mode size.  */
+
+static rtx
+avr_chunk (machine_mode mode, rtx x, int n)
+{
+  gcc_assert (n % GET_MODE_SIZE (mode) == 0);
+  machine_mode xmode = GET_MODE (x) == VOIDmode ? DImode : GET_MODE (x);
+  return simplify_gen_subreg (mode, x, xmode, n);
+}
+
+
+/* Return the N-th byte of X as an rtx.  */
+
+static rtx
+avr_byte (rtx x, int n)
+{
+  return avr_chunk (QImode, x, n);
+}
+
+
+/* Return the sub-word of X starting at byte number N.  */
+
+static rtx
+avr_word (rtx x, int n)
+{
+  return avr_chunk (HImode, x, n);
+}
+
+
+/* Return the N-th byte of compile-time constant X as an int8_t.  */
+
+static int8_t
+avr_int8 (rtx x, int n)
+{
+  gcc_assert (CONST_INT_P (x) || CONST_FIXED_P (x) || CONST_DOUBLE_P (x));
+
+  return (int8_t) trunc_int_for_mode (INTVAL (avr_byte (x, n)), QImode);
+}
+
+/* Return the N-th byte of compile-time constant X as an uint8_t.  */
+
+static uint8_t
+avr_uint8 (rtx x, int n)
+{
+  return (uint8_t) avr_int8 (x, n);
+}
+
+
+/* Return the sub-word of compile-time constant X that starts
+   at byte N as an int16_t.  */
+
+static int16_t
+avr_int16 (rtx x, int n)
+{
+  gcc_assert (CONST_INT_P (x) || CONST_FIXED_P (x) || CONST_DOUBLE_P (x));
+
+  return (int16_t) trunc_int_for_mode (INTVAL (avr_word (x, n)), HImode);
+}
+
+
 /* Return true if hard register REG supports the ADIW and SBIW instructions.  */
 
 bool
@@ -5574,9 +5636,6 @@ avr_out_compare (rtx_insn *insn, rtx *xop, int *plen)
   xval = avr_to_int_mode (xop[1]);
 }
 
-  /* MODE of the comparison.  */
-  machine_mode mode = GET_MODE (xreg);
-
   gcc_assert (REG_P (xreg));
   gcc_assert ((CONST_INT_P (xval) && n_bytes <= 4)
 	  || (const_double_operand (xval, VOIDmode) && n_bytes == 8));
@@ -5584,13 +5643,15 @@ avr_out_compare (rtx_insn *insn, rtx *xop, int *plen)
   if (plen)
 *plen = 0;
 
+  const bool eqne_p = compare_eq_p (insn);
+
   /* Comparisons == +/-1 and != +/-1 can be done similar to camparing
  against 0 by ORing the bytes.  This is one instruction shorter.
  Notice that 64-bit comparisons are always against reg:ALL8 18 (ACC_A)
  and therefore don't use this.  */
 
-  if (!test_hard_reg_class (LD_REGS, xreg)
-  && compare_eq_p (insn)
+  if (eqne_p
+  && ! test_hard_reg_class (LD_REGS, xreg)
   && reg_unused_after (insn, xreg))
 {
   if (xval == const1_rtx)
@@ -5619,39 +5680,11 @@ avr_out_compare (rtx_insn *insn, rtx *xop, int *plen)
 	}
 }
 
-  /* Comparisons == -1 and != -1 of a d-register that's used after the
- comparison.  (If it's unused after we use CPI / SBCI or ADIW sequence
- from below.)  Instead of  CPI Rlo,-

Re: [PATCH] aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for SVE instructions

2024-09-12 Thread Richard Sandiford

Richard Biener  writes:
> On Thu, Sep 12, 2024 at 2:35 PM Richard Sandiford
>  wrote:
>>
>> Soumya AR  writes:
>> > On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
>> > instructions like SHL have a throughput of 2. We can lean on that to emit
>> > code like:
>> >  add  z31.b, z31.b, z31.b
>> > instead of:
>> >  lsl  z31.b, z31.b, #1
>> >
>> > The implementation of this change for SVE vectors is similar to a prior
>> > patch that adds the above functionality for Neon vectors.
>> >
>> > Here, the machine description pattern is split up to separately
>> > accommodate left and right shifts, so we can specifically emit an add
>> > for all left shifts by 1.
>>
>> Thanks for doing this.
>
> I do wonder whether our scheduling infrastructure has the ability to "mutate"
> instructions in cases like this one, where either adds or shifts exceed their
> available resources but a resource is readily available for an alternate
> instruction form?

Yeah, that sounds like a useful feature in general.  But in this particular
case, the shift resources are a subset of the addition resources, so there
should never be a specific advantage to using shifts.

Thanks,
Richard

> Richard.

Re: [RFC 0/4] Hard Register Constraints

2024-09-12 Thread Georg-Johann Lay





Am 10.09.24 um 16:20 schrieb Stefan Schulze Frielinghaus:

This series introduces hard register constraints.  The first patch
enables hard register constraints for asm statements and for
machine descriptions.  The subsequent patch adds some basic error
handling for asm statements.  The third patch adds some verification of
register names used in machine description.  The fourth and last patch
adds the feature of rewriting local register asm into hard register
constraints.

This series was bootstrapped and regtested on s390.  Furthermore, the
new dg-compile tests were verified via cross compilers for the enabled
targets.  There is still some fallout if -fdemote-register-asm is used
since a couple of features are missing as e.g. erroring out during
gimplification if the clobber set of registers intersects with
input/output registers.

As a larger test vehicle I've compiled and regtested glibc on s390 using
-fdemote-register-asm without any fallout.  On x86_64 this fails due to
the limitation that fixed registers are currently not supported for hard
register constraints (see commit message of the first patch).  This is
also the reason why I'm posting this series already since I was hoping
to get some feedback about this limitation.

Furthermore, I've compiled the Linux kernel on s390 and x86_64 with
-fdemote-register-asm.  Interestingly, the Linux kernel for x86_64 makes
use of the following asm statement:

#define call_on_stack(stack, func, asm_call, argconstr...)  \
{   \
 register void *tos asm("r11");  \
 \
 tos = ((void *)(stack));\
 \
 asm_inline volatile(\
 "movq   %%rsp, (%[tos]) \n" \
 "movq   %[tos], %%rsp   \n" \
 \
 asm_call\
 \
 "popq   %%rsp   \n" \
 \
 : "+r" (tos), ASM_CALL_CONSTRAINT   \
 : [__func] "i" (func), [tos] "r" (tos) argconstr\
 : "cc", "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10",   \
   "memory"  \
 );  \
}

Note the output
   "+r" (tos)
and the input
   [tos] "r" (tos)
Currently I error out for this since I consider it to be two inputs using
the same hard register: once as an implicit input via '+' and a second
time via the explicit input.  Thus, actually I would expect a '='


Would you explain why the two operands are supposed to live in the same
hard register?

From my understanding of asm semantics, this gives you two copies of
tos:  The 1st one may be altered by the asm, and the 2nd one may not be
changed.  As the operands neither refer to each other by "0" nor use
the same (single-register) register constraint, there is no reason or
requirement to allocate the two operands to the same reg, no?

Johann



instead of a '+' for the output constraint since the input is explicitly
mentioned, or remove the input entirely and just use the in/out operand
[tos] "+r" (tos)
If you consider this valid asm I would have to adjust the error
handling.  Either way, this is just about error handling and doesn't
really affect code generation.

Stefan Schulze Frielinghaus (4):
   Hard register constraints
   Error handling for hard register constraints
   genoutput: Verify hard register constraints
   Rewrite register asm into hard register constraints

  gcc/cfgexpand.cc                      |  42 ---
  gcc/common.opt                        |   4 +
  gcc/function.cc                       | 116 
  gcc/genoutput.cc                      |  60 
  gcc/genpreds.cc                       |   4 +-
  gcc/gimplify.cc                       | 151 +-
  gcc/gimplify_reg_info.h               | 130 +
  gcc/ira.cc                            |  79 +-
  gcc/lra-constraints.cc                |  13 +
  gcc/output.h                          |   2 +
  gcc/recog.cc                          |  11 +-
  gcc/stmt.cc                           | 268 +-
  gcc/stmt.h                            |   9 +-
  gcc/testsuite/gcc.dg/asm-hard-reg-1.c |  85 ++
  gcc/testsuite/gcc.dg/asm-hard-

Re: [PATCH] c++: Disable deprecated/unavailable diagnostics when creating thunks for methods with such attributes [PR116636]

2024-09-12 Thread Marek Polacek

On Wed, Sep 11, 2024 at 11:26:35PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> On the following testcase, we emit false positive warnings/errors about using
> the deprecated or unavailable methods when creating thunks for them, even
> when nothing (in the testcase so far) actually used those.
> 
> The following patch temporarily disables that diagnostics when creating
> the thunks.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM.
 
> 2024-09-11  Jakub Jelinek  
> 
>   PR c++/116636
>   * method.cc: Include decl.h.
>   (use_thunk): Temporarily change deprecated_state to
>   UNAVAILABLE_DEPRECATED_SUPPRESS.
> 
>   * g++.dg/warn/deprecated-19.C: New test.
> 
> --- gcc/cp/method.cc.jj   2024-09-06 13:43:37.823301244 +0200
> +++ gcc/cp/method.cc  2024-09-11 12:19:57.420486173 +0200
> @@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.
>  #include "coretypes.h"
>  #include "target.h"
>  #include "cp-tree.h"
> +#include "decl.h"
>  #include "stringpool.h"
>  #include "cgraph.h"
>  #include "varasm.h"
> @@ -283,6 +284,11 @@ use_thunk (tree thunk_fndecl, bool emit_
>/* Thunks are always addressable; they only appear in vtables.  */
>TREE_ADDRESSABLE (thunk_fndecl) = 1;
>  
> +  /* Don't diagnose deprecated or unavailable functions just because they
> + have thunks emitted for them.  */
> +  auto du = make_temp_override (deprecated_state,
> +UNAVAILABLE_DEPRECATED_SUPPRESS);
> +
>/* Figure out what function is being thunked to.  It's referenced in
>   this translation unit.  */
>TREE_ADDRESSABLE (function) = 1;
> --- gcc/testsuite/g++.dg/warn/deprecated-19.C.jj  2024-09-11 12:50:25.34263 +0200
> +++ gcc/testsuite/g++.dg/warn/deprecated-19.C 2024-09-11 13:05:29.210222060 +0200
> @@ -0,0 +1,22 @@
> +// PR c++/116636
> +// { dg-do compile }
> +// { dg-options "-pedantic -Wdeprecated" }
> +
> +struct A {
> +  virtual int foo () = 0;
> +};
> +struct B : virtual A {
> +  [[deprecated]] int foo () { return 0; }      // { dg-message "declared here" }
> +};                                             // { dg-warning "C\\\+\\\+11 attributes only available with" "" { target c++98_only } .-1 }
> +struct C : virtual A {
> +  [[gnu::unavailable]] int foo () { return 0; } // { dg-message "declared here" }
> +};                                             // { dg-warning "C\\\+\\\+11 attributes only available with" "" { target c++98_only } .-1 }
> +
> +void
> +bar ()
> +{
> +  B b;
> +  b.foo ();                                    // { dg-warning "'virtual int B::foo\\\(\\\)' is deprecated" }
> +  C c;
> +  c.foo ();                                    // { dg-error "'virtual int C::foo\\\(\\\)' is unavailable" }
> +}
> 
>   Jakub
> 

Marek

Re: [PATCH v1][GCC] aarch64: Add GCS build attributes support.

2024-09-12 Thread Eric Gallager

On Wed, Sep 11, 2024 at 11:51 AM Srinath Parvathaneni
 wrote:
>
> This patch adds support for aarch64 gcs build attributes.

Hi, just wondering if you could clarify what "GCS" stands for in this
context? When I see it, my first thought is "GNU Coding Standards",
but I don't think that's right...

> This support includes generating two new assembler directives
> .aeabi_subsection and .aeabi_attribute. These directives are
> generated as per the syntax mentioned in spec
> "Build Attributes for the Arm® 64-bit Architecture (AArch64)"
> available at [1].
>
> To check whether the assembler being used to build the toolchain
> supports these directives, a new gcc configure check is added in
> gcc/configure.ac.
>
> If the assembler supports these directives, .aeabi_subsection and
> .aeabi_attribute directives are emitted in the generated assembly,
> when -mbranch-protection=gcs is passed.
>
> If the assembler does not support these directives,
> .note.gnu.property section will emit the relevant gcs information
> in the generated assembly, when -mbranch-protection=gcs is passed.
>
> This patch needs to be applied on top of GCC gcs patch series [2].
>
> Bootstrapped on aarch64-none-linux-gnu and regression tested on
> aarch64-none-elf, no issues.
>
> Ok for master?
>
> Regards,
> Srinath.
>
> [1]: https://github.com/ARM-software/abi-aa/pull/230
> [2]: https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/ARM/heads/gcs
>
> gcc/ChangeLog:
>
> 2024-09-11  Srinath Parvathaneni  
>
> * config.in: Regenerated
> * config/aarch64/aarch64.cc (aarch64_emit_aeabi_attribute): New
> function declaration.
> (aarch64_emit_aeabi_subsection): Likewise.
> (aarch64_start_file): Emit gcs build attributes.
> (aarch64_file_end_indicate_exec_stack): Update gcs bit in
> note.gnu.property section.
> * configure: Regenerated.
> * configure.ac: Add gcc configure check.
>
> gcc/testsuite/ChangeLog:
>
> 2024-09-11  Srinath Parvathaneni  
>
> * gcc.target/aarch64/build-attribute-gcs.c: New test.
> ---
>  gcc/config.in |   6 +++
>  gcc/config/aarch64/a.out  | Bin 0 -> 656 bytes
>  gcc/config/aarch64/aarch64.cc |  43 ++
>  gcc/configure |  35 ++
>  gcc/configure.ac  |   7 +++
>  .../gcc.target/aarch64/build-attribute-gcs.c  |  24 ++
>  6 files changed, 115 insertions(+)
>  create mode 100644 gcc/config/aarch64/a.out
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/build-attribute-gcs.c
>

Re: [PATCH] c++: Disable deprecated/unavailable diagnostics when creating thunks for methods with such attributes [PR116636]

2024-09-12 Thread Jason Merrill


On 9/12/24 10:23 AM, Marek Polacek wrote:

On Wed, Sep 11, 2024 at 11:26:35PM +0200, Jakub Jelinek wrote:

Hi!

On the following testcase, we emit false positive warnings/errors about using
the deprecated or unavailable methods when creating thunks for them, even
when nothing (in the testcase so far) actually used those.

The following patch temporarily disables that diagnostics when creating
the thunks.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


LGTM.


Yes, OK.


2024-09-11  Jakub Jelinek  

PR c++/116636
* method.cc: Include decl.h.
(use_thunk): Temporarily change deprecated_state to
UNAVAILABLE_DEPRECATED_SUPPRESS.

* g++.dg/warn/deprecated-19.C: New test.

--- gcc/cp/method.cc.jj 2024-09-06 13:43:37.823301244 +0200
+++ gcc/cp/method.cc2024-09-11 12:19:57.420486173 +0200
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.
  #include "coretypes.h"
  #include "target.h"
  #include "cp-tree.h"
+#include "decl.h"
  #include "stringpool.h"
  #include "cgraph.h"
  #include "varasm.h"
@@ -283,6 +284,11 @@ use_thunk (tree thunk_fndecl, bool emit_
/* Thunks are always addressable; they only appear in vtables.  */
TREE_ADDRESSABLE (thunk_fndecl) = 1;
  
+  /* Don't diagnose deprecated or unavailable functions just because they
+ have thunks emitted for them.  */
+  auto du = make_temp_override (deprecated_state,
+UNAVAILABLE_DEPRECATED_SUPPRESS);
+
/* Figure out what function is being thunked to.  It's referenced in
   this translation unit.  */
TREE_ADDRESSABLE (function) = 1;
--- gcc/testsuite/g++.dg/warn/deprecated-19.C.jj  2024-09-11 12:50:25.34263 +0200
+++ gcc/testsuite/g++.dg/warn/deprecated-19.C   2024-09-11 13:05:29.210222060 +0200
@@ -0,0 +1,22 @@
+// PR c++/116636
+// { dg-do compile }
+// { dg-options "-pedantic -Wdeprecated" }
+
+struct A {
+  virtual int foo () = 0;
+};
+struct B : virtual A {
+  [[deprecated]] int foo () { return 0; }      // { dg-message "declared here" }
+};                                             // { dg-warning "C\\\+\\\+11 attributes only available with" "" { target c++98_only } .-1 }
+struct C : virtual A {
+  [[gnu::unavailable]] int foo () { return 0; } // { dg-message "declared here" }
+};                                             // { dg-warning "C\\\+\\\+11 attributes only available with" "" { target c++98_only } .-1 }
+
+void
+bar ()
+{
+  B b;
+  b.foo ();                                    // { dg-warning "'virtual int B::foo\\\(\\\)' is deprecated" }
+  C c;
+  c.foo ();                                    // { dg-error "'virtual int C::foo\\\(\\\)' is unavailable" }
+}

Jakub



Marek

Re: [PATCH] libcpp: Implement clang -Wheader-guard warning [PR96842]

2024-09-12 Thread Eric Gallager

On Wed, Sep 11, 2024 at 5:28 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch implements the clang -Wheader-guard warning, which warns
> if a valid multiple inclusion header guard's #ifndef/#if !defined directive
> is immediately (no other non-line directives nor other (non-comment)
> tokens in between) followed by #define directive for some different macro,
> which in get_suggestion rules is close enough to the actual header guard
> macro (i.e. likely misspelling), the #define is object-like with empty
> definition (I've followed what clang implements) and the macro isn't defined
> later on (at least not on the final #endif at the end of a header).
>
> In this case it emits a warning, so that
> #ifndef STDIO_H
> #define STDOI_H
> ...
> #endif
> or similar misspellings can be caught.
>
> clang enables this warning by default, but I've put it into -Wall instead
> as it still seems to be a style warning, nothing more severe; if a header
> doesn't survive multiple inclusion because of the misspelling, users will
> get different diagnostics.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>

Thanks for this patch! I can't approve it myself, but I hope someone
who can does so soon! It looks fine to me, for whatever that's worth!

> 2024-09-11  Jakub Jelinek  
>
> PR preprocessor/96842
> libcpp/
> * include/cpplib.h (struct cpp_options): Add warn_header_guard member.
> (enum cpp_warning_reason): Add CPP_W_HEADER_GUARD enumerator.
> * internal.h (struct cpp_reader): Add mi_def_cmacro, mi_loc and
> mi_def_loc members.
> (_cpp_defined_macro_p): Constify type pointed by argument type.
> Formatting fix.
> * init.cc (cpp_create_reader): Clear
> CPP_OPTION (pfile, warn_header_guard).
> * directives.cc (struct if_stack): Add def_loc and mi_def_cmacro
> members.
> (DIRECTIVE_TABLE): Add IF_COND flag to define.
> (do_define): Set ifs->mi_def_cmacro on a define immediately following
> #ifndef directive for the guard.  Clear pfile->mi_valid.  Formatting
> fix.
> (do_endif): Copy over pfile->mi_def_cmacro and pfile->mi_def_loc
> if ifs->mi_def_cmacro is set and pfile->mi_cmacro isn't a defined
> macro.
> (push_conditional): Clear mi_def_cmacro and mi_def_loc members.
> * files.cc (_cpp_pop_file_buffer): Emit -Wheader-guard diagnostics.
> gcc/
> * doc/invoke.texi (Wheader-guard): Document.
> gcc/c-family/
> * c.opt (Wheader-guard): New option.
> * c.opt.urls: Regenerated.
> * c-ppoutput.cc (init_pp_output): Initialize also cb->get_suggestion.
> gcc/testsuite/
> * c-c++-common/cpp/Wheader-guard-1.c: New test.
> * c-c++-common/cpp/Wheader-guard-1-1.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-2.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-3.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-4.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-5.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-6.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-7.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-8.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-9.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-10.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-11.h: New test.
> * c-c++-common/cpp/Wheader-guard-1-12.h: New test.
> * c-c++-common/cpp/Wheader-guard-2.c: New test.
> * c-c++-common/cpp/Wheader-guard-2.h: New test.
> * c-c++-common/cpp/Wheader-guard-3.c: New test.
> * c-c++-common/cpp/Wheader-guard-3.h: New test.
>
> --- libcpp/include/cpplib.h.jj  2024-09-03 16:47:47.323031836 +0200
> +++ libcpp/include/cpplib.h 2024-09-11 16:39:36.373680969 +0200
> @@ -435,6 +435,10 @@ struct cpp_options
>/* Different -Wimplicit-fallthrough= levels.  */
>unsigned char cpp_warn_implicit_fallthrough;
>
> +  /* Nonzero means warn about a define of a different macro right after
> + #ifndef/#if !defined header guard directive.  */
> +  unsigned char warn_header_guard;
> +
>/* Nonzero means we should look for header.gcc files that remap file
>   names.  */
>unsigned char remap;
> @@ -702,7 +706,8 @@ enum cpp_warning_reason {
>CPP_W_EXPANSION_TO_DEFINED,
>CPP_W_BIDIRECTIONAL,
>CPP_W_INVALID_UTF8,
> -  CPP_W_UNICODE
> +  CPP_W_UNICODE,
> +  CPP_W_HEADER_GUARD
>  };
>
>  /* Callback for header lookup for HEADER, which is the name of a
> --- libcpp/internal.h.jj2024-09-03 16:47:47.324031823 +0200
> +++ libcpp/internal.h   2024-09-11 17:09:26.481097532 +0200
> @@ -493,9 +493,11 @@ struct cpp_reader
>   been used.  */
>bool seen_once_only;
>
> -  /* Multiple include optimization.  */
> +  /* Multiple include optimization and -Wheader-guard warning.  */
>const cpp_hashnode *mi_cmacro;
>const cpp_hashnode *mi_ind_cmacro;
> +  const

Re: [PATCH v2] c++: Don't crash when mangling member with anonymous union or template types [PR100632, PR109790]

2024-09-12 Thread Jason Merrill


On 9/12/24 7:23 AM, Simon Martin wrote:

Hi,

While looking at more open PRs, I have discovered that the problem
reported in PR109790 is very similar to that in PR100632, so I’m
combining both in a single patch attached here. The fix is similar to
the one I initially submitted, only more general and I believe better.



We currently crash upon mangling members that have an anonymous union
or a template type.

The problem is that before calling write_unqualified_name,
write_member_name has an assert that assumes that it has an
IDENTIFIER_NODE in its hand. However it's incorrect: it has an
anonymous union in PR100632, and a template in PR109790.


The assert does not assume it has an IDENTIFIER_NODE; it assumes it has 
a _DECL, and expects its DECL_NAME to be an IDENTIFIER_NODE.


!identifier_p will always be true for a _DECL, making the assert useless.

How about checking !DECL_NAME (member) instead of !identifier_p?


This patch fixes this by relaxing the assert to accept members that are
not identifiers, which write_unqualified_name handles just fine.

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-12 Thread Hongtao Liu

On Wed, Sep 11, 2024 at 4:21 PM Hongtao Liu  wrote:
>
> On Wed, Sep 11, 2024 at 4:04 PM Richard Biener
>  wrote:
> >
> > On Wed, Sep 11, 2024 at 4:17 AM liuhongt  wrote:
> > >
> > > GCC12 enables vectorization for O2 with the very cheap cost model, which
> > > is restricted to constant tripcounts. The vectorization capacity is very
> > > limited in consideration of the codesize impact.
> > >
> > > The patch extends the very cheap cost model a little bit to support
> > > variable tripcounts, but still disables peeling for gaps/alignment,
> > > runtime alias checking and epilogue vectorization in consideration of
> > > codesize.
> > >
> > > So there are at most 2 versions of the loop for O2 vectorization: one
> > > vectorized main loop and one scalar/remainder loop.
> > >
> > > i.e.
> > >
> > > void
> > > foo1 (int* __restrict a, int* b, int* c, int n)
> > > {
> > >  for (int i = 0; i != n; i++)
> > >   a[i] = b[i] + c[i];
> > > }
> > >
> > > with -O2 -march=x86-64-v3, will be vectorized to
> > >
> > > .L10:
> > >         vmovdqu (%r8,%rax), %ymm0
> > >         vpaddd  (%rsi,%rax), %ymm0, %ymm0
> > >         vmovdqu %ymm0, (%rdi,%rax)
> > >         addq    $32, %rax
> > >         cmpq    %rdx, %rax
> > >         jne     .L10
> > >         movl    %ecx, %eax
> > >         andl    $-8, %eax
> > >         cmpl    %eax, %ecx
> > >         je      .L21
> > >         vzeroupper
> > > .L12:
> > >         movl    (%r8,%rax,4), %edx
> > >         addl    (%rsi,%rax,4), %edx
> > >         movl    %edx, (%rdi,%rax,4)
> > >         addq    $1, %rax
> > >         cmpl    %eax, %ecx
> > >         jne     .L12
> > >
> > > As measured with SPEC2017 on EMR, the patch (N-Iter) improves performance
> > > by 4.11% with an extra 2.8% codesize, and the cheap cost model improves
> > > performance by 5.74% with an extra 8.88% codesize. The details are as
> > > below.
> >
> > I'm confused by this; are the N-Iter numbers on top of the cheap cost
> > model numbers?
> No, it's N-iter vs base (very cheap cost model), and cheap vs base.
> >
> > > Performance measured with -march=x86-64-v3 -O2 on EMR
> > >
> > >                  N-Iter   cheap cost model
> > > 500.perlbench_r  -0.12%   -0.12%
> > > 502.gcc_r        0.44%    -0.11%
> > > 505.mcf_r        0.17%    4.46%
> > > 520.omnetpp_r    0.28%    -0.27%
> > > 523.xalancbmk_r  0.00%    5.93%
> > > 525.x264_r       -0.09%   23.53%
> > > 531.deepsjeng_r  0.19%    0.00%
> > > 541.leela_r      0.22%    0.00%
> > > 548.exchange2_r  -11.54%  -22.34%
> > > 557.xz_r         0.74%    0.49%
> > > GEOMEAN INT      -1.04%   0.60%
> > >
> > > 503.bwaves_r     3.13%    4.72%
> > > 507.cactuBSSN_r  1.17%    0.29%
> > > 508.namd_r       0.39%    6.87%
> > > 510.parest_r     3.14%    8.52%
> > > 511.povray_r     0.10%    -0.20%
> > > 519.lbm_r        -0.68%   10.14%
> > > 521.wrf_r        68.20%   76.73%
> >
> > So this seems to regress as well?
> Niter increases performance less than the cheap cost model, that's
> expected, it is not a regression.
> >
> > > 526.blender_r    0.12%    0.12%
> > > 527.cam4_r       19.67%   23.21%
> > > 538.imagick_r    0.12%    0.24%
> > > 544.nab_r        0.63%    0.53%
> > > 549.fotonik3d_r  14.44%   9.43%
> > > 554.roms_r       12.39%   0.00%
> > > GEOMEAN FP       8.26%    9.41%
> > > GEOMEAN ALL      4.11%    5.74%

I've tested the patch on aarch64; it shows a similar improvement with a
small codesize increase.
I haven't tested it on other backends, but I think they would see
similarly good improvements.
> > >
> > > Code size impact
> > >                  N-Iter   cheap cost model
> > > 500.perlbench_r  0.22%    1.03%
> > > 502.gcc_r        0.25%    0.60%
> > > 505.mcf_r        0.00%    32.07%
> > > 520.omnetpp_r    0.09%    0.31%
> > > 523.xalancbmk_r  0.08%    1.86%
> > > 525.x264_r       0.75%    7.96%
> > > 531.deepsjeng_r  0.72%    3.28%
> > > 541.leela_r      0.18%    0.75%
> > > 548.exchange2_r  8.29%    12.19%
> > > 557.xz_r         0.40%    0.60%
> > > GEOMEAN INT      1.07%    5.71%
> > >
> > > 503.bwaves_r     12.89%   21.59%
> > > 507.cactuBSSN_r  0.90%    20.19%
> > > 508.namd_r       0.77%    14.75%
> > > 510.parest_r     0.91%    3.91%
> > > 511.povray_r     0.45%    4.08%
> > > 519.lbm_r        0.00%    0.00%
> > > 521.wrf_r        5.97%    12.79%
> > > 526.blender_r    0.49%    3.84%
> > > 527.cam4_r       1.39%    3.28%
> > > 538.imagick_r    1.86%    7.78%
> > > 544.nab_r        0.41%    3.00%
> > > 549.fotonik3d_r  25.50%   47.47%
> > > 554.roms_r       5.17%    13.01%
> > > GEOMEAN FP       4.14%    11.38%
> > > GEOMEAN ALL      2.80%    8.88%
> > >
> > >
> > > The only regression is from 548.exchange_r, the vectorizati

Re: [PATCH 1/2] c++: Make __builtin_launder reject invalid types [PR116673]

2024-09-12 Thread Jason Merrill


On 9/12/24 4:49 AM, Jonathan Wakely wrote:

Tested x86_64-linux. OK for trunk?

-- >8 --

The standard says that std::launder is ill-formed for function pointers
and cv void pointers, so there's no reason for __builtin_launder to
accept them. This change allows implementations of std::launder to defer
to the built-in for error checking, although libstdc++ will continue to
diagnose it directly for more user-friendly diagnostics.

PR c++/116673

gcc/cp/ChangeLog:

* semantics.cc (finish_builtin_launder): Diagnose function
pointers and cv void pointers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/launder10.C: New test.
---
  gcc/cp/semantics.cc| 17 +
  gcc/testsuite/g++.dg/cpp1z/launder10.C | 15 +++
  2 files changed, 28 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/launder10.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 63212afafb3..b194b01f865 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -13482,11 +13482,20 @@ finish_builtin_launder (location_t loc, tree arg, tsubst_flags_t complain)
  arg = decay_conversion (arg, complain);
if (error_operand_p (arg))
  return error_mark_node;
-  if (!type_dependent_expression_p (arg)
-  && !TYPE_PTR_P (TREE_TYPE (arg)))
+  if (!type_dependent_expression_p (arg))
  {
-  error_at (loc, "non-pointer argument to %<__builtin_launder%>");
-  return error_mark_node;
+  tree type = TREE_TYPE (arg);
+  if (!TYPE_PTR_P (type))
+   {
+ error_at (loc, "non-pointer argument to %<__builtin_launder%>");
+ return error_mark_node;
+   }
+  else if (!object_type_p (TREE_TYPE (type)))
+   {
+ // std::launder is ill-formed for function and cv void pointers.
+ error_at (loc, "invalid argument to %<__builtin_launder%>");


Let's be more specific by combining both errors into

"type %qT of argument to %<__builtin_launder%> is not a pointer to
object type"


The tests can also be combined into !TYPE_PTROB_P.

OK with that change.


+ return error_mark_node;
+   }
  }
if (processing_template_decl)
  arg = orig_arg;
diff --git a/gcc/testsuite/g++.dg/cpp1z/launder10.C b/gcc/testsuite/g++.dg/cpp1z/launder10.C
new file mode 100644
index 000..7c15eeb891f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/launder10.C
@@ -0,0 +1,15 @@
+// PR c++/116673
+// { dg-do compile }
+
+void
+bar (void *p)
+{
+  __builtin_launder (bar); // { dg-error {invalid argument to '__builtin_launder'} }
+  __builtin_launder (p);   // { dg-error {invalid argument to '__builtin_launder'} }
+  const void* cp = p;
+  __builtin_launder (cp);  // { dg-error {invalid argument to '__builtin_launder'} }
+  volatile void* vp = p;
+  __builtin_launder (vp);  // { dg-error {invalid argument to '__builtin_launder'} }
+  const volatile void* cvp = p;
+  __builtin_launder (cvp); // { dg-error {invalid argument to '__builtin_launder'} }
+}

Re: [PATCH] libcpp: Implement clang -Wheader-guard warning [PR96842]

2024-09-12 Thread David Malcolm

On Wed, 2024-09-11 at 23:26 +0200, Jakub Jelinek wrote:
> Hi!
> 
> The following patch implements the clang -Wheader-guard warning, which warns
> if a valid multiple inclusion header guard's #ifndef/#if !defined directive
> is immediately (no other non-line directives nor other (non-comment)
> tokens in between) followed by #define directive for some different macro,
> which in get_suggestion rules is close enough to the actual header guard
> macro (i.e. likely misspelling), the #define is object-like with empty
> definition (I've followed what clang implements) and the macro isn't defined
> later on (at least not on the final #endif at the end of a header).
> 
> In this case it emits a warning, so that
> #ifndef STDIO_H
> #define STDOI_H
> ...
> #endif
> or similar misspellings can be caught.
> 
> clang enables this warning by default, but I've put it into -Wall instead
> as it still seems to be a style warning, nothing more severe; if a header
> doesn't survive multiple inclusion because of the misspelling, users will
> get different diagnostics.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 

Overall, LGTM, but I'm not as familiar with libcpp's implementation
details.

> 
> --- libcpp/files.cc.jj2024-09-03 16:47:47.322031849 +0200
> +++ libcpp/files.cc   2024-09-11 19:32:43.754868132 +0200
> @@ -1664,7 +1664,28 @@ _cpp_pop_file_buffer (cpp_reader *pfile,
>    /* Record the inclusion-preventing macro, which could be NULL
>   meaning no controlling macro.  */
>    if (pfile->mi_valid && file->cmacro == NULL)
> -    file->cmacro = pfile->mi_cmacro;
> +    {
> +  file->cmacro = pfile->mi_cmacro;
> +  if (pfile->mi_cmacro
> +   && pfile->mi_def_cmacro
> +   && pfile->cb.get_suggestion)
> + {
> +   const char *names[]
> +     = { (const char *) NODE_NAME (pfile->mi_def_cmacro), NULL };
> +   if (pfile->cb.get_suggestion (pfile,
> + (const char *)
> + NODE_NAME (pfile->mi_cmacro), names)
> +   && cpp_warning_with_line (pfile, CPP_W_HEADER_GUARD,
> + pfile->mi_loc, 0,
> + "header guard \"%s\" followed by "
> + "\"#define\" of a different macro",
> + NODE_NAME (pfile->mi_cmacro)))
> +     cpp_error_at (pfile, CPP_DL_NOTE, pfile->mi_def_loc,
> +   "\"%s\" is defined here; did you mean \"%s\"?",
> +   NODE_NAME (pfile->mi_def_cmacro),
> +   NODE_NAME (pfile->mi_cmacro));
> + }

We were chatting on IRC about how it would be nice to be able to use
%qs in libcpp diagnostics; here is an example where it would help
(rather than using \"%s\").

Not a blocker, but it occurs to me that ideally we'd group the warning
and note into a diagnostic group; unfortunately there's no way to
express that currently via the interface libcpp has.  We would need to
add {begin,end}_group hooks, which in turn suggests that maybe libcpp's
interface into diagnostics should be an abstract base class with
various vfuncs, rather than a callback.

Also not a blocker, but it would be nice to have a fix-it hint here, by
using the rich_location overload of cpp_error_at and adding a fix-it
hint to the rich_location.

Hope this is constructive
Dave

[PATCH 0/2] arm: Allow -mcpu and -march options to be unset

2024-09-12 Thread Richard Earnshaw

This short patch series adds the ability to unset the -mcpu and -march
options on the Arm port.  This helps to avoid ambiguities and warnings
if, for some reason, the compiler flags need to be overridden.

The main intent of this is to help improve the compatibility of tests
in the testsuite.  I haven't fixed all of the possible use cases with
this series, but I have converted some of the main tables in the
dejagnu target support.

Richard Earnshaw (2):
  arm: Allow -mcpu and -march options to be unset
  arm: testsuite: make use of -mcpu=unset/-march=unset

 gcc/config/arm/arm.h   | 14 --
 gcc/doc/invoke.texi| 12 ++
 gcc/testsuite/gcc.target/arm/g2.c  |  4 +-
 gcc/testsuite/gcc.target/arm/scd42-2.c |  4 +-
 gcc/testsuite/lib/target-supports.exp  | 59 --
 5 files changed, 83 insertions(+), 10 deletions(-)

-- 
2.34.1

[PATCH 1/2] arm: Allow -mcpu and -march options to be unset

2024-09-12 Thread Richard Earnshaw

The compiler will warn if the architectural specification derived from
a -mcpu option is not the same as that specified by -march.  This is
because it was never intended that the two should be used at the same
time: -mcpu= is supposed to be shorthand for -mtune=
-march=arch-of().

Unfortunately, there are times when the two options passed to the
compiler may come from distinct sources: one example is makefiles
which accumulate options; another is the testsuite itself, when some
tests require a particular architecture setting to be useful - only
running the tests when the compiler/testsuite configuration exactly
matched the requirements would make regression testing especially hard
(we have too many permutations).

So this patch allows a user to cancel any earlier setting of a
particular flag and to make the compiler behave as though it was never
passed.  The intended use case is as follows (sources of options are
shown in parentheses, but that's just for grouping):

 (-march=armv7-a+simd) (-march=unset -mcpu=cortex-m33)

The option processing logic will now simplify this to:

 -mcpu=cortex-m33

A useful corollary of this is that

 -march=armv7-a -march=unset

will now cause the compiler to behave as though neither the
architecture nor the CPU was ever set and to default back to the
configure-time settings.

gcc/ChangeLog:

* config/arm/arm.h (OPTION_DEFAULT_SPECS): Allow -mcpu and -march
to be unset.
(ARCH_CPU_CLEANUP_SPECS): Likewise
(DRIVER_SELF_SPECS): Add ARCH_CPU_CLEANUP_SPECS
* doc/invoke.texi (arm: -mcpu= and -march=): Document use of 'unset'.
---
 gcc/config/arm/arm.h | 14 +++---
 gcc/doc/invoke.texi  | 12 
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 0cd5d733952..b092ba6ffe0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -394,9 +394,11 @@ emission of floating point pcs attributes.  */
TARGET_MODE_CHECK that also takes into account the selected CPU and
architecture.  */
 #define OPTION_DEFAULT_SPECS \
-  {"arch", "%{!march=*:%{!mcpu=*:-march=%(VALUE)}}" }, \
-  {"cpu", "%{!march=*:%{!mcpu=*:-mcpu=%(VALUE)}}" }, \
-  {"tune", "%{!mcpu=*:%{!mtune=*:-mtune=%(VALUE)}}" }, \
+  {"arch", "%{!march=*|march=unset:"\
+  "%{!mcpu=*|mcpu=unset:%

[PATCH 2/2] arm: testsuite: make use of -mcpu=unset/-march=unset

2024-09-12 Thread Richard Earnshaw

This patch makes use of the new ability to unset the CPU or
architecture flags on the command line to enable several more tests on
Arm.  It doesn't cover every case and it does enable some tests that
now fail for different reasons when the tests are no longer skipped;
these were failing anyway for other testsuite configurations, so it's
still an overall improvement.

There's some restructuring required to fully implement this change:
previously we could treat XScale as an architecture, even though the
option set -mcpu=; now we need to handle this correctly so that we
unset the architecture rather than the CPU.  To do this I've added a
new table for these variants and renamed the template functions to use
'cpu' rather than 'arch'.  This entailed updating the two XScale
related tests accordingly.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Move xscale to new generator table.
(check_effective_target_arm_arch_FUNC_ok): Add -mcpu=unset to the
list of flags.
(add_options_for_arm_arch_FUNC): Likewise.
(check_effective_target_arm_cpu_FUNC_ok): New function.
(add_options_for_arm_cpu_FUNC): Likewise.
(check_effective_target_arm_cpu_FUNC_link): Likewise.
(check_effective_target_arm_cpu_FUNC_multilib): Likewise.
* gcc.target/arm/g2.c: Update dg directives.
* gcc.target/arm/scd42-2.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/g2.c  |  4 +-
 gcc/testsuite/gcc.target/arm/scd42-2.c |  4 +-
 gcc/testsuite/lib/target-supports.exp  | 59 --
 3 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/g2.c b/gcc/testsuite/gcc.target/arm/g2.c
index 04334c97713..7e43a907a4c 100644
--- a/gcc/testsuite/gcc.target/arm/g2.c
+++ b/gcc/testsuite/gcc.target/arm/g2.c
@@ -1,8 +1,8 @@
 /* Verify that hardware multiply is preferred on XScale. */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-require-effective-target arm_arch_xscale_arm_ok } */
-/* { dg-add-options arm_arch_xscale_arm } */
+/* { dg-require-effective-target arm_cpu_xscale_arm_ok } */
+/* { dg-add-options arm_cpu_xscale_arm } */
 
 
 /* Brett Gaines' test case. */
diff --git a/gcc/testsuite/gcc.target/arm/scd42-2.c b/gcc/testsuite/gcc.target/arm/scd42-2.c
index cd416885a80..a263c1fbff9 100644
--- a/gcc/testsuite/gcc.target/arm/scd42-2.c
+++ b/gcc/testsuite/gcc.target/arm/scd42-2.c
@@ -1,8 +1,8 @@
 /* Verify that mov is preferred on XScale for loading a 2 byte constant. */
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_xscale_arm_ok } */
+/* { dg-require-effective-target arm_cpu_xscale_arm_ok } */
 /* { dg-options "-O" } */
-/* { dg-add-options arm_arch_xscale_arm } */
+/* { dg-add-options arm_cpu_xscale_arm } */
 
 unsigned load2(void) __attribute__ ((naked));
 unsigned load2(void)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index cb9971d5398..c4d2c33cf62 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5679,6 +5679,9 @@ proc check_effective_target_arm_fp16_hw { } {
 # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
 #/* { dg-add-options arm_arch_v5t } */
 #   /* { dg-require-effective-target arm_arch_v5t_multilib } */
+
+# This table should only be used to set -march= (and associated
+# flags).  See below for setting -mcpu
 foreach { armfunc armflag armdefs } {
v4 "-march=armv4 -marm" __ARM_ARCH_4__
v4t "-march=armv4t -mfloat-abi=softfp" __ARM_ARCH_4T__
@@ -5690,7 +5693,6 @@ foreach { armfunc armflag armdefs } {
v5te "-march=armv5te+fp -mfloat-abi=softfp" __ARM_ARCH_5TE__
v5te_arm "-march=armv5te+fp -marm" "__ARM_ARCH_5TE__ && !__thumb__"
v5te_thumb "-march=armv5te+fp -mthumb -mfloat-abi=softfp" "__ARM_ARCH_5TE__ && __thumb__"
-   xscale_arm "-mcpu=xscale -mfloat-abi=soft -marm" "__XSCALE__ && !__thumb__"
v6 "-march=armv6+fp -mfloat-abi=softfp" __ARM_ARCH_6__
v6_arm "-march=armv6+fp -marm" "__ARM_ARCH_6__ && !__thumb__"
v6_thumb "-march=armv6+fp -mthumb -mfloat-abi=softfp" "__ARM_ARCH_6__ && __thumb__"
@@ -5735,11 +5737,11 @@ foreach { armfunc armflag armdefs } {
{
return 0;
}
-   } "FLAG" ]
+   } "-mcpu=unset FLAG" ]
}
 
proc add_options_for_arm_arch_FUNC { flags } {
-   return "$flags FLAG"
+   return "$flags -mcpu=unset FLAG"
}
 
proc check_effective_target_arm_arch_FUNC_link { } {
@@ -5762,6 +5764,57 @@ foreach { armfunc armflag armdefs } {
 }]
 }
 
+# Creates a series of routines that return 1 if the given CPU
+# can be selected and a routine to give the flags to select that CPU
+# Note: Extra flags may be added to disable options from newer compilers
+# (Thumb in particular - but others may be added in the future).
+# Usage: /* { dg-require-effective-target arm_cpu_xscale_ok

Re: [PATCH] libcpp: Implement clang -Wheader-guard warning [PR96842]

2024-09-12 Thread Jakub Jelinek

On Thu, Sep 12, 2024 at 11:12:26AM -0400, David Malcolm wrote:
> We were chatting on IRC about how it would be nice to be able to use
> %qs in libcpp diagnostics; here is an example (rather than using
> \"%s\").

Yeah, I'm working on a patch for that.

> Not a blocker, but it occurs to me that ideally we'd group the warning
> and note into a diagnostic group, but unfortunately there's no way to
> express that currently via the interface libcpp has.  We would need to
> add {begin,end}_group hooks, which in turn suggests that maybe
> libcpp's interface into diagnostics should be an abstract base class
> with various vfuncs, rather than a callback.

I haven't added auto_diagnostic_group because nothing in libcpp does that;
yes, we need some solution for that.

> Also not a blocker, but it would be nice to have a fix-it hint here, by
> using the rich_location overload of cpp_error_at and adding a fix-it
> hint to the rich_location.

And yes, I was thinking about a fix-it hint, but I think that depends on
having better locations there first; currently the patch uses just the lines
with the directives.
I was hoping that can be done incrementally.

Jakub

Re: [PATCH] libcpp: Implement clang -Wheader-guard warning [PR96842]

2024-09-12 Thread David Malcolm

On Thu, 2024-09-12 at 17:18 +0200, Jakub Jelinek wrote:
> On Thu, Sep 12, 2024 at 11:12:26AM -0400, David Malcolm wrote:
> > We were chatting on IRC about how it would be nice to be able to use
> > %qs in libcpp diagnostics; here is an example (rather than using
> > \"%s\").
> 
> Yeah, I'm working on a patch for that.

Thanks.

> 
> > Not a blocker, but it occurs to me that ideally we'd group the warning
> > and note into a diagnostic group, but unfortunately there's no way to
> > express that currently via the interface libcpp has.  We would need to
> > add {begin,end}_group hooks, which in turn suggests that maybe
> > libcpp's interface into diagnostics should be an abstract base class
> > with various vfuncs, rather than a callback.
> 
> I haven't added auto_diagnostic_group because nothing in libcpp does that;
> yes, we need some solution for that.

(nods)

> 
> > Also not a blocker, but it would be nice to have a fix-it hint here, by
> > using the rich_location overload of cpp_error_at and adding a fix-it
> > hint to the rich_location.
> 
> And yes, I was thinking about a fix-it hint, but I think that depends on
> having better locations there first; currently the patch uses just the lines
> with the directives.
> I was hoping that can be done incrementally.

Indeed, let's defer the fix-it hint to a possible followup.


Thanks
Dave

Re: [PATCH] testsuite: Sanitize pacbti test cases for Cortex-M

2024-09-12 Thread Richard Earnshaw (lists)

On 03/09/2024 13:57, Christophe Lyon wrote:
> Hi Torbjörn,
> 
> 
> On 9/3/24 11:30, Torbjörn SVENSSON wrote:
>>
>> Ok for trunk and releases/gcc-14?
>>
>> -- 
>>
>> Some of the test cases were scanning for "bti", but it would,
>> incorrectly, match the ".arch_extension pacbti".
>> Also, keep test cases active if a supported Cortex-M core is supplied.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/arm/bti-1.c: Enable for Cortex-M(52|55|85) and
>> check for \tbti.
>> * gcc.target/arm/bti-2.c: Likewise.
>> * gcc.target/arm/pac-15.c: Likewise.
> For pac-15.c, your patch only enables the test for cortex-m{52|55|85}, 
> there's no scan-assembler for bti :-)
> 
>> * gcc.target/arm/pac-4.c: Check for \tbti.
>> * gcc.target/arm/pac-6.c: Likewise.
>>
>> Signed-off-by: Torbjörn SVENSSON 
>> Co-authored-by: Yvan ROUX 
>> ---
>>   gcc/testsuite/gcc.target/arm/bti-1.c  | 4 ++--
>>   gcc/testsuite/gcc.target/arm/bti-2.c  | 4 ++--
>>   gcc/testsuite/gcc.target/arm/pac-15.c | 2 +-
>>   gcc/testsuite/gcc.target/arm/pac-4.c  | 2 +-
>>   gcc/testsuite/gcc.target/arm/pac-6.c  | 2 +-
>>   5 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/bti-1.c b/gcc/testsuite/gcc.target/arm/bti-1.c
>> index 79dd8010d2d..70a62b5a70c 100644
>> --- a/gcc/testsuite/gcc.target/arm/bti-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/bti-1.c
>> @@ -1,6 +1,6 @@
>>   /* Check that GCC does bti instruction.  */
>>   /* { dg-do compile } */
>> -/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
>> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } { "-mcpu=cortex-m52*" "-mcpu=cortex-m55*" "-mcpu=cortex-m85*" } } */
> I'm not sure this is the way forward, but I'll let Richard comment.

We shouldn't need to do this now, but this test will need some reworking to use 
the new feature that I've added today.

> 
>>   /* { dg-options "-march=armv8.1-m.main -mthumb -mfloat-abi=softfp -mbranch-protection=bti --save-temps" } */

I think we can replace the dg-skip-if and dg-options with
/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
/* { dg-options "-mfloat-abi=softfp -mbranch-protection=bti --save-temps" } */
/* { dg-add-options arm_arch_v8_1m_main } */

The framework now knows how to correctly override any -mcpu= option passed
in from the test run configuration.

>>     int
>> @@ -9,4 +9,4 @@ main (void)
>>     return 0;
>>   }
>>
>> -/* { dg-final { scan-assembler "bti" } } */
>> +/* { dg-final { scan-assembler "\tbti" } } */
>> diff --git a/gcc/testsuite/gcc.target/arm/bti-2.c b/gcc/testsuite/gcc.target/arm/bti-2.c
>> index 33910563849..44c04d3df68 100644
>> --- a/gcc/testsuite/gcc.target/arm/bti-2.c
>> +++ b/gcc/testsuite/gcc.target/arm/bti-2.c
>> @@ -1,7 +1,7 @@
>>   /* { dg-do compile } */
>>   /* -Os to create jump table.  */
>>   /* { dg-options "-Os" } */
>> -/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
>> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } { "-mcpu=cortex-m52*" "-mcpu=cortex-m55*" "-mcpu=cortex-m85*" } } */
>>   /* { dg-options "-march=armv8.1-m.main -mthumb -mfloat-abi=softfp -mbranch-protection=bti --save-temps" } */
>>     extern int f1 (void);
>> @@ -55,4 +55,4 @@ lab2:
>>     return 2;
>>   }
>>
>> -/* { dg-final { scan-assembler-times "bti" 15 } } */
>> +/* { dg-final { scan-assembler-times "\tbti" 15 } } */
>> diff --git a/gcc/testsuite/gcc.target/arm/pac-15.c b/gcc/testsuite/gcc.target/arm/pac-15.c
>> index e1054902955..a2582e64d0a 100644
>> --- a/gcc/testsuite/gcc.target/arm/pac-15.c
>> +++ b/gcc/testsuite/gcc.target/arm/pac-15.c
>> @@ -1,7 +1,7 @@
>>   /* Check that GCC does .save and .cfi_offset directives with RA_AUTH_CODE pseudo hard-register.  */
>>   /* { dg-do compile } */
>>   /* { dg-require-effective-target mbranch_protection_ok } */
>> -/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
>> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } { "-mcpu=cortex-m52*" "-mcpu=cortex-m55*" "-mcpu=cortex-m85*" } } */
>>   /* { dg-options "-march=armv8.1-m.main+mve+pacbti -mbranch-protection=pac-ret -mthumb -mfloat-abi=hard -fasynchronous-unwind-tables -g -O0" } */

This one will need to use arm_arch_v8_1m_main_pacbti, but is otherwise similar.

>>     #include "stdio.h"
> How about
> -/* { dg-final { scan-assembler-times "pac   ip, lr, sp" 3 } } */
> +/* { dg-final { scan-assembler-times "\tpac\tip, lr, sp" 3 } } */
> ?
> 
>> diff --git a/gcc/testsuite/gcc.target/arm/pac-4.c b/gcc/testsuite/gcc.target/arm/pac-4.c
>> index cf915cdba50..81907079d77 100644
>> --- a/gcc/testsuite/gcc.target/arm/pac-4.c
>> +++ b/gcc/testsuite/gcc.target/arm/pac-4.c
>> @@ -5,6 +5,6 @@
>>     #include "pac.h"
>>
>> -/* { dg-final { scan-assembler-not "\tbti\t" } } */
>>
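The following minimal C sketch is illustrative only and is not part of the patch. It shows why the bare pattern "bti" is too loose. It counts the assembler lines that contain a pattern as a plain substring. This is an approximation, because scan-assembler actually matches Tcl regular expressions against the whole output file. The helper name and the sample assembler text are made up. In a Tcl double-quoted string, "\tbti" becomes a tab followed by bti, and the C string literal below mirrors that.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the lines of ASM_TEXT (separated by '\n')
   that contain PATTERN as a plain, unanchored substring.  */
static int
count_matching_lines (const char *asm_text, const char *pattern)
{
  size_t plen = strlen (pattern);
  int count = 0;
  const char *line = asm_text;

  while (*line != '\0')
    {
      const char *nl = strchr (line, '\n');
      size_t len = nl ? (size_t) (nl - line) : strlen (line);

      /* Search only within the current line; count it at most once.  */
      for (size_t i = 0; i + plen <= len; i++)
        if (strncmp (line + i, pattern, plen) == 0)
          {
            count++;
            break;
          }

      if (nl == NULL)
        break;
      line = nl + 1;
    }
  return count;
}
```

Take the made-up output "\t.arch_extension pacbti\nf:\n\tbti\n\tbx\tlr\n". Searching it for "bti" counts two lines, the directive and the instruction. Searching for "\tbti" counts only the instruction line.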

[PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-12 Thread Evgeny Karpov

The current binutils implementation does not support offsets of up to 4GB in
the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation; it is limited to 1MB.
This is related to differences between ELF and COFF relocation records.
There are ways to fix this. The relocation change will be split out into
a separate binutils patch series and discussion.

To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21
relocation will remain unchanged, and the workaround below will be applied to
bypass the 1MB offset limitation.
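Below is a rough sketch of the limit under discussion. It assumes, which the text above does not state, that the PAGEBASE_REL21 addend must fit a signed 21-bit field. Under that assumption the representable range is [-1048576, 1048575]. The second helper mirrors the patch's own, slightly more conservative test, abs (const_offset) >= 1 << 20. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the addend must lie in [-2^(BITS-1), 2^(BITS-1) - 1].
   For BITS == 21 this is [-1048576, 1048575], i.e. the "1MB" limit.  */
static bool
fits_signed_field (int64_t value, unsigned bits)
{
  int64_t lo = -((int64_t) 1 << (bits - 1));
  int64_t hi = ((int64_t) 1 << (bits - 1)) - 1;
  return value >= lo && value <= hi;
}

/* Same condition as the patch: take the workaround path when
   |offset| >= 1 << 20 (INT64_MIN is not handled here).  */
static bool
needs_workaround (int64_t offset)
{
  int64_t mag = offset < 0 ? -offset : offset;
  return mag >= ((int64_t) 1 << 20);
}
```

Note that -1048576 still fits a signed 21-bit field but takes the workaround path under the abs() check. That costs an extra instruction and is harmless.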

Regards,
Evgeny


The patch will be replaced by this change.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 03362a975c0..5f17936df1f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2896,7 +2896,30 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
if (can_create_pseudo_p ())
  tmp_reg = gen_reg_rtx (mode);

-   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
+   do
+ {
+   if (TARGET_PECOFF)
+ {
+   poly_int64 offset;
+   HOST_WIDE_INT const_offset;
+   strip_offset (imm, &offset);
+
+   if (offset.is_constant (&const_offset)
+   && abs(const_offset) >= 1 << 20)
+ {
+   rtx const_int = imm;
+   const_int = XEXP (const_int, 0);
+   XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));
+
+   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx(imm)));
+   emit_insn (gen_add_hioffset (tmp_reg, GEN_INT(const_offset)));
+   break;
+ }
+ }
+
+ emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
+ } while(0);
+
emit_insn (gen_add_losym (dest, tmp_reg, imm));
return;
   }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 665a333903c..072110f93e7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7405,6 +7405,13 @@
   DONE;
 })

+(define_insn "add_hioffset"
+  [(match_operand 0 "register_operand")
+   (match_operand 1 "const_int_operand")]
+  ""
+  "add %0, %0, (%1 & ~0xf) >> 12, lsl #12"
+)
+
 (define_insn "add_losym_"
   [(set (match_operand:P 0 "register_operand" "=r")
(lo_sum:P (match_operand:P 1 "register_operand" "r")

[PATCH v2 7/9] aarch64: Disable the anchors

2024-09-12 Thread Evgeny Karpov

This patch is no longer needed; more information is available here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662903.html

Regards,
Evgeny

Re: [PATCH] libcpp, v5: Add support for gnu::base64 #embed parameter

2024-09-12 Thread Joseph Myers

On Thu, 12 Sep 2024, Jakub Jelinek wrote:

> On Wed, Sep 11, 2024 at 10:23:20PM +, Joseph Myers wrote:
> > On Fri, 30 Aug 2024, Jakub Jelinek wrote:
> > 
> > > +should be no newlines in the string literal and because this parameter
> > > +is meant namely for use by the preprocessor itself, there is no support
> > > +for any escape sequences in the string literal argument.  If 
> > > @code{gnu::base64}
> > 
> > Given the "no escape sequences" rule, I think there should be a test for 
> > that - testing rejection of a string that would be valid if escape 
> > sequences were processed (for example, valid base64 but with the 
> > individual characters encoded using \x), but is not valid because they are 
> > not processed.  As far as I can see, the existing tests with escape 
> > sequences are invalid for other reasons (they use \n as the escape 
> > sequence).
> 
> Thanks.
> 
> Here is an updated patch and before that just the incremental diff from
> the previous patch.

This version is OK.

-- 
Joseph S. Myers
josmy...@redhat.com

[Patch] Fortran: Fixes to OpenMP 'interop' directive parsing support

2024-09-12 Thread Tobias Burnus

This patch fixes a couple of issues, such as missing whitespace gobbling
after matching an expression.


It also reorganizes some code to better handle 'identifier_"string"' vs.
'identifier', as there were some diagnostic issues.


(For 'fr', OpenMP requires the argument to be either an identifier
(that is, a scalar integer parameter) or a string, while the older
syntax permits any constant integer expression.)


However, the two main changes are:

* 'fr' and 'attr' actually support a list of arguments. While I believe
'attr("x", "y")' and 'attr("x"),attr("y")' are semantically identical,
supporting more than one (or zero) values for 'fr' required a different
encoding.


* Jakub additionally suggested that for 'fr', which supports constant
integers and string literals, we could pass on integer values and do
some checking.


That's what this patch does: known string values are converted to their
associated integer values, and unknown ones to 0. If the integer or string
value is unknown, a warning is printed [-Wopenmp].


Known values are those in the "OpenMP API Additional Definitions"
document, https://www.openmp.org/specifications/, with the addition of
hsa (value 7), which has been voted in at the spec level (no idea about
the ARB level) but not yet published.


Note that the warning is based on what is defined there; i.e., there is
no warning for 'level_zero', even though GCC does not support it.
Obviously, if another value is added next year, GCC 15 will not know
about it and will warn, even though the code is perfectly valid. But I
guess we can live with a warning in that case.
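A minimal sketch of the string-to-integer mapping described above follows. Known names map to their ids, anything else maps to 0, and the caller would then warn. The table contents are an assumption on my part. The ids 1 to 6 follow the foreign-runtime values as I understand them from the OpenMP documents, and hsa = 7 comes from the mail above, so check them against the published document. The names lookup_fr_id and fr_id_known are hypothetical and are not the omp_get_fr_id_from_name interface added by the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of foreign-runtime names and ids (assumed values). */
struct fr_entry { const char *name; int id; };

static const struct fr_entry known_fr[] = {
  { "cuda", 1 }, { "cuda_driver", 2 }, { "opencl", 3 },
  { "sycl", 4 }, { "hip", 5 }, { "level_zero", 6 }, { "hsa", 7 }
};

#define N_KNOWN_FR (sizeof known_fr / sizeof known_fr[0])

/* Map a string argument of 'fr' to its id; unknown names map to 0,
   in which case the caller would emit a -Wopenmp warning.  */
static int
lookup_fr_id (const char *name)
{
  for (size_t i = 0; i < N_KNOWN_FR; i++)
    if (strcmp (known_fr[i].name, name) == 0)
      return known_fr[i].id;
  return 0;
}

/* For integer arguments: nonzero if ID is one of the known values.  */
static int
fr_id_known (int id)
{
  for (size_t i = 0; i < N_KNOWN_FR; i++)
    if (known_fr[i].id == id)
      return 1;
  return 0;
}
```

A flat table keeps the string and integer checks in one place. That fits the warning policy above: anything absent from the table triggers a warning, even if a future specification defines it.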


Comments, remarks, suggestions? Especially regarding the internal
representation?


Tobias

PS: The next step will be to get the C/C++ parsing working, which also
implies encoding this representation into 'tree'. (Then doing the tree
conversion for Fortran.) Once satisfied with that, the middle-end and
libgomp parts that link those bits will come next. There is also the
question of whether there should be one call per 'interop' directive or
possibly multiple (e.g. one per interop object in 'init'/'use'/'destroy').
Fortran: Fixes to OpenMP 'interop' directive parsing support

Handle lists as arguments to 'fr' and 'attr'; fix parsing corner cases.
Additionally, 'fr' values are now stored internally as integers, permitting
a diagnostic (warning) for values not defined in the OpenMP Additional
Definitions document.

	PR fortran/116661

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_omp_namelist): Rename 'init' members for clarity.
	* match.cc (gfc_free_omp_namelist): Handle renaming.
	* dump-parse-tree.cc (show_omp_namelist): Update for new format
	and features.
	* openmp.cc (gfc_match_omp_prefer_type): Parse list to 'fr' and 'attr';
	store 'fr' values as integer.
	(gfc_match_omp_init): Rename variable names.

gcc/ChangeLog:

	* omp-api.h (omp_get_fr_id_from_name, omp_get_name_from_fr_id): New
	prototypes.
	* omp-general.cc (omp_get_fr_id_from_name, omp_get_name_from_fr_id):
	New.

include/ChangeLog:

	* gomp-constants.h (GOMP_INTEROP_IFR_LAST,
	GOMP_INTEROP_IFR_SEPARATOR, GOMP_INTEROP_IFR_NONE): New.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/interop-1.f90: Extend, update dg-*.
	* gfortran.dg/gomp/interop-2.f90: Update dg-error.
	* gfortran.dg/gomp/interop-3.f90: Add dg-warning.

 gcc/fortran/dump-parse-tree.cc   |  84 +---
 gcc/fortran/gfortran.h   |   4 +-
 gcc/fortran/match.cc |  10 +-
 gcc/fortran/openmp.cc| 305 ---
 gcc/omp-api.h|   3 +
 gcc/omp-general.cc   |  29 +++
 gcc/testsuite/gfortran.dg/gomp/interop-1.f90 |  32 ++-
 gcc/testsuite/gfortran.dg/gomp/interop-2.f90 |   2 +-
 gcc/testsuite/gfortran.dg/gomp/interop-3.f90 |   2 +-
 include/gomp-constants.h |   5 +
 10 files changed, 314 insertions(+), 162 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 8fc6141611c..3547d7f8aca 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -37,6 +37,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "constructor.h"
 #include "version.h"
 #include "parse.h"  /* For gfc_ascii_statement.  */
+#include "omp-api.h"  /* For omp_get_name_from_fr_id.  */
+#include "gomp-constants.h"  /* For GOMP_INTEROP_IFR_SEPARATOR.  */
 
 /* Keep track of indentation for symbol tree dumps.  */
 static int show_level = 0;
@@ -1537,35 +1539,69 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 	}
   else if (list_type == OMP_LIST_INIT)
 	{
-	  int i = 0;
 	  if (n->u.init.target)
 	fputs ("target,", dumpfile);
 	  if (n->u.init.targetsync)
 	fputs ("targetsync,", dumpfile);
-	  char *prefer_type = n->u.init.str;
-	  if (n->u.init.len)
-	fputs ("prefer_type(", dumpfile);
-	  if (n->u.init.len)
-	while (*prefer_type)
-	  {
-		fputc ('{', dumpfile);
-		if (n->u2.interop_int

Re: [RFC 0/4] Hard Register Constraints

2024-09-12 Thread Stefan Schulze Frielinghaus

On Thu, Sep 12, 2024 at 04:03:33PM +0200, Georg-Johann Lay wrote:
> 
> 
> Am 10.09.24 um 16:20 schrieb Stefan Schulze Frielinghaus:
> > This series introduces hard register constraints.  The first patch
> > enables hard register constraints for asm statements and for
> > machine descriptions.  The subsequent patch adds some basic error
> > handling for asm statements.  The third patch adds some verification of
> > register names used in machine description.  The fourth and last patch
> > adds the feature of rewriting local register asm into hard register
> > constraints.
> > 
> > This series was bootstrapped and regtested on s390.  Furthermore, the
> > new dg-compile tests were verified via cross compilers for the enabled
> > targets.  There is still some fallout if -fdemote-register-asm is used
> > since a couple of features are missing as e.g. erroring out during
> > gimplification if the clobber set of registers intersects with
> > input/output registers.
> > 
> > As a larger test vehicle I've compiled and regtested glibc on s390 using
> > -fdemote-register-asm without any fallout.  On x86_64 this fails due to
> > the limitation that fixed registers are currently not supported for hard
> > register constraints (see commit message of the first patch).  This is
> > also the reason why I'm posting this series already since I was hoping
> > to get some feedback about this limitation.
> > 
> > Furthermore, I've compiled the Linux kernel on s390 and x86_64 with
> > -fdemote-register-asm.  Interestingly, the Linux kernel for x86_64 makes
> > use of the following asm statement:
> > 
> > #define call_on_stack(stack, func, asm_call, argconstr...)  \
> > {   \
> >  register void *tos asm("r11");  \
> >  \
> >  tos = ((void *)(stack));\
> >  \
> >  asm_inline volatile(\
> >  "movq   %%rsp, (%[tos]) \n" \
> >  "movq   %[tos], %%rsp   \n" \
> >  \
> >  asm_call\
> >  \
> >  "popq   %%rsp   \n" \
> >  \
> >  : "+r" (tos), ASM_CALL_CONSTRAINT   \
> >  : [__func] "i" (func), [tos] "r" (tos) argconstr\
> >  : "cc", "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10",   \
> >"memory"  \
> >  );  \
> > }
> > 
> > Note the output
> >"+r" (tos)
> > and the input
> >[tos] "r" (tos)
> > Currently I error out for this since I consider this as two inputs using
> > the same hard register.  One time an implicit input via '+' and a second
> > time via the explicit input.  Thus, actually I would expect a '='
> 
> Would you explain why the two operands are supposed to live in the same
> hard register?
> 
> From my understanding of asm semantics, this gives you two copies of
> tos:  The 1st one may be altered by the asm, and the 2nd one may not be
> changed.  As the operands neither refer to each other by "0" nor don't
> they use the same (single-register) register constraint, there is no
> reason / requirement to allocate the two operands to the same reg, no?

During gimplification an inout operand is canonicalized into one output
and one input operand.  The input operand refers via a digit to the
output operand.  For example

asm ("" : "+r" (x));

is rewritten into

asm ("" : "=r" (x) : "0" (x));
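The canonicalization above can be observed from user code. Below is a minimal GNU C sketch. It is illustrative only, does not involve register asm, and uses hypothetical function names. With an empty template, both spellings simply pass x through unchanged, so the two functions should return the same value. This only shows that the two forms are interchangeable; it says nothing about how the gimplifier implements the rewrite.

```c
/* In-out operand: x is both read and written by the asm.  */
int
inout_form (int x)
{
  __asm__ ("" : "+r" (x));
  return x;
}

/* Equivalent spelling: a write-only output plus an input tied to it
   through the matching constraint "0".  */
int
tied_form (int x)
{
  __asm__ ("" : "=r" (x) : "0" (x));
  return x;
}
```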

I didn't find documentation on how "digit references" behave in combination
with register asm.  At least it is not defined here
https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#index-0-in-constraint
https://gcc.gnu.org/onlinedocs/gcc/Explicit-Register-Variables.html
which is why I brought this up.

In combination with an explicit input, the situation is a bit unclear to
me:

asm ("" : "+r" (x) : "r" (x));

becomes

asm ("" : "=r" (x) : "0" (x), "r" (x));

If x is a register asm, do all three operands end up in the same
register (N.B. this is exactly the situation from above)?  At least, this
is what the current implementation does, and I think it also aligns with
how "aliases" are implemented, which puzzled me at first:

register int x asm ("0") = 42;
register int y asm ("0") = 24;
asm ("" : "=r" (x) : "r" (y));

Anyway, I digress.  I haven't made up my mind how hard registe

Re: [PATCH] c++: explicit spec of constrained member tmpl [PR107522]

2024-09-12 Thread Patrick Palka

(Sorry to resurrect this thread so late, I lost track of this patch...)

On Fri, 2 Dec 2022, Jason Merrill wrote:

> On 12/2/22 09:30, Patrick Palka wrote:
> > On Thu, 1 Dec 2022, Jason Merrill wrote:
> > 
> > > On 12/1/22 14:51, Patrick Palka wrote:
> > > > On Thu, 1 Dec 2022, Jason Merrill wrote:
> > > > 
> > > > > On 12/1/22 11:37, Patrick Palka wrote:
> > > > > > When defining an explicit specialization of a constrained member
> > > > > > template
> > > > > > (of a class template) such as f and g in the below testcase, the
> > > > > > DECL_TEMPLATE_PARMS of the corresponding TEMPLATE_DECL are partially
> > > > > > instantiated, whereas its associated constraints are carried over
> > > > > > from the original template and thus are in terms of the original
> > > > > > DECL_TEMPLATE_PARMS.
> > > > > 
> > > > > But why are they carried over?  We wrote a specification of the
> > > > > constraints in
> > > > > terms of the template parameters of the specialization, why are we
> > > > > throwing
> > > > > that away?
> > > > 
> > > > Using the partially instantiated constraints would require adding a
> > > > special case to satisfaction since during satisfaction we currently
> > > > always use the full set of template arguments (relative to the most
> > > > general template).
> > > 
> > > But not for partial specializations, right?  It seems natural to handle
> > > this
> > > explicit instantiation the way we handle partial specializations, as both
> > > have
> > > their constraints written in terms of their template parameters.
> > 
> > True, but what about the general rule that we don't partially instantiate
> > constraints outside of declaration matching?  Checking satisfaction of
> > partially instantiated constraints here can introduce hard errors during
> > normalization, e.g.
> > 
> >template
> >concept C1 = __same_as(T, void);
> > 
> >template
> >concept C2 = C1;
> > 
> >template
> >concept D = (N == 42);
> > 
> >template
> >struct A {
> >  template
> >  static void f() requires C2 || D;
> >};
> > 
> >template<>
> >template
> >void A::f() requires C2 || D { }
> > 
> >int main() {
> >  A::f<42>();
> >}
> > 
> > Normalization of the partially instantiated constraints will give a
> > hard error due to 'int::type' being ill-formed, whereas the uninstantiated
> > constraints are fine.
> 
> Hmm, interesting point, but in this example that happens because the
> specialization is nonsensical: we wouldn't be normalizing the
> partially-instantiated constraints so much as the ones that the user
> explicitly wrote, so a hard error seems justified.

While the written partially-instantiated constraints are nonsensical,
aren't they only needed for the sake of declaration matching?  That
doesn't necessarily imply that this form of the constraints is what
should prevail.  This is where the analogy with partial specializations
breaks down IMHO: partial specializations own their constraints.

Implementing your desired approach isn't so bad either however.  We
mainly just need to correct for TI_ARGS being relative to the primary
template rather than the partially instantiated template.  Something
like the following?

-- >8 --

Subject: [PATCH] c++: explicit spec of constrained member tmpl [PR107522]

PR c++/107522

gcc/cp/ChangeLog:

* constraint.cc (satisfy_declaration_constraints): Remove
extraneous outer arguments for a partial or explicit
specialization.
* pt.cc (determine_specialization): For an explicit
specialization of a member template, make the partially
instantiated constraints prevail.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-explicit-spec7.C: New test.
---
 gcc/cp/constraint.cc  | 14 -
 gcc/cp/pt.cc  |  7 -
 .../g++.dg/cpp2a/concepts-explicit-spec7.C| 30 +++
 3 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-explicit-spec7.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index ebfcdefd284..4dc4fedc659 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2728,6 +2728,11 @@ satisfy_declaration_constraints (tree t, sat_info info)
   args = TI_ARGS (ti);
   if (inh_ctor_targs)
args = add_outermost_template_args (args, inh_ctor_targs);
+  if (DECL_TEMPLATE_SPECIALIZATION (TI_TEMPLATE (ti)))
+   {
+ tree parms = DECL_TEMPLATE_PARMS (TI_TEMPLATE (ti));
+ args = get_innermost_template_args (args, TMPL_PARMS_DEPTH (parms));
+   }
 }
 
   if (regenerated_lambda_fn_p (t))
@@ -2811,7 +2816,14 @@ satisfy_declaration_constraints (tree t, tree args, sat_info info)
   args = add_to_template_args (outer_args, args);
 }
   else
-args = add_outermost_template_args (t, args);
+{
+  args = add_outermost_template_args (t, args);
+  if (DECL_TEMPLATE_SPEC

Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-12 Thread Martin Storsjö


On Thu, 12 Sep 2024, Evgeny Karpov wrote:


The current binutils implementation does not support offsets of up to 4GB in
the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation; it is limited to 1MB.
This is related to differences between ELF and COFF relocation records.


Yes, I agree.

But I would not consider this a limitation of the binutils implementation;
it is a limitation of the object file format. It can't be worked around
by inventing your own custom relocations, but should instead be worked around
on the code generation side, to avoid needing such large offsets.


This approach is one such way, and quite valid. Another is to generate extra
symbols to allow addressing anything with a smaller offset.



To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21
relocation will remain unchanged, and the workaround below will be applied to
bypass the 1MB offset limitation.


This looks very reasonable - I presume this will make sure that you only 
use the other code form if the offset actually is larger than 1 MB.


For the case when the offset actually is larger than 1 MB, I guess this 
also ends up generating some other instruction sequence than just a "add 
x0, x0, #imm", as the #imm is limited to < 4096. From reading the code, 
it looks like it generates something like "mov x16, #imm; add x0, x0, 
x16"? That's probably quite reasonable.


I don't know how emit_move_insn behaves if the immediates are larger - 
does it generate a sequence of mov/movk to materialize a larger constant? 
Because the range of immediates you can encode in one single mov 
instruction is pretty limited anyway.
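For reference, the A64 ADD (immediate) instruction takes a 12-bit unsigned immediate (0 to 4095), optionally shifted left by 12 (LSL #12). The C sketch below uses hypothetical helper names and does not model GCC's own constant-materialization logic. It checks whether a non-negative offset fits a single ADD. It also counts how many ADDs a low/high 12-bit split would need. Offsets of 24 bits or more need some other sequence, such as MOV/MOVK into a scratch register. Negative offsets (SUB) are ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V fits one ADD (immediate): imm12 with LSL #0 or LSL #12.  */
static bool
add_imm_encodable (uint64_t v)
{
  if (v < 4096)
    return true;                                 /* imm12, LSL #0  */
  return (v & 0xfff) == 0 && (v >> 12) < 4096;   /* imm12, LSL #12 */
}

/* Number of ADD (immediate) instructions needed when V is split into a
   low 12-bit chunk and a high chunk shifted by 12 (0, 1 or 2), or -1 if
   V needs more than 24 bits and must be materialized another way.  */
static int
add_imm_split_count (uint64_t v)
{
  if (v >= ((uint64_t) 1 << 24))
    return -1;
  return ((v & 0xfff) != 0) + ((v >> 12) != 0);
}
```

So any offset below 16MB can be added in at most two ADDs, while a full 1MB-to-4GB range needs a register-based sequence.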


// Martin

[PATCH] Fix factor_out_conditional_operation heuristics for constants

2024-09-12 Thread Andrew Pinski

While working on a different patch, I noticed the heuristics were not
doing the right thing if there were statements before the NOPs/PREDICTs.
(Labels don't have other statements before them.)

This fixes that oversight which was added in r15-3334-gceda727dafba6e.
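The behavioural difference can be modelled with a small C sketch. It uses a hypothetical enum standing in for GIMPLE statement codes and is not GCC's gimple API. Previously only the immediately preceding statement was examined, so an ignorable one was treated as "nothing there". The fix instead walks back over every NOP/PREDICT/LABEL and then inspects whatever real statement precedes them.

```c
#include <stddef.h>

/* Hypothetical stand-ins for GIMPLE statement codes.  */
enum stmt_kind { STMT_NOP, STMT_PREDICT, STMT_LABEL, STMT_ASSIGN, STMT_CALL };

static int
ignorable (enum stmt_kind k)
{
  return k == STMT_NOP || k == STMT_PREDICT || k == STMT_LABEL;
}

/* New behaviour: index of the closest non-ignorable statement before
   POS, or -1 if the block holds nothing else before it.  */
static ptrdiff_t
prev_relevant_stmt (const enum stmt_kind *stmts, ptrdiff_t pos)
{
  ptrdiff_t i = pos - 1;
  while (i >= 0 && ignorable (stmts[i]))
    i--;
  return i;
}

/* Old behaviour: look only at the immediately preceding statement and
   treat an ignorable one as if nothing were there (-1).  */
static ptrdiff_t
prev_stmt_old (const enum stmt_kind *stmts, ptrdiff_t pos)
{
  ptrdiff_t i = pos - 1;
  if (i < 0 || ignorable (stmts[i]))
    return -1;
  return i;
}
```

For a block { ASSIGN, PREDICT, CALL }, the old logic sees nothing before the CALL, while the new logic finds the ASSIGN at index 0.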

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Instead
of just ignoring a NOP/PREDICT, skip over them before checking
the heuristics.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 5710bc32e61..e5413e40572 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -332,15 +332,17 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
{
  gsi = gsi_for_stmt (arg0_def_stmt);
  gsi_prev_nondebug (&gsi);
+ /* Ignore nops, predicates and labels. */
+ while (!gsi_end_p (gsi)
+ && (gimple_code (gsi_stmt (gsi)) == GIMPLE_NOP
+ || gimple_code (gsi_stmt (gsi)) == GIMPLE_PREDICT
+ || gimple_code (gsi_stmt (gsi)) == GIMPLE_LABEL))
+   gsi_prev_nondebug (&gsi);
+
  if (!gsi_end_p (gsi))
{
  gimple *stmt = gsi_stmt (gsi);
- /* Ignore nops, predicates and labels. */
- if (gimple_code (stmt) == GIMPLE_NOP
- || gimple_code (stmt) == GIMPLE_PREDICT
- || gimple_code (stmt) == GIMPLE_LABEL)
-   ;
- else if (gassign *assign = dyn_cast  (stmt))
+ if (gassign *assign = dyn_cast  (stmt))
{
  tree lhs = gimple_assign_lhs (assign);
  enum tree_code ass_code
-- 
2.43.0

Re: [PATCH] c++/modules: Merge default arguments [PR99274]

2024-09-12 Thread Patrick Palka

On Fri, 23 Aug 2024, Nathaniel Shead wrote:

> On Thu, Aug 22, 2024 at 02:20:14PM -0400, Patrick Palka wrote:
> > On Mon, 12 Aug 2024, Nathaniel Shead wrote:
> > 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > 
> > > I tried to implement a remapping of the slots for TARGET_EXPRs for the
> > > FIXME but I wasn't able to work out how to do so effectively.  Given
> > > that I doubt this will be a common issue I felt probably easiest to
> > > leave it for now and focus on other issues in the meantime; thoughts?
> > > 
> > > The other thing to note is that most of this function just has a single
> > > error message always indicated by a 'goto mismatch;' but I felt that it
> > > seemed reasonable to provide more specific error messages where we can.
> > > But given that in the long term we probably want to replace this
> > > function with an appropriately enhanced 'duplicate_decls' anyway maybe
> > > it's not worth worrying about; this patch is still useful in the
> > > meantime if only for the testcases, I hope.
> > > 
> > > -- >8 --
> > > 
> > > When merging a newly imported declaration with an existing declaration
> > > we don't currently propagate new default arguments, which causes issues
> > > when modularising header units.  This patch adds logic to propagate
> > > default arguments to existing declarations on import, and error if the
> > > defaults do not match.
> > > 
> > >   PR c++/99274
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * module.cc (trees_in::is_matching_decl): Merge default
> > >   arguments.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/modules/default-arg-1_a.H: New test.
> > >   * g++.dg/modules/default-arg-1_b.C: New test.
> > >   * g++.dg/modules/default-arg-2_a.H: New test.
> > >   * g++.dg/modules/default-arg-2_b.C: New test.
> > >   * g++.dg/modules/default-arg-3.h: New test.
> > >   * g++.dg/modules/default-arg-3_a.H: New test.
> > >   * g++.dg/modules/default-arg-3_b.C: New test.
> > > 
> > > Signed-off-by: Nathaniel Shead 
> > > ---
> > >  gcc/cp/module.cc  | 62 ++-
> > >  .../g++.dg/modules/default-arg-1_a.H  | 17 +
> > >  .../g++.dg/modules/default-arg-1_b.C  | 26 
> > >  .../g++.dg/modules/default-arg-2_a.H  | 17 +
> > >  .../g++.dg/modules/default-arg-2_b.C  | 28 +
> > >  gcc/testsuite/g++.dg/modules/default-arg-3.h  | 13 
> > >  .../g++.dg/modules/default-arg-3_a.H  |  5 ++
> > >  .../g++.dg/modules/default-arg-3_b.C  |  6 ++
> > >  8 files changed, 171 insertions(+), 3 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-1_a.H
> > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-1_b.C
> > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-2_a.H
> > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-2_b.C
> > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3.h
> > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3_a.H
> > >  create mode 100644 gcc/testsuite/g++.dg/modules/default-arg-3_b.C
> > > 
> > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > > index f4d137b13a1..87f34bac578 100644
> > > --- a/gcc/cp/module.cc
> > > +++ b/gcc/cp/module.cc
> > > @@ -11551,8 +11551,6 @@ trees_in::is_matching_decl (tree existing, tree decl, bool is_typedef)
> > >  
> > > if (!same_type_p (TREE_VALUE (d_args), TREE_VALUE (e_args)))
> > >   goto mismatch;
> > > -
> > > -   // FIXME: Check default values
> > >   }
> > >  
> > >/* If EXISTING has an undeduced or uninstantiated exception
> > > @@ -11690,7 +11688,65 @@ trees_in::is_matching_decl (tree existing, tree decl, bool is_typedef)
> > >if (!DECL_EXTERNAL (d_inner))
> > >  DECL_EXTERNAL (e_inner) = false;
> > >  
> > > -  // FIXME: Check default tmpl and fn parms here
> > > +  if (TREE_CODE (decl) == TEMPLATE_DECL)
> > > +{
> > > +  /* Merge default template arguments.  */
> > > +  tree d_parms = DECL_INNERMOST_TEMPLATE_PARMS (decl);
> > > +  tree e_parms = DECL_INNERMOST_TEMPLATE_PARMS (existing);
> > > +  gcc_checking_assert (TREE_VEC_LENGTH (d_parms)
> > > +== TREE_VEC_LENGTH (e_parms));
> > > +  for (int i = 0; i < TREE_VEC_LENGTH (d_parms); ++i)
> > > + {
> > > +   tree d_default = TREE_PURPOSE (TREE_VEC_ELT (d_parms, i));
> > > +   tree& e_default = TREE_PURPOSE (TREE_VEC_ELT (e_parms, i));
> > > +   if (e_default == NULL_TREE)
> > > + e_default = d_default;
> > > +   else if (d_default != NULL_TREE
> > > +&& !cp_tree_equal (d_default, e_default))
> > > + {
> > > +   auto_diagnostic_group d;
> > > +   tree d_parm = TREE_VALUE (TREE_VEC_ELT (d_parms, i));
> > > +   tree e_parm = TREE_VALUE (TREE_VEC_ELT (e_parms, i));
> > > +   error_at (DECL_SOURCE_LOCATION (d_parm),
> > > + "conflicting default argument for %#qD", d_parm);
> > > +
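
The visible part of this patch merges default template arguments using a simple rule. If the existing declaration has no default for a parameter, it adopts the imported one. If both have defaults and they are not equal (`cp_tree_equal` in the patch), a conflict is diagnosed. Below is a minimal sketch of that rule. It assumes defaults are modelled as optional strings compared textually and that a conflict is signalled by an exception. `Defaults` and `merge_defaults` are hypothetical names, not GCC APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model: one entry per parameter, holding its default (if any).
using Defaults = std::vector<std::optional<std::string>>;

// Merge the defaults seen on an imported declaration into the existing one:
// a missing default on EXISTING is filled in from IMPORTED, while two present
// defaults that differ are reported as a conflict.
void merge_defaults(Defaults& existing, const Defaults& imported) {
  assert(existing.size() == imported.size());  // like the gcc_checking_assert
  for (std::size_t i = 0; i < existing.size(); ++i) {
    if (!existing[i])
      existing[i] = imported[i];
    else if (imported[i] && *imported[i] != *existing[i])
      throw std::runtime_error("conflicting default argument for parameter " +
                               std::to_string(i));
  }
}
```

Comparing textually is a simplification. The patch compares trees structurally, so two spellings of the same expression could be treated differently here.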

Re: [PATCH v2] c++: Fix constrained auto deduction templ parms resolution [PR114915, PR115030]

2024-09-12 Thread Patrick Palka

On Mon, 12 Aug 2024, Seyed Sajad Kahani wrote:

> When deducing auto for `adc_return_type`, `adc_variable_type`, and
> `adc_decomp_type` contexts (at the point of use), we try to resolve the
> outermost template arguments to be used for satisfaction. This is done by
> one of the following, depending on the scope:
> 
> 1. Checking the `DECL_TEMPLATE_INFO` of the current function scope and
> extracting `DECL_TI_ARGS` from it for function scope deductions (pt.cc:31236).
> 2. Checking the `DECL_TEMPLATE_INFO` of the declaration (along with other
> conditions) for non-function scope variable declaration deductions
> (decl.cc:8527).
> 
> Note that `DECL_TI_ARGS` for partial and explicit specializations will yield
> the arguments with respect to the most_general_template, which is the
> primary template. This can lead to rejection of valid code or acceptance of
> invalid code (PR115030) in a partial specialization context. For an
> explicitly specialized case, the mismatch between the desired depth and the
> actual depth of the args can lead to ICEs (PR114915): we try to fill the
> missing levels with dummy levels (pt.cc:31260) even though the number of
> missing levels is negative.
> 
> This patch resolves PR114915 and PR115030 by replacing the logic of
> extracting args for the declaration in those two places with
> `outer_template_args`. `outer_template_args` is an existing function that
> was used in limited contexts to do so. Now, it is extended to handle partial
> and explicit specializations and lambda functions as well. A few inevitable
> changes are also made to the signature of some functions, relaxing
> `const_tree` to `tree`.

Thanks for working on this and for the patch ping!

> 
>   PR c++/114915
>   PR c++/115030
> 
> gcc/cp/ChangeLog:
> 
>   * constraint.cc (maybe_substitute_reqs_for): Relax the argument type to
>   be compatible with outer_template_args.
>   * cp-tree.h (outer_template_args): Relax the argument type and add an
>   optional argument.
>   (maybe_substitute_reqs_for): Relax the argument type to be compatible
>   with outer_template_args.
>   * decl.cc (cp_finish_decl): Replace the logic of extracting args with
>   outer_template_args.
>   * pt.cc (outer_template_args): Handle partial and explicit
>   specializations and lambda functions.

Jason, I'm not sure if you had in mind extending outer_template_args
with a flag like you have, or creating a new function?

>   (do_auto_deduction): Replace the logic of extracting args with
>   outer_template_args.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-placeholder14.C: New test.
>   * g++.dg/cpp2a/concepts-placeholder15.C: New test.
>   * g++.dg/cpp2a/concepts-placeholder16.C: New test.
>   * g++.dg/cpp2a/concepts-placeholder17.C: New test.
> ---
>  gcc/cp/constraint.cc  |  2 +-
>  gcc/cp/cp-tree.h  |  4 +-
>  gcc/cp/decl.cc|  2 +-
>  gcc/cp/pt.cc  | 84 +--
>  .../g++.dg/cpp2a/concepts-placeholder14.C | 19 +
>  .../g++.dg/cpp2a/concepts-placeholder15.C | 26 ++
>  .../g++.dg/cpp2a/concepts-placeholder16.C | 33 
>  .../g++.dg/cpp2a/concepts-placeholder17.C | 20 +
>  8 files changed, 161 insertions(+), 29 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder15.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder16.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder17.C
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index ebf4255e5..a1c3962c4 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -1332,7 +1332,7 @@ remove_constraints (tree t)
> for declaration matching.  */
>  
>  tree
> -maybe_substitute_reqs_for (tree reqs, const_tree decl)
> +maybe_substitute_reqs_for (tree reqs, tree decl)
>  {
>if (reqs == NULL_TREE)
>  return NULL_TREE;
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 9a8c86591..2d6733f57 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7215,7 +7215,7 @@ extern tree maybe_set_retval_sentinel   (void);
>  extern tree template_parms_to_args   (tree);
>  extern tree template_parms_level_to_args (tree);
>  extern tree generic_targs_for(tree);
> -extern tree outer_template_args  (const_tree);
> +extern tree outer_template_args  (tree, bool = true);
>  
>  /* in expr.cc */
>  extern tree cplus_expand_constant(tree);
> @@ -8560,7 +8560,7 @@ extern void remove_constraints  (tree);
>  extern tree current_template_constraints (void);
>  extern tree associate_classtype_constraints (tree);
>  extern tree build_constraints   (tre

Re: [PATCH 1/2] c++: Make __builtin_launder reject invalid types [PR116673]

2024-09-12 Thread Patrick Palka

On Thu, 12 Sep 2024, Jonathan Wakely wrote:

> Tested x86_64-linux. OK for trunk?
> 
> -- >8 --
> 
> The standard says that std::launder is ill-formed for function pointers
> and cv void pointers, so there's no reason for __builtin_launder to
> accept them. This change allows implementations of std::launder to defer
> to the built-in for error checking, although libstdc++ will continue to
> diagnose it directly for more user-friendly diagnostics.
> 
>   PR c++/116673
> 
> gcc/cp/ChangeLog:
> 
>   * semantics.cc (finish_builtin_launder): Diagnose function
>   pointers and cv void pointers.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1z/launder10.C: New test.
> ---
>  gcc/cp/semantics.cc| 17 +
>  gcc/testsuite/g++.dg/cpp1z/launder10.C | 15 +++
>  2 files changed, 28 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1z/launder10.C
> 
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 63212afafb3..b194b01f865 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -13482,11 +13482,20 @@ finish_builtin_launder (location_t loc, tree arg, tsubst_flags_t complain)
>  arg = decay_conversion (arg, complain);
>if (error_operand_p (arg))
>  return error_mark_node;
> -  if (!type_dependent_expression_p (arg)
> -  && !TYPE_PTR_P (TREE_TYPE (arg)))
> +  if (!type_dependent_expression_p (arg))
>  {
> -  error_at (loc, "non-pointer argument to %<__builtin_launder%>");
> -  return error_mark_node;
> +  tree type = TREE_TYPE (arg);
> +  if (!TYPE_PTR_P (type))
> + {
> +   error_at (loc, "non-pointer argument to %<__builtin_launder%>");

Do we care about making this builtin SFINAE-friendly by guarding these
errors with tf_error?

> +   return error_mark_node;
> + }
> +  else if (!object_type_p (TREE_TYPE (type)))
> + {
> +   // std::launder is ill-formed for function and cv void pointers.
> +   error_at (loc, "invalid argument to %<__builtin_launder%>");
> +   return error_mark_node;
> + }
>  }
>if (processing_template_decl)
>  arg = orig_arg;
> diff --git a/gcc/testsuite/g++.dg/cpp1z/launder10.C b/gcc/testsuite/g++.dg/cpp1z/launder10.C
> new file mode 100644
> index 000..7c15eeb891f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1z/launder10.C
> @@ -0,0 +1,15 @@
> +// PR c++/116673
> +// { dg-do compile }
> +
> +void
> +bar (void *p)
> +{
> +  __builtin_launder (bar); // { dg-error {invalid argument to '__builtin_launder'} }
> +  __builtin_launder (p);   // { dg-error {invalid argument to '__builtin_launder'} }
> +  const void* cp = p;
> +  __builtin_launder (cp);  // { dg-error {invalid argument to '__builtin_launder'} }
> +  volatile void* vp = p;
> +  __builtin_launder (vp);  // { dg-error {invalid argument to '__builtin_launder'} }
> +  const volatile void* cvp = p;
> +  __builtin_launder (cvp); // { dg-error {invalid argument to '__builtin_launder'} }
> +}
> -- 
> 2.46.0
> 
>
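
The rule being enforced is that the argument must have pointer-to-object type. Function pointers and pointers to (cv-qualified) `void` are therefore rejected. The sketch below mirrors that rule with standard type traits, assuming C++17. `launderable_v` is a hypothetical helper for illustration, not part of libstdc++ or GCC. The second half shows an ordinary valid use of `std::launder` on storage reused by placement new.

```cpp
#include <cassert>
#include <new>
#include <type_traits>

// Hypothetical trait: true only for pointers to object types, which is the
// set of argument types std::launder (and now __builtin_launder) accepts.
template <typename P>
constexpr bool launderable_v =
    std::is_pointer_v<P> && std::is_object_v<std::remove_pointer_t<P>>;

static_assert(launderable_v<int*>);
static_assert(launderable_v<const int*>);
static_assert(!launderable_v<void*>);                 // cv void pointers
static_assert(!launderable_v<const volatile void*>);
static_assert(!launderable_v<void (*)(void*)>);       // function pointers
static_assert(!launderable_v<int>);                   // non-pointers

struct X { const int n; };

// Create an X in raw storage and read it back. BUF itself does not point to
// an X object, so std::launder is used to obtain a usable X* for it.
int make_and_read() {
  alignas(X) unsigned char buf[sizeof(X)];
  ::new (static_cast<void*>(buf)) X{42};
  X* p = std::launder(reinterpret_cast<X*>(buf));
  int v = p->n;
  p->~X();
  return v;
}
```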

[committed] i386: Implement SAT_ADD for signed vector integers

2024-09-12 Thread Uros Bizjak

Enable V4QI, V2QI and V2HI mode signed saturated arithmetic insn patterns
and add a couple of testcases to test for PADDSB and PADDSW instructions.

PR target/112600

gcc/ChangeLog:

* config/i386/mmx.md (3): Rename
from *3.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-3a.c: New test.
* gcc.target/i386/pr112600-3b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 2f8d958dd5f..e88a06c441f 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3218,7 +3218,7 @@ (define_insn "*mmx_3"
(set_attr "type" "mmxadd,sseadd,sseadd")
(set_attr "mode" "DI,TI,TI")])
 
-(define_insn "*3"
+(define_insn "3"
   [(set (match_operand:VI_16_32 0 "register_operand" "=x,Yw")
 (sat_plusminus:VI_16_32
  (match_operand:VI_16_32 1 "register_operand" "0,Yw")
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-3a.c b/gcc/testsuite/gcc.target/i386/pr112600-3a.c
new file mode 100644
index 000..0c38659643d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-3a.c
@@ -0,0 +1,25 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+#define MIN -128
+#define MAX 127
+
+typedef char T;
+typedef unsigned char UT;
+
+void foo (T *out, T *op_1, T *op_2, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+{
+  T x = op_1[i];
+  T y = op_2[i];
+  T sum = (UT) x + (UT) y;
+
+  out[i] = (x ^ y) < 0 ? sum : (sum ^ x) >= 0 ? sum : x < 0 ? MIN : MAX;
+}
+}
+
+/* { dg-final { scan-assembler "paddsb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-3b.c b/gcc/testsuite/gcc.target/i386/pr112600-3b.c
new file mode 100644
index 000..746c422ceb9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-3b.c
@@ -0,0 +1,25 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+#define MIN -32768
+#define MAX 32767
+
+typedef short T;
+typedef unsigned short UT;
+
+void foo (T *out, T *op_1, T *op_2, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+{
+  T x = op_1[i];
+  T y = op_2[i];
+  T sum = (UT) x + (UT) y;
+
+  out[i] = (x ^ y) < 0 ? sum : (sum ^ x) >= 0 ? sum : x < 0 ? MIN : MAX;
+}
+}
+
+/* { dg-final { scan-assembler "paddsw" } } */
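
The testcases express signed saturating addition as a wrapping add followed by a sign-based overflow check. The scan-assembler lines expect this to be vectorized into PADDSB/PADDSW. Below is a scalar C++ sketch of the 8-bit case, which checks the idiom against a straightforward widen-and-clamp reference. `sat_add_ref` and `sat_add_idiom` are illustrative names. `std::int8_t` stands in for the test's `char`, which is signed on x86.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Reference semantics of a signed saturating 8-bit add (what PADDSB computes
// in each lane): add exactly in int, then clamp to the int8_t range.
std::int8_t sat_add_ref(std::int8_t x, std::int8_t y) {
  int sum = x + y;  // exact: int is wide enough for any two int8_t values
  if (sum > std::numeric_limits<std::int8_t>::max())
    return std::numeric_limits<std::int8_t>::max();
  if (sum < std::numeric_limits<std::int8_t>::min())
    return std::numeric_limits<std::int8_t>::min();
  return static_cast<std::int8_t>(sum);
}

// The idiom from pr112600-3a.c: wrapping add, then overflow detection.
std::int8_t sat_add_idiom(std::int8_t x, std::int8_t y) {
  // Add in unsigned arithmetic; converting back to int8_t wraps modulo 256
  // (guaranteed since C++20, and GCC's documented behaviour before that).
  std::int8_t sum = static_cast<std::int8_t>(static_cast<std::uint8_t>(x) +
                                             static_cast<std::uint8_t>(y));
  if ((x ^ y) < 0)     // operands of opposite sign can never overflow
    return sum;
  if ((sum ^ x) >= 0)  // result has the operands' sign: no overflow
    return sum;
  // Overflow: saturate towards the operands' sign.
  return x < 0 ? std::numeric_limits<std::int8_t>::min()
               : std::numeric_limits<std::int8_t>::max();
}
```

There are only 65536 input pairs, so the two versions can be compared exhaustively. The 16-bit variant in pr112600-3b.c follows the same pattern with `short` and the ±32767/−32768 limits.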

[PATCH v3] c++: deleting explicitly-defaulted functions [PR116162]

2024-09-12 Thread Marek Polacek

On Wed, Sep 11, 2024 at 10:25:34PM -0400, Jason Merrill wrote:
> On 9/11/24 4:08 PM, Marek Polacek wrote:
> > @@ -6503,10 +6504,17 @@ check_bases_and_members (tree t)
> > bool fn_const_p = (copy == 2);
> > if (fn_const_p && !imp_const_p)
> > - /* If the function is defaulted outside the class, we just
> > -give the synthesis error.  Core Issue #1331 says this is
> > -no longer ill-formed, it is defined as deleted instead.  */
> > - DECL_DELETED_FN (fn) = true;
> > + {
> > +   tree implicit_fn
> > + = implicitly_declare_fn (kind, DECL_CONTEXT (fn),
> > +  /*const_p=*/false,
> > +  /*pattern_fn=*/NULL_TREE,
> > +  /*inherited_parms=*/NULL_TREE);
> > +   /* If the function is defaulted outside the class, we just
> > +  give the synthesis error.  Core Issue #1331 says this is
> > +  no longer ill-formed, it is defined as deleted instead.  */
> > +   maybe_delete_defaulted_fn (fn, implicit_fn);
> > + }
> 
> Since we're about to call defaulted_late_check anyway, can we remove all the
> copy ctor handling here?  I don't want to call implicitly_declare_fn twice
> for the same function.

Yeh, I should've done that; it wasn't complicated in the end.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR points out that we're not implementing [dcl.fct.def.default]
properly.  Consider e.g.

  struct C {
 C(const C&&) = default;
  };

where we wrongly emit an error, but the move ctor should just be defined as deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted.  There's an exception for
assignment operators in which case the program is ill-formed.

clang++ warns when an explicitly-defaulted function is defined as deleted,
so this patch adds a similar warning, -Wdefaulted-function-deleted.  I'm also
downgrading an error to a pedwarn in C++17 since the code compiles in C++20.

PR c++/116162

gcc/c-family/ChangeLog:

* c.opt (Wdefaulted-function-deleted): New.

gcc/cp/ChangeLog:

* class.cc (check_bases_and_members): Don't set DECL_DELETED_FN here,
leave it to defaulted_late_check.
* cp-tree.h (maybe_delete_defaulted_fn): Declare.
(defaulted_late_check): Add a tristate parameter.
* method.cc (maybe_delete_defaulted_fn): New.
(defaulted_late_check): Add a tristate parameter.  Call
maybe_delete_defaulted_fn instead of giving an error.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wdefaulted-function-deleted.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/defaulted15.C: Add dg-warning/dg-error.
* g++.dg/cpp0x/defaulted51.C: Likewise.
* g++.dg/cpp0x/defaulted52.C: Likewise.
* g++.dg/cpp0x/defaulted53.C: Likewise.
* g++.dg/cpp0x/defaulted54.C: Likewise.
* g++.dg/cpp0x/defaulted56.C: Likewise.
* g++.dg/cpp0x/defaulted57.C: Likewise.
* g++.dg/cpp0x/defaulted58.C: Likewise.
* g++.dg/cpp0x/defaulted59.C: Likewise.
* g++.dg/cpp0x/defaulted63.C: New test.
* g++.dg/cpp0x/defaulted64.C: New test.
* g++.dg/cpp0x/defaulted65.C: New test.
* g++.dg/cpp23/defaulted1.C: New test.
---
 gcc/c-family/c.opt   |  4 ++
 gcc/cp/class.cc  | 27 ++--
 gcc/cp/cp-tree.h |  3 +-
 gcc/cp/method.cc | 88 +---
 gcc/doc/invoke.texi  |  9 +++
 gcc/testsuite/g++.dg/cpp0x/defaulted15.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted51.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted52.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted53.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted54.C |  2 +
 gcc/testsuite/g++.dg/cpp0x/defaulted56.C |  6 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted57.C |  6 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted58.C |  2 +
 gcc/testsuite/g++.dg/cpp0x/defaulted59.C |  3 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted63.C | 39 +++
 gcc/testsuite/g++.dg/cpp0x/defaulted64.C | 27 
 gcc/testsuite/g++.dg/cpp0x/defaulted65.C | 25 +++
 gcc/testsuite/g++.dg/cpp23/defaulted1.C  | 23 +++
 18 files changed, 236 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted63.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted64.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/defaulted65.C
 create mode 100644 gcc/testsuite/g++.dg/cpp23/defaulted1.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index ec23249c959..98a35f043c7 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -629,6 +629,10 @@ Wdeclaration-missing-parameter-type
 C ObjC Var(warn_declaration_missi
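
The [dcl.fct.def.default] rule can be seen in a case that compilers already handle (the `DECL_DELETED_FN` branch in check_bases_and_members above). If a member's copy constructor takes a non-const reference, the implicitly declared copy constructor of the enclosing class would be `B(B&)`. Explicitly defaulting it as `B(const B&)` therefore defines it as deleted rather than making the program ill-formed. Below is a small sketch assuming C++17. The PR extends the same treatment to cases such as `C(const C&&) = default`, for which GCC previously emitted an error.

```cpp
#include <cassert>
#include <type_traits>

// A member type whose copy constructor takes a non-const reference.
struct A {
  A() = default;
  A(A&) {}
};

// B's implicitly declared copy constructor would be B(B&), because A's copy
// constructor cannot bind a const A.  The const-parameter version below does
// not match it, so it is defined as deleted (Core Issue 1331).
struct B {
  A a;
  B() = default;
  B(const B&) = default;
};

// Copying from a const B selects the deleted constructor.
static_assert(!std::is_copy_constructible_v<B>);
```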

Re: [PATCH 1/2] c++: Make __builtin_launder reject invalid types [PR116673]

2024-09-12 Thread Jonathan Wakely

On Thu, 12 Sept 2024 at 19:38, Patrick Palka  wrote:
>
> On Thu, 12 Sep 2024, Jonathan Wakely wrote:
>
> > Tested x86_64-linux. OK for trunk?
> >
> > -- >8 --
> >
> > The standard says that std::launder is ill-formed for function pointers
> > and cv void pointers, so there's no reason for __builtin_launder to
> > accept them. This change allows implementations of std::launder to defer
> > to the built-in for error checking, although libstdc++ will continue to
> > diagnose it directly for more user-friendly diagnostics.
> >
> >   PR c++/116673
> >
> > gcc/cp/ChangeLog:
> >
> >   * semantics.cc (finish_builtin_launder): Diagnose function
> >   pointers and cv void pointers.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/cpp1z/launder10.C: New test.
> > ---
> >  gcc/cp/semantics.cc| 17 +
> >  gcc/testsuite/g++.dg/cpp1z/launder10.C | 15 +++
> >  2 files changed, 28 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp1z/launder10.C
> >
> > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> > index 63212afafb3..b194b01f865 100644
> > --- a/gcc/cp/semantics.cc
> > +++ b/gcc/cp/semantics.cc
> > @@ -13482,11 +13482,20 @@ finish_builtin_launder (location_t loc, tree arg, tsubst_flags_t complain)
> >  arg = decay_conversion (arg, complain);
> >if (error_operand_p (arg))
> >  return error_mark_node;
> > -  if (!type_dependent_expression_p (arg)
> > -  && !TYPE_PTR_P (TREE_TYPE (arg)))
> > +  if (!type_dependent_expression_p (arg))
> >  {
> > -  error_at (loc, "non-pointer argument to %<__builtin_launder%>");
> > -  return error_mark_node;
> > +  tree type = TREE_TYPE (arg);
> > +  if (!TYPE_PTR_P (type))
> > + {
> > +   error_at (loc, "non-pointer argument to %<__builtin_launder%>");
>
> Do we care about making this builtin SFINAE-friendly by guarding these
> errors with tf_error?

I don't think so. I don't think there's any use case for using
__builtin_launder (or std::launder for that matter) in deduction
contexts.

>
> > +   return error_mark_node;
> > + }
> > +  else if (!object_type_p (TREE_TYPE (type)))
> > + {
> > +   // std::launder is ill-formed for function and cv void pointers.
> > +   error_at (loc, "invalid argument to %<__builtin_launder%>");
> > +   return error_mark_node;
> > + }
> >  }
> >if (processing_template_decl)
> >  arg = orig_arg;
> > diff --git a/gcc/testsuite/g++.dg/cpp1z/launder10.C b/gcc/testsuite/g++.dg/cpp1z/launder10.C
> > new file mode 100644
> > index 000..7c15eeb891f
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp1z/launder10.C
> > @@ -0,0 +1,15 @@
> > +// PR c++/116673
> > +// { dg-do compile }
> > +
> > +void
> > +bar (void *p)
> > +{
> > +  __builtin_launder (bar); // { dg-error {invalid argument to '__builtin_launder'} }
> > +  __builtin_launder (p);   // { dg-error {invalid argument to '__builtin_launder'} }
> > +  const void* cp = p;
> > +  __builtin_launder (cp);  // { dg-error {invalid argument to '__builtin_launder'} }
> > +  volatile void* vp = p;
> > +  __builtin_launder (vp);  // { dg-error {invalid argument to '__builtin_launder'} }
> > +  const volatile void* cvp = p;
> > +  __builtin_launder (cvp); // { dg-error {invalid argument to '__builtin_launder'} }
> > +}
> > --
> > 2.46.0
> >
> >
>

[patch, fortran, committed] module support for UNSIGNED

2024-09-12 Thread Thomas Koenig


Hello world,

I just pushed Steve's patch for module support to trunk as obvious, as
https://gcc.gnu.org/g:2847a541c1f19b67ae84be8d0f6dc8e1f9371d16 .

Best regards

Thomas

gcc/fortran/ChangeLog:

* module.cc (bt_types): Add BT_UNSIGNED.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_kiss.f90: New test.

diff --git a/gcc/fortran/module.cc b/gcc/fortran/module.cc
index c565b84d61b..8cf58ff5142 100644
--- a/gcc/fortran/module.cc
+++ b/gcc/fortran/module.cc
@@ -2781,6 +2781,7 @@ static const mstring bt_types[] = {
 minit ("UNKNOWN", BT_UNKNOWN),
 minit ("VOID", BT_VOID),
 minit ("ASSUMED", BT_ASSUMED),
+minit ("UNSIGNED", BT_UNSIGNED),
 minit (NULL, -1)
 };

diff --git a/gcc/testsuite/gfortran.dg/unsigned_kiss.f90 b/gcc/testsuite/gfortran.dg/unsigned_kiss.f90
new file mode 100644
index 000..46ee86ccd26
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/unsigned_kiss.f90
@@ -0,0 +1,100 @@
+!
+! { dg-do run }
+! { dg-options "-funsigned" }
+!
+! Modern Fortran rewrite of Marsaglia's 64-bit KISS PRNG.
+! https://www.thecodingforums.com/threads/64-bit-kiss-rngs.673657/
+!
+module kissm
+
+   implicit none
+   private
+   public uk, kseed, kiss
+
+   integer, parameter :: uk = kind(1u_8)  ! Check kind() works.
+
+   ! Default seeds.  Checks unsigned with parameter attribute.
+   unsigned(uk), parameter :: seed(4) = [ &
+   &  1234567890987654321u_uk, 362436362436362436u_uk, &
+   &  1066149217761810u_uk, 123456123456123456u_uk ]
+
+   ! Seeds used during generation
+   unsigned(uk), save :: sd(4) = seed
+
+   contains
+
+  ! Tests unsigned in an internal function.
+  function s(x)
+ unsigned(uk) s
+ unsigned(uk), intent(in) :: x
+ s = ishft(x, -63)! Tests ishft
+  end function
+
+  ! Poor seeding routine.  Need to check v for entropy!
+  ! Tests intent(in) and optional attributes.
+  ! Tests ishftc() and array constructors.
+  subroutine kseed(v)
+ unsigned(uk), intent(in), optional :: v
+ if (present(v)) then
+sd = seed + [ishftc(v,1), ishftc(v,15), ishftc(v,31), ishftc(v,44)]
+ else
+sd = seed
+ end if
+  end subroutine kseed
+
+  function kiss()
+ unsigned(uk) kiss
+ unsigned(uk) m, t
+ integer k
+
+ ! Test unsigned in a statement function
+ m(t, k) = ieor(t, ishft(t, k))
+
+ t = ishft(sd(1), 58) + sd(4)
+ if (s(sd(1)) == s(t)) then
+sd(4) = ishft(sd(1), -6) + s(sd(1))
+ else
+sd(4) = ishft(sd(1), -6) + 1u_uk - s(sd(1) + t)
+ endif
+
+ sd(1) = t + sd(1)
+ sd(2) = m(m(m(sd(2), 13), -17), 43)
+ sd(3) = 6906969069u_uk * sd(3) + 1234567u_uk
+ kiss = sd(1) + sd(2) + sd(3)
+  end function kiss
+
+end module kissm
+
+program testkiss
+   use kissm
+   integer, parameter :: n = 4
+   unsigned(uk) prn(4)
+
+   ! Default sequence
+   unsigned(uk), parameter :: a(4) = [8932985056925012148u_uk, &
+   &  5710300428094272059u_uk, 18342510866933518593u_uk,   &
+   &  14303636270573868250u_uk]
+
+   ! Sequence with the seed 123412341234u_uk
+   unsigned(uk), parameter :: b(4) = [4002508872477953753u_uk, &
+   &  18025327658415290923u_uk,  16058856976144281263u_uk, &
+   &  11842224026193909403u_uk]
+
+   do i = 1, n
+  prn(i) = kiss()
+   end do
+   if (any(prn /= a)) stop 1
+
+   call kseed(123412341234u_uk)
+   do i = 1, n
+  prn(i) = kiss()
+   end do
+   if (any(prn /= b)) stop 2
+
+   call kseed()
+   do i = 1, n
+  prn(i) = kiss()
+   end do
+   if (any(prn /= a)) stop 3
+
+end program testkiss
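
For readers less familiar with the Fortran UNSIGNED extension, below is a sketch of the same generator ported to C++ with `std::uint64_t`. Its arithmetic wraps modulo 2^64, like `unsigned(uk)` in the test. The port is a hypothetical transliteration of module `kissm`, and the class and member names are made up. `sd_[0]` and `sd_[3]` form the multiply-with-carry pair, `sd_[1]` is the xorshift state and `sd_[2]` the congruential state. `ishft` with a negative count becomes a logical right shift, and `ishftc` with a positive count becomes a left rotation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical C++ transliteration of module kissm from unsigned_kiss.f90.
class Kiss64 {
 public:
  using u64 = std::uint64_t;

  Kiss64() { seed(); }

  // kseed() without an argument: restore the default seeds.
  void seed() { sd_ = kSeed; }

  // kseed(v): perturb the default seeds with rotations of v.
  void seed(u64 v) {
    sd_ = {kSeed[0] + rotl(v, 1), kSeed[1] + rotl(v, 15),
           kSeed[2] + rotl(v, 31), kSeed[3] + rotl(v, 44)};
  }

  // kiss(): one step of the combined generator.
  u64 next() {
    // Multiply-with-carry on sd_[0] (value) and sd_[3] (carry); the carry is
    // computed from the old sd_[0] before sd_[0] is updated.
    u64 t = (sd_[0] << 58) + sd_[3];
    if (sign(sd_[0]) == sign(t))
      sd_[3] = (sd_[0] >> 6) + sign(sd_[0]);
    else
      sd_[3] = (sd_[0] >> 6) + 1 - sign(sd_[0] + t);
    sd_[0] += t;
    // Xorshift: m(m(m(y, 13), -17), 43) in the Fortran.
    u64 y = sd_[1];
    y ^= y << 13;
    y ^= y >> 17;
    y ^= y << 43;
    sd_[1] = y;
    // Linear congruential step.
    sd_[2] = 6906969069ULL * sd_[2] + 1234567ULL;
    return sd_[0] + sd_[1] + sd_[2];
  }

 private:
  static constexpr std::array<u64, 4> kSeed = {
      1234567890987654321ULL, 362436362436362436ULL,
      1066149217761810ULL, 123456123456123456ULL};

  // s() in the Fortran: the top bit of x.
  static u64 sign(u64 x) { return x >> 63; }
  // ishftc(x, k) for 0 < k < 64: rotate left by k bits.
  static u64 rotl(u64 x, int k) { return (x << k) | (x >> (64 - k)); }

  std::array<u64, 4> sd_{};
};
```

If the transliteration is faithful, the first four values after construction should reproduce array `a` in the Fortran test, and calling `seed()` restarts that sequence.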

[PATCH] libquadmath: Fix typos

2024-09-12 Thread Andrew Kreimer

Fix typos in documentation, comments, etc.

Signed-off-by: Andrew Kreimer 
---
 libquadmath/configure  | 2 +-
 libquadmath/math/rem_pio2q.c   | 2 +-
 libquadmath/printf/printf_fp.c | 2 +-
 libquadmath/update-quadmath.py | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/libquadmath/configure b/libquadmath/configure
index 49d70809218..0f2ddeb133b 100755
--- a/libquadmath/configure
+++ b/libquadmath/configure
@@ -15597,7 +15597,7 @@ func_basename ()
 # to NONDIR_REPLACEMENT.
 # value returned in "$func_dirname_result"
 #   basename: Compute filename of FILE.
-# value retuned in "$func_basename_result"
+# value returned in "$func_basename_result"
 # Implementation must be kept synchronized with func_dirname
 # and func_basename. For efficiency, we do not delegate to
 # those functions but instead duplicate the functionality here.
diff --git a/libquadmath/math/rem_pio2q.c b/libquadmath/math/rem_pio2q.c
index 3308b218473..5835b8d3349 100644
--- a/libquadmath/math/rem_pio2q.c
+++ b/libquadmath/math/rem_pio2q.c
@@ -45,7 +45,7 @@
  * z= (z-x[i])*2**24
  *
  *
- * y[] ouput result in an array of double precision numbers.
+ * y[] output result in an array of double precision numbers.
  * The dimension of y[] is:
  * 24-bit  precision   1
  * 53-bit  precision   2
diff --git a/libquadmath/printf/printf_fp.c b/libquadmath/printf/printf_fp.c
index 9968aa5307c..3dd55f3c7f8 100644
--- a/libquadmath/printf/printf_fp.c
+++ b/libquadmath/printf/printf_fp.c
@@ -1195,7 +1195,7 @@ __quadmath_printf_fp (struct __quadmath_printf_file *fp,
 
  /* Now copy the wide character string.  Since the character
 (except for the decimal point and thousands separator) must
-be coming from the ASCII range we can esily convert the
+be coming from the ASCII range we can easily convert the
 string without mapping tables.  */
  for (cp = buffer, copywc = wstartp; copywc < wcp; ++copywc)
if (*copywc == decimalwc)
diff --git a/libquadmath/update-quadmath.py b/libquadmath/update-quadmath.py
index d40b2724dd3..27317ef92c1 100755
--- a/libquadmath/update-quadmath.py
+++ b/libquadmath/update-quadmath.py
@@ -90,7 +90,7 @@ def update_sources(glibc_srcdir, quadmath_srcdir):
   'GET_LDOUBLE_WORDS64', 'SET_LDOUBLE_LSW64',
   'SET_LDOUBLE_MSW64', 'SET_LDOUBLE_WORDS64'):
 repl_names[macro] = macro.replace('LDOUBLE', 'FLT128')
-# The classication macros are replaced.
+# The classification macros are replaced.
 for macro in ('FP_NAN', 'FP_INFINITE', 'FP_ZERO', 'FP_SUBNORMAL',
   'FP_NORMAL'):
 repl_names[macro] = 'QUAD' + macro
-- 
2.46.0

[committed] libstdc++: Remove unused alias template in std::optional

2024-09-12 Thread Jonathan Wakely

Tested x86_64-linux. Pushed to trunk.

-- >8 --

I added this __is_bool alias template in r15-2309-g6d86486292acbe but
it isn't actually used so can be removed.

libstdc++-v3/ChangeLog:

* include/std/optional (__is_bool): Remove.
---
 libstdc++-v3/include/std/optional | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index 933a5b15e56..6a8e76f60e3 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -850,8 +850,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
using __not_self = __not_>>;
   template
using __not_tag = __not_>>;
-  template
-   using __is_bool = is_same, bool>;
   template
using _Requires = enable_if_t<__and_v<_Cond...>, bool>;
 #endif
-- 
2.46.0

[PATCH 1/2 v2] c++: Make __builtin_launder reject invalid types [PR116673]

2024-09-12 Thread Jonathan Wakely

On Thu, 12 Sept 2024 at 16:02, Jason Merrill wrote:
>
> On 9/12/24 4:49 AM, Jonathan Wakely wrote:
> > Tested x86_64-linux. OK for trunk?
> >
> > -- >8 --
> >
> > The standard says that std::launder is ill-formed for function pointers
> > and cv void pointers, so there's no reason for __builtin_launder to
> > accept them. This change allows implementations of std::launder to defer
> > to the built-in for error checking, although libstdc++ will continue to
> > diagnose it directly for more user-friendly diagnostics.
> >
> >   PR c++/116673
> >
> > gcc/cp/ChangeLog:
> >
> >   * semantics.cc (finish_builtin_launder): Diagnose function
> >   pointers and cv void pointers.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/cpp1z/launder10.C: New test.
> > ---
> >   gcc/cp/semantics.cc| 17 +
> >   gcc/testsuite/g++.dg/cpp1z/launder10.C | 15 +++
> >   2 files changed, 28 insertions(+), 4 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1z/launder10.C
> >
> > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> > index 63212afafb3..b194b01f865 100644
> > --- a/gcc/cp/semantics.cc
> > +++ b/gcc/cp/semantics.cc
> > @@ -13482,11 +13482,20 @@ finish_builtin_launder (location_t loc, tree arg, tsubst_flags_t complain)
> >   arg = decay_conversion (arg, complain);
> > if (error_operand_p (arg))
> >   return error_mark_node;
> > -  if (!type_dependent_expression_p (arg)
> > -  && !TYPE_PTR_P (TREE_TYPE (arg)))
> > +  if (!type_dependent_expression_p (arg))
> >   {
> > -  error_at (loc, "non-pointer argument to %<__builtin_launder%>");
> > -  return error_mark_node;
> > +  tree type = TREE_TYPE (arg);
> > +  if (!TYPE_PTR_P (type))
> > + {
> > +   error_at (loc, "non-pointer argument to %<__builtin_launder%>");
> > +   return error_mark_node;
> > + }
> > +  else if (!object_type_p (TREE_TYPE (type)))
> > + {
> > +   // std::launder is ill-formed for function and cv void pointers.
> > +   error_at (loc, "invalid argument to %<__builtin_launder%>");
>
> Let's be more specific by combining both errors into
>
> "type %qT of argument to %<__builtin_launder"> is not a pointer to
> object type"
>
> The tests can also be combined to !TYPE_PTROB_P.
>
> OK with that change.

Thanks, here's what I pushed.
commit 9fe57e4879de93b6e3c7b4c226f42d5f3a48474f
Author: Jonathan Wakely 
Date:   Wed Sep 11 11:47:44 2024

c++: Make __builtin_launder reject invalid types [PR116673]

The standard says that std::launder is ill-formed for function pointers
and cv void pointers, so there's no reason for __builtin_launder to
accept them. This change allows implementations of std::launder to defer
to the built-in for error checking, although libstdc++ will continue to
diagnose it directly for more user-friendly diagnostics.

PR c++/116673

gcc/cp/ChangeLog:

* semantics.cc (finish_builtin_launder): Diagnose function
pointers and cv void pointers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/launder2.C: Adjust dg-error strings.
* g++.dg/cpp1z/launder10.C: New test.

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 63212afafb3..8219d6410b8 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -13482,10 +13482,10 @@ finish_builtin_launder (location_t loc, tree arg, tsubst_flags_t complain)
 arg = decay_conversion (arg, complain);
   if (error_operand_p (arg))
 return error_mark_node;
-  if (!type_dependent_expression_p (arg)
-  && !TYPE_PTR_P (TREE_TYPE (arg)))
+  if (!type_dependent_expression_p (arg) && !TYPE_PTROB_P (TREE_TYPE (arg)))
 {
-  error_at (loc, "non-pointer argument to %<__builtin_launder%>");
+  error_at (loc, "type %qT of argument to %<__builtin_launder%> "
+   "is not a pointer to object type", TREE_TYPE (arg));
   return error_mark_node;
 }
   if (processing_template_decl)
diff --git a/gcc/testsuite/g++.dg/cpp1z/launder10.C b/gcc/testsuite/g++.dg/cpp1z/launder10.C
new file mode 100644
index 000..2109a2e3839
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/launder10.C
@@ -0,0 +1,15 @@
+// PR c++/116673
+// { dg-do compile }
+
+void
+bar (void *p)
+{
+  __builtin_launder (bar); // { dg-error {argument to '__builtin_launder'} }
+  __builtin_launder (p);   // { dg-error {argument to '__builtin_launder'} }
+  const void* cp = p;
+  __builtin_launder (cp);  // { dg-error {argument to '__builtin_launder'} }
+  volatile void* vp = p;
+  __builtin_launder (vp);  // { dg-error {argument to '__builtin_launder'} }
+  const volatile void* cvp = p;
+  __builtin_launder (cvp); // { dg-error {argument to '__builtin_launder'} }
+}
diff --git a/gcc/testsuite/g++.dg/cpp1z/launder2.C b/gcc/testsuite/g++.dg/cpp1z/launder2.C
index 9cd1779704b..a2d44861265 100644
--- a/gcc/testsuite/g++.dg/cpp1z/launder2.C
+++ b/gcc/testsuite

[PATCH] libstdc++: Enable most of <chrono> for freestanding

2024-09-12 Thread Jonathan Wakely

This restores support for most of <chrono> with -ffreestanding. In case
there are users who want a minimal freestanding implementation that only
provides what the standard guarantees, there's a new macro that disables
<chrono> again. This can be used to write more portable freestanding
code that doesn't rely on <chrono> being usable. As we add other things
to the freestanding subset (e.g.  for PR 113398 and  for
PR 109814) we can add other _GLIBCXX_NO_FREESTANDING_XXX macros, and a
_GLIBCXX_NO_FREESTANDING_EXTRAS to define all of them at once. I haven't
done that in this patch, because there's only the CHRONO one for now.

Tested x86_64-linux.

-- >8 --

This makes durations, time points and calendrical types available for
freestanding. The clocks and time zone utilities are disabled for
freestanding, as they require functions in the hosted lib.

Add support for a new macro _GLIBCXX_NO_FREESTANDING_CHRONO which can be
used to explicitly disable <chrono> for freestanding.
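A minimal sketch of the kind of code this is meant to allow: it uses only durations, time points and (for C++20) calendar types, with no standard clock. The `tick_clock` type, its millisecond period and its `ticks` counter are hypothetical stand-ins for a platform timer. The block also compiles in a hosted build; that it builds with -ffreestanding is an assumption based on the description above, not something verified here.

```cpp
#include <chrono>

// Hypothetical clock driven by a platform tick counter (e.g. a timer
// interrupt), since system_clock and steady_clock are hosted-only.
struct tick_clock
{
  using rep = long long;
  using period = std::milli;             // assumed: one tick per millisecond
  using duration = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<tick_clock, duration>;
  static constexpr bool is_steady = true;

  static inline rep ticks = 0;           // advanced by the platform
  static time_point now() noexcept { return time_point(duration(ticks)); }
};

// Duration arithmetic and rounding need no OS support.
constexpr std::chrono::seconds
timeout_budget(std::chrono::milliseconds elapsed)
{
  using namespace std::chrono;
  return floor<seconds>(minutes(1) - elapsed);
}

#if __cplusplus >= 202002L
// C++20 calendar types are plain arithmetic as well.
constexpr bool is_leap_2024 = std::chrono::year(2024).is_leap();
#endif
```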

libstdc++-v3/ChangeLog:

* doc/xml/manual/using.xml (_GLIBCXX_NO_FREESTANDING_CHRONO):
Document macro.
* doc/html/*: Regenerate.
* include/bits/chrono.h [_GLIBCXX_NO_FREESTANDING_CHRONO]:
Only include  when this macro is defined.
[_GLIBCXX_HOSTED]: Only define clocks for hosted.
* include/bits/version.def (chrono_udls): Remove hosted=yes.
* include/bits/version.h: Regenerate.
* include/std/chrono [_GLIBCXX_HOSTED]: Only define clocks and
time zone utilities for hosted.
* testsuite/std/time/freestanding.cc: New test.
---
 .../doc/html/manual/using_macros.html |  7 +++
 libstdc++-v3/doc/xml/manual/using.xml | 12 +
 libstdc++-v3/include/bits/chrono.h| 24 ++---
 libstdc++-v3/include/bits/version.def |  1 -
 libstdc++-v3/include/bits/version.h   |  2 +-
 libstdc++-v3/include/std/chrono   | 24 +++--
 .../testsuite/std/time/freestanding.cc| 52 +++
 7 files changed, 109 insertions(+), 13 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/freestanding.cc

diff --git a/libstdc++-v3/doc/html/manual/using_macros.html b/libstdc++-v3/doc/html/manual/using_macros.html
index ae564692630..67623b5e2af 100644
--- a/libstdc++-v3/doc/html/manual/using_macros.html
+++ b/libstdc++-v3/doc/html/manual/using_macros.html
@@ -124,4 +124,11 @@
 must be present on all vector operations or none, so this macro must
 be defined to the same value for all translation units that create,
 destroy, or modify vectors.
+  _GLIBCXX_NO_FREESTANDING_CHRONO
+   Undefined by default. When defined, the
+    header cannot
+   be used with -ffreestanding.
+   When not defined, durations, time points, and calendar types are
+   available for freestanding, but the standard clocks and the time zone
+   database are not (because they require OS support).
  Prev Up Next Headers Home
Dual ABI
\ No newline at end of file
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index 6675359f3b3..4e1c70040b5 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -1321,6 +1321,18 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe
 destroy, or modify vectors.
   
 
+
+_GLIBCXX_NO_FREESTANDING_CHRONO
+
+  
+   Undefined by default. When defined, the
+    header cannot
+   be used with -ffreestanding.
+   When not defined, durations, time points, and calendar types are
+   available for freestanding, but the standard clocks and the time zone
+   database are not (because they require OS support).
+  
+
 
 
   
diff --git a/libstdc++-v3/include/bits/chrono.h b/libstdc++-v3/include/bits/chrono.h
index 0773867da71..fd9c4642f4f 100644
--- a/libstdc++-v3/include/bits/chrono.h
+++ b/libstdc++-v3/include/bits/chrono.h
@@ -37,7 +37,9 @@
 #include 
 #include 
 #include 
-#include 
+#if _GLIBCXX_HOSTED
+# include 
+#endif
 #include  // for literals support.
 #if __cplusplus >= 202002L
 # include 
@@ -50,7 +52,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-#if __cplusplus >= 201703L
+#if __cplusplus >= 201703L && _GLIBCXX_HOSTED
   namespace filesystem { struct __file_clock; };
 #endif
 
@@ -372,7 +374,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { };
 #endif // C++20
 
-#ifdef __glibcxx_chrono // C++ >= 17 && HOSTED
+#if __cplusplus >= 201703L // C++ >= 17
 /** Convert a `duration` to type `ToDur` and round down.
  *
  * If the duration cannot be represented exactly in the result type,
@@ -1196,6 +1198,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 /// @}
 /// @} group chrono
 
+#if _GLIBCXX_HOSTED
 // Clocks.
 
 // Why nanosecond resolution as the default?
@@ -1310,9 +1313,18 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
 template<> inline constexpr

[PATCH] libstdc++: Refactor loops in std::__platform_semaphore

2024-09-12 Thread Jonathan Wakely

You might notice that this removes handling of EINVAL from the call to
sem_timedwait. That error can only happen with a negative tv_nsec value,
which can only happen for a timestamp before the epoch. We should handle
that properly, not just for the case where tv_nsec happens to be
negative. I opened PR 116586 for that, as it's bigger than just
<semaphore>.
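To see why a negative tv_nsec only arises before the epoch, here is a small sketch of the seconds/nanoseconds split performed in _M_try_acquire_until_impl. It contrasts time_point_cast, which truncates toward zero, with a chrono::floor variant whose remainder always lies in [0, 1s). The helper names are hypothetical, and the floor-based version is only one possible approach, not necessarily the fix adopted for PR 116586.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical seconds/nanoseconds pair mirroring struct timespec.
struct split_time
{
  std::int64_t sec;
  std::int64_t nsec;
};

using ns_point = std::chrono::time_point<std::chrono::system_clock,
                                         std::chrono::nanoseconds>;

// time_point_cast truncates toward zero, so for times before the epoch
// the remainder (and hence tv_nsec) comes out negative.
split_time split_truncating(ns_point t)
{
  using namespace std::chrono;
  auto s = time_point_cast<seconds>(t);
  auto ns = duration_cast<nanoseconds>(t - s);
  return { static_cast<std::int64_t>(s.time_since_epoch().count()),
           static_cast<std::int64_t>(ns.count()) };
}

// floor rounds toward negative infinity, so the remainder is never negative.
split_time split_flooring(ns_point t)
{
  using namespace std::chrono;
  auto s = floor<seconds>(t);
  nanoseconds ns = t - s;
  assert(ns >= nanoseconds::zero() && ns < seconds(1));
  return { static_cast<std::int64_t>(s.time_since_epoch().count()),
           static_cast<std::int64_t>(ns.count()) };
}
```

For -1.5s the truncating split gives {-1, -500000000}, which sem_timedwait rejects with EINVAL, while the flooring split gives {-2, 500000000}. After the epoch both agree.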

Tested x86_64-linux.

-- >8 --

Refactor the loops to all use the same form, and to not need explicit
'break' or 'continue' jumps. This also avoids a -Wunused-variable
warning with -Wsystem-headers.
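The shared loop shape can be sketched without a real POSIX semaphore. To keep the example self-contained it uses a hypothetical `fake_sem` whose `trywait` fails with EINTR a configurable number of times before either succeeding or failing with EAGAIN. The retry structure follows the refactored _M_try_acquire in the diff below.

```cpp
#include <cerrno>
#include <exception>

// Hypothetical stand-in for sem_trywait: fails with EINTR `interrupts`
// times, then succeeds once, then fails with EAGAIN.
struct fake_sem
{
  int interrupts = 0;
  bool available = true;
  int calls = 0;

  int trywait()
  {
    ++calls;
    if (interrupts > 0)
      {
        --interrupts;
        errno = EINTR;
        return -1;
      }
    if (!available)
      {
        errno = EAGAIN;
        return -1;
      }
    available = false;
    return 0;
  }
};

// Same loop form as the patch: no explicit break/continue, retry on EINTR,
// report EAGAIN as failure, and terminate on any other error.
bool try_acquire(fake_sem& s) noexcept
{
  while (s.trywait())
    {
      if (errno == EAGAIN) // already taken
        return false;
      else if (errno != EINTR)
        std::terminate(); // libstdc++ calls std::__terminate() here
      // else got EINTR, so retry
    }
  return true;
}
```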

Also fix a bug for absolute timeouts specified with a time that isn't
implicitly convertible to __clock_t::time_point, e.g. one with a higher
resolution such as picoseconds. Use chrono::ceil to round up to the next
time point representable by the clock.
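A small sketch of why the explicit conversion is needed, assuming a system_clock time point with a picosecond duration. A picosecond time point is not implicitly convertible to a nanosecond one because that conversion loses precision. Rounding up with chrono::ceil means the converted deadline is never earlier than the requested one. The aliases and `to_deadline` are hypothetical names for illustration.

```cpp
#include <chrono>
#include <ratio>
#include <type_traits>

using picoseconds = std::chrono::duration<long long, std::pico>;
using ps_point = std::chrono::time_point<std::chrono::system_clock, picoseconds>;
using sys_ns = std::chrono::time_point<std::chrono::system_clock,
                                       std::chrono::nanoseconds>;

// Picoseconds -> nanoseconds loses precision, so there is no implicit
// conversion and code relying on one does not compile.
static_assert(!std::is_convertible_v<ps_point, sys_ns>);

// Round the deadline up so a wait until the converted time cannot end
// before the requested deadline (mirrors the chrono::ceil call in the patch).
sys_ns to_deadline(ps_point atime)
{
  return std::chrono::ceil<std::chrono::nanoseconds>(atime);
}
```

For a deadline of 1500ps, ceil yields 2ns, whereas truncation with time_point_cast would yield 1ns and could time out early.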

libstdc++-v3/ChangeLog:

* include/bits/semaphore_base.h (__platform_semaphore): Refactor
loops to all use similar forms.
(__platform_semaphore::_M_try_acquire_until): Use chrono::ceil
to explicitly convert to __clock_t::time_point.
* testsuite/30_threads/semaphore/try_acquire_for.cc: Check that
using a very high resolution timeout compiles.
* testsuite/30_threads/semaphore/platform_try_acquire_for.cc:
New test.
---
 libstdc++-v3/include/bits/semaphore_base.h| 58 +++
 .../semaphore/platform_try_acquire_for.cc |  7 +++
 .../30_threads/semaphore/try_acquire_for.cc   | 13 +
 3 files changed, 40 insertions(+), 38 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/30_threads/semaphore/platform_try_acquire_for.cc

diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
index 44a68645e47..2b19c9c6c6a 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -73,52 +73,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_ALWAYS_INLINE void
 _M_acquire() noexcept
 {
-  for (;;)
-   {
- auto __err = sem_wait(&_M_semaphore);
- if (__err && (errno == EINTR))
-   continue;
- else if (__err)
-   std::__terminate();
- else
-   break;
-   }
+  while (sem_wait(&_M_semaphore))
+   if (errno != EINTR)
+ std::__terminate();
 }
 
 _GLIBCXX_ALWAYS_INLINE bool
 _M_try_acquire() noexcept
 {
-  for (;;)
+  while (sem_trywait(&_M_semaphore))
{
- auto __err = sem_trywait(&_M_semaphore);
- if (__err && (errno == EINTR))
-   continue;
- else if (__err && (errno == EAGAIN))
+ if (errno == EAGAIN) // already locked
return false;
- else if (__err)
+ else if (errno != EINTR)
std::__terminate();
- else
-   break;
+ // else got EINTR so retry
}
   return true;
 }
 
 _GLIBCXX_ALWAYS_INLINE void
-_M_release(std::ptrdiff_t __update) noexcept
+_M_release(ptrdiff_t __update) noexcept
 {
   for(; __update != 0; --__update)
-   {
-  auto __err = sem_post(&_M_semaphore);
-  if (__err)
-std::__terminate();
-   }
+   if (sem_post(&_M_semaphore))
+ std::__terminate();
 }
 
 bool
 _M_try_acquire_until_impl(const chrono::time_point<__clock_t>& __atime)
   noexcept
 {
-
   auto __s = chrono::time_point_cast(__atime);
   auto __ns = chrono::duration_cast(__atime - __s);
 
@@ -128,19 +113,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
static_cast(__ns.count())
   };
 
-  for (;;)
+  while (sem_timedwait(&_M_semaphore, &__ts))
{
- if (auto __err = sem_timedwait(&_M_semaphore, &__ts))
-   {
- if (errno == EINTR)
-   continue;
- else if (errno == ETIMEDOUT || errno == EINVAL)
-   return false;
- else
-   std::__terminate();
-   }
- else
-   break;
+ if (errno == ETIMEDOUT)
+   return false;
+ else if (errno != EINTR)
+   std::__terminate();
}
   return true;
 }
@@ -152,10 +130,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
if constexpr (std::is_same_v<__clock_t, _Clock>)
  {
-   return _M_try_acquire_until_impl(__atime);
+   using _Dur = __clock_t::duration;
+   return _M_try_acquire_until_impl(chrono::ceil<_Dur>(__atime));
  }
else
  {
+   // TODO: if _Clock is monotonic_clock we could use
+   // sem_clockwait with CLOCK_MONOTONIC.
+
const typename _Clock::time_point __c_entry = _Clock::now();
const auto __s_entry = __clock_t::now();
const auto __delta = __atime - __c_entry;
diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/platform_try_acquire_for.cc b/libstdc++-v3/testsuite/30_threads/semaphore/platform_try_acquire_for.cc
new file mode 100644
i

[PATCH RFC] libstdc++: add #pragma diagnostic

2024-09-12 Thread Jason Merrill

Tested x86_64-pc-linux-gnu.  Thoughts about the remaining warnings discussed
below?  Any other comments?

-- 8< --

The use of #pragma GCC system_header in libstdc++ has led to bugs going
undetected for a while due to the silencing of compiler warnings that would
have revealed them promptly, and also interferes with warnings about
problematic template instantiations induced by user code.

But removing it, or even compiling with -Wsystem-headers, is also problematic
due to warnings about deliberate uses of extensions.

So this patch adds #pragma GCC diagnostic as needed to suppress these
warnings.
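A minimal sketch of the pattern, assuming GCC (Clang accepts the same pragmas). The warning is disabled only between push and pop, around a deliberate use of a GNU extension. Here the extension is a statement expression, which -Wpedantic diagnoses. The function is hypothetical and not taken from the patch.

```cpp
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
// Deliberate GNU extension: a statement expression ({ ... }).
inline int clamp_nonnegative(int x)
{
  return ({ int r = x < 0 ? 0 : x; r; });
}
#pragma GCC diagnostic pop
// From here on, -Wpedantic diagnostics apply again as usual.
```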

The change to acinclude.m4 changes -Wabi to warn only in comparison to ABI
19, to avoid lots of warnings that we now mangle concept requirements, which
are in any case still experimental.

It also enables -Wsystem-headers while building the library, so we see any
warnings not silenced by these #pragmas.  After this patch, the remaining
warnings I see are:

ios_base_init.h:12: warning about reserved init_priority
I'll add a warning flag to silence this.

exception_ptr.h:155: warning about useless attribute((const)) on void fn.
Should we remove the attribute?

cxx11-ios_failure.cc:98: warning about overriding the two-parameter
__do_upcast that isn't overridden in __si_class_type_info, rather than the
three-parameter form that is overridden in __si_class_type_info.
I suppose this must be working, and changing the signature would be an ABI
break, so maybe we just want to silence the warning?
It's also unclear whether -Woverloaded-virtual=1 is supposed to warn when the
function is in fact an override; it warns in this case because it's
overriding a function that its direct base didn't also override.

I have a follow-up patch to #ifdef out all the #pragma system_header, but
want that to be considered separately.

libstdc++-v3/ChangeLog:

* include/bits/algorithmfwd.h:
* include/bits/allocator.h:
* include/bits/codecvt.h:
* include/bits/concept_check.h:
* include/bits/cpp_type_traits.h:
* include/bits/hashtable.h:
* include/bits/iterator_concepts.h:
* include/bits/ostream_insert.h:
* include/bits/ranges_base.h:
* include/bits/regex_automaton.h:
* include/bits/std_abs.h:
* include/bits/stl_algo.h:
* include/c_compatibility/fenv.h:
* include/c_compatibility/inttypes.h:
* include/c_compatibility/stdint.h:
* include/ext/concurrence.h:
* include/ext/type_traits.h:
* testsuite/ext/type_traits/add_unsigned_floating_neg.cc:
* testsuite/ext/type_traits/add_unsigned_integer_neg.cc:
* testsuite/ext/type_traits/remove_unsigned_floating_neg.cc:
* testsuite/ext/type_traits/remove_unsigned_integer_neg.cc:
* include/bits/basic_ios.tcc:
* include/bits/basic_string.tcc:
* include/bits/fstream.tcc:
* include/bits/istream.tcc:
* include/bits/locale_classes.tcc:
* include/bits/locale_facets.tcc:
* include/bits/ostream.tcc:
* include/bits/regex_compiler.tcc:
* include/bits/sstream.tcc:
* include/bits/streambuf.tcc:
* configure: Regenerate.
* include/bits/c++config:
* include/c/cassert:
* include/c/cctype:
* include/c/cerrno:
* include/c/cfloat:
* include/c/climits:
* include/c/clocale:
* include/c/cmath:
* include/c/csetjmp:
* include/c/csignal:
* include/c/cstdarg:
* include/c/cstddef:
* include/c/cstdio:
* include/c/cstdlib:
* include/c/cstring:
* include/c/ctime:
* include/c/cwchar:
* include/c/cwctype:
* include/c_global/climits:
* include/c_global/cmath:
* include/c_global/cstddef:
* include/c_global/cstdlib:
* include/decimal/decimal:
* include/ext/rope:
* include/std/any:
* include/std/charconv:
* include/std/complex:
* include/std/coroutine:
* include/std/format:
* include/std/iomanip:
* include/std/limits:
* include/std/numbers:
* include/tr1/functional:
* include/tr1/tuple:
* include/tr1/type_traits:
* libsupc++/compare:
* libsupc++/new: Add #pragma GCC diagnostic to suppress
undesired warnings.
* acinclude.m4: Change -Wabi version from 2 to 19.
---
 libstdc++-v3/include/bits/algorithmfwd.h | 5 +
 libstdc++-v3/include/bits/allocator.h| 4 
 libstdc++-v3/include/bits/codecvt.h  | 4 
 libstdc++-v3/include/bits/concept_check.h| 4 
 libstdc++-v3/include/bits/cpp_type_traits.h  | 5 +
 libstdc++-v3/include/bits/hashtable.h| 5 +
 libstdc++-v3/include/bits/iterator_concepts.h| 4 
 libstdc++-v3/include/bits/ostream_insert.h   | 4

[PATCH] libstdc++: Tweak localized formatting for floating-point types

2024-09-12 Thread Jonathan Wakely

This adds some comments to explain what this rather subtle code is
doing. It also replaces string::copy with a direct call to
char_traits::copy, because the bounds checks and length adjustments that
string::copy does are redundant here: we already ensure the lengths are
correct.
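The new comments describe a three-step flow. First the digits before the radix or exponent are grouped. Then the locale's radix character is substituted. Finally the remainder is copied verbatim with char_traits::copy, since its length is already known. The sketch below is a hypothetical, simplified analogue. It assumes fixed groups of three, no sign, and 'e' as the exponent character, whereas the real code uses the locale's numpunct grouping via std::__add_grouping.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical simplified analogue of __formatter_fp::_M_localize.
std::string localize_fp(const std::string& str, char thousands_sep, char point)
{
  const std::size_t d = str.find('.');          // index of radix (if any)
  std::size_t e = std::min(d, str.find('e'));   // first of radix or exponent
  if (e == std::string::npos)
    e = str.size();

  std::string out;
  out.reserve(str.size() + e / 3 + 1);
  // Apply grouping to the digits before the radix or exponent.
  for (std::size_t i = 0; i < e; ++i)
    {
      if (i != 0 && (e - i) % 3 == 0)
        out.push_back(thousands_sep);
      out.push_back(str[i]);
    }
  if (d != std::string::npos)
    {
      out.push_back(point);                     // locale's radix character
      ++e;                                      // skip the '.'
    }
  // Append fractional digits and/or exponent; rlen is known to be valid,
  // so no bounds-checked string::copy is needed.
  const std::size_t rlen = str.size() - e;
  const std::size_t old = out.size();
  out.resize(old + rlen);
  std::char_traits<char>::copy(&out[old], str.data() + e, rlen);
  return out;
}
```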

Tested x86_64-linux.

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/format (__formatter_fp::_M_localize): Add comments
and micro-optimize string copy.
---
 libstdc++-v3/include/std/format | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 52243eb5479..e963d7f79b3 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1886,25 +1886,28 @@ namespace __format
if (__grp.empty() && __point == __dot)
  return __lstr; // Locale uses '.' and no grouping.
 
-   size_t __d = __str.find(__dot);
-   size_t __e = min(__d, __str.find(__exp));
+   size_t __d = __str.find(__dot); // Index of radix character (if any).
+   size_t __e = min(__d, __str.find(__exp)); // First of radix or exponent
if (__e == __str.npos)
  __e = __str.size();
-   const size_t __r = __str.size() - __e;
+   const size_t __r = __str.size() - __e; // Length of remainder.
auto __overwrite = [&](_CharT* __p, size_t) {
+ // Apply grouping to the digits before the radix or exponent.
  auto __end = std::__add_grouping(__p, __np.thousands_sep(),
   __grp.data(), __grp.size(),
   __str.data(), __str.data() + __e);
- if (__r)
+ if (__r) // If there's a fractional part or exponent
{
  if (__d != __str.npos)
{
- *__end = __point;
+ *__end = __point; // Add the locale's radix character.
  ++__end;
  ++__e;
}
- if (__r > 1)
-   __end += __str.copy(__end, __str.npos, __e);
+ const size_t __rlen = __str.size() - __e;
+ // Append fractional digits and/or exponent:
+ char_traits<_CharT>::copy(__end, __str.data() + __e, __rlen);
+ __end += __rlen;
}
  return (__end - __p);
};
-- 
2.46.0

[PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]

2024-09-12 Thread Pengxuan Zheng

We can still use SVE's INDEX instruction to construct vectors even if not all
elements are constants. For example, { 0, x, 2, 3 } can be constructed by first
using "INDEX #0, #1" to generate { 0, 1, 2, 3 }, and then setting the
non-constant elements separately.
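The detection step can be modelled outside the compiler. The sketch below is a hypothetical C++ analogue of the idea, not the code in aarch64_expand_vector_init_fallback. It requires at least half of the elements to be constant and derives base and step from the first two constants. It then checks that both immediates lie in [-16, 15] and that every constant lies on base + i * step. `std::nullopt` marks a variable element.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model: each element is a known constant or std::nullopt
// for a run-time (variable) value.
using elt = std::optional<std::int64_t>;

struct index_params
{
  std::int64_t base;
  std::int64_t step;
};

// SVE INDEX immediates must lie in [-16, 15].
constexpr bool index_imm_p(std::int64_t v) { return v >= -16 && v <= 15; }

std::optional<index_params> find_index_seq(const std::vector<elt>& elts)
{
  const std::size_t n = elts.size();
  std::size_t n_var = 0;
  for (const elt& e : elts)
    if (!e)
      ++n_var;
  if (n_var > n / 2)                  // need at least half constants
    return std::nullopt;

  // Derive base and step from the first two constant elements.
  std::optional<std::size_t> first;
  for (std::size_t i = 0; i < n; ++i)
    {
      if (!elts[i])
        continue;
      if (!first)
        {
          first = i;
          continue;
        }
      const std::int64_t dv = *elts[i] - *elts[*first];
      const auto di = static_cast<std::int64_t>(i - *first);
      if (dv % di != 0)
        return std::nullopt;
      const std::int64_t step = dv / di;
      const std::int64_t base
        = *elts[*first] - step * static_cast<std::int64_t>(*first);
      if (!index_imm_p(base) || !index_imm_p(step))
        return std::nullopt;
      // Every constant must lie on the line base + j * step.
      for (std::size_t j = 0; j < n; ++j)
        if (elts[j] && *elts[j] != base + step * static_cast<std::int64_t>(j))
          return std::nullopt;
      return index_params{base, step};
    }
  return std::nullopt;                // fewer than two constants
}
```

When this succeeds, the variable lanes would then be filled in individually (for example with INS), which is the "set the non-constant elements separately" step.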

PR target/113328

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
Improve part-variable vector generation with SVE's INDEX if TARGET_SVE
is available.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
check-function-bodies.
* gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
* gcc.target/aarch64/sve/vec_init_4.c: New test.
* gcc.target/aarch64/sve/vec_init_5.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64.cc | 81 ++-
 .../aarch64/sve/acle/general/dupq_1.c | 18 -
 .../aarch64/sve/acle/general/dupq_2.c | 18 -
 .../aarch64/sve/acle/general/dupq_3.c | 18 -
 .../aarch64/sve/acle/general/dupq_4.c | 18 -
 .../gcc.target/aarch64/sve/vec_init_4.c   | 47 +++
 .../gcc.target/aarch64/sve/vec_init_5.c   | 12 +++
 7 files changed, 199 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6b3ca57d0eb..7305a5c6375 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
   if (n_var != n_elts)
 {
   rtx copy = copy_rtx (vals);
+  bool is_index_seq = false;
+
+  /* If at least half of the elements of the vector are constants and all
+these constant elements form a linear sequence of the form { B, B + S,
+B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
+INDEX instruction if SVE is available and then set the elements which
+are not constant separately.  More precisely, each constant element I
+has to be B + I * S where B and S must be valid immediate operand for
+an SVE INDEX instruction.
+
+For example, { X, 1, 2, 3} is a vector satisfying these conditions and
+we can generate a vector of all constants (i.e., { 0, 1, 2, 3 }) first
+and then set the first element of the vector to X.  */
+
+  if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+ && n_var <= n_elts / 2)
+   {
+ int const_idx = -1;
+ HOST_WIDE_INT const_val = 0;
+ int base = 16;
+ int step = 16;
+
+ for (int i = 0; i < n_elts; ++i)
+   {
+ rtx x = XVECEXP (vals, 0, i);
+
+ if (!CONST_INT_P (x))
+   continue;
+
+ if (const_idx == -1)
+   {
+ const_idx = i;
+ const_val = INTVAL (x);
+   }
+ else
+   {
+ if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
+   {
+ HOST_WIDE_INT s
+ = (INTVAL (x) - const_val) / (i - const_idx);
+ if (s >= -16 && s <= 15)
+   {
+ int b = const_val - s * const_idx;
+ if (b >= -16 && b <= 15)
+   {
+ base = b;
+ step = s;
+   }
+   }
+   }
+ break;
+   }
+   }
+
+ if (base != 16
+ && (!CONST_INT_P (v0)
+ || (CONST_INT_P (v0) && INTVAL (v0) == base)))
+   {
+ if (!CONST_INT_P (v0))
+   XVECEXP (copy, 0, 0) = GEN_INT (base);
+
+ is_index_seq = true;
+ for (int i = 1; i < n_elts; ++i)
+   {
+ rtx x = XVECEXP (copy, 0, i);
+
+ if (CONST_INT_P (x))
+   {
+ if (INTVAL (x) != base + i * step)
+   {
+ is_index_seq = false;
+ break;
+   }
+   }
+ else
+   XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
+   }
+   }
+   }
 
   /* Load constant part of vector.  We really don't care what goes into the
 parts we will overwrite, but we're more likely to be able to load the
 constant efficiently if it has fewer, larger, repeating parts
 (see aarch64_simd_valid_immediate).  */
-  for (in

RE: [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-12 Thread Pengxuan Zheng (QUIC)

> On Thu, Sep 12, 2024 at 2:53 AM Pengxuan Zheng
>  wrote:
> >
> > SVE's INDEX instruction can be used to populate vectors by values
> > starting from "base" and incremented by "step" for each subsequent
> > value. We can take advantage of it to generate vector constants if
> > TARGET_SVE is available and the base and step values are within [-16, 15].
> 
> Are there multiplication by or addition of scalar immediate instructions to
> enhance this with two-instruction sequences?

No, Richard, I can't think of any equivalent two-instruction sequences.
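For the all-constant case described in the quoted patch, the check and the emitted instruction can be sketched as below. `sve_index_asm` is a hypothetical helper, not GCC's const_vec_series_p / aarch64_sve_index_immediate_p logic. It accepts a vector only if every element equals base + i * step with base and step in [-16, 15]. It then prints an INDEX instruction in the form shown in the quoted output, with register z0 assumed.

```cpp
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: return the SVE INDEX instruction that builds `v`,
// or std::nullopt if `v` is not a linear series with in-range immediates.
std::optional<std::string>
sve_index_asm(const std::vector<long long>& v, char elt_char)
{
  if (v.size() < 2)
    return std::nullopt;
  const long long base = v[0];
  const long long step = v[1] - v[0];
  auto in_range = [](long long x) { return x >= -16 && x <= 15; };
  if (!in_range(base) || !in_range(step))
    return std::nullopt;
  for (std::size_t i = 0; i < v.size(); ++i)
    if (v[i] != base + step * static_cast<long long>(i))
      return std::nullopt;
  char buf[64];
  std::snprintf(buf, sizeof buf, "index\tz0.%c, #%lld, #%lld",
                elt_char, base, step);
  return std::string(buf);
}
```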

Thanks,
Pengxuan
> 
> > For example, with the following function:
> >
> > typedef int v4si __attribute__ ((vector_size (16)));
> > v4si f_v4si (void) {
> >   return (v4si){ 0, 1, 2, 3 };
> > }
> >
> > GCC currently generates:
> >
> > f_v4si:
> > adrp    x0, .LC4
> > ldr q0, [x0, #:lo12:.LC4]
> > ret
> >
> > .LC4:
> > .word   0
> > .word   1
> > .word   2
> > .word   3
> >
> > With this patch, we generate an INDEX instruction instead if
> > TARGET_SVE is available.
> >
> > f_v4si:
> > index   z0.s, #0, #1
> > ret
> >
> > PR target/113328
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc (aarch64_simd_valid_immediate): Improve
> > handling of some ADVSIMD vectors by using SVE's INDEX if TARGET_SVE is
> > available.
> > (aarch64_output_simd_mov_immediate): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
> > SVE's INDEX instruction.
> > * gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
> > * gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
> > * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
> > * gcc.target/aarch64/sve/vec_init_3.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64.cc | 12 ++-
> >  .../aarch64/sve/acle/general/dupq_1.c |  3 +-
> >  .../aarch64/sve/acle/general/dupq_2.c |  3 +-
> >  .../aarch64/sve/acle/general/dupq_3.c |  3 +-
> >  .../aarch64/sve/acle/general/dupq_4.c |  3 +-
> >  .../gcc.target/aarch64/sve/vec_init_3.c   | 99 +++
> >  6 files changed, 114 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_3.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 27e24ba70ab..6b3ca57d0eb 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22991,7 +22991,7 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
> >if (CONST_VECTOR_P (op)
> >&& CONST_VECTOR_DUPLICATE_P (op))
> >  n_elts = CONST_VECTOR_NPATTERNS (op);
> > -  else if ((vec_flags & VEC_SVE_DATA)
> > +  else if (which == AARCH64_CHECK_MOV && TARGET_SVE
> >&& const_vec_series_p (op, &base, &step))
> >  {
> >gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> > @@ -25249,6 +25249,16 @@ aarch64_output_simd_mov_immediate (rtx const_vector, unsigned width,
> >
> >if (which == AARCH64_CHECK_MOV)
> >  {
> > +  if (info.insn == simd_immediate_info::INDEX)
> > +   {
> > + gcc_assert (TARGET_SVE);
> > + snprintf (templ, sizeof (templ), "index\t%%Z0.%c, #"
> > +   HOST_WIDE_INT_PRINT_DEC ", #" HOST_WIDE_INT_PRINT_DEC,
> > +   element_char, INTVAL (info.u.index.base),
> > +   INTVAL (info.u.index.step));
> > + return templ;
> > +   }
> > +
> >mnemonic = info.insn == simd_immediate_info::MVN ? "mvni" : "movi";
> >shift_op = (info.u.mov.modifier == simd_immediate_info::MSL
> >   ? "msl" : "lsl");
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > index 216699b0536..0940bedd0dd 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > @@ -10,7 +10,6 @@ dupq (int x)
> >return svdupq_s32 (x, 1, 2, 3);
> >  }
> >
> > -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> > +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #0, #1} } } */
> >  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
> >  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
> > -/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
> > index d494943a275..218a6601337 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_2.c
> > @@ -10,7 +10,6 @@ dupq

[PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-12 Thread Evgeny Karpov

Thursday, September 12, 2024
Martin Storsjö  wrote:

> This looks very reasonable - I presume this will make sure that you only
> use the other code form if the offset actually is larger than 1 MB.
>
> For the case when the offset actually is larger than 1 MB, I guess this
> also ends up generating some other instruction sequence than just an "add
> x0, x0, #imm", as the #imm is limited to <= 4096. From reading the code,
> it looks like it generates something like "mov x16, #imm; add x0, x0,
> x16"? That's probably quite reasonable.

The generated code stays unchanged for offsets less than 1MB:

adrp x0, symbol + offset
add x0, x0, :lo12:symbol + offset

When the offset is >= 1MB:

adrp x0, symbol + offset % (1 << 20) // it prevents relocation overflow in IMAGE_REL_ARM64_PAGEBASE_REL21
add x0, x0, (offset & ~0xf) >> 12, lsl #12 // a workaround to support 4GB offset
add x0, x0, :lo12:symbol + offset % (1 << 20)

Regards,
Evgeny

RE: [PATCH] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

2024-09-12 Thread Pengxuan Zheng (QUIC)

> > Pengxuan Zheng  writes:
> > > SVE's INDEX instruction can be used to populate vectors by values
> > > starting from "base" and incremented by "step" for each subsequent
> > > value. We can take advantage of it to generate vector constants if
> > > TARGET_SVE is available and the base and step values are within [-16, 15].
> > >
> > > For example, with the following function:
> > >
> > > typedef int v4si __attribute__ ((vector_size (16)));
> > > v4si f_v4si (void) {
> > >   return (v4si){ 0, 1, 2, 3 };
> > > }
> > >
> > > GCC currently generates:
> > >
> > > f_v4si:
> > >   adrp    x0, .LC4
> > >   ldr q0, [x0, #:lo12:.LC4]
> > >   ret
> > >
> > > .LC4:
> > >   .word   0
> > >   .word   1
> > >   .word   2
> > >   .word   3
> > >
> > > With this patch, we generate an INDEX instruction instead if
> > > TARGET_SVE is available.
> > >
> > > f_v4si:
> > >   index   z0.s, #0, #1
> > >   ret
> > >
> > > [...]
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index 9e12bd9711c..01bfb8c52e4 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -22960,8 +22960,7 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
> > >if (CONST_VECTOR_P (op)
> > >&& CONST_VECTOR_DUPLICATE_P (op))
> > >  n_elts = CONST_VECTOR_NPATTERNS (op);
> > > -  else if ((vec_flags & VEC_SVE_DATA)
> > > -&& const_vec_series_p (op, &base, &step))
> > > +  else if (TARGET_SVE && const_vec_series_p (op, &base, &step))
> >
> > I think we need to check which == AARCH64_CHECK_MOV too.  (Previously
> > that wasn't necessary, because native SVE only uses this routine for
> > moves.)
> >
> > FTR: I was initially a bit nervous about testing TARGET_SVE without
> > looking at vec_flags at all.  But looking at the previous handling of
> > predicates and structures, I agree it looks like the correct thing to do.
> >
> > >  {
> > >gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> > >if (!aarch64_sve_index_immediate_p (base) [...]
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > index 216699b0536..3d6a0160f95 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > > @@ -10,7 +10,6 @@ dupq (int x)
> > >return svdupq_s32 (x, 1, 2, 3);
> > >  }
> > >
> > > -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> > > +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #1, #2} } } */
> > >  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
> > >  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
> > > -/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */
> >
> > This seems to be a regression of sorts.  Previously we had:
> >
> > adrp    x1, .LC0
> > ldr q0, [x1, #:lo12:.LC0]
> > ins v0.s[0], w0
> > dup z0.q, z0.q[0]
> >
> > whereas now we have:
> >
> > movi    v0.2s, 0x2
> > index   z31.s, #1, #2
> > ins v0.s[0], w0
> > zip1    v0.4s, v0.4s, v31.4s
> > dup z0.q, z0.q[0]
> >
> > I think we should try to aim for:
> >
> > index   z0.s, #0, #1
> > ins v0.s[0], w0
> > dup z0.q, z0.q[0]
> >
> > instead.
> 
> Thanks for the feedback, Richard!
> 
> I've added support to handle vectors with non-constant elements. I've split
> that change into a separate patch. Please let me know if you have any
> comments.
> 
> [PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662842.html
> 
> [PATCH 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662843.html

I just updated [PATCH 2/2] to fix some issues in the test cases. Here's the latest patch:
[PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662925.html

Thanks,
Pengxuan
> 
> Thanks,
> Pengxuan
> >
> > > [...]
> > > +/*
> > > +** g_v4si:
> > > +**   index   z0\.s, #3, #\-4
> >
> > The backslash looks redundant here.
> >
> > Thanks,
> > Richard
> >
> > > +**   ret
> > > +*/
> > > +v4si
> > > +g_v4si (void)
> > > +{
> > > +  return (v4si){ 3, -1, -5, -9 };
> > > +}

Re: [PATCH v2] testsuite: introduce hostedlib effective target

2024-09-12 Thread Mike Stump

On Sep 3, 2024, at 11:44 PM, Alexandre Oliva  wrote:
> 
> Here's an updated and refreshed version that gets trunk built with
> --disable-hosted-libstdcxx on x86_64-linux-gnu to not get any spurious
> fails during in-tree testing.  Also bootstrapped on hosted
> x86_64-linux-gnu.  Ok to install?

Ok.

[PATCH v4 1/4] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-12 Thread pan2 . li

From: Pan Li 

When matching a condition with a two-argument PHI node, we need to figure
out which PHI argument comes from the true edge of the condition block and
which comes from the false edge.  This patch adds an interface that does
this and returns the true and false arguments as trees.

Some additional handling is needed if one of the arguments is an
INTEGER_CST, because an INTEGER_CST argument may have no source block of
its own, so its edge source is the condition block itself.  In the example
below, at line 31 the INTEGER_CST 255 has block 2 as its source.  Thus, we
need to look at the non-INTEGER_CST argument (here _1) to tell which one
belongs to the true edge and which to the false edge: _1(3) has block 3 as
its source, which is the destination of the false edge of the condition
block.

   4   │ __attribute__((noinline))
   5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
   6   │ {
   7   │   unsigned char _1;
   8   │   unsigned char _2;
   9   │   uint8_t _3;
  10   │   __complex__ unsigned char _5;
  11   │
  12   │ ;;   basic block 2, loop depth 0
  13   │ ;;pred:   ENTRY
  14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
  15   │   _2 = IMAGPART_EXPR <_5>;
  16   │   if (_2 != 0)
  17   │ goto ; [35.00%]
  18   │   else
  19   │ goto ; [65.00%]
  20   │ ;;succ:   3
  21   │ ;;4
  22   │
  23   │ ;;   basic block 3, loop depth 0
  24   │ ;;pred:   2
  25   │   _1 = REALPART_EXPR <_5>;
  26   │ ;;succ:   4
  27   │
  28   │ ;;   basic block 4, loop depth 0
  29   │ ;;pred:   2
  30   │ ;;3
  31   │   # _3 = PHI <255(2), _1(3)>
  32   │   return _3;
  33   │ ;;succ:   EXIT
  34   │
  35   │ }
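The rule described above (classify the argument that does not arrive straight from the condition block, then take the complement for the other one) can be illustrated with a small toy model. This is a sketch only: `struct phi_arg` and `classify_phi_args` are hypothetical and model blocks as plain integers rather than GCC's `basic_block` and `edge` types. The example values come from the dump: bb 2 holds the condition, bb 3 is the destination of its false edge, so the true edge goes straight to bb 4.

```c
#include <stddef.h>

/* Hypothetical toy model (not GCC's API): one PHI argument together
   with the source block of the edge it arrives on.  */
struct phi_arg
{
  int src_bb;
  const char *value;
};

/* Classify the two arguments of a binary PHI as true/false arguments.
   COND_BB is the block ending in the condition; TRUE_DEST and FALSE_DEST
   are the destinations of its true and false edges.  An argument whose
   edge comes straight from COND_BB (such as the constant 255 in the
   dump) says nothing on its own, so classify the other argument and
   take the complement.  Returns 0 on success, -1 if the shape is not
   recognized.  */
static int
classify_phi_args (const struct phi_arg args[2], int cond_bb,
                   int true_dest, int false_dest,
                   const char **true_arg, const char **false_arg)
{
  *true_arg = *false_arg = NULL;

  /* Prefer the argument that does not arrive directly from COND_BB.  */
  int i = args[0].src_bb == cond_bb ? 1 : 0;
  const struct phi_arg *known = &args[i];
  const struct phi_arg *other = &args[1 - i];

  if (known->src_bb == true_dest)
    {
      *true_arg = known->value;
      *false_arg = other->value;
    }
  else if (known->src_bb == false_dest)
    {
      *true_arg = other->value;
      *false_arg = known->value;
    }
  else
    return -1;

  return 0;
}
```

For `# _3 = PHI <255(2), _1(3)>` with `classify_phi_args (args, 2, 4, 3, ...)`, the sketch reports 255 as the true arg and `_1` as the false arg. The same function also handles a diamond, where neither argument comes directly from the condition block.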

The following test suites pass with this patch:
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* gimple-match-head.cc (match_cond_with_binary_phi): Add new func
impl to match binary phi for true and false arg.

Signed-off-by: Pan Li 
---
 gcc/gimple-match-head.cc | 118 +++
 1 file changed, 118 insertions(+)

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 924d3f1e710..6e7a3a0d62e 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -375,3 +375,121 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va
 return true;
   return false;
 }
+
+/*
+ * Return the relevant gcond * of the given phi, as well as the true
+ * and false TREE args of the phi.  Or return NULL.
+ *
+ * If matched the gcond *, the output argument TREE true_arg and false_arg
+ * will be updated to the relevant args of phi.
+ *
+ * If failed to match, NULL gcond * will be returned, as well as the output
+ * arguments will be set to NULL_TREE.
+ */
+
+static inline gcond *
+match_cond_with_binary_phi (gphi *phi, tree *true_arg, tree *false_arg)
+{
+  *true_arg = *false_arg = NULL_TREE;
+
+  if (gimple_phi_num_args (phi) != 2
+      || EDGE_COUNT (gimple_bb (phi)->preds) != 2)
+    return NULL;
+
+  basic_block pred_0 = EDGE_PRED (gimple_bb (phi), 0)->src;
+  basic_block pred_1 = EDGE_PRED (gimple_bb (phi), 1)->src;
+  basic_block cond_block = NULL;
+
+  if ((EDGE_COUNT (pred_0->succs) == 2 && EDGE_COUNT (pred_1->succs) == 1)
+      || (EDGE_COUNT (pred_0->succs) == 1 && EDGE_COUNT (pred_1->succs) == 2))
+    {
+      /* For below control flow graph:
+       *    |
+       *    v
+       * +------+
+       * | b0:  |
+       * | def  |       +-----+
+       * | ...  |       | b1: |
+       * | cond |------>| def |
+       * +------+       | ... |
+       *    |           +-----+
+       *    |              |
+       *    v              |
+       * +-----+           |
+       * | b2: |           |
+       * | def |<----------+
+       * +-----+
+       */
+      basic_block b0 = EDGE_COUNT (pred_0->succs) == 2 ? pred_0 : pred_1;
+      basic_block b1 = EDGE_COUNT (pred_0->succs) == 1 ? pred_0 : pred_1;
+
+      if (EDGE_COUNT (b1->preds) == 1 && EDGE_PRED (b1, 0)->src == b0)
+        cond_block = b0;
+    }
+
+  if (EDGE_COUNT (pred_0->succs) == 1 && EDGE_COUNT (pred_0->preds) == 1
+      && EDGE_COUNT (pred_1->succs) == 1 && EDGE_COUNT (pred_1->preds) == 1)
+    {
+      /* For below control flow graph:
+       *    |
+       *    v
+       * +------+
+       * | b0:  |
+       * | ...  |       +-----+
+       * | cond |------>| b2: |
+       * +------+       | ... |
+       *    |           +-----+
+       *    |              |
+       *    v              |
+       * +-----+           |
+       * | b1: |           |
+       * | ... |           |
+       * +-----+           |
+       *    |              |
+       *    |              |
+       *    v              |
+       * +-----+           |
+       * | b3: |<----------+
+       * | ... |
+       * +-----+
+       */
+      basic_block b0 = EDGE_PRED (pred_0, 0)->src;
+
+      if (EDGE_COUNT (b0->succs) == 2 && EDGE_PRED (pred_1, 0)->src == b0)
+        cond_block = b0;
+    }
+

[PATCH v4 2/4] Genmatch: Refine the gen_phi_on_cond by match_cond_with_binary_phi

2024-09-12 Thread pan2 . li

From: Pan Li 

This patch leverages match_cond_with_binary_phi to match the phi on
cond and get the true/false args if matched.  This greatly simplifies
the implementation of gen_phi_on_cond.

Before this patch:
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
  {
    basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
    basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
    basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
    basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
    gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
    if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
      && EDGE_COUNT (_other_db_1->succs) == 1
      && EDGE_PRED (_other_db_1, 0)->src == _db_1)
      {
        tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
        tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
        tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
        bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
        tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
        tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
...

After this patch:
basic_block _b1 = gimple_bb (_a1);
tree _p1, _p2;
gcond *_cond_1 = match_cond_with_binary_phi (_a1, &_p1, &_p2);
if (_cond_1 && _p1 && _p2)
  {
    tree _cond_lhs_1 = gimple_cond_lhs (_cond_1);
    tree _cond_rhs_1 = gimple_cond_rhs (_cond_1);
    tree _p0 = build2 (gimple_cond_code (_cond_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
...

The following test suites pass with this patch:
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* genmatch.cc (dt_operand::gen_phi_on_cond): Leverage the
match_cond_with_binary_phi API to get cond gimple, true and
false TREE arg.

Signed-off-by: Pan Li 
---
 gcc/genmatch.cc | 67 +++--
 1 file changed, 15 insertions(+), 52 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index a56bd90cb2c..e3d2ecc6266 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -3516,79 +3516,42 @@ dt_operand::gen (FILE *f, int indent, bool gimple, int depth)
 void
 dt_operand::gen_phi_on_cond (FILE *f, int indent, int depth)
 {
-  fprintf_indent (f, indent,
-"basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
-
-  fprintf_indent (f, indent, "if (gimple_phi_num_args (_a%d) == 2)\n", depth);
+  char opname_0[20];
+  char opname_1[20];
+  char opname_2[20];
 
-  indent += 2;
-  fprintf_indent (f, indent, "{\n");
-  indent += 2;
+  gen_opname (opname_0, 0);
+  gen_opname (opname_1, 1);
+  gen_opname (opname_2, 2);
 
   fprintf_indent (f, indent,
-"basic_block _pb_0_%d = EDGE_PRED (_b%d, 0)->src;\n", depth, depth);
-  fprintf_indent (f, indent,
-"basic_block _pb_1_%d = EDGE_PRED (_b%d, 1)->src;\n", depth, depth);
-  fprintf_indent (f, indent,
-"basic_block _db_%d = safe_dyn_cast  (*gsi_last_bb (_pb_0_%d)) ? "
-"_pb_0_%d : _pb_1_%d;\n", depth, depth, depth, depth);
+"basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
+  fprintf_indent (f, indent, "tree %s, %s;\n", opname_1, opname_2);
   fprintf_indent (f, indent,
-"basic_block _other_db_%d = safe_dyn_cast  "
-"(*gsi_last_bb (_pb_0_%d)) ? _pb_1_%d : _pb_0_%d;\n",
-depth, depth, depth, depth);
+"gcond *_cond_%d = match_cond_with_binary_phi (_a%d, &%s, &%s);\n",
+depth, depth, opname_1, opname_2);
 
-  fprintf_indent (f, indent,
-"gcond *_ct_%d = safe_dyn_cast  (*gsi_last_bb (_db_%d));\n",
-depth, depth);
-  fprintf_indent (f, indent, "if (_ct_%d"
-" && EDGE_COUNT (_other_db_%d->preds) == 1\n", depth, depth);
-  fprintf_indent (f, indent,
-"  && EDGE_COUNT (_other_db_%d->succs) == 1\n", depth);
-  fprintf_indent (f, indent,
-"  && EDGE_PRED (_other_db_%d, 0)->src == _db_%d)\n", depth, depth);
+  fprintf_indent (f, indent, "if (_cond_%d && %s && %s)\n",
+depth, opname_1, opname_2);
 
   indent += 2;
   fprintf_indent (f, indent, "{\n");
   indent += 2;
 
   fprintf_indent (f, indent,
-"tree _cond_lhs_%d = gimple_cond_lhs (_ct_%d);\n", depth, depth);
+"tree _cond_lhs_%d = gimple_cond_lhs (_cond_%d);\n", depth, depth);
   fprintf_indent (f, indent,
-"tree _cond_rhs_%d = gimple_cond_rhs (_ct_%d);\n", depth, depth);
-
-  char opname_0[20];
-  char opname_1[20];
-  char opname_2[20];
-  gen_opname (opname_0, 0);
-
+"tree _cond_rhs_%d = gimple_cond_rhs (_cond_%d);\n", depth, depth);
   fprintf_indent (f, indent,
-"tree %s = build2 (gimple_cond_code (_ct_%d), "
+"tree %s = build2 (gimple_cond_code (_cond_%d), "
 "boolean_type_node, _cond_lhs_%d, _cond_rhs_%d);\n",
 opname_0, depth, depth, depth);
 
-  fprintf_indent (f, indent,
-"bool _arg_0_is_true_%d = gimple_phi_arg_edge (_a%d, 0)->flags"
-" & EDGE_TRUE_VALUE;\n", depth, depth);
-
-  ge

[PATCH v4 3/4] Match: Support form 3 for scalar signed integer .SAT_ADD

2024-09-12 Thread pan2 . li

From: Pan Li 

This patch adds support for form 3 of the scalar signed integer
.SAT_ADD, as in the example below:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)   \
  T __attribute__((noinline))\
  sat_s_add_##T##_fmt_3 (T x, T y)   \
  {  \
T sum;   \
bool overflow = __builtin_add_overflow (x, y, &sum); \
return overflow ? x < 0 ? MIN : MAX : sum;   \
  }

DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)

The difference before and after this patch is visible if the backend
implements the ssadd3 pattern, as shown below.

Before this patch:
   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
   6   │ {
   7   │   signed char _1;
   8   │   signed char _2;
   9   │   int8_t _3;
  10   │   __complex__ signed char _6;
  11   │   _Bool _8;
  12   │   signed char _9;
  13   │   signed char _10;
  14   │   signed char _11;
  15   │
  16   │ ;;   basic block 2, loop depth 0
  17   │ ;;pred:   ENTRY
  18   │   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  19   │   _2 = IMAGPART_EXPR <_6>;
  20   │   if (_2 != 0)
  21   │ goto ; [50.00%]
  22   │   else
  23   │ goto ; [50.00%]
  24   │ ;;succ:   4
  25   │ ;;3
  26   │
  27   │ ;;   basic block 3, loop depth 0
  28   │ ;;pred:   2
  29   │   _1 = REALPART_EXPR <_6>;
  30   │   goto ; [100.00%]
  31   │ ;;succ:   5
  32   │
  33   │ ;;   basic block 4, loop depth 0
  34   │ ;;pred:   2
  35   │   _8 = x_4(D) < 0;
  36   │   _9 = (signed char) _8;
  37   │   _10 = -_9;
  38   │   _11 = _10 ^ 127;
  39   │ ;;succ:   5
  40   │
  41   │ ;;   basic block 5, loop depth 0
  42   │ ;;pred:   3
  43   │ ;;4
  44   │   # _3 = PHI <_1(3), _11(4)>
  45   │   return _3;
  46   │ ;;succ:   EXIT
  47   │
  48   │ }

After this patch:
   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
   6   │ {
   7   │   int8_t _3;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;pred:   ENTRY
  11   │   _3 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  12   │   return _3;
  13   │ ;;succ:   EXIT
  14   │
  15   │ }
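The GIMPLE in bb 4 of the "before" dump (`_10 = -_9; _11 = _10 ^ 127;`) is a branch-free way to pick INT8_MIN or INT8_MAX from the sign of x, which is what the new match.pd pattern recognizes as `(-(T)(X < 0)) ^ MAX`. The following sketch, assuming GCC or Clang for `__builtin_add_overflow`, spells out form 3 for int8_t next to that XOR formulation and a widened reference so the three can be compared; the `_xor_form` and `_ref` function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 3 from the patch description, expanded for int8_t.  */
int8_t
sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  int8_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  return overflow ? x < 0 ? INT8_MIN : INT8_MAX : sum;
}

/* Same result, but choosing the bound the way bb 4 of the "before" dump
   does: -(int8_t) (x < 0) is -1 (all ones) for negative x and 0
   otherwise, and XOR with 127 turns that into -128 or 127.  */
int8_t
sat_s_add_int8_t_xor_form (int8_t x, int8_t y)
{
  int8_t sum;
  if (!__builtin_add_overflow (x, y, &sum))
    return sum;
  return (int8_t) (-(int8_t) (x < 0) ^ INT8_MAX);
}

/* Reference: add in int, where no overflow is possible, then clamp.  */
int8_t
sat_s_add_int8_t_ref (int8_t x, int8_t y)
{
  int wide = x + y;
  if (wide > INT8_MAX)
    return INT8_MAX;
  if (wide < INT8_MIN)
    return INT8_MIN;
  return (int8_t) wide;
}
```

Signed addition can only overflow when both operands have the same sign, so testing the sign of x alone is enough to pick the correct bound.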

The following test suites pass with this patch:
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Add the form 3 of signed .SAT_ADD matching.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4cef965c9c7..167b1b106dd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3237,6 +3237,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
@2)
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))))
 
+/* Signed saturation add, case 4:
+   Z = .ADD_OVERFLOW (X, Y)
+   SAT_S_ADD = IMAGPART_EXPR (Z) != 0 ? (-(T)(X < 0) ^ MAX) : sum;  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
+   (realpart @2))
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+      && types_match (type, @0, @1))))
+
 /* Unsigned saturation sub, case 1 (branch with gt):
SAT_U_SUB = X > Y ? X - Y : 0  */
 (match (unsigned_integer_sat_sub @0 @1)
-- 
2.43.0

[PATCH v4 4/4] RISC-V: Fix vector SAT_ADD dump check due to middle-end change

2024-09-12 Thread pan2 . li

From: Pan Li 

This patch fixes the expected dump-check counts for vector SAT_ADD.
The middle-end change increases the number of matches from 2 to 4.

The following test suites pass with this patch:
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Adjust
the dump check times from 2 to 4.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c| 2 +-
 16 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c
index c525ba97c52..47dd5012cc6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c
@@ -15,4 +15,4 @@
 */
 DEF_VEC_SAT_U_ADD_FMT_6(uint8_t)
 
-/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c
index 41372d08e52..df8d5a8d275 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c
@@ -15,4 +15,4 @@
 */
 DEF_VEC_SAT_U_ADD_FMT_6(uint16_t)
 
-/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c
index dddebb54426..f286bd10e4b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c
@@ -15,4 +15,4 @@
 */
 DEF_VEC_SAT_U_ADD_FMT_6(uint32_t)
 
-/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c
index ad5162d10a0..307ff36cc35 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c
@@ -15,4 +15,4 @@
 */
 DEF_VEC_SAT_U_ADD_FMT_6(uint64_t)
 
-/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c
index 39c20b3cea6..3218962724c 10064

Ping: [PATCH] testsuite/gcc.dg/pr84877.c: Add machinery to stabilize stack alignment

2024-09-12 Thread Hans-Peter Nilsson

Ping...

> From: Hans-Peter Nilsson 
> Date: Thu, 5 Sep 2024 17:44:52 +0200
> 
> Tested adding 0..more-than-four environment variables,
> running cris-sim+cris-elf.  I also checked that foo stays
> the same generated code regardless of the new code: this is
> not obviously true as foo is "just" noinline, not __noipa__.
> 
> Ok to commit?
> 
> -- >8 --
> This test awkwardly "blinks"; xfails and xpasses apparently
> randomly for cris-elf using the "gdb simulator".  On
> inspection, I see that the stack address depends on the
> number of environment variables, deliberately passed to the
> simulator, each adding the size of a pointer.
> 
> This test is IMHO important enough not to be skipped just
> because it blinks (fixing the actual problem is a
> different task).
> 
> I guess a random non-16 stack-alignment could happen for
> other targets as well, so let's try to add generic
> machinery to "stabilize" the test as failing, by allocating
> a dynamic amount to make sure it's misaligned.  The most
> target-dependent item here is an offset between the incoming
> stack-pointer value (within main in the added framework) and
> outgoing (within "xmain" as called from main when setting up
> the p0 parameter).  I know there are other wonderful stack
> shapes, but such targets would fall under the "complicated
> situations"-label and are no worse off than before.
> 
>   * gcc.dg/pr84877.c: Try to make the test result consistent by
>   misaligning the stack.
> ---
>  gcc/testsuite/gcc.dg/pr84877.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.dg/pr84877.c b/gcc/testsuite/gcc.dg/pr84877.c
> index e82991f42dd4..2f2e29578df9 100644
> --- a/gcc/testsuite/gcc.dg/pr84877.c
> +++ b/gcc/testsuite/gcc.dg/pr84877.c
> @@ -3,6 +3,32 @@
>  
>  #include 
>  
> +#ifdef __CRIS__
> +#define OUTGOING_SP_OFFSET (-sizeof (void *))
> +/* Suggestion: append #elif defined() after this comment,
> +   either defining OUTGOING_SP_OFFSET to whatever the pertinent amount is at -O2,
> +   if that makes your target consistently fail this test, or define
> +   DO_NOT_TAMPER for more complicated situations.  Either way, compile with
> +   -DDO_NOT_TAMPER to avoid any meddling.  */
> +#endif
> +
> +#if defined (OUTGOING_SP_OFFSET) && !defined (DO_NOT_TAMPER)
> +extern int xmain () __attribute__ ((__noipa__));
> +int main ()
> +{
> +  uintptr_t misalignment
> +    = (OUTGOING_SP_OFFSET
> +       + (15 & (uintptr_t) __builtin_stack_address ()));
> +  /* Allocate a minimal amount if the stack was accidentally aligned.  */
> +  void *q = __builtin_alloca (misalignment == 0);
> +  xmain ();
> +  /* Fake use to avoid the "allocation" being optimized out.  */
> +  asm volatile ("" : : "rm" (q));
> +  return 0;
> +}
> +#define main xmain
> +#endif
> +
>  struct U {
>  int M0;
>  int M1;
> -- 
> 2.30.2
>
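To make the arithmetic in the new `main` concrete: with OUTGOING_SP_OFFSET equal to `-sizeof (void *)`, the expression `OUTGOING_SP_OFFSET + (15 & sp)` is zero exactly when the outgoing stack pointer would be 16-byte aligned, and only then is one extra byte allocated. The sketch below replays that expression on plain integers instead of `__builtin_stack_address ()`. Both function names are hypothetical, and the example values assume a 4-byte pointer, as on cris-elf.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper replaying the patch's computation for a given
   incoming stack-pointer value.  OFFSET is the (possibly negative)
   difference between the outgoing and incoming stack pointer, passed
   as uintptr_t so that negative values wrap exactly as in the patch.  */
static bool
needs_extra_alloca (uintptr_t incoming_sp, uintptr_t offset)
{
  uintptr_t misalignment = offset + (15 & incoming_sp);
  return misalignment == 0;
}

/* Direct statement of the intent: is the outgoing SP 16-byte aligned?  */
static bool
outgoing_sp_aligned (uintptr_t incoming_sp, uintptr_t offset)
{
  return ((incoming_sp + offset) & 15) == 0;
}
```

With an offset of -4, an incoming SP of 0x1004 triggers the one-byte allocation while 0x1000 and 0x1008 do not. The two formulations agree for any offset between -15 and 0, since `(15 & sp) + offset` then lies in -15..15 and is a multiple of 16 only when it is zero.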

[PATCH] testsuite: a few more hostedlib adjustments

2024-09-12 Thread Alexandre Oliva

On Sep 12, 2024, Mike Stump  wrote:

> On Sep 3, 2024, at 11:44 PM, Alexandre Oliva  wrote:
>> 
>> Here's an updated and refreshed version that gets trunk built with
>> --disable-hosted-libstdcxx on x86_64-linux-gnu to not get any spurious
>> fails during in-tree testing.  Also bootstrapped on hosted
>> x86_64-linux-gnu.  Ok to install?

> Ok.

Thanks.  There's more!

Regstrapped on x86_64-linux-gnu, also tested on the same platform with
--disable-hosted-libstdcxx.  Ok to install?


This adjusts some recently-added tests that won't compile without a
hostedlib libstdc++, missed in the patch that just went in, and also
an old test that I'd missed because it also failed in my baseline.


for  gcc/testsuite/ChangeLog

* g++.dg/coroutines/pr108620.C: Skip if !hostedlib because of
unavailable headers.
* g++.dg/other/profile1.C: Likewise.
* g++.dg/ext/pragma-unroll-lambda-lto.C: Skip if !hostedlib
because of unavailable declarations.
---
 gcc/testsuite/g++.dg/coroutines/pr108620.C |2 ++
 .../g++.dg/ext/pragma-unroll-lambda-lto.C  |1 +
 gcc/testsuite/g++.dg/other/profile1.C  |1 +
 3 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/g++.dg/coroutines/pr108620.C b/gcc/testsuite/g++.dg/coroutines/pr108620.C
index e8016b9f8a233..22bf0c18bac45 100644
--- a/gcc/testsuite/g++.dg/coroutines/pr108620.C
+++ b/gcc/testsuite/g++.dg/coroutines/pr108620.C
@@ -1,3 +1,5 @@
+// { dg-skip-if "requires hosted libstdc++ for iostream" { ! hostedlib } }
+
 // https://gcc.gnu.org/PR108620
 #include 
 #include 
diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
index 144c4c3269249..64cdf90f34d33 100644
--- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
+++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
@@ -1,5 +1,6 @@
 // { dg-do link { target c++11 } }
 // { dg-options "-O2 -flto -fdump-rtl-loop2_unroll" }
+// { dg-skip-if "requires hosted libstdc++ for cstdlib rand" { ! hostedlib } }
 
 #include 
 
diff --git a/gcc/testsuite/g++.dg/other/profile1.C b/gcc/testsuite/g++.dg/other/profile1.C
index a4bf6b3d0fea7..99844373189e0 100644
--- a/gcc/testsuite/g++.dg/other/profile1.C
+++ b/gcc/testsuite/g++.dg/other/profile1.C
@@ -2,6 +2,7 @@
 // { dg-do run }
 // { dg-require-profiling "" }
 // { dg-options "-fnon-call-exceptions -fprofile-arcs" }
+// { dg-skip-if "requires hosted libstdc++ for string" { ! hostedlib } }
 
 #include 
 


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

[PATCH v1] RISC-V: Fix signed SAT_ADD test case for int64_t

2024-09-12 Thread pan2 . li

From: Pan Li 

The int8_t test for signed SAT_ADD is sat_s_add-1.c, so sat_s_add-4.c
should cover int64_t.  Thus, update sat_s_add-4.c for the int64_t type.
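The updated expected sequence replaces `xori ...,127` with `li -1`, `srli ...,1` and `xor`. One plausible reading, offered as an interpretation rather than something the patch states, is that the overflow bound is computed as `(x >> 63) ^ INT64_MAX`. INT64_MAX does not fit the 12-bit signed immediate of `xori`, so it is built as all-ones shifted right by one. A C sketch of that bound computation, with a hypothetical function name:

```c
#include <stdint.h>

/* Hypothetical helper: the value a signed 64-bit saturating add returns
   on overflow, selected by the sign of X without a branch.
   Assumes GCC semantics: >> on a negative int64_t is an arithmetic
   shift, and converting an out-of-range uint64_t to int64_t wraps
   modulo 2^64.  */
int64_t
sat_bound_int64 (int64_t x)
{
  uint64_t sign_mask = (uint64_t) (x >> 63); /* all ones if x < 0, else 0 */
  uint64_t max = UINT64_MAX >> 1;            /* 0x7fff...ff == INT64_MAX */
  return (int64_t) (sign_mask ^ max);        /* INT64_MIN or INT64_MAX */
}
```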

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_s_add-4.c: Update test for int64_t
instead of int8_t.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_s_add-4.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-4.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-4.c
index f85675c1a05..12c9540eaec 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_s_add-4.c
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-4.c
@@ -5,26 +5,25 @@
 #include "sat_arith.h"
 
 /*
-** sat_s_add_int8_t_fmt_1:
+** sat_s_add_int64_t_fmt_1:
 ** add\s+[atx][0-9]+,\s*a0,\s*a1
 ** xor\s+[atx][0-9]+,\s*a0,\s*a1
 ** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
-** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*7
-** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*7
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
 ** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
 ** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
-** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
 ** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
-** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*127
+** li\s+[atx][0-9]+,\s*-1
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** xor\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
 ** neg\s+[atx][0-9]+,\s*[atx][0-9]+
 ** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
 ** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
 ** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
 ** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
-** slliw\s+a0,\s*a0,\s*24
-** sraiw\s+a0,\s*a0,\s*24
 ** ret
 */
-DEF_SAT_S_ADD_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)
+DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
 
 /* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
-- 
2.43.0

[PATCH v1] RISC-V: Add testcases for form 2 of signed scalar SAT_ADD

2024-09-12 Thread pan2 . li

From: Pan Li 

This patch adds test cases for form 2 of the signed scalar SAT_ADD,
as in the example below:

Form 2:
  #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
  T __attribute__((noinline))  \
  sat_s_add_##T##_fmt_2 (T x, T y) \
  {\
T sum = (UT)x + (UT)y; \
if ((x ^ y) < 0 || (sum ^ x) >= 0) \
  return sum;  \
return x < 0 ? MIN : MAX;  \
  }

DEF_SAT_S_ADD_FMT_2 (int64_t, uint64_t, INT64_MIN, INT64_MAX)
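Form 2 detects overflow without `__builtin_add_overflow`: operands of different sign can never overflow, and otherwise the wrapped sum has overflowed exactly when its sign differs from that of `x`. The sketch below expands DEF_SAT_S_ADD_FMT_2 by hand for int8_t and pairs it with a widened reference (a hypothetical helper used only for comparison). It relies on GCC's modulo behavior when converting an out-of-range value to a signed type.

```c
#include <stdint.h>

/* Form 2 from the patch description, expanded by hand for int8_t.
   The addition is done in the unsigned type; converting the result back
   to int8_t is implementation-defined in ISO C, and GCC defines it as
   wrapping modulo 2^8.  */
int8_t
sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y)
{
  int8_t sum = (uint8_t) x + (uint8_t) y;
  /* Different signs never overflow; otherwise the wrapped sum is valid
     as long as its sign matches the sign of X.  */
  if ((x ^ y) < 0 || (sum ^ x) >= 0)
    return sum;
  return x < 0 ? INT8_MIN : INT8_MAX;
}

/* Hypothetical reference: add in int, where no overflow is possible,
   then clamp to the int8_t range.  */
int8_t
sat_s_add_int8_t_clamp_ref (int8_t x, int8_t y)
{
  int wide = x + y;
  if (wide > INT8_MAX)
    return INT8_MAX;
  if (wide < INT8_MIN)
    return INT8_MIN;
  return (int8_t) wide;
}
```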

The following tests pass with this patch:
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-5.c: New test.
* gcc.target/riscv/sat_s_add-6.c: New test.
* gcc.target/riscv/sat_s_add-7.c: New test.
* gcc.target/riscv/sat_s_add-8.c: New test.
* gcc.target/riscv/sat_s_add-run-5.c: New test.
* gcc.target/riscv/sat_s_add-run-6.c: New test.
* gcc.target/riscv/sat_s_add-run-7.c: New test.
* gcc.target/riscv/sat_s_add-run-8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 13 
 gcc/testsuite/gcc.target/riscv/sat_s_add-5.c  | 30 +
 gcc/testsuite/gcc.target/riscv/sat_s_add-6.c  | 32 +++
 gcc/testsuite/gcc.target/riscv/sat_s_add-7.c  | 31 ++
 gcc/testsuite/gcc.target/riscv/sat_s_add-8.c  | 29 +
 .../gcc.target/riscv/sat_s_add-run-5.c| 16 ++
 .../gcc.target/riscv/sat_s_add-run-6.c| 16 ++
 .../gcc.target/riscv/sat_s_add-run-7.c| 16 ++
 .../gcc.target/riscv/sat_s_add-run-8.c| 16 ++
 9 files changed, 199 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_add-run-8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index a8672f66322..b4fbf5dc662 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -132,9 +132,22 @@ sat_s_add_##T##_fmt_1 (T x, T y) \
 #define DEF_SAT_S_ADD_FMT_1_WRAP(T, UT, MIN, MAX) \
   DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX)
 
+#define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
+T __attribute__((noinline))  \
+sat_s_add_##T##_fmt_2 (T x, T y) \
+{\
+  T sum = (UT)x + (UT)y; \
+  if ((x ^ y) < 0 || (sum ^ x) >= 0) \
+return sum;  \
+  return x < 0 ? MIN : MAX;  \
+}
+
 #define RUN_SAT_S_ADD_FMT_1(T, x, y) sat_s_add_##T##_fmt_1(x, y)
 #define RUN_SAT_S_ADD_FMT_1_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_1(T, x, y)
 
+#define RUN_SAT_S_ADD_FMT_2(T, x, y) sat_s_add_##T##_fmt_2(x, y)
+#define RUN_SAT_S_ADD_FMT_2_WRAP(T, x, y) RUN_SAT_S_ADD_FMT_2(T, x, y)
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   */
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_s_add-5.c b/gcc/testsuite/gcc.target/riscv/sat_s_add-5.c
new file mode 100644
index 000..b644022eb4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_s_add-5.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_s_add_int8_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*a1
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*7
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*7
+** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** srai\s+[atx][0-9]+,\s*[atx][0-9]+,\s*63
+** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*127
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slliw\s+a0,\s*a0,\s*24
+** sraiw\s+a0,\s*a0,\s*24
+** ret
+*/
+DEF_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)
+
+/* { dg-final { scan-rtl-dump-ti

[PATCH] s390: Fix AQ and AR constraints

2024-09-12 Thread Stefan Schulze Frielinghaus

Ensure for AQ and AR constraints that the resulting displacement after
adding any positive offset less than the size of the object being
referenced is still valid.
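A simplified model of the new check follows. It assumes, as for the s390 Q and R constraints, a short displacement that is an unsigned 12-bit field, that is 0 to 4095. The helper names are hypothetical; the real code works on RTL via `adjust_address` and `s390_check_qrst_address`. The point is that the displacement of the last byte of the access, not only the first, must still be encodable.

```c
#include <stdbool.h>

/* Assumed range of an s390 short (12-bit unsigned) displacement.  */
#define SHORT_DISP_MAX 4095L

/* Hypothetical: is DISP encodable as a short displacement?  */
static bool
short_disp_p (long disp)
{
  return disp >= 0 && disp <= SHORT_DISP_MAX;
}

/* Hypothetical model of the AQ/AR check for an access of SIZE bytes:
   the original address must be valid, and so must the address adjusted
   by SIZE - 1, mirroring the adjust_address call in the patch.  */
static bool
aq_ar_disp_ok (long disp, long size)
{
  return short_disp_p (disp) && short_disp_p (disp + size - 1);
}
```

For example, a 16-byte access at displacement 4088 passes the first check, but its last byte sits at 4103, so the access is rejected.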

Bootstrapped and regtested on s390.  As approved by
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662865.html
I will push shortly.

gcc/ChangeLog:

* config/s390/s390.cc (s390_mem_constraint): Check displacement
for AQ and AR constraints.
---
 gcc/config/s390/s390.cc | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 7aea776da2f..ae1f369e19d 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -3714,6 +3714,18 @@ s390_mem_constraint (const char *str, rtx op)
   if ((reload_completed || reload_in_progress)
  ? !offsettable_memref_p (op) : !offsettable_nonstrict_memref_p (op))
return 0;
+  /* offsettable_memref_p ensures only that any positive offset added to
+the address forms a valid general address.  For AQ and AR constraints
+we also have to verify that the resulting displacement after adding
+any positive offset less than the size of the object being referenced
+is still valid.  */
+  if (str[1] == 'Q' || str[1] == 'R')
+   {
+ int o = GET_MODE_SIZE (GET_MODE (op)) - 1;
+ rtx tmp = adjust_address (op, QImode, o);
+ if (!s390_check_qrst_address (str[1], XEXP (tmp, 0), true))
+   return 0;
+   }
   return s390_check_qrst_address (str[1], XEXP (op, 0), true);
 case 'B':
   /* Check for non-literal-pool variants of memory constraints.  */
-- 
2.45.2

[PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-12 Thread Stefan Schulze Frielinghaus

Bootstrapped and regtested on s390.  Approved offlist, and as also
discussed offlist I went with removing the format specifier %V.  This fixes

FAIL: g++.dg/cpp23/ext-floating14.C  -std=gnu++23 execution test
FAIL: g++.dg/cpp23/ext-floating14.C  -std=gnu++26 execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -Os  execution test
FAIL: gfortran.dg/pr96711.f90   -O0  execution test
FAIL: libffi.closures/nested_struct5.c -W -Wall -Wno-psabi -O2 output pattern test
FAIL: libphobos.phobos/std/algorithm/mutation.d execution test
FAIL: libphobos.phobos/std/conv.d execution test
FAIL: libphobos.phobos/std/internal/math/errorfunction.d execution test
FAIL: libphobos.phobos/std/variant.d execution test
FAIL: libphobos.phobos_shared/std/algorithm/mutation.d execution test
FAIL: libphobos.phobos_shared/std/conv.d execution test
FAIL: libphobos.phobos_shared/std/internal/math/errorfunction.d execution test
FAIL: libphobos.phobos_shared/std/variant.d execution test

I will push shortly.

-- >8 --

Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
survive register allocation, which in turn leads to wrong register
renaming.  Keeping the current approach would mean we need two insns
each for *tf_to_fprx2_0 and *tf_to_fprx2_1, something along the lines
of:

(define_insn "*tf_to_fprx2_0"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
        (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
                   UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  "#")

(define_insn "*tf_to_fprx2_0"
  [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
        (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
                   UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  "vpdi\t%v0,%v1,%v0,1"
  [(set_attr "op_type" "VRR")])

and similar for *tf_to_fprx2_1.  Note that before register allocation
operand 0 has mode FPRX2, and afterwards DF once subregs have been eliminated.

Since we always copy a whole vector register into a floating-point
register pair, another way to fix this is to merge *tf_to_fprx2_0 and
*tf_to_fprx2_1 into a single insn, which means we don't have to use
subregs at all.  The downside is that the assembler template now
contains two instructions.  The upside is that we don't have to come up
with an artificial insn before RA, so the result is arguably more
readable and maintainable.  This patch implements that approach.

In commit r11-4872-ge627cda5686592, the output operand specifier %V was
introduced; it is now used only in tf_to_fprx2.  Instead of coming up
with a counterpart %F for floating-point registers, which would also
only be used in tf_to_fprx2, I print the operands directly.  This
leaves %V unused, so this patch removes it.

gcc/ChangeLog:

PR 115860
* config/s390/s390.cc (print_operand): Remove operand specifier
%V.
* config/s390

[patch, testsuite, Fortran, committed] Fix endianness issue on test case

2024-09-12 Thread Thomas Koenig


I just committed the fix for PR 116653 as obvious.

Unfortunately, I left out the description in the ChangeLog; I hope it
is clear enough.

Best regards

Thomas

https://gcc.gnu.org/g:5d9486c29938d79beb798dce1a5509da54fe8c9f

commit r15-3619-g5d9486c29938d79beb798dce1a5509da54fe8c9f
Author: Thomas Koenig 
Date:   Fri Sep 13 07:47:24 2024 +0200

Fix endianness issue on unsigned_21.f90.

gcc/testsuite/ChangeLog:

PR fortran/116653
* gfortran.dg/unsigned_21.f90:
* gfortran.dg/unsigned_21_be.f90: New test.

diff --git a/gcc/testsuite/gfortran.dg/unsigned_21.f90 b/gcc/testsuite/gfortran.dg/unsigned_21.f90
index 23302c7eabe..c3f65a469dc 100644
--- a/gcc/testsuite/gfortran.dg/unsigned_21.f90
+++ b/gcc/testsuite/gfortran.dg/unsigned_21.f90
@@ -1,5 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-funsigned" }
+! { dg-require-effective-target le }
 program main
   integer :: i
   integer(2) :: j
diff --git a/gcc/testsuite/gfortran.dg/unsigned_21_be.f90 b/gcc/testsuite/gfortran.dg/unsigned_21_be.f90
new file mode 100644
index 000..64fecd9cd4a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/unsigned_21_be.f90
@@ -0,0 +1,14 @@
+! { dg-do run }
+! { dg-options "-funsigned" }
+! { dg-require-effective-target be }
+program main
+  integer :: i
+  integer(2) :: j
+  unsigned :: u
+  i = -1
+  u = transfer(i,u)
+  if (u /= huge(u)) error stop 1
+  u = 4278058235u
+  j = transfer(u,j)
+  if (j /= -259) error stop 2
+end program main

[PATCH v1] Match: Remove unnecessary types_match for case 1 of signed SAT_ADD

2024-09-12 Thread pan2 . li

From: Pan Li 

Since all commutative binary operators require both operands to have
matching types, remove the types_match check for case 1 of the signed
SAT_ADD: the (bit_xor @0 @1) in the pattern already ensures that the
operands have the correct tree type.

The following test suites pass with this patch:
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Remove the types_match check for signed SAT_ADD
case 1.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4cef965c9c7..5566c0e4c41 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3204,8 +3204,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
integer_zerop)
(bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
@2)
- (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
-  && types_match (type, @0, @1
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type
 
 /* Signed saturation add, case 2:
T sum = (T)((UT)X + (UT)Y)
-- 
2.43.0

[PATCH] Try fixing RISC-V .SELECT_VL with SLP

2024-09-12 Thread Richard Biener

The following simply removes a seemingly bogus guard.

* tree-vect-loop.cc (vect_analyze_loop_1): Remove SLP guard
from .SELECT_VL disabling.
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index cc15492f6a0..378e7c560bd 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3078,7 +3078,7 @@ start_over:
   if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
  OPTIMIZE_FOR_SPEED)
  && LOOP_VINFO_LENS (loop_vinfo).length () == 1
- && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
+ && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1
  && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
-- 
2.43.0

Re: [PATCH] JSON dumping for GENERIC trees

2024-09-12 Thread Thor Preimesberger

> There are three oddities I immediately notice:
>
> The PLUS_EXPR operands are in an array "operands" while the RETURN_EXPR
> "operand" or "child pointer" is referred to from "return_expr".  I think both
> are tcc_expression trees and the operands are in exp.operands.  Ideally the
> JSON would more closely reflect that rather than possibly following the
> "pretty" printing logic.

Ah - for the binary operator, "operands" may have been a poor choice of
words there. There's an abstract nonsense definition that would not
necessarily be reasonable here.
Would it make more sense to dump it e.g. as
.
   "bin_operator": "+",
   "op0": {"addr": "0x7f8256bda360", ...},
   "op1": {"addr": ...}
.
I think there are some parts of the code that I wrote that don't have
the accessor used as their key when referring to a different node -
e.g. case PLACEHOLDER_EXPR. Would this be an issue?

> While the tree_code of a tree node is the most important bit (maybe besides
> its address), the "tree code" attribute is after the locations (which are
> also quite verbose and distracting - at least when looking at raw JSON).
> For locations one could honor TDF_LINENO and only dump those when using
> -fdump-tree-original-json-lineno.  I'd re-order "tree code" after "addr".

Sounds good - I'll implement dumping locs iff TDF_LINENO is enabled.

> The third issue is that above the tree node with address 0x7f8256a10c60
> (and its children) appear twice - while you maintain a splay tree and assign
> unique numbers the duplicate nodes are not output by reference?  I would
> suggest to use { "ref_addr" : "0x7f8256a10c60" } for the output of such
> reference for example.
>
> I'm not sure whether JSON allows different object kinds or if that's solely
> done by having a special attribute if that's needed.  With the above
> regular tree nodes would be "addr" and references be "ref_addr".  A recursive
> JSON structure like the above is OK to look at raw; I'm not sure whether,
> for automatic processing (for example translating to a graph), a linear
> collection of nodes with references would be easier to handle.

I agree that it should be easier to process the JSON if the references have a
different key. Should be easy to implement.

> Few comments on the patch itself - the #include of tree-emit-json.h from
> dumpfile.cc doesn't seem to be necessary.  Since you declare
> dump_node_json in dumpfile.h it should be possible to elide the header
> and put the contents into the tree-emit-json.cc file.
>
> Another #include is duplicated (and also looks unnecessary).
>

All fixed now on my working tree.

> I know you have some crude JSON -> html translation script written in
> python - can you share that as well?  I'd suggest to post separate from
> this main patch, adding it to contrib/.

Sure - let me get the fixes suggested in this email done since it'll
change (and simplify) the logic a bit.

> Can we solve the multi-function issue somehow?  I know we have some
> abstraction for a dump file, we'd need a hook that's invoked on opening
> and closing to emit something there - I guess it would even be OK to
> hard-code this into dumpfile.cc for the -JSON dump variant.  It might
> be possible to register dump specific data with that object and get
> to the "current" dump file in dump_node_json so the splay-tree could
> be kept live and the allocations released on dump-file close?  Again,
> two hard-coded hooks from dumpfile.cc at open/close time into
> the JSON dumping for this might be feasible and track the global state
> with global variables.  That's to allow references to global objects and
> types streamed in a previous function context.

If the multi-function issue is that the dump pass currently produces
a series of JSON objects rather than a single one - I think what you're
suggesting is essentially done by optrecord_json_writer, for
-fsave-optimization-record. One approach I have in my head
is for, let's call it a tree_json_writer, to hold a
json::array, append each node we traverse, and then
flush this array to the dumpfile at the end.

This would also enable a way to address what you brought
up at the very end.

(In the python script I have written up, I just call the bash command
I posted in the first email to turn the output into a single JSON object.
I don't expect that it's really possible to call sed from within gcc.)

Best,
Thor

On Thu, Sep 12, 2024 at 7:14 AM Richard Biener
 wrote:
>
> On Thu, Sep 12, 2024 at 12:51 PM David Malcolm  wrote:
> >
> > On Wed, 2024-09-11 at 20:49 -0500, tcpreimesber...@gmail.com wrote:
> > > From: Thor C Preimesberger 
> > >
> > > This patch allows the compiler to dump GENERIC trees as JSON objects.
> > >
> > > The dump flag -fdump-tree-original-json dumps each fndecl node in the
> > > C frontend's gimplifier as a JSON object and traverses related nodes
> > > in a manner analogous to raw-dumping.
> >
> > Thanks for posting this patch.
> >
> > Are you able to upload somewhe

Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-12 Thread Martin Storsjö


On Thu, 12 Sep 2024, Evgeny Karpov wrote:


Thursday, September 12, 2024
Martin Storsjö  wrote:


This looks very reasonable - I presume this will make sure that you only
use the other code form if the offset actually is larger than 1 MB.

For the case when the offset actually is larger than 1 MB, I guess this
also ends up generating some other instruction sequence than just an "add
x0, x0, #imm", as the #imm is limited to < 4096. From reading the code,
it looks like it generates something like "mov x16, #imm; add x0, x0,
x16"? That's probably quite reasonable.


The generated code will stay unchanged for offsets less than 1 MB:

adrp x0, symbol + offset
add x0, x0, :lo12:symbol + offset

When the offset is >= 1MB:

adrp x0, symbol + offset % (1 << 20) // prevents relocation overflow in IMAGE_REL_ARM64_PAGEBASE_REL21
add x0, x0, (offset & ~0xfffff) >> 12, lsl #12 // a workaround to support 4GB offset
add x0, x0, :lo12:symbol + offset % (1 << 20)


Ah, I see. Yeah, that works.

That won't get you up to a full 4 GB offset from your symbol though; I
think that'll get you up to 16 MB offsets. In the "add x0, x0, #imm, lsl
#12" case, the immediate is a 12-bit immediate, shifted left by 12, so you
effectively have a 24-bit range there. But clearly this works a bit further
than 1 MB at least.


// Martin
