[PATCH v13 0/4] c: Add __lengthof__ operator

2024-10-02 Thread Alejandro Colomar
Hi!

This operator is as voted in a WG14 meeting yesterday, with the only
difference that we name it __lengthof__ instead of _Lengthof, to be able
to add it without being bound by ISO bureaucracy.

No semantic changes since v12; only the rename, according to what WG14
preferred.  WG14 agreed on the semantic changes of the operator as I
implemented them in v12.

Changes since v12:

-  Rename s/__nelementsof__/__lengthof__/
-  Fix typo in documentation.

Below is a range diff against v12.

Have a lovely day!
Alex


Alejandro Colomar (4):
  contrib/: Add support for Cc: and Link: tags
  gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
  Merge definitions of array_type_nelts_top()
  c: Add __lengthof__ operator

 contrib/gcc-changelog/git_commit.py|   5 +-
 gcc/c-family/c-common.cc   |  26 
 gcc/c-family/c-common.def  |   3 +
 gcc/c-family/c-common.h|   2 +
 gcc/c/c-decl.cc|  32 +++--
 gcc/c/c-fold.cc|   7 +-
 gcc/c/c-parser.cc  |  62 +++--
 gcc/c/c-tree.h |   4 +
 gcc/c/c-typeck.cc  | 118 +++-
 gcc/config/aarch64/aarch64.cc  |   2 +-
 gcc/config/i386/i386.cc|   2 +-
 gcc/cp/cp-tree.h   |   1 -
 gcc/cp/decl.cc |   2 +-
 gcc/cp/init.cc |   8 +-
 gcc/cp/lambda.cc   |   3 +-
 gcc/cp/operators.def   |   1 +
 gcc/cp/tree.cc |  13 --
 gcc/doc/extend.texi|  30 +
 gcc/expr.cc|   8 +-
 gcc/fortran/trans-array.cc |   2 +-
 gcc/fortran/trans-openmp.cc|   4 +-
 gcc/rust/backend/rust-tree.cc  |  13 --
 gcc/rust/backend/rust-tree.h   |   2 -
 gcc/target.h   |   3 +
 gcc/testsuite/gcc.dg/nelementsof-compile.c | 115 
 gcc/testsuite/gcc.dg/nelementsof-vla.c |  46 +++
 gcc/testsuite/gcc.dg/nelementsof.c | 150 +
 gcc/tree.cc|  17 ++-
 gcc/tree.h |   3 +-
 29 files changed, 604 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof-vla.c
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof.c

Range-diff against v12:
1:  d7fca49888a = 1:  d7fca49888a contrib/: Add support for Cc: and Link: tags
2:  e65245ac294 = 2:  e65245ac294 gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
3:  03de2d67bb1 = 3:  03de2d67bb1 Merge definitions of array_type_nelts_top()
4:  4373c48205d ! 4:  f635871da1f c: Add __nelementsof__ operator
@@ Metadata
 Author: Alejandro Colomar 
 
  ## Commit message ##
-c: Add __nelementsof__ operator
+c: Add __lengthof__ operator
 
 This operator is similar to sizeof but can only be applied to an array,
 and returns its number of elements.
@@ Commit message
 
 gcc/ChangeLog:
 
-* doc/extend.texi: Document __nelementsof__ operator.
-* target.h (enum type_context_kind): Add __nelementsof__ operator.
+* doc/extend.texi: Document __lengthof__ operator.
+* target.h (enum type_context_kind): Add __lengthof__ operator.
 
 gcc/c-family/ChangeLog:
 
 * c-common.h
 * c-common.def:
-* c-common.cc (c_nelementsof_type): Add __nelementsof__ operator.
+* c-common.cc (c_lengthof_type): Add __lengthof__ operator.
 
 gcc/c/ChangeLog:
 
 * c-tree.h
-(c_expr_nelementsof_expr, c_expr_nelementsof_type)
+(c_expr_lengthof_expr, c_expr_lengthof_type)
 * c-decl.cc
 (start_struct, finish_struct)
 (start_enum, finish_enum)
 * c-parser.cc
 (c_parser_sizeof_expression)
-(c_parser_nelementsof_expression)
-(c_parser_sizeof_or_nelementsof_expression)
+(c_parser_lengthof_expression)
+(c_parser_sizeof_or_lengthof_expression)
 (c_parser_unary_expression)
 * c-typeck.cc
 (build_external_ref)
 (record_maybe_used_decl, pop_maybe_used)
 (is_top_array_vla)
-(c_expr_nelementsof_expr, c_expr_nelementsof_type):
-Add __nelementsof__operator.
+(c_expr_lengthof_expr, c_expr_lengthof_type):
+Add __lengthof__ operator.
 
 gcc/cp/ChangeLog:
 
-* operators.def: Add __nelementsof__ operator.
+* opera

[PATCH v13 4/4] c: Add __lengthof__ operator

2024-10-02 Thread Alejandro Colomar
This operator is similar to sizeof but can only be applied to an array,
and returns its number of elements.
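
For readers who think in C++ terms, the behaviour described above (accept
only arrays, yield the element count) resembles the following sketch.  It is
only an analogue and not the operator itself: length_of and demo_length are
hypothetical names used for illustration, and the template says nothing
about how the C operator treats VLAs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++ analogue of an "array length" operator: the reference-to-
// array parameter only binds to real arrays, so passing a pointer is a
// compile-time error instead of a silently wrong result.
template <typename T, std::size_t N>
constexpr std::size_t length_of(T (&)[N]) noexcept
{
  return N;
}

// Small demonstration helper (also hypothetical).
inline std::size_t demo_length()
{
  int a[7] = {};
  int m[3][5] = {};
  // For a multidimensional array only the outermost dimension is counted.
  assert(length_of(m) == 3);
  return length_of(a);
}
```

By contrast, the traditional sizeof(a) / sizeof(a[0]) idiom also compiles
when a is a pointer, which is the class of mistake an array-only operator is
meant to rule out.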

FUTURE DIRECTIONS:

-  We should make it work with array parameters to functions,
   and somehow magically return the number of elements of the array,
   regardless of it being really a pointer.

-  Fix support for [0].

gcc/ChangeLog:

* doc/extend.texi: Document __lengthof__ operator.
* target.h (enum type_context_kind): Add __lengthof__ operator.

gcc/c-family/ChangeLog:

* c-common.h
* c-common.def:
* c-common.cc (c_lengthof_type): Add __lengthof__ operator.

gcc/c/ChangeLog:

* c-tree.h
(c_expr_lengthof_expr, c_expr_lengthof_type)
* c-decl.cc
(start_struct, finish_struct)
(start_enum, finish_enum)
* c-parser.cc
(c_parser_sizeof_expression)
(c_parser_lengthof_expression)
(c_parser_sizeof_or_lengthof_expression)
(c_parser_unary_expression)
* c-typeck.cc
(build_external_ref)
(record_maybe_used_decl, pop_maybe_used)
(is_top_array_vla)
(c_expr_lengthof_expr, c_expr_lengthof_type):
Add __lengthof__ operator.

gcc/cp/ChangeLog:

* operators.def: Add __lengthof__ operator.

gcc/testsuite/ChangeLog:

* gcc.dg/lengthof-compile.c
* gcc.dg/lengthof-vla.c
* gcc.dg/lengthof.c: Add tests for __lengthof__ operator.

Link: 
Link: 
Link: 
Cc: Joseph Myers 
Cc: Gabriel Ravier 
Cc: Jakub Jelinek 
Cc: Kees Cook 
Cc: Qing Zhao 
Cc: Jens Gustedt 
Cc: David Brown 
Cc: Florian Weimer 
Cc: Andreas Schwab 
Cc: Timm Baeder 
Cc: Daniel Plakosh 
Cc: "A. Jiang" 
Cc: Eugene Zelenko 
Cc: Aaron Ballman 
Cc: Paul Koning 
Cc: Daniel Lundin 
Cc: Nikolaos Strimpas 
Cc: JeanHeyd Meneide 
Cc: Fernando Borretti 
Cc: Jonathan Protzenko 
Cc: Chris Bazley 
Cc: Ville Voutilainen 
Cc: Alex Celeste 
Cc: Jakub Łukasiewicz 
Cc: Douglas McIlroy 
Cc: Jason Merrill 
Suggested-by: Xavier Del Campo Romero 
Co-authored-by: Martin Uecker 
Signed-off-by: Alejandro Colomar 
---
 gcc/c-family/c-common.cc   |  26 
 gcc/c-family/c-common.def  |   3 +
 gcc/c-family/c-common.h|   2 +
 gcc/c/c-decl.cc|  22 ++-
 gcc/c/c-parser.cc  |  62 +++--
 gcc/c/c-tree.h |   4 +
 gcc/c/c-typeck.cc  | 118 +++-
 gcc/cp/operators.def   |   1 +
 gcc/doc/extend.texi|  30 +
 gcc/target.h   |   3 +
 gcc/testsuite/gcc.dg/nelementsof-compile.c | 115 
 gcc/testsuite/gcc.dg/nelementsof-vla.c |  46 +++
 gcc/testsuite/gcc.dg/nelementsof.c | 150 +
 13 files changed, 558 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof-vla.c
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof.c

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index e7e371fd26f..13704a55669 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -465,6 +465,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__inline",RID_INLINE, 0 },
   { "__inline__",  RID_INLINE, 0 },
   { "__label__",   RID_LABEL,  0 },
+  { "__lengthof__",RID_LENGTHOF,   0 },
   { "__null",  RID_NULL,   0 },
   { "__real",  RID_REALPART,   0 },
   { "__real__",RID_REALPART,   0 },
@@ -4070,6 +4071,31 @@ c_alignof_expr (location_t loc, tree expr)
 
   return fold_convert_loc (loc, size_type_node, t);
 }
+
+/* Implement the lengthof keyword:
+   Return the number of elements of an array.  */
+
+tree
+c_lengthof_type (location_t loc, tree type)
+{
+  enum tree_code type_code;
+
+  type_code = TREE_CODE (type);
+  if (type_code != ARRAY_TYPE)
+{
+  error_at (loc, "invalid application of % to type %qT", type);
+  return error_mark_node;
+}
+  if (!COMPLETE_TYPE_P (type))
+{
+  error_at (loc,
+   "invalid application of % to incomplete type %qT",
+   type);
+  return error_mark_node;
+}
+
+  return array_type_nelts_top (type);
+}
 
 /* Handle C and C++ default attributes.  */
 
diff --git a/gcc/c-family/c-common.def b/gcc/c-family/c-common.def
index 5de96e5d4a8..6d162f67104 100644
--- a/gcc/c-family/c-common.def
+++ b/gcc/c-family/c-common.def
@@ -50,6 +50,9 @@ DEFTREECODE (EXCESS_PRECISION_EXPR, "excess_precision_expr", 
tcc_expression, 1)
number.  */
 DEFTREECODE (USERDEF_LITERAL, "userdef_literal", tcc_exceptional, 3)
 
+/* Represents a 'lengthof' expression.  */
+DEFTREECODE (LENGTHOF_EXPR, "lengthof_expr", tc

[PATCH] libstdc++: Add __gnu_cxx::__alloc_traits::_Destroy_static_type [PR110057]

2024-10-02 Thread Jonathan Wakely
We could also use this in all containers (and node handles) and in
the control block created by std::allocate_shared.

To use it for containers we will need to decide whether to change the
non-standard std::_Destroy(FwdIter, FwdIter, Allocator&) to use this, or
whether to add a new _Destroy_static function and make the containers
use that instead.

For this RFC patch, I've only used it in a few places in std::vector
where a single object is created/destroyed directly, so not when
destroying multiple elements using std::_Destroy.

Tested x86_64-linux.

-- >8 --

Add a version of allocator_traits::destroy that can be used when the
static type is known, so that we can make an explicit destructor call
that avoids virtual lookup.

This is only possible when the allocator doesn't provide a destroy
member, so that we know we're going to invoke the destructor directly.
In containers like std::vector we know that we're never destroying
elements through a pointer to base, because the dynamic type is known to
be the same as the static type.
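
The core idea above, a destructor call written with a qualified name so that
no virtual lookup is involved, can be sketched in isolation as follows.  This
is a minimal illustration and not the patch itself: Widget,
destroy_static_type and destroy_one_widget are hypothetical names, and the
sketch ignores allocators entirely.

```cpp
#include <cassert>
#include <new>

static int widget_dtor_calls = 0;

// A type with a virtual destructor; a plain "p->~Widget()" would normally
// be dispatched through the vtable.
struct Widget
{
  virtual ~Widget() { ++widget_dtor_calls; }
};

// Destroy *p assuming its dynamic type is exactly T.  The qualified name
// T::~T suppresses virtual dispatch, so the call can be resolved statically.
template <typename T>
void destroy_static_type(T* p)
{
  p->T::~T();
}

// Construct a Widget in raw storage, destroy it through the helper, and
// report how many destructor calls that caused.
int destroy_one_widget()
{
  alignas(Widget) unsigned char storage[sizeof(Widget)];
  Widget* w = ::new (static_cast<void*>(storage)) Widget;
  int before = widget_dtor_calls;
  destroy_static_type(w);
  return widget_dtor_calls - before;
}
```

The helper is only safe when the caller knows the object is not of a more
derived type, which is the guarantee a container has for its own elements.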

libstdc++-v3/ChangeLog:

PR libstdc++/110057
* include/bits/stl_vector.h: Use _Destroy_static_type instead of
destroy.
* include/bits/vector.tcc: Likewise.
* include/ext/alloc_traits.h (__alloc_traits::_Destroy_static_type):
New function template.
---
 libstdc++-v3/include/bits/stl_vector.h  |  5 ++--
 libstdc++-v3/include/bits/vector.tcc|  3 ++-
 libstdc++-v3/include/ext/alloc_traits.h | 32 -
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index e284536ad31..f7451fa9e18 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1325,7 +1325,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   {
__glibcxx_requires_nonempty();
--this->_M_impl._M_finish;
-   _Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
+   _Alloc_traits::_Destroy_static_type(this->_M_impl,
+   this->_M_impl._M_finish);
_GLIBCXX_ASAN_ANNOTATE_SHRINK(1);
   }
 
@@ -1870,7 +1871,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
_GLIBCXX20_CONSTEXPR
~_Temporary_value()
-   { _Alloc_traits::destroy(_M_this->_M_impl, _M_ptr()); }
+   { _Alloc_traits::_Destroy_static_type(_M_this->_M_impl, _M_ptr()); }
 
_GLIBCXX20_CONSTEXPR value_type&
_M_val() noexcept { return _M_storage._M_val; }
diff --git a/libstdc++-v3/include/bits/vector.tcc 
b/libstdc++-v3/include/bits/vector.tcc
index a99a5b56b77..660a8b4e414 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -184,7 +184,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   if (__position + 1 != end())
_GLIBCXX_MOVE3(__position + 1, end(), __position);
   --this->_M_impl._M_finish;
-  _Alloc_traits::destroy(this->_M_impl, this->_M_impl._M_finish);
+  _Alloc_traits::_Destroy_static_type(this->_M_impl,
+ this->_M_impl._M_finish);
   _GLIBCXX_ASAN_ANNOTATE_SHRINK(1);
   return __position;
 }
diff --git a/libstdc++-v3/include/ext/alloc_traits.h 
b/libstdc++-v3/include/ext/alloc_traits.h
index d2560531bac..17caaee2715 100644
--- a/libstdc++-v3/include/ext/alloc_traits.h
+++ b/libstdc++-v3/include/ext/alloc_traits.h
@@ -33,7 +33,7 @@
 #pragma GCC system_header
 #endif
 
-# include 
+#include 
 
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
@@ -95,6 +95,30 @@ template
   noexcept(noexcept(_Base_type::destroy(__a, std::__to_address(__p
   { _Base_type::destroy(__a, std::__to_address(__p)); }
 
+// Equivalent to `destroy` except that when `a.destroy(p)` is not valid,
+// the destructor will be called using a qualified name, so that no
+// dynamic dispatch to a virtual destructor is done. This can be used
+// in e.g. std::vector where we know that the elements do not have a
+// dynamic type that is different from the static type.
+template
+  [[__gnu__::__always_inline__]]
+  static _GLIBCXX20_CONSTEXPR void
+  _Destroy_static_type(_Alloc& __a, _Ptr __p)
+  {
+#if __cpp_concepts
+   auto __ptr = std::__to_address(__p);
+   if constexpr (requires { __a.destroy(__ptr); })
+ __a.destroy(__ptr);
+   else
+ {
+   using _Tp = std::remove_pointer_t;
+   __ptr->_Tp::~_Tp();
+ }
+#else
+   destroy(__a, __p);
+#endif
+  }
+
 [[__gnu__::__always_inline__]]
 static constexpr _Alloc _S_select_on_copy(const _Alloc& __a)
 { return _Base_type::select_on_container_copy_construction(__a); }
@@ -178,6 +202,12 @@ template
 template
   struct rebind
   { typedef typename _Alloc::template rebind<_Tp>::other other; };
+
+template
+  __attribute__((__always_inline__))
+  static void
+  _Destroy_static_type(_Alloc& __a

[PATCH] libstdc++: Enable _GLIBCXX_ASSERTIONS by default for -O0 [PR112808]

2024-10-02 Thread Jonathan Wakely
I think we should do this.

Tested x86_64-linux.

-- >8 --

Too many users don't know about -D_GLIBCXX_ASSERTIONS and so are missing
valuable checks for C++ standard library preconditions. This change
enables libstdc++ assertions by default when compiling with -O0 so that
we diagnose more bugs by default.

When users enable optimization we don't add the assertions by default
(because they have non-zero overhead) so they still need to enable them
manually.

For users who really don't want the assertions even in unoptimized
builds, defining _GLIBCXX_NO_ASSERTIONS will prevent them from being
enabled automatically.
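
The rule described above can be modelled as a small truth table.  This is
only an illustrative sketch: assertions_enabled and its parameters are
made-up names, where each bool stands for whether the corresponding macro
(_GLIBCXX_ASSERTIONS, _GLIBCXX_DEBUG, __OPTIMIZE__, _GLIBCXX_NO_ASSERTIONS)
is defined before the library config header is processed.

```cpp
#include <cassert>

// Hypothetical model of the macro logic: returns whether assertions end up
// enabled for a given combination of user and compiler definitions.
constexpr bool assertions_enabled(bool user_defined_assertions,
                                  bool debug_mode,
                                  bool optimizing,
                                  bool no_assertions)
{
  if (user_defined_assertions)
    return true;                      // an explicit request always wins
  if (debug_mode)
    return true;                      // Debug Mode implies assertions
  return !optimizing && !no_assertions; // new default for unoptimized builds
}

static_assert(assertions_enabled(false, false, false, false),
              "plain -O0 build: enabled by default");
static_assert(!assertions_enabled(false, false, true, false),
              "optimized build: still opt-in");
static_assert(!assertions_enabled(false, false, false, true),
              "-O0 with the opt-out macro: disabled");
static_assert(assertions_enabled(false, true, true, true),
              "Debug Mode enables them regardless of the opt-out");

// Runtime spot check of the same model.
inline void check_model()
{
  assert(assertions_enabled(true, false, true, true));
}
```

Note that in this model the opt-out only affects the implicit -O0 default;
it does not override Debug Mode or an explicit definition.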

libstdc++-v3/ChangeLog:

PR libstdc++/112808
* doc/xml/manual/using.xml (_GLIBCXX_ASSERTIONS): Document
implicit definition for -O0 compilation.
(_GLIBCXX_NO_ASSERTIONS): Document.
* doc/html/manual/using_macros.html: Regenerate.
* include/bits/c++config [!__OPTIMIZE__] (_GLIBCXX_ASSERTIONS):
Define for unoptimized builds.
---
 libstdc++-v3/doc/html/manual/using_macros.html | 12 +---
 libstdc++-v3/doc/xml/manual/using.xml  | 16 +---
 libstdc++-v3/include/bits/c++config|  9 +++--
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/using_macros.html 
b/libstdc++-v3/doc/html/manual/using_macros.html
index 67623b5e2af..c1406ec76f7 100644
--- a/libstdc++-v3/doc/html/manual/using_macros.html
+++ b/libstdc++-v3/doc/html/manual/using_macros.html
@@ -82,9 +82,15 @@
This is described in more detail in
Compile Time Checks.
   _GLIBCXX_ASSERTIONS
-   Undefined by default. When defined, enables extra error checking in
-the form of precondition assertions, such as bounds checking in
-strings and null pointer checks when dereferencing smart pointers.
+   Defined by default when compiling with no optimization, undefined
+   by default when compiling with optimization.
+   When defined, enables extra error checking in the form of
+   precondition assertions, such as bounds checking in strings
+   and null pointer checks when dereferencing smart pointers.
+  _GLIBCXX_NO_ASSERTIONS
+   Undefined by default.  When defined, prevents the implicit
+   definition of _GLIBCXX_ASSERTIONS when 
compiling
+   with no optimization.
   _GLIBCXX_DEBUG
Undefined by default. When defined, compiles user code using
the debug mode.
diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
b/libstdc++-v3/doc/xml/manual/using.xml
index 89119f6fb2d..7ca3a3f4b4c 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -1247,9 +1247,19 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 
hello.cc -o test.exe
 _GLIBCXX_ASSERTIONS
 
   
-   Undefined by default. When defined, enables extra error checking in
-the form of precondition assertions, such as bounds checking in
-strings and null pointer checks when dereferencing smart pointers.
+   Defined by default when compiling with no optimization, undefined
+   by default when compiling with optimization.
+   When defined, enables extra error checking in the form of
+   precondition assertions, such as bounds checking in strings
+   and null pointer checks when dereferencing smart pointers.
+  
+
+_GLIBCXX_NO_ASSERTIONS
+
+  
+   Undefined by default.  When defined, prevents the implicit
+   definition of _GLIBCXX_ASSERTIONS when compiling
+   with no optimization.
   
 
 _GLIBCXX_DEBUG
diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 29d795f687c..b87a3527f24 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -586,9 +586,14 @@ namespace std
 #pragma GCC visibility pop
 }
 
+#ifndef _GLIBCXX_ASSERTIONS
+# if defined(_GLIBCXX_DEBUG)
 // Debug Mode implies checking assertions.
-#if defined(_GLIBCXX_DEBUG) && !defined(_GLIBCXX_ASSERTIONS)
-# define _GLIBCXX_ASSERTIONS 1
+#  define _GLIBCXX_ASSERTIONS 1
+# elif ! defined(__OPTIMIZE__) && ! defined(_GLIBCXX_NO_ASSERTIONS)
+// Enable assertions for unoptimized builds.
+#  define _GLIBCXX_ASSERTIONS 1
+# endif
 #endif
 
 // Disable std::string explicit instantiation declarations in order to assert.
-- 
2.46.1



[RFC PATCH] Allow limited extended asm at toplevel

2024-10-02 Thread Jakub Jelinek
Hi!

In the Cauldron IPA/LTO BoF we discussed toplevel asms, and it was
suggested that it would be nice to tell the compiler something about what
the toplevel asm does.  Sure, I'm aware the kernel people said they
aren't willing to use something like that, but perhaps other projects
are.  And for the kernel perhaps we should add some new option which allows
some dumb parsing of the toplevel asms and gathers something from that
parsing.

The following patch is just a small step towards that, namely, allow
some subset of extended inline asm outside of functions.
The patch is unfinished: LTO streaming (out/in) of the ASM_EXPRs isn't
implemented, nor are any cgraph/varpool changes to find out references etc.

The patch allows something like:

int a[2], b;
enum { E1, E2, E3, E4, E5 };
struct S { int a; char b; long long c; };
asm (".section blah; .quad %P0, %P1, %P2, %P3, %P4; .previous"
 : : "m" (a), "m" (b), "i" (42), "i" (E4), "i" (sizeof (struct S)));

Even for non-LTO, that could be useful e.g. for getting enumerators from
C/C++ as integers into the toplevel asm, or sizeof/offsetof etc.

The restrictions I've implemented are:
1) asm qualifiers still aren't allowed, so asm goto or asm inline can't be
   specified at toplevel; asm volatile has the volatile ignored for C++ with
   a warning and is an error in C, as before
2) I see good use mainly for input operands, and perhaps for outputs to make
   it clear that the inline asm may write some memory; I don't see a good use
   for clobbers, so the patch doesn't allow those (nor, of course, labels,
   because asm goto can't be specified)
3) the patch allows only constraints which don't allow registers, so
   typically "m" or "i" or other memory or immediate constraints; for
   memory, it requires that the operand is addressable and its address
   could be used in static var initializer (so that no code actually
   needs to be emitted for it), for others that they are constants usable
   in the static var initializers
4) the patch disallows + (there is no reload of the operands, so I don't
   see benefits of tying some operands together), % (who cares if
   something is commutative in this case), & (again, no code is emitted
   around the asm), and the 0-9 constraints

Right now there is no way to tell the compiler that the inline asm defines
some symbol; I wonder if we can find some unused constraint letter or
sequence or other syntax to denote that.  Note, I think we want to be
able to specify that an inline asm defines a function or variable and
be able to provide the type etc. thereof.  So
extern void foo (void);
extern int var;
asm ("%P0: ret" : : "defines" (foo));
asm ("%P0: .quad 0" : : "defines" (var));
where the exact "defines" part is TBD.

Another question is whether all targets have something like x86 P print
modifier which doesn't add any stuff around the printed expressions
(perhaps there are targets which don't do that just for %0 etc.), or
whether we want to add something that will be usable on all targets.

Thoughts on this?

2024-10-02  Jakub Jelinek  

gcc/
* output.h (insn_noperands): Declare.
* final.cc (insn_noperands): No longer static.
* varasm.cc (assemble_asm): Handle ASM_EXPR.
* doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document
that extended asm is now allowed outside of functions with certain
restrictions.
gcc/c/
* c-parser.cc (c_parser_asm_string_literal): Add forward declaration.
(c_parser_asm_definition): Parse also extended asm without
clobbers/labels.
* c-typeck.cc (build_asm_expr): Allow extended asm outside of
functions and check extra restrictions.
gcc/cp/
* cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument.
* parser.cc (cp_parser_asm_definition): Parse also extended asm
without clobbers/labels outside of functions.
* semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set,
check extra restrictions for extended asm outside of functions.
* pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller.

--- gcc/output.h.jj 2024-10-02 10:02:08.031896380 +0200
+++ gcc/output.h2024-10-02 11:27:13.383943702 +0200
@@ -338,6 +338,9 @@ extern rtx_insn *current_output_insn;
The precise value is the insn being output, to pass to error_for_asm.  */
 extern const rtx_insn *this_is_asm_operands;
 
+/* Number of operands of this insn, for an `asm' with operands.  */
+extern unsigned int insn_noperands;
+
 /* Carry information from ASM_DECLARE_OBJECT_NAME
to ASM_FINISH_DECLARE_OBJECT.  */
 extern int size_directive_output;
--- gcc/final.cc.jj 2024-10-02 10:02:08.031896380 +0200
+++ gcc/final.cc2024-10-02 11:27:13.382943715 +0200
@@ -149,7 +149,7 @@ extern const int length_unit_log; /* Thi
 const rtx_insn *this_is_asm_operands;
 
 /* Number of operands of this insn, for an `asm' with operands.  */
-static unsigned int insn_noperands;
+unsigned int insn_noperands;
 
 /* Compare opt

[PATCH] testsuite/116596 - fix gcc.dg/vect/slp-11a.c

2024-10-02 Thread Richard Biener
The condition on "vectorizing stmts using SLP" needs to match that
of "vectorized 1 loops", obviously.

Pushed.

PR testsuite/116596
* gcc.dg/vect/slp-11a.c: Fix.
---
 gcc/testsuite/gcc.dg/vect/slp-11a.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c 
b/gcc/testsuite/gcc.dg/vect/slp-11a.c
index 2efa1796757..196ef65bb78 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c
@@ -72,4 +72,4 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { 
! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } 
} */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { vect_strided8 && vect_int_mult } } } }  */
-- 
2.43.0


[PATCH 3/3] Record template specialization hash

2024-10-02 Thread Richard Biener
For a specific testcase a lot of compile-time is spent in re-hashing
hashtable elements upon expansion.  The following records the hash
in the hash element.  This speeds up compilation by 20%.

There's probably module-related uses that need to be adjusted.

Bootstrap failed (guess I was expecting this), but still I think this
is a good idea - maybe somebody can pick it up.  Possibly, instead
of having a single global hash table, having one per ID would be
better.  The hashtable also keeps things GC-live ('args' for example).
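
The technique itself, storing the hash next to the key so that growing the
table never recomputes it, can be sketched outside of GCC like this.
Everything here is hypothetical illustration: Entry, expensive_hash and
count_hash_calls are made-up names, and std::unordered_set stands in for
GCC's own hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

static int expensive_hash_calls = 0;

// Stand-in for a costly hash over (template, arguments).
std::size_t expensive_hash(const std::string& tmpl, const std::string& args)
{
  ++expensive_hash_calls;
  return std::hash<std::string>{}(tmpl) * 31u + std::hash<std::string>{}(args);
}

// The element records its own hash, computed once when it is created.
struct Entry
{
  std::string tmpl;
  std::string args;
  std::size_t hash;
};

// The table's hasher just returns the recorded value.
struct EntryHasher
{
  std::size_t operator()(const Entry& e) const noexcept { return e.hash; }
};

struct EntryEqual
{
  bool operator()(const Entry& a, const Entry& b) const
  { return a.tmpl == b.tmpl && a.args == b.args; }
};

// Insert n distinct entries, force a rehash, and report how many times the
// expensive hash function actually ran.
int count_hash_calls(int n)
{
  expensive_hash_calls = 0;
  std::unordered_set<Entry, EntryHasher, EntryEqual> table;
  for (int i = 0; i < n; ++i)
    {
      std::string args = "int" + std::to_string(i);
      table.insert(Entry{"vec", args, expensive_hash("vec", args)});
    }
  table.rehash(static_cast<std::size_t>(4 * n)); // growth reuses Entry::hash
  return expensive_hash_calls;
}
```

The cost is one extra word per element, and every place that creates an
element must remember to fill in the hash.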

gcc/cp/
* cp-tree.h (spec_entry::hash): New member.
* pt.cc (spec_hasher::hash): Return the element's recorded hash.
(lookup_template_class): Record the hash in the element.
---
 gcc/cp/cp-tree.h | 1 +
 gcc/cp/pt.cc | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c5d02567cb4..56345214d8f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5843,6 +5843,7 @@ struct GTY((for_user)) spec_entry
   tree tmpl;  /* The general template this is a specialization of.  */
   tree args;  /* The args for this (maybe-partial) specialization.  */
   tree spec;  /* The specialization itself.  */
+  hashval_t hash;
 };
 
 /* in class.cc */
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 2c8b0d8609d..11f84133fa7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1739,7 +1739,7 @@ spec_hasher::hash (tree tmpl, tree args)
 hashval_t
 spec_hasher::hash (spec_entry *e)
 {
-  return spec_hasher::hash (e->tmpl, e->args);
+  return e->hash;
 }
 
 /* Recursively calculate a hash value for a template argument ARG, for use
@@ -10234,7 +10234,7 @@ lookup_template_class (tree d1, tree arglist, tree 
in_decl, tree context,
   elt.tmpl = gen_tmpl;
   elt.args = arglist;
   elt.spec = NULL_TREE;
-  hash = spec_hasher::hash (&elt);
+  hash = elt.hash = spec_hasher::hash (gen_tmpl, arglist);
   entry = type_specializations->find_with_hash (&elt, hash);
 
   if (entry)
@@ -10519,7 +10519,7 @@ lookup_template_class (tree d1, tree arglist, tree 
in_decl, tree context,
 use it for hash table lookup.  */
  elt.tmpl = found;
  elt.args = arglist = INNERMOST_TEMPLATE_ARGS (arglist);
- hash = spec_hasher::hash (&elt);
+ hash = elt.hash = spec_hasher::hash (found, arglist);
}
}
 
-- 
2.43.0


[PATCH 1/3] Speedup iterative_hash_template_arg

2024-10-02 Thread Richard Biener
Using iterative_hash_object is expensive compared to using
iterative_hash_hashval_t which is fit for integer sized values.
The following reduces the number of perf cycles spent in
iterative_hash_template_arg and iterative_hash combined by 20%.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

gcc/cp/
* pt.cc (iterative_hash_template_arg): Avoid using
iterative_hash_object.
---
 gcc/cp/pt.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 43468e5f62e..04f0a1d5fff 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1751,7 +1751,7 @@ hashval_t
 iterative_hash_template_arg (tree arg, hashval_t val)
 {
   if (arg == NULL_TREE)
-return iterative_hash_object (arg, val);
+return iterative_hash_hashval_t (0, val);
 
   if (!TYPE_P (arg))
 /* Strip nop-like things, but not the same as STRIP_NOPS.  */
@@ -1762,7 +1762,7 @@ iterative_hash_template_arg (tree arg, hashval_t val)
 
   enum tree_code code = TREE_CODE (arg);
 
-  val = iterative_hash_object (code, val);
+  val = iterative_hash_hashval_t (code, val);
 
   switch (code)
 {
@@ -1777,7 +1777,7 @@ iterative_hash_template_arg (tree arg, hashval_t val)
   return val;
 
 case IDENTIFIER_NODE:
-  return iterative_hash_object (IDENTIFIER_HASH_VALUE (arg), val);
+  return iterative_hash_hashval_t (IDENTIFIER_HASH_VALUE (arg), val);
 
 case TREE_VEC:
   for (tree elt : tree_vec_range (arg))
-- 
2.43.0



[PATCH 2/3] Release expanded template argument vector

2024-10-02 Thread Richard Biener
This reduces peak memory usage by 20% for a specific testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

It's very ugly, so I'd appreciate suggestions on how to handle such
situations better.

gcc/cp/
* pt.cc (coerce_template_parms): Release expanded argument
vector when not needed.
---
 gcc/cp/pt.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 04f0a1d5fff..2c8b0d8609d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9442,6 +9442,9 @@ coerce_template_parms (tree parms,
 SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
 TREE_VEC_LENGTH (new_inner_args));
 
+  if ((return_full_args ? new_args != inner_args : new_inner_args != 
inner_args)
+  && inner_args != orig_inner_args)
+ggc_free (inner_args);
   return return_full_args ? new_args : new_inner_args;
 }
 
-- 
2.43.0



Re: [PATCH 1/3] Speedup iterative_hash_template_arg

2024-10-02 Thread Jason Merrill

On 10/2/24 7:49 AM, Richard Biener wrote:

> Using iterative_hash_object is expensive compared to using
> iterative_hash_hashval_t which is fit for integer sized values.
> The following reduces the number of perf cycles spent in
> iterative_hash_template_arg and iterative_hash combined by 20%.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?


OK.


> Thanks,
> Richard.
> 
> gcc/cp/
> * pt.cc (iterative_hash_template_arg): Avoid using
> iterative_hash_object.
> ---
>   gcc/cp/pt.cc | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 43468e5f62e..04f0a1d5fff 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -1751,7 +1751,7 @@ hashval_t
>   iterative_hash_template_arg (tree arg, hashval_t val)
>   {
> if (arg == NULL_TREE)
> -return iterative_hash_object (arg, val);
> +return iterative_hash_hashval_t (0, val);
>   
> if (!TYPE_P (arg))
>   /* Strip nop-like things, but not the same as STRIP_NOPS.  */
> @@ -1762,7 +1762,7 @@ iterative_hash_template_arg (tree arg, hashval_t val)
>   
> enum tree_code code = TREE_CODE (arg);
>   
> -  val = iterative_hash_object (code, val);
> +  val = iterative_hash_hashval_t (code, val);
>   
> switch (code)
>   {
> @@ -1777,7 +1777,7 @@ iterative_hash_template_arg (tree arg, hashval_t val)
> return val;
>   
>   case IDENTIFIER_NODE:
> -  return iterative_hash_object (IDENTIFIER_HASH_VALUE (arg), val);
> +  return iterative_hash_hashval_t (IDENTIFIER_HASH_VALUE (arg), val);
>   
>   case TREE_VEC:
> for (tree elt : tree_vec_range (arg))




[PATCH v2] Improve vsetvl vconfig alignment

2024-10-02 Thread Dusan Stojkovic
This patch is a new version of:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662745.html

> Can you elaborate a bit on that?  Rearranging the CFG shouldn't matter
> in general and relying on the specific TARGET_SFB_ALU feels overly
> specific.
> Why does the same register in the if_then_else and interfere with vsetvl?

When the ce1 pass transforms the CFG for a conditional move, it deletes the
then and else basic blocks and adds in their place a conditional move that
uses the same pseudo-register as the original vsetvl.

This interferes with the vsetvl pass precisely because of the merge policy:
the used-by-non-RVV flag limits the cases where merging might still be possible.
This patch tries to address one such issue.

Agreed. I have removed the TARGET_SFB_ALU flag from the condition.

> BTW Bohan Lei has since fixed a bug regarding non-RVV uses.  Does the
> situation change with that applied?

I repeated the testing for sifive-7-series as well as rocket. The same tests
are still affected positively on sifive-7-series: vsetvlmax-9, vsetvlmax-10,
vsetvlmax-11, vsetvlmax-15.

2024-10-2  Dusan Stojkovic  

PR target/113035

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): 
New fuse condition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Updated 
scan-assembler-times num parameter.


---
 gcc/config/riscv/riscv-vsetvl.cc  | 24 +++
 .../riscv/rvv/vsetvl/vsetvlmax-15.c   |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 030ffbe2ebb..e2a5231333f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3061,6 +3061,30 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
  else
{
  vsetvl_info &prev_info = src_block_info.get_exit_info ();
+ if (prev_info.valid_p ()
+ && curr_info.valid_p ()
+ && prev_info.vl_used_by_non_rvv_insn_p ()
+ && !curr_info.vl_used_by_non_rvv_insn_p ())
+ {
+   // Try to merge each demand individually
+   if (m_dem.sew_lmul_compatible_p (prev_info, curr_info))
+   {
+ m_dem.merge_sew_lmul (prev_info, curr_info);
+   }
+   if (m_dem.policy_compatible_p (prev_info, curr_info))
+   {
+ m_dem.merge_policy (prev_info, curr_info);
+   }
+   if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "After fusing curr info and "
+ "prev info demands individually:\n");
+ fprintf (dump_file, "  prev_info: ");
+ prev_info.dump (dump_file, "  ");
+ fprintf (dump_file, "  curr_info: ");
+ curr_info.dump (dump_file, "  ");
+   }
+ }
  if (!prev_info.valid_p ()
  || m_dem.available_p (prev_info, curr_info)
  || !m_dem.compatible_p (prev_info, curr_info))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c
index 23042460885..65aceed0e4e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c
@@ -18,6 +18,6 @@ void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, int c
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*5} 1 { target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
-- 
2.43.0



Re: [RFC PATCH] Allow limited extended asm at toplevel

2024-10-02 Thread Richard Biener
On Wed, 2 Oct 2024, Jakub Jelinek wrote:

> Hi!
> 
> In the Cauldron IPA/LTO BoF we've discussed toplevel asms and it was
> discussed it would be nice to tell the compiler something about what
> the toplevel asm does.  Sure, I'm aware the kernel people said they
> aren't willing to use something like that, but perhaps other projects
> do.  And for kernel perhaps we should add some new option which allows
> some dumb parsing of the toplevel asms and gather something from that
> parsing.
> 
> The following patch is just a small step towards that, namely, allow
> some subset of extended inline asm outside of functions.
> The patch is unfinished, LTO streaming (out/in) of the ASM_EXPRs isn't
> implemented, nor any cgraph/varpool changes to find out references etc.
> 
> The patch allows something like:
> 
> int a[2], b;
> enum { E1, E2, E3, E4, E5 };
> struct S { int a; char b; long long c; };
> asm (".section blah; .quad %P0, %P1, %P2, %P3, %P4; .previous"
>  : : "m" (a), "m" (b), "i" (42), "i" (E4), "i" (sizeof (struct S)));
> 
> Even for non-LTO, that could be useful e.g. for getting enumerators from
> C/C++ as integers into the toplevel asm, or sizeof/offsetof etc.
> 
> The restrictions I've implemented are:
> 1) asm qualifiers still aren't allowed, so asm goto or asm inline can't be
>    specified at toplevel, asm volatile has the volatile ignored for C++ with
>    a warning and is an error in C like before
> 2) I see good use for mainly input operands, output maybe to make it clear
>    that the inline asm may write some memory, I don't see a good use for
>    clobbers, so the patch doesn't allow those (and of course labels because
>    asm goto can't be specified)
> 3) the patch allows only constraints which don't allow registers, so
>    typically "m" or "i" or other memory or immediate constraints; for
>    memory, it requires that the operand is addressable and its address
>    could be used in static var initializer (so that no code actually
>    needs to be emitted for it), for others that they are constants usable
>    in the static var initializers
> 4) the patch disallows + (there is no reload of the operands, so I don't
>    see benefits of tying some operands together), nor % (who cares if
>    something is commutative in this case), or & (again, no code is emitted
>    around the asm), nor the 0-9 constraints
> 
> Right now there is no way to tell the compiler that the inline asm defines
> some symbol, I wonder if we can find some unused constraint letter or
> sequence or other syntax to denote that.  Note, I think we want to be
> able to specify that an inline asm defines a function or variable and
> be able to provide the type etc. thereof.  So
> extern void foo (void);
> extern int var;
> asm ("%P0: ret" : : "defines" (foo));
> asm ("%P0: .quad 0" : : "defines" (var));
> where the exact "defines" part is TBD.

As you are using input constraints to mark symbol uses maybe we can
use output constraints with a magic identifier (and a constraint letter
specifying 'identifier'):

asm (".globl %0; %0: ret" : "_D" (extern int foo()) : ...);

In the BOF it was noted that LTO wants to be able to rename / localize
symbols so both use and definition should be used in a way to support
this (though changing visibility is difficult - the assembler might
tie to GOT uses, and .globl is hard to replace).

Richard.

> Another question is whether all targets have something like x86 P print
> modifier which doesn't add any stuff around the printed expressions
> (perhaps there are targets which don't do that just for %0 etc.), or
> whether we want to add something that will be usable on all targets.
> 
> Thoughts on this?
> 
> 2024-10-02  Jakub Jelinek  
> 
> gcc/
>   * output.h (insn_noperands): Declare.
>   * final.cc (insn_noperands): No longer static.
>   * varasm.cc (assemble_asm): Handle ASM_EXPR.
>   * doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document
>   that extended asm is now allowed outside of functions with certain
>   restrictions.
> gcc/c/
>   * c-parser.cc (c_parser_asm_string_literal): Add forward declaration.
>   (c_parser_asm_definition): Parse also extended asm without
>   clobbers/labels.
>   * c-typeck.cc (build_asm_expr): Allow extended asm outside of
>   functions and check extra restrictions.
> gcc/cp/
>   * cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument.
>   * parser.cc (cp_parser_asm_definition): Parse also extended asm
>   without clobbers/labels outside of functions.
>   * semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set,
>   check extra restrictions for extended asm outside of functions.
>   * pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller.
> 
> --- gcc/output.h.jj   2024-10-02 10:02:08.031896380 +0200
> +++ gcc/output.h  2024-10-02 11:27:13.383943702 +0200
> @@ -338,6 +338,9 @@ extern rtx_insn *current_output_insn;
> The precise value is the insn being output, to pas

Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for vector reductions

2024-10-02 Thread Kyrylo Tkachov


> On 2 Oct 2024, at 13:43, Richard Sandiford  wrote:
> 
> Tamar Christina  writes:
>> Hi Jennifer,
>> 
>>> -Original Message-
>>> From: Richard Sandiford 
>>> Sent: Tuesday, October 1, 2024 12:20 PM
>>> To: Jennifer Schmitz 
>>> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov 
>>> Subject: Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for vector
>>> reductions
>>> 
>>> Jennifer Schmitz  writes:
>>>> This patch implements the optabs reduc_and_scal_<mode>,
>>>> reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for Advanced SIMD
>>>> integers for TARGET_SVE in order to use the SVE instructions ANDV, ORV, and
>>>> EORV for fixed-width bitwise reductions.
>>>> For example, the test case
>>>> 
>>>> int32_t foo (int32_t *a)
>>>> {
>>>>   int32_t b = -1;
>>>>   for (int i = 0; i < 4; ++i)
>>>>     b &= a[i];
>>>>   return b;
>>>> }
>>>> 
>>>> was previously compiled to
>>>> (-O2 -ftree-vectorize --param aarch64-autovec-preference=asimd-only):
>>>> foo:
>>>>         ldp     w2, w1, [x0]
>>>>         ldp     w3, w0, [x0, 8]
>>>>         and     w1, w1, w3
>>>>         and     w0, w0, w2
>>>>         and     w0, w1, w0
>>>>         ret
>>>> 
>>>> With patch, it is compiled to:
>>>> foo:
>>>>         ldr     q31, [x0]
>>>>         ptrue   p7.b, all
>>>>         andv    s31, p7, z31.s
>>>>         fmov    w0, s31
>>>>         ret
>>>> 
>>>> Test cases were added to check the produced assembly for use of SVE
>>>> instructions.
>>> 
>>> I would imagine that in this particular case, the scalar version is
>>> better.  But I agree it's a useful feature for other cases.
>>> 
>> 
>> Yeah, I'm concerned because ANDV and other reductions are extremely
>> expensive.  But assuming the reductions are done outside of a loop then it
>> should be ok, though.
>> 
>> The issue is that the reduction latency grows with VL, so e.g. compare the
>> latencies and throughput for Neoverse V1 and Neoverse V2.  So I think we
>> want to gate this on VL128.
>> 
>> As an aside, is the sequence correct?  With ORR reduction ptrue makes sense,
>> but for VL > 128 ptrue doesn't work as the top bits would be zero. So an
>> ANDV on zero-valued lanes would result in zero.
> 
> Argh!  Thanks for spotting that.  I'm kicking myself for missing it :(
> 
>> You'd want to predicate the ANDV with the size of the vector being reduced.
>> The same is true for SMIN and SMAX.
>> 
>> I do wonder whether we need to split the pattern into two, where w->w uses
>> the SVE instructions but w->r uses Adv SIMD.
>> 
>> In the case of w->r as the example above
>> 
>>         ext     v1.16b, v0.16b, v0.16b, #8
>>         and     v0.8b, v0.8b, v1.8b
>>         fmov    x8, d0
>>         lsr     x9, x8, #32
>>         and     w0, w8, w9
>> 
>> would beat the ANDV on pretty much every uarch.
>> 
>> But I'll leave it up to the maintainers.
> 
> Also a good point.  And since these are integer reductions, an r
> result is more probable than a w result.  w would typically only
> be used if the result is stored directly to memory.
> 
> At which point, the question (which you might have been implying)
> is whether it's worth doing this at all, given the limited cases
> for which it's beneficial, and the complication that's needed to
> (a) detect those cases and (b) make them work.

These are good points in the thread. Maybe it makes sense to do this only for
V16QI reductions?
Maybe a variant of Tamar’s w->r sequence wins out even there.

Originally I had hoped that we’d tackle the straight-line case from PR113816
but it seems that GCC didn’t even try to create a reduction op for the code
there.
Maybe that’s something to look into separately.

Also, for the alternative test case that we tried to use as motivation:
char sior_loop (char *a)
{
  char b = 0;
  for (int i = 0; i < 16; ++i)
    b |= a[i];
  return b;
}

GCC generates some terrible code: https://godbolt.org/z/a68jodKca
So it feels that we should do something to improve the bitwise reductions for
AArch64 regardless.
Maybe we need to agree on the optimal sequences for the various modes and
implement them.

Thanks,
Kyrill

> 
> Thanks,
> Richard
> 
>> 
>> Thanks for the patch though!
>> 
>> Cheers,
>> Tamar
>> 
>>> However, a natural follow-on would be to use SMAXV and UMAXV for V2DI.
>>> Taking that into account:
>>> 
>>>> [...]
>>>> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
>>>> index bfa28849adf..0d9e5cebef0 100644
>>>> --- a/gcc/config/aarch64/aarch64-sve.md
>>>> +++ b/gcc/config/aarch64/aarch64-sve.md
>>>> @@ -8927,6 +8927,28 @@
>>>>    "addv\t%d0, %1, %2."
>>>>  )
>>>>  
>>>> +;; Unpredicated logical integer reductions for Advanced SIMD modes.
>>>> +(define_expand "reduc__scal_"
>>>> +  [(set (match_operand: 0 "register_operand")
>>>> +  (unspec: [(match_dup 2)
>>>> + (match_operand:VQ_I 1 "register_operand")]
>>>> +SVE_INT_REDUCTION_LOGICAL))]
>>>> +  "TARGET_SVE"
>>>> +  {
>>>> +operan

[PATCH v13 2/4] gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()

2024-10-02 Thread Alejandro Colomar
The old name was misleading.

While at it, also rename some temporary variables that are used with
this function, for consistency.
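
As a rough illustration of why the old name misled, here is a minimal C++
sketch. It assumes a simplified model in which an array type is just an index
domain [min, max]; ArrayDomain, nelts_minus_one, nelts_top and
one_element_array_p are hypothetical stand-ins and not GCC's tree API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an array type's index domain [min, max].
// GCC keeps this in tree nodes, which the sketch does not model.
struct ArrayDomain
{
  std::size_t min;
  std::size_t max;
};

// What the renamed function yields: the highest index relative to the
// lowest one, i.e. the element count minus one.
constexpr std::size_t
nelts_minus_one (ArrayDomain d)
{
  return d.max - d.min;
}

// Element count of the top-level array: add the one back.
constexpr std::size_t
nelts_top (ArrayDomain d)
{
  return nelts_minus_one (d) + 1;
}

// Same shape as the check in one_element_array_type_p.
constexpr bool
one_element_array_p (ArrayDomain d)
{
  return nelts_minus_one (d) == 0;
}

// int a[42] has the domain [0, 41].
inline void
check_int_a_42 ()
{
  assert (nelts_minus_one (ArrayDomain{0, 41}) == 41);
  assert (nelts_top (ArrayDomain{0, 41}) == 42);
}
```

A caller reading the old name would expect 42 for int a[42]; the value that
comes back is 41, which is what the new name says.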

Link: 


gcc/ChangeLog:

* tree.cc (array_type_nelts, array_type_nelts_minus_one)
* tree.h (array_type_nelts, array_type_nelts_minus_one)
* expr.cc (count_type_elements)
* config/aarch64/aarch64.cc
(pure_scalable_type_info::analyze_array)
* config/i386/i386.cc (ix86_canonical_va_list_type):
Rename array_type_nelts() => array_type_nelts_minus_one()
The old name was misleading.

gcc/c/ChangeLog:

* c-decl.cc (one_element_array_type_p, get_parm_array_spec)
* c-fold.cc (c_fold_array_ref):
Rename array_type_nelts() => array_type_nelts_minus_one()

gcc/cp/ChangeLog:

* decl.cc (reshape_init_array)
* init.cc
(build_zero_init_1)
(build_value_init_noctor)
(build_vec_init)
(build_delete)
* lambda.cc (add_capture)
* tree.cc (array_type_nelts_top):
Rename array_type_nelts() => array_type_nelts_minus_one()

gcc/fortran/ChangeLog:

* trans-array.cc (structure_alloc_comps)
* trans-openmp.cc
(gfc_walk_alloc_comps)
(gfc_omp_clause_linear_ctor):
Rename array_type_nelts() => array_type_nelts_minus_one()

gcc/rust/ChangeLog:

* backend/rust-tree.cc (array_type_nelts_top):
Rename array_type_nelts() => array_type_nelts_minus_one()

Cc: Gabriel Ravier 
Cc: Martin Uecker 
Cc: Joseph Myers 
Cc: Xavier Del Campo Romero 
Cc: Jakub Jelinek 
Suggested-by: Richard Biener 
Signed-off-by: Alejandro Colomar 
---
 gcc/c/c-decl.cc   | 10 +-
 gcc/c/c-fold.cc   |  7 ---
 gcc/config/aarch64/aarch64.cc |  2 +-
 gcc/config/i386/i386.cc   |  2 +-
 gcc/cp/decl.cc|  2 +-
 gcc/cp/init.cc|  8 
 gcc/cp/lambda.cc  |  3 ++-
 gcc/cp/tree.cc|  2 +-
 gcc/expr.cc   |  8 
 gcc/fortran/trans-array.cc|  2 +-
 gcc/fortran/trans-openmp.cc   |  4 ++--
 gcc/rust/backend/rust-tree.cc |  2 +-
 gcc/tree.cc   |  4 ++--
 gcc/tree.h|  2 +-
 14 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index aa7f69d1b7b..c73d3107efb 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5358,7 +5358,7 @@ one_element_array_type_p (const_tree type)
 {
   if (TREE_CODE (type) != ARRAY_TYPE)
 return false;
-  return integer_zerop (array_type_nelts (type));
+  return integer_zerop (array_type_nelts_minus_one (type));
 }
 
 /* Determine whether TYPE is a zero-length array type "[0]".  */
@@ -6306,15 +6306,15 @@ get_parm_array_spec (const struct c_parm *parm, tree attrs)
  for (tree type = parm->specs->type; TREE_CODE (type) == ARRAY_TYPE;
   type = TREE_TYPE (type))
{
- tree nelts = array_type_nelts (type);
- if (error_operand_p (nelts))
+ tree nelts_minus_one = array_type_nelts_minus_one (type);
+ if (error_operand_p (nelts_minus_one))
return attrs;
- if (TREE_CODE (nelts) != INTEGER_CST)
+ if (TREE_CODE (nelts_minus_one) != INTEGER_CST)
{
  /* Each variable VLA bound is represented by the dollar
 sign.  */
  spec += "$";
- tpbnds = tree_cons (NULL_TREE, nelts, tpbnds);
+ tpbnds = tree_cons (NULL_TREE, nelts_minus_one, tpbnds);
}
}
  tpbnds = nreverse (tpbnds);
diff --git a/gcc/c/c-fold.cc b/gcc/c/c-fold.cc
index 57b67c74bd8..9ea174f79c4 100644
--- a/gcc/c/c-fold.cc
+++ b/gcc/c/c-fold.cc
@@ -73,11 +73,12 @@ c_fold_array_ref (tree type, tree ary, tree index)
   unsigned elem_nchars = (TYPE_PRECISION (elem_type)
  / TYPE_PRECISION (char_type_node));
   unsigned len = (unsigned) TREE_STRING_LENGTH (ary) / elem_nchars;
-  tree nelts = array_type_nelts (TREE_TYPE (ary));
+  tree nelts_minus_one = array_type_nelts_minus_one (TREE_TYPE (ary));
   bool dummy1 = true, dummy2 = true;
-  nelts = c_fully_fold_internal (nelts, true, &dummy1, &dummy2, false, false);
+  nelts_minus_one = c_fully_fold_internal (nelts_minus_one, true, &dummy1,
+  &dummy2, false, false);
   unsigned HOST_WIDE_INT i = tree_to_uhwi (index);
-  if (!tree_int_cst_le (index, nelts)
+  if (!tree_int_cst_le (index, nelts_minus_one)
   || i >= len
   || i + elem_nchars > len)
 return NULL_TREE;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 27e24ba70ab..21606701725 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1084,7 +1084,7 @@ pure_scalable

[PATCH v13 3/4] Merge definitions of array_type_nelts_top()

2024-10-02 Thread Alejandro Colomar
There were two identical definitions, and neither of them is available
where it is needed for implementing __lengthof__.  Merge them, and
provide the single definition in gcc/tree.{h,cc}, where it's available
for __lengthof__, which will be added in the following commit.

gcc/ChangeLog:

* tree.h (array_type_nelts_top)
* tree.cc (array_type_nelts_top):
Define function (moved from gcc/cp/).

gcc/cp/ChangeLog:

* cp-tree.h (array_type_nelts_top)
* tree.cc (array_type_nelts_top):
Remove function (move to gcc/).

gcc/rust/ChangeLog:

* backend/rust-tree.h (array_type_nelts_top)
* backend/rust-tree.cc (array_type_nelts_top):
Remove function.

Signed-off-by: Alejandro Colomar 
---
 gcc/cp/cp-tree.h  |  1 -
 gcc/cp/tree.cc| 13 -
 gcc/rust/backend/rust-tree.cc | 13 -
 gcc/rust/backend/rust-tree.h  |  2 --
 gcc/tree.cc   | 13 +
 gcc/tree.h|  1 +
 6 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 2eeb5e3e8b1..6913175c3ce 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8108,7 +8108,6 @@ extern tree build_exception_variant   (tree, tree);
 extern void fixup_deferred_exception_variants   (tree, tree);
 extern tree bind_template_template_parm(tree, tree);
 extern tree array_type_nelts_total (tree);
-extern tree array_type_nelts_top   (tree);
 extern bool array_of_unknown_bound_p   (const_tree);
 extern tree break_out_target_exprs (tree, bool = false);
 extern tree build_ctor_subob_ref   (tree, tree, tree);
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 040136c70ab..7d179491476 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -3079,19 +3079,6 @@ cxx_print_statistics (void)
 depth_reached);
 }
 
-/* Return, as an INTEGER_CST node, the number of elements for TYPE
-   (which is an ARRAY_TYPE).  This counts only elements of the top
-   array.  */
-
-tree
-array_type_nelts_top (tree type)
-{
-  return fold_build2_loc (input_location,
- PLUS_EXPR, sizetype,
- array_type_nelts_minus_one (type),
- size_one_node);
-}
-
 /* Return, as an INTEGER_CST node, the number of elements for TYPE
(which is an ARRAY_TYPE).  This one is a recursive count of all
ARRAY_TYPEs that are clumped together.  */
diff --git a/gcc/rust/backend/rust-tree.cc b/gcc/rust/backend/rust-tree.cc
index 8d32e5203ae..3dc6b076711 100644
--- a/gcc/rust/backend/rust-tree.cc
+++ b/gcc/rust/backend/rust-tree.cc
@@ -859,19 +859,6 @@ is_empty_class (tree type)
   return CLASSTYPE_EMPTY_P (type);
 }
 
-// forked from gcc/cp/tree.cc array_type_nelts_top
-
-/* Return, as an INTEGER_CST node, the number of elements for TYPE
-   (which is an ARRAY_TYPE).  This counts only elements of the top
-   array.  */
-
-tree
-array_type_nelts_top (tree type)
-{
-  return fold_build2_loc (input_location, PLUS_EXPR, sizetype,
- array_type_nelts_minus_one (type), size_one_node);
-}
-
 // forked from gcc/cp/tree.cc builtin_valid_in_constant_expr_p
 
 /* Test whether DECL is a builtin that may appear in a
diff --git a/gcc/rust/backend/rust-tree.h b/gcc/rust/backend/rust-tree.h
index 26c8b653ac6..e597c3ab81d 100644
--- a/gcc/rust/backend/rust-tree.h
+++ b/gcc/rust/backend/rust-tree.h
@@ -2993,8 +2993,6 @@ extern location_t rs_expr_location (const_tree);
 extern int
 is_empty_class (tree type);
 
-extern tree array_type_nelts_top (tree);
-
 extern bool
 is_really_empty_class (tree, bool);
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 7439777f307..d0a7156d982 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -3729,6 +3729,19 @@ array_type_nelts_minus_one (const_tree type)
  ? max
  : fold_build2 (MINUS_EXPR, TREE_TYPE (max), max, min));
 }
+
+/* Return, as an INTEGER_CST node, the number of elements for TYPE
+   (which is an ARRAY_TYPE).  This counts only elements of the top
+   array.  */
+
+tree
+array_type_nelts_top (tree type)
+{
+  return fold_build2_loc (input_location,
+ PLUS_EXPR, sizetype,
+ array_type_nelts_minus_one (type),
+ size_one_node);
+}
 
 /* If arg is static -- a reference to an object in static storage -- then
return the object.  This is not the same as the C meaning of `static'.
diff --git a/gcc/tree.h b/gcc/tree.h
index 4e29544a36c..372f4dd71da 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4922,6 +4922,7 @@ extern tree build_method_type (tree, tree);
 extern tree build_offset_type (tree, tree);
 extern tree build_complex_type (tree, bool named = false);
 extern tree array_type_nelts_minus_one (const_tree);
+extern tree array_type_nelts_top (tree);
 
 extern tree value_member (tree, tree);
 extern tree purpose_member (const_tree, tree);
-- 
2.45.2




[PATCH v13 1/4] contrib/: Add support for Cc: and Link: tags

2024-10-02 Thread Alejandro Colomar
contrib/ChangeLog:

* gcc-changelog/git_commit.py (GitCommit):
Add support for 'Cc: ' and 'Link: ' tags.

Cc: Jason Merrill 
Signed-off-by: Alejandro Colomar 
---
 contrib/gcc-changelog/git_commit.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 87ecb9e1a17..64fb986b74c 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -182,7 +182,8 @@ CO_AUTHORED_BY_PREFIX = 'co-authored-by: '
 
 REVIEW_PREFIXES = ('reviewed-by: ', 'reviewed-on: ', 'signed-off-by: ',
'acked-by: ', 'tested-by: ', 'reported-by: ',
-   'suggested-by: ')
+   'suggested-by: ', 'cc: ')
+LINK_PREFIXES = ('link: ')
 DATE_FORMAT = '%Y-%m-%d'
 
 
@@ -524,6 +525,8 @@ class GitCommit:
 continue
 elif lowered_line.startswith(REVIEW_PREFIXES):
 continue
+elif lowered_line.startswith(LINK_PREFIXES):
+continue
 else:
 m = cherry_pick_regex.search(line)
 if m:
-- 
2.45.2





[committed] libstdc++: Fix -Wlong-long warning in <bits/postypes.h>

2024-10-02 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

For 32-bit targets __INT64_TYPE__ expands to long long, which gives a
pedwarn for C++98 mode, causing:

FAIL: 17_intro/headers/c++1998/all_pedantic_errors.cc  -std=gnu++98 (test for excess errors)
Excess errors:
.../bits/postypes.h:64: error: ISO C++ 1998 does not support 'long long' [-Wlong-long]

libstdc++-v3/ChangeLog:

* include/bits/postypes.h: Fix -Wlong-long warning.
---
 libstdc++-v3/include/bits/postypes.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libstdc++-v3/include/bits/postypes.h b/libstdc++-v3/include/bits/postypes.h
index 7bd973e089b..cf5f30187fd 100644
--- a/libstdc++-v3/include/bits/postypes.h
+++ b/libstdc++-v3/include/bits/postypes.h
@@ -52,6 +52,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // unspecified. The behaviour in this implementation is as noted
   // below.
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wlong-long"
   /**
*  @brief  Type used by fpos, char_traits, and char_traits.
*
@@ -65,6 +67,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #else
   typedef long long streamoff;
 #endif
+#pragma GCC diagnostic pop
 
   /// Integral type for I/O operation counts and buffer sizes.
   typedef ptrdiff_t streamsize; // Signed integral type
-- 
2.46.1
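
The push/ignored/pop pattern used in the patch can be tried outside the
library with a sketch like the following. The names wide_offset, add_offsets
and check_wide_offset are made up for illustration and stand in for the
streamoff typedef; the sketch assumes a compiler that understands the GCC
diagnostic pragmas.

```cpp
#include <cassert>

// Suppress the pedantic C++98 diagnostic only around the one declaration
// that spells 'long long'; pop restores the previous warning state.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wlong-long"
typedef long long wide_offset;  // hypothetical name, stands in for streamoff
#pragma GCC diagnostic pop

// Uses of the typedef after the pop do not spell 'long long' themselves.
inline wide_offset
add_offsets (wide_offset a, wide_offset b)
{
  return a + b;
}

inline void
check_wide_offset ()
{
  // long long is at least 64 bits wide (8 bytes with 8-bit chars).
  assert (sizeof (wide_offset) >= 8);
}
```

Keeping the ignored region this small means other -Wlong-long diagnostics in
the same file are still reported.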



Re: [PATCH] libstdc++: Fix rounding in chrono::parse

2024-10-02 Thread Jonathan Wakely
Nobody had any comments, so I've pushed it to trunk now.

On Thu, 26 Sept 2024 at 13:57, Jonathan Wakely  wrote:
>
> Does this rounding heuristic seem reasonable? I have discussed it with
> Howard and he agreed that rounding "2024-09-22 18:34:56" up to the next
> day, 2024-09-23, could be surprising. But he convinced me that using
> chrono::round was correct in general (certainly better than what we have
> now using chrono::time_point_cast). A period like ratio<1,60> (e.g.
> a 60Hz refresh rate) can't be represented exactly with decimal digits,
> so we want to round from the parsed value to the closest representation
> in the result type. Truncating is strictly inferior when we're dealing
> with such periods.
>
> The hybrid rounding approach here tries to Do The Right Thing.
>
> MSVC seems to avoid this problem by just refusing to parse the date time
> "2024-09-22 18:34:56" into a result type of sys_days, presumably because
> the parsed value cannot be represented in the result type. I can see
> some justification for that, but I think rounding is more useful than
> failing to parse.
>
> Tested x86_64-linux.
>
> -- >8 --
>
> I noticed that chrono::parse was using duration_cast and time_point_cast
> to convert the parsed value to the result. Those functions truncate
> towards zero, which is not generally what you want. Especially for
> negative times before the epoch, where truncating towards zero rounds
> "up" towards the next duration/time_point. Using chrono::round is
> typically better, as that rounds to nearest.
>
> However, while testing the fix I realised that rounding to the nearest
> can give surprising results in some cases. For example if we parse a
> chrono::sys_days using chrono::parse("%F %T", "2024-09-22 18:34:56", tp)
> then we will round up to the next day, i.e. sys_days(2024y/09/23). That
> seems surprising, and I think 2024-09-22 is what most users would
> expect.
>
> This change attempts to provide a hybrid rounding heuristic where we use
> chrono::round for the general case, but when the result has a period
> that is one of minutes, hours, days, weeks, or years then we truncate
> towards negative infinity using chrono::floor. This means that we
> truncate "2024-09-22 18:34:56" to the start of the current
> minute/hour/day/week/year, instead of rounding up to 2024-09-23, or to
> 18:35, or 19:00. For a period of months chrono::round is used, because
> the months duration is defined as a twelfth of a year, which is not
> actually the length of any calendar month. We don't want to truncate to
> a whole number of "months" if that can actually go from e.g. 2023-03-01
> to 2023-01-31, because February is shorter than chrono::months(1).
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/chrono_io.h (__detail::__use_floor): New
> function.
> (__detail::__round): New function.
> (from_stream): Use __detail::__round.
> * testsuite/std/time/clock/file/io.cc: Check for expected
> rounding in parse.
> * testsuite/std/time/clock/gps/io.cc: Likewise.
> ---
>  libstdc++-v3/include/bits/chrono_io.h | 64 +--
>  .../testsuite/std/time/clock/file/io.cc   | 21 +-
>  .../testsuite/std/time/clock/gps/io.cc| 20 ++
>  3 files changed, 97 insertions(+), 8 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
> index 1e34c82b532..362bb5aa9e9 100644
> --- a/libstdc++-v3/include/bits/chrono_io.h
> +++ b/libstdc++-v3/include/bits/chrono_io.h
> @@ -2407,6 +2407,56 @@ namespace __detail
>template
>  using _Parser_t = _Parser>;
>
> +  template
> +consteval bool
> +__use_floor()
> +{
> +  if constexpr (_Duration::period::den == 1)
> +   {
> + switch (_Duration::period::num)
> + {
> +   case minutes::period::num:
> +   case hours::period::num:
> +   case days::period::num:
> +   case weeks::period::num:
> +   case years::period::num:
> + return true;
> + }
> +   }
> +  return false;
> +}
> +
> +  // A "do the right thing" rounding function for duration and time_point
> +  // values extracted by from_stream. When treat_as_floating_point is true
> +  // we don't want to do anything, just a straightforward conversion.
> +  // When the destination type has a period of minutes, hours, days, weeks,
> +  // or years, we use chrono::floor to truncate towards negative infinity.
> +  // This ensures that an extracted timestamp such as 2024-09-05 13:00:00
> +  // will produce 2024-09-05 when rounded to days, rather than rounding up
> +  // to 2024-09-06 (a different day).
> +  // Otherwise, use chrono::round to get the nearest value representable
> +  // in the destination type.
> +  template
> +constexpr auto
> +__round(const _Tp& __t)
> +{
> +  if constexpr (__is_duration_v<_Tp>)
> +   {
> + if constexpr (treat_as_floating_point_v)
> + 
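
The difference between the three conversions discussed in the quoted commit
message can be seen with a small sketch. It is not the patch's
__detail::__round; day_count, truncated, floored, rounded and
check_rounding are names invented here, and day_count stands in for
std::chrono::days so that the code also builds as C++17.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

namespace chr = std::chrono;

// Stand-in for std::chrono::days so the sketch also builds as C++17.
using day_count = chr::duration<long long, std::ratio<86400>>;

// 18:34:56 on the epoch day, and the same wall-clock time one day earlier.
constexpr chr::seconds after_epoch
  = chr::hours (18) + chr::minutes (34) + chr::seconds (56);
constexpr chr::seconds before_epoch = after_epoch - day_count (1);

// duration_cast truncates toward zero.
constexpr long long
truncated (chr::seconds s)
{
  return chr::duration_cast<day_count> (s).count ();
}

// chrono::floor truncates toward negative infinity.
constexpr long long
floored (chr::seconds s)
{
  return chr::floor<day_count> (s).count ();
}

// chrono::round picks the nearest representable value.
constexpr long long
rounded (chr::seconds s)
{
  return chr::round<day_count> (s).count ();
}

inline void
check_rounding ()
{
  assert (rounded (after_epoch) == 1);   // nearest: moves to the next day
  assert (floored (after_epoch) == 0);   // stays on the same day
  assert (truncated (before_epoch) == 0); // toward zero: wrong day
  assert (floored (before_epoch) == -1); // the day the time belongs to
}
```

Only floor keeps both timestamps on the calendar day they fall in, which is
the behaviour the heuristic selects for minutes, hours, days, weeks and years.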

[PATCH] doc: Drop GCC 2.6 ABI change note for H8/h8300-hms

2024-10-02 Thread Gerald Pfeifer
Hi Jeff,

going through doc/install.texi I noticed there is a really old note on
h8300-hms, even predating egcs. :-)  Shall we drop that?

Gerald


gcc:
PR target/69374
* doc/install.texi (Specific) : Drop GCC 2.6
ABI change note.
---
 gcc/doc/install.texi | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index e035061a23e..09559615bbf 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4118,11 +4118,6 @@ This configuration is intended for embedded systems.
 @heading h8300-hms
 Renesas H8/300 series of processors.
 
-The calling convention and structure layout has changed in release 2.6.
-All code must be recompiled.  The calling convention now passes the
-first three arguments in function calls in registers.  Structures are no
-longer a multiple of 2 bytes.
-
 @html
 
 @end html
-- 
2.46.0


[committed v2] libstdc++: Populate std::time_get::get's %c format for C locale

2024-10-02 Thread Jonathan Wakely
The v1 patch included fixes to a new file for the POSIX-2008 locale,
config/locale/ieee_1003.1-2008/time_members.cc, but I haven't actually
pushed the POSIX-2008 locale to trunk yet. So this v2 patch removes the
changes to that file, and if/when I push the POSIX-2008 work this fix
will be in there from day one.

Tested x86_64-linux. Pushed to trunk. This should be backported too.

-- >8 --

We were using the empty string "" for D_T_FMT and ERA_D_T_FMT in the C
locale, instead of "%a %b %e %T %Y" as the C standard requires. Set it
correctly for each locale implementation that defines time_members.cc.

We can also explicitly set the _M_era_xxx pointers to the same values as
the corresponding _M_xxx ones, rather than setting them to point to
identical string literals. This doesn't rely on the compiler merging
string literals, and makes it more explicit that they're the same in the
C locale.
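
For reference, this is what the "%a %b %e %T %Y" format produces. The sketch
below uses plain strftime rather than the time_get/time_put facets touched by
the patch, and format_tm, epoch_tm and check_c_locale_format are helper names
made up for the example. It assumes the program has not called setlocale, so
the C locale is in effect.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Format TM with strftime; yields an empty string if the buffer is too small.
inline std::string
format_tm (const char *fmt, const std::tm &tm)
{
  char buf[64];
  std::size_t n = std::strftime (buf, sizeof buf, fmt, &tm);
  return std::string (buf, n);
}

// 1970-01-01 00:00:00, a Thursday, with the fields filled in by hand so
// the sketch does not depend on the time zone.
inline std::tm
epoch_tm ()
{
  std::tm tm = std::tm ();
  tm.tm_year = 70;  // years since 1900
  tm.tm_mon = 0;    // January
  tm.tm_mday = 1;
  tm.tm_wday = 4;   // Thursday
  return tm;
}

inline void
check_c_locale_format ()
{
  // %e pads the day of the month with a space, hence the two spaces.
  assert (format_tm ("%a %b %e %T %Y", epoch_tm ())
	  == "Thu Jan  1 00:00:00 1970");
}
```

With an empty D_T_FMT, as before the fix, there is no such pattern for %c to
expand to.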

libstdc++-v3/ChangeLog:

* config/locale/dragonfly/time_members.cc
(__timepunct<char>::_M_initialize_timepunc)
(__timepunct<wchar_t>::_M_initialize_timepunc): Set
_M_date_time_format for C locale. Set %Ex formats to the same
values as the %x formats.
* config/locale/generic/time_members.cc: Likewise.
* config/locale/gnu/time_members.cc: Likewise.
* testsuite/22_locale/time_get/get/char/5.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/5.cc: New test.
---
 .../config/locale/dragonfly/time_members.cc   | 16 
 .../config/locale/generic/time_members.cc |  8 ++--
 .../config/locale/gnu/time_members.cc | 16 
 .../22_locale/time_get/get/char/5.cc  | 37 +++
 .../22_locale/time_get/get/wchar_t/5.cc   | 37 +++
 5 files changed, 94 insertions(+), 20 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/22_locale/time_get/get/char/5.cc
 create mode 100644 libstdc++-v3/testsuite/22_locale/time_get/get/wchar_t/5.cc

diff --git a/libstdc++-v3/config/locale/dragonfly/time_members.cc b/libstdc++-v3/config/locale/dragonfly/time_members.cc
index 0c96928135e..069b2ddd26b 100644
--- a/libstdc++-v3/config/locale/dragonfly/time_members.cc
+++ b/libstdc++-v3/config/locale/dragonfly/time_members.cc
@@ -67,11 +67,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_c_locale_timepunct = _S_get_c_locale();
 
  _M_data->_M_date_format = "%m/%d/%y";
- _M_data->_M_date_era_format = "%m/%d/%y";
+ _M_data->_M_date_era_format = _M_data->_M_date_format;
  _M_data->_M_time_format = "%H:%M:%S";
- _M_data->_M_time_era_format = "%H:%M:%S";
- _M_data->_M_date_time_format = "";
- _M_data->_M_date_time_era_format = "";
+ _M_data->_M_time_era_format = _M_data->_M_time_format;
+ _M_data->_M_date_time_format = "%a %b %e %T %Y";
+ _M_data->_M_date_time_era_format = _M_data->_M_date_time_format;
  _M_data->_M_am = "AM";
  _M_data->_M_pm = "PM";
  _M_data->_M_am_pm_format = "%I:%M:%S %p";
@@ -224,11 +224,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_c_locale_timepunct = _S_get_c_locale();
 
  _M_data->_M_date_format = L"%m/%d/%y";
- _M_data->_M_date_era_format = L"%m/%d/%y";
+ _M_data->_M_date_era_format = _M_data->_M_date_format;
  _M_data->_M_time_format = L"%H:%M:%S";
- _M_data->_M_time_era_format = L"%H:%M:%S";
- _M_data->_M_date_time_format = L"";
- _M_data->_M_date_time_era_format = L"";
+ _M_data->_M_time_era_format = _M_data->_M_time_format;
+ _M_data->_M_date_time_format = L"%a %b %e %T %Y";
+ _M_data->_M_date_time_era_format = _M_data->_M_date_time_format;
  _M_data->_M_am = L"AM";
  _M_data->_M_pm = L"PM";
  _M_data->_M_am_pm_format = L"%I:%M:%S %p";
diff --git a/libstdc++-v3/config/locale/generic/time_members.cc b/libstdc++-v3/config/locale/generic/time_members.cc
index 68395820fef..6619f0ca881 100644
--- a/libstdc++-v3/config/locale/generic/time_members.cc
+++ b/libstdc++-v3/config/locale/generic/time_members.cc
@@ -65,11 +65,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_data = new __timepunct_cache;
 
   _M_data->_M_date_format = "%m/%d/%y";
-  _M_data->_M_date_era_format = "%m/%d/%y";
+  _M_data->_M_date_era_format = _M_data->_M_date_format;
   _M_data->_M_time_format = "%H:%M:%S";
-  _M_data->_M_time_era_format = "%H:%M:%S";
-  _M_data->_M_date_time_format = "";
-  _M_data->_M_date_time_era_format = "";
+  _M_data->_M_time_era_format = _M_data->_M_time_format;
+  _M_data->_M_date_time_format = "%a %b %e %T %Y";
+  _M_data->_M_date_time_era_format = _M_data->_M_date_time_format;
   _M_data->_M_am = "AM";
   _M_data->_M_pm = "PM";
   _M_data->_M_am_pm_format = "%I:%M:%S %p";
diff --git a/libstdc++-v3/config/locale/gnu/time_members.cc b/libstdc++-v3/config/locale/gnu/time_members.cc
index 1e3b87488fa..88c8ab70080 100644
--- a/libstdc++-v3/config/locale

[PATCH] libstdc++: Simplify std::aligned_storage and fix for versioned namespace [PR61458]

2024-10-02 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

This simplifies the implementation of std::aligned_storage. For the
unstable ABI it also fixes the bug where its size is too large when the
default alignment is used. We can't fix that for the stable ABI though,
so just add a comment about the bug.
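
The Doxygen comment in this patch suggests replacing uses of the deprecated type with an array of `std::byte` declared with `alignas`. A minimal sketch of that replacement, assuming C++17 or later, is below; `Widget`, `Slot` and `slot_demo` are made-up names for illustration and do not appear in libstdc++ or in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical payload type, used only for this illustration.
struct Widget { double x; int id; };

// Raw storage with explicit size and alignment, standing in for
// std::aligned_storage<sizeof(Widget), alignof(Widget)>::type.
struct Slot
{
  alignas(Widget) std::byte buf[sizeof(Widget)];

  // Start the lifetime of a Widget inside the buffer.
  Widget* construct(double x, int id)
  { return ::new (static_cast<void*>(buf)) Widget{x, id}; }
};

static_assert(alignof(Slot) == alignof(Widget), "buffer is suitably aligned");
static_assert(sizeof(Slot) >= sizeof(Widget), "buffer is large enough");

// Small self-check showing the intended use.
inline void slot_demo()
{
  Slot s;
  Widget* w = s.construct(1.5, 7);
  assert(w->id == 7 && w->x == 1.5);
  // Widget is trivially destructible, so no explicit destructor call is needed.
}
```

Spelling the alignment explicitly sidesteps the default-alignment question entirely, which is why it does not depend on which ABI variant of the library is in use.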

libstdc++-v3/ChangeLog:

PR libstdc++/61458
* doc/doxygen/user.cfg.in (GENERATE_BUGLIST): Set to NO.
* include/std/type_traits (__aligned_storage_msa): Remove.
(__aligned_storage_max_align_t): New struct.
(__aligned_storage_default_alignment): New function.
(aligned_storage): Use __aligned_storage_default_alignment for
default alignment. Replace union with a struct containing an
aligned buffer. Improve Doxygen comment.
(aligned_storage_t): Use __aligned_storage_default_alignment for
default alignment.
---
 libstdc++-v3/doc/doxygen/user.cfg.in |  2 +-
 libstdc++-v3/include/std/type_traits | 83 
 2 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in b/libstdc++-v3/doc/doxygen/user.cfg.in
index 8fe337adf75..ae50f6dd0c7 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -681,7 +681,7 @@ GENERATE_TESTLIST  = NO
 # list. This list is created by putting \bug commands in the documentation.
 # The default value is: YES.
 
-GENERATE_BUGLIST   = YES
+GENERATE_BUGLIST   = NO
 
 # The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or disable (NO)
 # the deprecated list. This list is created by putting \deprecated commands in
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 6e6778078dc..28e403460a4 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2234,39 +2234,74 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 using add_pointer_t = typename add_pointer<_Tp>::type;
 #endif
 
-  template
-struct __aligned_storage_msa
-{
-  union __type
-  {
-   unsigned char __data[_Len];
-   struct __attribute__((__aligned__)) { } __align;
-  };
-};
+  /// @cond undocumented
+
+  // Aligned to maximum fundamental alignment
+  struct __attribute__((__aligned__)) __aligned_storage_max_align_t
+  { };
+
+  constexpr size_t
+  __aligned_storage_default_alignment([[__maybe_unused__]] size_t __len)
+  {
+#if _GLIBCXX_INLINE_VERSION
+using _Max_align
+  = integral_constant;
+
+return __len > (_Max_align::value / 2)
+? _Max_align::value
+# if _GLIBCXX_USE_BUILTIN_TRAIT(__builtin_clzg)
+: 1 << (__SIZE_WIDTH__ - __builtin_clzg(__len - 1u));
+# else
+: 1 << (__LLONG_WIDTH__ - __builtin_clzll(__len - 1ull));
+# endif
+#else
+// Returning a fixed value is incorrect, but kept for ABI compatibility.
+// XXX GLIBCXX_ABI Deprecated
+return alignof(__aligned_storage_max_align_t);
+#endif
+  }
+  /// @endcond
 
   /**
-   *  @brief Alignment type.
+   *  @brief Aligned storage
*
-   *  The value of _Align is a default-alignment which shall be the
-   *  most stringent alignment requirement for any C++ object type
-   *  whose size is no greater than _Len (3.9). The member typedef
-   *  type shall be a POD type suitable for use as uninitialized
-   *  storage for any object whose size is at most _Len and whose
-   *  alignment is a divisor of _Align.
+   *  The member typedef `type` is a POD type suitable for use as
+   *  uninitialized storage for any object whose size is at most `_Len`
+   *  and whose alignment is a divisor of `_Align`.
+   *
+   *  It is important to use the nested `type` as uninitialized storage,
+   *  not the `std::aligned_storage` type itself which is an empty class
+   *  with 1-byte alignment. So this is correct:
+   *
+   *  `typename std::aligned_storage::type m_xobj;`
+   *
+   *  This is wrong:
+   *
+   *  `std::aligned_storage m_xobj;`
+   *
+   *  In C++14 and later `std::aligned_storage_t`
+   *  can be used to refer to the `type` member typedef.
+   *
+   *  The default value of _Align is supposed to be the most stringent
+   *  fundamental alignment requirement for any C++ object type whose size
+   *  is no greater than `_Len` (see [basic.align] in the C++ standard).
+   *
+   *  @bug In this implementation the default value for _Align is always the
+   *  maximum fundamental alignment, i.e. `alignof(max_align_t)`, which is
+   *  incorrect. It should be an alignment value no greater than `_Len`.
*
*  @deprecated Deprecated in C++23. Uses can be replaced by an
-   *  array std::byte[_Len] declared with alignas(_Align).
+   *  array `std::byte[_Len]` declared with `alignas(_Align)`.
   */
-  template::__type)>
+  template
 struct
 _GLIBCXX23_DEPRECATED
 aligned_storage
 {
-  union type
+  struct type
   {
-   unsigned char __data[_Len];
-   struct __attribute__((__aligned__((_Align { } __align;

[PATCH] un-XFAIL gcc.dg/vect/vect-double-reduc-5.c

2024-10-02 Thread Richard Biener
The testcase now passes, we can handle double reductions with multiple
types fine.
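
For readers unfamiliar with the term, a "double reduction with multiple types" has roughly the shape sketched below: one accumulator is carried through both loops of an inner nest, and the element type is narrower than the accumulator. This is a hedged illustration only, not the contents of vect-double-reduc-5.c; the names `double_reduction`, `double_reduction_demo` and the constant `K` are made up, and nothing here claims that GCC vectorizes this exact loop.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t K = 8;

// For each k, `sum` is reduced across both the j and the i loop (a double
// reduction), and the short elements are widened to int (multiple types).
inline void double_reduction(const short (&in)[2 * K][K], int (&out)[K])
{
  for (std::size_t k = 0; k < K; ++k)
    {
      int sum = 0;
      for (std::size_t j = 0; j < K; ++j)
        for (std::size_t i = 0; i < K; ++i)
          sum += in[i + k][j];
      out[k] = sum;
    }
}

// Small self-check: with every element equal to 1, each result is K * K.
inline void double_reduction_demo()
{
  short in[2 * K][K];
  for (auto& row : in)
    for (short& e : row)
      e = 1;
  int out[K] = {};
  double_reduction(in, out);
  for (int v : out)
    assert(v == static_cast<int>(K * K));
}
```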

Pushed.

* gcc.dg/vect/vect-double-reduc-5.c: Un-XFAIL everywhere.
---
 gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
index b990405745e..a40aa304740 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
@@ -51,7 +51,4 @@ int main ()
   return 0;
 }
 
-/* Vectorization of loops with multiple types and double reduction is not 
-   supported yet.  */   
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
-  
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
-- 
2.43.0


Re: [RFC PATCH] Allow limited extended asm at toplevel

2024-10-02 Thread Uros Bizjak
On Wed, Oct 2, 2024 at 1:03 PM Jakub Jelinek  wrote:
>
> Hi!
>
> In the Cauldron IPA/LTO BoF we've discussed toplevel asms and it was
> discussed it would be nice to tell the compiler something about what
> the toplevel asm does.  Sure, I'm aware the kernel people said they
> aren't willing to use something like that, but perhaps other projects
> do.  And for kernel perhaps we should add some new option which allows
> some dumb parsing of the toplevel asms and gather something from that
> parsing.
>
> The following patch is just a small step towards that, namely, allow
> some subset of extended inline asm outside of functions.
> The patch is unfinished, LTO streaming (out/in) of the ASM_EXPRs isn't
> implemented, nor any cgraph/varpool changes to find out references etc.
>
> The patch allows something like:
>
> int a[2], b;
> enum { E1, E2, E3, E4, E5 };
> struct S { int a; char b; long long c; };
> asm (".section blah; .quad %P0, %P1, %P2, %P3, %P4; .previous"
>  : : "m" (a), "m" (b), "i" (42), "i" (E4), "i" (sizeof (struct S)));
>
> Even for non-LTO, that could be useful e.g. for getting enumerators from
> C/C++ as integers into the toplevel asm, or sizeof/offsetof etc.
>
> The restrictions I've implemented are:
> 1) asm qualifiers still aren't allowed, so asm goto or asm inline can't be
>specified at toplevel, asm volatile has the volatile ignored for C++ with
>a warning and is an error in C like before
> 2) I see good use for mainly input operands, output maybe to make it clear
>that the inline asm may write some memory, I don't see a good use for
>clobbers, so the patch doesn't allow those (and of course labels because
>asm goto can't be specified)
> 3) the patch allows only constraints which don't allow registers, so
>typically "m" or "i" or other memory or immediate constraints; for
>memory, it requires that the operand is addressable and its address
>could be used in static var initializer (so that no code actually
>needs to be emitted for it), for others that they are constants usable
>in the static var initializers
> 4) the patch disallows + (there is no reload of the operands, so I don't
>see benefits of tying some operands together), nor % (who cares if
>something is commutative in this case), or & (again, no code is emitted
>around the asm), nor the 0-9 constraints
>
> Right now there is no way to tell the compiler that the inline asm defines
> some symbol, I wonder if we can find some unused constraint letter or
> sequence or other syntax to denote that.  Note, I think we want to be
> able to specify that an inline asm defines a function or variable and
> be able to provide the type etc. thereof.  So
> extern void foo (void);
> extern int var;
> asm ("%P0: ret" : : "defines" (foo));
> asm ("%P0: .quad 0" : : "defines" (var));
> where the exact "defines" part is TBD.
>
> Another question is whether all targets have something like x86 P print
> modifier which doesn't add any stuff around the printed expressions
> (perhaps there are targets which don't do that just for %0 etc.), or
> whether we want to add something that will be usable on all targets.

%P is very x86 specific, perhaps you should use %c instead?

The %c modifier is described in:

6.48.2.8 Generic Operand Modifiers
..

The following table shows the modifiers supported by all targets and
their effects:

Modifier  Description                                 Example
--------  ------------------------------------------  -------
‘c’       Require a constant operand and print the    ‘%c0’
          constant expression with no punctuation.
...

E.g.:

void bar (void);
void foo (void)
{
  asm ("%c0" :  : "i"(bar));
}

generates:

#APP
# 5 "c.c" 1
   bar
# 0 "" 2
#NO_APP

Uros.


[PATCH] Adjust expectation for gcc.dg/vect/slp-19c.c

2024-10-02 Thread Richard Biener
We can now vectorize the first loop with SLP when using V2SImode
vectors since then we can handle the non-power-of-two interleaving.
We can also SLP the second loop reliably now after adding induction
support for VLA vectors.

Pushed.

* gcc.dg/vect/slp-19c.c: Adjust expectation.
---
 gcc/testsuite/gcc.dg/vect/slp-19c.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-19c.c b/gcc/testsuite/gcc.dg/vect/slp-19c.c
index 188ab37a0b6..588c171dd83 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19c.c
@@ -105,5 +105,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! vect64 } } } } */
+/* The unsupported interleaving works fine with V2SImode vectors given we
+   can always combine that from two vectors.  */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect64 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect64 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target vect64 } } } */
-- 
2.43.0


[PATCH] Adjust gcc.dg/vect/vect-double-reduc-5.c

2024-10-02 Thread Richard Biener
The testcase XPASSes now and should do so everywhere I think.

Pushed.

* gcc.dg/vect/vect-double-reduc-5.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
index b990405745e..a40aa304740 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
@@ -51,7 +51,4 @@ int main ()
   return 0;
 }
 
-/* Vectorization of loops with multiple types and double reduction is not 
-   supported yet.  */   
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
-  
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
-- 
2.43.0


Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for vector reductions

2024-10-02 Thread Richard Sandiford
Tamar Christina  writes:
> Hi Jennifer,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, October 1, 2024 12:20 PM
>> To: Jennifer Schmitz 
>> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov 
>> Subject: Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for vector
>> reductions
>> 
>> Jennifer Schmitz  writes:
>> > This patch implements the optabs reduc_and_scal_,
>> > reduc_ior_scal_, and reduc_xor_scal_ for Advanced SIMD
>> > integers for TARGET_SVE in order to use the SVE instructions ANDV, ORV, and
>> > EORV for fixed-width bitwise reductions.
>> > For example, the test case
>> >
>> > int32_t foo (int32_t *a)
>> > {
>> >   int32_t b = -1;
>> >   for (int i = 0; i < 4; ++i)
>> > b &= a[i];
>> >   return b;
>> > }
>> >
>> > was previously compiled to
>> > (-O2 -ftree-vectorize --param aarch64-autovec-preference=asimd-only):
>> > foo:
>> > ldp w2, w1, [x0]
>> > ldp w3, w0, [x0, 8]
>> > and w1, w1, w3
>> > and w0, w0, w2
>> > and w0, w1, w0
>> > ret
>> >
>> > With patch, it is compiled to:
>> > foo:
>> > ldr q31, [x0]
> >> > ptrue p7.b, all
> >> > andv s31, p7, z31.s
> >> > fmov w0, s3
> >> > ret
>> >
>> > Test cases were added to check the produced assembly for use of SVE
>> > instructions.
>> 
>> I would imagine that in this particular case, the scalar version is
>> better.  But I agree it's a useful feature for other cases.
>> 
>
> Yeah, I'm concerned because ANDV and other reductions are extremely expensive.
> But assuming the reductions are done outside of a loop then it should be ok,
> though.
>
> The issue is that the reduction latency grows with VL, so e.g. compare the
> latencies and throughput for Neoverse V1 and Neoverse V2.  So I think we want
> to gate this on VL128.
>
> As an aside, is the sequence correct?  With ORR reduction ptrue makes sense,
> but for VL > 128 ptrue doesn't work as the top bits would be zero.  So an ANDV
> on zero-valued lanes would result in zero.

Argh!  Thanks for spotting that.  I'm kicking myself for missing it :(
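
The problem can be modelled in plain C++ without any SVE hardware. The sketch below is only a model under stated assumptions: a 256-bit register is represented as 8 lanes of 32 bits, the Advanced SIMD value occupies the low 4 lanes, the upper lanes read as zero, and a predicate is simplified to "the first N lanes are active". The names `Vec`, `and_reduce`, `or_reduce` and `reduction_demo` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model: 8 x 32-bit lanes, of which only the low 4 hold live data.
constexpr std::size_t kLanes = 8;
constexpr std::size_t kLiveLanes = 4;
using Vec = std::array<std::uint32_t, kLanes>;

// AND-reduce the first `active` lanes; skipped lanes act like all-ones.
inline std::uint32_t and_reduce(const Vec& v, std::size_t active)
{
  std::uint32_t r = ~std::uint32_t{0};
  for (std::size_t i = 0; i < active; ++i)
    r &= v[i];
  return r;
}

// OR-reduce the first `active` lanes; skipped lanes act like zero.
inline std::uint32_t or_reduce(const Vec& v, std::size_t active)
{
  std::uint32_t r = 0;
  for (std::size_t i = 0; i < active; ++i)
    r |= v[i];
  return r;
}

inline void reduction_demo()
{
  // Upper four lanes are value-initialized to zero.
  const Vec v{0xF0F0u, 0xFF00u, 0xFFF0u, 0xF000u};

  // All-true predicate: the zero upper lanes force the AND result to zero.
  assert(and_reduce(v, kLanes) == 0u);
  // Predicate limited to the live lanes: the expected result.
  assert(and_reduce(v, kLiveLanes) == 0xF000u);
  // OR is unaffected, because zero is its identity value.
  assert(or_reduce(v, kLanes) == or_reduce(v, kLiveLanes));
}
```

In this model the all-true predicate is harmless only for operations whose identity value is zero, which matches the observation about ORR above.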

> You'd want to predicate the ANDV with the size of the vector being reduced.
> The same is true for SMIN and SMAX.
>
> I do wonder whether we need to split the pattern into two, where w->w uses
> the SVE instructions but w->r uses Adv SIMD.
>
> In the case of w->r as the example above
>
> ext v1.16b, v0.16b, v0.16b, #8
> and v0.8b, v0.8b, v1.8b
> fmov x8, d0
> lsr x9, x8, #32
> and w0, w8, w9
>
> would beat the ADDV on pretty much every uarch.
>
> But I'll leave it up to the maintainers.

Also a good point.  And since these are integer reductions, an r
result is more probable than a w result.  w would typically only
be used if the result is stored directly to memory.

At which point, the question (which you might have been implying)
is whether it's worth doing this at all, given the limited cases
for which it's beneficial, and the complication that's needed to
(a) detect those cases and (b) make them work.

Thanks,
Richard

>
> Thanks for the patch though!
>
> Cheers,
> Tamar
>
>> However, a natural follow-on would be to use SMAXV and UMAXV for V2DI.
>> Taking that into account:
>> 
>> > [...]
> >> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
>> > index bfa28849adf..0d9e5cebef0 100644
>> > --- a/gcc/config/aarch64/aarch64-sve.md
>> > +++ b/gcc/config/aarch64/aarch64-sve.md
>> > @@ -8927,6 +8927,28 @@
>> >"addv\t%d0, %1, %2."
>> >  )
>> >
>> > +;; Unpredicated logical integer reductions for Advanced SIMD modes.
>> > +(define_expand "reduc__scal_"
>> > +  [(set (match_operand: 0 "register_operand")
>> > +  (unspec: [(match_dup 2)
>> > + (match_operand:VQ_I 1 "register_operand")]
>> > +SVE_INT_REDUCTION_LOGICAL))]
>> > +  "TARGET_SVE"
>> > +  {
>> > +operands[2] = aarch64_ptrue_reg (mode);
>> > +  }
>> > +)
>> > +
>> > +;; Predicated logical integer reductions for Advanced SIMD modes.
>> > +(define_insn "*aarch64_pred_reduc__"
>> > +  [(set (match_operand: 0 "register_operand" "=w")
>> > +  (unspec: [(match_operand: 1 "register_operand" "Upl")
>> > + (match_operand:VQ_I 2 "register_operand" "w")]
>> > +SVE_INT_REDUCTION_LOGICAL))]
>> > +  "TARGET_SVE"
>> > +  "\t%0, %1, %Z2."
>> > +)
>> > +
>> 
>> ...I think we should avoid adding more patterns, and instead extend
>> the existing:
>> 
>> (define_expand "reduc__scal_"
>>   [(set (match_operand: 0 "register_operand")
>>  (unspec: [(match_dup 2)
>> (match_operand:SVE_FULL_I 1 "register_operand")]
>>SVE_INT_REDUCTION))]
>>   "TARGET_SVE"
>>   {
>> operands[2] = aarch64_ptrue_reg (mode);
>>   }
>> )
>> 
>> to all Advanced SIMD integer modes except V1DI.  This would involve
>> adding a new mode iterator along the same lines as SVE_FULL_SDI_SIMD
>> (SV

[PATCH] Adjust gcc.dg/vect/slp-12a.c

2024-10-02 Thread Richard Biener
We can now SLP the loop.  There's PR116583 tracking that this still
fails for VLA vectors when load-lanes doesn't support a group of
size 8.  We can't express this right now so the testcase keeps
FAILing for aarch64 with SVE (but passes now for riscv).

Pushed.

* gcc.dg/vect/slp-12a.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/slp-12a.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index fedf27b69d2..c526ea07c7a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -80,5 +80,4 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided8 && vect_int_mult } } } } */
-- 
2.43.0


Re: [PATCH v12 0/4] c: Add __nelementsof__ operator

2024-10-02 Thread Alejandro Colomar
Hi!

On Sat, Aug 31, 2024 at 04:56:28PM GMT, Alejandro Colomar wrote:
> Hi!
> 
> v12 changes:
> 
> -  Fix typo in changelog entry.
> 
> For ISO C2y, I'm proposing either nelementsof() or a contracted version
> of that name.  However, since in GCC we want an uglified name that
> already takes four characters for the __*__, I think this long name
> makes sense.  See also:
> 
> 
> The WG14 discussion seems to have settled, and while the exact name
> isn't yet clear, there seems to be rough consensus on something derived
> from "number of elements of" (with some votes for "length", but not so
> many), and the rest of the properties of the operator don't seem to be
> questioned.
> 
> Martin, Joseph, can you please review and merge?  Thanks!

WG14 (the C standard committee) voted on this feature yesterday.  A few
things to note:

-  The semantics are accepted exactly as implemented here.

-  There was divided consensus on the question "Should the name derive
   from 'number of elements' or 'count' (5 votes), or from something else,
   including mainly 'length' (8 votes)?", with some abstentions.

   After that poll, there was another to choose between 'length' or
   something else, and there was an overwhelming majority to prefer
   'length' over anything else.  However, that still leaves 'length' vs
   'number of elements'/'count' with only a small difference in votes,
   which I'm not entirely happy with.

I will resend this patch with only renaming the operator to lengthof.
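
As a reminder of what the operator computes: it yields the number of elements of an array operand. A rough C++ analogue for arrays of known bound is sketched below. The helper name `length_of` is made up for this illustration and is not part of the patch, which adds a C operator; judging by the nelementsof-vla.c test in the diffstat, the operator is also meant to cover VLAs, which this sketch does not.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: element count of a built-in array of known bound.
// Unlike sizeof(a) / sizeof(a[0]), passing a pointer fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t length_of(const T (&)[N]) noexcept
{
  return N;
}

inline void length_of_demo()
{
  int a[7] = {};
  int m[3][5] = {};

  assert(length_of(a) == 7);
  // For a multidimensional array, only the outermost dimension is counted.
  assert(length_of(m) == 3);
  assert(length_of(m[0]) == 5);
}
```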

Have a lovely day!
Alex

> 
> Have a lovely day!
> Alex
> 
> Alejandro Colomar (4):
>   contrib/: Add support for Cc: and Link: tags
>   gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
>   Merge definitions of array_type_nelts_top()
>   c: Add __nelementsof__ operator
> 
>  contrib/gcc-changelog/git_commit.py|   5 +-
>  gcc/c-family/c-common.cc   |  26 
>  gcc/c-family/c-common.def  |   3 +
>  gcc/c-family/c-common.h|   2 +
>  gcc/c/c-decl.cc|  32 +++--
>  gcc/c/c-fold.cc|   7 +-
>  gcc/c/c-parser.cc  |  62 +++--
>  gcc/c/c-tree.h |   4 +
>  gcc/c/c-typeck.cc  | 118 +++-
>  gcc/config/aarch64/aarch64.cc  |   2 +-
>  gcc/config/i386/i386.cc|   2 +-
>  gcc/cp/cp-tree.h   |   1 -
>  gcc/cp/decl.cc |   2 +-
>  gcc/cp/init.cc |   8 +-
>  gcc/cp/lambda.cc   |   3 +-
>  gcc/cp/operators.def   |   1 +
>  gcc/cp/tree.cc |  13 --
>  gcc/doc/extend.texi|  30 +
>  gcc/expr.cc|   8 +-
>  gcc/fortran/trans-array.cc |   2 +-
>  gcc/fortran/trans-openmp.cc|   4 +-
>  gcc/rust/backend/rust-tree.cc  |  13 --
>  gcc/rust/backend/rust-tree.h   |   2 -
>  gcc/target.h   |   3 +
>  gcc/testsuite/gcc.dg/nelementsof-compile.c | 115 
>  gcc/testsuite/gcc.dg/nelementsof-vla.c |  46 +++
>  gcc/testsuite/gcc.dg/nelementsof.c | 150 +
>  gcc/tree.cc|  17 ++-
>  gcc/tree.h |   3 +-
>  29 files changed, 604 insertions(+), 80 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/nelementsof-compile.c
>  create mode 100644 gcc/testsuite/gcc.dg/nelementsof-vla.c
>  create mode 100644 gcc/testsuite/gcc.dg/nelementsof.c
> 
> Range-diff against v11:
> 1:  2e851b8f8d2 ! 1:  d7fca49888a contrib/: Add support for Cc: and Link: tags
> @@ Commit message
>  
>  contrib/ChangeLog:
>  
> -* gcc-changelog/git_commit.py: (GitCommit):
> +* gcc-changelog/git_commit.py (GitCommit):
>  Add support for 'Cc: ' and 'Link: ' tags.
>  
>  Cc: Jason Merrill 
> 2:  d582d12adb8 = 2:  e65245ac294 gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
> 3:  34d14beb7da = 3:  03de2d67bb1 Merge definitions of array_type_nelts_top()
> 4:  49b8d51db4a = 4:  4373c48205d c: Add __nelementsof__ operator
> 
> base-commit: 9cbcf8d1de159e6113fafb5dc2feb4a7e467a302
> prerequisite-patch-id: 3bb58e302e54b35e987452de2e3cb6da7f6917ce
> prerequisite-patch-id: 090612df745c5ed878a75175606ce1deabf61cb0
> prerequisite-patch-id: c3a80d94c326f7402bdf46049c434e809a9f376e
> prerequisite-patch-id: 883007acf6a47f51128798dc952895854b40e1ff
> prerequisite-patch-id: 93e211483ce2b939ce5e549de78e9ac4606bb114
> prerequisite-patch-id: 008f55a1dc4b0a2551e4682b3319a3b0df86c72e
> prerequisite-patch-id: be6ecc95f5fcf2fe9ac44263e6eae8a69068ee53
> prerequisite-patch-id: e64a9e2

[PATCH v3] RISC-V: Implement TARGET_CAN_INLINE_P

2024-10-02 Thread Yangyu Chen
Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
ISA. As a result, certain functions cannot be optimized with inlining
when specific options, such as __attribute__((target("arch=+v"))), are used.
This can lead to potential performance issues when building
retargetable binaries for RISC-V.

To address this, I have implemented the riscv_can_inline_p function.
This addition enables inlining when the callee either has no special
options or when some options match, while also ensuring that the
callee's ISA is a subset of the caller's. I also check some other
options when there is no always_inline set.
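
The core of the subset check can be sketched with plain bit masks. This is a simplified illustration only: the real patch walks riscv_ext_flag_table across several option variables, whereas here a single 32-bit word is assumed, and the `EXT_*` constants and the function name `ext_is_subset` are hypothetical stand-ins rather than GCC's MASK_* values or riscv_ext_is_subset itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical extension bits, for illustration only.
constexpr std::uint32_t EXT_M = 1u << 0;
constexpr std::uint32_t EXT_A = 1u << 1;
constexpr std::uint32_t EXT_C = 1u << 2;
constexpr std::uint32_t EXT_V = 1u << 3;

// True when every extension the callee relies on is also enabled for the
// caller, i.e. the callee needs nothing that the caller lacks.
constexpr bool ext_is_subset(std::uint32_t caller, std::uint32_t callee)
{
  return (callee & ~caller) == 0;
}

// A callee built for fewer extensions can be inlined into a richer caller...
static_assert(ext_is_subset(EXT_M | EXT_A | EXT_V, EXT_M | EXT_A), "subset");
// ...but a vector callee cannot be inlined into a caller without "v".
static_assert(!ext_is_subset(EXT_M | EXT_A, EXT_M | EXT_V), "not a subset");
// A callee with no extensions at all is trivially a subset.
static_assert(ext_is_subset(EXT_C, 0u), "empty set");
```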

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (cl_opt_var_ref_t): Add
cl_opt_var_ref_t pointer to member of cl_target_option.
(struct riscv_ext_flag_table_t): Add new cl_opt_var_ref_t field.
(RISCV_EXT_FLAG_ENTRY): New macro to simplify the definition of
riscv_ext_flag_table.
(riscv_ext_is_subset): New function to check if the callee's ISA
is a subset of the caller's.
(riscv_x_target_flags_isa_mask): New function to get the mask of
ISA extension in x_target_flags of gcc_options.
* config/riscv/riscv-subset.h (riscv_ext_is_subset): Declare
riscv_ext_is_subset function.
(riscv_x_target_flags_isa_mask): Declare
riscv_x_target_flags_isa_mask function.
* config/riscv/riscv.cc (riscv_can_inline_p): New function.
(TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.

Signed-off-by: Yangyu Chen 
---
 gcc/common/config/riscv/riscv-common.cc | 372 +---
 gcc/config/riscv/riscv-subset.h |   3 +
 gcc/config/riscv/riscv.cc   |  66 +
 3 files changed, 276 insertions(+), 165 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index bd42fd01532..33b19752b15 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1567,191 +1567,196 @@ riscv_arch_str (bool version_p)
 return std::string();
 }
 
-/* Type for pointer to member of gcc_options.  */
+/* Type for pointer to member of gcc_options and cl_target_option.  */
 typedef int (gcc_options::*opt_var_ref_t);
+typedef int (cl_target_option::*cl_opt_var_ref_t);
 
 /* Types for recording extension to internal flag.  */
 struct riscv_ext_flag_table_t {
   const char *ext;
   opt_var_ref_t var_ref;
+  cl_opt_var_ref_t cl_var_ref;
   int mask;
 };
 
+#define RISCV_EXT_FLAG_ENTRY(NAME, VAR, MASK) \
+  {NAME, &gcc_options::VAR, &cl_target_option::VAR, MASK}
+
 /* Mapping table between extension to internal flag.  */
 static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
 {
-  {"e", &gcc_options::x_target_flags, MASK_RVE},
-  {"m", &gcc_options::x_target_flags, MASK_MUL},
-  {"a", &gcc_options::x_target_flags, MASK_ATOMIC},
-  {"f", &gcc_options::x_target_flags, MASK_HARD_FLOAT},
-  {"d", &gcc_options::x_target_flags, MASK_DOUBLE_FLOAT},
-  {"c", &gcc_options::x_target_flags, MASK_RVC},
-  {"v", &gcc_options::x_target_flags, MASK_FULL_V},
-  {"v", &gcc_options::x_target_flags, MASK_VECTOR},
-
-  {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
-  {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
-  {"zicond",   &gcc_options::x_riscv_zi_subext, MASK_ZICOND},
-
-  {"za64rs",  &gcc_options::x_riscv_za_subext, MASK_ZA64RS},
-  {"za128rs", &gcc_options::x_riscv_za_subext, MASK_ZA128RS},
-  {"zawrs",   &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
-  {"zaamo",   &gcc_options::x_riscv_za_subext, MASK_ZAAMO},
-  {"zalrsc",  &gcc_options::x_riscv_za_subext, MASK_ZALRSC},
-  {"zabha",   &gcc_options::x_riscv_za_subext, MASK_ZABHA},
-  {"zacas",   &gcc_options::x_riscv_za_subext, MASK_ZACAS},
-
-  {"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
-  {"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
-  {"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
-  {"zbs",&gcc_options::x_riscv_zb_subext, MASK_ZBS},
-
-  {"zfinx",&gcc_options::x_riscv_zinx_subext, MASK_ZFINX},
-  {"zdinx",&gcc_options::x_riscv_zinx_subext, MASK_ZDINX},
-  {"zhinx",&gcc_options::x_riscv_zinx_subext, MASK_ZHINX},
-  {"zhinxmin", &gcc_options::x_riscv_zinx_subext, MASK_ZHINXMIN},
-
-  {"zbkb",   &gcc_options::x_riscv_zk_subext, MASK_ZBKB},
-  {"zbkc",   &gcc_options::x_riscv_zk_subext, MASK_ZBKC},
-  {"zbkx",   &gcc_options::x_riscv_zk_subext, MASK_ZBKX},
-  {"zknd",   &gcc_options::x_riscv_zk_subext, MASK_ZKND},
-  {"zkne",   &gcc_options::x_riscv_zk_subext, MASK_ZKNE},
-  {"zknh",   &gcc_options::x_riscv_zk_subext, MASK_ZKNH},
-  {"zkr",&gcc_options::x_riscv_zk_subext, MASK_ZKR},
-  {"zksed",  &gcc_options::x_riscv_zk_subext, MASK_ZKSED},
-  {"zksh",   &gcc_options::x_riscv_zk_subext, MASK_ZKSH},
-  {"zkt",&gcc_options::x_riscv_zk_subext, MASK_ZKT},
-
-  {"zihintntl", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTNTL},
-  {"zihintpause", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTP

Re: [PATCH] phiopt: Fix VCE moving by rewriting it into cast [PR116098]

2024-10-02 Thread Richard Biener
On Wed, Oct 2, 2024 at 1:11 AM Andrew Pinski  wrote:
>
> Phiopt match_and_simplify might move a well defined VCE assign statement
> from being conditional to being unconditional; that VCE might no longer
> be defined. It will need a rewrite into a cast instead.
>
> This adds the rewriting code to move_stmt for the VCE case.

Indeed.

> This is enough to fix the issue at hand. It should also be using
> rewrite_to_defined_overflow but first I need to move the check to see a
> rewrite is needed into its own function and that is causing issues (see
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663938.html).
> Plus this version is easiest to backport.

OK.

Thanks,
Richard.

> Bootstrapped and tested on x86_64-linux-gnu.
>
> PR tree-optimization/116098
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (move_stmt): Rewrite VCEs from integer to integer
> types to cast.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/torture/pr116098-2.c: New test.
> * g++.dg/torture/pr116098-1.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  .../c-c++-common/torture/pr116098-2.c | 46 +++
>  gcc/testsuite/g++.dg/torture/pr116098-1.C | 33 +
>  gcc/tree-ssa-phiopt.cc| 28 ++-
>  3 files changed, 106 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/torture/pr116098-2.c
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116098-1.C
>
> diff --git a/gcc/testsuite/c-c++-common/torture/pr116098-2.c b/gcc/testsuite/c-c++-common/torture/pr116098-2.c
> new file mode 100644
> index 000..614ed049171
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/torture/pr116098-2.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +/* PR tree-optimization/116098 */
> +
> +
> +#include 
> +
> +struct Value {
> +int type;
> +union {
> +bool boolean;
> +long long t;
> +};
> +};
> +
> +static struct Value s_item_mem;
> +
> +/* truthy was being miscompiled for the value.type==2 case,
> +   because we would have a VCE from unsigned char to bool
> +   that went from being conditional in the value.type==1 case
> +   to unconditional when `value.type!=0`.
> +   The move of the VCE from conditional to unconditional,
> +   needs to be changed into a convert (NOP_EXPR). */
> +static bool truthy(void) __attribute__((noipa));
> +static bool
> +truthy(void)
> +{
> +struct Value value = s_item_mem;
> +if (value.type == 0)
> +  return 0;
> +if (value.type == 1)
> +  return value.boolean;
> +return 1;
> +}
> +
> +int
> +main(void)
> +{
> +s_item_mem.type = 2;
> +s_item_mem.t = -1;
> +bool b1 = !truthy();
> +s_item_mem.type = 1;
> +s_item_mem.boolean = b1;
> +bool b = truthy();
> +if (b1 != b)  __builtin_abort();
> +if (b) __builtin_abort();
> +}
> diff --git a/gcc/testsuite/g++.dg/torture/pr116098-1.C b/gcc/testsuite/g++.dg/torture/pr116098-1.C
> new file mode 100644
> index 000..90e44a6eeed
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116098-1.C
> @@ -0,0 +1,33 @@
> +// { dg-do run }
> +/* PR tree-optimization/116098 */
> +
> +
> +static bool truthy(int type, unsigned char data) __attribute__((noipa));
> +/* truthy was being miscompiled for the type==2 case,
> +   because we would have a VCE from unsigned char to bool
> +   that went from being conditional in the type==1 case
> +   to unconditional when `type!=0`.
> +   The move of the VCE from conditional to unconditional,
> +   needs to be changed into a convert (NOP_EXPR). */
> +
> +static bool truthy(void) __attribute__((noipa));
> +static bool
> +truthy(int type, unsigned char data)
> +{
> +if (type == 0)
> +  return 0;
> +if (type == 1)
> +  /* Emulate what SRA does, so this can be
> +tested without depending on SRA. */
> +  return __builtin_bit_cast (bool, data);
> +return 1;
> +}
> +
> +int
> +main(void)
> +{
> +bool b1 = !truthy(2, -1);
> +bool b = truthy(1, b1);
> +if (b1 != b)  __builtin_abort();
> +if (b) __builtin_abort();
> +}
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index bd7f9607eb9..43b65b362a3 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -742,7 +742,8 @@ empty_bb_or_one_feeding_into_p (basic_block bb,
>  }
>
>  /* Move STMT to before GSI and insert its defining
> -   name into INSERTED_EXPRS bitmap. */
> +   name into INSERTED_EXPRS bitmap.
> +   Also rewrite it if it might be undefined when unconditionalized.  */
>  static void
>  move_stmt (gimple *stmt, gimple_stmt_iterator *gsi, auto_bitmap 
> &inserted_exprs)
>  {
> @@ -761,6 +762,31 @@ move_stmt (gimple *stmt, gimple_stmt_iterator *gsi, 
> auto_bitmap &inserted_exprs)
>gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt);
>gsi_move_before (&gsi1, gsi);
>reset_flow_sensitive_info (name);
> +
> +  /* Rewrite some code which might be undefined when
> + unconditionalized. */

Re: [PATCH] backprop: Fix deleting of a phi node [PR116922]

2024-10-02 Thread Richard Biener
On Wed, Oct 2, 2024 at 5:13 AM Andrew Pinski  wrote:
>
> The problem here is remove_unused_var is called on a name that is
> defined by a phi node but it deletes it like removing a normal statement.
> remove_phi_node should be called rather than gsi_remove for phinodes.
>
> Note there is a possibility of using simple_dce_from_worklist instead
> but that is for another day.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK (not that I like the two conflicting APIs, maybe we can work towards removing
remove_phi_node and make PHI node handling in gsi_* better)

Thanks,
Richard.

> PR tree-optimization/116922
>
> gcc/ChangeLog:
>
> * gimple-ssa-backprop.cc (remove_unused_var): Handle phi
> nodes correctly.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr116922.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-ssa-backprop.cc  | 10 --
>  gcc/testsuite/gcc.dg/torture/pr116922.c | 19 +++
>  2 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116922.c
>
> diff --git a/gcc/gimple-ssa-backprop.cc b/gcc/gimple-ssa-backprop.cc
> index fe27ef51cdf..e3374b18138 100644
> --- a/gcc/gimple-ssa-backprop.cc
> +++ b/gcc/gimple-ssa-backprop.cc
> @@ -663,8 +663,14 @@ remove_unused_var (tree var)
>print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>  }
>gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> -  gsi_remove (&gsi, true);
> -  release_defs (stmt);
> +  if (gimple_code (stmt) == GIMPLE_PHI)
> +remove_phi_node (&gsi, true);
> +  else
> +{
> +  unlink_stmt_vdef (stmt);
> +  gsi_remove (&gsi, true);
> +  release_defs (stmt);
> +}
>  }
>
>  /* Note that we're replacing OLD_RHS with NEW_RHS in STMT.  */
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116922.c 
> b/gcc/testsuite/gcc.dg/torture/pr116922.c
> new file mode 100644
> > index 00000000000..0fcf912930f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116922.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-ffast-math" } */
> +/* PR tree-optimization/116922 */
> +
> +
> +static int g;
> +
> +void
> +foo (int c, double v, double *r)
> +{
> +b:
> +  do
> +v /= g - v;
> +  while (c);
> +  *r = v;
> +
> +  double x;
> +  foo (5, (double)0, &x);
> +}
> --
> 2.43.0
>


[PATCH v2] RISC-V: Improve vsetvl vconfig alignment

2024-10-02 Thread Dusan Stojkovic
This patch is a new version of:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662745.html

> Can you elaborate a bit on that?  Rearranging the CFG shouldn't matter
> in general and relying on the specific TARGET_SFB_ALU feels overly
> specific.
> Why does the same register in the if_then_else and interfere with vsetvl?

When the ce1 pass transforms the CFG for a conditional move, it deletes the
then and else basic blocks and puts the conditional move in their place. That
move uses the same pseudo-register as the original vsetvl.

This interferes with the vsetvl pass precisely because of the merge policy: the
"used by non-RVV insn" flag limits the cases where merging might still be
possible. This patch tries to address one such issue.

Agreed. I have removed TARGET_SFB_ALU flag from the condition.

> BTW Bohan Lei has since fixed a bug regarding non-RVV uses.  Does the
> situation change with that applied?

I repeated the testing for sifive-7-series as well as rocket. The same tests are
still affected positively: vsetvlmax-9, vsetvlmax-10, vsetvlmax-11, vsetvlmax-15
on sifive-7-series.

2024-10-2  Dusan Stojkovic  

PR target/113035

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info):
  New fuse condition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Updated
  scan-assembler-times num parameter.




---
 gcc/config/riscv/riscv-vsetvl.cc  | 24 +++
 .../riscv/rvv/vsetvl/vsetvlmax-15.c   |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 030ffbe2ebb..e2a5231333f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3061,6 +3061,30 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
  else
{
  vsetvl_info &prev_info = src_block_info.get_exit_info ();
+ if (prev_info.valid_p ()
+ && curr_info.valid_p ()
+ && prev_info.vl_used_by_non_rvv_insn_p ()
+ && !curr_info.vl_used_by_non_rvv_insn_p ())
+ {
+   // Try to merge each demand individually
+   if (m_dem.sew_lmul_compatible_p (prev_info, curr_info))
+   {
+ m_dem.merge_sew_lmul (prev_info, curr_info);
+   }
+   if (m_dem.policy_compatible_p (prev_info, curr_info))
+   {
+ m_dem.merge_policy (prev_info, curr_info);
+   }
+   if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "After fusing curr info and "
+ "prev info demands individually:\n");
+ fprintf (dump_file, "  prev_info: ");
+ prev_info.dump (dump_file, "  ");
+ fprintf (dump_file, "  curr_info: ");
+ curr_info.dump (dump_file, "  ");
+   }
+ }
  if (!prev_info.valid_p ()
  || m_dem.available_p (prev_info, curr_info)
  || !m_dem.compatible_p (prev_info, curr_info))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c
index 23042460885..65aceed0e4e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c
@@ -18,6 +18,6 @@ void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t 
*out, size_t n, int c
   }
 }
 
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { 
no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" 
no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*5} 1 { 
target { no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } 
} } */
-- 
2.43.0



[PATCH 1/2] aarch64: Split FCMA feature bit from Armv8.3-A

2024-10-02 Thread Andre Vieira

This patch splits out FCMA as a feature from Armv8.3-A and adds it as a separate
feature bit which now controls 'TARGET_COMPLEX'.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (FCMA): New feature bit, cannot be
used as an extension on the command line.
* config/aarch64/aarch64.h (TARGET_COMPLEX): Use FCMA feature bit
rather than ARMV8_3.
---
 gcc/config/aarch64/aarch64-arches.def| 2 +-
 gcc/config/aarch64/aarch64-option-extensions.def | 1 +
 gcc/config/aarch64/aarch64.h | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 4634b272e28..fadf9c36b03 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -33,7 +33,7 @@
 AARCH64_ARCH("armv8-a",   generic_armv8_a,   V8A,   8,  (SIMD))
 AARCH64_ARCH("armv8.1-a", generic_armv8_a,   V8_1A, 8,  (V8A, LSE, CRC, RDMA))
 AARCH64_ARCH("armv8.2-a", generic_armv8_a,   V8_2A, 8,  (V8_1A))
-AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 8,  (V8_2A, PAUTH, RCPC))
+AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 8,  (V8_2A, PAUTH, RCPC, FCMA))
 AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, F16FML, DOTPROD, FLAGM))
 AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, SSBS, PREDRES))
 AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, BF16))
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 6998627f377..4732c20ec96 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -193,6 +193,7 @@ AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), (), (), "svesm4")
 AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4))
 
 AARCH64_OPT_FMV_EXTENSION("sme", SME, (BF16, SVE2), (), (), "sme")
+AARCH64_OPT_EXTENSION("", FCMA, (), (), (), "fcma")
 
 AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index a99e7bb6c47..c0ad305e324 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -362,7 +362,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 #define TARGET_JSCVT	(TARGET_FLOAT && TARGET_ARMV8_3)
 
 /* Armv8.3-a Complex number extension to AdvSIMD extensions.  */
-#define TARGET_COMPLEX (TARGET_SIMD && TARGET_ARMV8_3)
+#define TARGET_COMPLEX (TARGET_SIMD && AARCH64_HAVE_ISA (FCMA))
 
 /* Floating-point rounding instructions from Armv8.5-a.  */
 #define TARGET_FRINT (AARCH64_HAVE_ISA (V8_5A) && TARGET_FLOAT)


[PATCH 2/2] aarch64: remove SVE2 requirement from SME and diagnose it as unsupported

2024-10-02 Thread Andre Vieira

As per the AArch64 ISA FEAT_SME does not require FEAT_SVE2, so we are removing
that false dependency in GCC.  However, we chose for now to not support this
combination of features and will diagnose the combination of FEAT_SME without
FEAT_SVE2 as unsupported by GCC.  We may choose to support this in the future.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (SME): Remove SVE2 as
prerequisite and add in FCMA and F16FML.
* config/aarch64/aarch64.cc (aarch64_override_options): Diagnose use of
SME without SVE2.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c:
Pass +sve2 to existing +sme pragma.
* gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_single_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/compare_scalar_count_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/storexn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c: Likewise.
---
 gcc/config/aarch64/aarch64-option-extensions.def  | 3 ++-
 gcc/config/aarch64/aarch64.cc | 4 
 .../aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c| 2 +-
 .../aarch64/sve/acle/general-c/binary_opt_single_n_2.c| 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/binary_single_1.c   | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c| 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/clamp_1.c | 2 +-
 .../aarch64/sve/acle/general-c/compare_scalar_count_1.c   | 2 +-
 .../aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c   | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/storexn_1.c | 2 +-
 .../aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c | 2 +-
 13 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 4732c20ec96..e38a4ab3f78 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -192,9 +192,10 @@ AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), (), (), "svesm4")
 
 AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4))
 
-AARCH64_OPT_FMV_EXTENSION("sme", SME, (BF16, SVE2), (), (), "sme")
 AARCH64_OPT_EXTENSION("", FCMA, (), (), (), "fcma")
 
+AARCH64_OPT_FMV_EXTENSION("sme", SME, (BF16, FCMA, F16FML), (), (), "sme")
+
 AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
 
 AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 68913beaee2..bc2023da180 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18998,6 +18998,10 @@ aarch64_override_options (void)
  while processing functions with potential target attributes.  */
   target_option_default_node = target_option_current_node
 = build_target_option_node (&global_options, &global_options_set);
+
+  if (TARGET_SME && !TARGET_SVE2)
+warning (0, "this gcc version does not guarantee full support for +sme"
+		" without +sve2");
 }
 
 /* Implement targetm.override_options_after_change.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c
index 976d5af7f23..7150d37a2aa 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 
-#pragma GCC target "+sme2"
+#pragma GCC target "+sve2+sme2"
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c
index 5cc8a4c5c50..2823264edbd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 
-#pragma GCC target "+sme2"
+#pragma GCC target "+sve2+sme2"
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_single_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/binary_single_1

[PATCH 0/2] aarch64: remove SVE2 requirement from SME and diagnose it as unsupported

2024-10-02 Thread Andre Vieira
This patch series removes the requirement of SVE2 for SME, so when a user
passes +sme, SVE2 is not enabled as a result of that.
We do this to be compliant with the ISA and behave in a compatible manner to
other toolchains, to prevent unexpected behavior when switching between them.

However, for the time being we diagnose the use of SME without SVE2 as
unsupported.  We suspect that the backend correctly enables and disables the
right instructions given the options, but we believe that certain codegen
assumes that SVE and SVE2 are present when using SME.  We should investigate
these assumptions before we fully support this combination.
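
As a rough illustration of the behaviour described above, here is a minimal
sketch that models feature flags as plain bits.  The enum, the function names
and the bit layout are all hypothetical and do not reflect GCC's internal
representation; the prerequisite list (BF16, FCMA, F16FML) is the explicit one
from patch 2/2, and the check mirrors the `TARGET_SME && !TARGET_SVE2`
condition added there.  Transitive dependencies are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits, for illustration only.
enum feature : std::uint32_t {
  BF16   = 1u << 0,
  FCMA   = 1u << 1,
  F16FML = 1u << 2,
  SVE2   = 1u << 3,
  SME    = 1u << 4
};

// Model of "+sme" after the series: SME pulls in BF16, FCMA and F16FML,
// but no longer SVE2.
constexpr std::uint32_t enable_sme (std::uint32_t flags)
{
  return flags | SME | BF16 | FCMA | F16FML;
}

// Model of the new diagnostic condition: SME selected without SVE2.
constexpr bool sme_without_sve2 (std::uint32_t flags)
{
  return (flags & SME) != 0 && (flags & SVE2) == 0;
}

static_assert ((enable_sme (0) & SVE2) == 0, "+sme alone leaves SVE2 off");
static_assert (sme_without_sve2 (enable_sme (0)), "+sme alone is diagnosed");
static_assert (!sme_without_sve2 (enable_sme (SVE2)), "+sve2+sme is fine");
```

In this model, the testsuite changes in patch 2/2 (adding +sve2 to the
existing pragmas) correspond to calling `enable_sme (SVE2)` rather than
`enable_sme (0)`.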

The patch series also refactors the FCMA/COMPNUM/TARGET_COMPLEX feature to
separate it from Armv8.3-A feature set.

Andre Vieira (2)
 aarch64: Split FCMA feature bit from Armv8.3-A
 aarch64: remove SVE2 requirement from SME and diagnose it as unsupported

Regression tested on aarch64-none-linux-gnu.

OK for trunk?

 gcc/config/aarch64/aarch64-arches.def | 2 +-
 gcc/config/aarch64/aarch64-option-extensions.def  | 4 +++-
 gcc/config/aarch64/aarch64.cc | 4 
 gcc/config/aarch64/aarch64.h  | 2 +-
 .../aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c| 2 +-
 .../aarch64/sve/acle/general-c/binary_opt_single_n_2.c| 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/binary_single_1.c   | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c| 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/clamp_1.c | 2 +-
 .../aarch64/sve/acle/general-c/compare_scalar_count_1.c   | 2 +-
 .../aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c   | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/storexn_1.c | 2 +-
 .../aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c | 2 +-
 15 files changed, 20 insertions(+), 14 deletions(-)

-- 
2.25.1



[patch, testsuite, applied] ad PR52641: Make strict-flex-array-3.c work on int != 32-bit targets

2024-10-02 Thread Georg-Johann Lay

gcc.dg/strict-flex-array-3.c used hard-coded values instead of
__SIZEOF_INT__ or equivalent expressions.  Fixed as obvious.
Plus, on AVR, printf doesn't support %zd, so that expect() is
now special-cased.

Johann

--

testsuite/52641 - Make gcc.dg/strict-flex-array-3.c work on int != 32 bits.

PR testsuite/52641
gcc/testsuite/
* gcc.dg/strict-flex-array-3.c (expect) [AVR]: Use custom
version due to AVR-LibC limitations.
(stuff): Use __SIZEOF_INT__ instead of hard-coded values.

diff --git a/gcc/testsuite/gcc.dg/strict-flex-array-3.c b/gcc/testsuite/gcc.dg/strict-flex-array-3.c
index f74ed96c751..064f779501a 100644
--- a/gcc/testsuite/gcc.dg/strict-flex-array-3.c
+++ b/gcc/testsuite/gcc.dg/strict-flex-array-3.c
@@ -17,6 +17,21 @@
 	} \
 } while (0);
 
+#ifdef __AVR__
+/* AVR-Libc doesn't support %zd, thus use %d for size_t.  */
+#undef  expect
+#define expect(p, _v) do {		\
+size_t v = _v;			\
+if (p == v)\
+  __builtin_printf ("ok:  %s == %d\n", #p, p);			\
+else\
+  {	\
+	__builtin_printf ("WAT: %s == %d (expected %d)\n", #p, p, v);	\
+	FAIL ();			\
+  }	\
+} while (0);
+#endif /* AVR */
+
 struct trailing_array_1 {
 int a;
 int b;
@@ -46,8 +61,8 @@ void __attribute__((__noinline__)) stuff(
 struct trailing_array_3 *trailing_0,
 struct trailing_array_4 *trailing_flex)
 {
-expect(__builtin_object_size(normal->c, 1), 16);
-expect(__builtin_object_size(trailing_1->c, 1), 4);
+expect(__builtin_object_size(normal->c, 1), 4 * __SIZEOF_INT__);
+expect(__builtin_object_size(trailing_1->c, 1), __SIZEOF_INT__);
 expect(__builtin_object_size(trailing_0->c, 1), 0);
 expect(__builtin_object_size(trailing_flex->c, 1), -1);
 }
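
The portability point of this change can be sketched in isolation.  The helper
name `int_array_bytes` is hypothetical; `__SIZEOF_INT__` is the macro
predefined by GCC (and Clang) that the patch relies on.  On a target with
16-bit int, the expected size of a four-element int array is 8 bytes, so a
hard-coded 16 only holds where int is 32 bits wide.

```cpp
#include <cassert>
#include <cstddef>

// Expected size in bytes of an array of N ints, written without assuming
// that int is 32 bits wide.
constexpr std::size_t int_array_bytes (std::size_t n)
{
  return n * __SIZEOF_INT__;
}

// These hold on every target, whatever the width of int.
static_assert (__SIZEOF_INT__ == sizeof (int), "macro agrees with sizeof");
static_assert (int_array_bytes (4) == sizeof (int[4]), "4-element array");
static_assert (int_array_bytes (1) == sizeof (int[1]), "1-element array");
```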


Re: [PATCH] libstdc++: Enable _GLIBCXX_ASSERTIONS by default for -O0 [PR112808]

2024-10-02 Thread Patrick Palka
On Wed, 2 Oct 2024, Jonathan Wakely wrote:

> I think we should do this.
> 
> Tested x86_64-linux.
> 
> -- >8 --
> 
> Too many users don't know about -D_GLIBCXX_ASSERTIONS and so are missing
> valuable checks for C++ standard library preconditions. This change
> enables libstdc++ assertions by default when compiling with -O0 so that
> we diagnose more bugs by default.
> 
> When users enable optimization we don't add the assertions by default
> (because they have non-zero overhead) so they still need to enable them
> manually.
> 
> For users who really don't want the assertions even in unoptimized
> builds, defining _GLIBCXX_NO_ASSERTIONS will prevent them from being
> enabled automatically.
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/112808
>   * doc/xml/manual/using.xml (_GLIBCXX_ASSERTIONS): Document
>   implicit definition for -O0 compilation.
>   (_GLIBCXX_NO_ASSERTIONS): Document.
>   * doc/html/manual/using_macros.html: Regenerate.
>   * include/bits/c++config [!__OPTIMIZE__] (_GLIBCXX_ASSERTIONS):
>   Define for unoptimized builds.

LGTM.  At -O0 the additional overhead of the assertions would be
relatively small compared to the overhead of -O0 itself.
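
For readers skimming the thread, the selection rule in the quoted c++config
hunk can be modelled as a small decision function.  This is only a sketch:
`assertions_enabled` and its parameters are hypothetical names standing in for
"is this macro defined when the header is processed", not anything provided by
libstdc++.

```cpp
#include <cassert>

// Model of the #ifndef / #if / #elif chain in the patch.
//   user_assertions : _GLIBCXX_ASSERTIONS already defined by the user
//   debug_mode      : _GLIBCXX_DEBUG defined
//   optimizing      : __OPTIMIZE__ defined (any optimization enabled)
//   no_assertions   : _GLIBCXX_NO_ASSERTIONS defined
constexpr bool assertions_enabled (bool user_assertions, bool debug_mode,
                                   bool optimizing, bool no_assertions)
{
  return user_assertions               // explicit request always wins
         || debug_mode                 // Debug Mode implies assertions
         || (!optimizing && !no_assertions);  // new default for -O0
}

static_assert (assertions_enabled (false, false, false, false), "-O0: on");
static_assert (!assertions_enabled (false, false, true, false), "-O2: off");
static_assert (!assertions_enabled (false, false, false, true), "opt-out");
static_assert (assertions_enabled (false, true, true, true), "debug mode");
```

Note that in this model, as in the patch, _GLIBCXX_NO_ASSERTIONS only
suppresses the implicit -O0 default; it does not override _GLIBCXX_DEBUG or an
explicit _GLIBCXX_ASSERTIONS.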

> ---
>  libstdc++-v3/doc/html/manual/using_macros.html | 12 +---
>  libstdc++-v3/doc/xml/manual/using.xml  | 16 +---
>  libstdc++-v3/include/bits/c++config|  9 +++--
>  3 files changed, 29 insertions(+), 8 deletions(-)
> 
> diff --git a/libstdc++-v3/doc/html/manual/using_macros.html 
> b/libstdc++-v3/doc/html/manual/using_macros.html
> index 67623b5e2af..c1406ec76f7 100644
> --- a/libstdc++-v3/doc/html/manual/using_macros.html
> +++ b/libstdc++-v3/doc/html/manual/using_macros.html
> @@ -82,9 +82,15 @@
>   This is described in more detail in
>    title="Chapter 16. Compile Time Checks">Compile Time Checks.
> class="code">_GLIBCXX_ASSERTIONS
> - Undefined by default. When defined, enables extra error checking in
> -the form of precondition assertions, such as bounds checking in
> -strings and null pointer checks when dereferencing smart pointers.
> + Defined by default when compiling with no optimization, undefined
> + by default when compiling with optimization.
> + When defined, enables extra error checking in the form of
> + precondition assertions, such as bounds checking in strings
> + and null pointer checks when dereferencing smart pointers.
> +   class="code">_GLIBCXX_NO_ASSERTIONS
> + Undefined by default.  When defined, prevents the implicit
> + definition of _GLIBCXX_ASSERTIONS when 
> compiling
> + with no optimization.
> class="code">_GLIBCXX_DEBUG
>   Undefined by default. When defined, compiles user code using
>   the debug mode.
> diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
> b/libstdc++-v3/doc/xml/manual/using.xml
> index 89119f6fb2d..7ca3a3f4b4c 100644
> --- a/libstdc++-v3/doc/xml/manual/using.xml
> +++ b/libstdc++-v3/doc/xml/manual/using.xml
> @@ -1247,9 +1247,19 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 
> hello.cc -o test.exe
>  _GLIBCXX_ASSERTIONS
>  
>
> - Undefined by default. When defined, enables extra error checking in
> -the form of precondition assertions, such as bounds checking in
> -strings and null pointer checks when dereferencing smart pointers.
> + Defined by default when compiling with no optimization, undefined
> + by default when compiling with optimization.
> + When defined, enables extra error checking in the form of
> + precondition assertions, such as bounds checking in strings
> + and null pointer checks when dereferencing smart pointers.
> +  
> +
> +_GLIBCXX_NO_ASSERTIONS
> +
> +  
> + Undefined by default.  When defined, prevents the implicit
> + definition of _GLIBCXX_ASSERTIONS when compiling
> + with no optimization.
>
>  
>  _GLIBCXX_DEBUG
> diff --git a/libstdc++-v3/include/bits/c++config 
> b/libstdc++-v3/include/bits/c++config
> index 29d795f687c..b87a3527f24 100644
> --- a/libstdc++-v3/include/bits/c++config
> +++ b/libstdc++-v3/include/bits/c++config
> @@ -586,9 +586,14 @@ namespace std
>  #pragma GCC visibility pop
>  }
>  
> +#ifndef _GLIBCXX_ASSERTIONS
> +# if defined(_GLIBCXX_DEBUG)
>  // Debug Mode implies checking assertions.
> -#if defined(_GLIBCXX_DEBUG) && !defined(_GLIBCXX_ASSERTIONS)
> -# define _GLIBCXX_ASSERTIONS 1
> +#  define _GLIBCXX_ASSERTIONS 1
> +# elif ! defined(__OPTIMIZE__) && ! defined(_GLIBCXX_NO_ASSERTIONS)
> +// Enable assertions for unoptimized builds.
> +#  define _GLIBCXX_ASSERTIONS 1
> +# endif
>  #endif
>  
>  // Disable std::string explicit instantiation declarations in order to 
> assert.
> -- 
> 2.46.1
> 
> 



Re: [PATCH] libstdc++: Enable _GLIBCXX_ASSERTIONS by default for -O0 [PR112808]

2024-10-02 Thread Arsen Arsenović
Patrick Palka  writes:

> On Wed, 2 Oct 2024, Jonathan Wakely wrote:
>
>> I think we should do this.
>> 
>> Tested x86_64-linux.
>> 
>> -- >8 --
>> 
>> Too many users don't know about -D_GLIBCXX_ASSERTIONS and so are missing
>> valuable checks for C++ standard library preconditions. This change
>> enables libstdc++ assertions by default when compiling with -O0 so that
>> we diagnose more bugs by default.
>> 
>> When users enable optimization we don't add the assertions by default
>> (because they have non-zero overhead) so they still need to enable them
>> manually.
>> 
>> For users who really don't want the assertions even in unoptimized
>> builds, defining _GLIBCXX_NO_ASSERTIONS will prevent them from being
>> enabled automatically.
>> 
>> libstdc++-v3/ChangeLog:
>> 
>>  PR libstdc++/112808
>>  * doc/xml/manual/using.xml (_GLIBCXX_ASSERTIONS): Document
>>  implicit definition for -O0 compilation.
>>  (_GLIBCXX_NO_ASSERTIONS): Document.
>>  * doc/html/manual/using_macros.html: Regenerate.
>>  * include/bits/c++config [!__OPTIMIZE__] (_GLIBCXX_ASSERTIONS):
>>  Define for unoptimized builds.
>
> LGTM.  At -O0 the additional overhead of the assertions would be
> relatively small compared to the overhead of -O0 itself.

On that note, maybe we could add a __OPTIMIZE_DEBUG__ or similar so that
we can do a similar thing for -Og (which some use for their development
cycle).
-- 
Arsen Arsenović




RE: [PATCH 2/2]AArch64: support encoding integer immediates using floating point moves

2024-10-02 Thread Tamar Christina
Hi,

> -----Original Message-----
> From: Richard Sandiford 
> Sent: Monday, September 30, 2024 6:33 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 2/2]AArch64: support encoding integer immediates using
> floating point moves
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This patch extends our immediate SIMD generation cases to support generating
> > integer immediates using floating point operation if the integer immediate
> > maps to an exact FP value.
> >
> > As an example:
> >
> > uint32x4_t f1() {
> >     return vdupq_n_u32(0x3f800000);
> > }
> >
> > currently generates:
> >
> > f1:
> >         adrp    x0, .LC0
> >         ldr     q0, [x0, #:lo12:.LC0]
> >         ret
> >
> > i.e. a load, but with this change:
> >
> > f1:
> >         fmov    v0.4s, 1.0e+0
> >         ret
> >
> > Such immediates are common in e.g. our Math routines in glibc because they
> > are created to extract or mark part of an FP immediate as masks.
> 
> I agree this is a good thing to do.  The current code is too beholden
> to the original vector mode.  This patch relaxes it so that it isn't
> beholden to the original mode's class (integer vs. float), but it would
> still be beholden to the original mode's element size.

I've implemented this approach and it works, but I'm struggling with an
inconsistency in how zeros are created.

There are about 800 SVE ACLE tests like acge_f16.c that check that a zero is
created using a mov of the same sized register as the usage.  So I added an
exception for zero to use the original input element mode.

But then there are about 400 other SVE ACLE tests that actually check that
zeros are created using byte moves, like dup_128_s16_z, even though they're
used as ints.

So these two are in conflict.  Do you care which way I resolve this?  Since
it's zero, it shouldn't matter how they're created, but perhaps there's a
reason why some tests check for the specific instruction?

Thanks,
Tamar
> 
> It looks like an alternative would be to remove:
> 
>   scalar_float_mode elt_float_mode;
>   if (n_elts == 1
>   && is_a  (elt_mode, &elt_float_mode))
> {
>   rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
>   if (aarch64_float_const_zero_rtx_p (elt)
> || aarch64_float_const_representable_p (elt))
>   {
> if (info)
>   *info = simd_immediate_info (elt_float_mode, elt);
> return true;
>   }
> }
> 
> and instead insert code:
> 
>   /* Get the repeating 8-byte value as an integer.  No endian correction
>  is needed here because bytes is already in lsb-first order.  */
>   unsigned HOST_WIDE_INT val64 = 0;
>   for (unsigned int i = 0; i < 8; i++)
> val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes]
> << (i * BITS_PER_UNIT));
> 
> ---> here
> 
>   if (vec_flags & VEC_SVE_DATA)
> return aarch64_sve_valid_immediate (val64, info);
>   else
> return aarch64_advsimd_valid_immediate (val64, info, which);
> 
> that tries to reduce val64 to the smallest repeating pattern,
> then tries to interpret that pattern as a float.  The reduction step
> could reuse the first part of aarch64_sve_valid_immediate, which
> calculates the narrowest repeating integer mode:
> 
>   scalar_int_mode mode = DImode;
>   unsigned int val32 = val64 & 0x;
>   if (val32 == (val64 >> 32))
> {
>   mode = SImode;
>   unsigned int val16 = val32 & 0x;
>   if (val16 == (val32 >> 16))
>   {
> mode = HImode;
> unsigned int val8 = val16 & 0xff;
> if (val8 == (val16 >> 8))
>   mode = QImode;
>   }
> }
> 
> This would give us the candidate integer mode, to which we could
> apply float_mode_for_size (...).exists, as in the patch.
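
The reduction step suggested above can be tried outside the compiler.  The
following is a minimal sketch, not the GCC implementation: the function names
are hypothetical, plain fixed-width integers stand in for the machine modes
(64/32/16/8 bits for DImode/SImode/HImode/QImode), and `bits_to_float` assumes
that float is a 32-bit IEEE-754 single.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Width in bits of the narrowest pattern that, when repeated, reproduces
// val64.  Same cascade as the quoted code, returning a width, not a mode.
inline unsigned narrowest_repeat_bits (std::uint64_t val64)
{
  std::uint32_t val32 = static_cast<std::uint32_t> (val64 & 0xffffffff);
  if (val32 != (val64 >> 32))
    return 64;
  std::uint16_t val16 = static_cast<std::uint16_t> (val32 & 0xffff);
  if (val16 != (val32 >> 16))
    return 32;
  std::uint8_t val8 = static_cast<std::uint8_t> (val16 & 0xff);
  if (val8 != (val16 >> 8))
    return 16;
  return 8;
}

// Reinterpret a 32-bit pattern as a float without violating aliasing rules.
inline float bits_to_float (std::uint32_t bits)
{
  static_assert (sizeof (float) == sizeof (std::uint32_t),
                 "sketch assumes a 32-bit float");
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}
```

With the vdupq_n_u32 example from the original posting, the repeating 8-byte
value is 0x3f8000003f800000: the narrowest repeating pattern is 32 bits wide,
and those 32 bits read as the float 1.0, which is what makes the fmov encoding
possible.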
> 
> In this case we would have the value as an integer, rather than
> as an rtx, so I think it would make sense to split out the part of
> aarch64_float_const_representable_p that processes the REAL_VALUE_TYPE.
> aarch64_simd_valid_immediate could then use the patch's:
> 
> > +  long int as_long_ints[2];
> > +  as_long_ints[0] = buf & 0x;
> > +  as_long_ints[1] = (buf >> 32) & 0x;
> > [...]
> > +  real_from_target (&r, as_long_ints, fmode);
> 
> with "buf" being "val64" in the code above, and "fmode" being the result
> of float_mode_for_size (...).exists.  aarch64_simd_valid_immediate
> would then pass "r" and "fmode" to the new, split-out variant of
> aarch64_float_const_representable_p.  (I haven't checked the endianness
> requirements for real_from_target.)
> 
> The split-out variant would still perform the HFmode test in:
> 
>   if (GET_MODE (x) == VOIDmode
>   || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
> return false;
> 
> The VOIDmode test is redundant and can be dropped.  AArch64 has always
> been a CONST_WIDE_INT target.
> 
> If we do that, we should probably also pass the integer mode calculated
> by th

Re: [Fortran, Patch, PR51815, v1] Fix parsing of substring refs in coarrays.

2024-10-02 Thread Andre Vehreschild
Hi Harald,

we could do something like this:

diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index d73d5eaed84..5000906f5f2 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -2823,6 +2823,16 @@ check_substring:
  if (substring)
primary->ts.u.cl = NULL;

+ if (gfc_peek_ascii_char () == '(')
+   {
+ gfc_array_ref arr_ref;
+ gfc_array_spec *as
+   = sym->ts.type == BT_CLASS ? CLASS_DATA (sym)->as : sym->as;
+ gfc_match_array_ref (&arr_ref, as, 0, 0);
+
+ gfc_error_now ("Unexpected array/substring ref at %C");
+ return MATCH_ERROR;
+   }
  break;

case MATCH_NO:

It would at least give a better hint. Attached is the patch that adds this to
the previous one.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Is this ok?

Regards and thanks for the review,
Andre

On Tue, 1 Oct 2024 23:31:11 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> On 01.10.24 at 09:43, Andre Vehreschild wrote:
> > Hi all,
> >
> > this rather old PR reported a parsing bug, when a coarray'ed character
> > substring ref is to be parsed, aka CHARACTER(:) :: str[:] ... str(2:5). In
> > this case the parser confused the substring ref with an array-ref, because
> > an array_spec was present. This patch fixes this by requesting only coarray
> > parsing from gfc_match_array_ref when no regular dimension is present. The
> > patch is not involved when an array of coarray'ed strings is parsed (that
> > worked beforehand).
>
> while the patch addresses the issue mentioned in the PR,
>
> > I had to fix the dg-error clauses in the testcase pr102532 because now the
> > error of having too many refs is detected by the parsing stage and no longer
> > by the resolve stage. It has become a simple syntax error. I hope this is
> > ok.
>
> I find the error messages now less helpful to users: before the patch
> we got "Rank mismatch in array reference", which was more suitable
> than the newer version with more or less confusing syntax errors.
>
> I assume you tried to find a better solution - but Intel and NAG
> also give syntax errors - so basically I am fine with the patch.
>
> You may want to wait for a second opinion.  If nobody else responds
> within the next 2 days, you may proceed nevertheless.
>
> Thanks,
> Harald
>
> > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
> >
> > Regards,
> > Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH] gcc-wwwdocs: Mention check-c++-all target for C++ front end patch testing

2024-10-02 Thread Simon Martin
This is a follow-up to the discussion about testing changes to the C++
front end in
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664258.html

It also clarifies that the make invocation examples should be made from
the *build* tree.

Validated fine via https://validator.w3.org.
---
 htdocs/contribute.html | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/htdocs/contribute.html b/htdocs/contribute.html
index 53c27c6e..3ab65323 100644
--- a/htdocs/contribute.html
+++ b/htdocs/contribute.html
@@ -111,9 +111,17 @@ For a normal native configuration, running
 make bootstrap
 make -k check
 
-from the top level of the GCC tree (not the
+from the top level of the GCC build tree (not the
 gcc subdirectory) will accomplish this.
 
+If your change is to the C++ front end, you need to run the C++ testsuite
+in all standard conformance levels. For a normal native configuration,
+running
+
+make -C gcc -k check-c++-all
+
+from the top level of the GCC build tree will accomplish this.
+
 If your change is to a front end other than the C or C++ front end,
 or a runtime library other than libgcc, you need to verify
 only that the runtime library for that language still builds and the
-- 
2.44.0



[PATCH] libstdc++: Fix formatting of chrono::duration with character rep [PR116755]

2024-10-02 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

Implement Peter Dimov's suggestion for resolving LWG 4118, which is to
use +d.count() so that character types are promoted to an integer type
before formatting them. This didn't have unanimous consensus in the
committee as Howard Hinnant proposed that we should format the rep
consistently with std::format("{}", d.count()) instead. That ends up
being more complicated, because it makes std::formattable a precondition
of operator<< which was not previously the case, and it means that
ios_base::fmtflags from the stream would be ignored because std::format
doesn't use them.
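
As a small illustration of why the unary plus matters, here is a sketch that is independent of the libstdc++ sources. The helper name format_count is made up for this example. Inserting a char into a stream prints a character, whereas +count promotes it to int first, so the numeric value is printed and the stream's formatting flags still apply.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats a count either as-is or after integral
// promotion via unary plus, mirroring the "+d.count()" idea.
template<typename Rep>
std::string format_count(Rep count, bool promote)
{
  std::ostringstream os;
  if (promote)
    os << +count;   // char types are promoted to int, so digits are printed
  else
    os << count;    // char types are printed as a single character
  return os.str();
}
```

For example, format_count<char>(121, false) yields "y", while format_count<char>(121, true) yields "121". For a rep type such as double the unary plus changes nothing.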

libstdc++-v3/ChangeLog:

PR libstdc++/116755
* include/bits/chrono_io.h (operator<<): Use +d.count() for
duration inserter.
(__formatter_chrono::_M_format): Likewise for %Q format.
* testsuite/20_util/duration/io.cc: Test durations with
character types as reps.
---
 libstdc++-v3/include/bits/chrono_io.h |  9 ++-
 libstdc++-v3/testsuite/20_util/duration/io.cc | 66 +++
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 362bb5aa9e9..a337007266e 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -150,7 +150,9 @@ namespace __detail
   __s.flags(__os.flags());
   __s.imbue(__os.getloc());
   __s.precision(__os.precision());
-  __s << __d.count();
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 4118. How should duration formatters format custom rep types?
+  __s << +__d.count();
   __detail::__fmt_units_suffix(_Out(__s));
   __os << std::move(__s).str();
   return __os;
@@ -635,8 +637,10 @@ namespace __format
case 'Q':
  // %Q The duration's numeric value.
  if constexpr (chrono::__is_duration_v<_Tp>)
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 4118. How should duration formatters format custom rep?
__out = std::format_to(__print_sign(), _S_empty_spec,
-  __t.count());
+  +__t.count());
  else
__throw_format_error("chrono format error: argument is "
 "not a duration");
@@ -1703,6 +1707,7 @@ namespace __format
 /// @endcond
 
   template
+requires __format::__formattable_impl<_Rep, _CharT>
 struct formatter, _CharT>
 {
   constexpr typename basic_format_parse_context<_CharT>::iterator
diff --git a/libstdc++-v3/testsuite/20_util/duration/io.cc 
b/libstdc++-v3/testsuite/20_util/duration/io.cc
index 57020f4f953..383fb60afe2 100644
--- a/libstdc++-v3/testsuite/20_util/duration/io.cc
+++ b/libstdc++-v3/testsuite/20_util/duration/io.cc
@@ -23,6 +23,24 @@ test01()
   VERIFY( s == "3[2]s" );
   std::getline(ss, s);
   VERIFY( s == "9[2/3]s" );
+
+  // LWG 4118. How should duration formatters format custom rep types?
+  ss.str("");
+  ss << duration(121) << ' ';
+  ss << duration(122) << ' ';
+  ss << duration(123) << ' ';
+  ss << duration(124) << ' ';
+  ss << duration(125) << ' ';
+  ss << duration(126) << ' ';
+  ss << duration(127) << ' ';
+  VERIFY( ss.str() == "121s 122s 123s 124s 125s 126s 127s " );
+
+  ss.str("");
+  ss << std::hex << std::uppercase << duration(0x1A) << ' ';
+  ss << std::hex << std::uppercase << duration(0x2A) << ' ';
+  ss << std::hex << std::uppercase << duration(0x3A) << ' ';
+  ss << std::scientific << duration(4.5) << ' ';
+  VERIFY( ss.str() == "1As 2As 3As 4.50E+00s " );
 }
 
 void
@@ -44,6 +62,24 @@ test02()
   VERIFY( s == L"3[2]s" );
   std::getline(ss, s);
   VERIFY( s == L"9[2/3]s" );
+
+  // LWG 4118. How should duration formatters format custom rep types?
+  ss.str(L"");
+  ss << duration(121) << ' ';
+  ss << duration(122) << ' ';
+  ss << duration(123) << ' ';
+  ss << duration(124) << ' ';
+  ss << duration(125) << ' ';
+  ss << duration(126) << ' ';
+  ss << duration(127) << ' ';
+  VERIFY( ss.str() == L"121s 122s 123s 124s 125s 126s 127s " );
+
+  ss.str(L"");
+  ss << std::hex << std::uppercase << duration(0x1A) << ' ';
+  ss << std::hex << std::uppercase << duration(0x2A) << ' ';
+  ss << std::hex << std::uppercase << duration(0x3A) << ' ';
+  ss << std::scientific << duration(4.5) << ' ';
+  VERIFY( ss.str() == L"1As 2As 3As 4.50E+00s " );
 #endif
 }
 
@@ -114,6 +150,36 @@ test_format()
   VERIFY( s == expected );
   s = std::format("{:%Q%q}", minsec);
   VERIFY( s == expected );
+
+  // LWG 4118. How should duration formatters format custom rep types?
+  s = std::format("{}", std::chrono::duration(100));
+  VERIFY( s == "100s" );
+  s = std::format("{:%Q}", std::chrono::duration(101));
+  VERIFY( s == "101" );
+#ifdef _GLIBCXX_USE_WCHAR_T
+  ws = std::format(L"{}", std::chrono::duration(102));
+  VERIFY( ws == L"102s" );
+  ws = std::format(L"{}", std::chr

Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

2024-10-02 Thread Jonathan Wakely
On Wed, 2 Oct 2024 at 17:48, Matthew Malcomson  wrote:
>
> Thanks Jonathan,
>
> I agree with your point that having just the check against one of the 
> overloaded versions is not very robust and having multiple checks against 
> different versions would be better.
>
> Unfortunately — while asking the clang folk about this I realised that clang 
> doesn't expose the resolved versions (e.g. the existing versions like 
> `__atomic_load_2` etc) to the user.
> Instead they allow using SFINAE on these overloaded builtins.
> https://discourse.llvm.org/t/atomic-floating-point-operations-and-libstdc/81461
>
> I spent some time looking at this and it seems that enabling SFINAE in GCC 
> for these builtins is not too problematic (idea being to pass in a 
> `error_on_noresolve` boolean to `resolve_overloaded_builtin` based on the 
> context in the C++ frontend, then only emit errors if that boolean is set).
>
> To Jonathan:
>
> Would you be OK with using SFINAE to choose whether to use the 
> __atomic_fetch_add builtin for floating point types in libstdc++?

It's not great for compile times, but no objection otherwise.
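
For readers unfamiliar with the technique, the SFINAE-based selection being discussed could look roughly like the sketch below. It is written under loud assumptions: GCC does not currently allow SFINAE on the overloaded builtins, so an ordinary function, sketch::native_fetch_add, stands in for a builtin that only resolves for some types. All names in namespace sketch are hypothetical, and the fallback is a plain compare-exchange loop.

```cpp
#include <atomic>
#include <type_traits>
#include <utility>

namespace sketch
{
  // Hypothetical stand-in for a builtin that only resolves for some types.
  inline int native_fetch_add(std::atomic<int>& a, int v)
  { return a.fetch_add(v); }

  // Primary template: no native operation is available for T.
  template<typename T, typename = void>
    struct has_native_fetch_add : std::false_type { };

  // Chosen only when the call expression is well-formed (SFINAE).
  template<typename T>
    struct has_native_fetch_add<T,
      std::void_t<decltype(native_fetch_add(std::declval<std::atomic<T>&>(),
                                            std::declval<T>()))>>
    : std::true_type { };

  template<typename T>
    T fetch_add(std::atomic<T>& a, T v)
    {
      if constexpr (has_native_fetch_add<T>::value)
        return native_fetch_add(a, v);
      else
        {
          // Fallback: retry until the compare-exchange succeeds.
          T old = a.load();
          while (!a.compare_exchange_weak(old, old + v))
            { }
          return old;
        }
    }
}
```

Here sketch::fetch_add on a std::atomic<int> takes the "native" path, while a std::atomic<double> falls back to the loop. Every such check costs a template instantiation, which is presumably the compile-time concern raised above.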

>
>
> To the C++ frontend maintainers I Cc'd in:
>
> Are you happy with the idea of enabling SFINAE on overloaded builtins 
> resolved via resolve_overloaded_builtin?
>
> To global maintainers I Cc'd in:
>
> Is there any reason you know of not to enable SFINAE on the overloaded 
> builtins?
> Would it be OK to enable SFINAE on the generic overloaded builtins and add 
> the parameter so that targets can do the same for their target-specific 
> builtins (i.e. without changing the behaviour of the existing target specific 
> builtins)?
>
>
> 
> From: Jonathan Wakely 
> Sent: 19 September 2024 3:47 PM
> To: Matthew Malcomson 
> Cc: gcc-patches@gcc.gnu.org ; Joseph Myers 
> ; Richard Biener 
> Subject: Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins
>
>
> On Thu, 19 Sept 2024 at 14:12,  wrote:
> >
> > From: Matthew Malcomson 
> >
> > Hello, this is an RFC for adding an atomic floating point fetch_add builtin
> > (and variants) to GCC.  The atomic fetch_add operation is defined to work
> > on the base floating point types in the C++20 standard chapter 31.7.3, and
> > extended to work for all cv-unqualified floating point types in C++23
> > chapter 33.5.7.4.
> >
> > Honestly not sure who to Cc, please do point me to someone else if that's
> > better.
> >
> > This is nowhere near complete (for one thing even the tests I've added
> > don't fully pass), but I think I have a complete enough idea that it's
> > worth checking if this is something that could be agreed on.
> >
> > As it stands no target except the nvptx backend would natively support
> > these operations.
> >
> > Main questions that I'm looking to resolve with this RFC:
> > 1) Would GCC be OK accepting this implementation even though no backend
> >would be implementing these yet?
> >- AIUI only the nvptx backend could theoretically implement this.
> >- Even without a backend implementing it natively, the ability to use
> >  this in code (especially libstdc++) enables other compilers to
> >  generate better code for GPU's using standard C++.
> > 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)`
> >(i.e. a check on the resolved builtin rather than the more user-facing
> >one) in order to determine whether floating point atomic fetch_add is
> >available.
>
> Yes, if that name is what other compilers will also use (have you
> discussed this with Clang?)
>
> It looks like PATCH 5/8 only uses the _fp name for fetch_add though,
> and just uses fetch_sub etc. for the other functions, is that a
> mistake?
>
> >- N.b. this builtin is actually the builtin working on the "double"
>
> OK, so the library code just calls the generic __atomic_fetch_add that
> accepts any types, but then that gets expanded to a more specific form
> for float, double etc.?
> And the more specific form has to exist at some level, because we need
> an extern symbol from libatomic, so either we include the type as an
> explicit suffix on the name, or we use some kind of name mangling like
> _Z18__atomic_fetch_addPdS_S_, which is obviously nasty.
>
> >  type, one would have to rely on any compilers implementing that
> >  particular resolved builtin to also implement the other floating point
> >  atomic fetch_add builtins that they would want to support in libstdc++
> >  `atomic<[floating_point_type]>::fetch_add`.
>
> This seems a bit concerning. I can imagine somebody implementing these
> for float and double first, but leaving long double, _Float64,
> _Float32, _Float128 etc. for later. In that case, libstdc++ would not
> work if somebody tries to use std::atomic, or whichever
> types aren't supported yet. It's OK if we can be *sure* that won't
> happen i.e. that Clang will either implement the new built-i

[PATCH] libstdc++: Tweak %c formatting for chrono types

2024-10-02 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add
[[unlikely]] attribute to condition for missing %c format in
locale. Use %T instead of %H:%M:%S in fallback.
---
 libstdc++-v3/include/bits/chrono_io.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index a337007266e..652e88ffe3a 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -899,8 +899,8 @@ namespace __format
  const _CharT* __formats[2];
  __tp._M_date_time_formats(__formats);
  const _CharT* __rep = __formats[__mod];
- if (!*__rep)
-   __rep = _GLIBCXX_WIDEN("%a %b %e %H:%M:%S %Y");
+ if (!*__rep) [[unlikely]]
+   __rep = _GLIBCXX_WIDEN("%a %b %e %T %Y");
  basic_string<_CharT> __fmt(_S_empty_spec);
  __fmt.insert(1u, 1u, _S_colon);
  __fmt.insert(2u, __rep);
-- 
2.46.1
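
The second part of the change relies on %T being shorthand for %H:%M:%S. The sketch below checks that equivalence with std::strftime rather than with the chrono formatters, so that it does not depend on C++20 library support. That substitution, and the helper name format_tm, are assumptions made for this example only.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: formats a broken-down time with the given
// strftime conversion string and returns the result.
inline std::string format_tm(const std::tm& tm, const char* fmt)
{
  char buf[64];
  std::size_t n = std::strftime(buf, sizeof buf, fmt, &tm);
  return std::string(buf, n);
}
```

For a std::tm with tm_hour = 13, tm_min = 5 and tm_sec = 9, both format_tm(tm, "%T") and format_tm(tm, "%H:%M:%S") give "13:05:09".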



[PATCH 1/2] libstdc++: Make std::construct_at support arrays (LWG 3436)

2024-10-02 Thread Jonathan Wakely
Is the g++ test change OK?

Tested x86_64-linux.

-- >8 --

The issue was approved at the recent St. Louis meeting, requiring
support for bounded arrays, but only without arguments to initialize the
array elements.
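
The approach can be sketched outside the library as follows. This is a simplified, non-constexpr illustration and not the libstdc++ code: construct_at_sketch is a hypothetical name, and the constraints and noexcept-specifier of the real function are omitted. For an array type T, "new T[1]()" creates an array of one T and value-initializes it, which yields a pointer to T.

```cpp
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::construct_at with array support.
template<typename T, typename... Args>
T* construct_at_sketch(T* location, Args&&... args)
{
  void* loc = location;
  if constexpr (std::is_array_v<T>)
    {
      static_assert(sizeof...(Args) == 0,
                    "array types must not use any arguments");
      // Value-initialize the whole array in place.
      return ::new(loc) T[1]();
    }
  else
    return ::new(loc) T(std::forward<Args>(args)...);
}
```

Calling construct_at_sketch(&arr) on an int arr[4] zeroes all four elements and returns a pointer of type int (*)[4]. Calling it on a non-array object forwards the arguments to the constructor as before.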

libstdc++-v3/ChangeLog:

* include/bits/stl_construct.h (construct_at): Support array
types (LWG 3436).
* testsuite/20_util/specialized_algorithms/construct_at/array.cc:
New test.
* testsuite/20_util/specialized_algorithms/construct_at/array_neg.cc:
New test.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-opt1.C: Adjust for different diagnostics
from std::construct_at by adding -fconcepts-diagnostics-depth=2.
---
 gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C|  1 +
 libstdc++-v3/include/bits/stl_construct.h | 20 +++--
 .../construct_at/array.cc | 41 +++
 .../construct_at/array_neg.cc | 19 +
 4 files changed, 78 insertions(+), 3 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/array.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/array_neg.cc

diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C
index 391b7c47d50..38c4f00cec0 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-opt1.C
@@ -1,5 +1,6 @@
 // PR c++/110102
 // { dg-do compile { target c++11 } }
+// { dg-additional-options "-fconcepts-diagnostics-depth=2" { target c++20 } }
 // { dg-skip-if "requires hosted libstdc++ for list" { ! hostedlib } }
 
 // { dg-error "deleted|construct_at" "" { target *-*-* } 0 }
diff --git a/libstdc++-v3/include/bits/stl_construct.h 
b/libstdc++-v3/include/bits/stl_construct.h
index dc08fb7ea33..146ea14e99a 100644
--- a/libstdc++-v3/include/bits/stl_construct.h
+++ b/libstdc++-v3/include/bits/stl_construct.h
@@ -90,11 +90,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cpp_constexpr_dynamic_alloc // >= C++20
   template<typename _Tp, typename... _Args>
-constexpr auto
+requires (!is_unbounded_array_v<_Tp>)
+  && requires { ::new((void*)0) _Tp(std::declval<_Args>()...); }
+constexpr _Tp*
 construct_at(_Tp* __location, _Args&&... __args)
 noexcept(noexcept(::new((void*)0) _Tp(std::declval<_Args>()...)))
-    -> decltype(::new((void*)0) _Tp(std::declval<_Args>()...))
-{ return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }
+{
+  void* __loc = const_cast*>(__location);
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3436. std::construct_at should support arrays
+  if constexpr (is_array_v<_Tp>)
+   {
+ static_assert(sizeof...(_Args) == 0, "std::construct_at for array "
+  "types must not use any arguments to initialize the "
+  "array");
+ return ::new(__loc) _Tp[1]();
+   }
+  else
+   return ::new(__loc) _Tp(std::forward<_Args>(__args)...);
+}
 #endif // C++20
 #endif// C++17
 
diff --git 
a/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/array.cc 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/array.cc
new file mode 100644
index 000..c3683462835
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/array.cc
@@ -0,0 +1,41 @@
+// { dg-do compile { target c++20 } }
+
+// LWG 3436. std::construct_at should support arrays
+
+#include <memory>
+#include <testsuite_hooks.h>
+
+constexpr void
+test_array()
+{
+  int arr[1] { 99 };
+  std::construct_at(&arr);
+  VERIFY( arr[0] == 0 );
+
+  union U {
+long long x;
+int arr[4];
+  } u;
+  u.x = -1;
+
+  auto p = std::construct_at(&u.arr);
+  VERIFY( (*p)[0] == 0 );
+  VERIFY( (*p)[1] == 0 );
+  VERIFY( (*p)[2] == 0 );
+  VERIFY( (*p)[3] == 0 );
+
+  struct NonTrivial {
+constexpr NonTrivial() : i(99) { }
+int i;
+  };
+
+  union U2 {
+char c = 'a';
+NonTrivial arr[2];
+  } u2;
+
+  auto p2 = std::construct_at(&u2.arr);
+  VERIFY( (*p2)[0].i == 99 );
+}
+
+static_assert( [] { test_array(); return true; }() );
diff --git 
a/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/array_neg.cc
 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/array_neg.cc
new file mode 100644
index 000..deb86930d1a
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/construct_at/array_neg.cc
@@ -0,0 +1,19 @@
+// { dg-do compile { target c++20 } }
+
+// LWG 3436. std::construct_at should support arrays
+
+#include <memory>
+
+void
+test_array_args()
+{
+  int arr[2];
+  std::construct_at(&arr, 1, 2); // { dg-error "here" }
+  // { dg-error "must not use any arguments" "" { target *-*-* } 0 }
+}
+
+void
+test_unbounded_array(int (*p)[])
+{
+  std::construct_at(p); // { dg-error "no matching function" }
+}
-- 
2.46.1



[PATCH 2/2] libstdc++: Do not cast away const-ness in std::construct_at (LWG 3870)

2024-10-02 Thread Jonathan Wakely
Tested x86_64-linux.

-- >8 --

This change also requires implementing the proposed resolution of LWG
3216 so that std::make_shared and std::allocate_shared still work, and
the proposed resolution of LWG 3891 so that std::expected still works.
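
The effect of dropping the cast can be seen with a small type-trait check. This is a sketch only, and converts_to_void_ptr is a name made up for this example. A pointer to a non-cv-qualified object converts implicitly to void*, but a pointer to a const or volatile object does not, so an initialization such as "void* loc = location;" stops compiling for cv-qualified types instead of silently discarding the qualifier.

```cpp
#include <type_traits>

// Hypothetical helper: true if T* converts implicitly to void*.
template<typename T>
constexpr bool converts_to_void_ptr = std::is_convertible_v<T*, void*>;

static_assert(converts_to_void_ptr<int>);
static_assert(!converts_to_void_ptr<const int>);
static_assert(!converts_to_void_ptr<volatile int>);
```

That is presumably why the containers of the object (the shared_ptr control block and the union in std::expected) need to store a remove_cv_t type: they must be able to hand out a non-const pointer for construction.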

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h: Remove cv-qualifiers from
type managed by _Sp_counted_ptr_inplace, as per LWG 3210.
* include/bits/stl_construct.h: Do not cast away cv-qualifiers
when passing pointer to placement new.
* include/std/expected: Use remove_cv_t for union member, as per
LWG 3891.
* testsuite/20_util/allocator/void.cc: Do not test construction
via const pointer.
---
 libstdc++-v3/include/bits/shared_ptr_base.h  | 15 ---
 libstdc++-v3/include/bits/stl_construct.h|  6 +++---
 libstdc++-v3/include/std/expected|  2 +-
 libstdc++-v3/testsuite/20_util/allocator/void.cc | 15 ---
 4 files changed, 12 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index 3d0b74ba1c6..ef0658f6182 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -591,7 +591,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
_Alloc& _M_alloc() noexcept { return _A_base::_S_get(*this); }
 
-   __gnu_cxx::__aligned_buffer<_Tp> _M_storage;
+   __gnu_cxx::__aligned_buffer<__remove_cv_t<_Tp>> _M_storage;
   };
 
 public:
@@ -633,7 +633,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   virtual void*
   _M_get_deleter(const std::type_info& __ti) noexcept override
   {
-   auto __ptr = const_cast<typename remove_cv<_Tp>::type*>(_M_ptr());
// Check for the fake type_info first, so we don't try to access it
// as a real type_info object. Otherwise, check if it's the real
// type_info for this class. With RTTI enabled we can check directly,
@@ -646,11 +645,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Sp_make_shared_tag::_S_eq(__ti)
 #endif
   )
- return __ptr;
+ return _M_ptr();
return nullptr;
   }
 
-  _Tp* _M_ptr() noexcept { return _M_impl._M_storage._M_ptr(); }
+  __remove_cv_t<_Tp>*
+  _M_ptr() noexcept { return _M_impl._M_storage._M_ptr(); }
 
   _Impl _M_impl;
 };
@@ -674,13 +674,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   [[no_unique_address]] _Alloc _M_alloc;
 
   union {
-   _Tp _M_obj;
+   remove_cv_t<_Tp> _M_obj;
char _M_unused;
   };
 
   friend class __shared_count<_Lp>; // To be able to call _M_ptr().
 
-  _Tp* _M_ptr() noexcept { return std::__addressof(_M_obj); }
+  auto _M_ptr() noexcept { return std::__addressof(_M_obj); }
 
 public:
   using __allocator_type = __alloc_rebind<_Alloc, _Sp_counted_ptr_inplace>;
@@ -962,7 +962,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__shared_count(_Tp*& __p, _Sp_alloc_shared_tag<_Alloc> __a,
   _Args&&... __args)
{
- typedef _Sp_counted_ptr_inplace<_Tp, _Alloc, _Lp> _Sp_cp_type;
+ using _Tp2 = __remove_cv_t<_Tp>;
+ using _Sp_cp_type = _Sp_counted_ptr_inplace<_Tp2, _Alloc, _Lp>;
  typename _Sp_cp_type::__allocator_type __a2(__a._M_a);
  auto __guard = std::__allocate_guarded(__a2);
  _Sp_cp_type* __mem = __guard.get();
diff --git a/libstdc++-v3/include/bits/stl_construct.h 
b/libstdc++-v3/include/bits/stl_construct.h
index 146ea14e99a..9d6111396e1 100644
--- a/libstdc++-v3/include/bits/stl_construct.h
+++ b/libstdc++-v3/include/bits/stl_construct.h
@@ -96,7 +96,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 construct_at(_Tp* __location, _Args&&... __args)
 noexcept(noexcept(::new((void*)0) _Tp(std::declval<_Args>()...)))
 {
-  void* __loc = const_cast*>(__location);
+  void* __loc = __location;
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 3436. std::construct_at should support arrays
   if constexpr (is_array_v<_Tp>)
@@ -130,7 +130,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  return;
}
 #endif
-  ::new((void*)__p) _Tp(std::forward<_Args>(__args)...);
+  ::new(static_cast<void*>(__p)) _Tp(std::forward<_Args>(__args)...);
 }
 #else
   template
@@ -146,7 +146,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 inline void
 _Construct_novalue(_T1* __p)
-{ ::new((void*)__p) _T1; }
+{ ::new(static_cast<void*>(__p)) _T1; }
 
   template
 _GLIBCXX20_CONSTEXPR void
diff --git a/libstdc++-v3/include/std/expected 
b/libstdc++-v3/include/std/expected
index 9e92339e406..d4a4bc17541 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -1261,7 +1261,7 @@ namespace __expected
{ }
 
   union {
-   _Tp _M_val;
+   remove_cv_t<_Tp> _M_val;
_Er _M_unex;
   };
 
diff --git a/libstdc++-v3/testsuite/20_util/allocator/void.cc 
b/libstdc++-v3/testsuite/20_util/allocator/void.cc

Re: [PATCH] tree-optimization/116566 - single lane SLP for VLA inductions

2024-10-02 Thread Andrew Pinski
On Tue, Oct 1, 2024 at 5:04 AM Richard Biener  wrote:
>
> The following adds SLP support for vectorizing single-lane inductions
> with variable length vectors.

This introduces a bootstrap failure on aarch64 due to a maybe
uninitialized variable.

inlined from ‘bool vectorizable_induction(loop_vec_info,
stmt_vec_info, gimple**, slp_tree, stmt_vector_for_cost*)’ at
/home/linaro/src/upstream-gcc/gcc/gcc/tree-vect-loop.cc:10718:33:
/home/linaro/src/upstream-gcc/gcc/gcc/gimple-fold.h:183:25: error:
‘vec_init’ may be used uninitialized [-Werror=maybe-uninitialized]
  183 |   return gimple_convert (&gsi, false, GSI_CONTINUE_LINKING,
  |  ~~~^~~
  184 |  UNKNOWN_LOCATION, type, op);
  |  ~~~
/home/linaro/src/upstream-gcc/gcc/gcc/tree-vect-loop.cc: In function
‘bool vectorizable_induction(loop_vec_info, stmt_vec_info, gimple**,
slp_tree, stmt_vector_for_cost*)’:
/home/linaro/src/upstream-gcc/gcc/gcc/tree-vect-loop.cc:10281:17:
note: ‘vec_init’ was declared here
10281 |   tree new_vec, vec_init, vec_step, t;
  | ^~~~


The issue is around line 10718:
  if (init_node)
vec_init = vect_get_slp_vect_def (init_node, ivn);
  if (!nested_in_vect_loop
  && step_mul
  && !integer_zerop (step_mul))
{
  gcc_assert (invariant);
  vec_def = gimple_convert (&init_stmts, step_vectype, vec_init);

it is hard to follow the code to see if it is actually uninitialized or not.
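
The shape of the problem, reduced to a hypothetical example (the function and parameter names below are not from tree-vect-loop.cc): a variable is assigned under one condition and read under a different one, so the compiler cannot prove the two always coincide. The sketch shows the variable with a defensive initializer, which is one common way to silence the warning. Whether that is the right fix for vectorizable_induction is not something this example can decide.

```cpp
// Hypothetical reduction of the pattern behind -Wmaybe-uninitialized.
inline int pick(bool have_init, bool need_step, int init_value)
{
  // Without this initializer the read below is only safe if need_step
  // implies have_init, which the compiler may be unable to prove.
  int vec_init = 0;
  if (have_init)
    vec_init = init_value;
  if (need_step)
    return vec_init + 1;
  return 0;
}
```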

Thanks,
Andrew

>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> PR tree-optimization/116566
> * tree-vect-loop.cc (vectorizable_induction): Handle single-lane
> SLP for VLA vectors.
> ---
>  gcc/tree-vect-loop.cc | 247 --
>  1 file changed, 189 insertions(+), 58 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index a5a44613cb2..f5ecf0bdb80 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10283,7 +10283,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>gimple *new_stmt;
>gphi *induction_phi;
>tree induc_def, vec_dest;
> -  tree init_expr, step_expr;
>poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>unsigned i;
>tree expr;
> @@ -10369,7 +10368,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>  iv_loop = loop;
>gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
>
> -  if (slp_node && !nunits.is_constant ())
> +  if (slp_node && (!nunits.is_constant () && SLP_TREE_LANES (slp_node) != 1))
>  {
>/* The current SLP code creates the step value element-by-element.  */
>if (dump_enabled_p ())
> @@ -10387,7 +10386,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>return false;
>  }
>
> -  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
> +  tree step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
>gcc_assert (step_expr != NULL_TREE);
>if (INTEGRAL_TYPE_P (TREE_TYPE (step_expr))
>&& !type_has_mode_precision_p (TREE_TYPE (step_expr)))
> @@ -10475,9 +10474,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> [i2 + 2*S2, i0 + 3*S0, i1 + 3*S1, i2 + 3*S2].  */
>if (slp_node)
>  {
> -  /* Enforced above.  */
> -  unsigned int const_nunits = nunits.to_constant ();
> -
>/* The initial values are vectorized, but any lanes > group_size
>  need adjustment.  */
>slp_tree init_node
> @@ -10499,11 +10495,12 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>
>/* Now generate the IVs.  */
>unsigned nvects = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> -  gcc_assert ((const_nunits * nvects) % group_size == 0);
> +  gcc_assert (multiple_p (nunits * nvects, group_size));
>unsigned nivs;
> +  unsigned HOST_WIDE_INT const_nunits;
>if (nested_in_vect_loop)
> nivs = nvects;
> -  else
> +  else if (nunits.is_constant (&const_nunits))
> {
>   /* Compute the number of distinct IVs we need.  First reduce
>  group_size if it is a multiple of const_nunits so we get
> @@ -10514,21 +10511,43 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>   nivs = least_common_multiple (group_sizep,
> const_nunits) / const_nunits;
> }
> +  else
> +   {
> + gcc_assert (SLP_TREE_LANES (slp_node) == 1);
> + nivs = 1;
> +   }
> +  gimple_seq init_stmts = NULL;
>tree stept = TREE_TYPE (step_vectype);
>tree lupdate_mul = NULL_TREE;
>if (!nested_in_vect_loop)
> {
> - /* The number of iterations covered in one vector iteration.  */
> - unsigned lup_mul = (nvects * const_nunits) / group_size;
> - lupdate_mul
> -   = build_vector_from_

Re: [Fortran, Patch, PR51815, v1] Fix parsing of substring refs in coarrays.

2024-10-02 Thread Harald Anlauf

Hi Andre,

On 02.10.24 at 10:49, Andre Vehreschild wrote:

> Hi Harald,
>
> we could do something like this:
>
> diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
> index d73d5eaed84..5000906f5f2 100644
> --- a/gcc/fortran/primary.cc
> +++ b/gcc/fortran/primary.cc
> @@ -2823,6 +2823,16 @@ check_substring:
>   if (substring)
> primary->ts.u.cl = NULL;
>
> + if (gfc_peek_ascii_char () == '(')
> +   {
> + gfc_array_ref arr_ref;
> + gfc_array_spec *as
> +   = sym->ts.type == BT_CLASS ? CLASS_DATA (sym)->as : sym->as;
> + gfc_match_array_ref (&arr_ref, as, 0, 0);
> +
> + gfc_error_now ("Unexpected array/substring ref at %C");
> + return MATCH_ERROR;
> +   }
>   break;
>
> case MATCH_NO:
>
> It would at least give a better hint. Attached is the patch that adds this to
> the previous one.


this seems to go into the right direction - except that I am not a
great fan of gfc_error_now, as that tries to paper over deficiencies
in error recovery.

Is there a reason that you do not check the return value of
gfc_match_array_ref?  Apart from the gfc_error_now, the above
behaves essentially the same as a simple

  if (gfc_peek_ascii_char () == '(')
return MATCH_ERROR;

for the testcase at hand.

Indeed your suggestion (or the shortened version above) improves
the diagnostics ("user experience") also for this variant:

subroutine foo
   character(:), allocatable :: x[:]
   character(:), dimension(:), allocatable :: c[:]
   type t
  character(:), allocatable :: x[:]
  character(:), dimension(:), allocatable :: c[:]
   end type t
   type(t) :: z
   associate (y => x(:)(2:))
   end associate
   associate (a => c(:)(:)(2:))
   end associate
   associate (y => z%x(:)(2:))
   end associate
   associate (a => z%c(:)(:)(2:))
   end associate
end

with several error messages of the kind

Error: Invalid association target at (1)

or

Error: Rank mismatch in array reference at (1) (1/0)

looking less technical than a parsing error.
I think this is as good as it can be.

So OK from my side with either your additional patch or my
shortened version.

Thanks for the patch!

Harald



Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Is this ok?

Regards and thanks for the review,
Andre

On Tue, 1 Oct 2024 23:31:11 +0200
Harald Anlauf  wrote:


Hi Andre,

Am 01.10.24 um 09:43 schrieb Andre Vehreschild:

Hi all,

this rather old PR reported a parsing bug, when a coarray'ed character
substring ref is to be parsed, aka CHARACTER(:) :: str[:] ... str(2:5). In
this case the parser confused the substring ref with an array-ref, because
an array_spec was present. This patch fixes this by requesting only coarray
parsing from gfc_match_array_ref when no regular dimension is present. The
patch is not involved when an array of coarray'ed strings is parsed (that
worked beforehand).


while the patch addresses the issue mentioned in the PR,


I had to fix the dg-error clauses in the testcase pr102532 because now the
error of having too many refs is detected by the parsing stage and no longer
by the resolve stage. It has become a simple syntax error. I hope this is
ok.


I find the error messages now less helpful to users: before the patch
we got "Rank mismatch in array reference", which was more suitable
than the newer version with more or less confusing syntax errors.

I assume you tried to find a better solution - but Intel and NAG
also give syntax errors - so basically I am fine with the patch.

You may want to wait for a second opinion.  If nobody else responds
within the next 2 days, you may proceed nevertheless.

Thanks,
Harald


Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de





--
Andre Vehreschild * Email: vehre ad gmx dot de






[pushed] doc: Drop h8300-hms reference to binaries downloads

2024-10-02 Thread Gerald Pfeifer
There aren't actually any H8/h8300-hms anywhere near our binaries docs, so 
simply removing this stale reference appears best.

Pushed.

Gerald


gcc:
PR target/69374
* doc/install.texi (Specific) : Drop obsolete
reference to binaries download docs.
---
 gcc/doc/install.texi | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 517d1cbb2fb..e035061a23e 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4118,8 +4118,6 @@ This configuration is intended for embedded systems.
 @heading h8300-hms
 Renesas H8/300 series of processors.
 
-Please have a look at the @uref{binaries.html,,binaries page}.
-
 The calling convention and structure layout has changed in release 2.6.
 All code must be recompiled.  The calling convention now passes the
 first three arguments in function calls in registers.  Structures are no
-- 
2.46.0


[PATCH] testsuite/116660 - adjust testcases unexpectedly failing on 32bit sparc

2024-10-02 Thread Richard Biener
Both testcases miss some effective target requires.

Pushed.

PR testsuite/116660
* gcc.dg/vect/no-scevccp-outer-12.c: Add vect_pack_trunc.
* gcc.dg/vect/vect-multitypes-6.c: Add vect_char_add, remove
explicit 32bit sparc XFAIL.
---
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c | 1 +
 gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
index 6ace6ad022e..b94256d48db 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_pack_trunc } */
 
 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c 
b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
index 73d3b30384e..e03d62f6a85 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
@@ -1,6 +1,7 @@
 /* Disabling epilogues until we find a better way to deal with scans.  */
 /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_char_add } */
 /* { dg-add-options double_vectors } */
 
 #include 
@@ -67,7 +68,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { 
sparc*-*-* && ilp32 } }} } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using 
versioning" 6 "vect" { target { vect_no_align && { ! vect_hw_misalign } } } } } 
*/
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 6 
"vect" { xfail { ! { vect_unaligned_possible && vect_align_stack_vars } } } } } 
*/
 
-- 
2.43.0


[PATCH v1] Add -ftime-report-wall

2024-10-02 Thread Andi Kleen
From: Andi Kleen 

Time vars normally use times(2) to get the user/sys/wall time, which is always a
system call. I don't think the system time is very useful because most overhead
is in user time. If we only use the wall (or monotonic) time, modern OSes have an
optimized path to get it directly from a CPU instruction like RDTSC
without a system call, which is much faster.

Comparing the overhead with tramp3d:

  ./gcc/cc1plus -quiet ../tsrc/tramp3d-v4.i ran
    1.03 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report-wall ../tsrc/tramp3d-v4.i
    1.18 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report ../tsrc/tramp3d-v4.i

-ftime-report costs 18% (excluding the output), while -ftime-report-wall
only costs 3%, so is nearly free. So it would be feasible for some build
system to always enable it and break down the build time into passes.

The drawback is that context switches to other programs inflate the
measured time.  In the common case where the system is not
oversubscribed, however, it is more accurate because each
measurement has less overhead.

Add a -ftime-report-wall option. It actually uses the POSIX monotonic time,
so strictly it's not wall clock, but it's still a reasonable name.
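
As a rough illustration of the two timing paths described above (a minimal
sketch, not the code in timevar.cc): sample_times takes a full user/sys/wall
sample with times(2), while sample_monotonic takes a wall-only sample with
clock_gettime (CLOCK_MONOTONIC), which on Linux is typically served without
entering the kernel.  The struct and function names are made up for this
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sample record for this sketch; all values are in seconds.  */
struct time_sample
{
  double user;
  double sys;
  double wall;
};

/* Full user/sys/wall sample via times(2), which is always a system call.
   Returns 0 on success, -1 on failure.  */
int
sample_times (struct time_sample *s)
{
  struct tms t;
  clock_t ticks;
  long hz;

  assert (s != NULL);
  ticks = times (&t);
  hz = sysconf (_SC_CLK_TCK);
  if (ticks == (clock_t) -1 || hz <= 0)
    return -1;
  s->user = (double) t.tms_utime / (double) hz;
  s->sys = (double) t.tms_stime / (double) hz;
  s->wall = (double) ticks / (double) hz;
  return 0;
}

/* Wall-only sample via the POSIX monotonic clock; user and sys are left
   at zero because this path does not measure them.
   Returns 0 on success, -1 on failure.  */
int
sample_monotonic (struct time_sample *s)
{
  struct timespec ts;

  assert (s != NULL);
  if (clock_gettime (CLOCK_MONOTONIC, &ts) != 0)
    return -1;
  s->user = 0.0;
  s->sys = 0.0;
  s->wall = (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
  return 0;
}
```

Taking one sample before and one after a region of interest and subtracting
the wall fields gives the elapsed time for that region in either variant.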

Bootstrapped on x86_64-linux with full test suite run.

gcc/ChangeLog:

* common.opt (ftime-report-wall): Add.
* common.opt.urls: Regenerate.
* doc/invoke.texi (ftime-report-wall): Document.
* gcc.cc (try_generate_repro): Check for -ftime-report-wall.
* timevar.cc (get_time): Use clock_gettime if enabled.
(timer::print): Print only wall time for time_report_wall.
* toplev.cc (toplev::start_timevars): Check for time_report_wall.

gcc/testsuite/ChangeLog:

* g++.dg/ext/timevar3.C: New test.
---
 gcc/common.opt  |  4 
 gcc/common.opt.urls |  3 +++
 gcc/doc/invoke.texi |  7 +++
 gcc/gcc.cc  |  3 ++-
 gcc/testsuite/g++.dg/ext/timevar3.C | 14 +
 gcc/timevar.cc  | 31 +++--
 gcc/toplev.cc   |  3 ++-
 7 files changed, 57 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/timevar3.C

diff --git a/gcc/common.opt b/gcc/common.opt
index d270e524ff45..e9fb15e28d80 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3010,6 +3010,10 @@ ftime-report
 Common Var(time_report)
 Report the time taken by each compiler pass.
 
+ftime-report-wall
+Common Var(time_report_wall)
+Report the wall time taken by each compiler pass.
+
 ftime-report-details
 Common Var(time_report_details)
 Record times taken by sub-phases separately.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index e31736cd9945..6e79a8f9390b 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -1378,6 +1378,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fthread-jumps)
 ftime-report
 UrlSuffix(gcc/Developer-Options.html#index-ftime-report)
 
+ftime-report-wall
+UrlSuffix(gcc/Developer-Options.html#index-ftime-report-wall)
+
 ftime-report-details
 UrlSuffix(gcc/Developer-Options.html#index-ftime-report-details)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e199522f62c7..80cb355f5d79 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -784,6 +784,7 @@ Objective-C and Objective-C++ Dialects}.
 -frandom-seed=@var{string}  -fsched-verbose=@var{n}
 -fsel-sched-verbose  -fsel-sched-dump-cfg  -fsel-sched-pipelining-verbose
 -fstats  -fstack-usage  -ftime-report  -ftime-report-details
+-ftime-report-wall
 -fvar-tracking-assignments-toggle  -gtoggle
 -print-file-name=@var{library}  -print-libgcc-file-name
 -print-multi-directory  -print-multi-lib  -print-multi-os-directory
@@ -21026,6 +21027,12 @@ slightly different place within the compiler.
 @item -ftime-report-details
 Record the time consumed by infrastructure parts separately for each pass.
 
+@opindex ftime-report-wall
+@item -ftime-report-wall
+Report statistics about compiler pass time consumption, but only using wall
+time.  This is faster than @option{-ftime-report}, but can be more
+influenced by background jobs.
+
 @opindex fira-verbose
 @item -fira-verbose=@var{n}
 Control the verbosity of the dump file for the integrated register allocator.
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 16fed46fb35f..8d3046eb7874 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -7964,7 +7964,8 @@ try_generate_repro (const char **argv)
it might varry between invocations.  */
 else if (! strcmp (argv[nargs], "-quiet"))
   quiet = 1;
-else if (! strcmp (argv[nargs], "-ftime-report"))
+else if (! strcmp (argv[nargs], "-ftime-report")
+  || ! strcmp (argv[nargs], "-ftime-report-wall"))
   return;
 
   if (out_arg == -1 || !quiet)
diff --git a/gcc/testsuite/g++.dg/ext/timevar3.C 
b/gcc/testsuite/g++.dg/ext/timevar3.C
new file mode 100644
index ..b003f37f9654
--- /dev/null
+++ b/gcc/testsuite

[PATCH v2] c: ICE in build_counted_by_ref [PR116735]

2024-10-02 Thread Qing Zhao
From: qing zhao 

Hi, this is the 2nd version of the patch.
Compared to the 1st version, the major changes address Marek's and
Jakub's comments.

Bootstrapped and regression tested on both x86 and aarch64.
Okay for committing?

Thanks.

Qing

==


When handling the counted_by attribute, if the corresponding field
doesn't exist, in addition to issuing an error, we should also remove
the already added "counted_by" attribute, which names a non-existent
field, from the field_decl.
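
For context, a valid use of the attribute looks roughly like the sketch
below; the patch itself only concerns the invalid case, where the named
field is missing or not an integer.  The COUNTED_BY macro, struct int_buf
and the two helper functions are hypothetical names for this example; the
macro expands to nothing on compilers that do not know the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: use the attribute only where it is supported.  */
#if defined (__has_attribute)
# if __has_attribute (__counted_by__)
#  define COUNTED_BY(member) __attribute__ ((__counted_by__ (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* The flexible array member names an integer field of the same struct
   that holds its element count.  */
struct int_buf
{
  size_t len;
  int element[] COUNTED_BY (len);
};

/* Allocate a buffer with room for N elements and record the count.
   Returns NULL if the allocation fails.  */
struct int_buf *
int_buf_new (size_t n)
{
  struct int_buf *p = malloc (sizeof *p + n * sizeof (int));
  if (p != NULL)
    p->len = n;
  return p;
}

/* Store VALUE at INDEX, which must be within the recorded count.  */
void
int_buf_set (struct int_buf *p, size_t index, int value)
{
  assert (p != NULL && index < p->len);
  p->element[index] = value;
}
```

Writing lenx instead of len in the attribute argument, as the new testcase
does, is the situation in which the error is issued and the attribute is
dropped from the field_decl.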

PR c/116735

gcc/c/ChangeLog:

* c-decl.cc (verify_counted_by_attribute): Remove the attribute
when error.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-9.c: New test.
---
 gcc/c/c-decl.cc   | 32 +++
 .../gcc.dg/flex-array-counted-by-9.c  | 25 +++
 2 files changed, 43 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-9.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index aa7f69d1b7b..224c015cd6d 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9502,14 +9502,17 @@ verify_counted_by_attribute (tree struct_type, tree 
field_decl)
 
   tree counted_by_field = lookup_field (struct_type, fieldname);
 
-  /* Error when the field is not found in the containing structure.  */
+  /* Error when the field is not found in the containing structure and
+ remove the corresponding counted_by attribute from the field_decl.  */
   if (!counted_by_field)
-error_at (DECL_SOURCE_LOCATION (field_decl),
- "argument %qE to the %qE attribute is not a field declaration"
- " in the same structure as %qD", fieldname,
- (get_attribute_name (attr_counted_by)),
- field_decl);
-
+{
+  error_at (DECL_SOURCE_LOCATION (field_decl),
+   "argument %qE to the %<counted_by%> attribute"
+   " is not a field declaration in the same structure"
+   " as %qD", fieldname, field_decl);
+  DECL_ATTRIBUTES (field_decl)
+   = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
+}
   else
   /* Error when the field is not with an integer type.  */
 {
@@ -9518,14 +9521,15 @@ verify_counted_by_attribute (tree struct_type, tree 
field_decl)
   tree real_field = TREE_VALUE (counted_by_field);
 
   if (!INTEGRAL_TYPE_P (TREE_TYPE (real_field)))
-   error_at (DECL_SOURCE_LOCATION (field_decl),
- "argument %qE to the %qE attribute is not a field declaration"
- " with an integer type", fieldname,
- (get_attribute_name (attr_counted_by)));
-
+   {
+ error_at (DECL_SOURCE_LOCATION (field_decl),
+   "argument %qE to the %<counted_by%> attribute"
+   " is not a field declaration with an integer type",
+   fieldname);
+ DECL_ATTRIBUTES (field_decl)
+   = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
+   }
 }
-
-  return;
 }
 
 /* TYPE is a struct or union that we're applying may_alias to after the body is
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-9.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-9.c
new file mode 100644
index 000..5c6fedd0d3d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-9.c
@@ -0,0 +1,25 @@
+/* PR c/116735  */
+/* { dg-options "-std=c99" } */
+/* { dg-do compile } */
+
+struct foo {
+  int len;
+  int element[] __attribute__ ((__counted_by__ (lenx))); /* { dg-error 
"attribute is not a field declaration in the same structure as" } */
+};
+
+struct bar {
+  float count;
+  int array[] __attribute ((counted_by (count))); /* { dg-error "attribute is 
not a field declaration with an integer type" } */
+};
+
+int main ()
+{
+  struct foo *p = __builtin_malloc (sizeof (struct foo) + 3 * sizeof (int));
+  struct bar *q = __builtin_malloc (sizeof (struct bar) + 3 * sizeof (int));
+  p->len = 3;
+  p->element[0] = 17;
+  p->element[1] = 13;
+  q->array[0] = 13;
+  q->array[2] = 17;
+  return 0;
+}
-- 
2.43.5



Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-10-02 Thread Andrew Waterman
On Wed, Oct 2, 2024 at 5:56 AM Jeff Law  wrote:
>
>
>
> On 9/5/24 12:52 PM, Palmer Dabbelt wrote:
> > We have cheap logical ops, so let's just move this back to the default
> > > to take advantage of the standard branch/op heuristics.
> >
> > gcc/ChangeLog:
> >
> >   PR target/116615
> >   * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> So on the BPI  this is a pretty clear win.  Not surprisingly perlbench
> and gcc are the big winners.  It somewhat surprisingly regresses x264,
> deepsjeng & leela, but the magnitudes are smaller.  The net from a cycle
> perspective is 2.4%.  Every benchmark looks better from a branch count
> perspective.
>
> So in my mind it's just a matter of fixing any testsuite fallout (I
> would expect some) and this is OK.

Jeff, were you able to measure the change in static code size, too?
These results are very encouraging, but I'd like to make sure we don't
need to retain the current behavior when optimizing for size.

>
>
> jeff
>


Re: [PATCH] expr: Don't clear whole unions [PR116416]

2024-10-02 Thread Jason Merrill

On 10/2/24 3:20 PM, Marek Polacek wrote:

On Sat, Sep 28, 2024 at 08:39:12AM +0200, Jakub Jelinek wrote:

On Fri, Sep 27, 2024 at 04:01:33PM +0200, Jakub Jelinek wrote:

So, I think we should go with (but so far completely untested except
for pr78687.C which is optimized with Marek's patch and the above testcase
which doesn't have the clearing anymore) the following patch.


That patch had a bug in type_has_padding_at_level_p and so it didn't
bootstrap.

Here is a full patch which does.


[...]

And here's my patch, bootstrapped/regtested on x86_64-pc-linux-gnu
on top of Jakub's patch, ok for trunk once the prerequisite is in?

-- >8 --
This PR reports a missed optimization.  When we have:

   Str str{"Test"};
   callback(str);

as in the test, we're able to evaluate the Str::Str() call at compile
time.  But when we have:

   callback(Str{"Test"});

we are not.  With this patch (in fact, it's Patrick's patch with a little
tweak), we turn

   callback (TARGET_EXPR >>
 (const char *) "Test" )

into

   callback (TARGET_EXPR )

I explored the idea of calling maybe_constant_value for the whole
TARGET_EXPR in cp_fold.  That has three problems:
- we can't always elide a TARGET_EXPR, so we'd have to make sure the
   result is also a TARGET_EXPR;
- the resulting TARGET_EXPR must have the same flags, otherwise Bad
   Things happen;
- getting a new slot is also problematic.  I've seen a test where we
   had "TARGET_EXPR, D.2680", and folding the whole TARGET_EXPR
   would get us "TARGET_EXPR", but since we don't see the outer
   D.2680, we can't replace it with D.2681, and things break.

With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.

FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a 
= {}"
is easy.  Previously, we would call C::C, so .gimple has:

   D.2590 = {};
   C::C (&D.2590);
   D.2597 = D.2590;
   return D.2597;

Then .einline inlines the C::C call:

   D.2590 = {};
   D.2590.a = {}; // #1
   D.2590.b = 0;  // #2
   D.2597 = D.2590;
   D.2590 ={v} {CLOBBER(eos)};
   return D.2597;

then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
passes.  But with the patch, .gimple won't have that C::C call, so the
IL is of course going to look different.  The .optimized dump looks the
same though so there's no problem.

pr78687.C was fixed by Jakub's categorize_ctor_elements_1 patch.

PR c++/116416

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_r) : Try to fold
TARGET_EXPR_INITIAL and replace it with the folded result if
it's TREE_CONSTANT.

gcc/testsuite/ChangeLog:

* g++.dg/analyzer/pr97116.C: Adjust dg-message.
* g++.dg/tree-ssa/pr90883.C: Adjust dg-final.
* g++.dg/cpp0x/constexpr-prvalue1.C: New test.
* g++.dg/cpp1y/constexpr-prvalue1.C: New test.

Co-authored-by: Patrick Palka 
---
  gcc/cp/cp-gimplify.cc | 10 +++
  gcc/testsuite/g++.dg/analyzer/pr97116.C   |  2 +-
  .../g++.dg/cpp0x/constexpr-prvalue1.C | 24 +++
  .../g++.dg/cpp1y/constexpr-prvalue1.C | 30 +++
  gcc/testsuite/g++.dg/tree-ssa/pr90883.C   |  4 +--
  5 files changed, 67 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-prvalue1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 003e68f1ea7..c63fdf3edd1 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1473,6 +1473,16 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
 that case, strip it in favor of this one.  */
if (tree &init = TARGET_EXPR_INITIAL (stmt))
{
+ if ((data->flags & ff_genericize)


Why only with ff_genericize?


+ && !flag_no_inline)
+   {
+ tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
+ if (folded != init && TREE_CONSTANT (folded))
+   {
+ init = folded;
+ break;


Are you sure we never need the TARGET_EXPR_CLEANUP walk in this case?

Maybe move the TARGET_EXPR_CLEANUP walk and the *walk_subtrees = 0 
before this new code?  And the "folding might replace" comment down to 
the tree_code == target_expr block?



+   }
+   }
  cp_walk_tree (&init, cp_fold_r, data, NULL);
  cp_walk_tree (&TARGET_EXPR_CLEANUP (stmt), cp_fold_r, data, NULL);
  *walk_subtrees = 0;
diff --git a/gcc/testsuite/g++.dg/analyzer/pr97116.C 
b/gcc/testsuite/g++.dg/analyzer/pr97116.C
index d8e08a73172..1c404c2ceb2 100644
--- a/gcc/testsuite/g++.dg/analyzer/pr97116.C
+++ b/gcc/testsuite/g++.dg/analyzer/pr97116.C
@@ -16,7 +16,7 @@ struct foo
  void test_1 (void)
  {
foo *p = new(NULL) foo (42); // { dg-warning "non-null expected" "warning" }
-  // { dg-message "argument 'this' \\(\[^\n\]*\\) NULL where non-null expected" 
"final event" { target *-*-* } .-1 }

Re: [PATCH] c++: Fix g++.dg/ext/sve-sizeless-1.C regression

2024-10-02 Thread Andrew Pinski
On Fri, Sep 13, 2024 at 12:24 AM Jonathan Wakely  wrote:
>
> I'll wait for Linaro CI to confirm this works, and then push.
>
> I didn't see the regression because I only tested on x86_64.

You never pushed this fix.

Thanks,
Andrew

>
> -- >8 --
>
> This aarch64-*-* test needs an update for the diagnostic I changed in
> r15-3614-g9fe57e4879de93.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ext/sve-sizeless-1.C: Adjust dg-error string.
> ---
>  gcc/testsuite/g++.dg/ext/sve-sizeless-1.C | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C 
> b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
> index 9f05ca5a855..adee37a0551 100644
> --- a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
> +++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
> @@ -301,7 +301,7 @@ statements (int n)
>
>// Other built-ins
>
> -  __builtin_launder (sve_sc1); // { dg-error {non-pointer argument to 
> '__builtin_launder'} }
> +  __builtin_launder (sve_sc1); // { dg-error {not a pointer to object type} }
>__builtin_memcpy (&sve_sc1, &sve_sc2, 2);
>
>// Lambdas
> --
> 2.46.0
>


Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-10-02 Thread Jeff Law




On 10/2/24 4:39 PM, Andrew Waterman wrote:

On Wed, Oct 2, 2024 at 5:56 AM Jeff Law  wrote:




On 9/5/24 12:52 PM, Palmer Dabbelt wrote:

We have cheap logical ops, so let's just move this back to the default
to take advantage of the standard branch/op heuristics.

gcc/ChangeLog:

   PR target/116615
   * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.

So on the BPI  this is a pretty clear win.  Not surprisingly perlbench
and gcc are the big winners.  It somewhat surprisingly regresses x264,
deepsjeng & leela, but the magnitudes are smaller.  The net from a cycle
perspective is 2.4%.  Every benchmark looks better from a branch count
perspective.

So in my mind it's just a matter of fixing any testsuite fallout (I
would expect some) and this is OK.


Jeff, were you able to measure the change in static code size, too?
These results are very encouraging, but I'd like to make sure we don't
need to retain the current behavior when optimizing for size.

Codesize is ever so slightly worse.  As in less than .1%.  Not worth it
in my mind to do something different in that range.
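
To make the trade-off concrete, here is a small sketch of the two
evaluation strategies for a condition with simple, side-effect-free
operands.  As I understand it, LOGICAL_OP_NON_SHORT_CIRCUIT tells the
middle end whether the second form is considered at least as cheap as the
first; the function names are invented for this example and the actual
code generation depends on the target and optimization level.

```c
#include <assert.h>

/* Short-circuit form: y < lim is only evaluated when x < lim holds,
   which a compiler may implement with two conditional branches.  */
int
both_below_short_circuit (int x, int y, int lim)
{
  return x < lim && y < lim;
}

/* Non-short-circuit form: both comparisons are always evaluated and
   their 0/1 results are combined with a bitwise AND, so a single
   branch (or none) is needed on the result.  */
int
both_below_non_short_circuit (int x, int y, int lim)
{
  return (x < lim) & (y < lim);
}

/* Sanity check that the two forms agree for one input.  */
void
check_forms_agree (int x, int y, int lim)
{
  assert (both_below_short_circuit (x, y, lim)
	  == both_below_non_short_circuit (x, y, lim));
}
```

The two forms compute the same value whenever the operands have no side
effects; they differ only in how many branches and how many logical
operations end up in the generated code, which is where the cycle, branch
count and code size numbers in this thread come from.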


Jeff


Re: [PATCH]middle-end: support SLP early break

2024-10-02 Thread Richard Biener
On Tue, 1 Oct 2024, Tamar Christina wrote:

> Hi all,
> 
> This patch introduces feature parity for early break in the SLP only
> vectorizer.
> 
> The approach taken here is to treat the early exits as root statements for an
> SLP tree.  This means that we don't need any changes to build_slp to support
> gconds.
> 
> Codegen for the gcond itself now has to be done out of line but the body of
> the SLP blocks itself is simply driven by SLP scheduling.  There is a slight
> awkwardness in having re-used vectorizable_early_exit for both SLP and non-SLP
> but I've documented the differences and when I did try to refactor it, it
> wasn't really worth it given that this is a temporary state anyway.
> 
> This version is restricted to lane = 1, as such we can re-use the existing
> move_early_break function instead of having to do safety update through
> scheduling.  I have a branch where I'm working on that but lane > 1 is out of
> scope for GCC 15 anyway.   The only reason I will try to get moving through
> scheduling done as a stretch goal is so we get epilogue vectorization back for
> early break.
> 
> The example:
> 
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>vect_b[i] = x + i;
>if (vect_a[i]*2 != x)
>  break;
>vect_a[i] = x;
>
>  }
>  return ret;
> }
> 
> builds the following SLP instance for early break:
> 
> note:   Analyzing vectorizable control flow: if (patt_6 != 0)
> note:   Starting SLP discovery for
> note: patt_6 = _4 != x_9(D);
> note:   starting SLP discovery for node 0x63abc80
> note:   Build SLP for patt_6 = _4 != x_9(D);
> note:   precomputed vectype: vector(4) 
> note:   nunits = 4
> note:   vect_is_simple_use: operand x_9(D), type of def: external
> note:   vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, 
> +INF] MASK 0x
> _3 * 2, type of def: internal
> note:   starting SLP discovery for node 0x63abdc0
> note:   Build SLP for _4 = _3 * 2;
> note:   precomputed vectype: vector(4) unsigned int
> note:   nunits = 4
> note:   vect_is_simple_use: operand #
> vect_aD.4416[i_15], type of def: internal
> note:   vect_is_simple_use: operand 2, type of def: constant
> note:   starting SLP discovery for node 0x63abe60
> note:   Build SLP for _3 = vect_a[i_15];
> note:   precomputed vectype: vector(4) unsigned int
> note:   nunits = 4
> note:   SLP discovery for node 0x63abe60 succeeded
> note:   SLP discovery for node 0x63abdc0 succeeded
> note:   SLP discovery for node 0x63abc80 succeeded
> note:   SLP size 3 vs. limit 10.
> note:   Final SLP tree for instance 0x6474190:
> note:   node 0x63abc80 (max_nunits=4, refcnt=2) vector(4) 
> note:   op template: patt_6 = _4 != x_9(D);
> note: stmt 0 patt_6 = _4 != x_9(D);
> note: children 0x63abd20 0x63abdc0
> note:   node (external) 0x63abd20 (max_nunits=1, refcnt=1)
> note: { x_9(D) }
> note:   node 0x63abdc0 (max_nunits=4, refcnt=2) vector(4) unsigned int
> note:   op template: _4 = _3 * 2;
> note: stmt 0 _4 = _3 * 2;
> note: children 0x63abe60 0x63abf00
> note:   node 0x63abe60 (max_nunits=4, refcnt=2) vector(4) unsigned int
> note:   op template: _3 = vect_a[i_15];
> note: stmt 0 _3 = vect_a[i_15];
> note: load permutation { 0 }
> note:   node (constant) 0x63abf00 (max_nunits=1, refcnt=1)
> note: { 2 }
> 
> and during codegen:
> 
> note:   -->vectorizing SLP node starting from: patt_6 = _4 != x_9(D);
> note:   vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, 
> +INF] MASK 0x
> _3 * 2, type of def: internal
> note:   add new stmt: mask_patt_6.18_58 = _53 != vect__4.17_57;
> note:=== vectorizable_early_exit ===
> note:transform early-exit.
> note:   vectorizing stmts using SLP.
> note:   Vectorizing SLP tree:
> note:   node 0x63abfa0 (max_nunits=4, refcnt=1) vector(4) int
> note:   op template: i_12 = i_15 + 1;
> note: stmt 0 i_12 = i_15 + 1;
> note: children 0x63aba00 0x63ac040
> note:   node 0x63aba00 (max_nunits=4, refcnt=2) vector(4) int
> note:   op template: i_15 = PHI 
> note: [l] stmt 0 i_15 = PHI 
> note: children (nil) (nil)
> note:   node (constant) 0x63ac040 (max_nunits=1, refcnt=1) vector(4) int
> note: { 1 }
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Also bootstrapped --with-build-config='bootstrap-O3 bootstrap-lto'
> --enable-checking=release,yes,rtl,extra on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Ok for master?
> 
> gcc/ChangeLog:
> 
>   * tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_gcond.
>   (LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS): New.
>   (vectorizable_early_exit): Expose.
>   (class _loop_vec_info): Add early_break_live_stmts.
>   * tree-vect-slp.cc (vect_build_slp_instance, vect_analyze_slp_instance):
>   Supp

Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-10-02 Thread Jeff Law




On 9/5/24 12:52 PM, Palmer Dabbelt wrote:

We have cheap logical ops, so let's just move this back to the default
to take advantage of the standard branch/op heuristics.

gcc/ChangeLog:

PR target/116615
* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
So on the BPI  this is a pretty clear win.  Not surprisingly perlbench 
and gcc are the big winners.  It somewhat surprisingly regresses x264, 
deepsjeng & leela, but the magnitudes are smaller.  The net from a cycle 
perspective is 2.4%.  Every benchmark looks better from a branch count 
perspective.


So in my mind it's just a matter of fixing any testsuite fallout (I 
would expect some) and this is OK.


jeff



Re: [PATCH 2/3] Release expanded template argument vector

2024-10-02 Thread Jason Merrill

On 10/2/24 7:50 AM, Richard Biener wrote:

This reduces peak memory usage by 20% for a specific testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

It's very ugly so I'd appreciate suggestions on how to handle such
situations better?

gcc/cp/
* pt.cc (coerce_template_parms): Release expanded argument
vector when not needed.
---
  gcc/cp/pt.cc | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 04f0a1d5fff..2c8b0d8609d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9442,6 +9442,9 @@ coerce_template_parms (tree parms,
  SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
 TREE_VEC_LENGTH (new_inner_args));
  
+  if ((return_full_args ? new_args != inner_args : new_inner_args != inner_args)


I think we always want to compare new_inner_args != inner_args, 
regardless of return_full_args.  OK with that change.


Alternatively we could use std::unique_ptr or something like it for 
inner_args, as the attached (untested).  This isn't very idiomatic use 
of unique_ptr, a custom class might be better...


Jason

commit ff16a607f8ba21bc8591f6b7476d1fc4abff693e
Author: Jason Merrill 
Date:   Wed Oct 2 08:17:19 2024 -0400

unique

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 5178e4deec0..b801014e739 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "config.h"
 #define INCLUDE_ALGORITHM // for std::equal
+#define INCLUDE_MEMORY // for std::unique_ptr
 #include "system.h"
 #include "coretypes.h"
 #include "cp-tree.h"
@@ -9079,6 +9080,11 @@ pack_expansion_args_count (tree args)
   return count;
 }
 
+struct ggc_freer
+{
+  void operator()(void *p) { ggc_free (p); }
+};
+
 /* Convert all template arguments to their appropriate types, and
return a vector containing the innermost resulting template
arguments.  If any error occurs, return error_mark_node. Error and
@@ -9161,8 +9167,13 @@ coerce_template_parms (tree parms,
  with a nested class inside a partial specialization of a class
  template, as in variadic92.C, or when deducing a template parameter pack
  from a sub-declarator, as in variadic114.C.  */
+  std::unique_ptr inner_arg_deleter;
   if (!post_variadic_parms)
-inner_args = expand_template_argument_pack (inner_args);
+{
+  inner_args = expand_template_argument_pack (inner_args);
+  if (inner_args != orig_inner_args)
+	inner_arg_deleter.reset (inner_args);
+}
 
   /* Count any pack expansion args.  */
   variadic_args_p = pack_expansion_args_count (inner_args);
@@ -9331,6 +9342,7 @@ coerce_template_parms (tree parms,
   /* We don't know how many args we have yet, just
  use the unconverted ones for now.  */
   new_inner_args = inner_args;
+	  inner_arg_deleter.release ();
 	  arg_idx = nargs;
   break;
 }


RE: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for vector reductions

2024-10-02 Thread Tamar Christina
> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Wednesday, October 2, 2024 1:09 PM
> To: Richard Sandiford 
> Cc: Tamar Christina ; Jennifer Schmitz
> ; gcc-patches@gcc.gnu.org; Kyrylo Tkachov
> 
> Subject: Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for vector
> reductions
> 
> 
> 
> > On 2 Oct 2024, at 13:43, Richard Sandiford 
> wrote:
> >
> > Tamar Christina  writes:
> >> Hi Jennifer,
> >>
> >>> -Original Message-
> >>> From: Richard Sandiford 
> >>> Sent: Tuesday, October 1, 2024 12:20 PM
> >>> To: Jennifer Schmitz 
> >>> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov
> 
> >>> Subject: Re: [PATCH] [PR113816] AArch64: Use SVE bit op reduction for 
> >>> vector
> >>> reductions
> >>>
> >>> Jennifer Schmitz  writes:
>  This patch implements the optabs reduc_and_scal_,
>  reduc_ior_scal_, and reduc_xor_scal_ for Advanced SIMD
>  integers for TARGET_SVE in order to use the SVE instructions ANDV, ORV, 
>  and
>  EORV for fixed-width bitwise reductions.
>  For example, the test case
> 
>  int32_t foo (int32_t *a)
>  {
>    int32_t b = -1;
>    for (int i = 0; i < 4; ++i)
>      b &= a[i];
>    return b;
>  }
> 
>  was previously compiled to
>  (-O2 -ftree-vectorize --param aarch64-autovec-preference=asimd-only):
>  foo:
>    ldp     w2, w1, [x0]
>    ldp     w3, w0, [x0, 8]
>    and     w1, w1, w3
>    and     w0, w0, w2
>    and     w0, w1, w0
>    ret
> 
>  With patch, it is compiled to:
>  foo:
>    ldr     q31, [x0]
>    ptrue   p7.b, all
>    andv    s31, p7, z31.s
>    fmov    w0, s31
>    ret
> 
>  Test cases were added to check the produced assembly for use of SVE
>  instructions.
> >>>
> >>> I would imagine that in this particular case, the scalar version is
> >>> better.  But I agree it's a useful feature for other cases.
> >>>
> >>
> >> Yeah, I'm concerned because ANDV and other reductions are extremely
> >> expensive.  But assuming the reductions are done outside of a loop then
> >> it should be ok, though.
> >>
> >> The issue is that the reduction latency grows with VL, so e.g. compare the
> >> latencies and throughput for Neoverse V1 and Neoverse V2.  So I think we
> >> want to gate this on VL128.
> >>
> >> As an aside, is the sequence correct?  With ORR reduction ptrue makes
> >> sense, but for VL > 128 ptrue doesn't work as the top bits would be zero.
> >> So an ANDV on zero-valued lanes would result in zero.
> >
> > Argh!  Thanks for spotting that.  I'm kicking myself for missing it :(
> >
> >> You'd want to predicate the ANDV with the size of the vector being
> >> reduced.  The same is true for SMIN and SMAX.
> >>
> >> I do wonder whether we need to split the pattern into two, where w->w uses
> >> the SVE instructions but w->r uses Adv SIMD.
> >>
> >> In the case of w->r as the example above
> >>
> >>ext v1.16b, v0.16b, v0.16b, #8
> >>and v0.8b, v0.8b, v1.8b
> >>fmov x8, d0
> >>lsr x9, x8, #32
> >>and w0, w8, w9
> >>
> >> would beat the ANDV on pretty much every uarch.
> >>
> >> But I'll leave it up to the maintainers.
> >
> > Also a good point.  And since these are integer reductions, an r
> > result is more probable than a w result.  w would typically only
> > be used if the result is stored directly to memory.
> >
> > At which point, the question (which you might have been implying)
> > is whether it's worth doing this at all, given the limited cases
> > for which it's beneficial, and the complication that's needed to
> > (a) detect those cases and (b) make them work.
> 
> These are good points in the thread. Maybe it makes sense to do this only for
> V16QI reductions?
> Maybe a variant of Tamar’s w->r sequence wins out even there.

I do agree that they're worth implementing, and also for 64-bit vectors (there
you skip the first reduction and just fmov the value to a GPR, since you don't
have the initial 128 -> 64 bit reduction step).

But I think at the moment they're possibly not modelled as reductions in our
cost model.  Like Richard mentioned I don't think the low iteration cases
should vectorize and instead just unroll.
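
For reference, the w->r sequence quoted above can be modelled in scalar C++.
This is only a sketch under stated assumptions: the function names are made up
for illustration, the 128-bit vector is modelled as four 32-bit lanes, and the
wider-register case simply assumes that every lane beyond the first four reads
as zero, which is the situation described above for VL > 128.

```cpp
#include <cassert>   // only needed if you add checks like those in the note below
#include <cstddef>
#include <cstdint>

// Plain scalar AND reduction over four lanes, used as the reference result.
inline std::uint32_t and_reduce_scalar(const std::uint32_t a[4])
{
  std::uint32_t r = ~std::uint32_t(0);
  for (std::size_t i = 0; i < 4; ++i)
    r &= a[i];
  return r;
}

// Models ext/and/fmov/lsr/and: fold the upper 64 bits onto the lower 64
// bits, then fold the upper 32 bits of that value onto the lower 32 bits.
inline std::uint32_t and_reduce_pairwise(const std::uint32_t a[4])
{
  const std::uint64_t lo = a[0] | (std::uint64_t(a[1]) << 32);
  const std::uint64_t hi = a[2] | (std::uint64_t(a[3]) << 32);
  const std::uint64_t folded = lo & hi;                        // ext + and
  return std::uint32_t(folded) & std::uint32_t(folded >> 32);  // fmov, lsr, and
}

// Models a predicated AND reduction over a register with total_lanes lanes.
// Lanes beyond the first four are assumed to be zero (hypothetical model of
// a wider SVE register holding a 128-bit value).  Only the first
// active_lanes lanes take part in the reduction.
inline std::uint32_t and_reduce_predicated(const std::uint32_t a[4],
                                           std::size_t total_lanes,
                                           std::size_t active_lanes)
{
  std::uint32_t r = ~std::uint32_t(0);
  for (std::size_t i = 0; i < total_lanes && i < active_lanes; ++i)
    r &= (i < 4 ? a[i] : std::uint32_t(0));
  return r;
}
```

In this model, and_reduce_predicated (a, 8, 8) returns 0 for any input, because
the zero upper lanes take part in the AND, whereas and_reduce_predicated (a, 8, 4)
agrees with the scalar and pairwise versions.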

> 
> Originally I had hoped that we’d tackle the straight-line case from PR113816 
> but it
> seems that GCC didn’t even try to create a reduction op for the code there.
> Maybe that’s something to look into separately.

Yeah, I think unrolled scalar is going to beat the ORV there as you can have 
better
throughput doing the reductions in pairs.

> 
> Also, for the alternative test case that we tried to use for a motivation:
> char sior_loop (char *a)
> {
>   char b = 0;
>   for (int i = 0; i < 16; ++i)
> b |= a[i];
>   return b;
> }
> 
> GCC generates some terrible code: https://godbolt.org/z/a68jodKca
> So it fe

Re: [PATCH v2] Improve vsetvl vconfig alignment

2024-10-02 Thread Dusan Stojkovic
I accidentally forgot to include RISC-V in the title of the patch.
Please ignore this patch since I have sent a fixed one.
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664305.html
Sorry for the inconvenience.

From: Dusan Stojkovic 
Sent: Wednesday, October 2, 2024 1:57 PM
To: GCC Patches 
Cc: l...@gcc.gnu.org ; rdap@gmail.com 
; Mile Davidovic ; Jovan Vukic 

Subject: [PATCH v2] Improve vsetvl vconfig alignment

This patch is a new version of:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662745.html

> Can you elaborate a bit on that?  Rearranging the CFG shouldn't matter
> in general and relying on the specific TARGET_SFB_ALU feels overly
> specific.
> Why does the same register in the if_then_else interfere with vsetvl?

When ce1 pass transforms CFG in the case of the conditional move,
it deletes then and else basic blocks and in their place adds the conditional
move which uses the same pseudo-register as the original vsetvl.

This interferes with vsetvl pass precisely because of the merge policy.
Use by non rvv flag limits the cases where merging might still be possible.
This patch tries to address one such issue.

Agreed. I have removed TARGET_SFB_ALU flag from the condition.

> BTW Bohan Lei has since fixed a bug regarding non-RVV uses.  Does the
> situation change with that applied?

Repeated the testing for sifive-7-series as well as rocket. The same tests
are still affected positively: vsetvlmax-9, vsetvlmax-10, vsetvlmax-11, 
vsetvlmax-15
on sifive-7-series.

2024-10-2  Dusan Stojkovic  

PR target/113035

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): 
New fuse condition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Updated 
scan-assembler-times num parameter.




Re: [PATCH] doc: Drop GCC 2.6 ABI change note for H8/h8300-hms

2024-10-02 Thread Jeff Law




On 10/2/24 4:45 AM, Gerald Pfeifer wrote:

Hi Jeff,

going through doc/install.texi I noticed there is some really old note on
h8300-hms, even predating egcs. :-)  Shall we drop that?
Yea, I'd think so.  I don't think we even have the -hms configurations 
anymore (they were COFF based IIRC).


Jeff



[patch,testsuite,applied] ad PR52641: Require int32 for gcc.dg/pr93820-2.c

2024-10-02 Thread Georg-Johann Lay

gcc.dg/pr93820-2.c requires int32, thus added
dg-require-effective-target int32.

Johann

--

testsuite/52641 - Require int32 for gcc.dg/pr93820-2.c.

PR testsuite/52641
gcc/testsuite/
* gcc.dg/pr93820-2.c: Add dg-require-effective-target int32.

diff --git a/gcc/testsuite/gcc.dg/pr93820-2.c 
b/gcc/testsuite/gcc.dg/pr93820-2.c

index be5d36898f1..0bdae614c44 100644
--- a/gcc/testsuite/gcc.dg/pr93820-2.c
+++ b/gcc/testsuite/gcc.dg/pr93820-2.c
@@ -1,6 +1,7 @@
 /* PR tree-optimization/93820 */
 /* { dg-do run } */
 /* { dg-options "-O2 -fgimple" } */
+/* { dg-require-effective-target int32 } */

 typedef int v4si __attribute__((vector_size(4 * sizeof (int))));
 int a[10];



[PATCH] C/116735 - ICE in build_counted_by_ref

2024-10-02 Thread Qing Zhao
From: qing zhao 

When handling the counted_by attribute, if the corresponding field
doesn't exist, in addition to issuing an error, we should also remove
the already added non-existing "counted_by" attribute from the
field_decl.

bootstrapped and regression tested on both x86 and aarch64.
Okay for committing?

thanks.

Qing

==

C/PR 116735

gcc/c/ChangeLog:

* c-decl.cc (verify_counted_by_attribute): Remove the attribute
when error.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-pr116735.c: New test.
---
 gcc/c/c-decl.cc   | 31 ---
 .../gcc.dg/flex-array-counted-by-pr116735.c   | 19 
 2 files changed, 38 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-pr116735.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index aa7f69d1b7b..ce28b0a1022 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9502,14 +9502,18 @@ verify_counted_by_attribute (tree struct_type, tree 
field_decl)
 
   tree counted_by_field = lookup_field (struct_type, fieldname);
 
-  /* Error when the field is not found in the containing structure.  */
+  /* Error when the field is not found in the containing structure and
+ remove the corresponding counted_by attribute from the field_decl.  */
   if (!counted_by_field)
-error_at (DECL_SOURCE_LOCATION (field_decl),
- "argument %qE to the %qE attribute is not a field declaration"
- " in the same structure as %qD", fieldname,
- (get_attribute_name (attr_counted_by)),
- field_decl);
-
+{
+  error_at (DECL_SOURCE_LOCATION (field_decl),
+   "argument %qE to the %qE attribute is not a field declaration"
+   " in the same structure as %qD", fieldname,
+   (get_attribute_name (attr_counted_by)),
+   field_decl);
+  DECL_ATTRIBUTES (field_decl)
+   = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
+}
   else
   /* Error when the field is not with an integer type.  */
 {
@@ -9518,11 +9522,14 @@ verify_counted_by_attribute (tree struct_type, tree 
field_decl)
   tree real_field = TREE_VALUE (counted_by_field);
 
   if (!INTEGRAL_TYPE_P (TREE_TYPE (real_field)))
-   error_at (DECL_SOURCE_LOCATION (field_decl),
- "argument %qE to the %qE attribute is not a field declaration"
- " with an integer type", fieldname,
- (get_attribute_name (attr_counted_by)));
-
+   {
+ error_at (DECL_SOURCE_LOCATION (field_decl),
+   "argument %qE to the %qE attribute is not a field 
declaration"
+   " with an integer type", fieldname,
+   (get_attribute_name (attr_counted_by)));
+ DECL_ATTRIBUTES (field_decl)
+   = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
+   }
 }
 
   return;
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-pr116735.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-pr116735.c
new file mode 100644
index 000..958636512b7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-pr116735.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+struct foo {
+  int len;
+  int element[] __attribute__ ((__counted_by__ (lenx))); /* { dg-error 
"attribute is not a field declaration in the same structure as" } */
+};
+
+int main ()
+{
+  struct foo *p = __builtin_malloc (sizeof (struct foo) + 3 * sizeof (int));
+  p->len = 1;
+  p->element[0] = 17;
+  p->len = 2;
+  p->element[1] = 13;
+  p->len = 1;
+  int x = p->element[1];
+  return x;
+}
+
-- 
2.43.5



[PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-02 Thread Dmitry Ilvokhin
Instead of looping over every byte of the tail, unroll loop manually
using switch statement, then compilers (at least GCC and Clang) will
generate a jump table [1], which is faster on a microbenchmark [2].

[1]: https://godbolt.org/z/aE8Mq3j5G
[2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24

libstdc++-v3/ChangeLog:

* libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
  loop using switch statement.

Signed-off-by: Dmitry Ilvokhin 
---
 libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
b/libstdc++-v3/libsupc++/hash_bytes.cc
index 3665375096a..294a7323dd0 100644
--- a/libstdc++-v3/libsupc++/hash_bytes.cc
+++ b/libstdc++-v3/libsupc++/hash_bytes.cc
@@ -50,10 +50,29 @@ namespace
   load_bytes(const char* p, int n)
   {
 std::size_t result = 0;
---n;
-do
-  result = (result << 8) + static_cast<unsigned char>(p[n]);
-while (--n >= 0);
+switch(n & 7)
+  {
+  case 7:
+   result |= std::size_t(p[6]) << 48;
+   [[gnu::fallthrough]];
+  case 6:
+   result |= std::size_t(p[5]) << 40;
+   [[gnu::fallthrough]];
+  case 5:
+   result |= std::size_t(p[4]) << 32;
+   [[gnu::fallthrough]];
+  case 4:
+   result |= std::size_t(p[3]) << 24;
+   [[gnu::fallthrough]];
+  case 3:
+   result |= std::size_t(p[2]) << 16;
+   [[gnu::fallthrough]];
+  case 2:
+   result |= std::size_t(p[1]) << 8;
+   [[gnu::fallthrough]];
+  case 1:
+   result |= std::size_t(p[0]);
+  };
 return result;
   }
 
-- 
2.43.5
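
As a rough illustration of the equivalence the patch relies on, the sketch
below puts the byte-at-a-time loop next to an unrolled switch and checks that
they agree for tail lengths 1 to 7. The function names are hypothetical, a
64-bit std::size_t is assumed, and, unlike the hunk above, the unrolled version
here converts each byte through unsigned char (as the original loop does) so
that bytes of 0x80 and above are not sign-extended.

```cpp
#include <cassert>
#include <cstddef>

// Byte-at-a-time loop, as in the code being replaced.  Valid for 1 <= n <= 7.
inline std::size_t load_bytes_loop(const char* p, int n)
{
  std::size_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  while (--n >= 0);
  return result;
}

// Helper (hypothetical): read byte i as an unsigned value widened to size_t.
inline std::size_t byte_at(const char* p, int i)
{
  return static_cast<unsigned char>(p[i]);
}

// Unrolled variant: each case falls through to the ones below it, so a
// tail of length n ORs in bytes n-1 down to 0.  Assumes 64-bit size_t.
inline std::size_t load_bytes_switch(const char* p, int n)
{
  std::size_t result = 0;
  switch (n & 7)
    {
    case 7: result |= byte_at(p, 6) << 48; [[fallthrough]];
    case 6: result |= byte_at(p, 5) << 40; [[fallthrough]];
    case 5: result |= byte_at(p, 4) << 32; [[fallthrough]];
    case 4: result |= byte_at(p, 3) << 24; [[fallthrough]];
    case 3: result |= byte_at(p, 2) << 16; [[fallthrough]];
    case 2: result |= byte_at(p, 1) << 8;  [[fallthrough]];
    case 1: result |= byte_at(p, 0);
    }
  return result;
}

// Asserts that both variants agree for every tail length from 1 to 7.
// p must point to at least 7 readable bytes.
inline void check_variants_agree(const char* p)
{
  for (int n = 1; n <= 7; ++n)
    assert(load_bytes_loop(p, n) == load_bytes_switch(p, n));
}
```

Both variants produce the same little-endian packing of the tail, with p[0] in
the lowest byte; n == 0 is left out because the original loop is not meant to be
called with an empty tail.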



Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-02 Thread Jonathan Wakely
On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  wrote:
>
> On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
> >
> > Instead of looping over every byte of the tail, unroll loop manually
> > using switch statement, then compilers (at least GCC and Clang) will
> > generate a jump table [1], which is faster on a microbenchmark [2].
> >
> > [1]: https://godbolt.org/z/aE8Mq3j5G
> > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> >
> > libstdc++-v3/ChangeLog:
> >
> > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> >   loop using switch statement.
> >
> > Signed-off-by: Dmitry Ilvokhin 
> > ---
> >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> >  1 file changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > index 3665375096a..294a7323dd0 100644
> > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > @@ -50,10 +50,29 @@ namespace
> >load_bytes(const char* p, int n)
> >{
> >  std::size_t result = 0;
> > ---n;
> > -do
> > -  result = (result << 8) + static_cast<unsigned char>(p[n]);
> > -while (--n >= 0);
>
> Don't we still need to loop, for the case where n >= 8? Otherwise we
> only hash the first 8 bytes.

Ah, but it's only ever called with load_bytes(end, len & 0x7)



>
> > +switch(n & 7)
> > +  {
> > +  case 7:
> > +   result |= std::size_t(p[6]) << 48;
> > +   [[gnu::fallthrough]];
> > +  case 6:
> > +   result |= std::size_t(p[5]) << 40;
> > +   [[gnu::fallthrough]];
> > +  case 5:
> > +   result |= std::size_t(p[4]) << 32;
> > +   [[gnu::fallthrough]];
> > +  case 4:
> > +   result |= std::size_t(p[3]) << 24;
> > +   [[gnu::fallthrough]];
> > +  case 3:
> > +   result |= std::size_t(p[2]) << 16;
> > +   [[gnu::fallthrough]];
> > +  case 2:
> > +   result |= std::size_t(p[1]) << 8;
> > +   [[gnu::fallthrough]];
> > +  case 1:
> > +   result |= std::size_t(p[0]);
> > +  };
> >  return result;
> >}
> >
> > --
> > 2.43.5
> >



Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-02 Thread Jonathan Wakely
On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
>
> Instead of looping over every byte of the tail, unroll loop manually
> using switch statement, then compilers (at least GCC and Clang) will
> generate a jump table [1], which is faster on a microbenchmark [2].
>
> [1]: https://godbolt.org/z/aE8Mq3j5G
> [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
>
> libstdc++-v3/ChangeLog:
>
> * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
>   loop using switch statement.
>
> Signed-off-by: Dmitry Ilvokhin 
> ---
>  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> b/libstdc++-v3/libsupc++/hash_bytes.cc
> index 3665375096a..294a7323dd0 100644
> --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> @@ -50,10 +50,29 @@ namespace
>load_bytes(const char* p, int n)
>{
>  std::size_t result = 0;
> ---n;
> -do
> -  result = (result << 8) + static_cast<unsigned char>(p[n]);
> -while (--n >= 0);

Don't we still need to loop, for the case where n >= 8? Otherwise we
only hash the first 8 bytes.

> +switch(n & 7)
> +  {
> +  case 7:
> +   result |= std::size_t(p[6]) << 48;
> +   [[gnu::fallthrough]];
> +  case 6:
> +   result |= std::size_t(p[5]) << 40;
> +   [[gnu::fallthrough]];
> +  case 5:
> +   result |= std::size_t(p[4]) << 32;
> +   [[gnu::fallthrough]];
> +  case 4:
> +   result |= std::size_t(p[3]) << 24;
> +   [[gnu::fallthrough]];
> +  case 3:
> +   result |= std::size_t(p[2]) << 16;
> +   [[gnu::fallthrough]];
> +  case 2:
> +   result |= std::size_t(p[1]) << 8;
> +   [[gnu::fallthrough]];
> +  case 1:
> +   result |= std::size_t(p[0]);
> +  };
>  return result;
>}
>
> --
> 2.43.5
>



Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-02 Thread Jonathan Wakely
On Wed, 2 Oct 2024 at 19:25, Jonathan Wakely  wrote:
>
> On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  wrote:
> >
> > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
> > >
> > > Instead of looping over every byte of the tail, unroll loop manually
> > > using switch statement, then compilers (at least GCC and Clang) will
> > > generate a jump table [1], which is faster on a microbenchmark [2].
> > >
> > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > >   loop using switch statement.
> > >
> > > Signed-off-by: Dmitry Ilvokhin 
> > > ---
> > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > index 3665375096a..294a7323dd0 100644
> > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > @@ -50,10 +50,29 @@ namespace
> > >load_bytes(const char* p, int n)
> > >{
> > >  std::size_t result = 0;
> > > ---n;
> > > -do
> > > -  result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > -while (--n >= 0);
> >
> > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > only hash the first 8 bytes.
>
> Ah, but it's only ever called with load_bytes(end, len & 0x7)

It seems to be slower for short strings, but a win overall:
https://quick-bench.com/q/xhh5m1akZzwUAXRiYJ17z9FASc8
This measures different lengths, and tries to ensure that the string
contents aren't treated as constant.

>
>
> >
> > > +switch(n & 7)
> > > +  {
> > > +  case 7:
> > > +   result |= std::size_t(p[6]) << 48;
> > > +   [[gnu::fallthrough]];
> > > +  case 6:
> > > +   result |= std::size_t(p[5]) << 40;
> > > +   [[gnu::fallthrough]];
> > > +  case 5:
> > > +   result |= std::size_t(p[4]) << 32;
> > > +   [[gnu::fallthrough]];
> > > +  case 4:
> > > +   result |= std::size_t(p[3]) << 24;
> > > +   [[gnu::fallthrough]];
> > > +  case 3:
> > > +   result |= std::size_t(p[2]) << 16;
> > > +   [[gnu::fallthrough]];
> > > +  case 2:
> > > +   result |= std::size_t(p[1]) << 8;
> > > +   [[gnu::fallthrough]];
> > > +  case 1:
> > > +   result |= std::size_t(p[0]);
> > > +  };
> > >  return result;
> > >}
> > >
> > > --
> > > 2.43.5
> > >



Re: [PATCH] expr: Don't clear whole unions [PR116416]

2024-10-02 Thread Marek Polacek
On Sat, Sep 28, 2024 at 08:39:12AM +0200, Jakub Jelinek wrote:
> On Fri, Sep 27, 2024 at 04:01:33PM +0200, Jakub Jelinek wrote:
> > So, I think we should go with (but so far completely untested except
> > for pr78687.C which is optimized with Marek's patch and the above testcase
> > which doesn't have the clearing anymore) the following patch.
> 
> That patch had a bug in type_has_padding_at_level_p and so it didn't
> bootstrap.
> 
> Here is a full patch which does.

[...]

And here's my patch, bootstrapped/regtested on x86_64-pc-linux-gnu
on top of Jakub's patch, ok for trunk once the prerequisite is in?

-- >8 --
This PR reports a missed optimization.  When we have:

  Str str{"Test"};
  callback(str);

as in the test, we're able to evaluate the Str::Str() call at compile
time.  But when we have:

  callback(Str{"Test"});

we are not.  With this patch (in fact, it's Patrick's patch with a little
tweak), we turn

  callback (TARGET_EXPR >>
(const char *) "Test" )

into

  callback (TARGET_EXPR )

I explored the idea of calling maybe_constant_value for the whole
TARGET_EXPR in cp_fold.  That has three problems:
- we can't always elide a TARGET_EXPR, so we'd have to make sure the
  result is also a TARGET_EXPR;
- the resulting TARGET_EXPR must have the same flags, otherwise Bad
  Things happen;
- getting a new slot is also problematic.  I've seen a test where we
  had "TARGET_EXPR, D.2680", and folding the whole TARGET_EXPR
  would get us "TARGET_EXPR", but since we don't see the outer
  D.2680, we can't replace it with D.2681, and things break.
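
The two call forms contrasted at the start of this description can be
reproduced with a small stand-alone sketch. The definitions of Str and callback
below are assumptions made for illustration (they are not taken from the PR or
from the new tests), and the sketch only shows the source-level shapes; it
makes no claim about what any particular compiler version emits for them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the Str class from the report: a literal type
// whose constructor can be evaluated at compile time (C++14 or later).
struct Str
{
  const char* data;
  std::size_t len;

  constexpr Str(const char* s) : data(s), len(0)
  {
    while (s[len] != '\0')
      ++len;
  }
};

// The constructor call is a constant expression for a string literal.
static_assert(Str{"Test"}.len == 4, "constructor is usable at compile time");

// Hypothetical callback taking the object by value.
inline std::size_t callback(Str s)
{
  return s.len;
}

// Form 1: named object, then call.
inline std::size_t via_named_object()
{
  Str str{"Test"};
  return callback(str);
}

// Form 2: prvalue passed directly, which is the TARGET_EXPR case.
inline std::size_t via_prvalue()
{
  return callback(Str{"Test"});
}

// Both forms must compute the same value regardless of how they are folded.
inline void check_forms_agree()
{
  assert(via_named_object() == via_prvalue());
}
```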

With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.

FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: 
.*.a = {}"
is easy.  Previously, we would call C::C, so .gimple has:

  D.2590 = {};
  C::C (&D.2590);
  D.2597 = D.2590;
  return D.2597;

Then .einline inlines the C::C call:

  D.2590 = {};
  D.2590.a = {}; // #1
  D.2590.b = 0;  // #2
  D.2597 = D.2590;
  D.2590 ={v} {CLOBBER(eos)};
  return D.2597;

then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
passes.  But with the patch, .gimple won't have that C::C call, so the
IL is of course going to look different.  The .optimized dump looks the
same though so there's no problem.

pr78687.C was fixed by Jakub's categorize_ctor_elements_1 patch.

PR c++/116416

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_r) : Try to fold
TARGET_EXPR_INITIAL and replace it with the folded result if
it's TREE_CONSTANT.

gcc/testsuite/ChangeLog:

* g++.dg/analyzer/pr97116.C: Adjust dg-message.
* g++.dg/tree-ssa/pr90883.C: Adjust dg-final.
* g++.dg/cpp0x/constexpr-prvalue1.C: New test.
* g++.dg/cpp1y/constexpr-prvalue1.C: New test.

Co-authored-by: Patrick Palka 
---
 gcc/cp/cp-gimplify.cc | 10 +++
 gcc/testsuite/g++.dg/analyzer/pr97116.C   |  2 +-
 .../g++.dg/cpp0x/constexpr-prvalue1.C | 24 +++
 .../g++.dg/cpp1y/constexpr-prvalue1.C | 30 +++
 gcc/testsuite/g++.dg/tree-ssa/pr90883.C   |  4 +--
 5 files changed, 67 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-prvalue1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 003e68f1ea7..c63fdf3edd1 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1473,6 +1473,16 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
 that case, strip it in favor of this one.  */
   if (tree &init = TARGET_EXPR_INITIAL (stmt))
{
+ if ((data->flags & ff_genericize)
+ && !flag_no_inline)
+   {
+ tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
+ if (folded != init && TREE_CONSTANT (folded))
+   {
+ init = folded;
+ break;
+   }
+   }
  cp_walk_tree (&init, cp_fold_r, data, NULL);
  cp_walk_tree (&TARGET_EXPR_CLEANUP (stmt), cp_fold_r, data, NULL);
  *walk_subtrees = 0;
diff --git a/gcc/testsuite/g++.dg/analyzer/pr97116.C 
b/gcc/testsuite/g++.dg/analyzer/pr97116.C
index d8e08a73172..1c404c2ceb2 100644
--- a/gcc/testsuite/g++.dg/analyzer/pr97116.C
+++ b/gcc/testsuite/g++.dg/analyzer/pr97116.C
@@ -16,7 +16,7 @@ struct foo
 void test_1 (void)
 {
   foo *p = new(NULL) foo (42); // { dg-warning "non-null expected" "warning" }
-  // { dg-message "argument 'this' \\(\[^\n\]*\\) NULL where non-null 
expected" "final event" { target *-*-* } .-1 }
+  // { dg-message "argument 'this'( \\(\[^\n\]*\\))? NULL where non-null 
expected" "final event" { target *-*-* } .-1 }
 }
 
 int test_2 (void)
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-prvalue1.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-prvalue1.C
new file mode 100644
index 000..f09088d41e8
--- /dev/null
+++ b/gcc/testsui

Re: [RFC PATCH] Allow limited extended asm at toplevel

2024-10-02 Thread Andi Kleen
Jakub Jelinek  writes:

> And for kernel perhaps we should add some new option which allows
> some dumb parsing of the toplevel asms and gather something from that
> parsing.

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779

> The restrictions I've implemented are:
> 1) asm qualifiers aren't still allowed, so asm goto or asm inline can't be
>specified at toplevel, asm volatile has the volatile ignored for C++ with
>a warning and is an error in C like before
> 2) I see good use for mainly input operands, output maybe to make it clear
>that the inline asm may write some memory, I don't see a good use for
>clobbers, so the patch doesn't allow those (and of course labels because
>asm goto can't be specified)

One of the main uses for this is to specify functions that may get
called by the assembler. Your proposal is to specify them as input "m"?
Seems odd.  Perhaps this needs a new syntax.

One issue that asms also often run into is that they don't like
reordering. Some way to specify attribute((no_reorder)) would be useful.

-Andi


Re: [RFC PATCH] Allow limited extended asm at toplevel

2024-10-02 Thread Jakub Jelinek
On Wed, Oct 02, 2024 at 01:59:03PM +0200, Richard Biener wrote:
> As you are using input constraints to mark symbol uses maybe we can
> use output constraints with a magic identifier (and a constraint letter
> specifying 'identifier'):
> 
> asm (".globl %0; %0: ret" : "_D" (extern int foo()) : ...);
> 
> In the BOF it was noted that LTO wants to be able to rename / localize
> symbols so both use and definition should be used in a way to support
> this (though changing visibility is difficult - the assembler might
> tie to GOT uses, and .globl is hard to replace).

Seems we have quite a few free letters on the constraint side, I think
'-.:;[]@(){}|_`
are all currently rejected in both input and output constraints
(we only allow ISALPHA constraint chars for the target specific ones).
So, using one of those for "the inline asm defines this global symbol",
another for "the inline asm defines this local symbol", and another
"the inline asm uses this function" might be possible;
Of course it can be also just one of those special characters with
some qualifier after it, we'd just need to tweak the generated
insn_constraint_len for that.  I think the asm uses some variable can
be expressed with "m" (var) just fine.
Anyway, for function and var definitions, should the compiler be told
more information (e.g. whether it is weak or non-weak), or should one
assume that from the actually used FUNCTION_DECL/VAR_DECL?  Should the
global vs. local be also implied from it?  Though for variables, how does
one declare a static variable declaration but not definition?

And another thing is what the argument should be, what you wrote about
would be hard to parse (parsing dependent on the constraints, we usually
don't parse the constraint until the expression is parsed).
extern int foo ();
asm ("..." : : "_" (foo));
probably would (but maybe just extern "C" for C++).

And we'd need to decide whether for LTO toplevel inline asm with
extended asm (as a new extension) is actually required not to rely on
the exact function/variable name at least in the non-global case and
the compiler is allowed to rename them (i.e. use %c0 etc. rather than
the actual name).

Jakub



[PATCH] Replace another missed iterative_hash_object

2024-10-02 Thread Richard Biener
I missed one that's actually hit quite a lot, hashing of the canonical
type TYPE_HASH.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed as obvious
after the previous approval.

Richard.

* pt.cc (iterative_hash_template_arg): Use iterative_hash_hashval_t
to hash TYPE_HASH.
---
 gcc/cp/pt.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 04f0a1d5fff..20affcd65a2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1936,7 +1936,7 @@ iterative_hash_template_arg (tree arg, hashval_t val)
 
default:
  if (tree canonical = TYPE_CANONICAL (arg))
-   val = iterative_hash_object (TYPE_HASH (canonical), val);
+   val = iterative_hash_hashval_t (TYPE_HASH (canonical), val);
  else if (tree ti = TYPE_TEMPLATE_INFO (arg))
{
  val = iterative_hash_template_arg (TI_TEMPLATE (ti), val);
-- 
2.43.0


Re: [PATCH v2] c++: Don't ICE due to artificial constructor parameters [PR116722]

2024-10-02 Thread Simon Martin
Hi Jason,

On 2 Oct 2024, at 0:18, Jason Merrill wrote:

> On 10/1/24 12:44 PM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 30 Sep 2024, at 19:56, Jason Merrill wrote:
>>
>>> On 9/23/24 4:44 AM, Simon Martin wrote:
 Hi Jason,

 On 20 Sep 2024, at 18:01, Jason Merrill wrote:

> On 9/20/24 5:21 PM, Simon Martin wrote:
>> The following code triggers an ICE
>>
>> === cut here ===
>> class base {};
>> class derived : virtual public base {
>> public:
>>  template constexpr derived(Arg) {}
>> };
>> int main() {
>>  derived obj(1.);
>> }
>> === cut here ===
>>
>> The problem is that cxx_bind_parameters_in_call ends up 
>> attempting
>> to

>> convert a REAL_CST (the first non artificial parameter) to
>> INTEGER_TYPE
>> (the type of the __in_chrg parameter), which ICEs.
>>
>> This patch teaches cxx_bind_parameters_in_call to handle the
>> __in_chrg
>> and __vtt_parm parameters that {con,de}structors might have.
>>
>> Note that in the test case, the constructor is not
>> constexpr-suitable,
>> however it's OK since it's a template according to my read of
>> paragraph
>> (3) of [dcl.constexpr].
>
> Agreed.
>
> It looks like your patch doesn't correct the mismatching of
> arguments
> to parameters that you describe, but at least for now it should be
> enough to set *non_constant_p and return if we see a VTT or
> in-charge
> parameter.
>
 Thanks, it’s true that my initial patch was wrong in that we’d
 leave
 cxx_bind_parameters_in_call thinking the expression was actually a
 constant expression :-/

 The attached revised patch follows your suggestion (thanks!).
 Successfully tested on x86_64-pc-linux-gnu. OK for trunk?
>>>
>>> After this patch I'm seeing a regression on constexpr-dynamic10.C 
>>> with
>>> -fimplicit-constexpr; we also need to give an error here when
>>> (!ctx->quiet).
>> Thanks, good catch. TIL about --stds=impcx...
>>
>> The attached patch fixes the issue, and was successfully tested on
>> x86_64-pc-linux-gnu, including with “make -C gcc -k 
>> check-c++-all”.
>> OK for trunk?
>>
>> Note that it includes a new test that’s basically a copy of
>> constexpr-dynamic10.C, with -fimplicit-constexpr. Is that the right
>> thing to do or should I just leave the test suite as is, knowing that
>> someone/something will run the test suite with --stds=impcx at some
>> point before a release?
>
> I think leave it as is.
OK, it’s simpler :-)
>
>> And the next natural question is whether I should have also tested 
>> with
>> --stds=impcx before my initial submission?
>
> Probably a good idea when messing with constexpr.
ACK.
>
>> https://gcc.gnu.org/contribute.html#testing advises to test with 
>> “make
>> -k check”; maybe we need to also mention check-c++-all for changes 
>> to
>> the C++ front-end?
>
> Sounds good.
I sent 
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664275.html for 
that.
>
>> +  && constexpr_error (cp_expr_loc_or_input_loc (t),
>> +  /*constexpr_fundef_p=*/false,
>> +  "call to non-%<constexpr%> function %qD", fun))
>
> Let's use "error_at" here, like in cxx_eval_call_expression; 
> constexpr_error with fundef_p false is equivalent.
>
> OK with those adjustments.
Thanks. Merged as r15-4026.

Simon



[PATCH] testsuite: Unset torture_current_flags after use

2024-10-02 Thread Richard Sandiford
Before running a test with specific torture options, gcc-dg-runtest
sets the global variable torture_current_flags to the set of torture
options that will be used.  However, it never unset the variable
afterwards, which meant that the last options would hang around
and potentially confuse later non-torture tests.

I saw this with a follow-on patch to check-function-bodies, but it's
probably possible to construct artificial test combinations that
expose it with check-function-bodies's existing flag filtering.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/testsuite/
* gcc/testsuite/lib/gcc-dg.exp (gcc-dg-runtest): Unset
torture_current_flags after each test.
---
 gcc/testsuite/lib/gcc-dg.exp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index cb401a70435..7adca02f937 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -628,6 +628,7 @@ proc gcc-dg-runtest { testcases flags default-extra-flags } 
{
set torture_current_flags "$flags_t"
verbose "Testing $nshort, $flags $flags_t" 1
dg-test $test "$flags $flags_t" ${default-extra-flags}
+   unset torture_current_flags
}
 }
 
-- 
2.25.1



[PATCH] testsuite: Make check-function-bodies work with LTO

2024-10-02 Thread Richard Sandiford
This patch tries to make check-function-bodies automatically
choose between reading the regular assembly file and reading the
LTO assembly file.  There should only ever be one right answer,
since check-function-bodies doesn't make sense on slim LTO output.

Maybe this will turn out to be impossible to get right, but I'd like
to try at least.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/testsuite/
* lib/scanasm.exp (check-function-bodies): Look in ltrans0.ltrans.s
if the test appears to be using LTO.
---
 gcc/testsuite/lib/scanasm.exp | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 737eefc655e..26504deb0e6 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -997,16 +997,17 @@ proc check-function-bodies { args } {
error "too many arguments to check-function-bodies"
 }
 
+upvar 2 dg-extra-tool-flags extra_tool_flags
+set flags $extra_tool_flags
+
+global torture_current_flags
+if { [info exists torture_current_flags] } {
+   append flags " " $torture_current_flags
+}
+
 if { [llength $args] >= 3 } {
set required_flags [lindex $args 2]
 
-   upvar 2 dg-extra-tool-flags extra_tool_flags
-   set flags $extra_tool_flags
-
-   global torture_current_flags
-   if { [info exists torture_current_flags] } {
-   append flags " " $torture_current_flags
-   }
foreach required_flag $required_flags {
switch -- $required_flag {
target -
@@ -1043,7 +1044,14 @@ proc check-function-bodies { args } {
 
 global srcdir
 set input_filename "$srcdir/$filename"
-set output_filename "[file rootname [file tail $filename]].s"
+set output_filename "[file rootname [file tail $filename]]"
+if { [string match "* -flto *" " ${flags} "]
+&& ![string match "* -fno-use-linker-plugin *" " ${flags} "]
+&& ![string match "* -ffat-lto-objects *" " ${flags} "] } {
+   append output_filename ".ltrans0.ltrans.s"
+} else {
+   append output_filename ".s"
+}
 
 set prefix [lindex $args 0]
 set prefix_len [string length $prefix]
-- 
2.25.1



[PATCH] aarch64: Fix SVE ACLE gimple folds for C++ LTO [PR116629]

2024-10-02 Thread Richard Sandiford
The SVE ACLE code has two ways of handling overloaded functions.
One, used by C, is to define a single dummy function for each unique
overloaded name, with resolve_overloaded_builtin then resolving calls
to real non-overloaded functions.  The other, used by C++, is to
define a separate function for each individual overload.

The builtins harness assigns integer function codes programmatically.
However, LTO requires it to use the same assignment for every
translation unit, regardless of language.  This means that C++ TUs
need to create (unused) slots for the C overloads and that C TUs
need to create (unused) slots for the C++ overloads.

In many ways, it doesn't matter whether the LTO frontend itself
uses the C approach or the C++ approach to defining overloaded
functions, since the LTO frontend never has to resolve source-level
overloading.  However, the C++ approach of defining a separate
function for each overload means that C++ calls never need to
be redirected to a different function.  Calls to an overload
can appear in the LTO dump and survive until expand.  In contrast,
calls to C's dummy overload functions are resolved by the front
end and never survive to LTO (or expand).

Some optimisations work by moving between sibling functions, such as _m
to _x.  If the source function is an overload, the expected destination
function is too.  The LTO frontend needs to define C++ overloads if it
wants to do this optimisation properly for C++.

The PR is about a tree checking failure caused by trying to use a
stubbed-out C++ overload in LTO.  Dealing with that by detecting the
stub (rather than changing which overloads are defined) would have
turned this from an ice-on-valid to a missed optimisation.

In future, it would probably make sense to redirect overloads to
non-overloaded functions during gimple folding, in case that exposes
more CSE opportunities.  But it'd probably be of limited benefit, since
it should be rare for code to mix overloaded and non-overloaded uses of
the same operation.  It also wouldn't be suitable for backports.

If no-one has any objections, I'll push this once the prerequisite
testsuite patches are approved.

Thanks,
Richard


gcc/
PR target/116629
* config/aarch64/aarch64-sve-builtins.cc
(function_builder::function_builder): Use direct overloads for LTO.

gcc/testsuite/
PR target/116629
* gcc.target/aarch64/sve/acle/general/pr106326_2.c: New test.
---
 gcc/config/aarch64/aarch64-sve-builtins.cc|   2 +-
 .../aarch64/sve/acle/general/pr106326_2.c | 381 ++
 2 files changed, 382 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_2.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 5ff46212d18..e7c703c987e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -1283,7 +1283,7 @@ function_builder::function_builder (handle_pragma_index 
pragma_index,
bool function_nulls)
 {
   m_overload_type = build_function_type (void_type_node, void_list_node);
-  m_direct_overloads = lang_GNU_CXX ();
+  m_direct_overloads = lang_GNU_CXX () || in_lto_p;
 
   if (initial_indexes[pragma_index] == 0)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_2.c
new file mode 100644
index 000..deb936cac5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_2.c
@@ -0,0 +1,381 @@
+/* { dg-do link } */
+/* { dg-options "-O2 -flto -shared -fPIC --save-temps" } */
+/* { dg-require-effective-target shared } */
+/* { dg-require-effective-target fpic } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+** add1:
+** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s)
+** ret
+*/
+svint32_t
+add1 (svint32_t x, svint32_t y)
+{
+  return svadd_z (svptrue_b8 (), x, y);
+}
+
+/*
+** add2:
+** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s)
+** ret
+*/
+svint32_t
+add2 (svint32_t x, svint32_t y)
+{
+  return svadd_z (svptrue_b16 (), x, y);
+}
+
+/*
+** add3:
+** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s)
+** ret
+*/
+svint32_t
+add3 (svint32_t x, svint32_t y)
+{
+  return svadd_z (svptrue_b32 (), x, y);
+}
+
+/*
+** add4:
+** ...
+** movprfx [^\n]+
+** ...
+** ret
+*/
+svint32_t
+add4 (svint32_t x, svint32_t y)
+{
+  return svadd_z (svptrue_b64 (), x, y);
+}
+
+/*
+** add5:
+** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s)
+** ret
+*/
+svint32_t
+add5 (svint32_t x, svint32_t y)
+{
+  return svadd_m (svptrue_b8 (), x, y);
+}
+
+/*
+** add6:
+** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s)
+** ret
+*/
+svint32_t
+add6 (svint32_t x, svint32_t y)
+{
+  return svadd_m (svptrue_b16 (), x, y);
+}
+
+/*
+** add7:
+** add z0\.s, (z1

Re: [PATCH] testsuite: Unset torture_current_flags after use

2024-10-02 Thread Richard Biener



> On 02.10.2024 at 15:48, Richard Sandiford wrote:
> 
> Before running a test with specific torture options, gcc-dg-runtest
> sets the global variable torture_current_flags to the set of torture
> options that will be used.  However, it never unset the variable
> afterwards, which meant that the last options would hang around
> and potentially confuse later non-torture tests.
> 
> I saw this with a follow-on patch to check-function-bodies, but it's
> probably possible to construct artificial test combinations that
> expose it with check-function-bodies's existing flag filtering.
> 
> Tested on aarch64-linux-gnu.  OK to install?

Ok

Richard 

> Richard
> 
> 
> gcc/testsuite/
>* gcc/testsuite/lib/gcc-dg.exp (gcc-dg-runtest): Unset
>torture_current_flags after each test.
> ---
> gcc/testsuite/lib/gcc-dg.exp | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> index cb401a70435..7adca02f937 100644
> --- a/gcc/testsuite/lib/gcc-dg.exp
> +++ b/gcc/testsuite/lib/gcc-dg.exp
> @@ -628,6 +628,7 @@ proc gcc-dg-runtest { testcases flags default-extra-flags 
> } {
>set torture_current_flags "$flags_t"
>verbose "Testing $nshort, $flags $flags_t" 1
>dg-test $test "$flags $flags_t" ${default-extra-flags}
> +unset torture_current_flags
>}
> }
> 
> --
> 2.25.1
> 


Re: [PATCH] middle-end: Fix ifcvt predicate generation for masked function calls

2024-10-02 Thread Victor Do Nascimento

On 10/1/24 13:10, Richard Biener wrote:

On Mon, Sep 30, 2024 at 8:40 PM Tamar Christina  wrote:


Hi Victor,

Thanks! This looks good to me with one minor comment:


-Original Message-
From: Victor Do Nascimento 
Sent: Monday, September 30, 2024 2:34 PM
To: gcc-patches@gcc.gnu.org
Cc: Tamar Christina ; richard.guent...@gmail.com;
Victor Do Nascimento 
Subject: [PATCH] middle-end: Fix ifcvt predicate generation for masked function
calls

Up until now, due to a latent bug in the code for the ifcvt pass,
irrespective of the branch taken in a conditional statement, the
original condition for the if statement was used in masking the
function call.

Thus, for code such as:

   if (a[i] > limit)
 b[i] = fixed_const;
   else
 b[i] = fn (a[i]);

we would generate the following (wrong) if-converted tree code:

   _1 = a[i_1];
   _2 = _1 > limit;
   _3 = .MASK_CALL (fn, _1, _2);
   cstore_4 = _2 ? fixed_const : _3;

as opposed to the correct expected sequence:

   _1 = a[i_1];
   _2 = _1 > limit;
   _3 = ~_2;
   _4 = .MASK_CALL (fn, _1, _3);
   cstore_5 = _2 ? fixed_const : _4;

This patch ensures that the correct predicate mask generation is
carried out such that, upon autovectorization, the correct vector
lanes are selected in the vectorized function call.
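
To make the difference between the two sequences concrete, the following is
a small scalar model in C.  It is only a sketch: mask_call, twice and
if_converted are hypothetical names, a fixed lane count stands in for a
vector, and inactive lanes are set to 0 purely so that the outcome is
observable (the value of inactive lanes in a real masked call should not be
relied upon).

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical stand-in for the masked call: FN runs only on lanes whose
   mask bit is set.  Inactive lanes are left at 0 for this model only.  */
static void
mask_call (int (*fn) (int), const int *in, const bool *mask, int *out)
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = mask[i] ? fn (in[i]) : 0;
}

/* Hypothetical callee used in place of fn.  */
static int
twice (int x)
{
  return 2 * x;
}

/* Emulate the if-converted loop body.  INVERT_MASK selects between the
   sequence generated before the patch (false) and after it (true).  */
static void
if_converted (const int *a, int limit, int fixed_const, bool invert_mask,
              int *b)
{
  bool cond[LANES], call_mask[LANES];
  int call_result[LANES];

  for (size_t i = 0; i < LANES; i++)
    {
      cond[i] = a[i] > limit;                           /* _2 = _1 > limit */
      call_mask[i] = invert_mask ? !cond[i] : cond[i];  /* _3 = ~_2 */
    }
  mask_call (twice, a, call_mask, call_result);         /* .MASK_CALL */
  for (size_t i = 0; i < LANES; i++)
    b[i] = cond[i] ? fixed_const : call_result[i];      /* cstore */
}
```

In this model, with a = {200, 5, 300, 7}, limit = 101 and fixed_const = -1,
the inverted mask gives {-1, 10, -1, 14}, whereas reusing the original
condition gives {-1, 0, -1, 0}: the else-branch lanes never see the result
of the call.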

gcc/ChangeLog:

   * tree-if-conv.cc (predicate_statements): Fix handling of
   predicated function calls.

gcc/testsuite/ChangeLog:

   * gcc.dg/vect/vect-fncall-mask.c: New.
---
  gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 
  gcc/tree-if-conv.cc  | 14 -
  2 files changed, 44 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
new file mode 100644
index 000..554488e0630
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast"
{ target { aarch64*-*-* } } } */
+
+extern int __attribute__ ((simd, const)) fn (int);
+
+const int N = 20;
+const float lim = 101.0;
+const float cst =  -1.0;
+float tot =   0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+ [10 ... 19] = 100.0 };/* Else branch.  */
+
+int main (void)
+{
+  #pragma omp simd
+  for (int i = 0; i < N; i += 1)
+{
+  if (a[i] > lim)
+ b[i] = cst;
+  else
+ b[i] = fn (a[i]);
+  tot += b[i];
+}
+  return (0);
+}
+
+/* { dg-final { scan-tree-dump {gimple_assign } ifcvt } } */
+/* { dg-final { scan-tree-dump {gimple_assign } ifcvt } } */
+/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} 
ifcvt } }
*/
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 0346a1376c5..246a6bb5bd1 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop)
This will cause the vectorizer to match the "in branch"
clone variants, and serves to build the mask vector
in a natural way.  */
+   tree mask = cond;
+   gimple_seq stmts = NULL;
 gcall *call = dyn_cast  (gsi_stmt (gsi));
 tree orig_fn = gimple_call_fn (call);
 int orig_nargs = gimple_call_num_args (call);
@@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop)
 args.safe_push (orig_fn);
 for (int i = 0; i < orig_nargs; i++)
   args.safe_push (gimple_call_arg (call, i));
-   args.safe_push (cond);
+   /* If `swap', we invert the mask used for the if branch for use
+  when masking the function call.  */
+   if (swap)
+ {
+   tree true_val
+ = constant_boolean_node (true, TREE_TYPE (mask));
+   mask = gimple_build (&stmts, BIT_XOR_EXPR,
+TREE_TYPE (mask), mask, true_val);
+ }
+   gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);


Looks like this mirrors what is currently being done for gimple_assign, but
you can move the gsi_insert_seq and the declaration of stmts into the if
block since they're only used there.

Otherwise looks good to me but can't approve.


OK.

The issue is also present on the 13 and 14 branches?  Can you see to
backport the fix?

Thanks,
Richard.


Of course, will look at getting it backported ASAP.

Many thanks,
Victor.


Thanks,
Tamar


+   args.safe_push (mask);

 /* Replace the call with a IFN_MASK_CALL that has the extra
condition parameter. */
--
2.34.1




Re: [PATCH] C/116735 - ICE in build_counted_by_ref

2024-10-02 Thread Jakub Jelinek
On Wed, Oct 02, 2024 at 11:48:16AM -0400, Marek Polacek wrote:
> > +  error_at (DECL_SOURCE_LOCATION (field_decl),
> > +   "argument %qE to the %qE attribute is not a field declaration"
> > +   " in the same structure as %qD", fieldname,
> > +   (get_attribute_name (attr_counted_by)),
> 
> Why use get_attribute_name when we know it must be "counted_by"?  And below
> too.

There might be a reason if the message would be used by multiple
spots with different attributes and the other uses would need that %qE,
rather than say %qs or % (to make it easier for translators).
If the message is only for this attribute, just use %, or
if it would be for several attributes but in each case you'd know the name
as constant literal, %qs with "counted_by" operand would be best.

That said, the ()s around the call are also superfluous, so if it isn't
changed, it should be just
get_attribute_name (attr_counted_by),

Jakub



[PATCH] middle-end: reorder masking priority of math functions

2024-10-02 Thread Victor Do Nascimento
Given the categorization of math built-in functions as `ECF_CONST',
when if-converting their uses, their calls are not masked and are thus
called with an all-true predicate.

This, however, is not appropriate where built-ins have library
equivalents, wherein they may exhibit highly architecture-specific
behaviors. For example, vectorized implementations may delegate the
computation of values outside a certain acceptable numerical range to
special (non-vectorized) routines which considerably slow down
computation.

As numerical simulation programs often perform bounds checks on input values
prior to math calls, conditionally assigning default output values for
out-of-bounds input and skipping the math call altogether, these
fallback implementations should seldom be called in the execution of
vectorized code.  If, however, we don't apply any masking to these
math functions, we end up effectively executing both if and else
branches for these values, leading to considerable performance
degradation on scientific workloads.
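
The cost argument can be sketched with a scalar model in C.  Everything
here is an assumption made for illustration: lane_math stands in for a
vector math routine, RANGE_LIMIT for the bound above which it takes a slow
fallback, and slow_path_calls simply counts how often that fallback would
be hit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bound above which the routine takes its slow fallback.  */
#define RANGE_LIMIT 101.0f

/* Hypothetical counter standing in for the cost of the slow fallback.  */
static size_t slow_path_calls;

/* Hypothetical lane-wise math routine.  The arithmetic is a placeholder;
   only the slow-path accounting matters for this sketch.  */
static float
lane_math (float x)
{
  if (x > RANGE_LIMIT)
    slow_path_calls++;
  return x * 0.5f;
}

/* Apply lane_math to N lanes.  With ALL_TRUE set, every lane is evaluated
   (the unmasked case); otherwise only lanes selected by MASK are.  */
static void
masked_math (const float *in, const bool *mask, bool all_true, float *out,
             size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (all_true || mask[i]) ? lane_math (in[i]) : 0.0f;
}
```

For inputs {500, 50, 600, 60} where only the in-range lanes are selected by
the mask, the all-true variant hits the slow path twice and the masked
variant never does, while both compute the same values for the selected
lanes.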

We therefore invert the order of handling of math function calls in
`if_convertible_stmt_p' to prioritize the handling of their
library-provided implementations over the equivalent internal function.

Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no
new regressions.

gcc/ChangeLog:

* tree-if-conv.cc (if_convertible_stmt_p): Check for explicit
function declaration before IFN fallback.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-fncall-mask-math.c: New.
---
 .../gcc.dg/vect/vect-fncall-mask-math.c   | 33 +++
 gcc/tree-if-conv.cc   | 18 +-
 2 files changed, 42 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c 
b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
new file mode 100644
index 000..15e22da2807
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c
@@ -0,0 +1,33 @@
+/* Test the correct application of masking to autovectorized math function 
calls.
+   Test is currently set to xfail pending the release of the relevant lmvec
+   support. */
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" 
{ target { aarch64*-*-* } } } */
+
+#include 
+
+const int N = 20;
+const float lim = 101.0;
+const float cst =  -1.0;
+float tot =   0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+   [10 ... 19] = 100.0 };/* Else branch.  */
+
+int main (void)
+{
+  #pragma omp simd
+  for (int i = 0; i < N; i += 1)
+{
+  if (a[i] > lim)
+   b[i] = cst;
+  else
+   b[i] = expf (a[i]);
+  tot += b[i];
+}
+  return (0);
+}
+
+/* { dg-final { scan-tree-dump-not { gimple_call } ifcvt { xfail 
{ aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, _30>} 
ifcvt { xfail { aarch64*-*-* } } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 3b04d1e8d34..90c754a4814 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt, 
vec refs)
 
 case GIMPLE_CALL:
   {
-   /* There are some IFN_s that are used to replace builtins but have the
-  same semantics.  Even if MASK_CALL cannot handle them vectorable_call
-  will insert the proper selection, so do not block conversion.  */
-   int flags = gimple_call_flags (stmt);
-   if ((flags & ECF_CONST)
-   && !(flags & ECF_LOOPING_CONST_OR_PURE)
-   && gimple_call_combined_fn (stmt) != CFN_LAST)
- return true;
-
tree fndecl = gimple_call_fndecl (stmt);
if (fndecl)
  {
@@ -1160,6 +1151,15 @@ if_convertible_stmt_p (gimple *stmt, 
vec refs)
  }
  }
 
+   /* There are some IFN_s that are used to replace builtins but have the
+  same semantics.  Even if MASK_CALL cannot handle them vectorable_call
+  will insert the proper selection, so do not block conversion.  */
+   int flags = gimple_call_flags (stmt);
+   if ((flags & ECF_CONST)
+   && !(flags & ECF_LOOPING_CONST_OR_PURE)
+   && gimple_call_combined_fn (stmt) != CFN_LAST)
+ return true;
+
return false;
   }
 
-- 
2.34.1



Re: [PATCH] Fix const constraint in std::stable_sort and std::inplace_merge

2024-10-02 Thread Jonathan Wakely
On Wed, 25 Sept 2024 at 18:22, François Dumont  wrote:
>
> Hi
>
> Once https://gcc.gnu.org/pipermail/libstdc++/2024-September/059568.html
> is accepted, we will be able to fix the long-standing issue that
> std::stable_sort and std::inplace_merge force the functor to take
> const& parameters even when the iterators used in the range are not const ones.

https://cplusplus.github.io/LWG/issue3031 said that's OK.



Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

2024-10-02 Thread Matthew Malcomson
Thanks Jonathan,

I agree with your point that having just the check against one of the 
overloaded versions is not very robust and having multiple checks against 
different versions would be better.

Unfortunately - while asking the clang folk about this I realised that clang 
doesn't expose the resolved versions (e.g. the existing versions like 
`__atomic_load_2` etc) to the user.
Instead they allow using SFINAE on these overloaded builtins.
https://discourse.llvm.org/t/atomic-floating-point-operations-and-libstdc/81461

I spent some time looking at this and it seems that enabling SFINAE in GCC for 
these builtins is not too problematic (idea being to pass in a 
`error_on_noresolve` boolean to `resolve_overloaded_builtin` based on the 
context in the C++ frontend, then only emit errors if that boolean is set).

To Jonathan:

  * Would you be OK with using SFINAE to choose whether to use the
    __atomic_fetch_add builtin for floating point types in libstdc++?

To the C++ frontend maintainers I Cc'd in:

  * Are you happy with the idea of enabling SFINAE on overloaded builtins
    resolved via resolve_overloaded_builtin?

To the global maintainers I Cc'd in:

  * Is there any reason you know of not to enable SFINAE on the overloaded
    builtins?
  * Would it be OK to enable SFINAE on the generic overloaded builtins and add
    the parameter so that targets can do the same for their target-specific
    builtins (i.e. without changing the behaviour of the existing
    target-specific builtins)?


From: Jonathan Wakely 
Sent: 19 September 2024 3:47 PM
To: Matthew Malcomson 
Cc: gcc-patches@gcc.gnu.org ; Joseph Myers 
; Richard Biener 
Subject: Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins


On Thu, 19 Sept 2024 at 14:12,  wrote:
>
> From: Matthew Malcomson 
>
> Hello, this is an RFC for adding an atomic floating point fetch_add builtin
> (and variants) to GCC.  The atomic fetch_add operation is defined to work
> on the base floating point types in the C++20 standard chapter 31.7.3, and
> extended to work for all cv-unqualified floating point types in C++23
> chapter 33.5.7.4.
>
> Honestly not sure who to Cc, please do point me to someone else if that's
> better.
>
> This is nowhere near complete (for one thing even the tests I've added
> don't fully pass), but I think I have a complete enough idea that it's
> worth checking if this is something that could be agreed on.
>
> As it stands no target except the nvptx backend would natively support
> these operations.
>
> Main questions that I'm looking to resolve with this RFC:
> 1) Would GCC be OK accepting this implementation even though no backend
>would be implementing these yet?
>- AIUI only the nvptx backend could theoretically implement this.
>- Even without a backend implementing it natively, the ability to use
>  this in code (especially libstdc++) enables other compilers to
>  generate better code for GPU's using standard C++.
> 2) Would libstdc++ be OK relying on `__has_builtin(__atomic_fetch_add_fp)`
>(i.e. a check on the resolved builtin rather than the more user-facing
>one) in order to determine whether floating point atomic fetch_add is
>available.

Yes, if that name is what other compilers will also use (have you
discussed this with Clang?)

It looks like PATCH 5/8 only uses the _fp name for fetch_add though,
and just uses fetch_sub etc. for the other functions, is that a
mistake?

>- N.b. this builtin is actually the builtin working on the "double"

OK, so the library code just calls the generic __atomic_fetch_add that
accepts any types, but then that gets expanded to a more specific form
for float, double etc.?
And the more specific form has to exist at some level, because we need
an extern symbol from libatomic, so either we include the type as an
explicit suffix on the name, or we use some kind of name mangling like
_Z18__atomic_fetch_addPdS_S_, which is obviously nasty.

>  type, one would have to rely on any compilers implementing that
>  particular resolved builtin to also implement the other floating point
>  atomic fetch_add builtins that they would want to support in libstdc++
>  `atomic<[floating_point_type]>::fetch_add`.

This seems a bit concerning. I can imagine somebody implementing these
for float and double first, but leaving long double, _Float64,
_Float32, _Float128 etc. for later. In that case, libstdc++ would not
work if somebody tries to use std::atomic, or whichever
types aren't supported yet. It's OK if we can be *sure* that won't
happen i.e. that Clang will either implement the new built-in for
*all* FP types, or none.
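
For context, when no native floating-point fetch_add is available, the
operation can still be expressed portably as a compare-exchange loop.  The
C11 sketch below shows that general technique only; fetch_add_float is a
hypothetical helper, and this is not meant to describe what libstdc++ or
the proposed builtin actually does.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically add VALUE to *OBJ and return the value
   that was stored before the addition, using a compare-exchange loop.  */
static float
fetch_add_float (_Atomic float *obj, float value)
{
  float expected = atomic_load_explicit (obj, memory_order_relaxed);

  /* On failure, the compare-exchange stores the current contents of *OBJ
     into EXPECTED, so each retry recomputes the sum from fresh data.  */
  while (!atomic_compare_exchange_weak_explicit (obj, &expected,
                                                 expected + value,
                                                 memory_order_seq_cst,
                                                 memory_order_relaxed))
    ;
  return expected;
}
```

A native instruction, where a target has one, would replace the whole retry
loop with a single operation, which is the motivation for exposing a
dedicated builtin to the library.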

>
> More specific questions about the choice of which builtins to implement and
> whether the types are OK:
> 1) Is it OK to not implement the `__sync_*` versions?
>Since these are deprecated and the `__atomic_*` versions are there to
>match the C/C++ code atomic op

[patch,avr,applied] Make gcc.dg/pr113596.c work on AVR

2024-10-02 Thread Georg-Johann Lay

gcc.dg/pr113596.c alloca'tes up to 8 KiB on the stack,
which is too much for AVR.  This patch requests less
memory on AVR.

Johann

--

AVR: Make gcc.dg/pr113596.c work.

gcc/testsuite/
* gcc.dg/pr113596.c: Require less memory so it works on AVR.

diff --git a/gcc/testsuite/gcc.dg/pr113596.c 
b/gcc/testsuite/gcc.dg/pr113596.c

index 19e0ab6dc46..3655ffef3f9 100644
--- a/gcc/testsuite/gcc.dg/pr113596.c
+++ b/gcc/testsuite/gcc.dg/pr113596.c
@@ -16,9 +16,17 @@ foo (int n)
   bar (p, n);
 }

+#if defined __AVR__
+/* For AVR devices, AVRtest assigns 8 KiB of stack, which is not quite
+   enough for this test case.  Thus request less memory on AVR.  */
+#define ALLOC 6000
+#else
+#define ALLOC 8192
+#endif
+
 int
 main ()
 {
-  for (int i = 2; i < 8192; ++i)
+  for (int i = 2; i < ALLOC; ++i)
 foo (i);
 }


[PATCH v4 3/7] OpenMP: C front-end support for dispatch + adjust_args

2024-10-02 Thread Paul-Antoine Arras
This patch adds support to the C front-end to parse the `dispatch` construct and
the `adjust_args` clause. It also includes some common C/C++ bits for pragmas
and attributes.

Additional common C/C++ testcases are in a later patch in the series.

gcc/c-family/ChangeLog:

* c-attribs.cc (c_common_gnu_attributes): Add attribute for adjust_args
need_device_ptr.
* c-omp.cc (c_omp_directives): Uncomment dispatch.
* c-pragma.cc (omp_pragmas): Add dispatch.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_DISPATCH.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_NOCONTEXT and
PRAGMA_OMP_CLAUSE_NOVARIANTS.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_dispatch): New function.
(c_parser_omp_clause_name): Handle nocontext and novariants clauses.
(c_parser_omp_clause_novariants): New function.
(c_parser_omp_clause_nocontext): Likewise.
(c_parser_omp_all_clauses): Handle nocontext and novariants clauses.
(c_parser_omp_dispatch_body): New function adapted from
c_parser_expr_no_commas.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(c_parser_omp_dispatch): New function.
(c_finish_omp_declare_variant): Parse adjust_args.
(c_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/adjust-args-1.c: New test.
* gcc.dg/gomp/dispatch-1.c: New test.
---
 gcc/c-family/c-attribs.cc |   2 +
 gcc/c-family/c-omp.cc |   4 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   3 +
 gcc/c/c-parser.cc | 536 +++---
 gcc/c/c-typeck.cc |   2 +
 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c |  32 ++
 gcc/testsuite/gcc.dg/gomp/dispatch-1.c|  53 +++
 libgomp/testsuite/libgomp.c/dispatch-1.c  |  76 +++
 libgomp/testsuite/libgomp.c/dispatch-2.c  |  84 
 10 files changed, 733 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/dispatch-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/dispatch-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/dispatch-2.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 4dd2eecbea5..fab9b5b8b23 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -571,6 +571,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
  handle_omp_declare_variant_attribute, NULL },
   { "omp declare variant variant", 0, -1, true,  false, false, false,
  handle_omp_declare_variant_attribute, NULL },
+  { "omp declare variant adjust_args need_device_ptr", 0, -1, true,  false, 
false, false,
+ handle_omp_declare_variant_attribute, NULL },
   { "simd",  0, 1, true,  false, false, false,
  handle_simd_attribute, NULL },
   { "omp declare target", 0, -1, true, false, false, false,
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 620a3c1353a..5a0ed636677 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -4300,8 +4300,8 @@ const struct c_omp_directive c_omp_directives[] = {
 C_OMP_DIR_DECLARATIVE, false },
   { "depobj", nullptr, nullptr, PRAGMA_OMP_DEPOBJ,
 C_OMP_DIR_STANDALONE, false },
-  /* { "dispatch", nullptr, nullptr, PRAGMA_OMP_DISPATCH,
-C_OMP_DIR_CONSTRUCT, false },  */
+  { "dispatch", nullptr, nullptr, PRAGMA_OMP_DISPATCH,
+C_OMP_DIR_DECLARATIVE, false },
   { "distribute", nullptr, nullptr, PRAGMA_OMP_DISTRIBUTE,
 C_OMP_DIR_CONSTRUCT, true },
   { "end", "assumes", nullptr, PRAGMA_OMP_END,
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index ed2a7a00e9e..040370cbb6f 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1526,6 +1526,7 @@ static const struct omp_pragma_def omp_pragmas[] = {
   { "cancellation", PRAGMA_OMP_CANCELLATION_POINT },
   { "critical", PRAGMA_OMP_CRITICAL },
   { "depobj", PRAGMA_OMP_DEPOBJ },
+  { "dispatch", PRAGMA_OMP_DISPATCH },
   { "error", PRAGMA_OMP_ERROR },
   { "end", PRAGMA_OMP_END },
   { "flush", PRAGMA_OMP_FLUSH },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 2ebde06c471..6b6826b2426 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,7 @@ enum pragma_kind {
   PRAGMA_OMP_CRITICAL,
   PRAGMA_OMP_DECLARE,
   PRAGMA_OMP_DEPOBJ,
+  PRAGMA_OMP_DISPATCH,
   PRAGMA_OMP_DISTRIBUTE,
   PRAGMA_OMP_ERROR,
   PRAGMA_OMP_END,
@@ -135,9 +136,11 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_LINK,
   PRAGMA_OMP_CLAUSE_MAP,
   PRAGMA_OMP_CLAUSE_MERGEABLE,
+  PRAGMA_OMP_CLAUSE_NOCONTEXT,
   PRAGMA_OMP_CLAUSE_NOGROUP,
   PRAGMA_OMP_CLAUSE_NONTEMPORAL,
   PRAGMA_OM

[PATCH v4 1/7] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces

2024-10-02 Thread Paul-Antoine Arras
This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.

gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.

gcc/fortran/ChangeLog:

* types.def (BT_FN_PTR_CONST_PTR_INT): Declare.
---
 gcc/builtin-types.def|  1 +
 gcc/fortran/types.def|  1 +
 gcc/omp-selectors.h  |  1 +
 gcc/tree-core.h  |  7 +++
 gcc/tree-pretty-print.cc | 21 +
 gcc/tree.cc  |  4 
 gcc/tree.def |  5 +
 gcc/tree.h   |  7 +++
 8 files changed, 47 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..ef7aaf67d13 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, 
BT_FEXCEPT_T_PTR,
 DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT,
 BT_CONST_FEXCEPT_T_PTR, BT_INT)
 DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)
 
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 390cc9542f7..5047c8f816a 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -120,6 +120,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_INT_BOOL, BT_BOOL, BT_INT, 
BT_BOOL)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE,
 BT_VOID, BT_PTR, BT_PTRMODE)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)
 
diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h
index 730021ea747..447c3b8173f 100644
--- a/gcc/omp-selectors.h
+++ b/gcc/omp-selectors.h
@@ -56,6 +56,7 @@ enum omp_ts_code {
   OMP_TRAIT_CONSTRUCT_PARALLEL,
   OMP_TRAIT_CONSTRUCT_FOR,
   OMP_TRAIT_CONSTRUCT_SIMD,
+  OMP_TRAIT_CONSTRUCT_DISPATCH,
   OMP_TRAIT_LAST,
   OMP_TRAIT_INVALID = -1
 };
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 4ba63ebd4f1..b7c92daa1e6 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -542,6 +542,13 @@ enum omp_clause_code {
 
   /* OpenACC clause: nohost.  */
   OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: novariants (scalar-expression).  */
+  OMP_CLAUSE_NOVARIANTS,
+
+  /* OpenMP clause: nocontext (scalar-expression).  */
+  OMP_CLAUSE_NOCONTEXT,
+
 };
 
 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index b378ffbfb4c..61cd8708524 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, 
dump_flags_t flags)
 case OMP_CLAUSE_EXCLUSIVE:
   name = "exclusive";
   goto print_remap;
+case OMP_CLAUSE_NOVARIANTS:
+  pp_string (pp, "novariants");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
+case OMP_CLAUSE_NOCONTEXT:
+  pp_string (pp, "nocontext");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
 case OMP_CLAUSE__LOOPTEMP_:
   name = "_looptemp_";
   goto print_remap;
@@ -3947,6 +3963,11 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
   dump_omp_clauses (pp, OMP_SECTIONS_CLAUSES (node), spc, flags);
   goto dump_omp_body;
 
+case OMP_DISPATCH:
+  pp_string (pp, "#pragma omp dispatch");
+  dump_omp_clauses (pp, OMP_DISPATCH_CLAUSES (node), spc, flags);
+  goto dump_omp_body;
+
 case OMP_SECTION:
   pp_string (pp, "#pragma omp section");
   goto dump_omp_body;
diff --git a/gcc/tree.cc b/gcc/tree.cc

[PATCH v4 0/7] OpenMP: dispatch + adjust_args support

2024-10-02 Thread Paul-Antoine Arras
This is a respin of my patchset implementing both the `dispatch` construct and 
the `adjust_args` clause to the `declare variant` directive. The previous
submission can be found here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659719.html

Compared to v3, this new iteration handles Tobias's comments on the ME patch
(https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660921.html). In 
particular, it defines and uses a new internal function (namely 
IFN_GOMP_DISPATCH) to allow the middle end to more easily and accurately find 
the dispatch call. This is important when the single function call we are 
interested in is sandwiched in a sequence of pre- and post-call statements, 
or when the same function is called several times within the dispatch body.
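
For readers unfamiliar with the construct, here is a minimal usage sketch in C.
The names scale_base, scale_variant and run_dispatch are hypothetical and do
not come from the patch series. Both functions compute the same result, so the
outcome is the same whether or not the compiler redirects the call to the
variant (without OpenMP support the pragmas are simply ignored).

```c
#include <assert.h>

/* Hypothetical variant, intended for calls made under "omp dispatch".  */
int scale_variant (int n, int *data);

/* Base function; the directive below names scale_variant as its variant
   when the call appears in a dispatch construct.  */
#pragma omp declare variant (scale_variant) match (construct={dispatch})
int scale_base (int n, int *data);

int
scale_variant (int n, int *data)
{
  for (int i = 0; i < n; i++)
    data[i] *= 2;
  return n;
}

int
scale_base (int n, int *data)
{
  for (int i = 0; i < n; i++)
    data[i] += data[i];
  return n;
}

/* Doubles four elements through a dispatch call and returns their sum.  */
int
run_dispatch (void)
{
  int data[4] = { 1, 2, 3, 4 };
  int n = 0;

  /* With dispatch support enabled, this call may be redirected to
     scale_variant; otherwise scale_base runs.  */
  #pragma omp dispatch
  n = scale_base (4, data);

  assert (n == 4);
  return data[0] + data[1] + data[2] + data[3];
}
```

Calling run_dispatch () yields the sum of the doubled elements in either case.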


Paul-Antoine Arras (7):
  OpenMP: dispatch + adjust_args tree data structures and front-end
interfaces
  OpenMP: middle-end support for dispatch + adjust_args
  OpenMP: C front-end support for dispatch + adjust_args
  OpenMP: C++ front-end support for dispatch + adjust_args
  OpenMP: common C/C++ testcases for dispatch + adjust_args
  OpenMP: Fortran front-end support for dispatch + adjust_args
  OpenMP: update documentation for dispatch and adjust_args

 gcc/builtin-types.def |   1 +
 gcc/c-family/c-attribs.cc |   2 +
 gcc/c-family/c-omp.cc |   4 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   3 +
 gcc/c/c-parser.cc | 536 +--
 gcc/c/c-typeck.cc |   2 +
 gcc/cp/decl.cc|   7 +
 gcc/cp/parser.cc  | 644 --
 gcc/cp/semantics.cc   |  20 +
 gcc/fortran/dump-parse-tree.cc|  17 +
 gcc/fortran/frontend-passes.cc|   2 +
 gcc/fortran/gfortran.h|  12 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 195 +-
 gcc/fortran/parse.cc  |  51 +-
 gcc/fortran/resolve.cc|   2 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-decl.cc |   9 +-
 gcc/fortran/trans-openmp.cc   | 197 ++
 gcc/fortran/trans.cc  |   1 +
 gcc/fortran/types.def |   1 +
 gcc/gimple-low.cc |   1 +
 gcc/gimple-pretty-print.cc|  33 +
 gcc/gimple-walk.cc|   1 +
 gcc/gimple.cc |  20 +
 gcc/gimple.def|   5 +
 gcc/gimple.h  |  33 +-
 gcc/gimplify.cc   | 484 -
 gcc/gimplify.h|   1 +
 gcc/internal-fn.cc|   8 +
 gcc/internal-fn.def   |   1 +
 gcc/omp-builtins.def  |   6 +
 gcc/omp-general.cc|  14 +-
 gcc/omp-low.cc|  35 +
 gcc/omp-selectors.h   |   1 +
 .../c-c++-common/gomp/adjust-args-1.c |  30 +
 .../c-c++-common/gomp/adjust-args-2.c |  31 +
 .../c-c++-common/gomp/declare-variant-2.c |   4 +-
 gcc/testsuite/c-c++-common/gomp/dispatch-1.c  |  71 ++
 gcc/testsuite/c-c++-common/gomp/dispatch-2.c  |  28 +
 gcc/testsuite/c-c++-common/gomp/dispatch-3.c  |  12 +
 gcc/testsuite/c-c++-common/gomp/dispatch-4.c  |  18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-5.c  |  34 +
 gcc/testsuite/c-c++-common/gomp/dispatch-6.c  |  18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-7.c  |  21 +
 gcc/testsuite/c-c++-common/gomp/dispatch-8.c  |  63 ++
 gcc/testsuite/c-c++-common/gomp/dispatch-9.c  |  17 +
 gcc/testsuite/g++.dg/gomp/adjust-args-1.C |  39 ++
 gcc/testsuite/g++.dg/gomp/adjust-args-2.C |  51 ++
 gcc/testsuite/g++.dg/gomp/dispatch-1.C|  53 ++
 gcc/testsuite/g++.dg/gomp/dispatch-2.C|  62 ++
 gcc/testsuite/g++.dg/gomp/dispatch-3.C|  17 +
 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c |  32 +
 gcc/testsuite/gcc.dg/gomp/dispatch-1.c|  53 ++
 .../gfortran.dg/gomp/adjust-args-1.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-2.f90|  18 +
 .../gfortran.dg/gomp/adjust-args-3.f90|  27 +
 .../gfortran.dg/gomp/adjust-args-4.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-5.f90|  58 ++
 .../gfortran.dg/gomp/declare-variant-2.f90|   6 +-
 .../gomp/declare-variant-21-aux.f90   |  25 +
 .../gfortran.dg/gomp/declare-variant-21.f90   |  22 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-1.f90 |  77 +++
 .../gfortran.dg/gomp/dispatch-10.f90  |  21 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-2.f90 |  79 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-3.f90 |  39 ++
 

[PATCH v4 5/7] OpenMP: common C/C++ testcases for dispatch + adjust_args

2024-10-02 Thread Paul-Antoine Arras
gcc/testsuite/ChangeLog:

* c-c++-common/gomp/declare-variant-2.c: Adjust dg-error directives.
* c-c++-common/gomp/adjust-args-1.c: New test.
* c-c++-common/gomp/adjust-args-2.c: New test.
* c-c++-common/gomp/dispatch-1.c: New test.
* c-c++-common/gomp/dispatch-2.c: New test.
* c-c++-common/gomp/dispatch-3.c: New test.
* c-c++-common/gomp/dispatch-4.c: New test.
* c-c++-common/gomp/dispatch-5.c: New test.
* c-c++-common/gomp/dispatch-6.c: New test.
* c-c++-common/gomp/dispatch-7.c: New test.
* c-c++-common/gomp/dispatch-8.c: New test.
---
 .../c-c++-common/gomp/adjust-args-1.c | 30 
 .../c-c++-common/gomp/adjust-args-2.c | 31 
 .../c-c++-common/gomp/declare-variant-2.c |  4 +-
 gcc/testsuite/c-c++-common/gomp/dispatch-1.c  | 71 +++
 gcc/testsuite/c-c++-common/gomp/dispatch-2.c  | 28 
 gcc/testsuite/c-c++-common/gomp/dispatch-3.c  | 12 
 gcc/testsuite/c-c++-common/gomp/dispatch-4.c  | 18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-5.c  | 34 +
 gcc/testsuite/c-c++-common/gomp/dispatch-6.c  | 18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-7.c  | 21 ++
 gcc/testsuite/c-c++-common/gomp/dispatch-8.c  | 63 
 gcc/testsuite/c-c++-common/gomp/dispatch-9.c  | 17 +
 .../dispatch-1.c  |  0
 .../dispatch-2.c  |  0
 14 files changed, 345 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-7.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-8.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-9.c
 rename libgomp/testsuite/{libgomp.c => libgomp.c-c++-common}/dispatch-1.c 
(100%)
 rename libgomp/testsuite/{libgomp.c => libgomp.c-c++-common}/dispatch-2.c 
(100%)

diff --git a/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c 
b/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
new file mode 100644
index 000..728abe62092
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+int f (int a, void *b, float c[2]);
+
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args 
(nothing: a) adjust_args (need_device_ptr: b, c)
+int f0 (int a, void *b, float c[2]);
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args 
(nothing: a) adjust_args (need_device_ptr: b) adjust_args (need_device_ptr: c)
+int f1 (int a, void *b, float c[2]);
+
+int test () {
+  int a;
+  void *b;
+  float c[2];
+  struct {int a;} s;
+
+  s.a = f0 (a, b, c);
+  #pragma omp dispatch
+  s.a = f0 (a, b, c);
+
+  f1 (a, b, c);
+  #pragma omp dispatch
+  s.a = f1 (a, b, c);
+
+  return s.a;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_omp_get_default_device 
\\(\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(&c, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(b, D\.\[0-9]+\\);" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c 
b/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
new file mode 100644
index 000..d2a4a5f4ec4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+int f (int a, void *b, float c[2]);
+
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args 
(nothing: a) adjust_args (need_device_ptr: b, c)
+int f0 (int a, void *b, float c[2]);
+#pragma omp declare variant (f) adjust_args (need_device_ptr: b, c) match 
(construct={dispatch}) adjust_args (nothing: a) 
+int f1 (int a, void *b, float c[2]);
+
+void test () {
+  int a;
+  void *b;
+  float c[2];
+
+  #pragma omp dispatch
+  f0 (a, b, c);
+
+  #pragma omp dispatch device (-4852)
+  f0 (a, b, c);
+
+  #pragma omp dispatch device (a + a)
+  f0 (a, b, c);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_omp_get_default_device 
\\(\\);" 3 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(&c, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(b, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-d

Re: [PATCH] C/116735 - ICE in build_counted_by_ref

2024-10-02 Thread Qing Zhao



> On Oct 2, 2024, at 11:48, Marek Polacek  wrote:
> 
> On Wed, Oct 02, 2024 at 03:26:26PM +, Qing Zhao wrote:
>> From: qing zhao 
>> 
>> When handling the counted_by attribute, if the corresponding field
>> doesn't exist, in addition to issuing an error, we should also remove
>> the already added non-existing "counted_by" attribute from the
>> field_decl.
>> 
>> bootstrapped and regression tested on both x86 and aarch64.
>> Okay for committing?
>> 
>> thanks.
>> 
>> Qing
> 
> For next time, the subject should look more like:
> [PATCH] c: ICE in build_counted_by_ref [PR116735]

Okay.
> 
>> ==
>> 
>> C/PR 116735
> 
> This needs to be PR c/116735
Okay.
> 
>> gcc/c/ChangeLog:
>> 
>> * c-decl.cc (verify_counted_by_attribute): Remove the attribute
>> when error.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/flex-array-counted-by-pr116735.c: New test.
>> ---
>> gcc/c/c-decl.cc   | 31 ---
>> .../gcc.dg/flex-array-counted-by-pr116735.c   | 19 
>> 2 files changed, 38 insertions(+), 12 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-pr116735.c
>> 
>> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
>> index aa7f69d1b7b..ce28b0a1022 100644
>> --- a/gcc/c/c-decl.cc
>> +++ b/gcc/c/c-decl.cc
>> @@ -9502,14 +9502,18 @@ verify_counted_by_attribute (tree struct_type, tree 
>> field_decl)
>> 
>>   tree counted_by_field = lookup_field (struct_type, fieldname);
>> 
>> -  /* Error when the field is not found in the containing structure.  */
>> +  /* Error when the field is not found in the containing structure and
>> + remove the corresponding counted_by attribute from the field_decl.  */
>>   if (!counted_by_field)
>> -error_at (DECL_SOURCE_LOCATION (field_decl),
>> -   "argument %qE to the %qE attribute is not a field declaration"
>> -   " in the same structure as %qD", fieldname,
>> -   (get_attribute_name (attr_counted_by)),
>> -   field_decl);
>> -
>> +{
>> +  error_at (DECL_SOURCE_LOCATION (field_decl),
>> + "argument %qE to the %qE attribute is not a field declaration"
>> + " in the same structure as %qD", fieldname,
>> + (get_attribute_name (attr_counted_by)),
> 
> Why use get_attribute_name when we know it must be "counted_by"?  And below
> too.

> 
>> + field_decl);
>> +  DECL_ATTRIBUTES (field_decl)
>> + = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
>> +}
> 
> LGTM.
> 
>>   else
>>   /* Error when the field is not with an integer type.  */
>> {
>> @@ -9518,11 +9522,14 @@ verify_counted_by_attribute (tree struct_type, tree 
>> field_decl)
>>   tree real_field = TREE_VALUE (counted_by_field);
>> 
>>   if (!INTEGRAL_TYPE_P (TREE_TYPE (real_field)))
>> - error_at (DECL_SOURCE_LOCATION (field_decl),
>> -   "argument %qE to the %qE attribute is not a field declaration"
>> -   " with an integer type", fieldname,
>> -   (get_attribute_name (attr_counted_by)));
>> -
>> + {
>> +   error_at (DECL_SOURCE_LOCATION (field_decl),
>> + "argument %qE to the %qE attribute is not a field declaration"
> 
> This line is too long now.

Okay, will fix this.
> 
>> + " with an integer type", fieldname,
>> + (get_attribute_name (attr_counted_by)));
>> +   DECL_ATTRIBUTES (field_decl)
>> + = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
>> + }
>> }
> 
> Is there a test for this second hunk?
Will add one.
> 
>>   return;
> 
> This return is pointless.
> 
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-pr116735.c 
>> b/gcc/testsuite/gcc.dg/flex-array-counted-by-pr116735.c
>> new file mode 100644
>> index 000..958636512b7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-pr116735.c
> 
> Please rename this to flex-array-counted-by-9.c.

Sure.
> 
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
> 
> Please add /* PR c/116735 */ here.
sure.
> 
>> +/* { dg-options "-O" } */
> 
> Why -O?
Deleting the optimization flag should still reproduce the issue; I will delete it.

thanks.

Qing
> 
>> +struct foo {
>> +  int len;
>> +  int element[] __attribute__ ((__counted_by__ (lenx))); /* { dg-error 
>> "attribute is not a field declaration in the same structure as" } */
>> +};
>> +
>> +int main ()
>> +{
>> +  struct foo *p = __builtin_malloc (sizeof (struct foo) + 3 * sizeof (int));
>> +  p->len = 1;
>> +  p->element[0] = 17;
>> +  p->len = 2;
>> +  p->element[1] = 13;
>> +  p->len = 1;
>> +  int x = p->element[1];
>> +  return x;
>> +}
>> +
>> -- 
>> 2.43.5
>> 
> 
> Thanks,
> 
> Marek
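
For contrast with the failing testcase above, a minimal sketch of a well-formed
use of the attribute, where the argument names a field that does exist in the
same structure. The COUNTED_BY macro and the foo_alloc and foo_demo helpers are
hypothetical; the macro expands to nothing on compilers that do not report
support for the attribute.

```c
#include <assert.h>
#include <stdlib.h>

/* Use the attribute only where the compiler reports support for it.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct foo {
  int len;
  int element[] COUNTED_BY (len);   /* "len" is a field of struct foo.  */
};

/* Allocate a struct foo with room for N elements and record N in len
   before any element is accessed.  */
struct foo *
foo_alloc (int n)
{
  assert (n >= 0);
  struct foo *p = malloc (sizeof (struct foo) + (size_t) n * sizeof (int));
  if (p)
    p->len = n;
  return p;
}

/* Fill three elements and return the last one, or -1 if allocation fails.  */
int
foo_demo (void)
{
  struct foo *p = foo_alloc (3);
  if (!p)
    return -1;
  for (int i = 0; i < p->len; i++)
    p->element[i] = 10 + i;
  int last = p->element[p->len - 1];
  free (p);
  return last;
}
```

Setting len before touching element keeps the count and the accesses
consistent, which is the relationship the attribute is meant to describe.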




[PATCH v4 6/7] OpenMP: Fortran front-end support for dispatch + adjust_args

2024-10-02 Thread Paul-Antoine Arras
This patch adds support for the `dispatch` construct and the `adjust_args`
clause to the Fortran front-end.

Handling of `adjust_args` across translation units is missing due to PR115271.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_clauses): Handle novariants and nocontext
clauses.
(show_omp_node): Handle EXEC_OMP_DISPATCH.
(show_code_node): Likewise.
* frontend-passes.cc (gfc_code_walker): Handle novariants and nocontext.
* gfortran.h (enum gfc_statement): Add ST_OMP_DISPATCH.
(symbol_attribute): Add omp_declare_variant_need_device_ptr.
(gfc_omp_clauses): Add novariants and nocontext.
(gfc_omp_declare_variant): Add need_device_ptr_arg_list.
(enum gfc_exec_op): Add EXEC_OMP_DISPATCH.
* match.h (gfc_match_omp_dispatch): Declare.
* openmp.cc (gfc_free_omp_clauses): Free novariants and nocontext
clauses.
(gfc_free_omp_declare_variant_list): Free need_device_ptr_arg_list
namelist.
(enum omp_mask2): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(OMP_DISPATCH_CLAUSES): Define.
(gfc_match_omp_dispatch): New function.
(gfc_match_omp_declare_variant): Parse adjust_args.
(resolve_omp_clauses): Handle adjust_args, novariants and nocontext.
Adjust handling of OMP_LIST_IS_DEVICE_PTR.
(icode_code_error_callback): Handle EXEC_OMP_DISPATCH.
(omp_code_to_statement): Likewise.
(resolve_omp_dispatch): New function.
(gfc_resolve_omp_directive): Handle EXEC_OMP_DISPATCH.
* parse.cc (decode_omp_directive): Match dispatch.
(next_statement): Handle ST_OMP_DISPATCH.
(gfc_ascii_statement): Likewise.
(parse_omp_dispatch): New function.
(parse_executable): Handle ST_OMP_DISPATCH.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_DISPATCH.
* st.cc (gfc_free_statement): Likewise.
* trans-decl.cc (create_function_arglist): Declare.
(gfc_get_extern_function_decl): Call it.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle novariants and
nocontext.
(replace_omp_dispatch_call): New function.
(gfc_trans_omp_dispatch): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_DISPATCH.
(gfc_trans_omp_declare_variant): Handle adjust_args.
* trans.cc (trans_code): Handle EXEC_OMP_DISPATCH.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/declare-variant-2.f90: Update dg-error.
* gfortran.dg/gomp/declare-variant-21.f90: New test (xfail).
* gfortran.dg/gomp/declare-variant-21-aux.f90: New test.
* gfortran.dg/gomp/adjust-args-1.f90: New test.
* gfortran.dg/gomp/adjust-args-2.f90: New test.
* gfortran.dg/gomp/adjust-args-3.f90: New test.
* gfortran.dg/gomp/adjust-args-4.f90: New test.
* gfortran.dg/gomp/adjust-args-5.f90: New test.
* gfortran.dg/gomp/dispatch-1.f90: New test.
* gfortran.dg/gomp/dispatch-2.f90: New test.
* gfortran.dg/gomp/dispatch-3.f90: New test.
* gfortran.dg/gomp/dispatch-4.f90: New test.
* gfortran.dg/gomp/dispatch-5.f90: New test.
* gfortran.dg/gomp/dispatch-6.f90: New test.
* gfortran.dg/gomp/dispatch-7.f90: New test.
* gfortran.dg/gomp/dispatch-8.f90: New test.
* gfortran.dg/gomp/dispatch-9.f90: New test.
* gfortran.dg/gomp/dispatch-10.f90: New test.
---
 gcc/fortran/dump-parse-tree.cc|  17 ++
 gcc/fortran/frontend-passes.cc|   2 +
 gcc/fortran/gfortran.h|  12 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 195 +++--
 gcc/fortran/parse.cc  |  51 -
 gcc/fortran/resolve.cc|   2 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-decl.cc |   9 +-
 gcc/fortran/trans-openmp.cc   | 197 ++
 gcc/fortran/trans.cc  |   1 +
 .../gfortran.dg/gomp/adjust-args-1.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-2.f90|  18 ++
 .../gfortran.dg/gomp/adjust-args-3.f90|  27 +++
 .../gfortran.dg/gomp/adjust-args-4.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-5.f90|  58 ++
 .../gfortran.dg/gomp/declare-variant-2.f90|   6 +-
 .../gomp/declare-variant-21-aux.f90   |  25 +++
 .../gfortran.dg/gomp/declare-variant-21.f90   |  22 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-1.f90 |  77 +++
 .../gfortran.dg/gomp/dispatch-10.f90  |  21 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-2.f90 |  79 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-3.f90 |  39 
 gcc/testsuite/gfortran.dg/gomp/dispatch-4.f90 |  19 ++
 gcc/testsuite/gfortran.d

[PATCH v4 2/7] OpenMP: middle-end support for dispatch + adjust_args

2024-10-02 Thread Paul-Antoine Arras
This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists in
emitting a call to `gomp_get_mapped_ptr` for the adequate device.

For dispatch, the following steps are performed:

* Handle the device clause, if any: set the default-device ICV at the top of the
dispatch region and restore its previous value at the end.

* Handle novariants and nocontext clauses, if any. Evaluate compile-time
constants and select a variant, if possible. Otherwise, emit code to handle all
possible cases at run time.

* If depend clauses are present, add a taskwait construct before the dispatch
region and move them there.
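
The device-clause step above can be pictured with the following C sketch. It
is only an illustration of the save/set/restore pattern, not the code the
gimplifier emits: the mock_* functions stand in for the OpenMP runtime routines
omp_get_default_device and omp_set_default_device, and callee and
dispatch_on_device are hypothetical names.

```c
#include <assert.h>

/* Stand-in for the default-device ICV and its accessors.  */
int mock_default_device = 0;

int
mock_get_default_device (void)
{
  return mock_default_device;
}

void
mock_set_default_device (int dev)
{
  mock_default_device = dev;
}

/* Hypothetical callee that records the default device it observes.  */
int device_seen_by_callee = -1;

void
callee (void)
{
  device_seen_by_callee = mock_get_default_device ();
}

/* Rough shape of "#pragma omp dispatch device (dev)" around a call to
   callee: save the ICV, set it for the region, restore it afterwards.  */
void
dispatch_on_device (int dev)
{
  int saved = mock_get_default_device ();
  mock_set_default_device (dev);
  callee ();
  mock_set_default_device (saved);
  assert (mock_get_default_device () == saved);
}
```

After dispatch_on_device (3), the callee has observed device 3 while the
default device seen by the surrounding code is unchanged.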

gcc/ChangeLog:

* gimple-low.cc (lower_stmt): Handle GIMPLE_OMP_DISPATCH.
* gimple-pretty-print.cc (dump_gimple_omp_dispatch): New function.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_DISPATCH.
* gimple-walk.cc (walk_gimple_stmt): Likewise.
* gimple.cc (gimple_build_omp_dispatch): New function.
(gimple_copy): Handle GIMPLE_OMP_DISPATCH.
* gimple.def (GIMPLE_OMP_DISPATCH): Define.
* gimple.h (gimple_build_omp_dispatch): Declare.
(gimple_has_substatements): Handle GIMPLE_OMP_DISPATCH.
(gimple_omp_dispatch_clauses): New function.
(gimple_omp_dispatch_clauses_ptr): Likewise.
(gimple_omp_dispatch_set_clauses): Likewise.
(gimple_return_set_retval): Handle GIMPLE_OMP_DISPATCH.
* gimplify.cc (enum omp_region_type): Add ORT_DISPATCH.
(gimplify_call_expr): Handle need_device_ptr arguments.
(is_gimple_stmt): Handle OMP_DISPATCH.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DEVICE in a dispatch
construct. Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(omp_construct_selector_matches): Handle OMP_DISPATCH with nocontext
clause.
(omp_has_novariants): New function.
(omp_has_nocontext): Likewise.
(find_ifn_gomp_dispatch): New function.
(gimplify_omp_dispatch): Likewise.
(gimplify_expr): Handle OMP_DISPATCH.
* gimplify.h (omp_has_novariants): Declare.
* internal-fn.cc (expand_GOMP_DISPATCH): New function.
* internal-fn.def (GOMP_DISPATCH): Define.
* omp-builtins.def (BUILT_IN_OMP_GET_MAPPED_PTR): Define.
(BUILT_IN_OMP_GET_DEFAULT_DEVICE): Define.
(BUILT_IN_OMP_SET_DEFAULT_DEVICE): Define.
* omp-general.cc (omp_construct_traits_to_codes): Add OMP_DISPATCH.
(struct omp_ts_info): Add dispatch.
(omp_resolve_declare_variant): Handle novariants. Adjust
DECL_ASSEMBLER_NAME.
* omp-low.cc (scan_omp_1_stmt): Handle GIMPLE_OMP_DISPATCH.
(lower_omp_dispatch): New function.
(lower_omp_1): Call it.
* tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_OMP_DISPATCH.
(estimate_num_insns): Handle GIMPLE_OMP_DISPATCH.
---
 gcc/gimple-low.cc  |   1 +
 gcc/gimple-pretty-print.cc |  33 +++
 gcc/gimple-walk.cc |   1 +
 gcc/gimple.cc  |  20 ++
 gcc/gimple.def |   5 +
 gcc/gimple.h   |  33 ++-
 gcc/gimplify.cc| 484 +++--
 gcc/gimplify.h |   1 +
 gcc/internal-fn.cc |   8 +
 gcc/internal-fn.def|   1 +
 gcc/omp-builtins.def   |   6 +
 gcc/omp-general.cc |  14 +-
 gcc/omp-low.cc |  35 +++
 gcc/tree-inline.cc |   7 +
 14 files changed, 627 insertions(+), 22 deletions(-)

diff --git a/gcc/gimple-low.cc b/gcc/gimple-low.cc
index e0371988705..712a1ebf776 100644
--- a/gcc/gimple-low.cc
+++ b/gcc/gimple-low.cc
@@ -746,6 +746,7 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lower_data 
*data)
 case GIMPLE_EH_MUST_NOT_THROW:
 case GIMPLE_OMP_FOR:
 case GIMPLE_OMP_SCOPE:
+case GIMPLE_OMP_DISPATCH:
 case GIMPLE_OMP_SECTIONS:
 case GIMPLE_OMP_SECTIONS_SWITCH:
 case GIMPLE_OMP_SECTION:
diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index 01d7c9f6eeb..7a45e8ec843 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -1726,6 +1726,35 @@ dump_gimple_omp_scope (pretty_printer *pp, const gimple 
*gs,
 }
 }
 
+/* Dump a GIMPLE_OMP_DISPATCH tuple on the pretty_printer BUFFER.  */
+
+static void
+dump_gimple_omp_dispatch (pretty_printer *buffer, const gimple *gs, int spc,
+ dump_flags_t flags)
+{
+  if (flags & TDF_RAW)
+{
+  dump_gimple_fmt (buffer, spc, flags, "%G <%+BODY <%S>%nCLAUSES <", gs,
+  gimple_omp_body (gs));
+  dump_omp_clauses (buffer, gimple_omp_dispatch_clauses (gs), spc, flags);
+  dump_gimple_fmt (buffer, spc, flags, " >");
+}
+  else
+{
+  pp_string (buffer, "#pragma omp dispatch");
+  dump_omp_clauses (buffer, gimple_omp_dispatch_clauses (gs), spc, flags);
+  if (!

[PATCH v4 7/7] OpenMP: update documentation for dispatch and adjust_args

2024-10-02 Thread Paul-Antoine Arras
libgomp/ChangeLog:

* libgomp.texi:
---
 libgomp/libgomp.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index c6464ece32e..7026f32f867 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -294,8 +294,8 @@ The OpenMP 4.5 specification is fully supported.
 @item C/C++'s @code{declare variant} directive: elision support of
   preprocessed code @tab N @tab
 @item @code{declare variant}: new clauses @code{adjust_args} and
-  @code{append_args} @tab N @tab
-@item @code{dispatch} construct @tab N @tab
+  @code{append_args} @tab P @tab Only @code{adjust_args}
+@item @code{dispatch} construct @tab Y @tab
 @item device-specific ICV settings with environment variables @tab Y @tab
 @item @code{assume} and @code{assumes} directives @tab Y @tab
 @item @code{nothing} directive @tab Y @tab
-- 
2.45.2



[PATCH v4 4/7] OpenMP: C++ front-end support for dispatch + adjust_args

2024-10-02 Thread Paul-Antoine Arras
This patch adds C++ support for the `dispatch` construct and the `adjust_args`
clause. It relies on the c-family bits included in the corresponding C front
end patch for pragmas and attributes.

Additional C/C++ common testcases are provided in a subsequent patch in the
series.

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Set adjust_args
need_device_ptr attribute.
* parser.cc (cp_parser_direct_declarator): Update call to
cp_parser_late_return_type_opt.
(cp_parser_late_return_type_opt): Add parameter. Update call to
cp_parser_late_parsing_omp_declare_simd.
(cp_parser_omp_clause_name): Handle nocontext and novariants clauses.
(cp_parser_omp_clause_novariants): New function.
(cp_parser_omp_clause_nocontext): Likewise.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NOVARIANTS and
PRAGMA_OMP_CLAUSE_NOCONTEXT.
(cp_parser_omp_dispatch_body): New function, inspired by
cp_parser_assignment_expression and cp_parser_postfix_expression.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(cp_parser_omp_dispatch): New function.
(cp_finish_omp_declare_variant): Add parameter. Handle adjust_args
clause.
(cp_parser_late_parsing_omp_declare_simd): Add parameter. Update calls
to cp_finish_omp_declare_variant and cp_finish_omp_declare_variant.
(cp_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
(cp_parser_pragma): Likewise.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/adjust-args-1.C: New test.
* g++.dg/gomp/adjust-args-2.C: New test.
* g++.dg/gomp/dispatch-1.C: New test.
* g++.dg/gomp/dispatch-2.C: New test.
* g++.dg/gomp/dispatch-3.C: New test.
---
 gcc/cp/decl.cc|   7 +
 gcc/cp/parser.cc  | 644 --
 gcc/cp/semantics.cc   |  20 +
 gcc/testsuite/g++.dg/gomp/adjust-args-1.C |  39 ++
 gcc/testsuite/g++.dg/gomp/adjust-args-2.C |  51 ++
 gcc/testsuite/g++.dg/gomp/dispatch-1.C|  53 ++
 gcc/testsuite/g++.dg/gomp/dispatch-2.C|  62 +++
 gcc/testsuite/g++.dg/gomp/dispatch-3.C|  17 +
 8 files changed, 848 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/adjust-args-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/adjust-args-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/dispatch-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/dispatch-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/dispatch-3.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 07fb9855cd2..e9c489a8d76 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8403,6 +8403,13 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
  if (!omp_context_selector_matches (ctx))
return true;
  TREE_PURPOSE (TREE_VALUE (attr)) = variant;
+
+ // Prepend adjust_args list to variant attributes
+ tree adjust_args_list = TREE_CHAIN (TREE_CHAIN (chain));
+ if (adjust_args_list != NULL_TREE)
+   DECL_ATTRIBUTES (variant) = tree_cons (
+ get_identifier ("omp declare variant variant adjust_args"),
+ TREE_VALUE (adjust_args_list), DECL_ATTRIBUTES (variant));
}
 }
   else if (!processing_template_decl)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 0944827d777..ec8bfe1b813 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#include "omp-selectors.h"
 #define INCLUDE_MEMORY
 #include "system.h"
 #include "coretypes.h"
@@ -2591,7 +2592,7 @@ static cp_ref_qualifier cp_parser_ref_qualifier_opt
 static tree cp_parser_tx_qualifier_opt
   (cp_parser *);
 static tree cp_parser_late_return_type_opt
-  (cp_parser *, cp_declarator *, tree &);
+  (cp_parser *, cp_declarator *, tree &, tree);
 static tree cp_parser_declarator_id
   (cp_parser *, bool);
 static tree cp_parser_type_id
@@ -2626,7 +2627,7 @@ static void 
cp_parser_ctor_initializer_opt_and_function_body
   (cp_parser *, bool);
 
 static tree cp_parser_late_parsing_omp_declare_simd
-  (cp_parser *, tree);
+  (cp_parser *, tree, tree);
 
 static tree cp_parser_late_parsing_oacc_routine
   (cp_parser *, tree);
@@ -24260,7 +24261,7 @@ cp_parser_direct_declarator (cp_parser* parser,
  tree requires_clause = NULL_TREE;
  late_return
= cp_parser_late_return_type_opt (parser, declarator,
- requires_clause);
+ requires_clause, params);
 
  cp_finalize_omp_declare_simd (parser, &odsd);
 
@@ -25125,8 +25126,8 @@ parsing_function_declarator ()
function.  */
 
 static tree
-cp_parser_l

Re: [PATCH] C/116735 - ICE in build_counted_by_ref

2024-10-02 Thread Qing Zhao



> On Oct 2, 2024, at 12:05, Jakub Jelinek  wrote:
> 
> On Wed, Oct 02, 2024 at 11:48:16AM -0400, Marek Polacek wrote:
>>> +  error_at (DECL_SOURCE_LOCATION (field_decl),
>>> + "argument %qE to the %qE attribute is not a field declaration"
>>> + " in the same structure as %qD", fieldname,
>>> + (get_attribute_name (attr_counted_by)),
>> 
>> Why use get_attribute_name when we know it must be "counted_by"?  And below
>> too.
> 
> There might be a reason if the message would be used by multiple
> spots with different attributes and the other uses would need that %qE,
> rather than say %qs or %<counted_by%> (to make it easier for translators).
> If the message is only for this attribute, just use %<counted_by%>, or
> if it would be for several attributes but in each case you'd know the name
> as constant literal, %qs with "counted_by" operand would be best.
> 
> That said, the ()s around the call are also superfluous, so if it isn't
> changed, it should be just
> get_attribute_name (attr_counted_by),

Sure, I will fix this in the next version.

thanks.

Qing
> 
> Jakub
> 



Re: [PATCH][_Hashtable] Fix some implementation inconsistencies

2024-10-02 Thread Jonathan Wakely
On Mon, 13 May 2024 at 05:34, François Dumont  wrote:
>
>  libstdc++: [_Hashtable] Fix some implementation inconsistencies
>
>  Get rid of the different usages of the mutable keyword except in
>  _Prime_rehash_policy where it is preserved for ABI compatibility
> reasons.
>
>  Fix comment to explain that we need the computation of bucket index
> noexcept
>  to be able to rehash the container when needed.
>
>  For Standard instantiations through std::unordered_xxx containers
> we already
>  force caching of hash code when hash functor is not noexcept so it
> is guaranteed.
>
>  The static_assert purpose in _Hashtable on _M_bucket_index is thus
> limited
>  to usages of _Hashtable with exotic _Hashtable_traits.
>
>  libstdc++-v3/ChangeLog:
>
>  * include/bits/hashtable_policy.h
> (_NodeBuilder<>::_S_build): Remove
>  const qualification on _NodeGenerator instance.
> (_ReuseOrAllocNode<>::operator()(_Args&&...)): Remove const qualification.
>  (_ReuseOrAllocNode<>::_M_nodes): Remove mutable.
>  (_Insert_base<>::_M_insert_range): Remove _NodeGetter const
> qualification.
>  (_Hash_code_base<>::_M_bucket_index(const
> _Hash_node_value<>&, size_t)):
>  Simplify noexcept declaration, we already static_assert
> that _RangeHash functor
>  is noexcept.
>  * include/bits/hashtable.h: Rework comments. Remove const
> qualifier on
>  _NodeGenerator& arguments.
>
> Tested under Linux x64, ok to commit ?


OK for trunk, thanks.



Re: [PATCH] Fix const constraint in std::stable_sort and std::inplace_merge

2024-10-02 Thread Jonathan Wakely
On Wed, 2 Oct 2024 at 17:39, Jonathan Wakely  wrote:
>
> On Wed, 25 Sept 2024 at 18:22, François Dumont  wrote:
> >
> > Hi
> >
> > Once https://gcc.gnu.org/pipermail/libstdc++/2024-September/059568.html
> > is accepted we will be able to fix this long-standing issue that
> > std::stable_sort and std::inplace_merge are forcing the functor to take
> > const& parameters even when iterators used in range are not const ones.
>
> https://cplusplus.github.io/LWG/issue3031 said that's OK.

And ... I guess that means we don't need to worry about the non-const
X::operator<(X&) case?

Before C++20 the standard implied it should work, and that's what
we've traditionally supported. But maybe we can stop supporting that,
if we treat the C++20 change as a DR for previous standards?

Hmm. That would make your clean-up a lot simpler. That's what you had
in the earlier patch, right?
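
For readers following along, the "non-const X::operator<(X&)" case can
be sketched as below. The type X and the trait is_less_comparable are
hypothetical names made up for illustration; nothing here is taken from
the patch. The sketch only shows that such a type can be compared
through non-const lvalues but not through const ones, which is why an
algorithm that compares via const references cannot accept it.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical type: operator< is not const-qualified and takes a
// non-const reference, i.e. the "X::operator<(X&)" case.
struct X {
  int key;
  bool operator<(X& rhs) { return key < rhs.key; }
};

// Hypothetical detection trait: true when `t < t` is well-formed for
// operands of type T.
template <typename T, typename = void>
struct is_less_comparable : std::false_type {};

template <typename T>
struct is_less_comparable<
    T, decltype(void(std::declval<T>() < std::declval<T>()))>
    : std::true_type {};

// Non-const lvalues can be compared...
static_assert(is_less_comparable<X&>::value,
              "X& < X& is well-formed");
// ...but const lvalues cannot.
static_assert(!is_less_comparable<const X&>::value,
              "const X& < const X& is ill-formed");
```

Whether the library should keep accepting such a type is exactly the
open question above; the sketch takes no position on it.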



Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-10-02 Thread Andrew Waterman
On Wed, Oct 2, 2024 at 4:41 PM Jeff Law  wrote:
>
>
>
> On 10/2/24 4:39 PM, Andrew Waterman wrote:
> > On Wed, Oct 2, 2024 at 5:56 AM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 9/5/24 12:52 PM, Palmer Dabbelt wrote:
> >>> We have cheap logical ops, so let's just move this back to the default
> >>> to take advantage of the standard branch/op heuristics.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>PR target/116615
> >>>* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> >> So on the BPI  this is a pretty clear win.  Not surprisingly perlbench
> >> and gcc are the big winners.  It somewhat surprisingly regresses x264,
> >> deepsjeng & leela, but the magnitudes are smaller.  The net from a cycle
> >> perspective is 2.4%.  Every benchmark looks better from a branch count
> >> perspective.
> >>
> >> So in my mind it's just a matter of fixing any testsuite fallout (I
> >> would expect some) and this is OK.
> >
> > Jeff, were you able to measure the change in static code size, too?
> > These results are very encouraging, but I'd like to make sure we don't
> > need to retain the current behavior when optimizing for size.
> Codesize is ever so slightly worse.  As in less than .1%.  Not worth it
> in my mind to do something different in that range.

Thanks.  Agreed.

>
> Jeff
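
As background for the numbers above, the two source-level shapes that
the branch/op trade-off chooses between can be sketched as follows.
The function names are made up for illustration, and which form the
compiler actually emits for a given target is its own decision; this
only shows that the two forms agree when the operands have no side
effects.

```cpp
// Short-circuit form: the second test is evaluated only if the first
// one passes, which maps naturally onto two conditional branches.
constexpr bool both_nonzero_short_circuit(int a, int b) {
  return a != 0 && b != 0;
}

// Non-short-circuit form: both tests are always evaluated and combined
// with a bitwise AND, trading a branch for an extra logical op.
constexpr bool both_nonzero_logical_op(int a, int b) {
  return (a != 0) & (b != 0);
}
```

Fewer branches but more unconditional work is consistent with the
branch-count improvement and the slight code-size change reported in
the thread.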


[PATCH] aarch64: Fix early ra for -fno-delete-dead-exceptions [PR116927]

2024-10-02 Thread Andrew Pinski
Early-RA was considering throwing instructions as being dead and removing
them even if -fno-delete-dead-exceptions was in use. This fixes that oversight.

Built and tested for aarch64-linux-gnu.

PR target/116927

gcc/ChangeLog:

* config/aarch64/aarch64-early-ra.cc (early_ra::is_dead_insn): Insns
that throw are not dead with -fno-delete-dead-exceptions.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr116927-1.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-early-ra.cc|  6 ++
 gcc/testsuite/g++.dg/torture/pr116927-1.C | 15 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr116927-1.C

diff --git a/gcc/config/aarch64/aarch64-early-ra.cc b/gcc/config/aarch64/aarch64-early-ra.cc
index 5f269d029b4..6e544dd6191 100644
--- a/gcc/config/aarch64/aarch64-early-ra.cc
+++ b/gcc/config/aarch64/aarch64-early-ra.cc
@@ -3389,6 +3389,12 @@ early_ra::is_dead_insn (rtx_insn *insn)
   if (side_effects_p (set))
 return false;
 
+  /* If we can't delete dead exceptions and the insn throws,
+ then the instruction is not dead.  */
+  if (!cfun->can_delete_dead_exceptions
+  && !insn_nothrow_p (insn))
+return false;
+
   return true;
 }
 
diff --git a/gcc/testsuite/g++.dg/torture/pr116927-1.C b/gcc/testsuite/g++.dg/torture/pr116927-1.C
new file mode 100644
index 000..22fa1dbd7e1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr116927-1.C
@@ -0,0 +1,15 @@
+// { dg-do compile }
+// { dg-additional-options "-fnon-call-exceptions -fno-delete-dead-exceptions" }
+
+// PR target/116927
+// aarch64's Early ra was removing possibly trapping
+// floating point insn
+
+void
+foo (float f)
+{
+  try {
+f ++;
+  }catch(...)
+  {}
+}
-- 
2.43.0
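
The decision the patch adds can be modelled in isolation as below.
This is a toy sketch: ToyInsn and its fields are hypothetical and are
not GCC's rtx_insn or the real early_ra::is_dead_insn; only the order
of the checks mirrors the hunk above.

```cpp
// Toy stand-in for an instruction; the fields are assumptions made for
// illustration only.
struct ToyInsn {
  bool result_used;       // some later instruction reads the result
  bool has_side_effects;  // e.g. volatile access
  bool may_throw;         // the instruction can raise an exception
};

// Returns true when the toy instruction may be removed.
constexpr bool toy_is_dead(const ToyInsn& insn,
                           bool can_delete_dead_exceptions) {
  if (insn.result_used)
    return false;
  if (insn.has_side_effects)
    return false;
  // The added rule: a potentially throwing instruction stays alive
  // unless deleting dead exceptions is allowed.
  if (!can_delete_dead_exceptions && insn.may_throw)
    return false;
  return true;
}
```

With can_delete_dead_exceptions set to false, an unused but possibly
throwing instruction is kept, which matches the intent of
-fno-delete-dead-exceptions in the test case.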



Re: [PATCH v1] Add -ftime-report-wall

2024-10-02 Thread David Malcolm
On Wed, 2024-10-02 at 14:14 -0700, Andi Kleen wrote:
> From: Andi Kleen 
> 
> Time vars normally use times(2) to get the user/sys/wall time, which is
> always a system call. I don't think the system time is very useful because
> most overhead is in user time. If we only use the wall (or monotonic) time,
> modern OSes have an optimized path to get it directly from a CPU instruction
> like RDTSC without a system call, which is much faster.
> 
> Comparing the overhead with tramp3d:
> 
>   ./gcc/cc1plus -quiet  ../tsrc/tramp3d-v4.i ran
>     1.03 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report-wall ../tsrc/tramp3d-v4.i
>     1.18 ± 0.00 times faster than ./gcc/cc1plus -quiet -ftime-report ../tsrc/tramp3d-v4.i
> 
> -ftime-report costs 18% (excluding the output), while -ftime-report-wall
> only costs 3%, so is nearly free. So it would be feasible for some build
> system to always enable it and break down the build time into passes.
> 
> The drawback is that if there is context switching with other programs
> the time will be overestimated, however for the common case that the
> system is not oversubscribed it is more accurate because each
> measurement has less overhead.
> 
> Add a -ftime-report-wall option. It actually uses the POSIX monotonic time,
> so strictly it's not wall clock, but it's still a reasonable name.
> 
> Bootstrapped on x86_64-linux with full test suite run.
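
The two measurement paths described in the quoted text can be
contrasted with a small POSIX sketch. TimeSample and both function
names are hypothetical and are not taken from timevar.cc; the sketch
only assumes the standard times(2) and clock_gettime(2) interfaces.

```cpp
#include <cassert>
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

// Hypothetical record holding a user/sys/wall triple in seconds.
struct TimeSample {
  double user;
  double sys;
  double wall;
};

// times(2) path: one system call yields user, sys and elapsed ticks.
inline TimeSample sample_with_times() {
  struct tms t;
  const clock_t elapsed = times(&t);
  const double ticks_per_sec = static_cast<double>(sysconf(_SC_CLK_TCK));
  return {t.tms_utime / ticks_per_sec, t.tms_stime / ticks_per_sec,
          elapsed / ticks_per_sec};
}

// Monotonic-clock path: only elapsed time is gathered, and user/sys
// are left at zero.
inline TimeSample sample_wall_only() {
  struct timespec ts;
  const int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
  return {0.0, 0.0, ts.tv_sec + ts.tv_nsec * 1e-9};
}
```

Whether clock_gettime avoids a system call depends on the platform, so
the overhead figures quoted above should be read as measurements on
the submitter's machine rather than a guarantee.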

Note that if the user requests SARIF output e.g. with
  -fdiagnostics-format=sarif-stderr
then any timevar data from -ftime-report is written in JSON form as
part of the SARIF, rather than in text form to stderr (see
75d623946d4b6ea80a777b789b116d4b4a2298dc).

I see that the proposed patch leaves the user and sys stats as zero,
and conditionalizes what's printed for text output as part of
timer::print.  Should it also do something similar in
make_json_for_timevar_time_def for the json output, and not add the
properties for "user" and "sys" if the data hasn't been gathered?
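
To make the suggestion concrete, here is a rough sketch of emitting
the "user" and "sys" properties only when they were gathered. The
struct, its has_cpu_times flag and the string-building function are
all made up for illustration; this is not GCC's json API and not the
real make_json_for_timevar_time_def.

```cpp
#include <cassert>
#include <string>

// Hypothetical record; has_cpu_times is false when only wall time was
// gathered.
struct TimevarRecord {
  double user;
  double sys;
  double wall;
  bool has_cpu_times;
};

// Builds a JSON object, omitting "user" and "sys" when they are absent.
inline std::string timevar_to_json(const TimevarRecord& t) {
  assert(t.wall >= 0.0);  // elapsed time is expected to be non-negative
  std::string out = "{";
  if (t.has_cpu_times) {
    out += "\"user\": " + std::to_string(t.user) + ", ";
    out += "\"sys\": " + std::to_string(t.sys) + ", ";
  }
  out += "\"wall\": " + std::to_string(t.wall) + "}";
  return out;
}
```

A consumer can then tell "not measured" apart from "measured as zero"
by checking whether the property exists.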

Hope I'm reading the patch correctly.

Thanks
Dave

> 
> gcc/ChangeLog:
> 
>   * common.opt (ftime-report-wall): Add.
>   * common.opt.urls: Regenerate.
>   * doc/invoke.texi: (ftime-report-wall): Document
>   * gcc.cc (try_generate_repro): Check for -ftime-report-wall.
>   * timevar.cc (get_time): Use clock_gettime if enabled.
>   (timer::print): Print only wall time for time_report_wall.
>   * toplev.cc (toplev::start_timevars): Check for
> time_report_wall.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/timevar3.C: New test.
> ---
>  gcc/common.opt  |  4 
>  gcc/common.opt.urls |  3 +++
>  gcc/doc/invoke.texi |  7 +++
>  gcc/gcc.cc  |  3 ++-
>  gcc/testsuite/g++.dg/ext/timevar3.C | 14 +
>  gcc/timevar.cc  | 31 +++--
>  gcc/toplev.cc   |  3 ++-
>  7 files changed, 57 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/timevar3.C
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d270e524ff45..e9fb15e28d80 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3010,6 +3010,10 @@ ftime-report
>  Common Var(time_report)
>  Report the time taken by each compiler pass.
>  
> +ftime-report-wall
> +Common Var(time_report_wall)
> +Report the wall time taken by each compiler.
> +
>  ftime-report-details
>  Common Var(time_report_details)
>  Record times taken by sub-phases separately.
> diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
> index e31736cd9945..6e79a8f9390b 100644
> --- a/gcc/common.opt.urls
> +++ b/gcc/common.opt.urls
> @@ -1378,6 +1378,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fthread-jumps)
>  ftime-report
>  UrlSuffix(gcc/Developer-Options.html#index-ftime-report)
>  
> +ftime-report-wall
> +UrlSuffix(gcc/Developer-Options.html#index-ftime-report-wall)
> +
>  ftime-report-details
>  UrlSuffix(gcc/Developer-Options.html#index-ftime-report-details)
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e199522f62c7..80cb355f5d79 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -784,6 +784,7 @@ Objective-C and Objective-C++ Dialects}.
>  -frandom-seed=@var{string}  -fsched-verbose=@var{n}
>  -fsel-sched-verbose  -fsel-sched-dump-cfg  -fsel-sched-pipelining-verbose
>  -fstats  -fstack-usage  -ftime-report  -ftime-report-details
> +-ftime-report-wall
>  -fvar-tracking-assignments-toggle  -gtoggle
>  -print-file-name=@var{library}  -print-libgcc-file-name
>  -print-multi-directory  -print-multi-lib  -print-multi-os-directory
> @@ -21026,6 +21027,12 @@ slightly different place within the compiler.
>  @item -ftime-report-details
>  Record the time consumed by infrastructure parts separately for each pass.
>  
> +@opindex ftime-report-wall
> +@item -ftime-report-wall
> +Report statistics about compiler pass time consumption, but only using wall
> +time.  This

[patch,testsuite,applied] Fix gcc.dg/signbit-6.c for int != 32-bit targets

2024-10-02 Thread Georg-Johann Lay

This test failed on int != 32-bit targets due to
a[0] = b[0] = INT_MIN instead of using INT32_MIN.

Johann
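
The difference between the two macros can be checked at compile time
with the sketch below. The helper names are made up for illustration
and the sketch assumes a two's-complement int without padding bits; it
is not part of the test case.

```cpp
#include <climits>
#include <cstdint>

// INT32_MIN is -2^31 wherever int32_t exists, whatever the width of int.
static_assert(INT32_MIN == -2147483647 - 1, "INT32_MIN is -2^31");

// INT_MIN follows the width of int, so it equals INT32_MIN only when
// int is 32 bits wide.
constexpr bool int_is_32_bit = sizeof(int) * CHAR_BIT == 32;

constexpr bool int_min_matches_int32_min() {
  return INT_MIN == INT32_MIN;
}
```

On a target with 16-bit int, int_min_matches_int32_min() is false,
which is the situation the fix addresses.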

--

testsuite/52641 - Fix gcc.dg/signbit-6.c for int != 32-bit targets.

PR testsuite/52641
gcc/testsuite/
* gcc.dg/signbit-6.c (main): Initialize a[0] and b[0]
with INT32_MIN (instead of with INT_MIN).

diff --git a/gcc/testsuite/gcc.dg/signbit-6.c b/gcc/testsuite/gcc.dg/signbit-6.c
index da186624cfa..3a522893222 100644
--- a/gcc/testsuite/gcc.dg/signbit-6.c
+++ b/gcc/testsuite/gcc.dg/signbit-6.c
@@ -38,8 +38,10 @@ int main ()
   TYPE a[N];
   TYPE b[N];

-  a[0] = INT_MIN;
-  b[0] = INT_MIN;
+  /* This will invoke UB due to -INT32_MIN.  The test is supposed to pass
+ because GCC is supposed to handle this UB case in a predictable way.  */
+  a[0] = INT32_MIN;
+  b[0] = INT32_MIN;

   for (int i = 1; i < N; ++i)
 {

